Lab 4: The Vector Database Pipeline

Step through the full indexing pipeline — upload, chunk, embed, index, then search.

1. Upload & Extract: PDF → plain text
2. Chunking: long text → smaller pieces
3. Embedding: each chunk → a vector
4. Indexing: store the vectors in a vector DB
5. Search: find the chunks that match a query
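The Chunking stage can be sketched as a fixed-size sliding window. This is a minimal sketch, not the lab's own chunker: the function name `chunk_text` and the `chunk_size`/`overlap` values are hypothetical, and sizes are counted in characters rather than tokens. Each window starts `chunk_size - overlap` characters after the previous one, so text near a boundary lands in both neighbouring chunks and is less likely to be cut off from its context.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into overlapping fixed-size character windows.

    Hypothetical helper; the default sizes are illustrative, not from the lab.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap  # distance between the starts of consecutive windows
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


print(chunk_text("abcdefghij", chunk_size=4, overlap=2))
# → ['abcd', 'cdef', 'efgh', 'ghij']
```

Real pipelines often split on sentence or paragraph boundaries instead, or count tokens with the embedding model's tokenizer; the character window is simply the easiest variant to reason about.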

Step 1: Upload a PDF or use the sample document

We'll extract the raw text from the PDF to begin the pipeline.

Takeaway: The pipeline (Load → Chunk → Embed → Index) is what prepares your document for semantic search. Each chunk becomes a point in vector space. Searching means embedding your question with the same model and finding the chunk vectors nearest to it. Lab 5 will use this index to answer questions with the full RAG pipeline.
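The Embed, Index, and Search steps can be illustrated end to end in a few lines. This is a sketch under loud assumptions: `embed` is a hypothetical toy (a hashed bag of words) standing in for a real embedding model, the "index" is a plain Python list rather than a vector database, and search is a brute-force scan. Every vector is scaled to unit length, so the dot product equals cosine similarity, and ranking by highest cosine is the same as ranking by smallest Euclidean distance, because ||a - b||^2 = 2 - 2(a · b) for unit vectors.

```python
import math
import re
import zlib

DIM = 256  # hypothetical vector size for the toy; real models choose their own


def embed(text: str) -> list[float]:
    """Hypothetical toy embedding: hashed bag of words, scaled to unit length."""
    vec = [0.0] * DIM
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        # crc32 is deterministic across runs, unlike Python's salted hash()
        vec[zlib.crc32(word.encode("utf-8")) % DIM] += 1.0
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


def build_index(chunks: list[str]) -> list[tuple[str, list[float]]]:
    """Indexing: keep each chunk next to its vector (an in-memory list here)."""
    return [(chunk, embed(chunk)) for chunk in chunks]


def search(
    index: list[tuple[str, list[float]]], question: str, k: int = 2
) -> list[tuple[float, str]]:
    """Embed the question and return the k (score, chunk) pairs with the highest cosine."""
    q = embed(question)
    scored = [(sum(a * b for a, b in zip(q, vec)), chunk) for chunk, vec in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


chunks = [
    "Invoices are due within 30 days of receipt.",
    "The office is closed on public holidays.",
    "Late invoices incur a 2% monthly fee.",
]
index = build_index(chunks)
for score, chunk in search(index, "When are invoices due?"):
    print(f"{score:.2f}  {chunk}")
```

A toy like this only rewards shared words. A real embedding model places chunks with similar meaning near each other even when the wording differs, and a vector database replaces the linear scan with an approximate nearest-neighbour index so search stays fast as the collection grows.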