Lab 4: The Vector Database Pipeline
Step through the full pipeline: upload, chunk, embed, and index a document, then search it.
📄 Upload & Extract: PDF → Plain Text
✂️ Chunking: Long text → Pieces
🔢 Embedding: Each chunk → Vector
🗄️ Indexing: Store in vector DB
🔍 Search: Find matching chunks
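The Chunking stage ("Long text → Pieces") can be sketched as fixed-size character windows with overlap. This is one common strategy, not necessarily the one the lab uses. The function name `chunk_text` and the defaults `chunk_size=500` and `overlap=50` are illustrative assumptions.

```python
from __future__ import annotations


def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character windows that overlap.

    chunk_size and overlap are illustrative defaults, not values taken from the lab.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    step = chunk_size - overlap  # how far each window's start advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks
```

For example, `chunk_text("abcdefghij", chunk_size=4, overlap=1)` returns `["abcd", "defg", "ghij"]`. Each window starts `chunk_size - overlap` characters after the previous one, so text near a boundary appears in both neighboring chunks. Many real splitters prefer sentence or paragraph boundaries over raw character offsets.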
📄 Step 1: Upload a PDF or use the sample
We'll extract the raw text from the PDF to begin the pipeline.
Takeaway: The pipeline (Load → Chunk → Embed → Index) is what prepares your document for semantic search. Each chunk becomes a point in vector space. Searching means embedding your question the same way and finding the points nearest to it. Lab 5 will use this index to answer questions with the full RAG pipeline.
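The geometric picture in the takeaway (chunks as points, search as a nearest-neighbor lookup) can be sketched without any external service. This is a toy under stated assumptions: `embed` is a hypothetical hashing bag-of-words stand-in for a real embedding model, `InMemoryIndex` is a hypothetical stand-in for a vector database, and similarity is cosine similarity computed by exact brute-force search. Production vector databases typically use approximate nearest-neighbor indexes so search scales to millions of vectors.

```python
from __future__ import annotations

import math
import re
import zlib

DIM = 64  # hypothetical embedding size; real models use far more dimensions


def embed(text: str) -> list[float]:
    """Toy embedding (an assumption, not a real model).

    Each lowercase word is hashed into one of DIM buckets and counted; the
    count vector is then scaled to unit length. It captures word overlap, not meaning.
    """
    vec = [0.0] * DIM
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        vec[zlib.crc32(word.encode("utf-8")) % DIM] += 1.0
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


class InMemoryIndex:
    """Hypothetical stand-in for a vector DB: a list of (vector, chunk) pairs."""

    def __init__(self) -> None:
        self.entries: list[tuple[list[float], str]] = []

    def add(self, chunk: str) -> None:
        # Indexing: store each chunk alongside its vector.
        self.entries.append((embed(chunk), chunk))

    def search(self, query: str, k: int = 3) -> list[tuple[float, str]]:
        # The query must be embedded exactly like the chunks were.
        q = embed(query)
        # All vectors are unit length (or zero), so the dot product is the
        # cosine similarity. This scores every stored chunk (brute force).
        scored = [(sum(a * b for a, b in zip(q, v)), chunk) for v, chunk in self.entries]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:k]


if __name__ == "__main__":
    index = InMemoryIndex()
    for chunk in [
        "Vector databases store embeddings for fast similarity search.",
        "Chunking splits long documents into smaller pieces.",
        "Cats sleep for most of the day.",
    ]:
        index.add(chunk)
    for score, chunk in index.search("How do vector databases search embeddings?", k=2):
        print(f"{score:.3f}  {chunk}")
```

`zlib.crc32` is used instead of Python's built-in `hash()` because string hashing is randomized per process, so vectors would not be reproducible across runs. Swapping in a real embedding model changes only `embed`; the index and search logic stay the same.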