Lab 4: The Vector Database Pipeline

Step through the full indexing pipeline — upload, chunk, embed, index, then search.

📄 Upload & Extract: PDF → Plain Text

✂️ Chunking: Long text → Pieces

🔢 Embedding: Each chunk → Vector

🗄️ Indexing: Store in vector DB

🔍 Search: Find matching chunks

📄 Step 1: Upload a PDF or use sample

We'll extract the raw text from the PDF to begin the pipeline.
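As a rough illustration of what this step might look like in code, here is a minimal sketch. It assumes the third-party pypdf library for the extraction itself; the lab does not say which extractor it uses, so that choice, along with the helper names clean_extracted_text and extract_pdf_text, is hypothetical.

```python
import re


def clean_extracted_text(raw):
    """Tidy the raw text pulled from one PDF page."""
    # Rejoin words hyphenated across a line break: "data-\nbases" -> "databases"
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", raw)
    # Single newlines inside a paragraph become spaces; blank lines are kept
    text = re.sub(r"(?<!\n)\n(?!\n)", " ", text)
    # Collapse runs of spaces and tabs into one space
    text = re.sub(r"[ \t]+", " ", text)
    return text.strip()


def extract_pdf_text(path):
    """Return the cleaned plain text of every page, joined by blank lines."""
    # Assumption: pypdf is installed (pip install pypdf). Imported here so the
    # cleaning helper above can be used without it.
    from pypdf import PdfReader

    reader = PdfReader(path)
    pages = [clean_extracted_text(page.extract_text() or "") for page in reader.pages]
    return "\n\n".join(p for p in pages if p)
```

The hyphen rule is a heuristic: it also merges words that are genuinely hyphenated at a line end, so real pipelines often treat this more carefully.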


Takeaway: The pipeline (Load → Chunk → Embed → Index) prepares your document for semantic search. Each chunk becomes a point in vector space, and searching means embedding your question the same way and finding the points nearest to it. Lab 5 will use this index to answer questions with the full RAG pipeline.
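The chunk, embed, index, and search steps can be sketched end to end in a few lines of standard-library Python. This is a toy under stated assumptions, not the lab's implementation: toy_embed is a hypothetical hashed bag-of-words stand-in for a real embedding model, the "vector DB" is a plain list, and the chunk sizes are illustrative.

```python
import hashlib
import math
import re

DIM = 64  # toy embedding size; real embedding models use far more dimensions


def chunk_text(text, chunk_size=12, overlap=4):
    """Split text into overlapping windows of words."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


def toy_embed(text):
    """Hash each word into one of DIM buckets, then scale to unit length.

    Stand-in for a real model: it captures word overlap only, not meaning.
    """
    vec = [0.0] * DIM
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        bucket = int(hashlib.md5(word.encode()).hexdigest(), 16) % DIM
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def build_index(chunks):
    """Store one (vector, chunk) pair per chunk."""
    return [(toy_embed(chunk), chunk) for chunk in chunks]


def search(index, query, top_k=2):
    """Rank chunks by cosine similarity (dot product of unit vectors)."""
    q = toy_embed(query)
    scored = [(sum(a * b for a, b in zip(q, vec)), chunk) for vec, chunk in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


document = (
    "Vector databases store embeddings for fast similarity search. "
    "Chunking splits a long document into smaller overlapping pieces. "
    "Each chunk is converted to a vector by an embedding model. "
    "At query time the question is embedded and compared to stored vectors."
)
index = build_index(chunk_text(document))
for score, chunk in search(index, "How does chunking split a document?"):
    print(f"{score:.2f}  {chunk}")
```

A real vector database replaces the linear scan in search with an approximate nearest-neighbor index, which is what keeps lookups fast as the number of chunks grows.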