This isn’t a fake “upload a PDF” button. It’s a real document-chat pipeline. Users upload documents, the app indexes them, and they get answers with citations. Here’s how it all works.

What Is RAG?

RAG stands for Retrieval-Augmented Generation. In plain English:
  1. Your app stores source material (like PDFs)
  2. When a user asks a question, the app finds the most relevant pieces
  3. Those pieces get passed into the model as context
  4. The model answers using that context
This is how the chat app answers questions about your documents — not just what the model already knew.
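
The four steps above can be sketched as a single function. This is purely illustrative: `retrieve` and `generate` are hypothetical stand-ins for the real vector search and model call, not names from the repo.

```typescript
// Minimal retrieve-then-generate loop. `retrieve` and `generate` are
// hypothetical stand-ins for the real vector search and model call.
type Chunk = { text: string; source: string };

async function answerWithRag(
  question: string,
  retrieve: (q: string) => Promise<Chunk[]>,     // step 2: find relevant pieces
  generate: (prompt: string) => Promise<string>  // step 4: model answers
): Promise<string> {
  const chunks = await retrieve(question);
  // Step 3: pass the retrieved pieces into the model as context.
  const context = chunks.map((c) => `[${c.source}] ${c.text}`).join("\n");
  return generate(`Context:\n${context}\n\nQuestion: ${question}`);
}
```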

The Pipeline, Step by Step

1. Upload
The user uploads a PDF in chat. The file is stored and tracked in the database.

2. Text extraction
The app extracts readable text from the PDF so the model has something to work with.

3. Chunking
The text is split into smaller sections. Models and vector search work much better on focused chunks than on one giant blob.
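
A simple fixed-size chunker with overlap illustrates the idea. The sizes here are illustrative, not values from the repo; real pipelines often split on sentence or paragraph boundaries instead.

```typescript
// Split text into overlapping, fixed-size chunks. Sizes are illustrative;
// the overlap keeps a bit of context across chunk boundaries.
function chunkText(text: string, chunkSize = 500, overlap = 50): string[] {
  const chunks: string[] = [];
  let start = 0;
  while (start < text.length) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break; // last chunk reached
    start += chunkSize - overlap;
  }
  return chunks;
}
```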

4. Embedding
Each chunk is turned into a vector embedding using OpenAI’s embedding model. Think of it as converting meaning into numbers.
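
Embedding is a single request to OpenAI’s embeddings endpoint, and chunks are typically sent in batches. The batch size and model name below are assumptions for illustration, not values read from the repo.

```typescript
// Pure helper: split chunks into batches before embedding them.
// The batch size is an assumption, not a value from the repo.
function toBatches<T>(items: T[], size: number): T[][] {
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Each batch would then go to the embeddings endpoint, roughly:
//
//   const res = await client.embeddings.create({
//     model: "text-embedding-3-small", // assumed model
//     input: batch,
//   });
//   // res.data[i].embedding is a number[] for batch[i]
```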

5. Storage
Those embeddings are stored in PostgreSQL in the embeddings table, alongside document metadata.

6. Retrieval
When the user asks a question, the question is embedded too. The app runs a similarity search to find the closest matching chunks.
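
“Closest” here means smallest vector distance. With pgvector the database does this work (for example, `ORDER BY embedding <=> $1 LIMIT 5` uses cosine distance), but the underlying math is just cosine similarity:

```typescript
// Cosine similarity: 1.0 means same direction (same meaning), values near 0
// mean unrelated. pgvector's <=> operator returns the cosine *distance*,
// which is 1 minus this value.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```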

7. Prompt assembly
The best chunks are assembled into prompt-ready context and passed into the chat generation flow.
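
A sketch of that assembly step: number each chunk and keep its source metadata alongside, so the model can cite [1], [2], … and the UI can resolve those markers later. The field names are hypothetical, not taken from the repo.

```typescript
// Assemble retrieved chunks into prompt-ready context. Numbered markers let
// the model cite sources the UI can resolve. Field names are hypothetical.
type RetrievedChunk = { text: string; documentId: string; page: number };

function buildContext(chunks: RetrievedChunk[]): string {
  return chunks
    .map((c, i) => `[${i + 1}] (doc ${c.documentId}, p. ${c.page}) ${c.text}`)
    .join("\n\n");
}
```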

8. Citations
The answer includes source metadata so the UI can show citations back to the user. No more “trust me, bro.”

The Vector Database

The vector database is what makes semantic search possible. Instead of just storing raw text, the app stores:
  • The original chunk of text
  • Metadata (document ID, page number, etc.)
  • A vector embedding for that chunk
That embedding is a numeric representation of meaning. It lets the app find chunks that are conceptually similar to a question, even when the wording is completely different.
This is all implemented with PostgreSQL + pgvector + OpenAI embeddings.
pgvector is a Postgres extension enabled at the database level. Your host must support CREATE EXTENSION vector for this to work.
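
As a sketch of what storage looks like: pgvector accepts vectors as bracketed literals like `'[0.1,0.2,...]'`. The DDL in the comment below uses hypothetical column names (1536 dimensions matches OpenAI’s text-embedding-3-small), not the repo’s actual schema.

```typescript
// pgvector accepts a vector as a bracketed literal string. This helper
// serializes a number[] for an INSERT. The DDL below is a sketch with
// hypothetical column names (1536 dims matches text-embedding-3-small):
//
//   CREATE EXTENSION IF NOT EXISTS vector;
//   CREATE TABLE embeddings (
//     id          bigserial PRIMARY KEY,
//     document_id text,
//     page        int,
//     content     text,
//     embedding   vector(1536)
//   );
function toPgVector(v: number[]): string {
  return `[${v.join(",")}]`;
}
```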

What You Can Build With This

This setup is perfect for:
  • PDF question answering
  • Internal knowledge assistants
  • Customer support bots grounded in your docs
  • Contract or policy lookup
  • Document-aware research workflows
It turns the chat app from “just another assistant” into something that works with user-provided knowledge.

Common Mistakes

  • Upload and indexing are separate steps. Make sure the indexing pipeline completes before expecting RAG answers.
  • You need object storage configured so the app can store the uploaded files. See the Storage guide.
  • The embedding pipeline uses OpenAI regardless of which chat model the user picks. Set your OPENAI_API_KEY.
  • Not every provider has pgvector enabled. Supabase supports it out of the box. Check your host’s docs if you’re using something else.
  • RAG improves grounded answers, but it doesn’t replace good prompts, reasonable chunking, or careful product design. It gives the model better context; it doesn’t magically make every answer perfect.