
RAG (Retrieval-Augmented Generation)

RAG is the architectural pattern for grounding LLM responses in your own data. The pipeline has three stages:

  1. Ingest: split documents into chunks, embed each chunk into a vector, and store the vectors alongside the chunk text.
  2. Retrieve: embed the user's query; find the top-K most similar chunks.
  3. Generate: pass the retrieved chunks to an LLM as context; produce the answer.
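
The three stages above can be sketched in a few lines of Python. This is a minimal illustration, not a production design: the word-count `embed` function is a toy stand-in for a real embedding model, the in-memory list stands in for a vector store, and the final LLM call is left out, so the sketch stops at the prompt that would be sent. All function names here (`chunk`, `embed`, `ingest`, `retrieve`, `build_prompt`) are hypothetical.

```python
import math
import re
from collections import Counter


def chunk(text: str, size: int = 12) -> list[str]:
    """Split text into chunks of at most `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def embed(text: str) -> Counter:
    """Toy embedding (assumption): lowercase word counts instead of a learned vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def ingest(documents: list[str]) -> list[tuple[str, Counter]]:
    """Stage 1: chunk every document, embed each chunk, store (text, vector) pairs."""
    return [(piece, embed(piece)) for doc in documents for piece in chunk(doc)]


def retrieve(index: list[tuple[str, Counter]], query: str, k: int = 2) -> list[str]:
    """Stage 2: embed the query and return the text of the top-k most similar chunks."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


def build_prompt(query: str, chunks: list[str]) -> str:
    """Stage 3 (input only): assemble the context that would be passed to an LLM."""
    context = "\n".join(f"[{i}] {c}" for i, c in enumerate(chunks, start=1))
    return (
        "Answer using only the context below and cite chunk numbers.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )


documents = [
    "Refunds are issued within 14 days of purchase.",
    "Our office is closed on public holidays.",
]
index = ingest(documents)
top_chunks = retrieve(index, "refunds within how many days", k=1)
prompt = build_prompt("refunds within how many days", top_chunks)
```

In a real system the prompt would then go to whichever LLM you use; numbering the chunks in the context is one simple way to let the model cite what it relied on.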

Why RAG: LLMs don't know your private data, and they hallucinate when they have to guess. Retrieval grounds the model in actual, attributable content; the model's job becomes "summarise and cite", not "remember the answer".

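As of 2026, best practice is hybrid retrieval: run vector similarity search and lexical (full-text) search side by side, then merge the two ranked lists with reciprocal rank fusion (RRF). Neither signal alone catches every relevant chunk: vector search can miss exact identifiers and rare terms, while lexical search misses paraphrases.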
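
Reciprocal rank fusion is simple enough to show in full. The sketch below assumes each retriever returns an ordered list of chunk IDs, best first, and scores each chunk as the sum of 1 / (k + rank) over the lists it appears in. The constant k = 60 is a commonly used default rather than a requirement, and the chunk IDs are made up for illustration.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of IDs into one list, best first.

    Each ID scores sum(1 / (k + rank)) across the lists that contain it,
    with rank starting at 1. Ties are broken alphabetically by ID.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda cid: (-scores[cid], cid))


# Hypothetical results from the two retrievers for the same query.
vector_hits = ["c1", "c2", "c3"]
lexical_hits = ["c3", "c1", "c4"]

fused = reciprocal_rank_fusion([vector_hits, lexical_hits])
```

Because RRF uses only rank positions, it needs no score normalisation between the two retrievers, whose raw scores (cosine similarity versus a full-text relevance score) are not directly comparable. Chunks that appear in both lists, here `c1` and `c3`, rise above chunks found by only one retriever.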
