Term

Retrieval

The lookup stage of a RAG pipeline: fetching the text chunks from a corpus that are most relevant to a query (in dense retrieval, by comparing against the query's embedding). Retrieval quality, not the LLM itself, is usually the bottleneck on RAG quality.

Background

Retrieval is the lookup stage of a retrieval-augmented generation (RAG) pipeline: given a user query, it finds and returns the passages of text most likely to help answer that query, so they can be inserted into the model's prompt as grounding context. It exists to close the gap between the knowledge frozen into a model's weights at training time and the specific, current, or private information an application needs.

Mechanically, a corpus is first split into chunks and indexed. The dominant approach is dense retrieval: an embedding model maps each chunk to a vector, the vectors are stored in a vector index, and at query time the query is embedded and the index returns the chunks whose vectors are nearest to it (typically by cosine similarity), often via approximate nearest-neighbour search for speed. Sparse or lexical retrieval (such as BM25) instead matches on term overlap, and hybrid retrieval blends both. A re-ranking step frequently reorders the top candidates with a more expensive, more accurate model.

A concrete example: for the query "how do I rotate my API key," retrieval returns the three most relevant paragraphs from the product docs, which are then handed to the generator. Retrieval matters because it is usually the quality bottleneck of a RAG system. If the right passage is never fetched, the generator has nothing to ground its answer in, and no amount of prompting or model capability can reliably recover the correct answer. Chunking strategy, embedding choice, and top-k tuning therefore have outsized impact.
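The embed, score, and take-top-k flow described above can be sketched in a few lines. This is a minimal illustration, not a production retriever, and it makes several assumptions: `embed` is a toy stand-in that builds a term-count vector (so it behaves lexically, where a real system would call a learned embedding model), the search is exhaustive rather than approximate, and the names `embed`, `cosine`, `retrieve`, and `CHUNKS` are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model (assumption): term counts as a vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query: str, chunks: list[str], k: int = 3) -> list[tuple[float, str]]:
    """Score every chunk against the query and return the k best (score, chunk) pairs."""
    q_vec = embed(query)
    # A real pipeline would embed and index the chunks once, ahead of query time.
    scored = [(cosine(q_vec, embed(chunk)), chunk) for chunk in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Hypothetical corpus chunks, invented for illustration.
CHUNKS = [
    "To rotate your API key, open Settings and choose Rotate key.",
    "Billing is charged monthly to the card on file.",
    "API keys can be revoked at any time from the dashboard.",
    "Our office is closed on public holidays.",
]

for score, chunk in retrieve("how do I rotate my API key", CHUNKS, k=2):
    print(f"{score:.3f}  {chunk}")
```

Swapping `embed` for a learned embedding model and the exhaustive loop for an approximate nearest-neighbour index would leave the shape of `retrieve` unchanged, which is why chunking, embedding choice, and `k` are the main levers to tune.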

Tools that use it