RAG: Reliability and Specificity for LLMs
After a day of diving into resources, I think I've got a handle on what Retrieval-Augmented Generation (RAG) actually is. Spoiler alert: it's exactly what it sounds like, but with a few moving parts.
Why RAG?
RAG is essentially a way to make large language models (LLMs) give better, more accurate answers by letting them look up trusted documents before responding. LLMs are powerful and trained on huge amounts of data, but they can still confidently hallucinate facts. RAG steps in as the reality check, pulling info from trusted sources so the model doesn't embarrass itself. I'm thinking of it as giving a forgetful genius Google access right before they start confidently making things up.
I can see why this is useful: RAG is particularly valuable for chatbots or agents that need to work with specific topics or a company's internal data, without the expense and complexity of retraining the entire model.
How It Actually Works?
The setup is surprisingly straightforward in concept: instead of an LLM answering questions purely from its training data, I hook it up to a set of specific documents (the knowledge source). The system indexes those documents (indexing) so that, when I have a query, it can retrieve relevant information (retrieval) and feed that context to the LLM for a more grounded response (generation).
The pipeline can be broken down into four stages:
Knowledge Source → Indexing → Retrieval → Generation
| Stage | Description |
|---|---|
| Knowledge Source | Documents for a RAG system to work with |
| Indexing | Process and store documents in a searchable format (embeddings, keywords, graph nodes) |
| Retrieval | Find relevant content based on my queries |
| Generation | Use an LLM to produce an answer using the retrieved content |
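To make the four stages concrete, here's a minimal sketch of the whole loop. It's a toy: I'm using bag-of-words vectors in place of real embeddings, and the generation step just builds the grounded prompt instead of calling a model, so every name here (DOCS, vectorize, retrieve, answer) is my own scaffolding rather than a real library API.

```python
# A minimal, dependency-free sketch of the four RAG stages.
# Bag-of-words cosine similarity stands in for real embeddings,
# and the LLM call at the end is stubbed out.
from collections import Counter
import math

# Stage 1: knowledge source (hypothetical documents)
DOCS = [
    "RAG retrieves trusted documents before the LLM answers.",
    "FAISS is a library for efficient vector similarity search.",
    "BM25 is a sparse, keyword-based retrieval algorithm.",
]

def vectorize(text: str) -> Counter:
    """Stage 2 (indexing): represent text as a bag-of-words vector."""
    return Counter(text.lower().split())

INDEX = [(doc, vectorize(doc)) for doc in DOCS]

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, k: int = 2) -> list[str]:
    """Stage 3 (retrieval): return the top-k most similar documents."""
    q = vectorize(query)
    ranked = sorted(INDEX, key=lambda pair: cosine(q, pair[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]

def answer(query: str) -> str:
    """Stage 4 (generation): build a grounded prompt for the LLM.
    In a real system the prompt would go to an actual model API."""
    context = "\n".join(retrieve(query))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

print(answer("What does RAG do before answering?"))
```

Swapping vectorize for a real embedding model and the final return for an actual LLM call is, as far as I can tell, all that separates this toy from a working Simple RAG.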
Simple enough, right? Well, if only things were that straightforward.
The RAG Variants
Apparently, there are multiple ways to implement a RAG system, each with its own approach to the pipeline stages. Here's what I discovered:
| RAG Type | What Changes | Indexing Strategy | Retrieval Logic | Generation Context | Ideal For |
|---|---|---|---|---|---|
| Simple RAG | Minimalist | Flat vector DB (e.g., FAISS) | Retrieve top-k chunks by dense similarity | Append all retrieved chunks to the prompt | Fast prototyping, basic QA |
| Graph RAG | Structured retrieval | Graph/tree/hierarchical indexes | Navigate through nodes or related concepts | Selectively combine context from the graph | Complex documents, reasoning tasks |
| Hybrid RAG | Enhanced retrieval | Dense + sparse (BM25 + vectors) | Combine dense and keyword-based results (see the fusion sketch below) | Similar to Simple RAG, but broader context | High recall, improved coverage |
| Modular RAG | Architectural flexibility | Any combination | Pluggable; user defines the strategy | Depends on orchestrated logic | Production systems, experimentation |
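The Hybrid row raised an obvious question for me: how do you actually merge a dense ranking with a keyword-based one? One common answer is reciprocal rank fusion (RRF). Here's a hedged sketch, assuming we already have two ranked lists of document IDs; dense_hits and sparse_hits are made-up placeholders, and k=60 is the commonly used RRF default.

```python
# A sketch of hybrid retrieval via reciprocal rank fusion (RRF):
# each retriever contributes 1 / (k + rank) per document, so documents
# ranked highly by both retrievers accumulate the largest scores.
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked lists of document IDs into one fused ranking."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

dense_hits = ["doc3", "doc1", "doc7"]   # hypothetical vector-search ranking
sparse_hits = ["doc1", "doc9", "doc3"]  # hypothetical BM25 ranking
print(reciprocal_rank_fusion([dense_hits, sparse_hits]))
# doc1 and doc3 rise to the top because both retrievers agree on them
```

The appeal, as I understand it, is that RRF only uses ranks, so the two retrievers' raw scores never need to be on comparable scales.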
The Plan
Why complicate things right out of the gate when I can complicate them gradually like a sensible person? I'm going to start with Simple RAG and work my way up from there. What could go wrong?