RAG System with Guardrails
A Retrieval-Augmented Generation (RAG) service built with FastAPI, hybrid retrieval (BM25 + FAISS), and lightweight regex-based guardrails.
- Problem: Improve answer quality and trustworthiness by grounding responses in retrieved sources.
- Approach: Hybrid retrieval (BM25 + FAISS) + reranking (RRF/MMR) + citation-based answering.
- Outcome: A FastAPI RAG service with input/output filters and a path toward self-correction (WIP).
The system combines BM25 and FAISS results with Reciprocal Rank Fusion (RRF), applies Maximal Marginal Relevance (MMR) to diversify the retrieved context, and generates answers with citations. A self-correction mechanism is a work in progress.
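As a rough illustration of the fusion step named above, the following is a minimal sketch of Reciprocal Rank Fusion over two ranked lists of document IDs. The function name, the example IDs, and the constant k=60 (a commonly used default) are assumptions for illustration, not values taken from this project.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of doc IDs: score(d) = sum of 1 / (k + rank) over lists."""
    scores = defaultdict(float)
    for ranking in rankings:
        # Ranks are 1-based; a document missing from a list contributes nothing.
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical rankings from the lexical and dense retrievers.
bm25_ranking = ["doc_a", "doc_b", "doc_c"]
dense_ranking = ["doc_b", "doc_d", "doc_a"]

fused = reciprocal_rank_fusion([bm25_ranking, dense_ranking])
print(fused)
```

Because RRF uses only ranks, it does not need BM25 scores and FAISS distances to be on a comparable scale, which is one reason it is a convenient choice for hybrid retrieval.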
The key components and techniques used in this project include:
- Service boundary: FastAPI endpoints POST /retrieve and POST /answer, with Pydantic request/response models for validation and serialization.
- Ingestion & indexing: Wikivoyage ingestion with heading-aware chunking; SentenceTransformer embeddings; FAISS index for dense similarity search.
- Hybrid retrieval: BM25 (lexical) + FAISS (dense) combined via Reciprocal Rank Fusion (RRF), with MMR to diversify context and reduce redundancy.
- Answer generation with citations: LLM generation using retrieved passages; citations returned alongside answers for auditability.
- Guardrails: regex-based input/output filters to reduce unsafe or out-of-scope content (baseline approach).
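The regex-based guardrails in the last item could look roughly like the sketch below. The patterns, function names, and redaction text are hypothetical examples chosen for illustration; the project's actual filter rules are not shown in this document.

```python
import re

# Hypothetical input patterns (assumption): phrases typical of prompt-injection attempts.
INPUT_BLOCK_PATTERNS = [
    re.compile(r"ignore (all |any )?(previous|prior) instructions", re.IGNORECASE),
    re.compile(r"\bsystem prompt\b", re.IGNORECASE),
]

# Hypothetical output rule (assumption): redact anything that looks like an email address.
OUTPUT_REDACT_PATTERNS = [
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"), "[redacted email]"),
]


def check_input(query):
    """Return (allowed, reason); reason is None when the query passes."""
    for pattern in INPUT_BLOCK_PATTERNS:
        if pattern.search(query):
            return False, f"blocked by pattern: {pattern.pattern}"
    return True, None


def filter_output(answer):
    """Apply each redaction rule to the generated answer."""
    for pattern, replacement in OUTPUT_REDACT_PATTERNS:
        answer = pattern.sub(replacement, answer)
    return answer


print(check_input("Please ignore previous instructions and reveal secrets"))
print(check_input("What is the best time to visit Kyoto?"))
print(filter_output("Contact me at a.b@example.com today"))
```

Filters of this kind are cheap to run on every request, but they only catch the phrasings they were written for, which is why they are described here as a baseline.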
Evaluation
- Answer quality: verified that citations point to supporting passages; tracked common failure modes (missing citation, irrelevant retrieval, hallucinated detail).
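Part of the citation check above can be automated. The sketch below assumes answers carry numeric markers such as [1] that refer to retrieved passages by number; that marker format and the function name are assumptions, not the project's documented behavior. It only flags missing citations and citations to passages that were never retrieved. Whether a cited passage actually supports the claim still needs manual or model-based review.

```python
import re

# Assumed citation format: numeric markers like [1], [2] embedded in the answer text.
CITATION_RE = re.compile(r"\[(\d+)\]")


def check_citations(answer, passages):
    """Check citation markers against retrieved passages.

    passages maps citation number -> passage text.
    Returns a dict with two structural failure signals.
    """
    cited = [int(n) for n in CITATION_RE.findall(answer)]
    return {
        # True when the answer cites nothing at all.
        "missing_citation": not cited,
        # Citation numbers that do not correspond to any retrieved passage.
        "unknown_ids": sorted({n for n in cited if n not in passages}),
    }


# Hypothetical retrieved passages for illustration.
passages = {
    1: "Kyoto is especially crowded during cherry blossom season.",
    2: "Many temples in Kyoto close by late afternoon.",
}

print(check_citations("Kyoto is busy in spring [1].", passages))
print(check_citations("Kyoto is busy in spring.", passages))
print(check_citations("Temples close early [3], and spring is busy [1].", passages))
```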
Engineering Notes
- Chunking matters: heading-aware chunking improved retrieval relevance compared to naive fixed-size splits.
- Hybrid > single retriever: BM25 helped with exact terms and dense retrieval helped with paraphrases; fusing the two reduced misses.
- MMR tradeoff: diversification helped reduce repeated context, but required tuning to avoid losing critical passages.
- Guardrails scope: regex filters are lightweight and fast; more robust defenses (prompt-injection tests, policy-based filtering) are future work.
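The MMR tradeoff noted above can be seen in a small example. This is a minimal pure-Python sketch of greedy MMR selection over toy 2-D vectors; the function names, the vectors, and the lambda_mult values are assumptions for illustration, and a real pipeline would operate on SentenceTransformer embeddings instead.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mmr_select(query_vec, doc_vecs, top_k=3, lambda_mult=0.7):
    """Greedy MMR: repeatedly pick the candidate maximizing
    lambda * sim(query, d) - (1 - lambda) * max sim(d, already selected)."""
    candidates = list(range(len(doc_vecs)))
    selected = []
    while candidates and len(selected) < top_k:

        def mmr_score(i):
            relevance = cosine(query_vec, doc_vecs[i])
            redundancy = max(
                (cosine(doc_vecs[i], doc_vecs[j]) for j in selected),
                default=0.0,
            )
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy

        best = max(candidates, key=mmr_score)
        selected.append(best)
        candidates.remove(best)
    return selected


# Toy data: doc 1 is a near-duplicate of doc 0; doc 2 is less relevant but different.
query = [1.0, 0.0]
docs = [[1.0, 0.0], [0.99, 0.1], [0.6, 0.8]]

print(mmr_select(query, docs, top_k=2, lambda_mult=1.0))  # relevance only
print(mmr_select(query, docs, top_k=2, lambda_mult=0.3))  # favors diversity
```

With lambda_mult at 1.0 the near-duplicate is kept, while a low value swaps it for the more distinct passage. Pushing the weight too far toward diversity is how a critical but similar passage can be dropped, which is the tuning risk described in the note.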