RAG System with Guardrails

Retrieval-Augmented Generation (RAG) system with guardrails features using FastAPI, hybrid retrieval (BM25 + FAISS), and lightweight guardrails.

GitHub · Back to Projects


  • Problem: Improve answer quality and trustworthiness by grounding responses in retrieved sources.
  • Approach: Hybrid retrieval (BM25 + FAISS) + reranking (RRF/MMR) + citation-based answering.
  • Outcome: A FastAPI RAG service with input/output filters and a path toward self-correction (WIP).

The system supports hybrid retrieval (BM25 + FAISS) with Reciprocal Rank Fusion (RRF) and Maximal Marginal Relevance (MMR), answer generation with citations, and a self-correction mechanism (WIP).

The key components and techniques used in this project include:

  • Service boundary: FastAPI endpoints POST /retrieve and POST /answer with Pydantic request/response models for validation and serialization.
  • Ingestion & indexing: Wikivoyage ingestion with heading-aware chunking; SentenceTransformer embeddings; FAISS index for dense similarity search.
  • Hybrid retrieval: BM25 (lexical) + FAISS (dense) combined via Reciprocal Rank Fusion (RRF), with MMR to diversify context and reduce redundancy.
  • Answer generation with citations: LLM generation using retrieved passages; citations returned alongside answers for auditability.
  • Guardrails: regex-based input/output filters to reduce unsafe or out-of-scope content (baseline approach).

Evaluation

  • Answer quality: verified citations point to supporting passages; tracked common failure modes (missing citation, irrelevant retrieval, hallucinated detail).

Engineering Notes

  • Chunking matters: heading-aware chunking improved retrieval relevance compared to naive fixed-size splits.
  • Hybrid > single retriever: BM25 helped with exact terms; dense retrieval helped with paraphrases—fusion reduced misses.
  • MMR tradeoff: diversification helped reduce repeated context, but required tuning to avoid losing critical passages.
  • Guardrails scope: regex filters are lightweight and fast; more robust defenses (prompt-injection tests, policy-based filtering) are future work.