Quick Reference
1. When to Use Each Retrieval Method¶
| Method | Use When |
|---|---|
| BM25 | Exact keyword matches matter; no training budget; queries contain IDs, codes, or rare terms; strong baseline |
| TF-IDF | Very simple baseline; interpretability is critical; no BM25 available |
| SPLADE | Want BM25 efficiency + semantic expansion; existing inverted-index infra; diverse query vocabulary |
| Dense bi-encoder | Natural language queries; paraphrase matching; fine-grained semantic similarity |
| Hybrid (dense + sparse) | Production systems; diverse query types; best overall recall and precision |
| Cross-encoder (reranker) | Second-stage precision boost over 50–100 candidates; latency is acceptable |
2. Chunking Strategy Decision Guide¶
| Document Type | Recommended Strategy |
|---|---|
| Uniform text (articles, reports) | Sentence-based or paragraph-based |
| Structured docs with headings (PDFs, wikis) | Recursive chunking |
| Long documents, multi-hop queries | Sliding window or recursive |
| High-precision, smaller corpus | Semantic/context-aware |
| Tables | Row-based with schema metadata |
| Source code | Function-level or class-level |
3. Chunk Size and Top-k Reference¶
| Chunk Size | Top-k (retrieval) | Top-m (after reranker) | Tradeoff |
|---|---|---|---|
| Small (100–300 tokens) | 20–50 | 5–10 | High recall, more context noise |
| Medium (300–700 tokens) | 10–20 | 5–10 | Balanced — good default |
| Large (700–1500 tokens) | 3–8 | 3–5 | High precision, risk of missing info |
4. Retrieval Failure Diagnostic Flow¶
Low answer quality?
│
├─ Check Recall@k
│ Is the relevant document in top-k?
│ NO → Fix the retriever:
│ - Better embedding model
│ - Add hybrid search (BM25 + dense)
│ - Add query expansion or HyDE
│ - Adjust chunk size
│ - Add metadata filters
│
├─ Check Context Precision
│ Are the retrieved documents actually relevant?
│ NO → Add cross-encoder reranker
│ Improve chunking to reduce noise
│
├─ Check Faithfulness
│ Is the answer grounded in retrieved text?
│ NO → Improve prompt (citation constraints)
│ Add explicit grounding instructions
│ Check for context window overflow
│
└─ Check Answer Correctness
Is the grounded answer actually right?
NO → Retrieved documents may be wrong, outdated, or insufficient
Update corpus; validate sources; improve coverage
5. Key Parameter Defaults¶
| Parameter | Default / Starting Point | Notes |
|---|---|---|
| Chunk size | 400–600 tokens | Tune based on query type and embedding model |
| Chunk overlap | 10–15% of chunk size | Mitigates boundary loss; increases index size |
| Top-k (first-stage retriever) | 50–100 | Higher for hybrid; lower in resource-constrained systems |
| Top-m (after reranker) | 5–10 | What the LLM actually sees |
| BM25 k₁ | 1.5 | Controls TF saturation speed |
| BM25 b | 0.75 | Controls length normalisation strength |
| RRF constant k | 60 | Robustly combines ranked lists; rarely needs tuning |
| HNSW M | 16–64 | Higher = better recall, more memory |
| HNSW ef_search | 100–200 | Higher = better recall, slower queries |
6. Architecture Decision Matrix¶
| System Size | Latency Req. | Update Freq. | Recommended Architecture |
|---|---|---|---|
| Small (<100k docs) | Any | Any | Flat index or HNSW + BM25 hybrid |
| Medium (100k–10M docs) | Low | Frequent | HNSW + sparse, with cross-encoder reranker |
| Large (>10M docs) | Low | Infrequent | IVF+PQ + sparse, with reranker |
| Very large (>1B docs) | Very low | Infrequent | IVF+PQ, distributed (Milvus), managed (Pinecone) |
7. Bi-Encoder vs. Cross-Encoder Quick Reference¶
| Bi-Encoder | Cross-Encoder | |
|---|---|---|
| Encoding | Query and doc separately | Query + doc concatenated together |
| Pre-computation | Doc embeddings stored at index time | Must run at query time per candidate |
| Scalability | Millions of docs | ~50–100 candidates |
| Recall / Precision | High recall | High precision |
| Speed | Very fast (ANN search) | Slow (full forward pass per pair) |
| Used in | First-stage retrieval | Reranking |
8. Evaluation Metric Summary¶
| Metric | Stage | What It Measures |
|---|---|---|
| Recall@k | Retrieval | Is relevant doc in top k? |
| MRR | Retrieval | How early is first relevant doc? |
| nDCG | Retrieval | Full ranked list quality (graded relevance) |
| Precision@k | Retrieval | What fraction of top-k is relevant? |
| Exact Match (EM) | Generation | Exact string match with reference |
| F1 (token) | Generation | Token overlap with reference |
| BERTScore | Generation | Semantic similarity to reference |
| Faithfulness | End-to-end | Claims supported by retrieved context? |
| Answer Relevance | End-to-end | Does answer address the query? |
| Context Precision | End-to-end | Are retrieved docs relevant? |
| Context Recall | End-to-end | Does context contain required info? |
9. The Production RAG Checklist¶
-
Define evaluation metrics before building (Recall@k, faithfulness, answer correctness).
-
Choose chunk size and strategy; evaluate retrieval recall before touching the LLM.
-
Start with hybrid retrieval (BM25 + dense) as the first-stage retriever.
-
Add a cross-encoder reranker; measure Precision@5 improvement.
-
Attach metadata to all chunks; implement filtered retrieval.
-
Implement faithfulness and groundedness checks before shipping.
-
Run the oracle experiment to confirm failures are in retrieval, not generation.
-
Log queries, retrieved context, and answers in production for continuous evaluation.
-
Re-index when changing embedding models.
-
Build a held-out evaluation set; validate LLM-as-a-judge scores against humans.
10. Top Interview Topics by Frequency¶
| Topic | Frequency | Key Point to Know |
|---|---|---|
| RAG vs. fine-tuning | Very high | RAG = knowledge access; fine-tuning = behaviour change |
| Bi-encoder vs. cross-encoder | Very high | Bi = fast/recall; cross = slow/precision; use both in sequence |
| Chunking strategy choice | High | Recursive is the production default; tune size + top-k jointly |
| BM25 vs. dense retrieval | High | Complementary; BM25 = exact match; dense = semantics |
| Hybrid retrieval + RRF | High | Best overall quality; RRF is robust without weight tuning |
| Faithfulness vs. correctness | High | Faithfulness = grounded in context; correctness = matches truth |
| HyDE | Medium | Embed hypothetical answer instead of query to bridge vocab gap |
| Lost in the middle | Medium | LLMs miss context in the middle; put best docs first/last |
| HNSW vs. IVF | Medium | HNSW = recall/speed; IVF = memory efficiency |
| RAGAS / LLM-as-judge | Medium | Standard eval framework; validate against human labels |