Dartboard Retrieval
The Dartboard RAG process addresses a common challenge in large knowledge bases: ensuring the retrieved information is both relevant and non-redundant. By explicitly optimizing a combined relevance-diversity scoring function, it keeps the top-k results from repeating the same information. The approach is drawn from the paper *Better RAG using Relevant Information Gain*. The paper outlines three variations of the core idea: hybrid RAG (dense + sparse), a cross-encoder version, and a vanilla approach. The vanilla approach conveys the fundamental concept most directly, and this implementation extends it with optional weights to control the balance between relevance and diversity.
1. Dense, Overlapping Knowledge Bases: In large databases, documents may repeat similar content, causing redundancy in top-k retrieval.
2. Improved Information Coverage: Combining relevance and diversity yields a richer set of documents, mitigating the “echo chamber” effect of overly similar content.
What you'll learn
- Relevance & Diversity Combination: Computes a score factoring in both how pertinent a document is to the query and how distinct it is from already chosen documents.
- Weighted Balancing: Introduces RELEVANCE_WEIGHT and DIVERSITY_WEIGHT to allow dynamic control of scoring. This helps avoid overly diverse but less relevant results.
- Production-Ready Code
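To make the weighted relevance-diversity idea concrete, here is a minimal sketch of a greedy selection loop. It is a simplification: it combines cosine similarity to the query with cosine distance to the documents already chosen, which is closer to maximal marginal relevance than to the probabilistic information-gain score described in the paper. The function names, the default weight values, and the toy vectors are assumptions made for illustration. Only the names RELEVANCE_WEIGHT and DIVERSITY_WEIGHT come from the tutorial.

```python
import math

# Assumed defaults: the tutorial names these knobs but this page does not give values.
RELEVANCE_WEIGHT = 1.0
DIVERSITY_WEIGHT = 1.0


def cosine_similarity(a, b):
    """Cosine similarity of two non-zero vectors given as lists of floats."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def greedy_select(query_vec, doc_vecs, k,
                  relevance_weight=RELEVANCE_WEIGHT,
                  diversity_weight=DIVERSITY_WEIGHT):
    """Pick up to k document indices, one at a time (hypothetical helper).

    Each step scores every remaining document as
        relevance_weight * sim(query, doc)
      + diversity_weight * (distance to the nearest already-selected doc)
    and keeps the highest-scoring one.
    """
    relevance = [cosine_similarity(query_vec, d) for d in doc_vecs]
    selected = []
    candidates = list(range(len(doc_vecs)))
    while candidates and len(selected) < k:
        best_idx, best_score = None, float("-inf")
        for i in candidates:
            if selected:
                # Distance to the closest chosen document: small means redundant.
                diversity = min(
                    1.0 - cosine_similarity(doc_vecs[i], doc_vecs[j])
                    for j in selected
                )
            else:
                diversity = 0.0  # nothing chosen yet, so only relevance matters
            score = relevance_weight * relevance[i] + diversity_weight * diversity
            if score > best_score:
                best_idx, best_score = i, score
        selected.append(best_idx)
        candidates.remove(best_idx)
    return selected


# Toy data: documents 0 and 1 are near-duplicates, document 2 is different.
query = [0.8, 0.6]
docs = [[1.0, 0.0], [0.99, 0.05], [0.0, 1.0]]

print(greedy_select(query, docs, k=2))                        # [1, 2]
print(greedy_select(query, docs, k=2, diversity_weight=0.0))  # [1, 0]
```

With the diversity weight set to zero the loop reduces to plain top-k by relevance and returns both near-duplicates. With the weight turned on, the second pick shifts to the document that adds new information.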
About this tutorial
This hands-on Jupyter notebook is part of RAG Techniques, a free open-source repository by Nir Diamant covering RAG techniques with runnable code examples and detailed explanations.
