    Open-RAG-Eval

    Open-RAG-Eval is an open-source framework that lets teams evaluate RAG systems without needing predefined (golden) answers, making it faster and easier to compare solutions or configurations. With automated, research-backed metrics like UMBRELA and Hallucination, it brings transparency and rigor to RAG performance testing at any scale. This notebook provides a simple example of how to use Open-RAG-Eval. You can read more about the overall vision and metrics in this blog post.

    Specifically, we run the evaluation framework on the fiqa dataset from the BEIR benchmark, which contains question-and-answer pairs for the financial domain. In a typical RAG evaluation flow, one would need to create a RAG pipeline, run the queries against that pipeline, and collect the retrieved chunks and responses. Since this type of flow doesn't fit well in a notebook, we pre-indexed the dataset in Vectara, ran a subset of queries (from the same fiqa dataset) against the indexed data, and stored the outputs in the fiqa_output.csv file, which contains the queries, retrieved results and LLM generated…
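    To make the shape of such an output file concrete, here is a minimal sketch of grouping flat CSV rows into one record per query before evaluation. It uses only the Python standard library and does not call Open-RAG-Eval itself. The column names (query_id, query, passage_id, passage, generated_answer) and the sample rows are assumptions made for illustration; check the header of the real fiqa_output.csv before relying on them.

```python
import csv
import io

# Hypothetical stand-in for fiqa_output.csv. The column names and rows
# below are assumptions for illustration, not the file's confirmed schema.
SAMPLE_CSV = """query_id,query,passage_id,passage,generated_answer
q1,What is an index fund?,p1,An index fund tracks a market index.,An index fund tracks an index [1].
q1,What is an index fund?,p2,Index funds usually have low fees.,An index fund tracks an index [1].
q2,What is a bond?,p3,A bond is a loan to an issuer.,A bond is a debt instrument [1].
"""


def load_rag_outputs(handle):
    """Group flat CSV rows into one record per query.

    Each record holds the query text, the retrieved passages in file
    order, and the generated answer repeated on every row of that query.
    """
    records = {}
    for row in csv.DictReader(handle):
        record = records.setdefault(
            row["query_id"],
            {"query": row["query"], "passages": [], "answer": row["generated_answer"]},
        )
        record["passages"].append((row["passage_id"], row["passage"]))
    return records


records = load_rag_outputs(io.StringIO(SAMPLE_CSV))
for query_id, record in records.items():
    print(query_id, record["query"], len(record["passages"]))
```

    To read a real file, the same function could be given an open file handle, for example `open("fiqa_output.csv", newline="", encoding="utf-8")`, provided the column names are adjusted to match.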

    About this tutorial

    This hands-on Jupyter notebook is part of RAG Techniques, a free open-source repository by Nir Diamant covering RAG techniques with runnable code examples and detailed explanations.
