Nir Diamant is an AI researcher, educator, and author based in Israel. He is the founder of DiamantAI, author of the Amazon Bestseller 'RAG Made Simple' (ASIN B0D76734SZ, hit #1 in Generative AI at launch), and creator of four flagship open-source GenAI repositories with over 70,000 combined GitHub stars. His tutorials and writing reach 500,000+ developers every month.

Controllable Agent for Complex RAG Tasks

**The full code tutorial is available here**

Introduction

Nowadays, with the rise of large language models, everyone wants to talk with their data and ask questions about it. As a result, Retrieval-Augmented Generation (RAG) has become very popular. The standard RAG pipeline consists of data ingestion and retrieval (with many techniques to optimize these steps for your specific problem and data), followed by passing the user query together with the retrieved information to the LLM to generate a response.

However, in some cases, neither the data nor the questions we want to ask are trivial. These situations call for a more sophisticated agent that can reason through several steps to answer the question. In this article, I will show you how I tackled this problem, using the first Harry Potter book as a use case.

Understanding RAG and Agents

We’ve talked a bit about what RAG is, but what are Agents in the field of LLMs?

LLM agents are AI systems built for tasks that require sequential reasoning. They can plan ahead, remember past conversations, and use different tools to adapt their responses to the situation and the style needed.

Limitations of Semantic Similarity in Retrieval

Traditional RAG systems often rely on semantic similarity for retrieval. This approach measures how close the meanings of two pieces of text are, typically by comparing their vector representations and computing a similarity score. While effective for simple queries, it falls short for complex tasks that require multi-step reasoning or an understanding of broader context. Semantic similarity may retrieve individually relevant chunks, but it struggles with questions that need information synthesis or logical inference across multiple sources.
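To make the limitation concrete, here is a minimal sketch of similarity-based retrieval in plain Python. The three-dimensional vectors are invented toy embeddings, and `cosine_similarity` and `top_k` are illustrative helper names, not part of any library. Note that each chunk is scored against the query on its own, so nothing in this procedure combines evidence across chunks.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_a * norm_b)


def top_k(query_vec, chunks, k=2):
    """Return the k chunk texts whose vectors are most similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]


# Toy "embeddings", invented for illustration only.
chunks = [
    ("Harry receives his letter.", [0.9, 0.1, 0.0]),
    ("Quirrell teaches a class.", [0.1, 0.9, 0.2]),
    ("A troll enters the dungeon.", [0.0, 0.3, 0.9]),
]
query = [0.8, 0.2, 0.1]
print(top_k(query, chunks, k=1))
```

A question such as "How did the protagonist defeat the villain's assistant?" has no single chunk that sits close to it in this space, which is why a ranking like this one is not enough on its own.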

The Challenge with Regular Agents

The challenge with regular agents lies in balancing the level of autonomy we grant them with the control we retain. An alternative approach is to construct our own workflow.

Issues with regular agents:

Lack of control over when and in what order tools are used.
No control over the conclusions drawn from tool usage.
Difficulty in tracing hallucinations or reliance on pre-trained knowledge.

Advantages of workflow engineering:

Enables the definition of a specific, structured path to address the problem.
Provides full control over each step in the process.

The trade-off is that workflow engineering requires a tailored solution, which becomes time-consuming and complex to design as the problem gets harder.

Our Mission: Creating a Controllable Agent for Complex RAG Tasks

Now that we understand RAG and Agents, let’s embark on our mission to create an agent capable of solving complex RAG tasks while maintaining control over its operations.

For this, we’ll utilize three types of vector stores:

Regular Vector Store: Contains book chunks for general context.
Chapter Summary Vector Store: Provides higher-level, coarser-grained information.
Quote Vector Store: Stores specific quotes from the book for detailed, high-resolution information.
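As a rough illustration of keeping three stores at different resolutions, the sketch below uses a hypothetical `ToyStore` class that ranks documents by word overlap. This is a deliberate simplification: a real setup would use an embedding model and a vector database, and the texts here are placeholders, not taken from the book.

```python
import re
from dataclasses import dataclass, field


def tokens(text):
    """Lowercase word set; a crude stand-in for an embedding."""
    return set(re.findall(r"[a-z']+", text.lower()))


@dataclass
class ToyStore:
    """In-memory store ranked by word overlap (hypothetical, for illustration)."""
    name: str
    documents: list = field(default_factory=list)

    def add(self, text):
        self.documents.append(text)

    def search(self, query, k=1):
        query_tokens = tokens(query)
        ranked = sorted(
            self.documents,
            key=lambda doc: len(query_tokens & tokens(doc)),
            reverse=True,
        )
        return ranked[:k]


# One store per resolution: raw chunks, chapter summaries, quotes.
stores = {name: ToyStore(name) for name in ("chunks", "summaries", "quotes")}

# Placeholder texts, invented for this example.
stores["chunks"].add("the hero boards a train to the school")
stores["chunks"].add("the hero meets a rival in the corridor")
stores["summaries"].add("chapter one: the hero learns about the school")
stores["summaries"].add("chapter two: the hero travels and makes friends")
stores["quotes"].add("the mentor said: courage matters more than talent")
stores["quotes"].add("the rival said: you will regret this")

print(stores["quotes"].search("what did the rival say"))
```

Keeping the stores separate lets the agent choose the resolution that fits each step, instead of mixing summaries, chunks, and quotes in one ranking.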

A Naive Flow for Engineering an Agent to Validate a RAG Pipeline:

Context Retrieval: The process begins by retrieving context relevant to the given question.
Filtering: The retrieved context is filtered to retain only the most relevant content.
Question Answering: The agent attempts to answer the question using the refined context.
Evaluation: The answer is evaluated for relevance and potential hallucinations:
- If the answer is relevant and not a hallucination, the process ends successfully.
- If the answer is a hallucination but contains useful elements, the agent retrieves additional context.
- If the answer is irrelevant or unhelpful, the question is rewritten.
Iteration: The rewritten question is sent back to the context retrieval step, and the process repeats until a satisfactory answer is produced.
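The flow above can be sketched as a small loop. This is a sketch under stated assumptions, not the article's implementation: `naive_rag_loop` and the verdict strings are hypothetical names, and the five callables stand in for retriever and LLM calls. The toy stubs below exist only to make the example runnable.

```python
def naive_rag_loop(question, retrieve, keep_relevant, answer, grade, rewrite,
                   max_rounds=5):
    """Retrieve, filter, answer, grade; repeat until grounded or out of rounds."""
    context, k = [], 1
    for _ in range(max_rounds):
        for doc in keep_relevant(question, retrieve(question, k)):
            if doc not in context:          # avoid duplicate context entries
                context.append(doc)
        candidate = answer(question, context)
        verdict = grade(question, context, candidate)
        if verdict == "grounded":
            return candidate
        if verdict == "needs_more_context":
            k += 2                          # same question, wider retrieval
        else:                               # "irrelevant": rewrite and restart
            question, context, k = rewrite(question), [], 1
    return None                             # no satisfactory answer found


# Toy stand-ins for the retriever and LLM calls (invented for illustration).
CORPUS = [
    "the hero wins the duel",
    "the school has a library",
    "the duel happens at midnight",
]

def retrieve(question, k):
    words = set(question.lower().split())
    return [doc for doc in CORPUS if words & set(doc.split())][:k]

def keep_relevant(question, docs):
    return [doc for doc in docs if "duel" in doc]

def answer(question, context):
    return " / ".join(context)

def grade(question, context, candidate):
    if not context:
        return "irrelevant"
    return "grounded" if len(context) >= 2 else "needs_more_context"

def rewrite(question):
    return question + " duel"

print(naive_rag_loop("how does the duel end", retrieve, keep_relevant,
                     answer, grade, rewrite))
```

The `max_rounds` cap is an addition of this sketch; without some bound, a question that keeps being graded as irrelevant would loop forever.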

This flow can work for simple questions, but it is not enough for complex ones that require several reasoning steps.

Example: Complex Question Solving

Let’s consider an example of a complex question requiring reasoning:
“How did the protagonist defeat the villain’s assistant?”

To solve this question, the following steps are necessary:

Identify the Protagonist: Determine who the protagonist of the plot is.
Identify the Villain: Establish who the main antagonist is.
Identify the Villain’s Assistant: Determine who serves as the assistant to the villain.
Search for Relevant Interactions: Locate instances of confrontations or interactions between the protagonist and the villain’s assistant.
Deduce the Reason: Analyze the context to understand how and why the protagonist defeated the assistant.

Required Capabilities

Hence, the capabilities required for our solution are:

Tools: To facilitate retrieval and answering tasks effectively.
Reasoning: To deduce logical steps and derive meaningful conclusions.
Flow: To ensure a structured and coherent process throughout.
Control: To maintain oversight and adjust the solution dynamically as needed.
Verification: To validate the accuracy and grounding of the generated answers.
Stop Condition: To define when the process should terminate, ensuring efficiency.
Evaluation: To assess the solution’s relevance, reliability, and overall quality.

Implementation Components

The tools in our case should cover retrieval and answering. This means breaking the naive flow described earlier into several subgraphs, each of which serves as a tool for the new agent graph.

For reasoning and flow, we may need the following components:

Planner: Constructs a plan of steps needed to answer a given question and arrive at the final solution.
Step Breakdown Component: Breaks down the plan into steps for either retrieval or answering tasks.
Task Handler: Determines which tool to use at each step of the process.
Re-plan Component: Updates the plan dynamically based on progress, previous steps completed, and the information gathered at each stage.
Retrieval and Answer Tools: Designed as small agents themselves. Each tool cleans the retrieved information, verifies that its output is grounded in the context, and checks for hallucinations.
(Optional) Question Anonymization Component: Anonymizes the question so that the planner generates a general plan, free of biases that could arise from the LLM's prior knowledge.
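As one way to picture the Task Handler, the sketch below routes each task to a tool through a dispatch table. The keyword rules in `choose_tool` are a stand-in assumption for what would normally be an LLM decision, and all function names and return strings are hypothetical.

```python
def retrieve_chunks(task):
    return f"chunks for: {task}"

def retrieve_summaries(task):
    return f"summaries for: {task}"

def retrieve_quotes(task):
    return f"quotes for: {task}"

def answer_from_context(task):
    return f"answer for: {task}"


# Dispatch table: tool name -> callable. Each callable stands in for a subgraph.
TOOLS = {
    "chunks": retrieve_chunks,
    "summaries": retrieve_summaries,
    "quotes": retrieve_quotes,
    "answer": answer_from_context,
}


def choose_tool(task):
    """Rule-based stand-in for the LLM call that picks a tool for a task."""
    lowered = task.lower()
    if lowered.startswith("answer"):
        return "answer"
    if "quote" in lowered:
        return "quotes"
    if "chapter" in lowered or "summary" in lowered:
        return "summaries"
    return "chunks"                 # default: general book chunks


def handle(task):
    """Pick a tool for the task and run it; returns (tool name, tool output)."""
    tool_name = choose_tool(task)
    return tool_name, TOOLS[tool_name](task)


print(handle("Retrieve a quote by the villain"))
```

Because the handler returns the tool name along with the output, every step leaves a record of which tool ran, which supports the tracing and control goals listed earlier.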

Stop Condition

How to Determine When the Process is Complete?

There are several possible strategies:

Answer Check at Re-plan Visits: At each re-plan step, evaluate whether the question can already be answered using the aggregated information collected so far.
Saturation Threshold: Continue collecting relevant data until the process reaches a point of saturation, where the amount of new, useful information falls below a predefined threshold.
Predefined Iteration Limit: Limit the depth or recursion of the graph by setting a maximum number of iterations to avoid over-processing or unnecessary complexity.
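The three strategies can be combined in a single check. The function below is a sketch: `should_stop`, its parameters, and the default thresholds are assumptions chosen for illustration, and `can_answer` stands in for the LLM judgment made at each re-plan visit.

```python
def should_stop(can_answer, new_facts_per_round, iteration,
                saturation_threshold=1, max_iterations=10):
    """Return the reason to stop, or None to keep going.

    can_answer: result of the answer check at the current re-plan visit.
    new_facts_per_round: count of new useful facts gathered in each round so far.
    iteration: number of rounds completed.
    """
    if can_answer:
        return "answerable"
    if iteration >= max_iterations:
        return "iteration_limit"
    if new_facts_per_round and new_facts_per_round[-1] < saturation_threshold:
        return "saturated"
    return None


print(should_stop(False, [5, 3, 0], iteration=3))
```

Checking the answer condition first means a run that has just become answerable is not reported as a failure, even if it also hit the iteration limit in the same round.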

The full agent logic

The Controllable Agent for Complex RAG Tasks

The agent tackles a complex query through the following steps:

Question Anonymization: The process starts by anonymizing the input question to minimize biases.
Plan Creation: A planner constructs a general plan to answer the anonymized question.
De-anonymization: The plan is de-anonymized to reintroduce specific context from the original question.
Task Breakdown: The plan is divided into a series of retrieve or answer tasks.
Task Handling: A task handler selects the appropriate tool for each task, such as:
- Retrieving book chunks
- Retrieving book quotes
- Retrieving summaries
- Answering the question directly
Information Retrieval: For retrieval tasks, the system fetches relevant information and filters it to retain only the most pertinent content, ensuring it’s grounded in the original context.
Answer Attempt: If the task is to answer the question, the system formulates a response based on the retrieved context.
Answer Verification: The answer is evaluated for hallucinations and checked to ensure it is grounded in the provided context.
Replanning Phase: If the question remains unanswered or the answer is unsatisfactory, the system enters a replanning phase to revise the approach.
Replan Evaluation: During replanning, the system assesses whether the question can be answered with the current information or if further retrieval is necessary.
Final Answer Retrieval: If the question can be answered, the system generates the final response.
Final Verification: The final answer undergoes another round of checks for hallucinations and contextual grounding.
Process Completion: If the final answer passes all verification steps, the process concludes successfully.

In summary, this iterative approach lets the agent handle complex queries by breaking them into manageable tasks, retrieving relevant information, and revising its plan until it reaches a well-grounded, satisfactory answer.
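The steps listed above can be sketched as a plan, execute, and re-plan loop. This is a simplified skeleton under stated assumptions, not the article's actual graph: `run_agent` and the component names are hypothetical, every component is a plain function standing in for an LLM call or subgraph, and the names and facts in the demo are invented.

```python
def run_agent(question, c, max_iterations=10):
    """Plan, execute one task at a time, and re-plan until answerable."""
    anonymized, mapping = c["anonymize"](question)
    plan = c["deanonymize"](c["plan"](anonymized), mapping)
    tasks = c["break_down"](plan)
    context = []
    for _ in range(max_iterations):
        if not tasks:
            break
        task = tasks.pop(0)
        # Task handling: run the retrieve or answer tool chosen for this task.
        context.extend(c["handle_task"](task, context))
        # Re-plan visit: can the question be answered already?
        if c["can_answer"](question, context):
            final = c["final_answer"](question, context)
            if c["is_grounded"](final, context):    # final verification
                return final
        tasks = c["replan"](question, tasks, context)
    return None


# Toy components, invented for illustration.
FACTS = {
    "find who Alice is": "Alice is the hero",
    "find who Bob is": "Bob is the rival",
    "find how Alice defeated Bob": "Alice disarmed Bob in a duel",
}

def anonymize(question):
    mapping = {"PERSON_1": "Alice", "PERSON_2": "Bob"}
    for placeholder, name in mapping.items():
        question = question.replace(name, placeholder)
    return question, mapping

def make_plan(anonymized_question):
    return ["find who PERSON_1 is", "find who PERSON_2 is",
            "find how PERSON_1 defeated PERSON_2"]

def deanonymize(plan, mapping):
    restored = []
    for step in plan:
        for placeholder, name in mapping.items():
            step = step.replace(placeholder, name)
        restored.append(step)
    return restored

components = {
    "anonymize": anonymize,
    "plan": make_plan,
    "deanonymize": deanonymize,
    "break_down": list,
    "handle_task": lambda task, context: [FACTS[task]],
    "can_answer": lambda question, context: any("disarmed" in f for f in context),
    "final_answer": lambda question, context: context[-1],
    "is_grounded": lambda answer, context: answer in context,
    "replan": lambda question, tasks, context: tasks,
}

print(run_agent("How did Alice defeat Bob?", components))
```

The planner in this sketch only ever sees the placeholder form of the question, which is the point of the anonymization step: the plan is built from the question's structure, and the real names are restored before any retrieval runs.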

Evaluation

Since this is a RAG task, we can evaluate it using methods similar to those for other RAG tasks. For this evaluation, I chose a custom benchmark based on a QA bank, using the following metrics:

Answer Correctness: Evaluates whether the generated answer is factually accurate.
Faithfulness: Assesses how well the retrieved information supports the generated answer.
Answer Relevance: Measures how closely the generated answer addresses the question.
Answer Similarity: Quantifies the semantic similarity between the generated answer and the ground truth answer.
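To show the shape of a QA-bank evaluation, here is a small harness that averages one score over all questions. It uses token-overlap F1 as a purely lexical stand-in for Answer Similarity. That substitution is an assumption of this sketch: the metrics described above are semantic and would normally rely on embeddings or an LLM judge. The QA bank and the `evaluate` and `token_f1` names are invented for illustration.

```python
from collections import Counter


def token_f1(prediction, reference):
    """Token-overlap F1 between two strings (lexical, not semantic)."""
    pred = prediction.lower().split()
    ref = reference.lower().split()
    if not pred or not ref:
        return 0.0
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def evaluate(qa_bank, answer_fn):
    """Average token F1 of answer_fn over a bank of question/ground-truth pairs."""
    scores = [
        token_f1(answer_fn(item["question"]), item["ground_truth"])
        for item in qa_bank
    ]
    return sum(scores) / len(scores) if scores else 0.0


# Invented QA bank and a fixed-answer "agent" for demonstration.
qa_bank = [
    {"question": "q1", "ground_truth": "the hero wins"},
    {"question": "q2", "ground_truth": "the rival flees"},
]
print(evaluate(qa_bank, lambda question: "the hero wins"))
```

The same harness shape works for the other metrics: swap `token_f1` for a scorer that also receives the retrieved context when measuring faithfulness.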

Conclusion

By implementing this controllable agent for complex RAG tasks, we achieve a balance between autonomy and control. This approach enables the generation of more accurate and traceable responses to sophisticated queries. It also opens new possibilities for interacting with and extracting insights from large bodies of text, such as novels or technical documentation.
