    Memory Benchmarks (LoCoMo)

    You can't improve what you can't measure. LoCoMo and LongMemEval provide standardized test suites that reveal exactly where your memory system fails: single-hop recall, multi-hop reasoning, temporal ordering, or open-ended generation.

    Imagine you hire a new assistant and give them a quiz after their first week. One section tests whether they remember your coffee order (direct recall). Another checks if they can connect two separate conversations to plan your schedule (multi-hop reasoning). A third tests whether they know you switched from tea to coffee last Wednesday (temporal awareness). Your overall score is useful, but the per-section breakdown tells you exactly what to train.

    LoCoMo (Long-Context Conversational Memory) and LongMemEval are standardized benchmarks that work the same way for AI memory systems. They provide multi-session conversations paired with ground-truth questions. Each question targets a specific memory capability. Running your system against these benchmarks produces per-category scores that turn "memory quality" from a vague claim into a concrete number y…
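The per-category breakdown described above can be sketched in a few lines of Python. This is a minimal illustration and not the scoring code of LoCoMo or LongMemEval: the record fields (`category`, `prediction`, `answer`), the category names, and the toy data are all hypothetical, and normalized exact match stands in for whatever metric a given benchmark actually defines.

```python
import string
from collections import defaultdict


def normalize(text):
    """Lowercase, strip ASCII punctuation, and collapse whitespace."""
    table = str.maketrans("", "", string.punctuation)
    return " ".join(text.lower().translate(table).split())


def per_category_accuracy(results):
    """Return {category: fraction of answers matching after normalization}.

    `results` is an iterable of dicts with the (assumed) keys
    'category', 'prediction', and 'answer'.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for record in results:
        total[record["category"]] += 1
        if normalize(record["prediction"]) == normalize(record["answer"]):
            correct[record["category"]] += 1
    return {category: correct[category] / total[category] for category in total}


# Hypothetical toy results, one record per benchmark question.
results = [
    {"category": "single_hop", "prediction": "Oat-milk latte.", "answer": "oat-milk latte"},
    {"category": "single_hop", "prediction": "Espresso", "answer": "espresso"},
    {"category": "multi_hop", "prediction": "Thursday", "answer": "Thursday"},
    {"category": "multi_hop", "prediction": "Friday", "answer": "Thursday"},
    {"category": "temporal", "prediction": "tea", "answer": "coffee"},
]

scores = per_category_accuracy(results)
for category, accuracy in sorted(scores.items()):
    print(f"{category}: {accuracy:.2f}")
```

In this toy run the single-hop questions are all answered correctly while the temporal question is missed, which is the kind of signal an overall average would hide. Exact match is brittle for free-form answers, so a real evaluation would likely swap in the benchmark's own metric.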

    About this tutorial

    This hands-on Jupyter notebook is part of Agent Memory Techniques, a free open-source repository by Nir Diamant covering agent memory techniques with runnable code examples and detailed explanations.

    Free and open-source · Runnable Jupyter notebook · Active community support