
    Memory Benchmarks (LoCoMo)

    You can't improve what you can't measure. LoCoMo and LongMemEval provide standardized test suites that reveal exactly where your memory system fails: single-hop recall, multi-hop reasoning, temporal ordering, or open-ended generation.

    Imagine you hire a new assistant and give them a quiz after their first week. One section tests whether they remember your coffee order (direct recall). Another checks if they can connect two separate conversations to plan your schedule (multi-hop reasoning). A third tests whether they know you switched from tea to coffee last Wednesday (temporal awareness). Your overall score is useful, but the per-section breakdown tells you exactly what to train.

    LoCoMo (Long-Context Conversational Memory) and LongMemEval are standardized benchmarks that work the same way for AI memory systems. They provide multi-session conversations paired with ground-truth questions. Each question targets a specific memory capability. Running your system against these benchmarks produces per-category scores that turn "memory quality" from a vague claim into a concrete number y…
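
The per-category breakdown described above can be sketched in a few lines of Python. This is a minimal illustration, not the notebook's code or either benchmark's official scorer: the example records, the `category` / `question` / `answer` field names, the `toy_memory` stand-in, and the choice of token-overlap F1 as the metric are all assumptions made for the sketch.

```python
from collections import Counter, defaultdict


def token_f1(prediction: str, truth: str) -> float:
    """Token-overlap F1 between a predicted and a reference answer (assumed metric)."""
    pred = prediction.lower().split()
    gold = truth.lower().split()
    if not pred or not gold:
        # Both empty counts as a match; one empty counts as a miss.
        return float(pred == gold)
    overlap = sum((Counter(pred) & Counter(gold)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(gold)
    return 2 * precision * recall / (precision + recall)


def score_by_category(examples, answer_fn):
    """Average the per-question score separately for each question category."""
    per_category = defaultdict(list)
    for ex in examples:
        predicted = answer_fn(ex["question"])
        per_category[ex["category"]].append(token_f1(predicted, ex["answer"]))
    return {cat: sum(vals) / len(vals) for cat, vals in per_category.items()}


# Hypothetical records; real benchmark files use their own schema.
examples = [
    {"category": "single_hop",
     "question": "What is my coffee order?",
     "answer": "oat milk latte"},
    {"category": "temporal",
     "question": "When did I switch from tea to coffee?",
     "answer": "last wednesday"},
]


def toy_memory(question: str) -> str:
    """Stand-in for a memory system: right on recall, wrong on the date."""
    return "oat milk latte" if "coffee order" in question else "last friday"


scores = score_by_category(examples, toy_memory)
for category, value in sorted(scores.items()):
    print(f"{category}: {value:.2f}")
```

Reporting one number per category, rather than a single average, is what makes the failure visible here: the toy system looks fine on direct recall and only the temporal score shows the gap.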

    About this tutorial

    This hands-on Jupyter notebook is part of Agent Memory Techniques, a free open-source repository by Nir Diamant covering agent memory techniques with runnable code examples and detailed explanations.

    Free and open-source · Runnable Jupyter notebook · Active community support
