
    Summary Memory

    Compress older turns with an LLM-generated rolling summary: trade word-for-word recall for unbounded conversation length.

    In the previous notebook we saw that Sliding Window Memory bounds cost by discarding old messages entirely. That hard cutoff means the agent doesn't even *know* it forgot something. Summary Memory takes a different approach: instead of discarding history, it *compresses* it. Think of it like reading a long book and writing a one-page summary at the end of each chapter. You can no longer quote the book word-for-word, but you still know what happened.

    A secondary LLM call periodically condenses older messages into a running textual summary. The agent loses exact wording but retains the gist (key facts, decisions, and context) across arbitrarily long conversations.

    The catch: summaries are lossy (they can't perfectly reconstruct the original). Each compression cycle can lose details, shift emphasis, or subtly distort facts, and because each new summary is built from the previous one, these errors compound over many cycles. This is summary drift: the agent's "memory" can gradually diverge from what actually happened. By the end of this notebook you'l…
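    The mechanism described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the notebook's own code: the names `SummaryMemory`, `max_recent` and `toy_summarizer` are hypothetical, and the secondary LLM call is replaced by a deterministic word-truncating function so the example runs offline. Recent turns stay verbatim; when the window overflows, the oldest turns are folded into the running summary instead of being dropped.

```python
from dataclasses import dataclass, field
from typing import Callable

# A message is a (role, text) pair, e.g. ("user", "My name is Ada").
Message = tuple[str, str]

# Hypothetical summarizer interface: (previous_summary, evicted_messages) -> new_summary.
# In a real agent this would be the secondary LLM call; it is injected here so the
# sketch runs offline without any model or API key.
Summarizer = Callable[[str, list[Message]], str]


@dataclass
class SummaryMemory:
    summarize: Summarizer
    max_recent: int = 4  # how many recent messages stay word-for-word (assumed value)
    summary: str = ""
    recent: list[Message] = field(default_factory=list)

    def add(self, role: str, text: str) -> None:
        self.recent.append((role, text))
        overflow = len(self.recent) - self.max_recent
        if overflow > 0:
            # Unlike a sliding window, evicted turns are folded into the summary, not dropped.
            evicted, self.recent = self.recent[:overflow], self.recent[overflow:]
            self.summary = self.summarize(self.summary, evicted)

    def context(self) -> list[Message]:
        # Prompt context = running summary (if any) followed by the verbatim recent turns.
        if not self.summary:
            return list(self.recent)
        return [("system", f"Summary of earlier conversation: {self.summary}")] + self.recent


def toy_summarizer(previous: str, evicted: list[Message], max_words: int = 12) -> str:
    # Crude stand-in for an LLM (hypothetical): append a shortened form of each evicted
    # turn, then keep only the last max_words words. The old summary is re-compressed on
    # every call, so its earliest details are the first to be squeezed out. A real LLM
    # summary is also lossy, but loses details far less predictably than this.
    parts = [previous] if previous else []
    parts += [f"{role} said {' '.join(text.split()[:4])}." for role, text in evicted]
    words = " ".join(parts).split()
    return " ".join(words[-max_words:])


memory = SummaryMemory(summarize=toy_summarizer, max_recent=2)
for text in ["My name is Ada", "I live in Paris", "I like green tea",
             "I work as a nurse", "What is my name?"]:
    memory.add("user", text)

for role, text in memory.context():
    print(f"{role}: {text}")
```

    By the time the user asks "What is my name?", the 12-word toy summary has already squeezed out "My name is Ada": an exaggerated version of the detail loss that repeated compression can cause. Swapping in a real LLM call only requires a function with the same `(previous_summary, evicted_messages) -> str` shape.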

    About this tutorial

    This hands-on Jupyter notebook is part of Agent Memory Techniques, a free, open-source repository by Nir Diamant that covers agent memory techniques with runnable code examples and detailed explanations.

    Free and open-source · Runnable Jupyter notebook · Active community support
