Summary Memory
Compress older turns with an LLM-generated rolling summary: trade word-for-word recall for unbounded conversation length. In the previous notebook we saw that Sliding Window Memory bounds cost by discarding old messages entirely. That hard cutoff means the agent doesn't even *know* it forgot something. Summary Memory takes a different approach. Instead of discarding history, it *compresses* it. Think of it like reading a long book and writing a one-page summary at the end of each chapter. You can't quote the book word-for-word anymore, but you still know what happened. A secondary LLM call periodically condenses older messages into a running textual summary. The agent loses exact wording but retains the gist (key facts, decisions, and context) across arbitrarily long conversations. The catch: summaries are lossy (they can't perfectly reconstruct the original). Each compression cycle can lose details, shift emphasis, or subtly distort facts. Over many cycles this summary drift compounds. The agent's "memory" can diverge from what actually happened. By the end of this notebook you'l…
About this tutorial
This hands-on Jupyter notebook is part of Agent Memory Techniques, a free open-source repository by Nir Diamant covering agent memory techniques with runnable code examples and detailed explanations.
