Retrieving Memory
Zep provides three methods for retrieving memory from a User Graph, each offering different levels of control and customization.
Choosing a retrieval method
Zep’s Context Block
Zep’s Context Block is an optimized, automatically assembled string that you can directly provide as context to your agent. The Context Block combines semantic search, full text search, and breadth first search to return context that is highly relevant to the user’s current conversation slice, utilizing the past two messages.
The Context Block is returned by the thread.get_user_context() method. This method uses the latest messages of the given thread to search the (entire) User Graph and then returns the search results in the form of the Context Block.
Note that although thread.get_user_context() only requires a thread ID, it is able to return memory derived from any thread of that user. The thread is just used to determine what’s relevant.
The Context Block provides low latency (P95 < 200ms) while preserving detailed information from the user’s graph.
Deprecated: mode parameter and summarized context
The mode parameter is no longer supported. Previously, Zep offered a “summarized” context mode that used an LLM to condense context into a shorter format. However, we were unable to achieve the low latency required for real-time agent interactions with this approach. The Context Block now returns a user summary and structured facts in a detailed format optimized for both performance and information preservation.
Retrieving the Context Block
Context Block Format
The Context Block returns a user summary along with relevant facts in a structured format:
Getting the Context Block Sooner
You can get the Context Block sooner by passing in the return_context=True flag to the thread.add_messages() method. Read more about this in our performance guide.
Custom Context Templates
You can customize the format of the Context Block by using context templates. Templates allow you to define how memory data is structured and presented while keeping Zep’s automatic relevance detection.
To use a template, pass the template_id parameter when retrieving context:
See the Context Templates guide to learn how to create and manage templates.
Advanced Context Block Construction
For maximum control over memory retrieval, see our Advanced Context Block Construction cookbook. This approach lets you directly search the graph and assemble results with complete control over search queries, parameters, and formatting.
Using Memory
Provide the Context Block in Your System Prompt
Once you’ve retrieved the Context Block, used a custom context template, or constructed your own context block, you can include this string in your system prompt:
Provide the Last 4 to 6 Messages of the Thread
You should also include the last 4 to 6 messages of the thread when calling your LLM provider. Because Zep’s ingestion can take a few minutes, the context block may not include information from the last few messages; and so the context block acts as the “long-term memory,” and the last few messages serve as the raw, short-term memory.