Retrieving Memory

Learn how to retrieve relevant context from a User Graph.

There are two ways to retrieve memory from a User Graph: using Zep’s Context Block or searching the graph.

Zep’s Context Block

Zep’s Context Block is an optimized, automatically assembled string that you can directly provide as context to your agent. The Context Block combines semantic search, full text search, and breadth first search to return context that is highly relevant to the user’s current conversation slice, utilizing the past two messages.

The Context Block is returned by the thread.get_user_context() method. This method uses the latest messages of the given thread to search the (entire) User Graph and then returns the search results in the form of the Context Block.

Note that although thread.get_user_context() only requires a thread ID, it is able to return memory derived from any thread of that user. The thread is just used to determine what’s relevant.

The Context Block provides low latency (P95 < 200ms) while preserving detailed information from the user’s graph.

Deprecated: mode parameter and summarized context

The mode parameter is no longer supported. Previously, Zep offered a “summarized” context mode that used an LLM to condense context into a shorter format. However, we were unable to achieve the low latency required for real-time agent interactions with this approach. The Context Block now returns a user summary and structured facts in a detailed format optimized for both performance and information preservation.

Retrieving the Context Block

1# Get memory for the thread
2memory = client.thread.get_user_context(thread_id=thread_id)
3
4# Access the context block (for use in prompts)
5context_block = memory.context
6print(context_block)

Context Block Format

The Context Block returns a user summary along with relevant facts in a structured format:

# This is the user summary
<USER_SUMMARY>
Emily Painter is a user with account ID Emily0e62 who uses digital art tools for creative work. She maintains an active account with the service, though has recently experienced technical issues with the Magic Pen Tool. Emily values reliable payment processing and seeks prompt resolution for account-related issues. She expects clear communication and efficient support when troubleshooting technical problems.
</USER_SUMMARY>
# These are the most relevant facts and their valid date ranges
# format: FACT (Date range: from - to)
<FACTS>
- Emily is experiencing issues with logging in. (2024-11-14 02:13:19+00:00 - present)
- User account Emily0e62 has a suspended status due to payment failure. (2024-11-14 02:03:58+00:00 - present)
- user has the id of Emily0e62 (2024-11-14 02:03:54 - present)
- The failed transaction used a card with last four digits 1234. (2024-09-15 00:00:00+00:00 - present)
- The reason for the transaction failure was 'Card expired'. (2024-09-15 00:00:00+00:00 - present)
- user has the name of Emily Painter (2024-11-14 02:03:54 - present)
- Account Emily0e62 made a failed transaction of 99.99. (2024-07-30 00:00:00+00:00 - 2024-08-30 00:00:00+00:00)
</FACTS>

Getting the Context Block Sooner

You can get the Context Block sooner by passing in the return_context=True flag to the thread.add_messages() method. Read more about this in our performance guide.

Searching the Graph

You can also directly search a User Graph using our highly customizable graph.search method and construct a custom context block. Read more about this in our Searching the Graph guide.

Using Memory

Provide the Context Block in Your System Prompt

Once you’ve retrieved the Context Block, or constructed your own context block by searching the graph, you can include this string in your system prompt:

MessageTypeContent
SystemYour system prompt

{Zep context block}
AssistantAn assistant message stored in Zep
UserA user message stored in Zep
UserThe latest user message

Provide the Last 4 to 6 Messages of the Thread

You should also include the last 4 to 6 messages of the thread when calling your LLM provider. Because Zep’s ingestion can take a few minutes, the context block may not include information from the last few messages; and so the context block acts as the “long-term memory,” and the last few messages serve as the raw, short-term memory.