Retrieving Memory | Zep Documentation

Zep provides three methods for retrieving memory from a User Graph, each offering different levels of control and customization.

Choosing a retrieval method

Method	Query Control	Format Control	Graph Types	Best For
Zep’s Context Block	Automatic (last 2 messages)	Fixed	User graphs only	Most use cases - automatic relevance with optimized format
Custom Context Templates	Automatic (last 2 messages)	Custom	User graphs only	Consistent custom formatting across threads/users
Advanced Context Block Construction	Full control	Full control	User graphs or standalone graphs	Maximum flexibility - custom queries and formats

Zep’s Context Block

Zep’s Context Block is an optimized, automatically assembled string that you can directly provide as context to your agent. The Context Block combines semantic search, full text search, and breadth first search to return context that is highly relevant to the user’s current conversation slice, utilizing the past two messages.

The Context Block is returned by the thread.get_user_context() method. This method uses the latest messages of the given thread to search the (entire) User Graph and then returns the search results in the form of the Context Block.

Note that although thread.get_user_context() only requires a thread ID, it is able to return memory derived from any thread of that user. The thread is just used to determine what’s relevant.

The Context Block provides low latency (P95 < 200ms) while preserving detailed information from the user’s graph.

Deprecated: mode parameter and summarized context

The mode parameter is no longer supported. Previously, Zep offered a “summarized” context mode that used an LLM to condense context into a shorter format. However, we were unable to achieve the low latency required for real-time agent interactions with this approach. The Context Block now returns a user summary and structured facts in a detailed format optimized for both performance and information preservation.

Retrieving the Context Block

1 # Get memory for the thread
2 memory = client.thread.get_user_context(thread_id=thread_id)
3 
4 # Access the context block (for use in prompts)
5 context_block = memory.context
6 print(context_block)

Context Block Format

The Context Block returns a user summary along with relevant facts in a structured format:

# This is the user summary
<USER_SUMMARY>
Emily Painter is a user with account ID Emily0e62 who uses digital art tools for creative work. She maintains an active account with the service, though has recently experienced technical issues with the Magic Pen Tool. Emily values reliable payment processing and seeks prompt resolution for account-related issues. She expects clear communication and efficient support when troubleshooting technical problems.
</USER_SUMMARY>
# These are the most relevant facts and their valid date ranges
# format: FACT (Date range: from - to)
<FACTS>
  - Emily is experiencing issues with logging in. (2024-11-14 02:13:19+00:00 - present)
  - User account Emily0e62 has a suspended status due to payment failure. (2024-11-14 02:03:58+00:00 - present)
  - user has the id of Emily0e62 (2024-11-14 02:03:54 - present)
  - The failed transaction used a card with last four digits 1234. (2024-09-15 00:00:00+00:00 - present)
  - The reason for the transaction failure was 'Card expired'. (2024-09-15 00:00:00+00:00 - present)
  - user has the name of Emily Painter (2024-11-14 02:03:54 - present)
  - Account Emily0e62 made a failed transaction of 99.99. (2024-07-30 00:00:00+00:00 - 2024-08-30 00:00:00+00:00)
</FACTS>

Getting the Context Block Sooner

You can get the Context Block sooner by passing in the return_context=True flag to the thread.add_messages() method. Read more about this in our performance guide.

Custom Context Templates

You can customize the format of the Context Block by using context templates. Templates allow you to define how memory data is structured and presented while keeping Zep’s automatic relevance detection.

To use a template, pass the template_id parameter when retrieving context:

1 from zep_cloud import Zep
2 
3 client = Zep(api_key="YOUR_API_KEY")
4 
5 # Create a custom template
6 client.context.create_context_template(
7     template_id="customer-support",
8     template="""# CUSTOMER PROFILE
9 %{user_summary}
10 
11 # RECENT INTERACTIONS
12 %{edges limit=10}
13 
14 # KEY ENTITIES
15 %{entities limit=5}"""
16 )
17 
18 # Use the template to retrieve context
19 memory = client.thread.get_user_context(
20     thread_id="thread_id",
21     template_id="customer-support"
22 )
23 context_block = memory.context

See the Context Templates guide to learn how to create and manage templates.

Advanced Context Block Construction

For maximum control over memory retrieval, see our Advanced Context Block Construction cookbook. This approach lets you directly search the graph and assemble results with complete control over search queries, parameters, and formatting.

Using Memory

Provide the Context Block in Your System Prompt

Once you’ve retrieved the Context Block, used a custom context template, or constructed your own context block, you can include this string in your system prompt:

MessageType	Content
`System`	Your system prompt `{Zep context block}`
`Assistant`	An assistant message stored in Zep
`User`	A user message stored in Zep
…	…
`User`	The latest user message

Provide the Last 4 to 6 Messages of the Thread

You should also include the last 4 to 6 messages of the thread when calling your LLM provider. Because Zep’s ingestion can take a few minutes, the context block may not include information from the last few messages; and so the context block acts as the “long-term memory,” and the last few messages serve as the raw, short-term memory.