Memory | Zep Documentation

Zep’s agent memory capabilities make context engineering simple: you add memory with a single line, retrieve memory with a single line, and then can immediately use the retrieved memory in your next LLM call.

The Memory API is high-level and opinionated. For a more customizable, low-level way to add and retrieve memory, see the Graph API.

Adding memory

Add your chat history to Zep using the memory.add method. memory.add is session-specific and expects data in chat message format, including a role name (e.g., user’s real name), role_type (AI, human, tool), and message content. Zep stores the chat history and builds a user-level knowledge graph from the messages.

For best results, add chat history to Zep on every chat turn. That is, add both the AI and human messages in a single operation and in the order that the messages were created.

The example below adds messages to Zep’s memory for the user in the given session:

1 from zep_cloud.client import AsyncZep
2 from zep_cloud.types import Message
3 
4 zep_client = AsyncZep(
5     api_key=API_KEY,
6 )
7 
8 messages = [
9     Message(
10         role="Jane",
11         role_type="user",
12         content="Who was Octavia Butler?",
13     )
14 ]
15 
16 await zep_client.memory.add(session_id, messages=messages)

You can find additional arguments to memory.add in the SDK reference. Notably, for latency sensitive applications, you can set return_context to true which will make memory.add return a context string in the way that memory.get does (discussed below).

If you are looking to add JSON or unstructured text as memory to the graph, you will need to use our Graph API.

Ignore assistant messages

You can also pass in a list of role types to ignore when adding data to the graph using the ignore_roles argument. For example, you may not want assistant messages to be added to the user graph; providing the assistant messages in the memory.add call while setting ignore_roles to include “assistant” will make it so that only the user messages are ingested into the graph, but the assistant messages are still used to contextualize the user messages. This is important in case the user message itself does not have enough context, such as the message “Yes.” Additionally, the assistant messages will still be added to the session’s message history.

Retrieving memory

The memory.get() method is a user-friendly, high-level API for retrieving relevant context from Zep. It uses the latest messages of the given session to determine what information is most relevant from the user’s knowledge graph and returns that information in a context string for your prompt. Note that although memory.get() only requires a session ID, it is able to return memory derived from any session of that user. The session is just used to determine what’s relevant.

memory.get also returns recent chat messages and raw facts that may provide additional context for your agent. We recommend using these raw messages when you call your LLM provider (see below). The memory.get method is user and session-specific and cannot retrieve data from group graphs.

The example below gets the memory.context string for the given session:

1 memory = zep_client.memory.get(session_id="session_id")
2 # the context field described above
3 context = memory.context

You can find additional arguments to memory.get in the SDK reference. Notably, you can specify a minimum fact rating which will filter out any retrieved facts with a rating below the threshold, if you are using fact ratings.

If you are looking to customize how memory is retrieved, you will need to search the graph and construct a custom memory context string. For example, memory.get uses the last few messages as the search query on the graph, but using the graph API you can use whatever query you want, as well as experiment with other search parameters such as re-ranker used.

Using memory

Once you’ve retrieved the memory context string, or constructed your own context string by searching the graph, you can include this string in your system prompt:

MessageType	Content
`System`	Your system prompt `{Zep context string}`
`Assistant`	An assistant message stored in Zep
`User`	A user message stored in Zep
…	…
`User`	The latest user message

You should also include the last 4 to 6 messages of the session when calling your LLM provider. Because Zep’s ingestion can take a few minutes, the context string may not include information from the last few messages; and so the context string acts as the “long-term memory,” and the last few messages serve as the raw, short-term memory.

In latency sensitive applications such as voice chat bots, you can use the context string returned from memory.add to avoid making two API calls.

Customizing memory

The Memory API is our high level, easy-to-use API for adding and retrieving memory. If you want to add business data or documents to memory, or further customize how memory is retrieved, you should refer to our Guides on using the graph, such as adding data to the graph and searching the graph. We also have a cookbook on creating a custom context string using the graph API.

Additionally, group graphs can be used to store non-user-specific memory.