Every Zep integration is shaped by a few key architectural choices — what context to store, how to ingest it, and how to retrieve it. This page first walks through those choices, then presents some of the most common architecture patterns that result from combining them. The patterns below are not exhaustive; your architecture may combine these choices differently depending on your use case.
These are the key architectural choices to consider before implementing Zep for your use case:
Retrieval strategy describes when and how context is pulled. What kinds of context exist — facts, entities, episodes, thread summaries, observations, and the user summary — is a separate decision. See Context Types for descriptions of each primitive and when to reach for it.
Problem: Your conversational agent forgets chat history between sessions and has no context about the user beyond the current conversation.
This is the most common pattern for chatbots and conversational assistants. Chat messages are persisted through the Thread API, and user-specific business data (CRM records, support history, etc.) is sent to the user’s context graph via the Graph API. Context is retrieved from the user’s context graph before each response.
Messages flow into Zep via the Thread API. Zep extracts facts and entities into the user’s context graph. Business data (CRM records, user events, etc.) can be sent directly to the graph via graph.add(). When you call get_user_context(), Zep uses the latest messages in the thread as a search query to assemble relevant context from the user’s context graph.
get_user_context().This is the pattern used in the Quick Start Guide, which walks through the complete end-to-end setup.
Problem: Your agent needs access to company knowledge, business data, or other domain context that doesn’t come from a chat conversation.
Use this pattern when you want to populate Zep with domain knowledge, business data, or external data sources that exist outside of conversations. This is common for background data ingestion pipelines and agents that don’t have a traditional chat interface.
Data from any source is ingested directly into a context graph via graph.add(). The agent retrieves context via graph.search() without needing a thread.
graph.search() calls in application code.This is the pattern used in the Give Your Agent Domain Knowledge cookbook, which walks through ingesting data into a graph and searching it end-to-end.
Related guides: Adding Business Data, Searching the Graph, Group Chat FAQ
Problem: Your conversational agent forgets chat history between sessions and has no context about the user beyond the current conversation — but you want the agent to decide when to retrieve context rather than injecting it on every turn.
This pattern uses the same ingestion as Pattern 1 — messages are persisted through the Thread API and business data is sent to the graph — but retrieval happens through agent tool calls instead of automatically. The LLM decides whether and when to search based on the conversation context.
Ingestion is identical to Pattern 1: messages flow into Zep via the Thread API, facts are auto-extracted into the graph, and business data can be sent directly via graph.add(). The difference is retrieval — instead of calling get_user_context() on every turn, the agent exposes graph.search() as a tool and the LLM decides when to call it.
Follow the Quick Start Guide for ingestion setup, but instead of calling get_user_context() on every turn, expose graph.search() as a tool in your agent framework and let the LLM call it when needed.
Related guides: Searching the Graph, LangGraph Integration
Problem: Your conversational agent needs both user-specific context (chat history, preferences, account details) and domain context (company policies, product catalog, runbooks) to generate informed responses.
This pattern combines Patterns 1 and 2. Chat messages are persisted through the Thread API, user-specific business data goes to the user graph, and domain knowledge goes to a standalone graph. At retrieval time, the agent assembles context from both graphs and includes both in the LLM’s context window.
Chat messages flow into Zep via the Thread API, and facts are auto-extracted into the user graph. User-specific business data is sent to the user graph via graph.add(), while domain data (product catalogs, policies, etc.) is sent to a standalone graph. At retrieval time, the agent calls get_user_context() for user context and graph.search() on the standalone graph for domain context, then includes both in the LLM’s context window.
This pattern combines the implementations from the Quick Start Guide (user context and thread ingestion), the Give Your Agent Domain Knowledge cookbook (standalone graph ingestion), and the Share Context Across Users Using Graphs cookbook (retrieving from both graphs and combining them into a single context block).
Related guides: Assembling Context, Advanced Context Block Construction, Create Graph
Problem: Your agent needs to provide fast, responsive interactions — and even small increases in context retrieval latency meaningfully degrade the user experience.
This pattern applies to any of the four architectures above. The difference is that it incorporates Zep’s latency optimizations to minimize the time between a user’s input and the agent’s response. This is especially important for voice and video agents, where latency is immediately perceptible, but it benefits any agent using Zep where user experience matters.
Follow the Performance Best Practices guide, which covers the key latency optimizations: requesting context in the same call as message ingestion, running graph operations concurrently, and warming the user cache ahead of retrieval.