Architecture patterns

Choose the right Zep architecture by deciding across scope, ingestion, and retrieval axes.

Every Zep integration is shaped by a few key architectural choices — what context to store, how to ingest it, and how to retrieve it. This page first walks through those choices, then presents some of the most common architecture patterns that result from combining them. The patterns below are not exhaustive; your architecture may combine these choices differently depending on your use case.

Key architectural choices

These are the key architectural choices to consider before implementing Zep for your use case:

Context scope

Use case conditionRecommendationRelevant Docs
You want to persist and retrieve user-specific context (for example: chat history, preferences, support interactions)Use a user context graphCreate a Zep user
You want to persist and retrieve domain context (for example: company policies, product data, runbooks)Use a standalone context graphCreate Graph
You want to persist and retrieve both user-specific context and domain context in the same workflowUse user + standalone context graphs and assemble context from bothCreate a Zep user, Create Graph

What data you persist to Zep

Use case conditionRecommendationRelevant Docs
You want to persist conversational context from turns in an agent/chat loopUse Thread API for chat ingestion and thread continuityCreate a Zep thread
You want to persist non-chat system context (CRM, logs, tickets, emails, docs, events)Use Graph API for ingestionAdding Business Data
You want to persist both conversational context and business/system contextUse Thread API + Graph API togetherCreate a Zep thread, Adding Business Data

Retrieval strategy

Use case conditionRecommendationRelevant Docs
You want context retrieved deterministically, before every agent responseRetrieve or assemble a context block from Zep on each turn. Options include: Zep’s default context block, custom context templates, or advanced context block constructionAssembling Context Methods
You want retrieval to happen through agent tool callsExpose graph.search() as a tool and let the LLM call it when neededSearching the Graph

Pattern 1: ingesting conversations + user data

Problem: Your conversational agent forgets chat history between sessions and has no context about the user beyond the current conversation.

This is the most common pattern for chatbots and conversational assistants. Chat messages are persisted through the Thread API, and user-specific business data (CRM records, support history, etc.) is sent to the user’s context graph via the Graph API. Context is retrieved from the user’s context graph before each response.

Architecture diagram

YOUR APPLICATIONAgentUser Data(CRM, events)ZepThreadUser Graphadd_messages()get_user_context()graph.add()auto-extract to graph

Messages flow into Zep via the Thread API. Zep extracts facts and entities into the user’s context graph. Business data (CRM records, user events, etc.) can be sent directly to the graph via graph.add(). When you call get_user_context(), Zep uses the latest messages in the thread as a search query to assemble relevant context from the user’s context graph.

When to use

  • You need user-specific context in a conversational experience.
  • You want to persist chat messages to Zep using the Thread API.
  • You may also want to persist user-specific business data (CRM/events) using Graph API.
  • You want automatic context retrieval per turn via get_user_context().

How to implement

This is the pattern used in the Quick Start Guide, which walks through the complete end-to-end setup.


Pattern 2: ingesting domain data

Problem: Your agent needs access to company knowledge, business data, or other domain context that doesn’t come from a chat conversation.

Use this pattern when you want to populate Zep with domain knowledge, business data, or external data sources that exist outside of conversations. This is common for background data ingestion pipelines and agents that don’t have a traditional chat interface.

Architecture diagram

YOUR APPLICATIONAgentDocs(policies, products)Tickets(Jira, Zendesk)Communications(email, Slack, Teams)ZepStandalone Graphgraph.add()graph.add()graph.add()graph.search()

Data from any source is ingested directly into a context graph via graph.add(). The agent retrieves context via graph.search() without needing a thread.

When to use

  • You need domain knowledge such as company policies, product data, or runbooks.
  • You are ingesting business data, events, logs, or transcripts directly with Graph API.
  • You do not need thread-based chat persistence for this workflow.
  • You want deterministic retrieval via direct graph.search() calls in application code.

How to implement

This is the pattern used in the Give Your Agent Domain Knowledge cookbook, which walks through ingesting data into a graph and searching it end-to-end.

Related guides: Adding Business Data, Searching the Graph, Group Chat FAQ


Pattern 3: ingesting conversations + user data with tool-call retrieval

Problem: Your conversational agent forgets chat history between sessions and has no context about the user beyond the current conversation — but you want the agent to decide when to retrieve context rather than injecting it on every turn.

This pattern uses the same ingestion as Pattern 1 — messages are persisted through the Thread API and business data is sent to the graph — but retrieval happens through agent tool calls instead of automatically. The LLM decides whether and when to search based on the conversation context.

Architecture diagram

YOUR APPLICATIONAgentUser Data(CRM, events)ZepThreadUser Graphadd_messages()graph.search() (tool)graph.add()auto-extract to graph

Ingestion is identical to Pattern 1: messages flow into Zep via the Thread API, facts are auto-extracted into the graph, and business data can be sent directly via graph.add(). The difference is retrieval — instead of calling get_user_context() on every turn, the agent exposes graph.search() as a tool and the LLM decides when to call it.

When to use

  • You are ingesting conversations and user data the same way as Pattern 1.
  • You want the LLM to decide when context retrieval is needed, rather than retrieving on every turn.
  • You are building a multi-tool agent where context search is one of several capabilities.
  • Your agent has variable context needs per interaction — some turns need retrieval, others don’t.

How to implement

Follow the Quick Start Guide for ingestion setup, but instead of calling get_user_context() on every turn, expose graph.search() as a tool in your agent framework and let the LLM call it when needed.

Related guides: Searching the Graph, LangGraph Integration


Pattern 4: ingesting conversations, user data, and domain data

Problem: Your conversational agent needs both user-specific context (chat history, preferences, account details) and domain context (company policies, product catalog, runbooks) to generate informed responses.

This pattern combines Patterns 1 and 2. Chat messages are persisted through the Thread API, user-specific business data goes to the user graph, and domain knowledge goes to a standalone graph. At retrieval time, the agent assembles context from both graphs and includes both in the LLM’s context window.

Architecture diagram

INGESTIONYOUR APPLICATIONAgentUser Data(CRM, events)Domain Data(docs, tickets, comms)ZepThreadUser GraphStandalone Graphadd_messages()graph.add()graph.add()extract
RETRIEVALYOUR APPLICATIONAgentZepUser GraphStandalone Graphget_user_context()graph.search()

Chat messages flow into Zep via the Thread API, and facts are auto-extracted into the user graph. User-specific business data is sent to the user graph via graph.add(), while domain data (product catalogs, policies, etc.) is sent to a standalone graph. At retrieval time, the agent calls get_user_context() for user context and graph.search() on the standalone graph for domain context, then includes both in the LLM’s context window.

When to use

  • Your agent needs both user-specific context and domain knowledge in the same conversation.
  • You are persisting chat messages via the Thread API and user data to a user graph.
  • You are also ingesting domain data into a standalone graph shared across users.
  • You want to assemble context from both graphs before each agent response.

How to implement

This pattern combines the implementations from the Quick Start Guide (user context and thread ingestion), the Give Your Agent Domain Knowledge cookbook (standalone graph ingestion), and the Share Context Across Users Using Graphs cookbook (retrieving from both graphs and combining them into a single context block).

Related guides: Assembling Context, Advanced Context Block Construction, Create Graph