AutoGen integration | Zep Documentation

The zep-autogen package integrates Zep with Microsoft AutoGen agents, backing them with long-term memory and a temporal knowledge graph. It provides memory classes that plug into AutoGen’s native Memory interface for automatic context injection, plus function tools the agent can call to search and add data on demand. Choose between user-specific conversation memory or structured knowledge graph memory.

Core benefits

Native Memory interface: ZepUserMemory and ZepGraphMemory implement AutoGen’s Memory interface, so they drop straight into an agent’s memory list
Automatic context injection: Relevant memory is retrieved and prepended to the model context before each turn via update_context()
User and knowledge graphs: Persist a user’s conversation history or maintain a shared knowledge graph with custom entity models
On-demand function tools: Pre-built tools let the agent explicitly search and add graph data when it chooses
Graceful degradation: A Zep failure is logged but does not crash the agent run

How it works

The integration exposes two complementary retrieval paths:

Memory classes (ZepUserMemory, ZepGraphMemory) attach to an agent’s memory list. AutoGen calls update_context() before each turn, and the class retrieves memory from Zep and injects it as a system message — transparent, automatic context on every interaction.
Function tools (create_search_graph_tool, create_add_graph_data_tool) attach to an agent’s tools list. The model decides when to call them, giving explicit, observable search and add operations that work with AutoGen’s tool reflection.

Both approaches can be combined on the same agent: memory for consistent background context, tools for targeted lookups.

Context injection is automatic, but persistence is not: AutoGen’s Memory protocol has no hook that fires after the model responds, so your application calls memory.add() explicitly — typically once per user turn and once per assistant turn. This is AutoGen’s design, not a limitation of the integration.

Installation

$ pip install zep-autogen zep-cloud autogen-core autogen-agentchat

Requires Python 3.11+, zep-cloud>=3.23.0, autogen-agentchat>=0.7.0, and a Zep Cloud API key. Get your API key from app.getzep.com.

Set up your environment variables:

$ export ZEP_API_KEY="your-zep-api-key"
$ export OPENAI_API_KEY="your-openai-api-key"

Upgrading from zep-autogen 1.1.x

Two changes affect existing code.

ZepUserMemory now creates the Zep user and thread lazily on first use instead of requiring pre-provisioning. If your code relied on a 404 from update_context() to detect an unprovisioned user, that signal is gone — call ensure_user/ensure_thread explicitly instead and check their return value.

Search tools also expose scope, reranker, limit, mmr_lambda, and center_node_uuid to the model by default — pass pinned_params (or the legacy scope/limit arguments, which pin) to restore fixed values. See the changelog for the full release history.

Memory types

User memory: Stores conversation history in user threads with automatic context injection
Knowledge graph memory: Maintains structured knowledge with custom entity models

User memory

ZepUserMemory persists messages to a user’s thread and injects the context block into the agent before each turn. Set up the imports, initialize the memory, attach it to an agent, then store messages as the conversation proceeds.

Import dependencies

import os
import uuid
import asyncio
from autogen_agentchat.agents import AssistantAgent
from autogen_ext.models.openai import OpenAIChatCompletionClient
from autogen_core.memory import MemoryContent, MemoryMimeType
from zep_cloud.client import AsyncZep
from zep_autogen import ZepUserMemory

Initialize the client and memory

ZepUserMemory binds the client, user, and thread into a memory object that AutoGen can attach to an agent. The Zep user and thread are created lazily on first use by whichever of add() or update_context() runs first — no pre-creation step is required. Creation is idempotent and cached per instance.

zep_client = AsyncZep(api_key=os.environ.get("ZEP_API_KEY"))
user_id = f"user_{uuid.uuid4().hex[:16]}"
thread_id = f"thread_{uuid.uuid4().hex[:16]}"

memory = ZepUserMemory(
    client=zep_client,
    user_id=user_id,
    thread_id=thread_id,
    first_name="Alice",
    email="alice@example.com",
)

Parameter	Description
`client`	An initialized `AsyncZep` instance (required)
`user_id`	Zep user ID for memory isolation (required)
`thread_id`	Thread identifier; generated automatically if omitted
`context_template_id`	Zep context template used to render the retrieved context block; ignored when `context_builder` is set
`first_name`, `last_name`, `email`	Passed to `user.add` during lazy provisioning; helps Zep anchor the user’s identity node in the graph
`on_created`	Async hook run exactly once, only when the user is newly created — use it for per-user ontology or custom instructions
`context_builder`	Async callable replacing the default context retrieval in `update_context()` — see custom context retrieval
`context_template`	Template wrapping injected context; defaults to `DEFAULT_CONTEXT_TEMPLATE`

The lazy path never raises into add() or update_context(): a provisioning failure (including an on_created failure) is logged and swallowed. To surface provisioning failures loudly — for example during account onboarding, before the first turn — call ensure_user and ensure_thread out-of-band:

from zep_autogen import ensure_user, ensure_thread

await ensure_user(zep_client, user_id=user_id, first_name="Alice", email="alice@example.com")
await ensure_thread(zep_client, thread_id=thread_id, user_id=user_id)

Both helpers are idempotent and return True only when the resource is newly created.

Attach the memory to an agent

Pass the memory in the agent’s memory list so context is injected before each turn.

# Create agent with Zep memory
agent = AssistantAgent(
    name="MemoryAwareAssistant",
    model_client=OpenAIChatCompletionClient(
        model="gpt-4.1-mini",
        api_key=os.environ.get("OPENAI_API_KEY")
    ),
    memory=[memory],
    system_message="You are a helpful assistant with persistent memory."
)

Store messages and run

Persistence is manual: AutoGen never calls memory.add() for you, so persist each turn explicitly — once for the user message and once for the assistant reply. The agent automatically retrieves context via update_context() before responding; skipping the add() calls means the agent still sees Zep’s existing context, but that turn’s messages are never written to Zep and cannot be recalled later.

# Helper function to store messages with proper metadata
async def add_message(message: str, role: str, name: str | None = None):
    """Store a message in Zep memory following AutoGen standards."""
    metadata = {"type": "message", "role": role}
    if name:
        metadata["name"] = name

    await memory.add(MemoryContent(
        content=message,
        mime_type=MemoryMimeType.TEXT,
        metadata=metadata
    ))

# Example conversation with memory persistence
user_message = "My name is Alice and I love hiking in the mountains."
print(f"User: {user_message}")

# Store user message
await add_message(user_message, "user", "Alice")

# Run agent - it will automatically retrieve context via update_context()
response = await agent.run(task=user_message)
agent_response = response.messages[-1].content
print(f"Agent: {agent_response}")

# Store agent response
await add_message(agent_response, "assistant")

Automatic context injection: ZepUserMemory injects relevant memory via the update_context() method before each turn. On the default retrieval path it injects the context block and, when one is available, also appends up to 10 recent thread messages. When a context_builder is set, only the builder’s output is injected.

Allow time for indexing: Zep extracts knowledge asynchronously, so facts from a turn are not instantly searchable. Wait before querying for newly added content.

Custom context retrieval

By default, update_context() retrieves context via thread.get_user_context(...). Pass context_builder to replace this with custom logic — for example a filtered graph search, or a different graph entirely:

from zep_autogen.memory import ContextInput

async def my_builder(ctx: ContextInput) -> str | None:
    results = await ctx.zep.graph.search(
        user_id=ctx.user_id,
        query=ctx.user_message,
        scope="edges",
    )
    if not results.edges:
        return None
    return "\n".join(edge.fact for edge in results.edges)

memory = ZepUserMemory(
    client=zep_client,
    user_id=user_id,
    thread_id=thread_id,
    context_builder=my_builder,
)

The builder receives a single frozen ContextInput:

Field	Description
`zep`	The `AsyncZep` client in use by this memory instance
`user_id`	The Zep user ID the memory is scoped to
`thread_id`	The Zep thread ID the memory records the conversation in
`user_message`	The last user-role message’s text from the model context (`""` if none)
`model_context`	The AutoGen `ChatCompletionContext` passed to `update_context()` for this call

If the builder raises, a warning is logged and context injection is skipped for that turn — update_context() never raises. The builder is retrieval-only and never runs concurrently with message persistence: AutoGen’s Memory protocol calls update_context() (injection) and add() (persistence) as two separate, caller-controlled steps, so persist turns explicitly via add().
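The failure handling described above can be pictured with a small stand-in. This is a minimal sketch, not the package's code: `build_context_safely`, `failing_builder`, and `working_builder` are hypothetical names, and a plain string stands in for the ContextInput the real builder receives.

```python
import asyncio
import logging
from typing import Awaitable, Callable, Optional

logger = logging.getLogger("context_builder_sketch")

# Hypothetical builder type: takes some context input, returns text or None.
Builder = Callable[[str], Awaitable[Optional[str]]]


async def build_context_safely(builder: Builder, ctx: str) -> Optional[str]:
    """Run a builder; on any exception, log a warning and skip injection."""
    try:
        return await builder(ctx)
    except Exception as exc:
        # Returning None means "inject nothing this turn"; the error never propagates.
        logger.warning("context builder failed, skipping injection: %s", exc)
        return None


async def failing_builder(ctx: str) -> Optional[str]:
    raise RuntimeError("search backend unavailable")


async def working_builder(ctx: str) -> Optional[str]:
    return f"facts about {ctx}"


skipped = asyncio.run(build_context_safely(failing_builder, "hiking"))
injected = asyncio.run(build_context_safely(working_builder, "hiking"))
print(skipped, injected)
```

Returning None for both "no results" and "builder failed" keeps the caller's logic to a single check before injecting a system message.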

Customizing the injected context template

Retrieved context (from the default retrieval or a context_builder) is wrapped in context_template before being added to the model context as a system message. The default DEFAULT_CONTEXT_TEMPLATE wraps the context in <ZEP_CONTEXT> tags with a short preamble. Override it with your own wording, as long as it contains a literal {context} placeholder:

memory = ZepUserMemory(
    client=zep_client,
    user_id=user_id,
    thread_id=thread_id,
    context_template="Relevant background:\n{context}",
)

The template is rendered via plain string replacement (template.replace("{context}", ...)), never str.format, so context text containing {, }, or % is always safe to inject.
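A short illustration of why plain replacement is the forgiving choice. This is a sketch under assumptions: `render_context` is a hypothetical helper that mirrors the `template.replace("{context}", ...)` call quoted above, and the sample strings are invented.

```python
def render_context(template: str, context: str) -> str:
    """Substitute the literal {context} placeholder using plain replacement."""
    return template.replace("{context}", context)


template = "Relevant background:\n{context}"
# Context text with braces and a percent sign passes through untouched.
context = 'User config: {"theme": "dark"}, discount 100%'
rendered = render_context(template, context)

# A template that contains other literal braces is also fine with replace()...
braced_template = "Treat {curly} text literally.\n{context}"
rendered_braced = render_context(braced_template, "Alice likes hiking")

# ...whereas str.format would try to resolve {curly} as a field and fail.
try:
    braced_template.format(context="Alice likes hiking")
    format_failed = False
except KeyError:
    format_failed = True

print(rendered)
print(rendered_braced)
print(format_failed)
```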

Knowledge graph memory

ZepGraphMemory maintains a standalone knowledge graph with custom entity models. Define an ontology, create the graph, initialize the memory with search filters, add data, then attach the memory to an agent.

ZepGraphMemory is scoped to a standalone graph_id, not a Zep user, so it has no on_created hook and no lazy user provisioning — create the graph out-of-band via graph.create as shown below.

Define entity models

Custom entity models shape how Zep extracts structured knowledge from the data you add.

from zep_autogen.graph_memory import ZepGraphMemory
from zep_cloud.external_clients.ontology import EntityModel, EntityText
from pydantic import Field

# Define entity models using Pydantic
class ProgrammingLanguage(EntityModel):
    """A programming language entity."""
    paradigm: EntityText = Field(
        description="programming paradigm (e.g., object-oriented, functional)",
        default=None
    )
    use_case: EntityText = Field(
        description="primary use cases for this language",
        default=None
    )

class Framework(EntityModel):
    """A software framework or library."""
    language: EntityText = Field(
        description="the programming language this framework is built for",
        default=None
    )
    purpose: EntityText = Field(
        description="primary purpose of this framework",
        default=None
    )

Set the ontology and create the graph

from zep_cloud import SearchFilters

# Set ontology first
await zep_client.graph.set_ontology(
    entities={
        "ProgrammingLanguage": ProgrammingLanguage,
        "Framework": Framework,
    }
)

# Create graph
graph_id = f"graph_{uuid.uuid4().hex[:16]}"
try:
    await zep_client.graph.create(
        graph_id=graph_id,
        name="Programming Knowledge Graph"
    )
    print(f"Created graph: {graph_id}")
except Exception as e:
    print(f"Graph creation failed: {e}")

Initialize the graph memory

Configure search filters and context limits to control what ZepGraphMemory injects on each turn.

# Create graph memory with search configuration
graph_memory = ZepGraphMemory(
    client=zep_client,
    graph_id=graph_id,
    search_filters=SearchFilters(
        node_labels=["ProgrammingLanguage", "Framework"]
    ),
    facts_limit=20,  # Max facts in context injection (default: 20)
    entity_limit=5   # Max entities in context injection (default: 5)
)

Add data and wait for indexing

Knowledge extraction is asynchronous, so allow time for indexing before the data is searchable.

# Add structured knowledge
await graph_memory.add(MemoryContent(
    content="Python is excellent for data science and AI development",
    mime_type=MemoryMimeType.TEXT,
    metadata={"type": "data"}  # "data" stores in graph, "message" stores as episode
))

# Wait for graph processing (required)
print("Waiting for graph indexing...")
await asyncio.sleep(30)  # Allow time for knowledge extraction
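
A fixed sleep is the simplest way to wait for indexing. Where a bounded wait is preferable, one option is to poll until a check passes or a deadline expires. The sketch below is a generic asyncio pattern, not part of zep-autogen: `wait_until` and `fake_indexing_check` are hypothetical, and treating "a query returns results" as the readiness signal is an assumption.

```python
import asyncio
import time
from typing import Awaitable, Callable


async def wait_until(
    check: Callable[[], Awaitable[bool]],
    timeout: float = 60.0,
    interval: float = 2.0,
) -> bool:
    """Poll `check` until it returns True or `timeout` seconds have elapsed."""
    deadline = time.monotonic() + timeout
    while True:
        if await check():
            return True
        if time.monotonic() >= deadline:
            return False
        await asyncio.sleep(interval)


# Stand-in check that reports "ready" on the third poll. In real use this could
# wrap something like a graph_memory.query(...) call and test for results.
attempts = 0


async def fake_indexing_check() -> bool:
    global attempts
    attempts += 1
    return attempts >= 3


ready = asyncio.run(wait_until(fake_indexing_check, timeout=1.0, interval=0.01))
print(ready, attempts)
```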

Attach the memory to an agent

Pass the graph memory in the agent’s memory list so relevant facts and entities are injected before each turn.

# Create agent with graph memory
agent = AssistantAgent(
    name="GraphMemoryAssistant",
    model_client=OpenAIChatCompletionClient(model="gpt-4.1-mini"),
    memory=[graph_memory],
    system_message="You are a technical assistant with programming knowledge."
)

Graph memory context injection: ZepGraphMemory automatically retrieves the last 2 episodes from the graph and uses their content to query for relevant facts (up to facts_limit) and entities (up to entity_limit). This context is injected as a system message during agent interactions.

Tools integration

Zep tools let an agent search the graph and add data to it on demand: the model decides when to call them, and each call returns a structured result.

Important: Tools must be bound to either graph_id OR user_id, not both. This determines whether they operate on knowledge graphs or user graphs.
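
The exactly-one-target rule can be checked up front in application code before a tool is created. This is a minimal sketch: `resolve_tool_target` is a hypothetical helper, and how the package itself validates these arguments is not shown here.

```python
from typing import Optional


def resolve_tool_target(
    *, graph_id: Optional[str] = None, user_id: Optional[str] = None
) -> dict[str, str]:
    """Return the single target keyword argument, or raise if zero or two are given."""
    if (graph_id is None) == (user_id is None):
        raise ValueError("Provide exactly one of graph_id or user_id")
    if graph_id is not None:
        return {"graph_id": graph_id}
    return {"user_id": user_id}


user_target = resolve_tool_target(user_id="user_123")
graph_target = resolve_tool_target(graph_id="graph_abc")
print(user_target, graph_target)
```

The returned dict can be unpacked into a tool factory call, for example `create_search_graph_tool(zep_client, **user_target)`.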

Search tool parameters

create_search_graph_tool follows a pin-or-expose pattern: every graph.search parameter is exposed to the model by default, each with a typed schema and documented default. Letting the model choose the scope and reranker per query produces better retrieval than a single fixed configuration; pin parameters when you need deterministic behavior instead. query is always exposed and required.

Parameter	Default	Description
`scope`	`"edges"`	One of `edges`, `nodes`, `episodes`, `observations`, `thread_summaries`, `auto`
`reranker`	`"rrf"`	One of `rrf`, `mmr`, `node_distance`, `episode_mentions`, `cross_encoder`
`limit`	`10`	Maximum number of results
`mmr_lambda`	`None`	Diversity (0.0) vs. relevance (1.0) balance; only used when `reranker="mmr"`
`center_node_uuid`	`None`	Center node for `reranker="node_distance"`

Use pinned_params to fix a parameter to a constant (hidden from the model), or hidden_params to remove it from the schema without pinning (Zep’s server-side default applies):

from zep_autogen import create_search_graph_tool

# Model chooses scope/reranker/limit/mmr_lambda/center_node_uuid freely (default)
tool = create_search_graph_tool(zep_client, user_id=user_id)

# Pin scope to "nodes" and limit to 5: hidden from the model, always sent as given
tool = create_search_graph_tool(
    zep_client, user_id=user_id, pinned_params={"scope": "nodes", "limit": 5}
)

# Hide mmr_lambda from the schema without pinning it; Zep's own default applies
tool = create_search_graph_tool(zep_client, user_id=user_id, hidden_params={"mmr_lambda"})

The legacy scope and limit arguments pin (and hide) the corresponding parameter — equivalent to passing them via pinned_params. search_filters and bfs_origin_node_uuids are constructor-only and never exposed to the model.

AutoGen’s FunctionTool derives its JSON schema strictly from the wrapped function’s typed signature. create_search_graph_tool implements pin-or-expose by building that signature dynamically: exposed parameters become real, typed parameters of the function AutoGen introspects, while pinned and hidden parameters are never part of the signature at all.

Add tool parameters

create_add_graph_data_tool exposes:

data: str (required) - Content to store
data_type: str (optional, default "text") - Data type: "text", "json", or "message"

User graph tools

from zep_autogen import create_search_graph_tool, create_add_graph_data_tool

# Create tools bound to user graph
search_tool = create_search_graph_tool(zep_client, user_id=user_id)
add_tool = create_add_graph_data_tool(zep_client, user_id=user_id)

# Agent with user graph tools
agent = AssistantAgent(
    name="UserKnowledgeAssistant",
    model_client=OpenAIChatCompletionClient(model="gpt-4.1-mini"),
    tools=[search_tool, add_tool],
    system_message="You can search and add data to the user's knowledge graph.",
    reflect_on_tool_use=True  # Enables tool usage reflection
)

Knowledge graph tools

# Create tools bound to knowledge graph
search_tool = create_search_graph_tool(zep_client, graph_id=graph_id)
add_tool = create_add_graph_data_tool(zep_client, graph_id=graph_id)

# Agent with knowledge graph tools
agent = AssistantAgent(
    name="KnowledgeGraphAssistant",
    model_client=OpenAIChatCompletionClient(model="gpt-4.1-mini"),
    tools=[search_tool, add_tool],
    system_message="You can search and add data to the knowledge graph.",
    reflect_on_tool_use=True
)

Size limits

Zep rejects over-long direct SDK payloads with an HTTP 400. The AutoGen integration truncates before calling Zep, logging only the before and after lengths (never the content):

Thread messages (ZepUserMemory.add with type="message"): truncated to 4,000 characters, a safety margin under Zep’s 4,096-character thread-message limit
Graph data (ZepGraphMemory.add, ZepUserMemory.add with type="data", and create_add_graph_data_tool): truncated to 9,900 characters, a safety margin under Zep’s 10,000-character graph.add limit
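
The truncate-then-log behavior can be approximated as follows. This is a sketch, not the integration's code: `truncate_payload` and the constant names are hypothetical, while the character limits are the ones listed above.

```python
import logging

logger = logging.getLogger("zep_size_limits_sketch")

# Limits taken from the list above; the constant names are illustrative.
THREAD_MESSAGE_MAX_CHARS = 4_000
GRAPH_DATA_MAX_CHARS = 9_900


def truncate_payload(text: str, max_chars: int) -> str:
    """Cut text to max_chars, logging only the lengths and never the content."""
    if len(text) <= max_chars:
        return text
    truncated = text[:max_chars]
    logger.warning(
        "Truncated payload from %d to %d characters", len(text), len(truncated)
    )
    return truncated


long_message = "x" * 5_000
short_message = "hello"

stored_message = truncate_payload(long_message, THREAD_MESSAGE_MAX_CHARS)
stored_short = truncate_payload(short_message, THREAD_MESSAGE_MAX_CHARS)
stored_data = truncate_payload("y" * 12_000, GRAPH_DATA_MAX_CHARS)
print(len(stored_message), len(stored_short), len(stored_data))
```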

Query memory

Both memory types can be queried directly with query(); the graph memory example also passes scope to select facts, entities, or episodes.

User memory queries

# Query user conversation history
results = await memory.query("What does Alice like?", limit=5)

# Process different result types
for result in results.results:
    content = result.content
    metadata = result.metadata

    if 'edge_name' in metadata:
        # Fact/relationship result
        print(f"Fact: {content}")
        print(f"Relationship: {metadata['edge_name']}")
        print(f"Valid: {metadata.get('valid_at', 'N/A')} - {metadata.get('invalid_at', 'present')}")
    elif 'node_name' in metadata:
        # Entity result
        print(f"Entity: {metadata['node_name']}")
        print(f"Summary: {content}")
    else:
        # Episode/message result
        print(f"Message: {content}")
        print(f"Role: {metadata.get('episode_role', 'unknown')}")

    print(f"Source: {metadata.get('source')}\n")

Graph memory queries

# Query knowledge graph with scope control
facts_results = await graph_memory.query(
    "Python frameworks",
    limit=10,
    scope="edges"  # "edges" (facts), "nodes" (entities), "episodes" (messages)
)

print(f"Found {len(facts_results.results)} facts about Python frameworks:")
for result in facts_results.results:
    print(f"- {result.content}")

entities_results = await graph_memory.query(
    "programming languages",
    limit=5,
    scope="nodes"
)

print(f"\nFound {len(entities_results.results)} programming language entities:")
for result in entities_results.results:
    entity_name = result.metadata.get('node_name', 'Unknown')
    print(f"- {entity_name}: {result.content}")

Search result structure

Edge results (facts)

{
    "content": "fact text",
    "metadata": {
        "source": "graph" | "user_graph",
        "edge_name": "relationship_name",
        "edge_attributes": {...},
        "created_at": "timestamp",
        "valid_at": "timestamp",
        "invalid_at": "timestamp",
        "expired_at": "timestamp"
    }
}

Node results (entities)

{
    "content": "entity_name:\n entity_summary",
    "metadata": {
        "source": "graph" | "user_graph",
        "node_name": "entity_name",
        "node_attributes": {...},
        "created_at": "timestamp"
    }
}

Episode results (messages)

{
    "content": "episode_content",
    "metadata": {
        "source": "graph" | "user_graph",
        "episode_type": "source_type",
        "episode_role": "role_type",
        "episode_name": "role_name",
        "created_at": "timestamp"
    }
}
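
Given the three shapes above, a result's kind can be inferred from its metadata keys. The sketch below works on plain dicts so it runs without a Zep connection: `classify_result` is a hypothetical helper and the sample values are invented.

```python
def classify_result(metadata: dict) -> str:
    """Infer the result kind from which metadata key is present."""
    if "edge_name" in metadata:
        return "edge"     # fact / relationship
    if "node_name" in metadata:
        return "node"     # entity
    return "episode"      # message or other raw episode


# Invented sample results shaped like the structures above.
sample_results = [
    {"content": "Alice loves hiking", "metadata": {"source": "user_graph", "edge_name": "LOVES"}},
    {"content": "Alice:\n A user who hikes", "metadata": {"source": "user_graph", "node_name": "Alice"}},
    {"content": "My name is Alice", "metadata": {"source": "user_graph", "episode_role": "user"}},
]

kinds = [classify_result(item["metadata"]) for item in sample_results]
print(kinds)
```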

Memory vs tools comparison

Memory objects (ZepUserMemory / ZepGraphMemory):

Automatic context injection via update_context(); persistence stays manual via add()
Attached to the agent’s memory list
Transparent operation — happens automatically
Better for consistent memory across interactions

Function tools (search/add tools):

Manual control — the agent decides when to use them
More explicit and observable operations
Better for specific search/add operations
Works with AutoGen’s tool reflection features
Provides structured return values

Note: Both approaches can be combined — use memory for automatic context and tools for explicit operations.

Best practices

Pick the right memory type — use ZepUserMemory for per-user conversation history and ZepGraphMemory for a shared knowledge graph
Persist every turn explicitly — call memory.add() once per user turn and once per assistant turn; injection is the only automatic half of the loop
Bind tools to exactly one scope — a search or add tool targets either a graph_id or a user_id, never both
Combine memory and tools — attach a memory class for automatic context and add function tools for targeted lookups
Allow time for indexing — Zep extracts knowledge asynchronously, so facts from a turn are not instantly searchable

Next steps

Explore customizing graph structure for advanced knowledge organization
Learn about searching the graph and how to tune search
See code examples for additional patterns