AutoGen integration

Add long-term agent memory to AutoGen agents

The zep-autogen package integrates Zep with Microsoft AutoGen agents, backing them with long-term memory and a temporal knowledge graph. It provides memory classes that plug into AutoGen’s native Memory interface for automatic context injection, plus function tools the agent can call to search and add data on demand. Choose between user-specific conversation memory or structured knowledge graph memory.

Core benefits

  • Native Memory interface: ZepUserMemory and ZepGraphMemory implement AutoGen’s Memory interface, so they drop straight into an agent’s memory list
  • Automatic context injection: Relevant memory is retrieved and prepended to the model context before each turn via update_context()
  • User and knowledge graphs: Persist a user’s conversation history or maintain a shared knowledge graph with custom entity models
  • On-demand function tools: Pre-built tools let the agent explicitly search and add graph data when it chooses
  • Graceful degradation: A Zep failure is logged but does not crash the agent run

How it works

The integration exposes two complementary retrieval paths:

  • Memory classes (ZepUserMemory, ZepGraphMemory) attach to an agent’s memory list. AutoGen calls update_context() before each turn, and the class retrieves memory from Zep and injects it as a system message — transparent, automatic context on every interaction.
  • Function tools (create_search_graph_tool, create_add_graph_data_tool) attach to an agent’s tools list. The model decides when to call them, giving explicit, observable search and add operations that work with AutoGen’s tool reflection.

Both approaches can be combined on the same agent: memory for consistent background context, tools for targeted lookups.

Installation

$pip install zep-autogen zep-cloud autogen-core autogen-agentchat

Requires Python 3.11+, zep-cloud>=3.23.0, autogen-agentchat>=0.7.0, and a Zep Cloud API key. Get your API key from app.getzep.com.

Set up your environment variables:

$export ZEP_API_KEY="your-zep-api-key"
$export OPENAI_API_KEY="your-openai-api-key"

Memory types

  • User memory: Stores conversation history in user threads with automatic context injection
  • Knowledge graph memory: Maintains structured knowledge with custom entity models

User memory

ZepUserMemory persists messages to a user’s thread and injects the context block into the agent before each turn. Set up the imports, create the user and thread, initialize the memory, attach it to an agent, then store messages as the conversation proceeds.

1

Import dependencies

1import os
2import uuid
3import asyncio
4from autogen_agentchat.agents import AssistantAgent
5from autogen_ext.models.openai import OpenAIChatCompletionClient
6from autogen_core.memory import MemoryContent, MemoryMimeType
7from zep_cloud.client import AsyncZep
8from zep_autogen import ZepUserMemory
2

Create the user and thread

A user and thread must exist before memory can store messages against them.

1# Initialize Zep client
2zep_client = AsyncZep(api_key=os.environ.get("ZEP_API_KEY"))
3user_id = f"user_{uuid.uuid4().hex[:16]}"
4thread_id = f"thread_{uuid.uuid4().hex[:16]}"
5
6# Create user (required before using memory)
7try:
8 await zep_client.user.add(
9 user_id=user_id,
10 email="[email protected]",
11 first_name="Alice"
12 )
13except Exception as e:
14 print(f"User might already exist: {e}")
15
16# Create thread (required for conversation memory)
17try:
18 await zep_client.thread.create(thread_id=thread_id, user_id=user_id)
19except Exception as e:
20 print(f"Thread creation failed: {e}")
3

Initialize the memory

ZepUserMemory binds the client, user, and thread into a memory object that AutoGen can attach to an agent.

1# Create user memory with configuration
2memory = ZepUserMemory(
3 client=zep_client,
4 user_id=user_id,
5 thread_id=thread_id
6)
4

Attach the memory to an agent

Pass the memory in the agent’s memory list so context is injected before each turn.

1# Create agent with Zep memory
2agent = AssistantAgent(
3 name="MemoryAwareAssistant",
4 model_client=OpenAIChatCompletionClient(
5 model="gpt-4.1-mini",
6 api_key=os.environ.get("OPENAI_API_KEY")
7 ),
8 memory=[memory],
9 system_message="You are a helpful assistant with persistent memory."
10)
5

Store messages and run

Persist each turn to Zep as the conversation proceeds. The agent automatically retrieves context via update_context() before responding.

1# Helper function to store messages with proper metadata
2async def add_message(message: str, role: str, name: str = None):
3 """Store a message in Zep memory following AutoGen standards."""
4 metadata = {"type": "message", "role": role}
5 if name:
6 metadata["name"] = name
7
8 await memory.add(MemoryContent(
9 content=message,
10 mime_type=MemoryMimeType.TEXT,
11 metadata=metadata
12 ))
13
14# Example conversation with memory persistence
15user_message = "My name is Alice and I love hiking in the mountains."
16print(f"User: {user_message}")
17
18# Store user message
19await add_message(user_message, "user", "Alice")
20
21# Run agent - it will automatically retrieve context via update_context()
22response = await agent.run(task=user_message)
23agent_response = response.messages[-1].content
24print(f"Agent: {agent_response}")
25
26# Store agent response
27await add_message(agent_response, "assistant")

Automatic context injection: ZepUserMemory injects relevant memory via the update_context() method before each turn. It injects the context block, and when one is available also appends up to 10 recent thread messages.

Allow time for indexing — Zep extracts knowledge asynchronously, so facts from a turn are not instantly searchable. Allow time for indexing before querying for newly added content.

Knowledge graph memory

ZepGraphMemory maintains a standalone knowledge graph with custom entity models. Define an ontology, create the graph, initialize the memory with search filters, add data, then attach the memory to an agent.

1

Define entity models

Custom entity models shape how Zep extracts structured knowledge from the data you add.

1from zep_autogen.graph_memory import ZepGraphMemory
2from zep_cloud.external_clients.ontology import EntityModel, EntityText
3from pydantic import Field
4
5# Define entity models using Pydantic
6class ProgrammingLanguage(EntityModel):
7 """A programming language entity."""
8 paradigm: EntityText = Field(
9 description="programming paradigm (e.g., object-oriented, functional)",
10 default=None
11 )
12 use_case: EntityText = Field(
13 description="primary use cases for this language",
14 default=None
15 )
16
17class Framework(EntityModel):
18 """A software framework or library."""
19 language: EntityText = Field(
20 description="the programming language this framework is built for",
21 default=None
22 )
23 purpose: EntityText = Field(
24 description="primary purpose of this framework",
25 default=None
26 )
2

Set the ontology and create the graph

Register the entity models as the graph’s ontology, then create the graph that will hold the extracted knowledge.

1from zep_cloud import SearchFilters
2
3# Set ontology first
4await zep_client.graph.set_ontology(
5 entities={
6 "ProgrammingLanguage": ProgrammingLanguage,
7 "Framework": Framework,
8 }
9)
10
11# Create graph
12graph_id = f"graph_{uuid.uuid4().hex[:16]}"
13try:
14 await zep_client.graph.create(
15 graph_id=graph_id,
16 name="Programming Knowledge Graph"
17 )
18 print(f"Created graph: {graph_id}")
19except Exception as e:
20 print(f"Graph creation failed: {e}")
3

Initialize the graph memory

Configure search filters and context limits to control what ZepGraphMemory injects on each turn.

1# Create graph memory with search configuration
2graph_memory = ZepGraphMemory(
3 client=zep_client,
4 graph_id=graph_id,
5 search_filters=SearchFilters(
6 node_labels=["ProgrammingLanguage", "Framework"]
7 ),
8 facts_limit=20, # Max facts in context injection (default: 20)
9 entity_limit=5 # Max entities in context injection (default: 5)
10)
4

Add data and wait for indexing

Knowledge extraction is asynchronous, so allow time for indexing before the data is searchable.

1# Add structured knowledge
2await graph_memory.add(MemoryContent(
3 content="Python is excellent for data science and AI development",
4 mime_type=MemoryMimeType.TEXT,
5 metadata={"type": "data"} # "data" stores in graph, "message" stores as episode
6))
7
8# Wait for graph processing (required)
9print("Waiting for graph indexing...")
10await asyncio.sleep(30) # Allow time for knowledge extraction
5

Attach the memory to an agent

Pass the graph memory in the agent’s memory list so relevant facts and entities are injected before each turn.

1# Create agent with graph memory
2agent = AssistantAgent(
3 name="GraphMemoryAssistant",
4 model_client=OpenAIChatCompletionClient(model="gpt-4.1-mini"),
5 memory=[graph_memory],
6 system_message="You are a technical assistant with programming knowledge."
7)

Graph memory context injection: ZepGraphMemory automatically retrieves the last 2 episodes from the graph and uses their content to query for relevant facts (up to facts_limit) and entities (up to entity_limit). This context is injected as a system message during agent interactions.

Tools integration

Zep tools let agents search and add data directly to memory storage with manual control and structured responses.

Important: Tools must be bound to either graph_id OR user_id, not both. This determines whether they operate on knowledge graphs or user graphs.

Tool function parameters

Search tool parameters:

  • query: str (required) - Search query text
  • limit: int (optional, default 10) - Maximum results to return
  • scope: str (optional, default “edges”) - Search scope: “edges”, “nodes”, “episodes”

Add tool parameters:

  • data: str (required) - Content to store
  • data_type: str (optional, default “text”) - Data type: “text”, “json”, “message”

User graph tools

1from zep_autogen import create_search_graph_tool, create_add_graph_data_tool
2
3# Create tools bound to user graph
4search_tool = create_search_graph_tool(zep_client, user_id=user_id)
5add_tool = create_add_graph_data_tool(zep_client, user_id=user_id)
6
7# Agent with user graph tools
8agent = AssistantAgent(
9 name="UserKnowledgeAssistant",
10 model_client=OpenAIChatCompletionClient(model="gpt-4.1-mini"),
11 tools=[search_tool, add_tool],
12 system_message="You can search and add data to the user's knowledge graph.",
13 reflect_on_tool_use=True # Enables tool usage reflection
14)

Knowledge graph tools

1# Create tools bound to knowledge graph
2search_tool = create_search_graph_tool(zep_client, graph_id=graph_id)
3add_tool = create_add_graph_data_tool(zep_client, graph_id=graph_id)
4
5# Agent with knowledge graph tools
6agent = AssistantAgent(
7 name="KnowledgeGraphAssistant",
8 model_client=OpenAIChatCompletionClient(model="gpt-4.1-mini"),
9 tools=[search_tool, add_tool],
10 system_message="You can search and add data to the knowledge graph.",
11 reflect_on_tool_use=True
12)

Query memory

Both memory types support direct querying with different scope parameters.

User memory queries

1# Query user conversation history
2results = await memory.query("What does Alice like?", limit=5)
3
4# Process different result types
5for result in results.results:
6 content = result.content
7 metadata = result.metadata
8
9 if 'edge_name' in metadata:
10 # Fact/relationship result
11 print(f"Fact: {content}")
12 print(f"Relationship: {metadata['edge_name']}")
13 print(f"Valid: {metadata.get('valid_at', 'N/A')} - {metadata.get('invalid_at', 'present')}")
14 elif 'node_name' in metadata:
15 # Entity result
16 print(f"Entity: {metadata['node_name']}")
17 print(f"Summary: {content}")
18 else:
19 # Episode/message result
20 print(f"Message: {content}")
21 print(f"Role: {metadata.get('episode_role', 'unknown')}")
22
23 print(f"Source: {metadata.get('source')}\n")

Graph memory queries

1# Query knowledge graph with scope control
2facts_results = await graph_memory.query(
3 "Python frameworks",
4 limit=10,
5 scope="edges" # "edges" (facts), "nodes" (entities), "episodes" (messages)
6)
7
8print(f"Found {len(facts_results.results)} facts about Python frameworks:")
9for result in facts_results.results:
10 print(f"- {result.content}")
11
12entities_results = await graph_memory.query(
13 "programming languages",
14 limit=5,
15 scope="nodes"
16)
17
18print(f"\nFound {len(entities_results.results)} programming language entities:")
19for result in entities_results.results:
20 entity_name = result.metadata.get('node_name', 'Unknown')
21 print(f"- {entity_name}: {result.content}")

Search result structure

1{
2 "content": "fact text",
3 "metadata": {
4 "source": "graph" | "user_graph",
5 "edge_name": "relationship_name",
6 "edge_attributes": {...},
7 "created_at": "timestamp",
8 "valid_at": "timestamp",
9 "invalid_at": "timestamp",
10 "expired_at": "timestamp"
11 }
12}
1{
2 "content": "entity_name:\n entity_summary",
3 "metadata": {
4 "source": "graph" | "user_graph",
5 "node_name": "entity_name",
6 "node_attributes": {...},
7 "created_at": "timestamp"
8 }
9}
1{
2 "content": "episode_content",
3 "metadata": {
4 "source": "graph" | "user_graph",
5 "episode_type": "source_type",
6 "episode_role": "role_type",
7 "episode_name": "role_name",
8 "created_at": "timestamp"
9 }
10}

Memory vs tools comparison

Memory objects (ZepUserMemory / ZepGraphMemory):

  • Automatic context injection via update_context()
  • Attached to the agent’s memory list
  • Transparent operation — happens automatically
  • Better for consistent memory across interactions

Function tools (search/add tools):

  • Manual control — the agent decides when to use them
  • More explicit and observable operations
  • Better for specific search/add operations
  • Works with AutoGen’s tool reflection features
  • Provides structured return values

Note: Both approaches can be combined — use memory for automatic context and tools for explicit operations.

Best practices

  • Pick the right memory type — use ZepUserMemory for per-user conversation history and ZepGraphMemory for a shared knowledge graph
  • Bind tools to exactly one scope — a search or add tool targets either a graph_id or a user_id, never both
  • Combine memory and tools — attach a memory class for automatic context and add function tools for targeted lookups
  • Allow time for indexing — Zep extracts knowledge asynchronously, so facts from a turn are not instantly searchable

Next steps