Pydantic AI integration

Add long-term agent memory to Pydantic AI agents

Pydantic AI agents using Zep gain long-term memory backed by a temporal knowledge graph. The zep-pydantic-ai package persists each user turn, injects relevant context into the model prompt using Pydantic AI’s native ProcessHistory capability, and adds an on-demand graph-search tool.

Core benefits

  • Native ProcessHistory capability: Uses the current Pydantic AI history-processor hook, not a deprecated kwarg
  • Single round-trip: Persists the user turn and retrieves context in one add_messages call
  • Correct under tool calls: Dedupes per run (keyed by the run ID), so a run that makes tool calls records the turn exactly once
  • On-demand graph search: A model-callable tool over graph.search for explicit lookups
  • Lazy resource creation: The Zep user and thread are created on first use
  • Graceful degradation: A Zep failure is logged but never crashes the agent run
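
The graceful-degradation behavior above can be illustrated with a minimal sketch. This is not the package's implementation: `call_or_default`, the logger name, and the two stand-in coroutines are hypothetical names chosen for illustration. The sketch only shows the general pattern of logging a failed memory call and falling back to a default so the surrounding run continues.

```python
import asyncio
import logging
from collections.abc import Awaitable, Callable
from typing import TypeVar

logger = logging.getLogger("zep_sketch")  # hypothetical logger name

T = TypeVar("T")


async def call_or_default(operation: Callable[[], Awaitable[T]], default: T) -> T:
    """Await operation(); on any error, log it and return default instead."""
    try:
        return await operation()
    except Exception:
        # Log with traceback, then fall back so the agent run can continue.
        logger.exception("Memory call failed; continuing without it")
        return default


async def failing_memory_call() -> str:
    # Stand-in for a memory request that hits a network or API error.
    raise ConnectionError("memory service unreachable")


async def working_memory_call() -> str:
    # Stand-in for a memory request that succeeds.
    return "User is building a robotics project."
```

Awaiting `call_or_default(failing_memory_call, "")` returns the empty default and writes the traceback to the log, while the working call returns its context string unchanged.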

How it works

The integration plugs into Pydantic AI through three components:

  • ZepDeps — a dataclass used as the agent’s deps_type. It carries the Zep client and the user/thread identity. Construct one per conversation and pass it to agent.run(..., deps=deps); both the history processor and the search tool reach it through RunContext.deps.
  • zep_history_processor — registered via capabilities=[ProcessHistory(zep_history_processor)]. Pydantic AI runs it before every model request: on the user’s turn it persists the latest message via thread.add_messages(return_context=True) and prepends Zep’s context block as a system message. Because ProcessHistory fires once per model request (not once per run), the processor dedupes per run, keyed by the run ID. It persists and retrieves only on the first request of a run and replays the cached context on later requests within that same run, so tool-calling runs never create duplicate episodes.
  • create_zep_search_tool — a factory returning a model-callable tool over graph.search. The model decides when to search the knowledge graph; search parameters are pinned at construction.
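
The per-run deduplication described for zep_history_processor can be sketched as a small cache keyed by run ID. The package's actual internals are not shown in this guide, so `PerRunContextCache`, `persist_and_fetch`, and `simulate_run` are hypothetical names, and the fetch function is a stand-in that makes no Zep calls.

```python
import asyncio
from collections.abc import Awaitable, Callable


class PerRunContextCache:
    """Run an expensive fetch once per run ID and replay the result afterwards."""

    def __init__(self) -> None:
        self._context_by_run: dict[str, str] = {}

    async def get(self, run_id: str, fetch: Callable[[], Awaitable[str]]) -> str:
        if run_id not in self._context_by_run:
            # First model request of this run: persist the turn and fetch context.
            self._context_by_run[run_id] = await fetch()
        # Later requests in the same run (for example after tool calls) reuse it.
        return self._context_by_run[run_id]


persist_calls: list[str] = []  # records how often the stand-in "persisted" a turn


async def persist_and_fetch() -> str:
    # Stand-in for the single add_messages(return_context=True) round trip.
    persist_calls.append("user turn")
    return "context block"


async def simulate_run(cache: PerRunContextCache, run_id: str, model_requests: int) -> list[str]:
    # Each loop iteration plays the role of one model request within the run.
    return [await cache.get(run_id, persist_and_fetch) for _ in range(model_requests)]
```

Simulating a run with three model requests calls `persist_and_fetch` once and returns the same context three times; a new run ID triggers a fresh fetch. A long-lived process would also need to evict finished runs from the cache, which this sketch leaves out.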

Call persist_run after agent.run to persist the assistant’s reply. The history processor runs before each model request, so the assistant’s reply does not exist yet when it fires; persist_run writes that reply once the run completes. Only assistant text is sent, so Zep records one clean assistant message per turn.
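
To illustrate what "only assistant text is sent" means, here is a minimal sketch of filtering a run's messages down to assistant text. The `Part` and `Message` dataclasses are simplified, hypothetical stand-ins, not Pydantic AI's message classes, and `assistant_text` is not the package's function; the real persist_run may filter differently.

```python
from dataclasses import dataclass, field


@dataclass
class Part:
    kind: str      # "text" or "tool-call" in this sketch
    content: str


@dataclass
class Message:
    role: str      # "user", "assistant", or "tool" in this sketch
    parts: list[Part] = field(default_factory=list)


def assistant_text(messages: list[Message]) -> str:
    """Join the text parts of assistant messages, skipping tool traffic."""
    chunks = [
        part.content
        for message in messages
        if message.role == "assistant"
        for part in message.parts
        if part.kind == "text" and part.content.strip()
    ]
    return "\n".join(chunks)
```

Given a run that contains a tool call, a tool result, and a final answer, only the final answer's text survives the filter, which is why one clean assistant message is recorded per turn.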

Installation

$ pip install zep-pydantic-ai

Requires Python 3.11+, pydantic-ai>=1.107,<2, and a Zep Cloud API key. Get your API key from app.getzep.com.

Set up your environment variables:

$ export ZEP_API_KEY="your-zep-api-key"
$ export OPENAI_API_KEY="your-openai-api-key"
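
With the variables exported, application code can read the key from the environment instead of hard-coding it. The helper below is a minimal sketch; `require_env` is a hypothetical name and is not part of zep-pydantic-ai or the Zep SDK.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or fail with a clear message."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value
```

The result can then be passed to the client constructor, for example `AsyncZep(api_key=require_env("ZEP_API_KEY"))`, so a missing key fails at startup rather than on the first request.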

Usage

Register the history processor and search tool when building the agent, then pass ZepDeps to each run:

Python
import asyncio
from pydantic_ai import Agent
from pydantic_ai.capabilities import ProcessHistory
from zep_cloud.client import AsyncZep
from zep_pydantic_ai import (
    ZepDeps,
    zep_history_processor,
    create_zep_search_tool,
    persist_run,
)

zep = AsyncZep(api_key="your-zep-api-key")

agent = Agent(
    "openai:gpt-4o-mini",
    deps_type=ZepDeps,
    capabilities=[ProcessHistory(zep_history_processor)],
    tools=[create_zep_search_tool()],
    instructions="You are a helpful assistant with long-term memory.",
)

async def main() -> None:
    deps = ZepDeps(
        client=zep,
        user_id="user_123",
        thread_id="thread_abc",
        first_name="Jane",
        last_name="Smith",
    )
    result = await agent.run("What did I tell you about my project?", deps=deps)
    print(result.output)
    # Persist the assistant's reply (the user turn was already persisted).
    await persist_run(deps, result.new_messages())

asyncio.run(main())

Beyond the automatic context injection, create_zep_search_tool() adds a model-callable tool over graph.search. The model decides when to look up specific facts, entities, or prior episodes; it supplies only the query, while scope, reranker, and limit are pinned at construction. The tool returns a formatted text summary of the matching results. By default it searches the current user’s graph; pass graph_id=... to target a shared standalone graph.

Memory vs tools

The integration combines two retrieval paths on the same agent:

| Path | How | When it fires |
| --- | --- | --- |
| Automatic injection | ProcessHistory(zep_history_processor) | Before every model request; prepends the context block |
| On-demand search | create_zep_search_tool() | When the model chooses to call it for a specific lookup |

Injection grounds each turn with cross-session context; the search tool lets the model actively dig for specific details.

Configuration options

ZepDeps

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| client | AsyncZep | Yes | | Initialized Zep async client (caller owns its lifecycle) |
| user_id | str | Yes | | Zep user ID (one user graph) |
| thread_id | str | Yes | | Zep thread ID for the conversation |
| first_name | str | No | None | User first name (recommended; anchors the user node) |
| last_name | str | No | None | User last name |
| email | str | No | None | User email (helps identity resolution) |
| user_name | str | No | None | Display name for persisted user messages (defaults to first + last) |
| assistant_name | str | No | "Assistant" | Display name for persisted assistant messages |
| ignore_roles | list[str] | No | None | Roles to exclude from graph ingestion |

create_zep_search_tool

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| graph_id | str | None | Standalone graph to search; when unset, searches the current user’s graph |
| scope | str | "edges" | What to search: edges, nodes, episodes, observations, thread_summaries, or auto |
| reranker | str | "rrf" | Result ordering (ignored for scope="auto") |
| limit | int | 10 | Maximum results (clamped to Zep’s ceiling of 50) |
| name | str | "zep_search" | Tool name exposed to the model |
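
The rules in the table (a fixed set of scopes, a limit clamped to 50, and a reranker that is ignored for scope="auto") can be sketched as a small normalization step. `normalize_search_options` is a hypothetical helper, not the package's API, and the lower bound of 1 on limit is an assumption that the table does not state.

```python
ZEP_MAX_LIMIT = 50  # ceiling stated in the parameter table
VALID_SCOPES = {"edges", "nodes", "episodes", "observations", "thread_summaries", "auto"}


def normalize_search_options(
    scope: str = "edges",
    reranker: str = "rrf",
    limit: int = 10,
) -> dict[str, object]:
    """Validate the scope, clamp the limit, and drop the reranker for scope='auto'."""
    if scope not in VALID_SCOPES:
        raise ValueError(f"Unknown scope {scope!r}; expected one of {sorted(VALID_SCOPES)}")
    # Assumption: a limit below 1 is raised to 1; the upper clamp comes from the table.
    options: dict[str, object] = {"scope": scope, "limit": max(1, min(limit, ZEP_MAX_LIMIT))}
    if scope != "auto":
        # The table says the reranker is ignored for scope="auto", so omit it there.
        options["reranker"] = reranker
    return options
```

Because these values are pinned at construction, the model can only vary the query string; a request for 200 results would be reduced to 50 under this sketch.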

Best practices

  • Construct one ZepDeps per conversation and reuse a single AsyncZep client across runs
  • Pass real names so Zep can anchor the user’s identity node in the graph
  • Always call persist_run after a run so the assistant’s reply reaches the graph
  • Allow time for indexing — Zep extracts knowledge asynchronously, so facts from a turn are not instantly searchable
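
Because extraction is asynchronous, a test or script that writes a turn and immediately searches for it may see nothing. One way to allow for this is to poll with a growing delay, as in the minimal sketch below. `wait_for_results` is a hypothetical helper, and `search` stands in for any zero-argument coroutine function that returns a list of results; the attempt count and delays are illustrative defaults, not recommended values from Zep.

```python
import asyncio
from collections.abc import Awaitable, Callable


async def wait_for_results(
    search: Callable[[], Awaitable[list[str]]],
    attempts: int = 5,
    initial_delay: float = 1.0,
    backoff: float = 2.0,
) -> list[str]:
    """Call search() until it returns results or the attempts run out."""
    delay = initial_delay
    for attempt in range(attempts):
        results = await search()
        if results:
            return results
        if attempt < attempts - 1:
            # Nothing indexed yet: wait, then retry with a longer delay.
            await asyncio.sleep(delay)
            delay *= backoff
    return []
```

The helper returns an empty list when nothing appears within the allotted attempts, so callers can decide whether that is an error or simply a cold graph.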

Next steps