LiveKit integration | Zep Documentation

The zep-livekit package adds long-term agent memory to LiveKit voice agents. It wraps LiveKit’s Agent so that completed conversation turns are persisted to Zep and relevant context is injected before each response. Choose between user thread memory or structured knowledge graph memory.

Core benefits

Persistent voice memory: Each completed turn is stored in Zep and contributes to the user’s temporal knowledge graph
Automatic context injection: Relevant context is retrieved and added as a system message before the agent’s next response
Two access patterns: ZepUserAgent for thread-based conversation memory, ZepGraphAgent for direct knowledge graph access
Drop-in replacement: Both classes subclass LiveKit’s Agent and accept all standard Agent parameters

How it works

LiveKit’s AgentSession owns the audio pipeline — speech-to-text, voice activity detection, turn detection, and text-to-speech. Zep does not touch audio. Instead, the Zep agent hooks into LiveKit’s turn lifecycle and runs a write-then-read cycle on each completed user turn:

Persist the turn — when LiveKit fires on_user_turn_completed, the user message is written to Zep (a thread for ZepUserAgent, the graph for ZepGraphAgent). Assistant responses are captured separately via the conversation_item_added session event.
Retrieve context — ZepUserAgent folds persistence and retrieval into a single thread.add_messages(..., return_context=True) round-trip; ZepGraphAgent writes the message to the graph, then runs hybrid search across edges, nodes, and episodes.
Inject context — the retrieved context is wrapped in a context template and added to the turn as a system message, so the LLM’s next response is grounded in prior conversation.

Allow time for indexing: Turns are ingested and knowledge is extracted asynchronously, so facts from the current turn are not searchable within that same turn. Context retrieved on a given turn reflects knowledge extracted from earlier turns.

Installation

$ pip install zep-livekit zep-cloud "livekit-agents[openai,silero]>=1.0.0"

Requires Python 3.11+, zep-livekit>=0.2.0, LiveKit Agents v1.0+ (not v0.x), and zep-cloud>=3.23.0, plus a Zep Cloud API key. The examples use the v1.0 AgentSession API. Get your API key from app.getzep.com.

Set up your environment variables:

$ export ZEP_API_KEY="your-zep-api-key"
$ export OPENAI_API_KEY="your-openai-api-key"
$ export LIVEKIT_URL="your-livekit-url"
$ export LIVEKIT_API_KEY="your-livekit-api-key"
$ export LIVEKIT_API_SECRET="your-livekit-api-secret"

LIVEKIT_URL, LIVEKIT_API_KEY, and LIVEKIT_API_SECRET come from a LiveKit Cloud project or a self-hosted LiveKit server. They configure the LiveKit infrastructure your agent connects to and are unrelated to Zep.

Upgrading from zep-livekit 0.1.x

One breaking change affects existing code: both agents wrap injected context in the shared DEFAULT_CONTEXT_TEMPLATE (<ZEP_CONTEXT>...</ZEP_CONTEXT>) rather than the per-agent “Relevant user context:” and “Relevant knowledge from memory:” prefixes. Pass context_template to override the wrapper.

See the package changelog for the full list of changes.

Agent types

ZepUserAgent: Uses user threads for conversation memory with automatic context injection
ZepGraphAgent: Reads and writes a knowledge graph, optionally shaped by custom entity models

Identity and isolation

The example below derives a stable user_id from your application’s auth system and scopes the thread_id (and graph_id) to the LiveKit room. Use a stable, durable user ID — do not derive user_id from the room name. A room is a per-session construct, so a room-derived user_id fragments a returning user’s history across rooms and prevents Zep from accumulating long-term memory for that person.

Scope thread_id or graph_id to the room when you want per-session isolation while still attributing every session to the same long-lived user.

1 # Stable identity from your auth system — survives across sessions
2 user_id = authenticated_user_id
3 
4 # Room/session scopes the thread (or graph), not the user
5 thread_id = f"thread_{ctx.room.name}"
6 graph_id = f"graph_{ctx.room.name}"

Provisioning users and threads

ensure_user and ensure_thread are idempotent create-then-catch-conflict helpers. Both return True when the resource is newly created and False when it already exists; genuine failures (auth, network, 5xx) raise. Call them before the first turn — for example during account or session onboarding — so misconfiguration surfaces loudly.

The optional on_created hook fires exactly once, only when the user is genuinely new — use it to seed initial facts, set custom instructions, or configure an ontology.

Python

1 from zep_livekit import ensure_user, ensure_thread
2 
3 async def seed_new_user(zep_client, user_id: str) -> None:
4     """Runs exactly once, right after the user is first created."""
5     ...
6 
7 created = await ensure_user(
8     zep_client,
9     user_id="user_123",
10     first_name="Alice",
11     on_created=seed_new_user,  # fires only for a genuinely new user
12 )
13 await ensure_thread(zep_client, thread_id="conversation_456", user_id="user_123")

ZepUserAgent also accepts first_name, last_name, email, and on_created directly and lazily calls the same helpers on the first turn, cached per agent instance. The lazy path logs and swallows failures rather than raising into the voice session — convenient for prototyping, but prefer the explicit helpers when provisioning failures need to surface.

ZepGraphAgent does not accept on_created: it is scoped to a standalone graph_id, not a Zep user, so there is no “user created” event to hook into. Passing it raises TypeError.

User memory agent

ZepUserAgent stores each turn in a Zep thread and injects a context block before the next response.

Python

1 import logging
2 import os
3 
4 from livekit import agents
5 from livekit.agents import AutoSubscribe
6 from livekit.plugins import openai, silero
7 from zep_cloud.client import AsyncZep
8 from zep_livekit import ZepUserAgent, ensure_thread, ensure_user
9 
10 
11 async def entrypoint(ctx: agents.JobContext):
12     zep_client = AsyncZep(api_key=os.environ.get("ZEP_API_KEY"))
13 
14     # Stable user identity from your auth system; thread scoped to the room
15     user_id = ctx.job.metadata or "user-123"
16     thread_id = f"thread_{ctx.room.name}"
17 
18     # Provision the user and room-scoped thread (idempotent; genuine failures raise)
19     await ensure_user(zep_client, user_id=user_id, first_name="Alice")
20     await ensure_thread(zep_client, thread_id=thread_id, user_id=user_id)
21 
22     # Subscribe to audio only — a voice agent has no use for video tracks
23     await ctx.connect(auto_subscribe=AutoSubscribe.AUDIO_ONLY)
24 
25     # AgentSession owns the audio pipeline (STT, VAD, turn detection, TTS)
26     session = agents.AgentSession(
27         stt=openai.STT(),
28         llm=openai.LLM(model="gpt-5-mini"),
29         tts=openai.TTS(),
30         vad=silero.VAD.load(),
31     )
32 
33     # Drop-in Agent replacement that adds Zep memory
34     agent = ZepUserAgent(
35         zep_client=zep_client,
36         user_id=user_id,
37         thread_id=thread_id,
38         user_message_name="Alice",
39         assistant_message_name="Assistant",
40         instructions="You are a helpful voice assistant with long-term memory. "
41         "Reference details from previous conversations naturally.",
42     )
43 
44     await session.start(agent=agent, room=ctx.room)
45     logging.info("Voice assistant with Zep memory is running")
46 
47 
48 if __name__ == "__main__":
49     agents.cli.run_app(agents.WorkerOptions(entrypoint_fnc=entrypoint))

Automatic memory integration: ZepUserAgent captures each voice turn and injects relevant context from previous conversations, enabling continuity across sessions without manual memory management.

ZepUserAgent configuration

ZepUserAgent accepts the following parameters in addition to all standard LiveKit Agent parameters (stt, llm, tts, instructions, tools, chat_ctx, etc.):

Parameter	Description
`zep_client`	Initialized `AsyncZep` client
`user_id`	User identifier for memory isolation (use a stable ID)
`thread_id`	Thread identifier for conversation continuity
`user_message_name`	Optional name attributed to user messages in Zep
`assistant_message_name`	Optional name attributed to assistant messages in Zep
`first_name` / `last_name` / `email`	Optional identity fields applied during lazy provisioning
`on_created`	Hook fired once when the Zep user is newly created on the lazy path
`context_builder`	Async callable replacing the built-in retrieval — see customizing retrieved context
`context_template`	Template wrapping injected context (default: `DEFAULT_CONTEXT_TEMPLATE`)

The context_mode parameter is deprecated and ignored; the Zep V3 context block returns a structured format and no longer accepts a mode selector.

Knowledge graph agent

ZepGraphAgent writes each turn directly to a knowledge graph and retrieves context with hybrid search over edges (facts), nodes (entities), and episodes. You can optionally shape the graph with custom entity models.

Python

1 import os
2 
3 from livekit import agents
4 from livekit.agents import AutoSubscribe
5 from livekit.plugins import openai, silero
6 from pydantic import Field
7 from zep_cloud import SearchFilters
8 from zep_cloud.client import AsyncZep
9 from zep_cloud.external_clients.ontology import EntityModel, EntityText
10 from zep_livekit import ZepGraphAgent
11 
12 
13 class Person(EntityModel):
14     """A person entity for voice interactions."""
15 
16     role: EntityText = Field(description="person's role or profession", default=None)
17     interests: EntityText = Field(description="topics the person is interested in", default=None)
18 
19 
20 class Topic(EntityModel):
21     """A conversation topic or subject."""
22 
23     category: EntityText = Field(description="category of the topic", default=None)
24     importance: EntityText = Field(description="importance to the user", default=None)
25 
26 
27 async def entrypoint(ctx: agents.JobContext):
28     zep_client = AsyncZep(api_key=os.environ.get("ZEP_API_KEY"))
29 
30     # Optional: define a custom ontology for structured extraction
31     await zep_client.graph.set_ontology(entities={"Person": Person, "Topic": Topic})
32 
33     # Room-scoped graph
34     graph_id = f"graph_{ctx.room.name}"
35     try:
36         await zep_client.graph.get(graph_id)
37     except Exception:
38         await zep_client.graph.create(graph_id=graph_id, name="LiveKit Voice Knowledge Graph")
39 
40     # Subscribe to audio only — a voice agent has no use for video tracks
41     await ctx.connect(auto_subscribe=AutoSubscribe.AUDIO_ONLY)
42 
43     session = agents.AgentSession(
44         stt=openai.STT(),
45         llm=openai.LLM(model="gpt-5-mini"),
46         tts=openai.TTS(),
47         vad=silero.VAD.load(),
48     )
49 
50     agent = ZepGraphAgent(
51         zep_client=zep_client,
52         graph_id=graph_id,
53         facts_limit=15,  # Max facts (edges) to retrieve
54         entity_limit=8,  # Max entities (nodes) to retrieve
55         episode_limit=2,  # Max episodes to retrieve
56         search_filters=SearchFilters(node_labels=["Person"]),  # Constrain to Person entities
57         instructions="You are a knowledgeable voice assistant. Use the provided "
58         "context about entities and facts to give informed responses.",
59     )
60 
61     await session.start(agent=agent, room=ctx.room)
62 
63 
64 if __name__ == "__main__":
65     agents.cli.run_app(agents.WorkerOptions(entrypoint_fnc=entrypoint))

Search filters: The search_filters parameter constrains which results the agent retrieves. Use node_labels to filter by entity types defined in your ontology.

Graph memory context: ZepGraphAgent writes each turn to the graph and injects relevant facts, entities, and episodes as context, grounding responses in prior conversations.

ZepGraphAgent configuration

ZepGraphAgent accepts the following parameters in addition to all standard LiveKit Agent parameters:

Parameter	Description
`zep_client`	Initialized `AsyncZep` client
`graph_id`	Graph identifier for knowledge storage
`user_name`	Optional name prefixed to stored messages for attribution
`facts_limit`	Maximum facts (edges) to retrieve (default: `15`)
`entity_limit`	Maximum entities (nodes) to retrieve (default: `5`)
`episode_limit`	Maximum episodes to retrieve (default: `2`)
`search_filters`	Optional `SearchFilters` applied to graph search
`reranker`	Optional reranker for search results (default: `"rrf"`)
`context_builder`	Async callable replacing the built-in hybrid search — see customizing retrieved context
`context_template`	Template wrapping injected context (default: `DEFAULT_CONTEXT_TEMPLATE`)

ZepGraphAgent has no on_created parameter — it is graph-scoped, with no Zep user to provision. Passing on_created raises TypeError.

Customizing retrieved context

Both agents wrap injected context in DEFAULT_CONTEXT_TEMPLATE — a <ZEP_CONTEXT>...</ZEP_CONTEXT> block shared across Zep integrations — before adding it as a system message. Override with context_template: it must contain a literal {context} placeholder, substituted via plain string replacement (never str.format), so context text containing {, }, or % is always safe to inject.

Python

1 agent = ZepUserAgent(
2     zep_client=zep_client,
3     user_id="user_123",
4     thread_id="conversation_456",
5     context_template="Known facts about the user:\n{context}",
6 )

To replace the retrieval logic itself — a filtered graph search, a different graph, or multi-source context assembly — pass context_builder. On ZepUserAgent the builder is an async callable receiving a frozen ContextInput (zep, user_id, thread_id, user_message, session) and returning the context string, or None to skip injection:

Python

1 from zep_livekit import ContextInput
2 
3 async def my_builder(ctx: ContextInput) -> str | None:
4     results = await ctx.zep.graph.search(
5         user_id=ctx.user_id,
6         query=ctx.user_message,
7         scope="edges",
8     )
9     if not results.edges:
10         return None
11     return "\n".join(edge.fact for edge in results.edges)
12 
13 agent = ZepUserAgent(
14     zep_client=zep_client,
15     user_id="user_123",
16     thread_id="conversation_456",
17     context_builder=my_builder,
18 )

When context_builder is set on ZepUserAgent, message persistence and the builder run concurrently for lower latency, with per-side failure isolation: a builder error is logged and skips injection for that turn but does not stop persistence, and a persistence error is logged but a successful builder result is still injected.

ZepGraphAgent takes the analogous context_builder typed as GraphContextBuilder, receiving a GraphContextInput (zep, graph_id, user_message, session). Setting it fully replaces the built-in hybrid search rather than running concurrently with anything — graph message persistence happens independently, earlier in the turn.

Graph search tool

In addition to the context injected automatically every turn, create_graph_search_tool builds a model-callable LiveKit function tool (via function_tool(raw_schema=...), returning a RawFunctionTool) that lets the agent search a Zep graph on demand. Exactly one of graph_id or user_id is required: graph_id targets a shared standalone graph, user_id targets that user’s personal graph. Register it through the standard tools=[...] parameter:

Python

1 from zep_livekit import ZepUserAgent, create_graph_search_tool
2 
3 search_tool = create_graph_search_tool(zep_client, user_id="user_123")
4 
5 agent = ZepUserAgent(
6     zep_client=zep_client,
7     user_id="user_123",
8     thread_id="conversation_456",
9     tools=[search_tool],
10     instructions="...",
11 )

The tool exposes every graph.search parameter to the model by default:

Parameter	Values	Default
`query`	Natural language search query (required)	—
`scope`	`edges`, `nodes`, `episodes`, `observations`, `thread_summaries`, `auto`	`edges`
`reranker`	`rrf`, `mmr`, `node_distance`, `episode_mentions`, `cross_encoder`	`rrf`
`limit`	Maximum results	`10`
`mmr_lambda`	Diversity/relevance balance for the `mmr` reranker	omitted when unset
`center_node_uuid`	Center node for `node_distance` reranking	omitted when unset

Use pinned_params to fix a parameter to a constant value (hidden from the model, always sent), or hidden_params to hide a parameter without pinning it (Zep’s server-side default applies). search_filters and bfs_origin_node_uuids are constructor-only and never exposed to the model.

Python

1 search_tool = create_graph_search_tool(
2     zep_client,
3     user_id="user_123",
4     pinned_params={"scope": "edges", "limit": 5},
5     hidden_params={"center_node_uuid"},
6 )

Zep failures are caught and returned as an error string to the model — the tool never raises into the voice session.

Size limits

Zep rejects direct thread-message payloads over 4,096 characters; LiveKit agents truncate message content to 4,000 characters before writing it, logging lengths only and never content.
Zep rejects direct graph.add payloads over 10,000 characters; LiveKit graph agents truncate graph payloads to 9,900 characters before calling graph.add.

Best practices

Use a stable user ID — derive user_id from your auth system, not the room name, so a returning user’s memory accumulates instead of fragmenting across sessions
Scope sessions with the thread or graph — use the room name for thread_id or graph_id when you want per-session isolation, keeping user_id constant
Let LiveKit own audio — AgentSession handles STT, VAD, turn detection, and TTS; Zep only persists turns and injects context
Allow time for indexing — Zep extracts knowledge asynchronously, so facts from a turn are not instantly searchable

Next steps

Explore customizing graph structure for advanced knowledge organization
Learn about searching the graph and how to tune search
See code examples for additional patterns