LiveKit integration

Add long-term agent memory to LiveKit voice agents

The zep-livekit package adds long-term agent memory to LiveKit voice agents. It wraps LiveKit’s Agent so that completed conversation turns are persisted to Zep and relevant context is injected before each response. Choose between user thread memory or structured knowledge graph memory.

Core benefits

  • Persistent voice memory: Each completed turn is stored in Zep and contributes to the user’s temporal knowledge graph
  • Automatic context injection: Relevant context is retrieved and added as a system message before the agent’s next response
  • Two access patterns: ZepUserAgent for thread-based conversation memory, ZepGraphAgent for direct knowledge graph access
  • Drop-in replacement: Both classes subclass LiveKit’s Agent and accept all standard Agent parameters

How it works

LiveKit’s AgentSession owns the audio pipeline — speech-to-text, voice activity detection, turn detection, and text-to-speech. Zep does not touch audio. Instead, the Zep agent hooks into LiveKit’s turn lifecycle and runs a write-then-read cycle on each completed user turn:

  1. Persist the turn — when LiveKit fires on_user_turn_completed, the user message is written to Zep (a thread for ZepUserAgent, the graph for ZepGraphAgent). Assistant responses are captured separately via the conversation_item_added session event.
  2. Retrieve context — the agent fetches a context block (thread.get_user_context for ZepUserAgent) or runs hybrid graph search across edges, nodes, and episodes (ZepGraphAgent).
  3. Inject context — the retrieved context is added to the turn as a system message, so the LLM’s next response is grounded in prior conversation.

Allow time for indexing: Turns are ingested and knowledge is extracted asynchronously, so facts from the current turn are not searchable within that same turn. Context retrieved on a given turn reflects knowledge extracted from earlier turns.

Installation

$pip install zep-livekit zep-cloud "livekit-agents[openai,silero]>=1.0.0"

Requires LiveKit Agents v1.0+ (not v0.x) and a Zep Cloud API key. The examples use the v1.0 AgentSession API. Get your API key from app.getzep.com.

Set up your environment variables:

$export ZEP_API_KEY="your-zep-api-key"
$export OPENAI_API_KEY="your-openai-api-key"
$export LIVEKIT_URL="your-livekit-url"
$export LIVEKIT_API_KEY="your-livekit-api-key"
$export LIVEKIT_API_SECRET="your-livekit-api-secret"

LIVEKIT_URL, LIVEKIT_API_KEY, and LIVEKIT_API_SECRET come from a LiveKit Cloud project or a self-hosted LiveKit server. They configure the LiveKit infrastructure your agent connects to and are unrelated to Zep.

Agent types

Identity and isolation

The example below derives a stable user_id from your application’s auth system and scopes the thread_id (and graph_id) to the LiveKit room. Use a stable, durable user ID — do not derive user_id from the room name. A room is a per-session construct, so a room-derived user_id fragments a returning user’s history across rooms and prevents Zep from accumulating long-term memory for that person.

Scope thread_id or graph_id to the room when you want per-session isolation while still attributing every session to the same long-lived user.

1# Stable identity from your auth system — survives across sessions
2user_id = authenticated_user_id
3
4# Room/session scopes the thread (or graph), not the user
5thread_id = f"thread_{ctx.room.name}"
6graph_id = f"graph_{ctx.room.name}"

User memory agent

ZepUserAgent stores each turn in a Zep thread and injects a context block before the next response.

Python
1import logging
2import os
3
4from livekit import agents
5from livekit.agents import AutoSubscribe
6from livekit.plugins import openai, silero
7from zep_cloud.client import AsyncZep
8from zep_livekit import ZepUserAgent
9
10
11async def entrypoint(ctx: agents.JobContext):
12 zep_client = AsyncZep(api_key=os.environ.get("ZEP_API_KEY"))
13
14 # Stable user identity from your auth system; thread scoped to the room
15 user_id = ctx.job.metadata or "user-123"
16 thread_id = f"thread_{ctx.room.name}"
17
18 # Ensure the user exists, then create the room-scoped thread
19 try:
20 await zep_client.user.get(user_id=user_id)
21 except Exception:
22 await zep_client.user.add(user_id=user_id, first_name="Alice")
23
24 await zep_client.thread.create(thread_id=thread_id, user_id=user_id)
25
26 # Subscribe to audio only — a voice agent has no use for video tracks
27 await ctx.connect(auto_subscribe=AutoSubscribe.AUDIO_ONLY)
28
29 # AgentSession owns the audio pipeline (STT, VAD, turn detection, TTS)
30 session = agents.AgentSession(
31 stt=openai.STT(),
32 llm=openai.LLM(model="gpt-4o-mini"),
33 tts=openai.TTS(),
34 vad=silero.VAD.load(),
35 )
36
37 # Drop-in Agent replacement that adds Zep memory
38 agent = ZepUserAgent(
39 zep_client=zep_client,
40 user_id=user_id,
41 thread_id=thread_id,
42 user_message_name="Alice",
43 assistant_message_name="Assistant",
44 instructions="You are a helpful voice assistant with long-term memory. "
45 "Reference details from previous conversations naturally.",
46 )
47
48 await session.start(agent=agent, room=ctx.room)
49 logging.info("Voice assistant with Zep memory is running")
50
51
52if __name__ == "__main__":
53 agents.cli.run_app(agents.WorkerOptions(entrypoint_fnc=entrypoint))

Automatic memory integration: ZepUserAgent captures each voice turn and injects relevant context from previous conversations, enabling continuity across sessions without manual memory management.

ZepUserAgent configuration

ZepUserAgent accepts the following parameters in addition to all standard LiveKit Agent parameters (stt, llm, tts, instructions, tools, chat_ctx, etc.):

ParameterDescription
zep_clientInitialized AsyncZep client
user_idUser identifier for memory isolation (use a stable ID)
thread_idThread identifier for conversation continuity
user_message_nameOptional name attributed to user messages in Zep
assistant_message_nameOptional name attributed to assistant messages in Zep

The context_mode parameter is deprecated and ignored; the Zep V3 context block returns a structured format and no longer accepts a mode selector.

Knowledge graph agent

ZepGraphAgent writes each turn directly to a knowledge graph and retrieves context with hybrid search over edges (facts), nodes (entities), and episodes. You can optionally shape the graph with custom entity models.

Python
1import os
2
3from livekit import agents
4from livekit.agents import AutoSubscribe
5from livekit.plugins import openai, silero
6from pydantic import Field
7from zep_cloud.client import AsyncZep
8from zep_cloud.external_clients.ontology import EntityModel, EntityText
9from zep_livekit import ZepGraphAgent
10
11
12class Person(EntityModel):
13 """A person entity for voice interactions."""
14
15 role: EntityText = Field(description="person's role or profession", default=None)
16 interests: EntityText = Field(description="topics the person is interested in", default=None)
17
18
19class Topic(EntityModel):
20 """A conversation topic or subject."""
21
22 category: EntityText = Field(description="category of the topic", default=None)
23 importance: EntityText = Field(description="importance to the user", default=None)
24
25
26async def entrypoint(ctx: agents.JobContext):
27 zep_client = AsyncZep(api_key=os.environ.get("ZEP_API_KEY"))
28
29 # Optional: define a custom ontology for structured extraction
30 await zep_client.graph.set_ontology(entities={"Person": Person, "Topic": Topic})
31
32 # Room-scoped graph
33 graph_id = f"graph_{ctx.room.name}"
34 try:
35 await zep_client.graph.get(graph_id)
36 except Exception:
37 await zep_client.graph.create(graph_id=graph_id, name="LiveKit Voice Knowledge Graph")
38
39 # Subscribe to audio only — a voice agent has no use for video tracks
40 await ctx.connect(auto_subscribe=AutoSubscribe.AUDIO_ONLY)
41
42 session = agents.AgentSession(
43 stt=openai.STT(),
44 llm=openai.LLM(model="gpt-4o-mini"),
45 tts=openai.TTS(),
46 vad=silero.VAD.load(),
47 )
48
49 agent = ZepGraphAgent(
50 zep_client=zep_client,
51 graph_id=graph_id,
52 facts_limit=15, # Max facts (edges) to retrieve
53 entity_limit=8, # Max entities (nodes) to retrieve
54 episode_limit=2, # Max episodes to retrieve
55 search_filters={"node_labels": ["Person"]}, # Constrain to Person entities
56 instructions="You are a knowledgeable voice assistant. Use the provided "
57 "context about entities and facts to give informed responses.",
58 )
59
60 await session.start(agent=agent, room=ctx.room)
61
62
63if __name__ == "__main__":
64 agents.cli.run_app(agents.WorkerOptions(entrypoint_fnc=entrypoint))

Search filters: The search_filters parameter constrains which results the agent retrieves. Use node_labels to filter by entity types defined in your ontology.

Graph memory context: ZepGraphAgent writes each turn to the graph and injects relevant facts, entities, and episodes as context, grounding responses in prior conversations.

ZepGraphAgent configuration

ZepGraphAgent accepts the following parameters in addition to all standard LiveKit Agent parameters:

ParameterDescription
zep_clientInitialized AsyncZep client
graph_idGraph identifier for knowledge storage
user_nameOptional name prefixed to stored messages for attribution
facts_limitMaximum facts (edges) to retrieve (default: 15)
entity_limitMaximum entities (nodes) to retrieve (default: 5)
episode_limitMaximum episodes to retrieve (default: 2)
search_filtersOptional SearchFilters applied to graph search
rerankerOptional reranker for search results (default: "rrf")

Best practices

  • Use a stable user ID — derive user_id from your auth system, not the room name, so a returning user’s memory accumulates instead of fragmenting across sessions
  • Scope sessions with the thread or graph — use the room name for thread_id or graph_id when you want per-session isolation, keeping user_id constant
  • Let LiveKit own audioAgentSession handles STT, VAD, turn detection, and TTS; Zep only persists turns and injects context
  • Allow time for indexing — Zep extracts knowledge asynchronously, so facts from a turn are not instantly searchable

Next steps