ElevenLabs Agents

Add persistent context to ElevenLabs Agents using a custom LLM proxy.

ElevenLabs Agents is a platform for building intelligent voice and chat agents that can talk, type, and take action. You can create agents directly in the browser using the ElevenLabs dashboard, then deploy them across phone, web, and mobile applications.

This guide shows how to integrate Zep with ElevenLabs Agents using a custom LLM proxy pattern.

Why not use tools for context retrieval

ElevenLabs Agents support custom tools, but implementing Zep retrieval as a tool call is not recommended for two reasons:

Latency — Tool calls add a round-trip where the LLM must decide to call the tool, execute it, then continue generation. For voice agents where responsiveness is critical, this latency is unacceptable.

Unreliable retrieval — The LLM decides when to call tools. It may forget, skip retrieval when it shouldn’t, or call it unnecessarily. Context retrieval should be deterministic, not left to LLM judgment.

Instead, use a custom LLM proxy that sits between ElevenLabs and your LLM provider (e.g., OpenAI). This proxy:

  1. Intercepts every request from ElevenLabs before it reaches the LLM
  2. Persists the conversation by saving messages to Zep (user messages, assistant responses)
  3. Retrieves relevant context from Zep’s knowledge graph for the current user
  4. Injects that context into the system prompt before forwarding to the LLM
  5. Streams the response back to ElevenLabs

This ensures context operations happen on every turn, deterministically, with minimal added latency (Zep retrieval typically adds 50-150ms).
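
If you want to sanity-check that latency figure against your own data, here is a minimal timing sketch (it assumes ZEP_API_KEY is set and that a thread such as elevenlabs_user_123 already exists in your Zep project):

import asyncio
import os
import time

from zep_cloud.client import AsyncZep

async def main():
    zep = AsyncZep(api_key=os.environ["ZEP_API_KEY"])
    start = time.perf_counter()
    # The same retrieval call the proxy makes on every turn
    await zep.thread.get_user_context(thread_id="elevenlabs_user_123")
    print(f"Zep retrieval took {(time.perf_counter() - start) * 1000:.0f} ms")

asyncio.run(main())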

Architecture overview

ElevenLabs Agent → Custom LLM Proxy → OpenAI (or other LLM)
                          ↕
                         Zep
                (persist + retrieve)

The proxy exposes an OpenAI-compatible /v1/chat/completions endpoint that ElevenLabs connects to as a Custom LLM.
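
For reference, the proxy expects each inbound request to be a standard chat completions payload with the custom fields nested under extra_body, which the extract_user_id helper below reads. A hypothetical example of the body (all values are placeholders):

# Hypothetical request body, as the proxy expects to receive it from ElevenLabs
example_body = {
    "model": "gpt-4o-mini",
    "stream": True,
    "messages": [
        {"role": "system", "content": "You are a helpful voice assistant."},
        {"role": "user", "content": "Hi, do you remember me?"},
    ],
    "extra_body": {"user_id": "user_123"},  # populated via custom_llm_extra_body
}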

Prerequisites

  • A Zep Cloud account with an API key
  • An OpenAI API key (or another LLM provider)
  • An ElevenLabs account with access to the Agents Platform
  • Python 3.10+ for the proxy server

Environment setup

Set your API keys as environment variables:

export ZEP_API_KEY="your_zep_api_key"
export OPENAI_API_KEY="your_openai_api_key"
export PROXY_API_KEY="your_custom_proxy_api_key"  # for authenticating requests to your proxy

Building the proxy

Step 1: Install dependencies

pip install fastapi uvicorn openai zep-cloud

Step 2: Create the proxy server

The proxy intercepts chat completion requests, retrieves context from Zep, and forwards enriched requests to OpenAI.

import os
import json
from typing import AsyncGenerator

from fastapi import FastAPI, Request, HTTPException, Header
from fastapi.responses import StreamingResponse
from openai import AsyncOpenAI
from zep_cloud import Message
from zep_cloud.client import AsyncZep

app = FastAPI()

# Initialize clients
zep = AsyncZep(api_key=os.environ["ZEP_API_KEY"])
openai_client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])

# Proxy authentication
PROXY_API_KEY = os.environ["PROXY_API_KEY"]


def extract_user_id(request_body: dict) -> str | None:
    """Extract user_id from ElevenLabs custom_llm_extra_body."""
    extra_body = request_body.get("extra_body", {})
    return extra_body.get("user_id")


async def get_zep_context(user_id: str, thread_id: str) -> str:
    """Retrieve relevant context from Zep for the current user."""
    try:
        context = await zep.thread.get_user_context(thread_id=thread_id)
        return context.context or ""
    except Exception:
        return ""


async def ensure_user_and_thread(user_id: str, thread_id: str):
    """Create user and thread if they don't exist."""
    try:
        await zep.user.add(user_id=user_id)
    except Exception:
        pass  # User already exists

    try:
        await zep.thread.create(thread_id=thread_id, user_id=user_id)
    except Exception:
        pass  # Thread already exists


async def save_messages_to_zep(thread_id: str, user_message: str, assistant_message: str):
    """Persist the conversation turn to Zep."""
    messages = [
        Message(role="user", content=user_message),
        Message(role="assistant", content=assistant_message),
    ]
    await zep.thread.add_messages(thread_id=thread_id, messages=messages)
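
These helpers are deliberately idempotent, so the endpoint can call them on every turn. A quick standalone check, assuming it is run in the same module as the code above (the IDs and messages are placeholders):

import asyncio

async def demo():
    # Safe to call repeatedly; "already exists" errors are swallowed
    await ensure_user_and_thread("user_123", "elevenlabs_user_123")
    await save_messages_to_zep(
        "elevenlabs_user_123",
        "I'm planning a trip to Japan next month.",
        "That sounds exciting! What cities are you thinking of visiting?",
    )

asyncio.run(demo())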

Step 3: Implement the chat completions endpoint

@app.post("/v1/chat/completions")
async def chat_completions(
    request: Request,
    authorization: str = Header(None),
):
    # Validate the proxy API key
    if not authorization or not authorization.startswith("Bearer "):
        raise HTTPException(status_code=401, detail="Missing or invalid authorization")

    provided_key = authorization.replace("Bearer ", "")
    if provided_key != PROXY_API_KEY:
        raise HTTPException(status_code=401, detail="Invalid API key")

    body = await request.json()

    # Extract user identity from ElevenLabs extra_body
    user_id = extract_user_id(body)
    if not user_id:
        raise HTTPException(status_code=400, detail="user_id required in extra_body")

    # One Zep thread per user, so context carries across sessions
    thread_id = f"elevenlabs_{user_id}"

    # Ensure user and thread exist in Zep
    await ensure_user_and_thread(user_id, thread_id)

    # Get relevant context from Zep
    zep_context = await get_zep_context(user_id, thread_id)

    # Inject context into the system prompt (only when there is context to add)
    messages = body.get("messages", [])
    if zep_context and messages and messages[0].get("role") == "system":
        original_system = messages[0].get("content", "")
        messages[0]["content"] = f"{original_system}\n\n## Relevant Context\n{zep_context}"
    elif zep_context:
        messages.insert(0, {
            "role": "system",
            "content": f"## Relevant Context\n{zep_context}",
        })

    # Extract the latest user message for persistence
    latest_user_message = ""
    for msg in reversed(messages):
        if msg.get("role") == "user":
            latest_user_message = msg.get("content", "")
            break

    # Forward to OpenAI, streaming by default
    model = body.get("model", "gpt-4o-mini")
    stream = body.get("stream", True)

    if stream:
        return StreamingResponse(
            stream_response(messages, model, thread_id, latest_user_message),
            media_type="text/event-stream",
        )
    else:
        response = await openai_client.chat.completions.create(
            model=model,
            messages=messages,
        )

        # Save the conversation turn to Zep
        assistant_message = response.choices[0].message.content or ""
        await save_messages_to_zep(thread_id, latest_user_message, assistant_message)

        return response.model_dump()

Step 4: Implement streaming with message persistence

async def stream_response(
    messages: list,
    model: str,
    thread_id: str,
    user_message: str,
) -> AsyncGenerator[str, None]:
    """Stream the response and persist the complete message to Zep."""
    full_response = ""

    stream = await openai_client.chat.completions.create(
        model=model,
        messages=messages,
        stream=True,
    )

    async for chunk in stream:
        # Accumulate the response (some chunks carry no choices, e.g. usage-only)
        if chunk.choices and chunk.choices[0].delta.content:
            full_response += chunk.choices[0].delta.content

        # Forward the chunk to ElevenLabs as a server-sent event
        yield f"data: {json.dumps(chunk.model_dump())}\n\n"

    yield "data: [DONE]\n\n"

    # Save the complete conversation turn to Zep; this runs after the final
    # chunk has been yielded, so persistence never delays the stream
    if user_message and full_response:
        await save_messages_to_zep(thread_id, user_message, full_response)

Step 5: Run the proxy server

if __name__ == "__main__":
    import uvicorn

    uvicorn.run(app, host="0.0.0.0", port=8000)

Save this as proxy.py and run:

python proxy.py
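
Once the server is up, you can smoke-test it with the OpenAI SDK pointed at the proxy. A sketch, assuming the proxy runs on localhost:8000 and user_123 is a placeholder ID:

import os

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key=os.environ["PROXY_API_KEY"],  # the proxy's key, not OpenAI's
)

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Hi, do you remember me?"}],
    stream=False,
    # openai-python merges extra_body into the top level of the request JSON,
    # so nest it once more to produce the {"extra_body": {...}} shape that
    # extract_user_id reads.
    extra_body={"extra_body": {"user_id": "user_123"}},
)
print(response.choices[0].message.content)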

Configuring ElevenLabs

Step 1: Navigate to agent settings

In the ElevenLabs dashboard, open your agent and go to its settings.

Step 2: Configure custom LLM

  1. In the LLM section, select Custom LLM as the provider
  2. Enter your proxy URL: https://your-proxy.example.com/v1/chat/completions
  3. Add the authorization header with your proxy API key
  4. Enable Custom LLM extra body to allow passing user identity

Step 3: Pass user identity

When embedding the ElevenLabs widget or using the SDK, pass the user_id in the custom LLM extra body:

// JavaScript SDK example
const conversation = await ElevenLabs.Conversation.startSession({
  agentId: "your-agent-id",
  customLlmExtraBody: {
    user_id: "user_123"  // Your application's user identifier
  }
});

For the widget embed:

<elevenlabs-convai
  agent-id="your-agent-id"
  custom-llm-extra-body='{"user_id": "user_123"}'>
</elevenlabs-convai>

Deployment considerations

Local development

For testing, you can expose your local proxy using ngrok:

ngrok http 8000

Use the ngrok URL as your custom LLM endpoint in ElevenLabs.

Production deployment

For production, deploy the proxy as a containerized service with proper infrastructure (a minimal health-check sketch follows this list):

  • Container orchestration: Docker/Kubernetes behind a load balancer
  • Cloud platforms: AWS ECS/EKS, Google Cloud Run, Azure Container Apps
  • API gateway: Add authentication, rate limiting, and monitoring
  • TLS termination: Handle at the load balancer or API gateway level
  • Stable domain: Use a consistent URL, not ephemeral tunnels
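
Most of these setups also expect a liveness endpoint. A minimal sketch you could add to proxy.py (the /health route name is a common convention, not anything ElevenLabs requires):

@app.get("/health")
async def health():
    # Lightweight liveness probe for load balancers and orchestrators
    return {"status": "ok"}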

Security requirements: The proxy handles sensitive API keys and user data. Always use HTTPS in production, validate all inputs, and never expose internal API keys (OpenAI, Zep) to clients.
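
As one concrete example of input validation, you might constrain user_id to a conservative character set before deriving the thread ID. A hypothetical guard (adjust the pattern to your own ID scheme):

import re

def is_valid_user_id(user_id: str) -> bool:
    # Reject IDs that could smuggle unexpected characters into
    # thread names, logs, or downstream queries
    return bool(re.fullmatch(r"[A-Za-z0-9_-]{1,64}", user_id))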

How Zep enriches conversations

With this integration, your ElevenLabs agent gains:

Conversation continuity — The agent recalls previous conversations with the same user, even across separate voice sessions.

Personalization — Zep extracts facts and entities from conversations. The agent can reference the user’s preferences, past topics, and stated information naturally.

Deterministic retrieval — Unlike tool-based approaches, every conversation turn retrieves relevant context. The agent always has access to the right context.

Example conversation flow

Turn 1 (Day 1):
User: "I'm planning a trip to Japan next month."
Agent: "That sounds exciting! What cities are you thinking of visiting?"

Turn 2 (Day 3):
User: "Hey, can you help me with some travel questions?"
Agent: "Of course! Are these questions about your upcoming Japan trip?"
       ↑ Agent uses context from Zep
