ElevenLabs Agents

Add persistent context to ElevenLabs voice agents using a custom LLM proxy.

A complete working example is available on GitHub: elevenlabs-zep-example

ElevenLabs Agents is a platform for building intelligent voice agents. This guide shows how to integrate Zep with ElevenLabs using a custom LLM proxy.

Why use a proxy instead of tools

ElevenLabs supports custom tools, but using tools for context retrieval has problems:

  • Latency — Each tool call adds a round-trip in which the LLM first decides whether to invoke the tool. For voice agents, this delay is noticeable.
  • Unreliability — The LLM may skip retrieval when it shouldn’t, or call it unnecessarily.

A proxy solves both problems. Context retrieval happens transparently on every request, without LLM involvement.

Architecture

```
┌──────────┐       ┌────────────┐       ┌───────────┐       ┌────────┐
│ Frontend │ ◄───► │ ElevenLabs │ ◄───► │ LLM Proxy │ ◄───► │ OpenAI │
└──────────┘       └────────────┘       └───────────┘       └────────┘
                                              ▲
                                              │
                                              ▼
                                           ┌─────┐
                                           │ Zep │
                                           └─────┘
```

The proxy sits between ElevenLabs and your LLM. On every request it:

  1. Adds the user message to Zep and retrieves context in one call
  2. Injects context into the system prompt
  3. Forwards to the LLM and streams the response back
  4. Persists the assistant response to Zep

Implementation

The proxy endpoint

The proxy exposes an OpenAI-compatible /v1/chat/completions endpoint:

```python
import os

from fastapi import FastAPI, Request
from fastapi.responses import StreamingResponse
from zep_cloud.client import AsyncZep
from zep_cloud.types import Message

app = FastAPI()
zep = AsyncZep(api_key=os.environ["ZEP_API_KEY"])


@app.post("/v1/chat/completions")
async def chat_completions(request: Request):
    body = await request.json()

    # ElevenLabs puts customLlmExtraBody in "elevenlabs_extra_body"
    extra = body.get("elevenlabs_extra_body", {})
    user_id = extra.get("user_id")
    conversation_id = extra.get("conversation_id")

    # Add the user message to Zep and get context back in one call
    user_message = get_latest_user_message(body["messages"])
    response = await zep.thread.add_messages(
        thread_id=conversation_id,
        messages=[Message(role="user", content=user_message)],
        return_context=True,  # returns context without a separate call
    )

    # Inject context into the system prompt
    messages = inject_context(body["messages"], response.context)

    # Stream the LLM response back to ElevenLabs
    return StreamingResponse(
        stream_and_persist(messages, conversation_id),
        media_type="text/event-stream",
    )
```

The key optimization is return_context=True, which retrieves context in the same call as adding the message.
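The endpoint references three helpers that the example repo implements. Below is a minimal sketch of each, assuming OpenAI's async SDK and OpenAI-style message dicts; the model name is illustrative, and the SSE framing follows OpenAI's streaming wire format, which is what an OpenAI-compatible endpoint is expected to emit:

```python
from openai import AsyncOpenAI

openai_client = AsyncOpenAI()  # reads OPENAI_API_KEY from the environment


def get_latest_user_message(messages: list[dict]) -> str:
    """Return the content of the most recent user message."""
    return next(
        (m["content"] for m in reversed(messages) if m["role"] == "user"), ""
    )


def inject_context(messages: list[dict], context: str | None) -> list[dict]:
    """Merge Zep's context block into the system prompt."""
    if not context:
        return messages
    messages = list(messages)
    if messages and messages[0]["role"] == "system":
        messages[0] = {
            "role": "system",
            "content": messages[0]["content"] + "\n\n" + context,
        }
    else:
        messages.insert(0, {"role": "system", "content": context})
    return messages


async def stream_and_persist(messages: list[dict], thread_id: str):
    """Stream LLM chunks back to ElevenLabs, then persist the reply to Zep."""
    parts: list[str] = []
    stream = await openai_client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative; use your agent's model
        messages=messages,
        stream=True,
    )
    async for chunk in stream:
        if chunk.choices and chunk.choices[0].delta.content:
            parts.append(chunk.choices[0].delta.content)
        # Re-emit each chunk in OpenAI's SSE wire format
        yield f"data: {chunk.model_dump_json()}\n\n"
    yield "data: [DONE]\n\n"

    # Persist the assistant turn so the next request sees full history
    await zep.thread.add_messages(
        thread_id=thread_id,
        messages=[Message(role="assistant", content="".join(parts))],
    )
```

Because persistence happens after the final chunk is yielded, it adds no latency to the spoken response. Depending on how you create users and threads, you may also need to call zep.thread.create(thread_id=..., user_id=...) the first time a conversation_id appears; the sketch above assumes the thread already exists.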

Frontend integration

Your frontend passes user identity via customLlmExtraBody:

```javascript
await conversation.startSession({
  agentId: 'your-agent-id',
  customLlmExtraBody: {
    user_id: user.id,
    conversation_id: crypto.randomUUID(),
  },
});
```

ElevenLabs configuration

  1. In your agent’s LLM section, select Custom LLM and set the server URL to your proxy
  2. Add an Authorization header for authentication
  3. In Security > Overrides, enable Custom LLM extra body (required for the proxy to receive user identity)
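Before pointing the agent at the proxy, you can sanity-check it by hand with a request shaped like the ones ElevenLabs sends. Here is a sketch using httpx, assuming the proxy runs locally on port 8000 (the IDs are placeholders):

```python
import httpx

# Assumes the proxy runs locally on port 8000; the IDs are placeholders.
with httpx.stream(
    "POST",
    "http://localhost:8000/v1/chat/completions",
    json={
        "messages": [
            {"role": "system", "content": "You are a helpful voice agent."},
            {"role": "user", "content": "Hi, do you remember my last order?"},
        ],
        "elevenlabs_extra_body": {
            "user_id": "user-123",
            "conversation_id": "conv-456",
        },
    },
    timeout=30.0,
) as resp:
    for line in resp.iter_lines():
        print(line)  # raw SSE lines, ending with "data: [DONE]"
```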

Production considerations

  • User identity — Use your auth system’s user ID, not random IDs
  • User metadata — Create users in Zep during registration with first_name, last_name, and email for better personalization (see the sketch after this list)
  • Cache warming — Call zep.user.warm(user_id) when users arrive on your page to pre-fetch their data
  • Proxy location — Embed the endpoint in your existing backend for direct access to user data, or deploy as a standalone service
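
As a sketch of that registration step, using the zep-cloud SDK's user.add (the field values here are illustrative):

```python
# Run once at registration so later threads attach to a known user.
# The field values are illustrative; pull them from your auth system.
await zep.user.add(
    user_id=user.id,  # the same ID the frontend sends in customLlmExtraBody
    first_name="Jane",
    last_name="Doe",
    email="jane@example.com",
)
```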

Learn more