ElevenLabs Agents

Add persistent context to ElevenLabs Agents using a custom LLM proxy.

ElevenLabs Agents is a platform for building intelligent voice and chat agents that can talk, type, and take action. You can create agents directly in the browser using the ElevenLabs dashboard, then deploy them across phone, web, and mobile applications.

This guide shows how to integrate Zep with ElevenLabs Agents using a custom LLM proxy pattern.

Why not use tools for context retrieval

ElevenLabs Agents support custom tools, but implementing Zep retrieval as a tool call is not recommended for two reasons:

Latency — Tool calls add a round-trip where the LLM must decide to call the tool, execute it, then continue generation. For voice agents where responsiveness is critical, this latency is unacceptable.

Unreliable retrieval — The LLM decides when to call tools. It may forget, skip retrieval when it shouldn’t, or call it unnecessarily. Context retrieval should be deterministic, not left to LLM judgment.

Instead, use a custom LLM proxy that sits between ElevenLabs and your LLM provider (e.g., OpenAI). This proxy:

  1. Intercepts every request from ElevenLabs before it reaches the LLM
  2. Persists the conversation by saving messages to Zep (user messages, assistant responses)
  3. Retrieves relevant context from Zep’s knowledge graph for the current user
  4. Injects that context into the system prompt before forwarding to the LLM
  5. Streams the response back to ElevenLabs

This ensures context operations happen on every turn, deterministically, with minimal added latency (Zep retrieval typically adds 50-150ms).
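
If you want to sanity-check that latency figure against your own data, here is a minimal timing sketch (it assumes ZEP_API_KEY is set and that a thread such as elevenlabs_user_123 already exists in your Zep project):

import asyncio
import os
import time

from zep_cloud.client import AsyncZep

async def main():
    zep = AsyncZep(api_key=os.environ["ZEP_API_KEY"])
    start = time.perf_counter()
    # The same retrieval call the proxy makes on every turn
    await zep.thread.get_user_context(thread_id="elevenlabs_user_123")
    print(f"Zep retrieval took {(time.perf_counter() - start) * 1000:.0f} ms")

asyncio.run(main())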

Architecture overview

ElevenLabs Agent → Custom LLM Proxy → OpenAI (or other LLM)
                          ↕
                         Zep
                (persist + retrieve)

The proxy exposes an OpenAI-compatible /v1/chat/completions endpoint that ElevenLabs connects to as a Custom LLM.
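
For reference, the proxy expects each inbound request to be a standard chat completions payload with the custom fields nested under extra_body, which the extract_user_id helper below reads. A hypothetical example of the body (all values are placeholders):

# Hypothetical request body, as the proxy expects to receive it from ElevenLabs
example_body = {
    "model": "gpt-4o-mini",
    "stream": True,
    "messages": [
        {"role": "system", "content": "You are a helpful voice assistant."},
        {"role": "user", "content": "Hi, do you remember me?"},
    ],
    "extra_body": {"user_id": "user_123"},  # populated via custom_llm_extra_body
}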

Prerequisites

  • A Zep Cloud account with an API key
  • An OpenAI API key (or another LLM provider)
  • An ElevenLabs account with access to the Agents Platform
  • Python 3.10+ for the proxy server

Environment setup

Set your API keys as environment variables:

export ZEP_API_KEY="your_zep_api_key"
export OPENAI_API_KEY="your_openai_api_key"
export PROXY_API_KEY="your_custom_proxy_api_key"  # for authenticating requests to your proxy

Building the proxy

Step 1: Install dependencies

pip install fastapi uvicorn openai zep-cloud

Step 2: Create the proxy server

The proxy intercepts chat completion requests, retrieves context from Zep, and forwards enriched requests to OpenAI.

import os
import json
from typing import AsyncGenerator

from fastapi import FastAPI, Request, HTTPException, Header
from fastapi.responses import StreamingResponse
from openai import AsyncOpenAI
from zep_cloud import Message
from zep_cloud.client import AsyncZep

app = FastAPI()

# Initialize clients
zep = AsyncZep(api_key=os.environ["ZEP_API_KEY"])
openai_client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])

# Proxy authentication
PROXY_API_KEY = os.environ["PROXY_API_KEY"]


def extract_user_id(request_body: dict) -> str | None:
    """Extract user_id from ElevenLabs custom_llm_extra_body."""
    extra_body = request_body.get("extra_body", {})
    return extra_body.get("user_id")


async def get_zep_context(user_id: str, thread_id: str) -> str:
    """Retrieve relevant context from Zep for the current user."""
    try:
        context = await zep.thread.get_user_context(thread_id=thread_id)
        return context.context or ""
    except Exception:
        return ""


async def ensure_user_and_thread(user_id: str, thread_id: str):
    """Create user and thread if they don't exist."""
    try:
        await zep.user.add(user_id=user_id)
    except Exception:
        pass  # User already exists

    try:
        await zep.thread.create(thread_id=thread_id, user_id=user_id)
    except Exception:
        pass  # Thread already exists


async def save_messages_to_zep(thread_id: str, user_message: str, assistant_message: str):
    """Persist the conversation turn to Zep."""
    messages = [
        Message(role="user", content=user_message),
        Message(role="assistant", content=assistant_message),
    ]
    await zep.thread.add_messages(thread_id=thread_id, messages=messages)
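
These helpers are deliberately idempotent, so the endpoint can call them on every turn. A quick standalone check, assuming it is run in the same module as the code above (the IDs and messages are placeholders):

import asyncio

async def demo():
    # Safe to call repeatedly; "already exists" errors are swallowed
    await ensure_user_and_thread("user_123", "elevenlabs_user_123")
    await save_messages_to_zep(
        "elevenlabs_user_123",
        "I'm planning a trip to Japan next month.",
        "That sounds exciting! What cities are you thinking of visiting?",
    )

asyncio.run(demo())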

Step 3: Implement the chat completions endpoint

@app.post("/v1/chat/completions")
async def chat_completions(
    request: Request,
    authorization: str = Header(None),
):
    # Validate the proxy API key
    if not authorization or not authorization.startswith("Bearer "):
        raise HTTPException(status_code=401, detail="Missing or invalid authorization")

    provided_key = authorization.replace("Bearer ", "")
    if provided_key != PROXY_API_KEY:
        raise HTTPException(status_code=401, detail="Invalid API key")

    body = await request.json()

    # Extract user identity from ElevenLabs extra_body
    user_id = extract_user_id(body)
    if not user_id:
        raise HTTPException(status_code=400, detail="user_id required in extra_body")

    # One Zep thread per user, so context carries across sessions
    thread_id = f"elevenlabs_{user_id}"

    # Ensure user and thread exist in Zep
    await ensure_user_and_thread(user_id, thread_id)

    # Get relevant context from Zep
    zep_context = await get_zep_context(user_id, thread_id)

    # Inject context into the system prompt (only when there is context to add)
    messages = body.get("messages", [])
    if zep_context and messages and messages[0].get("role") == "system":
        original_system = messages[0].get("content", "")
        messages[0]["content"] = f"{original_system}\n\n## Relevant Context\n{zep_context}"
    elif zep_context:
        messages.insert(0, {
            "role": "system",
            "content": f"## Relevant Context\n{zep_context}",
        })

    # Extract the latest user message for persistence
    latest_user_message = ""
    for msg in reversed(messages):
        if msg.get("role") == "user":
            latest_user_message = msg.get("content", "")
            break

    # Forward to OpenAI, streaming by default
    model = body.get("model", "gpt-4o-mini")
    stream = body.get("stream", True)

    if stream:
        return StreamingResponse(
            stream_response(messages, model, thread_id, latest_user_message),
            media_type="text/event-stream",
        )
    else:
        response = await openai_client.chat.completions.create(
            model=model,
            messages=messages,
        )

        # Save the conversation turn to Zep
        assistant_message = response.choices[0].message.content or ""
        await save_messages_to_zep(thread_id, latest_user_message, assistant_message)

        return response.model_dump()

Step 4: Implement streaming with message persistence

async def stream_response(
    messages: list,
    model: str,
    thread_id: str,
    user_message: str,
) -> AsyncGenerator[str, None]:
    """Stream the response and persist the complete message to Zep."""
    full_response = ""

    stream = await openai_client.chat.completions.create(
        model=model,
        messages=messages,
        stream=True,
    )

    async for chunk in stream:
        # Accumulate the response (some chunks carry no choices, e.g. usage-only)
        if chunk.choices and chunk.choices[0].delta.content:
            full_response += chunk.choices[0].delta.content

        # Forward the chunk to ElevenLabs as a server-sent event
        yield f"data: {json.dumps(chunk.model_dump())}\n\n"

    yield "data: [DONE]\n\n"

    # Save the complete conversation turn to Zep; this runs after the final
    # chunk has been yielded, so persistence never delays the stream
    if user_message and full_response:
        await save_messages_to_zep(thread_id, user_message, full_response)

Step 5: Run the proxy server

if __name__ == "__main__":
    import uvicorn

    uvicorn.run(app, host="0.0.0.0", port=8000)

Save this as proxy.py and run:

python proxy.py
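
Once the server is up, you can smoke-test it with the OpenAI SDK pointed at the proxy. A sketch, assuming the proxy runs on localhost:8000 and user_123 is a placeholder ID:

import os

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key=os.environ["PROXY_API_KEY"],  # the proxy's key, not OpenAI's
)

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Hi, do you remember me?"}],
    stream=False,
    # openai-python merges extra_body into the top level of the request JSON,
    # so nest it once more to produce the {"extra_body": {...}} shape that
    # extract_user_id reads.
    extra_body={"extra_body": {"user_id": "user_123"}},
)
print(response.choices[0].message.content)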

Configuring ElevenLabs

Step 1: Navigate to agent settings

In the ElevenLabs dashboard, open your agent and go to its settings.

Step 2: Configure custom LLM

  1. In the LLM section, select Custom LLM as the provider
  2. Enter your proxy URL: https://your-proxy.example.com/v1/chat/completions
  3. Add the authorization header with your proxy API key
  4. Enable Custom LLM extra body to allow passing user identity

Step 3: Pass user identity

When embedding the ElevenLabs widget or using the SDK, pass the user_id in the custom LLM extra body:

// JavaScript SDK example
const conversation = await ElevenLabs.Conversation.startSession({
  agentId: "your-agent-id",
  customLlmExtraBody: {
    user_id: "user_123"  // Your application's user identifier
  }
});

For the widget embed:

<elevenlabs-convai
  agent-id="your-agent-id"
  custom-llm-extra-body='{"user_id": "user_123"}'>
</elevenlabs-convai>

Deployment considerations

Local development

For testing, you can expose your local proxy using ngrok:

ngrok http 8000

Use the ngrok URL as your custom LLM endpoint in ElevenLabs.

Production deployment

For production, deploy the proxy as a containerized service with proper infrastructure (a minimal health-check sketch follows this list):

  • Container orchestration: Docker/Kubernetes behind a load balancer
  • Cloud platforms: AWS ECS/EKS, Google Cloud Run, Azure Container Apps
  • API gateway: Add authentication, rate limiting, and monitoring
  • TLS termination: Handle at the load balancer or API gateway level
  • Stable domain: Use a consistent URL, not ephemeral tunnels
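
Most of these setups also expect a liveness endpoint. A minimal sketch you could add to proxy.py (the /health route name is a common convention, not anything ElevenLabs requires):

@app.get("/health")
async def health():
    # Lightweight liveness probe for load balancers and orchestrators
    return {"status": "ok"}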

Security requirements: The proxy handles sensitive API keys and user data. Always use HTTPS in production, validate all inputs, and never expose internal API keys (OpenAI, Zep) to clients.
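
As one concrete example of input validation, you might constrain user_id to a conservative character set before deriving the thread ID. A hypothetical guard (adjust the pattern to your own ID scheme):

import re

def is_valid_user_id(user_id: str) -> bool:
    # Reject IDs that could smuggle unexpected characters into
    # thread names, logs, or downstream queries
    return bool(re.fullmatch(r"[A-Za-z0-9_-]{1,64}", user_id))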

How Zep enriches conversations

With this integration, your ElevenLabs agent gains:

Conversation continuity — The agent recalls previous conversations with the same user, even across separate voice sessions.

Personalization — Zep extracts facts and entities from conversations. The agent can reference the user’s preferences, past topics, and stated information naturally.

Deterministic retrieval — Unlike tool-based approaches, every conversation turn retrieves relevant context. The agent always has access to the right context.

Example conversation flow

Turn 1 (Day 1):
User: "I'm planning a trip to Japan next month."
Agent: "That sounds exciting! What cities are you thinking of visiting?"

Turn 2 (Day 3):
User: "Hey, can you help me with some travel questions?"
Agent: "Of course! Are these questions about your upcoming Japan trip?"
       ↑ Agent uses context from Zep
