ElevenLabs Agents
ElevenLabs Agents is a platform for building intelligent voice and chat agents that can talk, type, and take action. You can create agents directly in the browser using the ElevenLabs dashboard, then deploy them across phone, web, and mobile applications.
This guide shows how to integrate Zep with ElevenLabs Agents using a custom LLM proxy pattern.
Why not use tools for context retrieval
ElevenLabs Agents support custom tools, but implementing Zep retrieval as a tool call is not recommended for two reasons:
Latency — Tool calls add a round trip: the LLM must decide to call the tool, wait for it to execute, and then continue generating. For voice agents, where responsiveness is critical, this added latency is unacceptable.
Unreliable retrieval — The LLM decides when to call tools. It may forget, skip retrieval when it shouldn’t, or call it unnecessarily. Context retrieval should be deterministic, not left to LLM judgment.
The recommended approach: custom LLM proxy
Instead, use a custom LLM proxy that sits between ElevenLabs and your LLM provider (e.g., OpenAI). This proxy:
- Intercepts every request from ElevenLabs before it reaches the LLM
- Persists the conversation by saving messages to Zep (user messages, assistant responses)
- Retrieves relevant context from Zep’s knowledge graph for the current user
- Injects that context into the system prompt before forwarding to the LLM
- Streams the response back to ElevenLabs
This ensures context operations happen on every turn, deterministically, with minimal added latency (Zep retrieval typically adds 50-150ms).
Architecture overview
The proxy exposes an OpenAI-compatible /v1/chat/completions endpoint that ElevenLabs connects to as a Custom LLM.
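Illustratively, a single turn flows like this:

```
ElevenLabs Agent
      |  POST /v1/chat/completions (OpenAI-compatible)
      v
   Proxy ---> Zep: save user message, fetch user context
      |
      v  system prompt + injected context
   LLM provider (e.g., OpenAI)
      |
      v  streamed completion
   Proxy ---> Zep: save assistant reply
      |
      v
ElevenLabs Agent (speaks the response)
```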
Prerequisites
- A Zep Cloud account with an API key
- An OpenAI API key (or another LLM provider)
- An ElevenLabs account with access to the Agents Platform
- Python 3.10+ for the proxy server
Environment setup
Set your API keys as environment variables:
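For example, in your shell (ZEP_API_KEY and OPENAI_API_KEY are the names the proxy sketch below reads):

```bash
export ZEP_API_KEY="your-zep-api-key"
export OPENAI_API_KEY="your-openai-api-key"
```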
Building the proxy
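Below is a minimal sketch of the proxy using FastAPI together with the openai and zep-cloud Python SDKs. Treat it as a starting point, not a definitive implementation: the Zep calls assume the zep-cloud v2-style memory API (memory.add / memory.get with a context string), it assumes the Zep user and session already exist, and mapping the request's `user` field to a Zep session is an assumption; adapt all of these to your SDK version and to how your ElevenLabs agent identifies users. Install the dependencies first:

```bash
pip install fastapi uvicorn openai zep-cloud
```

```python
import os

from fastapi import FastAPI, Request
from fastapi.responses import StreamingResponse
from openai import AsyncOpenAI
from zep_cloud.client import AsyncZep
from zep_cloud.types import Message

app = FastAPI()
openai_client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])
zep = AsyncZep(api_key=os.environ["ZEP_API_KEY"])


@app.post("/v1/chat/completions")
async def chat_completions(request: Request):
    body = await request.json()
    messages = body["messages"]
    # Assumption: a per-conversation identifier arrives in the OpenAI-style
    # `user` field; map whatever ElevenLabs actually sends to a Zep session.
    # Assumes the Zep user/session were created when the conversation began.
    session_id = body.get("user") or "default-session"

    # 1. Persist the latest user message to Zep.
    last = messages[-1] if messages else None
    if last and last.get("role") == "user":
        await zep.memory.add(
            session_id=session_id,
            messages=[Message(role_type="user", content=last["content"])],
        )

    # 2. Retrieve Zep's memory context block for this session.
    memory = await zep.memory.get(session_id=session_id)
    context = memory.context or ""

    # 3. Inject the context into the agent's system prompt (or create one).
    llm_messages = list(messages)
    if llm_messages and llm_messages[0].get("role") == "system":
        llm_messages[0] = {
            "role": "system",
            "content": llm_messages[0]["content"]
            + "\n\nRelevant user context:\n"
            + context,
        }
    else:
        llm_messages.insert(
            0, {"role": "system", "content": "Relevant user context:\n" + context}
        )

    # 4. Stream the completion back in OpenAI's SSE wire format, then
    #    persist the assistant's full reply to Zep once the stream ends.
    async def stream():
        collected: list[str] = []
        response = await openai_client.chat.completions.create(
            model=body.get("model", "gpt-4o-mini"),
            messages=llm_messages,
            stream=True,
        )
        async for chunk in response:
            delta = chunk.choices[0].delta.content if chunk.choices else None
            if delta:
                collected.append(delta)
            yield f"data: {chunk.model_dump_json()}\n\n"
        yield "data: [DONE]\n\n"
        if collected:
            await zep.memory.add(
                session_id=session_id,
                messages=[Message(role_type="assistant", content="".join(collected))],
            )

    return StreamingResponse(stream(), media_type="text/event-stream")
```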
Configuring ElevenLabs
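In the ElevenLabs dashboard, open your agent's LLM settings and select the custom LLM option, then enter your proxy's publicly reachable URL so that requests hit the /v1/chat/completions endpoint. The exact field names may vary as the dashboard evolves; if it offers an API key field for the custom LLM, set one and validate it in the proxy (see the authentication sketch under Deployment considerations).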
Deployment considerations
Local development
For testing, you can expose your local proxy using ngrok:
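Assuming the proxy module above is saved as proxy.py:

```bash
# Serve the proxy locally
uvicorn proxy:app --port 8000

# In a second terminal, open a public tunnel to it
ngrok http 8000
```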
Use the ngrok URL as your custom LLM endpoint in ElevenLabs.
Production deployment
For production, deploy the proxy as a containerized service with proper infrastructure:
- Container orchestration: Docker/Kubernetes behind a load balancer
- Cloud platforms: AWS ECS/EKS, Google Cloud Run, Azure Container Apps
- API gateway: Add authentication, rate limiting, and monitoring
- TLS termination: Handle at the load balancer or API gateway level
- Stable domain: Use a consistent URL, not ephemeral tunnels
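As a concrete starting point, a minimal container image for the proxy sketch above might look like this (assuming a requirements.txt listing fastapi, uvicorn, openai, and zep-cloud):

```dockerfile
FROM python:3.12-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY proxy.py .
# Bind to all interfaces so the container is reachable from the load balancer
CMD ["uvicorn", "proxy:app", "--host", "0.0.0.0", "--port", "8000"]
```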
Security requirements: The proxy handles sensitive API keys and user data. Always use HTTPS in production, validate all inputs, and never expose internal API keys (OpenAI, Zep) to clients.
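For instance, if ElevenLabs is configured to send an API key with each request, the proxy can reject everything else. This is a sketch assuming a bearer token in the Authorization header; PROXY_API_KEY is a hypothetical shared secret you issue to ElevenLabs, distinct from your OpenAI and Zep keys:

```python
import os
import secrets

from fastapi import Depends, Header, HTTPException

PROXY_API_KEY = os.environ["PROXY_API_KEY"]  # hypothetical shared secret


async def verify_key(authorization: str = Header(default="")) -> None:
    # Constant-time comparison avoids leaking key material via timing.
    if not secrets.compare_digest(authorization, f"Bearer {PROXY_API_KEY}"):
        raise HTTPException(status_code=401, detail="Invalid API key")


# Attach it to the route in the proxy:
# @app.post("/v1/chat/completions", dependencies=[Depends(verify_key)])
```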
How Zep enriches conversations
With this integration, your ElevenLabs agent gains:
Conversation continuity — The agent recalls previous conversations with the same user, even across separate voice sessions.
Personalization — Zep extracts facts and entities from conversations. The agent can reference the user’s preferences, past topics, and stated information naturally.
Deterministic retrieval — Unlike tool-based approaches, retrieval runs on every conversation turn, so the agent never responds without the relevant context.