Bring Your Own LLM (BYOM)
Enterprise Add-on. Contact sales to enable BYOM for your account.
Bring Your Own LLM (BYOM) lets you use your own accounts with model providers such as OpenAI, Anthropic, and Google when using Zep Cloud. You keep using Zep’s agent memory, context assembly, and governance controls, routing inference through credentials you manage. This approach ensures:
Zep intentionally turns thinking and reasoning budgets off, or sets them low, to minimize cost and latency. We recommend smaller, faster models optimized for speed rather than for extended reasoning.
Recommended model: Gemini 2.5 Flash Lite is the most well-tested model with Zep.
Not all larger models support disabling reasoning entirely. If you configure a model that requires reasoning tokens, you may experience higher costs and latency. Smaller models avoid this issue.
Zep only uses text generation endpoints—no embeddings, fine-tuning, file uploads, or assistants. LLM providers are configured at the account level, meaning the same credentials are used for all projects within your account.
1. Select a provider type from the dropdown and enter your credentials. For providers that require JSON credentials (Vertex AI, Bedrock), paste the full JSON object.
2. Enter any provider-specific settings, such as endpoint URLs, project IDs, or regions.
3. Choose a model from the list of verified models for your provider, and mark it as primary or fallback.
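The primary/fallback choice can be pictured as an ordered list of models that is tried in turn. The sketch below is a hypothetical illustration of that general pattern, not Zep's internal implementation: the `ModelConfig` type, the `call_model` callable, and the assumption that any error moves on to the next model are all illustrative.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class ModelConfig:
    # Hypothetical record for one configured model.
    provider: str
    model_id: str
    role: str  # "primary" or "fallback"


def generate_with_fallback(
    models: Sequence[ModelConfig],
    call_model: Callable[[ModelConfig, str], str],
    prompt: str,
) -> str:
    """Try the primary model first, then each fallback in configured order.

    Assumption: any exception raised by call_model means "try the next model".
    """
    # sorted() is stable, so fallbacks keep their configured relative order.
    ordered = sorted(models, key=lambda m: 0 if m.role == "primary" else 1)
    errors = []
    for model in ordered:
        try:
            return call_model(model, prompt)
        except Exception as exc:  # deliberately broad for this sketch
            errors.append(f"{model.provider}/{model.model_id}: {exc}")
    raise RuntimeError("all configured models failed: " + "; ".join(errors))


# Usage: a fake call that simulates the primary being rate limited.
def fake_call(model: ModelConfig, prompt: str) -> str:
    if model.role == "primary":
        raise TimeoutError("rate limited")
    return f"{model.model_id} handled: {prompt}"


models = [
    ModelConfig("openai", "gpt-4.1-mini", "fallback"),
    ModelConfig("google", "gemini-2.5-flash-lite", "primary"),
]
print(generate_with_fallback(models, fake_call, "hello"))  # gpt-4.1-mini handled: hello
```

Keeping fallbacks in a stable order makes the failover path predictable, which matters when the fallback model has different cost or latency characteristics.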
When configuring a provider, you can set the following options:
Deployment name must match model ID
Your Azure deployment name must match the model ID exactly. For example, if you’re using gpt-4.1, your deployment name must be gpt-4.1—not a custom name like my-gpt-deployment. A mismatched deployment name causes “model or resource not found” errors.
Use the base endpoint URL
Use the base Azure Endpoint URL, not the Target URI from the deployment page. Using the Target URI causes a “model or resource not found” error.
Correct: https://your-resource-name.openai.azure.com/
Incorrect: https://your-resource-name.openai.azure.com/openai/deployments/.../chat/completions?api-version=...
Find the correct endpoint in the Azure portal under Keys and Endpoint, or in the azure_endpoint value shown in the Python code examples on the deployment page.
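Both Azure rules can be checked locally before you save the provider. The sketch below uses hypothetical helpers (`base_azure_endpoint` and `check_azure_config` are not part of Zep or any Azure SDK): the first reduces a pasted Target URI to its base endpoint, and the second flags a non-base endpoint or a deployment name that differs from the model ID.

```python
from urllib.parse import urlsplit


def base_azure_endpoint(url: str) -> str:
    """Reduce any Azure OpenAI URL (e.g. a deployment Target URI) to its base endpoint."""
    parts = urlsplit(url.strip())
    # Keep only scheme and host; drop /openai/deployments/..., the query, and any fragment.
    return f"{parts.scheme}://{parts.netloc}/"


def check_azure_config(endpoint: str, deployment_name: str, model_id: str) -> list[str]:
    """Return a list of problems; an empty list means both rules are satisfied."""
    problems = []
    base = base_azure_endpoint(endpoint)
    # Ignore a missing trailing slash; only flag extra path or query components.
    if endpoint.strip().rstrip("/") != base.rstrip("/"):
        problems.append(f"use the base endpoint {base} instead of {endpoint.strip()}")
    # The deployment name must equal the model ID exactly (no case folding).
    if deployment_name != model_id:
        problems.append(
            f"deployment name {deployment_name!r} must match model ID {model_id!r} exactly"
        )
    return problems


# Usage: a Target URI pasted by mistake, plus a custom deployment name.
target_uri = (
    "https://your-resource-name.openai.azure.com/openai/deployments/"
    "my-gpt-deployment/chat/completions?api-version=..."
)
for problem in check_azure_config(target_uri, "my-gpt-deployment", "gpt-4.1"):
    print(problem)
```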
Vertex AI uses service account authentication, which differs from API key authentication used by Google Gemini (AI Studio). You’ll need to gather three pieces of information from the Google Cloud Console:
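Project ID
Your project ID appears in Google Cloud Console URLs as ?project=your-project-id.
Service Account JSON
Create a service account with the Vertex AI User role (roles/aiplatform.user), which is the only role required. Then create a JSON key for it and paste the full JSON object.
Location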
Enter your preferred GCP region (e.g., us-central1). If omitted, Zep uses a default region.
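Before pasting the key, you can sanity-check it locally. This is a minimal sketch, not a Zep or Google Cloud API: `check_service_account_json` is a hypothetical helper that only confirms the pasted text is a JSON object with the standard service account key fields and that its `project_id` matches the project ID you plan to enter. It does not verify the `roles/aiplatform.user` grant.

```python
import json

# Fields present in every Google Cloud service account key file.
REQUIRED_FIELDS = ("type", "project_id", "private_key", "client_email")


def check_service_account_json(raw: str, expected_project_id: str) -> list[str]:
    """Return problems found in a pasted key; an empty list means it looks usable."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["expected a JSON object, not a fragment"]
    problems = [f"missing field {name!r}" for name in REQUIRED_FIELDS if name not in data]
    if "type" in data and data["type"] != "service_account":
        problems.append(f"type is {data['type']!r}, expected 'service_account'")
    if "project_id" in data and data["project_id"] != expected_project_id:
        problems.append(
            f"key belongs to {data['project_id']!r}, not {expected_project_id!r}"
        )
    return problems


# Usage with a placeholder key (values are illustrative, not a real credential).
key = json.dumps({
    "type": "service_account",
    "project_id": "your-project-id",
    "private_key": "-----BEGIN PRIVATE KEY-----\n...",
    "client_email": "zep-byom@your-project-id.iam.gserviceaccount.com",
})
print(check_service_account_json(key, "your-project-id"))  # []
```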
We recommend using Vertex AI over Google Gemini (AI Studio) for production workloads. Vertex AI offers better control over rate limits, allows you to increase quotas, and supports purchasing provisioned throughput if needed.
Does Zep store our provider keys in its databases? No. Credentials are stored in an encrypted secrets manager (AWS SSM Parameter Store). Values are decrypted in memory only when needed and are never written to Zep databases or logs.
Can we use different vendors or models per project? Yes. Each project maintains its own provider configuration, including defaults and fallbacks. This is useful for isolating production from staging or testing providers side by side.
Can we prevent vendors from training on our data? Yes. Use the vendor endpoints and contractual controls that disable data retention or training. Zep routes requests accordingly and sets the necessary flags in each call.
How is usage billed? You receive invoices from Zep for Zep services only. LLM inference charges come directly from your vendors under your existing contract and pricing.
What happens if a key is compromised or needs rotation? Add a new credential in the dashboard and verify it. Then disable or delete the previous credential. Requests start using the new credential immediately with no downtime required.
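The add, verify, then disable order is what keeps rotation free of downtime: at every step at least one verified credential stays enabled. The in-memory model below is hypothetical (the `CredentialStore` class and its methods illustrate the sequence only and are not Zep's dashboard or API); it makes that invariant explicit by refusing to disable the last usable credential.

```python
from dataclasses import dataclass, field


@dataclass
class CredentialStore:
    # Hypothetical model: credential name -> {"verified": bool, "enabled": bool}
    creds: dict[str, dict[str, bool]] = field(default_factory=dict)

    def add(self, name: str) -> None:
        self.creds[name] = {"verified": False, "enabled": True}

    def verify(self, name: str) -> None:
        # Stand-in for the dashboard's verification step.
        self.creds[name]["verified"] = True

    def usable(self) -> list[str]:
        return [n for n, c in self.creds.items() if c["verified"] and c["enabled"]]

    def disable(self, name: str) -> None:
        # Disabling the only usable credential would cause downtime, so refuse.
        if self.usable() == [name]:
            raise RuntimeError(
                f"{name} is the only verified credential; add and verify a new one first"
            )
        self.creds[name]["enabled"] = False


# Usage: rotate from an old key to a new one in the safe order.
store = CredentialStore()
store.add("old-key")
store.verify("old-key")
store.add("new-key")
store.verify("new-key")
store.disable("old-key")
print(store.usable())  # ['new-key']
```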
How does BYOM affect observability? Requests are tagged by project and provider, so you can attribute usage and costs. Rate limits are applied per provider to protect budgets and enforce quotas.
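Because each request is tagged by project and provider, usage can be rolled up per tag pair. In the sketch below, the record shape (`project`, `provider`, `input_tokens`, `output_tokens`) and the sample values are assumptions for illustration, not Zep's export format.

```python
from collections import defaultdict
from typing import Iterable, Mapping


def usage_by_project_and_provider(
    records: Iterable[Mapping],
) -> dict[tuple[str, str], dict[str, int]]:
    """Sum request and token counts for each (project, provider) pair."""
    totals: dict[tuple[str, str], dict[str, int]] = defaultdict(
        lambda: {"requests": 0, "input_tokens": 0, "output_tokens": 0}
    )
    for rec in records:
        bucket = totals[(rec["project"], rec["provider"])]
        bucket["requests"] += 1
        bucket["input_tokens"] += rec["input_tokens"]
        bucket["output_tokens"] += rec["output_tokens"]
    return dict(totals)


# Usage with hypothetical records.
records = [
    {"project": "prod", "provider": "vertex-ai", "input_tokens": 1200, "output_tokens": 150},
    {"project": "prod", "provider": "vertex-ai", "input_tokens": 800, "output_tokens": 90},
    {"project": "staging", "provider": "openai", "input_tokens": 500, "output_tokens": 60},
]
totals = usage_by_project_and_provider(records)
print(totals[("prod", "vertex-ai")])
# {'requests': 2, 'input_tokens': 2000, 'output_tokens': 240}
```

Multiplying these token totals by your own contracted per-token prices gives a per-project cost estimate; the vendor invoice remains the authoritative figure.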
Can we use a customer-managed KMS key? Contact support if you require customer-controlled encryption for credential storage.