Bring Your Own LLM (BYOM)

Enterprise Add-on. Contact sales to enable BYOM for your account.

Overview

Bring Your Own LLM (BYOM) lets you use your own accounts with model providers such as OpenAI, Anthropic, and Google when using Zep Cloud. You keep using Zep's orchestration, context, and security controls while routing inference through credentials you manage. This approach provides:

  • Contract continuity: Apply your negotiated pricing, quotas, and compliance commitments with each LLM vendor.
  • Data governance: Enforce provider-specific policies for data usage, retention, and residency.
  • Operational flexibility: Configure the best vendor or model for each project, including fallbacks for high availability.

Model recommendations

Zep intentionally sets thinking and reasoning budgets off or low to minimize cost and latency. We recommend using smaller, faster models optimized for speed rather than extended reasoning.

Recommended model: Gemini 2.5 Flash Lite is the most thoroughly tested model with Zep.

Not all larger models support disabling reasoning entirely. If you configure a model that requires reasoning tokens, you may experience higher costs and latency. Smaller models avoid this issue.
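For context, here is what an explicit "no reasoning" setting looks like at the provider level. This is a minimal sketch using Google's google-genai Python SDK; Zep makes the equivalent calls for you, and the API key and prompt are placeholders:

```python
# Sketch: disabling extended reasoning at the provider level.
# Assumes the google-genai SDK; Zep manages this setting internally.
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_GEMINI_API_KEY")

response = client.models.generate_content(
    model="gemini-2.5-flash-lite",
    contents="Summarize: ...",
    config=types.GenerateContentConfig(
        # A thinking budget of 0 turns extended reasoning off on models
        # that support it, trading reasoning depth for cost and latency.
        thinking_config=types.ThinkingConfig(thinking_budget=0),
    ),
)
print(response.text)
```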

Supported providers

Zep only uses text generation endpoints—no embeddings, fine-tuning, file uploads, or assistants. LLM providers are configured at the account level, meaning the same credentials are used for all projects within your account.

| Provider | Credentials | Required permissions | Additional configuration |
| --- | --- | --- | --- |
| OpenAI | API key | Chat completions, Responses | Organization ID (optional) |
| Azure OpenAI | API key | None (key provides access) | Endpoint URL, API version (optional) |
| Google Gemini | API key | None (full access by default) | |
| Google Vertex AI | Service account JSON | Vertex AI User (roles/aiplatform.user) | GCP project, location (optional) |
| Anthropic | API key | None (full access by default) | |
| AWS Bedrock (Anthropic) | IAM role ARN | Bedrock model access (cross-account AssumeRole) | AWS region, external ID (optional) |

Google Vertex AI recommended for production: We recommend using Google Vertex AI over Google Gemini (AI Studio) for production workloads. Vertex AI offers better control over rate limits, allows you to increase quotas, and supports purchasing provisioned throughput if needed.
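
AWS Bedrock trust setup: the Bedrock integration authenticates via a cross-account AssumeRole, optionally scoped with an external ID. As a sketch, the IAM role you supply would carry a trust policy along these lines; the Zep principal and external ID below are placeholders for the values shown in the dashboard:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "AWS": "arn:aws:iam::<ZEP_AWS_ACCOUNT_ID>:root" },
      "Action": "sts:AssumeRole",
      "Condition": {
        "StringEquals": { "sts:ExternalId": "<external-id-from-dashboard>" }
      }
    }
  ]
}
```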

Getting started

1. Open provider settings

In the Zep dashboard, open your account's LLM provider settings.

2. Add a provider

Select a provider type from the dropdown and enter your credentials. For providers requiring JSON credentials (Vertex AI, Bedrock), paste the full JSON object.
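
For example, a Google service account key is a JSON object of roughly this shape (a redacted sketch with some fields trimmed; paste your real key exactly as downloaded from GCP):

```json
{
  "type": "service_account",
  "project_id": "my-gcp-project",
  "private_key_id": "…",
  "private_key": "-----BEGIN PRIVATE KEY-----\n…\n-----END PRIVATE KEY-----\n",
  "client_email": "zep-byom@my-gcp-project.iam.gserviceaccount.com",
  "client_id": "…",
  "token_uri": "https://oauth2.googleapis.com/token"
}
```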

3. Configure provider settings

Enter any provider-specific settings such as endpoint URLs, project IDs, or regions.
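
For instance, an Azure OpenAI provider takes values along these lines (the resource name is hypothetical, and the API version is optional):

```
Endpoint URL: https://my-resource.openai.azure.com
API version:  2024-06-01
```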

4. Select a model

Choose a model from the list of verified models for your provider. Mark it as primary or fallback.
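
Conceptually, primary/fallback routing behaves like the sketch below. This is illustrative Python, not Zep's implementation; the model names and error type are placeholders:

```python
class ProviderError(Exception):
    """Stand-in for a provider being unavailable or rate limited."""

def call_model(model: str, prompt: str) -> str:
    # Stand-in for a real provider call; pretend the primary is rate limited.
    if model == "primary-model":
        raise ProviderError(f"{model} is over its rate limit")
    return f"[{model}] response to: {prompt}"

def complete(prompt: str) -> str:
    try:
        return call_model("primary-model", prompt)
    except ProviderError:
        # Primary unavailable or rate limited: route to the fallback.
        return call_model("fallback-model", prompt)

print(complete("hello"))  # served by the fallback model
```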

5. Set rate limits

Configure TPM Capacity and TPM Refill/s to control token usage. Optionally add Labels for cost allocation.

6. Save and verify

Click Save & Verify to validate your credentials. Zep makes a test API call to confirm authentication.
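
The exact probe Zep sends is internal, but conceptually it resembles a minimal one-token completion. A sketch using the OpenAI Python SDK, with an illustrative model name:

```python
# Sketch of a credential check: the cheapest possible chat completion.
from openai import OpenAI

client = OpenAI(api_key="YOUR_API_KEY")
client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "ping"}],
    max_tokens=1,  # keep the verification call as cheap as possible
)
print("credentials verified")
```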

Configuration options

When configuring a provider, you can set the following options:

| Option | Description | Default |
| --- | --- | --- |
| TPM Capacity | Maximum tokens per minute allowed. This is your rate-limit bucket size. | 90,000 |
| TPM Refill/s | Tokens added to your rate-limit bucket per second. Controls the replenishment rate. | 1,500 |
| Labels | Key-value tags passed to the LLM provider for cost allocation and tracking. | |
| Primary | Designates this provider/model as the default for inference requests. | |
| Fallback | Uses this provider/model when the primary is unavailable or rate limited. | |
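
Together, TPM Capacity and TPM Refill/s describe a token bucket: each request draws tokens from the bucket, which refills continuously up to its capacity. With the defaults, sustained throughput is 1,500 × 60 = 90,000 tokens per minute, with bursts of up to 90,000 tokens at once. A minimal sketch of the mechanism (illustrative, not Zep's implementation):

```python
import time

class TokenBucket:
    """Illustrative token-bucket rate limiter, not Zep's implementation."""

    def __init__(self, capacity: float = 90_000, refill_per_s: float = 1_500):
        self.capacity = capacity          # TPM Capacity: maximum burst size
        self.refill_per_s = refill_per_s  # TPM Refill/s: replenishment rate
        self.tokens = capacity            # start full, so a full burst is allowed
        self.last = time.monotonic()

    def try_consume(self, n: float) -> bool:
        now = time.monotonic()
        # Refill continuously, but never beyond capacity.
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_s)
        self.last = now
        if self.tokens >= n:
            self.tokens -= n
            return True
        return False  # over the limit: the request must wait or fail

bucket = TokenBucket()
print(bucket.try_consume(80_000))  # True: within burst capacity
print(bucket.try_consume(80_000))  # False: the bucket needs time to refill
```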

FAQ

Does Zep store our provider keys in its databases? No. Credentials are stored in an encrypted secrets manager (AWS SSM Parameter Store). Values are decrypted in memory only when needed and are never written to Zep databases or logs.

Can we use different vendors or models per project? Yes. Credentials are shared at the account level, but each project maintains its own provider configuration, including defaults and fallbacks. This is useful for isolating production from staging or testing providers side by side.

Can we prevent vendors from training on our data? Yes. Use the vendor endpoints and contractual controls that disable data retention or training. Zep routes requests accordingly and sets the necessary flags in each call.

How is usage billed? You receive invoices from Zep for Zep services only. LLM inference charges come directly from your vendors under your existing contract and pricing.

What happens if a key is compromised or needs rotation? Add a new credential in the dashboard and verify it. Then disable or delete the previous credential. Requests start using the new credential immediately with no downtime required.

How does BYOM affect observability? Requests are tagged by project and provider, so you can attribute usage and costs. Rate limits are applied per provider to protect budgets and enforce quotas.

Can we use a customer-managed KMS key? Contact support if you require customer-controlled encryption for credential storage.