Rate limits
The Zep API enforces rate limits on incoming requests to ensure consistent performance and reliability across all accounts. Rate limits are measured in requests per minute (RPM) and applied per account.
The exact RPM limit for your account depends on your plan. See the Zep pricing page for details.
Rate limit headers
Every response from the Zep API includes headers that describe your current rate limit state. Inspect these headers to monitor usage and pace your client before you hit the limit.
Reading rate limit headers from the SDK
The Zep SDKs do not return response headers from a normal method call. To read headers, use the SDK’s raw response accessor, which returns both the parsed response data and the raw HTTP response.
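Once you have the raw response's headers, turning them into typed values is straightforward. The sketch below is illustrative only: it starts from a plain mapping of header names to values rather than from a specific SDK accessor, and `RateLimitState` and `parse_rate_limit_headers` are hypothetical helper names, not part of the Zep SDK. It reads only the two headers named on this page, X-RateLimit-Remaining and X-RateLimit-Reset.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class RateLimitState:
    """Hypothetical container for the rate limit values in a response."""
    remaining: Optional[int]  # requests left in the current window
    reset_at: Optional[int]   # Unix timestamp at which the window ends


def _to_int(value: Optional[str]) -> Optional[int]:
    """Convert a header value to int; return None if absent or malformed."""
    try:
        return int(value) if value is not None else None
    except ValueError:
        return None


def parse_rate_limit_headers(headers: Mapping[str, str]) -> RateLimitState:
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    return RateLimitState(
        remaining=_to_int(lowered.get("x-ratelimit-remaining")),
        reset_at=_to_int(lowered.get("x-ratelimit-reset")),
    )
```

Returning `None` for a missing or malformed header lets callers distinguish "no data" from "zero requests remaining".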
Handling 429 responses
When you exceed your rate limit, the Zep API returns HTTP 429 Too Many Requests. The SDK surfaces this as a typed error whose response headers include Retry-After, indicating how many seconds to wait before retrying.
Catch the error, read Retry-After, wait, and retry.
For best results, combine Retry-After with exponential backoff and jitter to avoid synchronized retries when many clients are throttled at the same time.
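A minimal sketch of this retry pattern follows. It makes several assumptions: `RateLimitedError` is a hypothetical stand-in for the SDK's typed 429 error (the real class name and the way it exposes headers may differ), and `retry_delay`, `call_with_retries`, `base`, and `cap` are illustrative names and defaults. The delay is the server's Retry-After value plus a random share of an exponentially growing backoff, so a client never retries earlier than the server asked.

```python
import random
import time
from typing import Callable, Mapping, Optional, TypeVar

T = TypeVar("T")


class RateLimitedError(Exception):
    """Hypothetical stand-in for the SDK's typed 429 error."""

    def __init__(self, headers: Mapping[str, str]):
        super().__init__("429 Too Many Requests")
        self.headers = headers


def retry_delay(retry_after: Optional[str], attempt: int,
                base: float = 1.0, cap: float = 60.0,
                rng: Callable[[], float] = random.random) -> float:
    """Seconds to wait before retrying after failed attempt `attempt` (0-based)."""
    backoff = min(cap, base * (2 ** attempt))  # exponential growth, capped
    jitter = rng() * backoff                   # random value in [0, backoff)
    try:
        server_wait = float(retry_after) if retry_after is not None else 0.0
    except ValueError:
        server_wait = 0.0                      # malformed header: rely on backoff
    # Honor the server's wait, then add jitter to spread clients apart.
    return max(server_wait, 0.0) + jitter


def call_with_retries(call: Callable[[], T], max_attempts: int = 5,
                      sleep: Callable[[float], None] = time.sleep,
                      rng: Callable[[], float] = random.random) -> T:
    for attempt in range(max_attempts):
        try:
            return call()
        except RateLimitedError as err:
            if attempt == max_attempts - 1:
                raise  # out of attempts; surface the error to the caller
            # Assumes the header mapping uses this exact capitalization.
            sleep(retry_delay(err.headers.get("Retry-After"), attempt, rng=rng))
    raise RuntimeError("max_attempts must be at least 1")
```

Passing `sleep` and `rng` as parameters keeps the helper easy to test without real waiting or randomness.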
Pacing requests proactively
To avoid hitting 429 responses in the first place, use X-RateLimit-Remaining and X-RateLimit-Reset to pace your requests:
- If X-RateLimit-Remaining is approaching zero, slow your request rate or pause until the window resets.
- The current window ends at the Unix timestamp in X-RateLimit-Reset. After this time, a fresh allowance is available.
This is particularly useful for bulk operations, such as batch ingestion, where you control the cadence of outgoing requests.
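One way to apply these two rules in a bulk job is to compute a pause before each request from the most recent header values. The sketch below is an assumption-laden illustration, not a prescribed algorithm: `pacing_delay` is a hypothetical helper, and the `low_water` threshold of 5 is an arbitrary choice for when to start slowing down.

```python
import time
from typing import Optional


def pacing_delay(remaining: Optional[int], reset_at: Optional[int],
                 now: Optional[float] = None, low_water: int = 5) -> float:
    """Seconds to pause before the next request, given the latest header values."""
    if remaining is None or reset_at is None:
        return 0.0  # no header data available; send without pacing
    now = time.time() if now is None else now
    until_reset = max(0.0, reset_at - now)
    if remaining <= 0:
        return until_reset  # allowance exhausted; wait for the fresh window
    if remaining <= low_water:
        # Spread the remaining requests evenly across the rest of the window.
        return until_reset / remaining
    return 0.0  # plenty of allowance left; no pause needed
```

In a batch ingestion loop, you might call `time.sleep(pacing_delay(remaining, reset_at))` before each request, using the values parsed from the previous response.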