The Zep API enforces rate limits on incoming requests. Rate limits are measured in requests per minute (RPM) and applied per account.
The exact RPM limit for your account depends on your plan. See the Zep pricing page for details.
Every response from the Zep API includes headers that describe your current rate limit state. Inspect these headers to monitor usage and pace your client before you hit the limit.
The Zep SDKs do not return response headers from a normal method call. To read headers, use the SDK’s raw response accessor, which returns both the parsed response data and the raw HTTP response.
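As a minimal sketch, the helper below turns a mapping of response headers into a small rate-limit snapshot. It assumes the raw response object returned by the SDK's raw response accessor exposes its headers as a string-to-string mapping; the accessor and attribute names are not shown here and vary by SDK. It also assumes X-RateLimit-Reset is a Unix timestamp in seconds, so confirm the actual format in the API reference. The names `RateLimitState` and `parse_rate_limit_headers` are hypothetical.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class RateLimitState:
    """Hypothetical snapshot of the rate-limit headers on one response."""
    remaining: Optional[int]   # X-RateLimit-Remaining, if present
    reset_at: Optional[float]  # X-RateLimit-Reset, assumed to be Unix epoch seconds


def _get_header(headers: Mapping[str, str], name: str) -> Optional[str]:
    # HTTP header names are case-insensitive, so compare lowercased keys.
    wanted = name.lower()
    for key, value in headers.items():
        if key.lower() == wanted:
            return value
    return None


def parse_rate_limit_headers(headers: Mapping[str, str]) -> RateLimitState:
    remaining = _get_header(headers, "X-RateLimit-Remaining")
    reset_at = _get_header(headers, "X-RateLimit-Reset")
    return RateLimitState(
        remaining=int(remaining) if remaining is not None else None,
        reset_at=float(reset_at) if reset_at is not None else None,
    )


# Example with a plain dict standing in for the raw response's headers.
state = parse_rate_limit_headers({"x-ratelimit-remaining": "12", "X-RateLimit-Reset": "1700000060"})
print(state)  # RateLimitState(remaining=12, reset_at=1700000060.0)
```

Missing headers come back as `None` rather than raising, so the caller decides how to treat a response without rate-limit information.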
When you exceed your rate limit, the Zep API returns HTTP 429 Too Many Requests. The SDK surfaces this as a typed error whose response headers include Retry-After, indicating how many seconds to wait before retrying.
Catch the error, read Retry-After, wait, and retry.
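The loop below sketches this catch, wait, and retry pattern. Because the SDK's actual error class is not named here, the sketch defines a stand-in `RateLimitError` with `status_code` and `headers` attributes; substitute the typed error your SDK raises. The names `call_with_retry` and `retry_after_seconds`, the `max_attempts` default, and the fallback delay used when Retry-After is missing are all hypothetical choices. The `sleep` parameter is injectable so the loop can be tested without actually waiting.

```python
import time
from typing import Callable, Mapping, Optional, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Hypothetical stand-in for the SDK's typed HTTP 429 error."""

    def __init__(self, headers: Mapping[str, str], status_code: int = 429) -> None:
        super().__init__(f"HTTP {status_code} Too Many Requests")
        self.status_code = status_code
        self.headers = headers


def retry_after_seconds(headers: Mapping[str, str]) -> Optional[float]:
    """Read Retry-After (case-insensitively) as a number of seconds."""
    for key, value in headers.items():
        if key.lower() == "retry-after":
            try:
                return float(value)
            except ValueError:
                return None  # an HTTP-date value is not handled in this sketch
    return None


def call_with_retry(
    fn: Callable[[], T],
    max_attempts: int = 5,
    default_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, waiting out Retry-After and retrying on rate-limit errors."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except RateLimitError as err:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last 429
            delay = retry_after_seconds(err.headers)
            # Fall back to a fixed wait if the header is missing or unparseable.
            sleep(delay if delay is not None else default_delay)
    raise AssertionError("unreachable")
```

In practice, `fn` would wrap a single SDK call in a zero-argument function, for example with a lambda or `functools.partial`.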
Combine Retry-After with exponential backoff and jitter to avoid synchronized retries when many clients are throttled at the same time.
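One common way to combine the two is "full jitter": compute an exponentially growing ceiling, draw a random delay between zero and that ceiling, and add it on top of the server's Retry-After so the wait is never shorter than the server asked for. The sketch below assumes that approach; `backoff_delay` and the `base` and `cap` values are illustrative, not values prescribed by Zep.

```python
import random
from typing import Optional


def backoff_delay(
    attempt: int,
    retry_after: Optional[float] = None,
    base: float = 0.5,
    cap: float = 30.0,
    rng: Optional[random.Random] = None,
) -> float:
    """Delay in seconds before retry number `attempt` (starting at 1).

    Full jitter: a uniform random value in [0, min(cap, base * 2**(attempt - 1))].
    If Retry-After is known, the jitter is added on top of it, so the result
    is never shorter than the server-provided wait.
    """
    if attempt < 1:
        raise ValueError("attempt must be >= 1")
    if rng is None:
        rng = random.Random()
    ceiling = min(cap, base * (2 ** (attempt - 1)))
    jittered = rng.uniform(0.0, ceiling)
    if retry_after is not None:
        # Spread out clients that were all told to wait the same Retry-After.
        return retry_after + jittered
    return jittered
```

Adding the jitter to Retry-After, rather than taking the larger of the two, keeps clients that received the same Retry-After from retrying in lockstep.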
To avoid hitting 429 responses in the first place, use X-RateLimit-Remaining and X-RateLimit-Reset to pace your requests:
- When X-RateLimit-Remaining approaches zero, slow your request rate or pause until the window resets.
- The current window resets at the time given in X-RateLimit-Reset. After this time, a fresh allowance is available.
This is particularly useful for bulk operations, such as batch ingestion, where you control the cadence of outgoing requests.
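For a bulk job, a simple pacing rule is to check the headers after each request and, once X-RateLimit-Remaining drops to a small threshold, sleep until X-RateLimit-Reset. The sketch below assumes X-RateLimit-Reset is a Unix timestamp in seconds (confirm the actual format in the API reference). The names `pace_delay` and `ingest_all`, the `threshold` default, and the `send` callable that returns response headers are hypothetical. The current time is a parameter so the function is deterministic and easy to test.

```python
import time
from typing import Callable, Iterable, Mapping, Optional, TypeVar

T = TypeVar("T")


def _header(headers: Mapping[str, str], name: str) -> Optional[str]:
    # Case-insensitive lookup, since HTTP header names are case-insensitive.
    wanted = name.lower()
    for key, value in headers.items():
        if key.lower() == wanted:
            return value
    return None


def pace_delay(
    headers: Mapping[str, str],
    threshold: int = 1,
    now: Optional[float] = None,
) -> float:
    """Seconds to pause before the next request in a bulk job.

    Returns 0 while more than `threshold` requests remain. Otherwise returns
    the time left until X-RateLimit-Reset (assumed Unix epoch seconds).
    """
    remaining = _header(headers, "X-RateLimit-Remaining")
    reset_at = _header(headers, "X-RateLimit-Reset")
    if remaining is None or reset_at is None:
        return 0.0  # no pacing information; rely on 429 handling instead
    if int(remaining) > threshold:
        return 0.0
    current = time.time() if now is None else now
    return max(0.0, float(reset_at) - current)


def ingest_all(
    items: Iterable[T],
    send: Callable[[T], Mapping[str, str]],
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Hypothetical bulk loop: `send(item)` performs one request and returns its headers."""
    for item in items:
        headers = send(item)
        delay = pace_delay(headers)
        if delay > 0:
            sleep(delay)
```

Pacing this way complements, rather than replaces, the 429 handling described earlier: a concurrent client on the same account can still exhaust the shared allowance.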