Rate limits

How the Zep API limits request rates and how to handle them

The Zep API enforces rate limits on incoming requests. Rate limits are measured in requests per minute (RPM) and applied per account.

The exact RPM limit for your account depends on your plan. See the Zep pricing page for details.

Rate limit headers

Every response from the Zep API includes headers that describe your current rate limit state. Inspect these headers to monitor usage and pace your client before you hit the limit.

Header                   Description
X-RateLimit-Limit        The per-minute request limit for your account.
X-RateLimit-Remaining    The number of requests remaining in the current window.
X-RateLimit-Reset        Unix timestamp (in seconds) at which the current window resets.
X-RateLimit-Increment    The cost of the current request, in units of the limit. Always 1.
Retry-After              Number of seconds to wait before retrying. Only set on 429 responses.
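
As a rough illustration of working with these headers, the sketch below converts them from strings into integers. The RateLimitState class and parse_rate_limit_headers function are hypothetical helpers, not part of the Zep SDK, and the header values in the example are made up.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class RateLimitState:
    """Hypothetical container for the rate limit headers (not an SDK type)."""
    limit: Optional[int]
    remaining: Optional[int]
    reset_at: Optional[int]  # Unix timestamp, in seconds


def parse_rate_limit_headers(headers: Mapping[str, str]) -> RateLimitState:
    """Read the X-RateLimit-* headers, matching names case-insensitively."""
    lowered = {name.lower(): value for name, value in headers.items()}

    def as_int(name: str) -> Optional[int]:
        value = lowered.get(name)
        try:
            return int(value) if value is not None else None
        except ValueError:
            return None  # header present but not an integer

    return RateLimitState(
        limit=as_int("x-ratelimit-limit"),
        remaining=as_int("x-ratelimit-remaining"),
        reset_at=as_int("x-ratelimit-reset"),
    )


# Illustrative values only; real values come from an API response.
state = parse_rate_limit_headers({
    "X-RateLimit-Limit": "60",
    "X-RateLimit-Remaining": "12",
    "X-RateLimit-Reset": "1700000000",
})
print(state.remaining)  # 12
```

Missing or malformed headers become None rather than raising, so callers can decide how to proceed when the rate limit state is unknown.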

Reading rate limit headers from the SDK

The Zep SDKs do not return response headers from a normal method call. To read headers, use the SDK’s raw response accessor, which returns both the parsed response data and the raw HTTP response.

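# Assumes `client` is an initialized Zep client and `messages` is already defined.
response = client.thread.with_raw_response.add_messages(
    thread_id="thread_123",
    messages=messages,
)

# Header values are strings; .get() returns None if a header is absent.
remaining = response.headers.get("x-ratelimit-remaining")
reset = response.headers.get("x-ratelimit-reset")
data = response.data  # the parsed response data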

Handling 429 responses

When you exceed your rate limit, the Zep API returns HTTP 429 Too Many Requests. The SDK surfaces this as a typed error whose response headers include Retry-After, indicating how many seconds to wait before retrying.

Catch the error, read Retry-After, wait, and retry.

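import time
from zep_cloud import ApiError

# Assumes `client` and `messages` are already defined.
try:
    client.thread.add_messages(thread_id="thread_123", messages=messages)
except ApiError as err:
    if err.status_code != 429:
        raise  # not a rate limit error; let it propagate
    # Fall back to 1 second if Retry-After is missing.
    retry_after = int((err.headers or {}).get("retry-after", "1"))
    time.sleep(retry_after)
    # retry your call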

Combine Retry-After with exponential backoff and jitter to avoid synchronized retries when many clients are throttled at the same time.
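
One possible way to combine the two is sketched below: wait at least the Retry-After value, then add a random slice of an exponentially growing backoff on top. The RateLimited exception, retry_delay, and call_with_retries are hypothetical names used only for illustration; in real code you would catch the SDK's error and check for status 429 instead. The base and cap defaults are arbitrary assumptions.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Hypothetical stand-in for the SDK error raised on a 429 response."""

    def __init__(self, retry_after: Optional[float] = None):
        super().__init__("rate limited")
        self.retry_after = retry_after


def retry_delay(attempt: int, retry_after: Optional[float] = None,
                base: float = 1.0, cap: float = 60.0) -> float:
    """Seconds to wait before retrying after failed attempt `attempt` (0-based)."""
    # Full jitter: pick a random delay in [0, min(cap, base * 2**attempt)].
    jitter = random.uniform(0, min(cap, base * (2 ** attempt)))
    # Adding the jitter to Retry-After keeps the server's minimum wait
    # while spreading clients that were throttled at the same moment.
    return (retry_after or 0.0) + jitter


def call_with_retries(fn: Callable[[], T], max_attempts: int = 5,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Call `fn`, retrying on RateLimited up to `max_attempts` total attempts."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except RateLimited as err:
            if attempt == max_attempts - 1:
                raise  # out of attempts; surface the error to the caller
            sleep(retry_delay(attempt, err.retry_after))
    raise RuntimeError("max_attempts must be at least 1")
```

Adding jitter to Retry-After, rather than taking the larger of the two, is a design choice: if every client simply waited exactly Retry-After seconds, they would all retry at the same instant.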

Pacing requests proactively

To avoid hitting 429 responses in the first place, use X-RateLimit-Remaining and X-RateLimit-Reset to pace your requests:

  • If X-RateLimit-Remaining is approaching zero, slow your request rate or pause until the window resets.
  • The current window ends at the Unix timestamp in X-RateLimit-Reset. After this time, a fresh allowance is available.

This is particularly useful for bulk operations, such as batch ingestion, where you control the cadence of outgoing requests.
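
As a minimal sketch of this kind of pacing, the function below spreads the remaining allowance evenly over the time left in the window. pacing_delay is a hypothetical helper, not an SDK function, and even spacing is only one possible policy; it assumes the two header values have already been converted to integers.

```python
import time
from typing import Optional


def pacing_delay(remaining: int, reset_at: int,
                 now: Optional[float] = None) -> float:
    """Seconds to wait before the next request.

    `remaining` is the X-RateLimit-Remaining value and `reset_at` is the
    X-RateLimit-Reset Unix timestamp, both already parsed as integers.
    """
    if now is None:
        now = time.time()
    seconds_left = max(0.0, reset_at - now)
    if seconds_left == 0.0:
        return 0.0  # the window has already reset
    if remaining <= 0:
        return seconds_left  # allowance exhausted; wait for the reset
    return seconds_left / remaining  # spread requests evenly


print(pacing_delay(10, 1030, now=1000))  # 3.0
print(pacing_delay(0, 1030, now=1000))   # 30.0
print(pacing_delay(5, 990, now=1000))    # 0.0
```

In a batch ingestion loop, you could call time.sleep(pacing_delay(remaining, reset_at)) after each request, using the values read from that request's response headers.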