Batch ingestion
The Batch API is available to enterprise customers. Contact your Zep account team to enable it for your project.
The Batch API is the recommended way to load large historical datasets — backfills, document collections, archived conversations, migrations from another system — into your knowledge graphs.
Calling graph.add or thread.add_messages once per item works for live data, but becomes hard to manage at scale. With the Batch API you group items into a batch (splitting across multiple batches when needed — see Batch limits), hand it off to Zep, and track progress in one place — both programmatically and in the Zep dashboard.
A single batch can mix graph episodes and thread messages and can target any number of graphs, users, and threads, so a backfill across many destinations can be expressed as one batch instead of many one-off requests.
How batches work
A batch follows a three-step lifecycle: create the batch, add items to it with batch.add, then start processing with batch.process.
Items in a batch are grouped by destination graph and processed in the order they were added. Episodes and messages added through the Batch API are priced the same as those added through graph.add or thread.add_messages.
Batch limits
- A single batch can contain up to 50,000 items.
- Each call to batch.add accepts up to 500 items.
To ingest more than 500 items, make multiple batch.add calls against the same batch ID before calling batch.process.
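The limits above imply two levels of splitting: a dataset is divided into batches of at most 50,000 items, and each batch into batch.add calls of at most 500 items. The following is a minimal sketch of that arithmetic in plain Python. The helper names chunked and plan_batches are hypothetical, and the sketch makes no calls to the Zep SDK; it only plans how the items would be grouped.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")

MAX_ITEMS_PER_ADD = 500       # limit per batch.add call, from the list above
MAX_ITEMS_PER_BATCH = 50_000  # limit per batch, from the list above


def chunked(items: Sequence[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive slices of at most `size` items, preserving order."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def plan_batches(items: Sequence[T]) -> List[List[List[T]]]:
    """Return one entry per batch; each entry is a list of batch.add-sized chunks."""
    return [
        list(chunked(batch_items, MAX_ITEMS_PER_ADD))
        for batch_items in chunked(items, MAX_ITEMS_PER_BATCH)
    ]
```

For each entry in the plan you would create one batch, send each chunk in its own batch.add call against that batch ID, and then call batch.process once.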
Quickstart
The example below creates a batch, adds a mix of graph episodes and thread messages, starts processing, and polls until the batch finishes.
Adding items to a batch
Each item in a batch is one of two types:
- graph_episode: equivalent to a single graph.add call. Targets a graph by graph_id or a user graph by user_id.
- thread_message: equivalent to one message inside a thread.add_messages call. Targets a thread by thread_id.
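The targeting rules for the two item types can be checked locally before anything is sent. The sketch below assumes items are represented as plain dictionaries keyed by the field names used on this page, and it assumes a graph_episode should carry exactly one of graph_id or user_id. Both are assumptions made for illustration, and check_item_target is a hypothetical helper, not part of the SDK.

```python
from typing import Any, Mapping


def check_item_target(item: Mapping[str, Any]) -> None:
    """Raise ValueError if the item's targeting fields do not match its type."""
    item_type = item.get("type")
    if item_type == "graph_episode":
        has_graph = bool(item.get("graph_id"))
        has_user = bool(item.get("user_id"))
        # Assumption: exactly one destination, either a graph or a user graph.
        if has_graph == has_user:
            raise ValueError("graph_episode needs exactly one of graph_id or user_id")
    elif item_type == "thread_message":
        if not item.get("thread_id"):
            raise ValueError("thread_message needs a thread_id")
    else:
        raise ValueError(f"unknown item type: {item_type!r}")
```

Running a check like this over a large backfill before submitting it surfaces malformed items early instead of as failed items after processing.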
The fields below mirror the equivalent fields on graph.add and thread.add_messages. See Adding business data and Adding messages for the underlying semantics.
Common fields
Graph episode fields (type: "graph_episode")
Thread message fields (type: "thread_message")
Setting timestamps on batch items
Pass created_at on each item to give Zep accurate temporal information for historical data. This is important for backfills — Zep uses these timestamps in its fact invalidation process to determine the valid_at and invalid_at values on extracted facts (edges).
The created_at value should be in RFC3339 format (e.g., "2024-06-15T10:30:00Z").
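Historical records often carry timestamps as native datetime objects or in local time zones. The following is a minimal sketch of converting them to the RFC3339 form shown above using only the Python standard library. The helper name to_rfc3339 is hypothetical, and normalizing to UTC with a trailing Z is one valid choice among several RFC3339 representations.

```python
from datetime import datetime, timezone


def to_rfc3339(dt: datetime) -> str:
    """Format an aware datetime as an RFC3339 UTC timestamp with a trailing Z."""
    if dt.tzinfo is None or dt.utcoffset() is None:
        # A naive datetime has no defined instant, so refuse to guess its zone.
        raise ValueError("naive datetime: attach a timezone before converting")
    # Sub-second precision is dropped by this format string.
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
```

For example, to_rfc3339(datetime(2024, 6, 15, 10, 30, tzinfo=timezone.utc)) returns "2024-06-15T10:30:00Z". Rejecting naive datetimes is deliberate: a silently wrong offset would skew the temporal information Zep derives from created_at.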
Tracking progress
Two methods report on a running or completed batch:
- batch.get(batch_id) returns a summary of the whole batch, including a progress object with counts for total_items, queued_items, processing_items, succeeded_items, failed_items, skipped_items, and percent_complete.
- batch.list_items(batch_id) returns each item with its individual status (pending, queued, processing, succeeded, failed, skipped).
For long-running batches, polling is often impractical. Subscribe to the ingest.batch.completed webhook to be notified when a batch reaches a terminal state — the payload includes the batch_id so you can match it back to the batch you submitted.
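Matching a webhook delivery back to a submitted batch can be as simple as a lookup table keyed by batch ID. The sketch below assumes the payload has already been parsed from JSON into a dictionary and that batch_id appears as a top-level key. The exact payload layout is not described here, so treat that as an assumption. The names pending_batches and on_batch_completed are hypothetical.

```python
from typing import Any, Dict, Mapping, Optional

# Batches we submitted that have not yet been reported complete:
# batch_id -> our own label for the job (for example, a backfill name).
pending_batches: Dict[str, str] = {}


def on_batch_completed(payload: Mapping[str, Any]) -> Optional[str]:
    """Return the label of the finished batch and stop tracking it.

    Returns None when the payload has no usable batch_id or refers to a
    batch this process did not submit.
    """
    batch_id = payload.get("batch_id")  # assumed top-level key
    if not isinstance(batch_id, str):
        return None
    return pending_batches.pop(batch_id, None)
```

Record each batch ID in pending_batches when you submit it, and call on_batch_completed from your webhook endpoint. Because the entry is removed on first match, a repeated delivery for the same batch returns None rather than triggering follow-up work twice.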
Batch statuses
The status field on BatchSummary is one of:
Once a batch reaches a terminal state (succeeded, partial, or failed), it stays there.
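Because terminal states are final, a polling loop only needs to watch for one of those three values. The following is a minimal sketch of such a loop. It does not call the Zep SDK directly: fetch_summary is a hypothetical callable that you would implement by wrapping batch.get(batch_id) and returning a mapping with a "status" key, and the interval and timeout defaults are arbitrary.

```python
import time
from typing import Any, Callable, Mapping

# Terminal states named above; every other status means "keep waiting".
TERMINAL_STATUSES = {"succeeded", "partial", "failed"}


def wait_for_batch(
    fetch_summary: Callable[[], Mapping[str, Any]],
    poll_seconds: float = 5.0,
    timeout_seconds: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Mapping[str, Any]:
    """Call fetch_summary until its "status" is terminal, then return that summary."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        summary = fetch_summary()
        if summary["status"] in TERMINAL_STATUSES:
            return summary
        if time.monotonic() >= deadline:
            raise TimeoutError("batch did not reach a terminal state in time")
        sleep(poll_seconds)
```

Injecting sleep as a parameter keeps the loop testable without real delays. A partial result still returns normally, so the caller should inspect the returned status rather than assume success.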
Per-item statuses
The status field on each BatchItemDetail is one of: pending, queued, processing, succeeded, failed, or skipped.
Listing and managing batches
Use batch.list to enumerate batches in your project, optionally filtered by status. Use batch.delete to remove a batch that has not yet been processed — once a batch has been processed, it cannot be deleted.
Viewing batches in the dashboard
The Zep web dashboard provides a batches view showing all batches in your project, their status, item counts, and processing progress. Click into a batch to inspect its individual items and any errors.
Deprecated batch methods
The following methods are deprecated and no longer recommended. Use the Batch API described above for all new ingestion work.
The deprecated methods continue to work but will be removed in a future release.