Batch ingestion

Ingest large historical datasets into your knowledge graphs with the Batch API

The Batch API is available to enterprise customers. Contact your Zep account team to enable it for your project.

The Batch API is the recommended way to load large historical datasets — backfills, document collections, archived conversations, migrations from another system — into your knowledge graphs.

Calling graph.add or thread.add_messages once per item works for live data but becomes hard to manage at scale. With the Batch API, you group items into a batch (or into several batches when the data exceeds the limits described under Batch limits), hand it off to Zep, and track progress in one place, both programmatically and in the Zep dashboard.

A single batch can mix graph episodes and thread messages and can target any number of graphs, users, and threads, so a backfill across many destinations can be expressed as one batch instead of many one-off requests.

How batches work

A batch follows a three-step lifecycle:

1. Create: create an empty batch with optional metadata.
2. Add: add items to the batch across one or more batch.add calls. Items can be graph episodes or thread messages, and may target different graphs, users, or threads.
3. Process: start processing. Zep returns immediately and processes the batch asynchronously. You can poll for progress or watch it in the dashboard.

Items in a batch are grouped by destination graph and processed in the order they were added. Episodes and messages added through the Batch API are priced the same as those added through graph.add or thread.add_messages.

Batch limits

  • A single batch can contain up to 50,000 items.
  • Each call to batch.add accepts up to 500 items.

To ingest more than 500 items, make multiple batch.add calls against the same batch ID before calling batch.process. To ingest more than 50,000 items, split them across multiple batches.
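The two limits combine. Items are first divided into batches of at most 50,000, and each batch is then filled with batch.add calls of at most 500 items. The following is a minimal planning sketch in plain Python. The helpers chunk and plan_batches are hypothetical and not part of the Zep SDK.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")

MAX_ITEMS_PER_BATCH = 50_000  # documented limit for one batch
MAX_ITEMS_PER_ADD = 500       # documented limit for one batch.add call


def chunk(seq: Sequence[T], size: int) -> List[List[T]]:
    """Split seq into consecutive chunks of at most `size` items, keeping order."""
    return [list(seq[i:i + size]) for i in range(0, len(seq), size)]


def plan_batches(items: Sequence[T]) -> List[List[List[T]]]:
    """One entry per batch; each entry lists the payloads for its batch.add calls."""
    return [chunk(batch, MAX_ITEMS_PER_ADD) for batch in chunk(items, MAX_ITEMS_PER_BATCH)]


if __name__ == "__main__":
    plan = plan_batches(list(range(120_000)))
    print(len(plan))                       # 3 batches
    print([len(calls) for calls in plan])  # [100, 100, 40] batch.add calls
```

With the SDK, each outer entry would correspond to one client.batch.create call, and each inner list to one client.batch.add(batch_id=..., items=...) call. You would call batch.process once per batch, after its last add call returns.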

Quickstart

The example below creates a batch, adds a mix of graph episodes and thread messages, starts processing, and polls until the batch finishes.

```python
import os
import time

from zep_cloud.client import Zep
from zep_cloud import BatchAddItem

# Read the API key from an environment variable (the variable name is your choice).
client = Zep(api_key=os.environ["ZEP_API_KEY"])

# 1. Create the batch
batch = client.batch.create(
    metadata={"description": "Customer support backfill"},
)
batch_id = batch.batch_id

# 2. Add items to the batch. The user "alice", the graph "company_kb", and the
# thread "alice_support_thread_42" must already exist; otherwise the batch
# becomes invalid when batch.process is called.
items = [
    BatchAddItem(
        type="graph_episode",
        user_id="alice",
        data="Alice signed up for the Pro plan on 2024-06-15.",
        data_type="text",
    ),
    BatchAddItem(
        type="graph_episode",
        graph_id="company_kb",
        data="Refund policy: orders may be refunded within 30 days of purchase.",
        data_type="text",
    ),
    BatchAddItem(
        type="thread_message",
        thread_id="alice_support_thread_42",
        content="My dashboard isn't loading.",
        role="user",
        name="Alice",
    ),
]

client.batch.add(batch_id=batch_id, items=items)

# 3. Start processing
client.batch.process(batch_id=batch_id)

# 4. Poll until the batch reaches a terminal state or is rejected as invalid
while True:
    summary = client.batch.get(batch_id=batch_id)
    if summary.status in ("succeeded", "partial", "failed", "invalid"):
        break
    print(f"Status: {summary.status} ({summary.progress.percent_complete:.0f}%)")
    time.sleep(5)

print(f"Final status: {summary.status}")
```
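The Quickstart loop places no upper bound on how long it waits. The sketch below adds a deadline. It accepts any zero-argument callable that returns an object with a status attribute, so with the SDK you would pass lambda: client.batch.get(batch_id=batch_id). The name wait_for_batch and its timeout behaviour are illustrative assumptions, not SDK features. The sketch also stops on invalid because, per the Batch statuses table, an invalid batch cannot proceed.

```python
import time
from typing import Any, Callable

# Terminal states, plus "invalid", which never progresses further either.
STOP_STATES = {"succeeded", "partial", "failed", "invalid"}


def wait_for_batch(
    get_summary: Callable[[], Any],
    timeout_s: float = 3600.0,
    interval_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Any:
    """Poll get_summary() until the batch stops changing or timeout_s elapses."""
    deadline = clock() + timeout_s
    while True:
        summary = get_summary()
        if summary.status in STOP_STATES:
            return summary
        if clock() >= deadline:
            raise TimeoutError(f"batch still {summary.status!r} after {timeout_s}s")
        sleep(interval_s)
```

The sleep and clock parameters exist only so the function can be exercised without real waiting. In normal use, leave them at their defaults.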

Adding items to a batch

Each item in a batch is one of two types:

  • graph_episode — equivalent to a single graph.add call. Targets a graph by graph_id or a user graph by user_id.
  • thread_message — equivalent to one message inside a thread.add_messages call. Targets a thread by thread_id.

The fields below mirror the equivalent fields on graph.add and thread.add_messages. See Adding business data and Adding messages for the underlying semantics.

Common fields

Field | Description
--- | ---
type | Required. graph_episode or thread_message.
metadata | Optional. Up to 10 key-value pairs. See Episode metadata for constraints and search filtering.
created_at | Optional. RFC 3339 (ISO 8601) timestamp marking when the original event occurred. Used by Zep's fact invalidation process. See Setting timestamps.
source_description | Optional. Human-readable description of where the item came from.

Graph episode fields (type: "graph_episode")

Field | Description
--- | ---
data | Required. The episode content. Subject to the same 10,000-character limit as graph.add.
data_type | Required. text, json, or message.
graph_id or user_id | One of the two is required to identify the destination graph.

Thread message fields (type: "thread_message")

Field | Description
--- | ---
thread_id | Required. The destination thread.
content | Required. The message body.
role | Required. One of user, assistant, system, function, tool, norole.
name | Optional. Speaker name.
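A batch that references missing graphs, users, or threads becomes invalid, and field errors are cheaper to catch before upload, so it can help to check items locally first. The sketch below checks the rules from the tables above against plain dicts. validate_item is a hypothetical helper, and it cannot check that destinations exist. It treats graph_id and user_id as mutually exclusive, which is an assumption the tables do not state outright.

```python
from typing import Any, Dict, List

GRAPH_DATA_TYPES = {"text", "json", "message"}
THREAD_ROLES = {"user", "assistant", "system", "function", "tool", "norole"}
MAX_METADATA_PAIRS = 10
MAX_EPISODE_CHARS = 10_000


def validate_item(item: Dict[str, Any]) -> List[str]:
    """Return a list of problems with one batch item dict (empty if none found)."""
    problems: List[str] = []
    metadata = item.get("metadata") or {}
    if len(metadata) > MAX_METADATA_PAIRS:
        problems.append(f"metadata has {len(metadata)} pairs; max is {MAX_METADATA_PAIRS}")

    kind = item.get("type")
    if kind == "graph_episode":
        data = item.get("data")
        if not data:
            problems.append("graph_episode requires data")
        elif len(data) > MAX_EPISODE_CHARS:
            problems.append(f"data is {len(data)} chars; max is {MAX_EPISODE_CHARS}")
        if item.get("data_type") not in GRAPH_DATA_TYPES:
            problems.append("data_type must be text, json, or message")
        # Assumption: exactly one of graph_id / user_id should be set.
        if bool(item.get("graph_id")) == bool(item.get("user_id")):
            problems.append("set exactly one of graph_id or user_id")
    elif kind == "thread_message":
        for field in ("thread_id", "content"):
            if not item.get(field):
                problems.append(f"thread_message requires {field}")
        if item.get("role") not in THREAD_ROLES:
            problems.append(f"role must be one of {sorted(THREAD_ROLES)}")
    else:
        problems.append("type must be graph_episode or thread_message")
    return problems
```

Dicts that pass could then presumably be turned into SDK objects with keyword arguments, for example BatchAddItem(**item).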

Setting timestamps on batch items

Pass created_at on each item to give Zep accurate temporal information for historical data. This matters for backfills: Zep uses these timestamps in its fact invalidation process to determine the valid_at and invalid_at values on extracted facts (edges).

The created_at value should be an RFC 3339 timestamp (for example, "2024-06-15T10:30:00Z").

```python
from zep_cloud import BatchAddItem

# Assumes `client` and `batch_id` from the Quickstart above; the batch must
# still be in the draft state to accept items.
items = [
    BatchAddItem(
        type="graph_episode",
        user_id="alice",
        data="Alice joined the engineering team as a senior developer.",
        data_type="text",
        created_at="2024-06-15T10:30:00Z",
    ),
    BatchAddItem(
        type="graph_episode",
        user_id="alice",
        data="Alice was promoted to tech lead of the engineering team.",
        data_type="text",
        created_at="2024-09-01T09:00:00Z",
    ),
]

client.batch.add(batch_id=batch_id, items=items)
```
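Historical records usually arrive as datetime values rather than strings. Items for a destination graph are processed in the order they were added, so adding them oldest first keeps episodes chronological. The sketch below formats aware datetimes as RFC 3339 UTC strings and builds sorted graph_episode item dicts. The record keys text and occurred_at and both helper names are hypothetical.

```python
from datetime import datetime, timezone
from typing import Any, Dict, List


def to_rfc3339(dt: datetime) -> str:
    """Format an aware datetime as an RFC 3339 UTC string such as 2024-06-15T10:30:00Z."""
    if dt.tzinfo is None:
        raise ValueError("use a timezone-aware datetime so the offset is unambiguous")
    # Converts to UTC and drops fractional seconds.
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def to_episode_items(user_id: str, records: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Turn {"text", "occurred_at"} records into graph_episode item dicts, oldest first."""
    ordered = sorted(records, key=lambda r: r["occurred_at"])
    return [
        {
            "type": "graph_episode",
            "user_id": user_id,
            "data": r["text"],
            "data_type": "text",
            "created_at": to_rfc3339(r["occurred_at"]),
        }
        for r in ordered
    ]
```

Rejecting naive datetimes is a deliberate choice. Guessing a timezone for historical data would silently shift the valid_at and invalid_at values Zep derives.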

Tracking progress

Two methods report on a running or completed batch:

  • batch.get(batch_id) returns a summary of the whole batch, including a progress object with counts for total_items, queued_items, processing_items, succeeded_items, failed_items, skipped_items, and percent_complete.
  • batch.list_items(batch_id) returns each item with its individual status (pending, queued, processing, succeeded, failed, skipped).

For long-running batches, polling is often impractical. Subscribe to the ingest.batch.completed webhook to be notified when a batch reaches a terminal state — the payload includes the batch_id so you can match it back to the batch you submitted.
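The following is a minimal sketch of matching a webhook delivery back to a submitted batch. The exact payload schema is not described here, so the sketch assumes only that the JSON body contains a batch_id key somewhere and searches for it. The submitted registry and the function names are hypothetical. Receiving the HTTP request and verifying its authenticity are out of scope.

```python
import json
from typing import Any, Dict, Optional


def find_batch_id(node: Any) -> Optional[str]:
    """Depth-first search of a decoded JSON value for the first string 'batch_id'."""
    if isinstance(node, dict):
        value = node.get("batch_id")
        if isinstance(value, str):
            return value
        children = list(node.values())
    elif isinstance(node, list):
        children = node
    else:
        return None
    for child in children:
        found = find_batch_id(child)
        if found is not None:
            return found
    return None


def handle_batch_completed(body: bytes, submitted: Dict[str, str]) -> Optional[str]:
    """Return the description recorded for the batch this webhook body refers to."""
    batch_id = find_batch_id(json.loads(body))
    return submitted.get(batch_id) if batch_id else None
```

In practice, submitted would be filled when each batch is created (batch_id mapped to whatever your job tracking needs). Once the payload schema is confirmed, the recursive search can be replaced with a direct key lookup.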

Batch statuses

The status field on BatchSummary is one of:

Status | Meaning
--- | ---
draft | The batch was just created. Items can still be added with batch.add. Processing has not started. Can be deleted.
invalid | batch.process was called, but one or more items reference graphs, users, or threads that don't exist. The batch cannot proceed. Can be deleted.
queued | batch.process was called and the batch is waiting for a worker.
processing | A worker is actively processing the batch.
succeeded | Terminal. Every item processed successfully.
partial | Terminal. Some items succeeded and others failed. Use batch.list_items to see which items failed.
failed | Terminal. The batch as a whole failed.

Once a batch reaches a terminal state (succeeded, partial, or failed), it stays there.
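The table can be encoded as a few sets so that calling code asks one question per decision. This sketch is derived only from the table above. The constant and function names are hypothetical, and the suggested next steps are reasonable readings of each status rather than documented behaviour.

```python
TERMINAL_STATUSES = frozenset({"succeeded", "partial", "failed"})
DELETABLE_STATUSES = frozenset({"draft", "invalid"})


def is_terminal(status: str) -> bool:
    """True once the batch will never change state again."""
    return status in TERMINAL_STATUSES


def can_delete(status: str) -> bool:
    """True if batch.delete is allowed for a batch in this status."""
    return status in DELETABLE_STATUSES


def next_step(status: str) -> str:
    """Suggest a follow-up action for a batch status, based on the table above."""
    if status == "draft":
        return "add items with batch.add, then call batch.process"
    if status == "invalid":
        return "create the missing graphs, users, or threads, then delete and resubmit"
    if status == "partial":
        return "find failed items with batch.list_items"
    if status == "failed":
        return "investigate the failure before resubmitting"
    if status == "succeeded":
        return "done"
    return "wait"  # queued or processing
```

Note that invalid is not terminal in the table's sense, yet it is still a dead end for that batch. A polling loop should therefore stop on it as well.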

Per-item statuses

The status field on each BatchItemDetail is one of:

Status | Meaning
--- | ---
pending | The item has been added to the batch but processing has not started.
queued | The item is queued for processing.
processing | The item is currently being processed.
succeeded | The item processed successfully.
failed | The item failed to process. The error field on the item describes why.
skipped | The item was skipped during processing, for example a thread message whose role matches a configured ignore_roles value.
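
For example, to check overall progress and inspect individual items:

```python
summary = client.batch.get(batch_id=batch_id)
print(f"Status: {summary.status}")
print(f"Progress: {summary.progress.succeeded_items}/{summary.progress.total_items}")

# Inspect individual items (first 50); show the error for failed ones
page = client.batch.list_items(batch_id=batch_id, limit=50)
for item in page.items:
    if item.status == "failed":
        print(item.item_id, item.status, item.error)
    else:
        print(item.item_id, item.status)
```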
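To turn the per-item listing into a quick report, tally the statuses and collect failures with their error field. The sketch works on any objects with item_id, status, and error attributes, such as the entries returned by batch.list_items. summarize_items is a hypothetical helper that only looks at the items it is given; paging beyond limit is not shown.

```python
from collections import Counter
from typing import Any, Dict, Iterable, List, Tuple


def summarize_items(items: Iterable[Any]) -> Tuple[Dict[str, int], List[Tuple[str, str]]]:
    """Count items per status and collect (item_id, error) for failed items."""
    counts: Counter = Counter()
    failures: List[Tuple[str, str]] = []
    for item in items:
        counts[item.status] += 1
        if item.status == "failed":
            error = getattr(item, "error", None) or "no error recorded"
            failures.append((item.item_id, error))
    return dict(counts), failures
```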

Listing and managing batches

Use batch.list to enumerate batches in your project, optionally filtered by status. Use batch.delete to remove a batch that has not started processing (status draft or invalid). Once processing has started, a batch cannot be deleted.

```python
# List recent batches
result = client.batch.list(limit=20)
for b in result.batches:
    print(b.batch_id, b.status, b.item_count)

# List only batches that are still being processed
result = client.batch.list(status="processing")

# Delete a draft batch
client.batch.delete(batch_id=batch_id)
```
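Only draft and invalid batches can be deleted, so a cleanup routine should filter by status before calling batch.delete. The sketch below assumes the client.batch.list and client.batch.delete calls shown above, and that list accepts each of these status values as a filter. The function name is hypothetical, and only the first page (limit) per status is examined.

```python
from typing import Any, List

DELETABLE_STATUSES = ("draft", "invalid")


def delete_unprocessed_batches(client: Any, limit: int = 20) -> List[str]:
    """Delete draft and invalid batches and return the IDs that were deleted."""
    deleted: List[str] = []
    for status in DELETABLE_STATUSES:
        result = client.batch.list(status=status, limit=limit)
        for b in result.batches:
            client.batch.delete(batch_id=b.batch_id)
            deleted.append(b.batch_id)
    return deleted
```

Deleting a draft discards any items already added to it. Run a routine like this only when no other process is still filling those batches.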

Viewing batches in the dashboard

The Zep web dashboard provides a batches view showing all batches in your project, their status, item counts, and processing progress. Click into a batch to inspect its individual items and any errors.

Deprecated batch methods

The following methods are deprecated and no longer recommended. Use the Batch API described above for all new ingestion work.

Deprecated method | Replacement
--- | ---
graph.add_batch() (POST /graph-batch) | client.batch.* with type: "graph_episode"
thread.add_messages_batch() (POST /threads/{threadId}/messages-batch) | client.batch.* with type: "thread_message"

The deprecated methods continue to work but will be removed in a future release.