Chunking Large Documents with Contextualized Retrieval

The graph.add endpoint has a 10,000 character limit per request. For larger documents, you need to chunk the content before ingestion. Simply splitting text can lose important context, so this cookbook demonstrates how to use contextualized retrieval—a technique where an LLM situates each chunk within the broader document before adding it to Zep.

This approach produces richer knowledge graphs with better entity and relationship extraction compared to naive chunking.

View the complete source code on GitHub: Python | TypeScript | Go

Overview

The ingestion pipeline follows these steps:

Read the document from a text file
Chunk the document into smaller pieces using paragraph-aware splitting
Contextualize each chunk using an LLM to add situational context
Add each chunk to Zep via graph.add

Setup

Install the required dependencies:

$ pip install zep-cloud openai python-dotenv

Initialize the clients:

import os
from openai import OpenAI
from zep_cloud.client import Zep
from dotenv import load_dotenv

load_dotenv()

openai_client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY"))
zep_client = Zep(api_key=os.environ.get("ZEP_API_KEY"))

Chunking the Document

Alternative chunking libraries: If you prefer using an established library over the custom implementation below, consider LangChain, LlamaIndex, Unstructured, or Chonkie.

The chunking algorithm splits text at paragraph boundaries first, then falls back to sentence boundaries for long paragraphs. This preserves semantic coherence better than fixed-size splitting.

import re
from typing import Generator

def chunk_document(
    text: str,
    chunk_size: int = 500,
    chunk_overlap: int = 50
) -> Generator[tuple[int, str], None, None]:
    """
    Split a document into chunks with configurable size and overlap.

    Args:
        text: The full document text
        chunk_size: Target maximum characters per chunk (default 500). Carried-over
            overlap text, or a single sentence longer than chunk_size, can push a
            chunk somewhat past this value.
        chunk_overlap: Characters to overlap between chunks for continuity

    Yields:
        Tuple of (chunk_index, chunk_text)
    """
    if not text:
        return

    text = text.strip()
    paragraphs = text.split('\n\n')

    current_chunk = ""
    chunk_index = 0

    for paragraph in paragraphs:
        paragraph = paragraph.strip()
        if not paragraph:
            continue

        # If adding this paragraph exceeds chunk_size, yield current chunk
        if len(current_chunk) + len(paragraph) + 2 > chunk_size:
            if current_chunk:
                yield (chunk_index, current_chunk.strip())
                chunk_index += 1

                # Start new chunk with overlap from previous (trimmed to a word boundary)
                if chunk_overlap > 0 and len(current_chunk) > chunk_overlap:
                    overlap_text = current_chunk[-chunk_overlap:]
                    first_space = overlap_text.find(' ')
                    if first_space > 0:
                        overlap_text = overlap_text[first_space + 1:]
                    current_chunk = overlap_text + "\n\n"
                else:
                    current_chunk = ""

            # Handle single paragraphs longer than chunk_size (any overlap is discarded)
            if len(paragraph) > chunk_size:
                for sub_chunk in split_long_paragraph(paragraph, chunk_size, chunk_overlap):
                    yield (chunk_index, sub_chunk)
                    chunk_index += 1
                current_chunk = ""
            else:
                current_chunk += paragraph
        else:
            if current_chunk:
                current_chunk += "\n\n" + paragraph
            else:
                current_chunk = paragraph

    # Yield final chunk
    if current_chunk.strip():
        yield (chunk_index, current_chunk.strip())


def split_long_paragraph(
    paragraph: str,
    chunk_size: int,
    chunk_overlap: int
) -> Generator[str, None, None]:
    """Split a long paragraph by sentences (a single over-long sentence is yielded whole)."""
    sentences = re.split(r'(?<=[.!?])\s+', paragraph)
    current_chunk = ""

    for sentence in sentences:
        if len(current_chunk) + len(sentence) + 1 > chunk_size:
            if current_chunk:
                yield current_chunk.strip()
                if chunk_overlap > 0:
                    # Carry the tail of the previous chunk forward, starting at a word boundary
                    overlap = current_chunk[-chunk_overlap:]
                    first_space = overlap.find(' ')
                    if first_space > 0:
                        # rstrip() avoids a double space, since the tail already ends with " "
                        current_chunk = overlap[first_space + 1:].rstrip() + " "
                    else:
                        current_chunk = ""
                else:
                    current_chunk = ""
        current_chunk += sentence + " "

    if current_chunk.strip():
        yield current_chunk.strip()
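To illustrate why boundary-aware splitting helps, here is a minimal, self-contained sketch that compares naive fixed-size slicing with a simplified paragraph packer. Both functions (naive_chunks, paragraph_chunks) are hypothetical illustrations: unlike chunk_document, the packer has no overlap and no sentence fallback.

```python
def naive_chunks(text: str, size: int) -> list[str]:
    """Fixed-size splitting: slice every `size` characters, ignoring text structure."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def paragraph_chunks(text: str, size: int) -> list[str]:
    """Greedy paragraph packing: whole paragraphs only, no overlap or sentence fallback."""
    chunks: list[str] = []
    current = ""
    for para in (p.strip() for p in text.split("\n\n")):
        if not para:
            continue
        candidate = f"{current}\n\n{para}" if current else para
        if current and len(candidate) > size:
            # Adding this paragraph would overflow the chunk: emit it and start fresh
            chunks.append(current)
            current = para
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


doc = (
    "Zep stores facts in a knowledge graph.\n\n"
    "Each episode is processed to extract entities.\n\n"
    "Edges connect related entities over time."
)

for chunk in naive_chunks(doc, 60):
    print(repr(chunk))
for chunk in paragraph_chunks(doc, 60):
    print(repr(chunk))
```

With a 60-character limit, the first fixed-size chunk ends mid-word ("...Each episode is proc"), while every paragraph-packed chunk is a complete paragraph.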

Contextualizing Chunks

This is the key step that improves retrieval quality. For each chunk, we ask the LLM to generate a short context that situates it within the full document. This context is prepended to the chunk before it is added to Zep.

Cost optimization: Every chunk request sends the full document, so place the document at the start of the prompt (for example, in the system prompt), where it forms an identical prefix across all chunk requests, and use prompt caching so those tokens are reused rather than reprocessed. For long documents this can substantially reduce latency and input-token cost.

def contextualize_chunk(
    openai_client: OpenAI,
    full_document: str,
    chunk: str
) -> str:
    """
    Use OpenAI to generate context for a chunk within its document.

    Args:
        openai_client: Initialized OpenAI client
        full_document: The complete document text
        chunk: The specific chunk to contextualize

    Returns:
        The contextualized chunk (context prepended to original chunk)
    """
    prompt = f"""<document>
{full_document}
</document>

Here is the chunk we want to situate within the whole document:
<chunk>
{chunk}
</chunk>

Please give a short succinct context to situate this chunk within the overall document for the purposes of improving search retrieval of the chunk. If the document has a publication date, please include the date in your context. Answer only with the succinct context and nothing else."""

    response = openai_client.chat.completions.create(
        model="gpt-5-mini-2025-08-07",
        messages=[{"role": "user", "content": prompt}],
        # GPT-5 models are reasoning models: max_completion_tokens also covers hidden
        # reasoning tokens, so keep reasoning minimal and leave headroom for the answer.
        # (Remove reasoning_effort if you switch to a non-reasoning model.)
        reasoning_effort="minimal",
        max_completion_tokens=1024
    )

    # content can be None or empty (e.g. if the token budget runs out)
    context = (response.choices[0].message.content or "").strip()
    if not context:
        return chunk  # fall back to the raw chunk rather than failing

    # Combine context with original chunk
    return f"{context}\n\n---\n\n{chunk}"
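Following the cost-optimization note above, one way to make the shared document a cacheable prefix is to move it into a system message and keep only the per-chunk text in the user message. This is a sketch, not the cookbook's implementation: build_context_messages and CONTEXT_INSTRUCTIONS are hypothetical names, and the instruction text is copied from contextualize_chunk.

```python
CONTEXT_INSTRUCTIONS = (
    "Please give a short succinct context to situate this chunk within the overall "
    "document for the purposes of improving search retrieval of the chunk. If the "
    "document has a publication date, please include the date in your context. "
    "Answer only with the succinct context and nothing else."
)


def build_context_messages(full_document: str, chunk: str) -> list[dict[str, str]]:
    """Build chat messages with the document first so every chunk request shares a prefix."""
    return [
        # Identical for every chunk of the same document: this is the cacheable prefix
        {"role": "system", "content": f"<document>\n{full_document}\n</document>"},
        # Varies per chunk, so it goes last
        {
            "role": "user",
            "content": (
                "Here is the chunk we want to situate within the whole document:\n"
                f"<chunk>\n{chunk}\n</chunk>\n\n{CONTEXT_INSTRUCTIONS}"
            ),
        },
    ]


first = build_context_messages("Full handbook text...", "Chunk one.")
second = build_context_messages("Full handbook text...", "Chunk two.")
print(first[0] == second[0])  # the system message is shared across chunk requests
```

Pass the result as messages= to openai_client.chat.completions.create. OpenAI applies prompt caching automatically to prompts of at least 1,024 tokens whose prefix matches a recent request, and response.usage.prompt_tokens_details.cached_tokens reports how many input tokens were served from the cache.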

Adding Chunks to Zep

Each contextualized chunk is added to the user's graph using graph.add. The method returns an Episode object whose uuid_ field identifies the episode, so you can record it to track the ingestion.

def add_chunk_to_zep(
    zep_client: Zep,
    user_id: str,
    chunk_data: str
):
    """
    Add a contextualized chunk to Zep's graph.

    Args:
        zep_client: Initialized Zep client
        user_id: The user ID to add data to
        chunk_data: The contextualized chunk text

    Returns:
        The Episode returned by Zep (its uuid_ field identifies the episode)
    """
    episode = zep_client.graph.add(
        user_id=user_id,
        type="text",
        data=chunk_data
    )
    return episode
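Ingesting a long document issues many sequential graph.add calls, so a transient network error or rate limit partway through becomes more likely. A minimal sketch of a generic retry wrapper with exponential backoff follows. with_retries is a hypothetical helper, and the default retry_on=(Exception,) is deliberately broad: narrow it to the transient error types your HTTP client or SDK actually raises.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(
    call: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 1.0,
    retry_on: tuple[type[BaseException], ...] = (Exception,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `call`, retrying with exponential backoff and jitter on the given exceptions."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # Delays of base_delay * 1, 2, 4, ... plus up to 10% random jitter
            delay = base_delay * 2 ** (attempt - 1)
            sleep(delay + random.uniform(0, delay * 0.1))
    raise AssertionError("unreachable")
```

For example: episode = with_retries(lambda: add_chunk_to_zep(zep_client, user_id, contextualized)). The injectable sleep parameter exists so the backoff can be tested without actually waiting.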

Complete Ingestion Pipeline

Here’s how to put it all together:

def ingest_document(
    openai_client: OpenAI,
    zep_client: Zep,
    document_path: str,
    user_id: str,
    chunk_size: int = 500,
    chunk_overlap: int = 50
) -> dict:
    """
    Ingest a document into Zep with contextualized retrieval.

    Args:
        openai_client: Initialized OpenAI client
        zep_client: Initialized Zep client
        document_path: Path to the text document
        user_id: Zep user ID to add the document to
        chunk_size: Maximum characters per chunk
        chunk_overlap: Character overlap between chunks

    Returns:
        Summary statistics of the ingestion
    """
    # Read document
    with open(document_path, 'r', encoding='utf-8') as f:
        full_document = f.read()

    # If document fits in a single request, add directly
    if len(full_document) <= 10000:
        episode = zep_client.graph.add(
            user_id=user_id,
            type="text",
            data=full_document
        )
        return {"total_chunks": 1, "successful": 1, "episodes": [episode.uuid_]}

    # Chunk the document
    chunks = list(chunk_document(full_document, chunk_size, chunk_overlap))

    stats = {"total_chunks": len(chunks), "successful": 0, "episodes": []}

    for chunk_index, chunk_text in chunks:
        # Contextualize the chunk
        contextualized = contextualize_chunk(
            openai_client,
            full_document,
            chunk_text
        )

        # Enforce the 10,000-character limit. The context is prepended, so keeping the
        # last 10,000 characters trims the context and preserves the chunk itself.
        if len(contextualized) > 10000:
            contextualized = contextualized[-10000:]

        # Add to Zep; record failures instead of aborting the whole ingestion
        try:
            episode = add_chunk_to_zep(zep_client, user_id, contextualized)
        except Exception as exc:
            print(f"Chunk {chunk_index} failed: {exc}")
            continue
        stats["successful"] += 1
        stats["episodes"].append(episode.uuid_)

    return stats
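When a contextualized chunk exceeds 10,000 characters, the pipeline above keeps only the last 10,000 characters, which trims the front of the context and can leave it starting mid-sentence. A sketch of an alternative that shortens the context at a word boundary and never touches the chunk follows. It assumes contextualize_chunk is changed to return only the generated context; fit_to_limit and SEPARATOR are hypothetical names.

```python
SEPARATOR = "\n\n---\n\n"  # same separator contextualize_chunk places between context and chunk


def fit_to_limit(context: str, chunk: str, limit: int = 10_000) -> str:
    """Combine context and chunk, trimming the context (never the chunk) to fit `limit`."""
    if len(chunk) > limit:
        raise ValueError(f"chunk alone is {len(chunk)} chars; re-chunk with a smaller chunk_size")
    combined = f"{context}{SEPARATOR}{chunk}"
    if len(combined) <= limit:
        return combined
    budget = limit - len(chunk) - len(SEPARATOR)
    if budget <= 0:
        return chunk  # no room for any context: send the raw chunk
    # Keep the start of the context, cut back to the last space so no word is split
    trimmed = context[:budget]
    cut = trimmed.rfind(" ")
    if cut > 0:
        trimmed = trimmed[:cut]
    return f"{trimmed}{SEPARATOR}{chunk}"
```

Keeping the beginning of the context is a design choice: the model's opening sentence usually carries the most important situating information.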

Usage Example

# Create the user once; skip this call if the user already exists
user_id = "user123"
zep_client.user.add(user_id=user_id)

# Ingest a document
stats = ingest_document(
    openai_client=openai_client,
    zep_client=zep_client,
    document_path="company_handbook.txt",
    user_id=user_id,
    chunk_size=500,
    chunk_overlap=50
)

print(f"Ingested {stats['successful']} of {stats['total_chunks']} chunks")

Best Practices

Chunk size: Use 500 characters or less for optimal graph construction. Although the 10,000-character limit allows larger chunks, smaller, focused chunks let Zep capture more granular entities and relationships, which yields a richer knowledge graph.
Chunk overlap: 50 characters helps maintain continuity between chunks without excessive redundancy.
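Smaller chunks mean more of them, and in this pipeline each chunk costs one LLM call and one graph.add call. The sketch below gives a rough, idealized estimate, assuming each chunk after the first contributes chunk_size - chunk_overlap new characters; chunk_document packs whole paragraphs, so real counts will differ. estimate_chunk_count is a hypothetical helper.

```python
import math


def estimate_chunk_count(doc_chars: int, chunk_size: int = 500, chunk_overlap: int = 50) -> int:
    """Idealized fixed-stride estimate of how many chunks a document produces."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    if doc_chars <= 0:
        return 0
    if doc_chars <= chunk_size:
        return 1
    stride = chunk_size - chunk_overlap  # new characters contributed by each extra chunk
    return 1 + math.ceil((doc_chars - chunk_size) / stride)


print(estimate_chunk_count(100_000))                   # defaults: 500 / 50
print(estimate_chunk_count(100_000, chunk_size=2000))  # larger chunks, fewer calls
```

For a 100,000-character document the defaults give roughly 223 chunks (so about 223 LLM calls and 223 graph.add calls), versus 52 with chunk_size=2000. Note that ingest_document adds documents of 10,000 characters or fewer in a single call, so the estimate only applies to longer documents.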