Chunking Large Documents with Contextualized Retrieval

Ingest documents larger than 10,000 characters using semantic chunking and LLM-powered contextualization

The graph.add endpoint has a 10,000 character limit per request. For larger documents, you need to chunk the content before ingestion. Simply splitting text can lose important context, so this cookbook demonstrates how to use contextualized retrieval—a technique where an LLM situates each chunk within the broader document before adding it to Zep.

This approach produces richer knowledge graphs with better entity and relationship extraction compared to naive chunking.

View the complete source code on GitHub: Python | TypeScript | Go

Overview

The ingestion pipeline follows these steps:

  1. Read the document from a text file
  2. Chunk the document into smaller pieces using paragraph-aware splitting
  3. Contextualize each chunk using an LLM to add situational context
  4. Add each chunk to Zep via graph.add

Setup

Install the required dependencies:

$ pip install zep-cloud openai python-dotenv

Initialize the clients:

import os
from openai import OpenAI
from zep_cloud.client import Zep
from dotenv import load_dotenv

load_dotenv()

openai_client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY"))
zep_client = Zep(api_key=os.environ.get("ZEP_API_KEY"))

Chunking the Document

Alternative chunking libraries: If you prefer using an established library over the custom implementation below, consider LangChain, LlamaIndex, Unstructured, or Chonkie.
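
For instance, LangChain's recursive splitter implements the same paragraph-then-sentence strategy in a few lines. A minimal sketch, assuming the langchain-text-splitters package is installed, with parameters mirroring this cookbook's defaults:

from langchain_text_splitters import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(
    chunk_size=500,    # same character budget used throughout this cookbook
    chunk_overlap=50,  # same overlap as below
    separators=["\n\n", "\n", ". ", " ", ""]  # try paragraph breaks first, then sentences
)
chunks = splitter.split_text(full_document)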

The chunking algorithm splits text at paragraph boundaries first, then falls back to sentence boundaries for long paragraphs. This preserves semantic coherence better than fixed-size splitting.

import re
from typing import Generator


def chunk_document(
    text: str,
    chunk_size: int = 500,
    chunk_overlap: int = 50
) -> Generator[tuple[int, str], None, None]:
    """
    Split a document into chunks with configurable size and overlap.

    Args:
        text: The full document text
        chunk_size: Maximum characters per chunk (default 500)
        chunk_overlap: Characters to overlap between chunks for continuity

    Yields:
        Tuple of (chunk_index, chunk_text)
    """
    if not text:
        return

    text = text.strip()
    paragraphs = text.split('\n\n')

    current_chunk = ""
    chunk_index = 0

    for paragraph in paragraphs:
        paragraph = paragraph.strip()
        if not paragraph:
            continue

        # If adding this paragraph would exceed chunk_size, yield the current chunk
        if len(current_chunk) + len(paragraph) + 2 > chunk_size:
            if current_chunk:
                yield (chunk_index, current_chunk.strip())
                chunk_index += 1

            # Start the new chunk with overlap from the previous one,
            # trimmed to the nearest word boundary
            if chunk_overlap > 0 and len(current_chunk) > chunk_overlap:
                overlap_text = current_chunk[-chunk_overlap:]
                first_space = overlap_text.find(' ')
                if first_space > 0:
                    overlap_text = overlap_text[first_space + 1:]
                current_chunk = overlap_text + "\n\n"
            else:
                current_chunk = ""

            # Handle single paragraphs longer than chunk_size
            if len(paragraph) > chunk_size:
                for sub_chunk in split_long_paragraph(paragraph, chunk_size, chunk_overlap):
                    yield (chunk_index, sub_chunk)
                    chunk_index += 1
                current_chunk = ""
            else:
                current_chunk += paragraph
        else:
            if current_chunk:
                current_chunk += "\n\n" + paragraph
            else:
                current_chunk = paragraph

    # Yield the final chunk
    if current_chunk.strip():
        yield (chunk_index, current_chunk.strip())


def split_long_paragraph(
    paragraph: str,
    chunk_size: int,
    chunk_overlap: int
) -> Generator[str, None, None]:
    """Split a long paragraph by sentences."""
    sentences = re.split(r'(?<=[.!?])\s+', paragraph)
    current_chunk = ""

    for sentence in sentences:
        if len(current_chunk) + len(sentence) + 1 > chunk_size:
            if current_chunk:
                yield current_chunk.strip()
            if chunk_overlap > 0:
                # Carry a word-aligned tail of the previous chunk forward
                overlap = current_chunk[-chunk_overlap:]
                first_space = overlap.find(' ')
                if first_space > 0:
                    current_chunk = overlap[first_space + 1:] + " "
                else:
                    current_chunk = ""
            else:
                current_chunk = ""
        current_chunk += sentence + " "

    if current_chunk.strip():
        yield current_chunk.strip()
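
A quick sanity check of the splitter on a small, hypothetical sample (the sizes below are chosen to force both a paragraph split and a sentence-level split):

sample = (
    "Zep is a memory layer for AI agents. It builds a knowledge graph from your data.\n\n"
    "Contextual retrieval situates each chunk within its source document. "
    "This improves entity and relationship extraction during ingestion."
)

for index, chunk in chunk_document(sample, chunk_size=120, chunk_overlap=20):
    print(f"chunk {index}: {len(chunk)} chars")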

Contextualizing Chunks

This is the key step that improves retrieval quality. For each chunk, we ask the LLM to generate a short context that situates it within the full document. This context is prepended to the chunk before adding to Zep.

Cost optimization: When contextualizing many chunks of the same document, take advantage of prompt caching. Because the full document appears first in every prompt, providers such as OpenAI automatically cache that shared prefix and reuse it across chunk requests, significantly reducing inference time and cost.

def contextualize_chunk(
    openai_client: OpenAI,
    full_document: str,
    chunk: str
) -> str:
    """
    Use OpenAI to generate context for a chunk within its document.

    Args:
        openai_client: Initialized OpenAI client
        full_document: The complete document text
        chunk: The specific chunk to contextualize

    Returns:
        The contextualized chunk (context prepended to original chunk)
    """
    prompt = f"""<document>
{full_document}
</document>

Here is the chunk we want to situate within the whole document:
<chunk>
{chunk}
</chunk>

Please give a short succinct context to situate this chunk within the overall document for the purposes of improving search retrieval of the chunk. If the document has a publication date, please include the date in your context. Answer only with the succinct context and nothing else."""

    response = openai_client.chat.completions.create(
        model="gpt-5-mini-2025-08-07",
        messages=[{"role": "user", "content": prompt}],
        max_completion_tokens=256
    )

    context = response.choices[0].message.content.strip()

    # Combine context with original chunk
    return f"{context}\n\n---\n\n{chunk}"
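
To confirm the cache is actually being hit across chunks, you can log cached token counts inside contextualize_chunk after the API call. A sketch, assuming the usage object's prompt_tokens_details field as exposed by the OpenAI Python SDK:

# Inside contextualize_chunk, after the create() call:
usage = response.usage
if usage and usage.prompt_tokens_details:
    # After the first chunk, most of the shared document prefix should be cached
    print(f"prompt tokens: {usage.prompt_tokens}, "
          f"cached: {usage.prompt_tokens_details.cached_tokens}")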

Adding Chunks to Zep

Each contextualized chunk is added to the user's graph using graph.add. The method returns an episode object whose UUID can be used to track ingestion.

def add_chunk_to_zep(
    zep_client: Zep,
    user_id: str,
    chunk_data: str
) -> dict:
    """
    Add a contextualized chunk to Zep's graph.

    Args:
        zep_client: Initialized Zep client
        user_id: The user ID to add data to
        chunk_data: The contextualized chunk text

    Returns:
        The episode response from Zep
    """
    episode = zep_client.graph.add(
        user_id=user_id,
        type="text",
        data=chunk_data
    )
    return episode
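
Zep processes episodes asynchronously, so a chunk may not be queryable the moment graph.add returns. If you need to block until processing finishes, you can poll the episode. A sketch, assuming the SDK's graph.episode.get method and the episode's processed flag:

import time

def wait_for_episode(zep_client: Zep, episode_uuid: str, timeout: float = 60.0) -> bool:
    """Poll until Zep marks the episode as processed, or the timeout elapses."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        episode = zep_client.graph.episode.get(uuid_=episode_uuid)
        if episode.processed:
            return True
        time.sleep(2)
    return False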

Complete Ingestion Pipeline

Here’s how to put it all together:

def ingest_document(
    openai_client: OpenAI,
    zep_client: Zep,
    document_path: str,
    user_id: str,
    chunk_size: int = 500,
    chunk_overlap: int = 50
) -> dict:
    """
    Ingest a document into Zep with contextualized retrieval.

    Args:
        openai_client: Initialized OpenAI client
        zep_client: Initialized Zep client
        document_path: Path to the text document
        user_id: Zep user ID to add the document to
        chunk_size: Maximum characters per chunk
        chunk_overlap: Character overlap between chunks

    Returns:
        Summary statistics of the ingestion
    """
    # Read document
    with open(document_path, 'r', encoding='utf-8') as f:
        full_document = f.read()

    # If the document fits in a single request, add it directly
    if len(full_document) <= 10000:
        episode = zep_client.graph.add(
            user_id=user_id,
            type="text",
            data=full_document
        )
        return {"total_chunks": 1, "successful": 1, "episodes": [episode.uuid_]}

    # Chunk the document
    chunks = list(chunk_document(full_document, chunk_size, chunk_overlap))

    stats = {"total_chunks": len(chunks), "successful": 0, "episodes": []}

    for chunk_index, chunk_text in chunks:
        # Contextualize the chunk
        contextualized = contextualize_chunk(
            openai_client,
            full_document,
            chunk_text
        )

        # Validate size after contextualization; trim from the front so the
        # generated context is shortened and the original chunk is preserved
        if len(contextualized) > 10000:
            excess = len(contextualized) - 10000
            contextualized = contextualized[excess:]

        # Add to Zep
        episode = add_chunk_to_zep(zep_client, user_id, contextualized)
        stats["successful"] += 1
        stats["episodes"].append(episode.uuid_)

    return stats
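
The pipeline above calls graph.add once per chunk with no error handling, so a single transient network failure aborts the whole run. A minimal retry wrapper you could substitute for the direct call, using only the standard library (the backoff constants are arbitrary):

import time

def add_with_retry(zep_client: Zep, user_id: str, chunk_data: str, retries: int = 3):
    """Retry add_chunk_to_zep with exponential backoff, re-raising on the last attempt."""
    for attempt in range(retries):
        try:
            return add_chunk_to_zep(zep_client, user_id, chunk_data)
        except Exception:
            if attempt == retries - 1:
                raise
            time.sleep(2 ** attempt)  # 1s, 2s, 4s, ...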

Usage Example

# Ensure the user exists
user_id = "user123"
zep_client.user.add(user_id=user_id)

# Ingest a document
stats = ingest_document(
    openai_client=openai_client,
    zep_client=zep_client,
    document_path="company_handbook.txt",
    user_id=user_id,
    chunk_size=500,
    chunk_overlap=50
)

print(f"Ingested {stats['successful']} of {stats['total_chunks']} chunks")
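
Once ingestion completes and the episodes are processed, a quick graph search confirms the contextualized chunks are retrievable. The query and scope below are illustrative:

results = zep_client.graph.search(
    user_id=user_id,
    query="vacation policy",
    scope="edges",  # search extracted relationships; "nodes" is also available
    limit=10
)

for edge in results.edges or []:
    print(edge.fact)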

Best practices

  • Chunk size: Use 500 characters or less. Although graph.add accepts up to 10,000 characters per request, smaller, focused chunks let Zep extract more granular entities and relationships, yielding richer knowledge graphs.
  • Chunk overlap: 50 characters maintains continuity between chunks without excessive redundancy.

Further Reading