Chunking Large Documents with Contextualized Retrieval
Ingest documents larger than 10,000 characters using semantic chunking and LLM-powered contextualization
The graph.add endpoint has a 10,000 character limit per request. For larger documents, you need to chunk the content before ingestion. Simply splitting text can lose important context, so this cookbook demonstrates how to use contextualized retrieval—a technique where an LLM situates each chunk within the broader document before adding it to Zep.
This approach produces richer knowledge graphs with better entity and relationship extraction compared to naive chunking.
View the complete source code on GitHub: Python | TypeScript | Go
Overview
The ingestion pipeline follows these steps:
- Read the document from a text file
- Chunk the document into smaller pieces using paragraph-aware splitting
- Contextualize each chunk using an LLM to add situational context
- Add each chunk to Zep via graph.add
Setup
Install the required dependencies:
Initialize the clients:
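A minimal Python sketch of the setup, assuming the zep-cloud and anthropic SDKs with API keys supplied via environment variables (adjust to your secret store):

```python
# pip install zep-cloud anthropic

import os

from anthropic import Anthropic
from zep_cloud.client import Zep

# Zep client for graph ingestion; Anthropic client for contextualization.
zep = Zep(api_key=os.environ["ZEP_API_KEY"])
llm = Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])
```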
Chunking the Document
Alternative chunking libraries: If you prefer using an established library over the custom implementation below, consider LangChain, LlamaIndex, Unstructured, or Chonkie.
The chunking algorithm splits text at paragraph boundaries first, then falls back to sentence boundaries for long paragraphs. This preserves semantic coherence better than fixed-size splitting.
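A sketch of this strategy in Python. The helper name and defaults are illustrative, not the cookbook's exact code, and overlap handling is omitted for brevity:

```python
import re

def chunk_text(text: str, max_chars: int = 500) -> list[str]:
    """Split text into chunks of at most max_chars, preferring paragraph
    boundaries and falling back to sentence boundaries for long paragraphs.
    (Sentences longer than max_chars are kept whole.)"""
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    chunks: list[str] = []
    current = ""

    def flush() -> None:
        nonlocal current
        if current:
            chunks.append(current)
            current = ""

    for para in paragraphs:
        if len(para) > max_chars:
            # Long paragraph: pack its sentences into chunks instead.
            flush()
            for sent in re.split(r"(?<=[.!?])\s+", para):
                if current and len(current) + 1 + len(sent) > max_chars:
                    flush()
                current = f"{current} {sent}".strip()
        elif current and len(current) + 2 + len(para) > max_chars:
            # Paragraph fits on its own, but not with the current chunk.
            flush()
            current = para
        else:
            # Merge short paragraphs, keeping the blank-line boundary.
            current = f"{current}\n\n{para}" if current else para
    flush()
    return chunks
```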
Contextualizing Chunks
This is the key step that improves retrieval quality. For each chunk, we ask the LLM to generate a short context that situates it within the full document. This context is prepended to the chunk before adding to Zep.
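A sketch of this step, assuming an Anthropic-style chat client is passed in as `llm`; the prompt wording and model name are illustrative:

```python
CONTEXT_PROMPT = """Here is the chunk we want to situate within the whole document:
<chunk>
{chunk}
</chunk>

Give a short, succinct context that situates this chunk within the overall
document, to improve search retrieval of the chunk. Answer only with the
succinct context and nothing else."""

def contextualize_chunk(llm, document: str, chunk: str,
                        model: str = "claude-3-5-haiku-latest") -> str:
    """Generate a short situating context for a chunk and prepend it.

    The full document goes in the system prompt so every request sees
    the chunk against the whole document."""
    response = llm.messages.create(
        model=model,
        max_tokens=200,
        system=f"<document>\n{document}\n</document>",
        messages=[{"role": "user",
                   "content": CONTEXT_PROMPT.format(chunk=chunk)}],
    )
    context = response.content[0].text.strip()
    return f"{context}\n\n{chunk}"
```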
Cost optimization: When contextualizing many chunks from the same document, use prompt caching to cache the full document in the system prompt. This significantly reduces inference time and cost since the document tokens are reused across chunk requests.
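With Anthropic's API, for example, the document prefix can be marked cacheable via a cache_control block on the system prompt (a sketch; the helper and model names are illustrative):

```python
def contextualize_chunk_cached(llm, document: str, chunk: str,
                               model: str = "claude-3-5-haiku-latest") -> str:
    """Contextualize a chunk, caching the document across chunk requests."""
    response = llm.messages.create(
        model=model,
        max_tokens=200,
        system=[{
            "type": "text",
            "text": f"<document>\n{document}\n</document>",
            # Anthropic-style prompt caching: this prefix is cached and
            # reused by subsequent requests that share it.
            "cache_control": {"type": "ephemeral"},
        }],
        messages=[{
            "role": "user",
            "content": ("Situate this chunk within the document above, "
                        f"succinctly:\n<chunk>\n{chunk}\n</chunk>"),
        }],
    )
    return f"{response.content[0].text.strip()}\n\n{chunk}"
```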
Adding Chunks to Zep
Each contextualized chunk is added to the user’s graph using graph.add. The method returns an episode object that can be used to track the ingestion.
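A sketch assuming the zep-cloud Python SDK, where `zep` is an initialized client:

```python
def add_chunk(zep, user_id: str, chunk: str):
    """Add one contextualized chunk to the user's graph as a text episode."""
    episode = zep.graph.add(
        user_id=user_id,
        type="text",  # plain-text episode
        data=chunk,   # must be under the 10,000-character limit
    )
    # The returned episode object can be used to track ingestion.
    return episode
```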
Complete Ingestion Pipeline
Here’s how to put it all together:
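One way to wire the steps together (a sketch; `chunker` and `contextualizer` stand in for the chunking and contextualization functions from the earlier sections):

```python
def ingest_document(zep, user_id: str, text: str,
                    chunker, contextualizer) -> int:
    """Chunk, contextualize, and add a document to a user's graph.

    chunker(text) returns a list of chunks;
    contextualizer(document, chunk) returns the contextualized chunk."""
    chunks = chunker(text)
    for i, chunk in enumerate(chunks):
        contextualized = contextualizer(text, chunk)
        zep.graph.add(user_id=user_id, type="text", data=contextualized)
        print(f"Added chunk {i + 1}/{len(chunks)}")
    return len(chunks)
```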
Usage Example
Best practices
- Chunk size: Use 500 characters or less for optimal graph construction. Although graph.add accepts up to 10,000 characters per request, smaller, focused chunks let Zep capture more granular entities and relationships, yielding richer knowledge graphs.
- Chunk overlap: 50 characters of overlap helps maintain continuity between chunks without excessive redundancy.
Further Reading
- Adding Business Data - Learn about the graph.add endpoint and data types
- Adding Batch Data - For ingesting large datasets more efficiently
- Performance Best Practices - Optimization tips for data ingestion