Chunking Large Documents with Contextualized Retrieval
Ingest documents larger than 10,000 characters using semantic chunking and LLM-powered contextualization
Ingest documents larger than 10,000 characters using semantic chunking and LLM-powered contextualization
The graph.add endpoint has a 10,000 character limit per request. For larger documents, you need to chunk the content before ingestion. Simply splitting text can lose important context, so this cookbook demonstrates how to use contextualized retrieval—a technique where an LLM situates each chunk within the broader document before adding it to Zep.
This approach produces richer knowledge graphs with better entity and relationship extraction compared to naive chunking.
View the complete source code on GitHub: Python | TypeScript | Go
The ingestion pipeline follows these steps:
graph.addInstall the required dependencies:
Initialize the clients:
Alternative chunking libraries: If you prefer using an established library over the custom implementation below, consider LangChain, LlamaIndex, Unstructured, or Chonkie.
The chunking algorithm splits text at paragraph boundaries first, then falls back to sentence boundaries for long paragraphs. This preserves semantic coherence better than fixed-size splitting.
This is the key step that improves retrieval quality. For each chunk, we ask the LLM to generate a short context that situates it within the full document. This context is prepended to the chunk before adding to Zep.
Cost optimization: When contextualizing many chunks from the same document, use prompt caching to cache the full document in the system prompt. This reduces inference time and cost since the document tokens are reused across chunk requests.
Each contextualized chunk is added to the user’s graph using graph.add. The method returns an episode object that can be used to track the ingestion.
Here’s how to put it all together:
graph.add endpoint and data types