Retrieval philosophy | Zep Documentation

Zep’s retrieval system is designed with two primary goals: high recall and low latency. This is a deliberate architectural choice that differs from systems optimized for precision.

Understanding recall vs. precision

Think of recall and precision as two different ways to measure retrieval quality:

Recall measures completeness: “Did we find all the relevant information?”
Precision measures exactness: “Is everything we returned actually relevant?”

In practical terms:

High recall means you retrieve nearly everything relevant, but may also get some less relevant results
High precision means nearly everything returned is relevant, but you may miss important information
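
To make the two measures concrete, here is a minimal sketch of the standard definitions: precision is the share of returned items that are relevant, and recall is the share of relevant items that were returned. The function name `precision_recall` and the sample item IDs are illustrative assumptions, not part of Zep.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    retrieved: items the search returned
    relevant:  items that are actually relevant (ground truth)
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical ground truth: four relevant items.
relevant = {"a", "b", "c", "d"}

# A broad result set finds everything relevant, plus two extras.
broad_scores = precision_recall(["a", "b", "c", "d", "x", "y"], relevant)

# A narrow result set returns only relevant items, but misses half of them.
narrow_scores = precision_recall(["a", "b"], relevant)

print(broad_scores, narrow_scores)
```

The broad set has perfect recall with lower precision; the narrow set has perfect precision with lower recall. That is the tradeoff the rest of this page discusses.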

The tradeoff in practice

Approach	What You Get	What You Risk	Best For
Optimize for Recall (Zep’s approach)	All relevant facts, plus some less relevant results	Larger context with some noise	Agents that need complete information to make decisions; real-time applications
Optimize for Precision	Only highly relevant results	Missing critical facts that could cause task failure	Use cases where context size is severely constrained; manual review workflows

Example scenario

User query: “What did we discuss about the Q2 marketing budget?”

Retrieval Approach	Results Returned	Outcome
Recall-Optimized (Zep)	• Q2 marketing budget discussion ✓ • Related Q2 sales projections ✓ • Q3 budget planning mention ⚠️ • Q2 hiring costs mentioning marketing ⚠️	Agent has complete context, including tangentially related information. Can successfully answer follow-up questions about budget revisions.
Precision-Optimized	• Q2 marketing budget discussion ✓ • Related Q2 sales projections ✓	Clean, focused results, but missing a separate conversation about budget revisions that didn’t explicitly mention “marketing budget.” Agent may provide incomplete information.

Why recall over precision?

Agents need comprehensive context to make informed decisions. Missing a critical fact can cause an agent to fail its task or provide incorrect information. By optimizing for recall, Zep ensures that relevant information is available to the agent, even if that means returning more results than strictly necessary.

The underlying principle: it’s better to provide complete information and let the agent or downstream LLM filter what’s relevant than to risk omitting something important.

Why latency matters

Real-time applications like conversational AI, live customer support, and interactive agents require fast responses. Zep’s retrieval architecture is optimized to return results in milliseconds, enabling seamless user experiences without perceptible delays.

Tuning the recall-precision tradeoff

The recall-optimized approach described here is how Zep is tuned out of the box. However, Zep provides several mechanisms to adjust this tradeoff for different use cases:

Limit search results: Control the maximum number of results returned
Apply filters: Narrow retrieval to specific time ranges, Entity and/or Edge labels, or other criteria
Adjust search parameters: Fine-tune ranking and relevance thresholds

These controls allow you to shift toward precision when your application demands it, while maintaining Zep’s fast retrieval performance.
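
As a rough illustration of how these three controls interact, the sketch below applies a score threshold, a label filter, a time filter, and a result limit to an in-memory list. It is not Zep's API: the `Result` class, the `narrow` function, and all scores, labels, and dates are hypothetical, chosen only to mirror the example scenario above. In practice you would pass the equivalent options to Zep's search rather than filter client-side.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List, Optional, Set


@dataclass
class Result:
    """Hypothetical search result; field names are assumptions."""
    text: str
    score: float          # relevance score, higher is better
    label: str            # stand-in for an entity or edge label
    created_at: datetime


def narrow(
    results: Iterable[Result],
    *,
    limit: Optional[int] = None,
    min_score: float = 0.0,
    labels: Optional[Set[str]] = None,
    since: Optional[datetime] = None,
) -> List[Result]:
    """Shift a recall-oriented result list toward precision."""
    kept = [
        r for r in results
        if r.score >= min_score
        and (labels is None or r.label in labels)
        and (since is None or r.created_at >= since)
    ]
    kept.sort(key=lambda r: r.score, reverse=True)
    return kept if limit is None else kept[:limit]


# Made-up scores, labels, and dates for illustration only.
results = [
    Result("Q2 marketing budget discussion", 0.92, "Budget", datetime(2024, 4, 10)),
    Result("Related Q2 sales projections", 0.81, "Sales", datetime(2024, 4, 12)),
    Result("Q3 budget planning mention", 0.55, "Budget", datetime(2024, 6, 20)),
    Result("Q2 hiring costs mentioning marketing", 0.48, "Hiring", datetime(2024, 5, 2)),
]

focused = narrow(results, limit=2, min_score=0.5)
budget_only = narrow(results, labels={"Budget"})
```

Tightening `limit` and `min_score` trims the tangential results, while leaving them at their defaults keeps the full recall-oriented set.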

Balancing context size

While recall is the priority, Zep still considers token count when returning results. It balances the size of the resulting context against the goal of providing complete information and, when in doubt, errs on the side of giving your agent what it needs to succeed.
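
If your application needs a hard cap on context size, one option is to trim the ranked results yourself. The sketch below is an assumption-laden illustration, not a description of how Zep sizes its context: `estimate_tokens` is a crude word count standing in for a real tokenizer, and `fit_to_budget` and the sample facts are hypothetical.

```python
from typing import List


def estimate_tokens(text: str) -> int:
    """Crude token estimate (whitespace-separated words).

    Assumption: a real tokenizer would give different counts.
    """
    return len(text.split())


def fit_to_budget(ranked_facts: List[str], max_tokens: int) -> List[str]:
    """Keep facts in ranked order until the token budget is used up.

    Facts that do not fit are skipped, so shorter, lower-ranked facts
    can still fill the remaining space.
    """
    selected: List[str] = []
    used = 0
    for fact in ranked_facts:
        cost = estimate_tokens(fact)
        if used + cost > max_tokens:
            continue
        selected.append(fact)
        used += cost
    return selected


# Made-up facts, ordered from most to least relevant.
facts = [
    "Q2 marketing budget was approved in April",
    "Sales projections for Q2 were revised upward",
    "Hiring costs rose",
]

context = fit_to_budget(facts, max_tokens=10)
```

With a budget of 10 estimated tokens, the first fact (7) fits, the second (7) does not, and the third (3) fills the remaining space. Skipping rather than stopping at the first oversized fact keeps as much information as the budget allows.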