Retrieval philosophy
Zep's retrieval system is designed with two primary goals: high recall and low latency. This is a deliberate architectural choice that differs from systems optimized for precision.
Understanding recall vs. precision
Think of recall and precision as two different ways to measure retrieval quality:
- Recall measures completeness: "Did we find all the relevant information?"
- Precision measures accuracy: "Is everything we returned actually relevant?"
In practical terms:
- High recall means you get all the relevant results, but might also get some less relevant ones
- High precision means everything returned is highly relevant, but you might miss important information
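The two measures can be made concrete with a small calculation. The sketch below uses the standard set-based definitions; the fact IDs and the `precision_recall` helper are hypothetical and exist only for illustration.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a single query.

    precision = relevant items returned / all items returned
    recall    = relevant items returned / all relevant items
    """
    retrieved_set = set(retrieved)
    relevant_set = set(relevant)
    hits = retrieved_set & relevant_set
    precision = len(hits) / len(retrieved_set) if retrieved_set else 0.0
    recall = len(hits) / len(relevant_set) if relevant_set else 0.0
    return precision, recall


# Hypothetical IDs of the facts that truly answer the query.
relevant = {"f1", "f2", "f3", "f4"}

# A recall-oriented result set: everything relevant, plus two extras.
broad = ["f1", "f2", "f3", "f4", "f7", "f9"]

# A precision-oriented result set: only sure hits, but two facts are missed.
narrow = ["f1", "f2"]

broad_p, broad_r = precision_recall(broad, relevant)
narrow_p, narrow_r = precision_recall(narrow, relevant)

print(f"broad:  precision={broad_p:.2f} recall={broad_r:.2f}")
print(f"narrow: precision={narrow_p:.2f} recall={narrow_r:.2f}")
```

The broad set reaches full recall at the cost of some precision. The narrow set is perfectly precise but drops half of the relevant facts.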
The tradeoff in practice
Example scenario
User query: "What did we discuss about the Q2 marketing budget?"
Why recall over precision?
Agents need comprehensive context to make informed decisions. Missing a critical fact can cause an agent to fail its task or provide incorrect information. By optimizing for recall, Zep ensures that relevant information is available to the agent, even if that means returning more results than strictly necessary.
The underlying principle: it's better to provide complete information and let the agent or downstream LLM filter what's relevant than to risk omitting something important.
Why latency matters
Real-time applications like conversational AI, live customer support, and interactive agents require fast responses. Zep's retrieval architecture is optimized to return results in milliseconds, enabling seamless user experiences without perceptible delays.
Tuning the recall-precision tradeoff
The recall-optimized approach described here is how Zep is tuned out of the box. However, Zep provides several mechanisms to adjust this tradeoff for different use cases:
- Limit search results: Control the maximum number of results returned
- Apply filters: Narrow retrieval to specific time ranges, Entity and/or Edge labels, or other criteria
- Adjust search parameters: Fine-tune ranking and relevance thresholds
These controls allow you to shift toward precision when your application demands it, while maintaining Zep's fast retrieval performance.
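To show how a result limit, a filter, and a relevance threshold each trade recall for precision, here is a minimal, library-agnostic sketch. It does not use Zep's API: the `Result` shape, the `narrow` helper, and all parameter names are assumptions made for this example, and the sample data is invented.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Result:
    # Hypothetical result shape for illustration; not Zep's response type.
    text: str
    score: float
    labels: set = field(default_factory=set)
    created_at: datetime = datetime(1970, 1, 1)


def narrow(results, limit=None, min_score=0.0, labels=None, since=None):
    """Shift a broad result list toward precision.

    - min_score drops results below a relevance threshold
    - labels keeps results that carry at least one of the given labels
    - since keeps results created at or after the given time
    - limit caps how many results are returned, highest score first
    """
    kept = [
        r for r in results
        if r.score >= min_score
        and (labels is None or r.labels & set(labels))
        and (since is None or r.created_at >= since)
    ]
    kept.sort(key=lambda r: r.score, reverse=True)
    return kept if limit is None else kept[:limit]


# Invented sample data.
results = [
    Result("Q2 marketing budget discussed", 0.92, {"Budget"}, datetime(2024, 4, 2)),
    Result("Q1 marketing recap", 0.55, {"Report"}, datetime(2024, 1, 15)),
    Result("Q2 budget approved by finance", 0.81, {"Budget"}, datetime(2024, 4, 10)),
    Result("Team lunch scheduled", 0.20, {"Event"}, datetime(2024, 4, 5)),
]

focused = narrow(
    results,
    limit=2,
    min_score=0.5,
    labels={"Budget"},
    since=datetime(2024, 4, 1),
)
for r in focused:
    print(r.score, r.text)
```

Each control removes candidates, so each one can only raise precision or lower recall, never the reverse. Calling `narrow(results)` with no arguments returns everything, which corresponds to the recall-first default.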
Balancing context size
While recall is our priority, Zep does consider token count when returning results. We balance the size of the resulting context with the goal of providing complete information, but when in doubt, we err on the side of ensuring your agent has what it needs to succeed.
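If your application has a hard limit on context size, one option is to trim ranked results to a token budget on the client side. The sketch below is an assumption-laden illustration and does not describe how Zep balances context size internally: `fit_to_budget` is a hypothetical helper, and counting whitespace-separated words is a rough stand-in for a real tokenizer.

```python
def fit_to_budget(facts, max_tokens, count_tokens=lambda s: len(s.split())):
    """Keep facts in ranked order until the next one would exceed the budget.

    facts is assumed to be ordered from most to least relevant.
    count_tokens defaults to a whitespace word count, which only
    approximates a model tokenizer.
    """
    selected, used = [], 0
    for fact in facts:
        cost = count_tokens(fact)
        if used + cost > max_tokens:
            break  # stop at the first fact that does not fit
        selected.append(fact)
        used += cost
    return selected, used


# Invented ranked facts, most relevant first.
ranked = [
    "alpha beta gamma",       # 3 words
    "delta epsilon",          # 2 words
    "zeta eta theta iota",    # 4 words
]

selected, used = fit_to_budget(ranked, max_tokens=6)
print(selected, used)
```

Stopping at the first fact that does not fit preserves ranking order. Swapping `break` for `continue` would instead pack in smaller, lower-ranked facts, which favors recall within the same budget.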