Zep helps developers search through long-term memory stores to find relevant historical conversations efficiently. With automated embedding and advanced indexing, Zep offers robust search capabilities that are straightforward and effective.
Searching for Messages or Summaries
Zep enables vector similarity searches for Messages or Summaries stored within its system. This feature lets you populate prompts with past conversations that are contextually similar to a specific query, with results ranked by similarity Score.
Choosing Between Summaries and Messages
Zep supports searches for both Messages and Summaries. Since individual messages might miss some conversational context, Summaries are often the preferred choice for executing searches. For more on this, check out the section on message limitations.
MMR Reranking for Summaries
Summaries can sometimes overlap in information, especially when the Message Window is set low. In such cases, reranking search results with Maximum Marginal Relevance (MMR) can be beneficial. Zep includes built-in, hardware-accelerated support for MMR, making it straightforward to use.
Constructing Search Queries
Zep’s Collection and Memory search support semantic search queries, JSONPath-based metadata filters, and a combination of both.
Memory search also supports querying by message creation date.
Read more about constructing search queries.
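The SDK snippets for this page were not captured here. As a rough illustration of the request described above, the sketch below builds a Memory search request for Summaries with MMR reranking. The endpoint path, field names (`search_scope`, `search_type`, `mmr_lambda`), and session ID are illustrative assumptions, not verified API details.

```python
import json
from urllib.parse import urlencode


def build_memory_search_request(base_url: str, session_id: str, query: str,
                                limit: int = 3) -> tuple:
    """Build the URL and JSON body for a Summary search with MMR reranking.

    The endpoint path and payload fields are illustrative assumptions,
    not a verified Zep API reference.
    """
    url = f"{base_url}/api/v1/sessions/{session_id}/search?" + urlencode({"limit": limit})
    body = json.dumps({
        "text": query,              # the semantic search query
        "search_scope": "summary",  # search Summaries rather than Messages
        "search_type": "mmr",       # rerank results with MMR
        "mmr_lambda": 0.5,          # balance relevance against diversity
    })
    return url, body


url, body = build_memory_search_request(
    "http://localhost:8000", "session-123",
    "What did we decide about the launch date?",
)
```

Sending the request (and authentication) is omitted; the point is the shape of the query: a text component, a scope, and a row limit.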
Hybrid Search for Chat History with Metadata Filters
In addition to vector similarity search over the Messages and Summaries stored in Zep, you can apply metadata filters to your searches, or search purely on metadata with no semantic query at all.
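The code for this tab was not captured. As a sketch of the two modes described above, the helper below builds a payload that combines a semantic query with a JSONPath metadata filter, or drops the query for a metadata-only search. The payload shape and the JSONPath expression are illustrative assumptions.

```python
from typing import Optional


def build_hybrid_search_payload(query: Optional[str],
                                jsonpath_filter: str) -> dict:
    """Build a search payload combining semantic text search with a
    JSONPath metadata filter. Pass query=None to search purely on metadata.

    The payload shape below is an illustrative assumption, not a verified
    Zep API reference.
    """
    payload = {
        "metadata": {"where": {"jsonpath": jsonpath_filter}},
    }
    if query is not None:
        payload["text"] = query  # semantic component of the hybrid search
    return payload


# Hybrid: a semantic query constrained to messages tagged with a topic
hybrid = build_hybrid_search_payload(
    "shipping estimates",
    '$[*] ? (@.topic == "logistics")',
)

# Metadata-only: no semantic query at all
metadata_only = build_hybrid_search_payload(None, '$[*] ? (@.topic == "logistics")')
```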
Search Ranking and Limits
Vector Indexes
Zep automatically creates HNSW (Hierarchical Navigable Small World) indexes for all messages, summaries, and documents. This means you get speedy and relevant search results right out of the box, without the hassle of manually setting up or integrating a vector store and building indexes. Zep uses an optimized distance function similar to cosine distance for search ranking.
Embedding Models
Zep uses the BAAI/bge-large-en model for text embeddings, known for its high performance and optimization for semantic search. Keep in mind that this model has a 512-token maximum sequence length, which is important when deciding how to chunk your documents.
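A minimal chunking sketch, staying under the 512-token limit: it uses whitespace-separated words as a rough proxy for model tokens. The real bge-large-en tokenizer typically produces more tokens than words, so the default budget here is set well below 512 to leave headroom; both the budget and the word-count heuristic are assumptions for illustration.

```python
def chunk_text(text: str, max_tokens: int = 400) -> list:
    """Split text into chunks under an approximate token budget.

    Whitespace-separated words are a rough stand-in for model tokens;
    a proper implementation would count tokens with the model's own
    tokenizer. The 400-word budget leaves headroom below the model's
    512-token maximum sequence length.
    """
    words = text.split()
    chunks = []
    for i in range(0, len(words), max_tokens):
        chunks.append(" ".join(words[i:i + max_tokens]))
    return chunks
```

Chunks that exceed the model's sequence length would be truncated at embedding time, silently losing content, so erring on the small side is the safer default.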
Limitations When Searching Over Messages or Short Document Chunks
Zep can return all messages from a search up to a certain row limit, which you can adjust by passing a limit query string argument to the search API. Due to the sparsity issue discussed below, we recommend including only the top 2-3 messages in your prompts. Alternatively, analyze your search results and apply a distance threshold to filter out messages that aren't relevant.
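The threshold-and-truncate approach above can be sketched as follows. The result shape (score, text pairs), the assumption that higher scores mean more similar, and the 0.75 cutoff are all illustrative, not Zep defaults; tune the threshold against your own data.

```python
def filter_results(results: list, min_score: float = 0.75,
                   top_k: int = 3) -> list:
    """Keep only results whose similarity score clears a threshold,
    then return the top_k highest-scoring ones.

    Assumes (score, text) pairs where a higher score means more similar;
    the 0.75 threshold is an illustrative starting point, not a Zep default.
    """
    relevant = [r for r in results if r[0] >= min_score]
    relevant.sort(key=lambda r: r[0], reverse=True)  # best matches first
    return relevant[:top_k]
```

Combining a score threshold with a small top-k keeps prompts short while discarding the near-duplicate, low-information matches that sparse message embeddings tend to produce.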
Handling Short Texts in Embeddings
Searching through chat histories can be tricky: chat messages are often brief and may carry little information on their own. When these short texts are turned into high-dimensional embedding vectors, the resulting vectors can be very sparse.
This sparsity means many of these vectors end up close to one another in the vector space, increasing the chance of false positives when searching for relevant messages. For this reason, we recommend searching over Summaries, which carry more information than individual Messages.