Vector Search over Chat History
Zep allows developers to search the Zep long-term memory store for relevant historical conversations.
You are viewing the Zep Open Source v0.x documentation. This version is no longer supported, and documentation is provided as a reference only.
The current documentation for Zep Community Edition is available here.
Searching for Messages or Summaries
Zep supports vector similarity search for Messages or Summaries of messages stored by Zep. This allows you to populate prompts with past conversations contextually similar to a given query, with the results sorted by a similarity score or distance.
When to use Summaries versus Messages
Zep supports searching for both Messages and Summaries. Given that individual messages may lack conversational context, Summaries are often a better choice for search. See the discussion about message limitations below.
Messages, however, can contain specific details that may be useful for your application. It is possible to execute both types of searches within your app.
MMR Reranking Summaries
Since summaries often share information, particularly when the Message Window is set to a lower threshold, it is often useful to use Maximum Marginal Relevance (MMR) reranking of search results. Zep has built-in, hardware-accelerated support for MMR and enabling it is simple.
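To make the idea concrete, here is a minimal, pure-Python sketch of MMR reranking. It is not Zep’s implementation: the function names, the lambda_mult parameter, and the use of cosine similarity for both relevance and redundancy are assumptions made for illustration.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def mmr_rerank(
    query: Sequence[float],
    candidates: List[Sequence[float]],
    k: int,
    lambda_mult: float = 0.5,
) -> List[int]:
    """Return the indices of up to k candidates, picked greedily by MMR.

    lambda_mult (illustrative name) trades relevance against diversity:
    1.0 ranks purely by relevance, lower values penalize redundancy more.
    """
    relevance = [cosine_similarity(query, c) for c in candidates]
    remaining = list(range(len(candidates)))
    selected: List[int] = []
    while remaining and len(selected) < k:
        best_idx = remaining[0]
        best_score = -math.inf
        for i in remaining:
            # Redundancy is the highest similarity to anything already selected.
            redundancy = max(
                (cosine_similarity(candidates[i], candidates[j]) for j in selected),
                default=0.0,
            )
            score = lambda_mult * relevance[i] - (1.0 - lambda_mult) * redundancy
            if score > best_score:
                best_idx, best_score = i, score
        selected.append(best_idx)
        remaining.remove(best_idx)
    return selected


if __name__ == "__main__":
    query_vec = [1.0, 0.0]
    summary_vecs = [[1.0, 0.0], [0.99, 0.1], [0.6, 0.8]]
    print(mmr_rerank(query_vec, summary_vecs, k=2, lambda_mult=0.3))
```

With a low lambda_mult, the second pick skips the near-duplicate of the first result in favor of a less similar one, which is the behavior that helps when summaries overlap heavily.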
Info: Constructing Search Queries
Zep’s Collection and Memory search support semantic search queries, JSONPath-based metadata filters, and a combination of both. Memory search also supports querying by message creation date.
Read more about constructing search queries.
Hybrid Search for Chat History using Metadata Filters
In addition to vector similarity search, Zep lets you search Messages and Summaries using metadata filters. You can find Messages or Summaries that match a combination of query text and metadata filters, or query solely by metadata.
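The sketch below shows one way to assemble such a query before handing it to a client. Everything in it is illustrative: build_search_payload is a hypothetical helper, and the field names (text, metadata, search_scope) and the JSONPath filter shape are assumptions rather than the documented Zep API. Check the search query documentation for the actual payload format.

```python
import json
from typing import Any, Dict, Optional


def build_search_payload(
    text: Optional[str] = None,
    metadata_filter: Optional[Dict[str, Any]] = None,
    search_scope: str = "messages",
) -> Dict[str, Any]:
    """Assemble a search payload (hypothetical shape, not the official schema).

    Supports text-only, metadata-only, or hybrid (text plus metadata) queries.
    """
    if text is None and metadata_filter is None:
        raise ValueError("Provide query text, a metadata filter, or both.")
    if search_scope not in ("messages", "summary"):
        raise ValueError("search_scope must be 'messages' or 'summary'.")

    payload: Dict[str, Any] = {"search_scope": search_scope}
    if text is not None:
        payload["text"] = text
    if metadata_filter is not None:
        payload["metadata"] = metadata_filter
    return payload


if __name__ == "__main__":
    # Assumed filter shape: a JSONPath expression over message metadata.
    travel_filter = {"where": {"jsonpath": '$[*] ? (@.topic == "travel")'}}

    hybrid = build_search_payload("best places to eat in Lisbon", travel_filter)
    metadata_only = build_search_payload(metadata_filter=travel_filter)

    print(json.dumps(hybrid, indent=2))
    print(json.dumps(metadata_only, indent=2))
```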
Search Ranking and Limits
Vector Indexes
Where available, Zep uses pgvector v0.5’s HNSW index for vector search over messages and summaries. Zep uses cosine distance as the distance function.
If you are using a version of pgvector prior to v0.5, Zep falls back to an exact nearest neighbor search.
If you don’t have access to pgvector v0.5, you can manually create IVFFlat indexes to improve search performance. Please see the pgvector documentation for information on selecting the size of the lists parameter.
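As a rough starting point, the pgvector README suggests rows / 1000 lists for tables of up to 1M rows and sqrt(rows) above that. The sketch below applies that rule of thumb and formats a matching CREATE INDEX statement. The table and column names are placeholders, not Zep’s actual schema, and the vector_cosine_ops operator class is chosen to match the cosine distance mentioned above.

```python
import math


def suggested_ivfflat_lists(row_count: int) -> int:
    """Starting value for the IVFFlat `lists` parameter (pgvector rule of thumb)."""
    if row_count < 0:
        raise ValueError("row_count must be non-negative")
    if row_count <= 1_000_000:
        return max(1, row_count // 1000)
    return max(1, math.isqrt(row_count))


def ivfflat_index_sql(table: str, column: str, lists: int) -> str:
    """Format a CREATE INDEX statement for a cosine-distance IVFFlat index.

    `table` and `column` are placeholders; substitute the names used in
    your own database. Do not pass untrusted input to this function.
    """
    return (
        f"CREATE INDEX ON {table} USING ivfflat "
        f"({column} vector_cosine_ops) WITH (lists = {lists});"
    )


if __name__ == "__main__":
    rows = 500_000  # illustrative row count
    lists = suggested_ivfflat_lists(rows)
    # "message_embedding" and "embedding" are hypothetical names.
    print(ivfflat_index_sql("message_embedding", "embedding", lists))
```

Treat the result as a first guess and tune it against your own recall and latency measurements.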
Limitations
Zep returns all messages from a search, up to a default limit. This limit can be overridden by passing a limit query string argument to the search API. Given the sparsity issue discussed below, we suggest only using the top 2-3 messages in your prompts. Alternatively, analyze your search results and use a distance threshold to filter out irrelevant messages.
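Both suggestions can be combined in a small post-processing step. The following is a sketch under stated assumptions: SearchHit is a hypothetical stand-in for whatever result object your client returns, dist is assumed to be a cosine distance where lower means more similar, and the 0.3 threshold is an arbitrary placeholder that you would pick by inspecting your own results.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SearchHit:
    """Hypothetical search result: message text plus its distance to the query."""
    content: str
    dist: float  # assumed cosine distance; lower means more similar


def select_for_prompt(
    hits: List[SearchHit],
    max_distance: float = 0.3,
    top_k: int = 3,
) -> List[SearchHit]:
    """Drop hits beyond max_distance, then keep the top_k closest."""
    kept = [hit for hit in hits if hit.dist <= max_distance]
    kept.sort(key=lambda hit: hit.dist)
    return kept[:top_k]


if __name__ == "__main__":
    results = [
        SearchHit("I'd like to book a flight to Lisbon.", 0.10),
        SearchHit("ok", 0.50),
        SearchHit("What is the baggage allowance?", 0.20),
        SearchHit("Can I change my seat?", 0.25),
        SearchHit("Thanks!", 0.28),
    ]
    for hit in select_for_prompt(results):
        print(f"{hit.dist:.2f}  {hit.content}")
```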
Note: Embedding short texts
Contextual search over chat histories is challenging: chat messages are typically short and can lack “information”. When combined with high-dimensional embedding vectors, short texts can create very sparse vectors.
This vector sparsity results in many vectors appearing close to each other in the vector space. This may in turn result in many false positives when searching for relevant messages.
Embedding Models
Docker Container Deployments
By default, Zep uses OpenAI’s 1536-dimensional AdaV2 embeddings for Docker deployments.
All other deployments
By default, Zep uses a built-in Sentence Transformers model, all-MiniLM-L6-v2, for message embedding. The all-MiniLM-L6-v2 model offers a very low latency search experience when deployed on suitable infrastructure.
Note: all-MiniLM-L6-v2 Model Limitations
The all-MiniLM-L6-v2 model has a 256 word piece limit. If your messages are likely to be larger, we recommend selecting an alternative model.
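An exact check requires running the model’s tokenizer, but a cheap lower bound can flag messages that are certainly too long. The sketch below assumes that every whitespace-separated word produces at least one word piece, so a message with more than 256 words must exceed the limit. The function name is hypothetical, and a False result is inconclusive because long or rare words split into several pieces.

```python
WORD_PIECE_LIMIT = 256  # limit stated for all-MiniLM-L6-v2


def certainly_exceeds_word_piece_limit(text: str, limit: int = WORD_PIECE_LIMIT) -> bool:
    """Return True if the text must exceed the word piece limit.

    Uses the whitespace word count as a lower bound on the word piece
    count. False means "unknown", not "fits": tokenize with the real
    model tokenizer for an exact answer.
    """
    return len(text.split()) > limit


if __name__ == "__main__":
    short_message = "What time does my flight leave?"
    long_message = "details " * 300

    print(certainly_exceeds_word_piece_limit(short_message))
    print(certainly_exceeds_word_piece_limit(long_message))
```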
Selecting alternative models
Other embedding models and services, such as OpenAI, may be configured. See the Zep NLP Service configuration.