Vector Search over Chat History

Zep allows developers to search the Zep long-term memory store for relevant historical conversations.

Note: You are viewing the Zep Open Source v0.x documentation. This version is no longer supported, and the documentation is provided as a reference only. The current documentation for Zep Community Edition is available here.

Searching for Messages or Summaries

Zep supports vector similarity search for Messages or Summaries of messages stored by Zep. This allows you to populate prompts with past conversations contextually similar to a given query, with the results sorted by a similarity score or distance.

When to use Summaries versus Messages

Zep supports searching for both Messages and Summaries. Given that individual messages may lack conversational context, Summaries are often a better choice for search. See the discussion about message limitations below.

Messages, however, can contain specific details that may be useful for your application. It is possible to execute both types of searches within your app.

MMR Reranking Summaries

Since summaries often share information, particularly when the Message Window is set to a low threshold, it is often useful to apply Maximal Marginal Relevance (MMR) reranking to search results. Zep has built-in, hardware-accelerated support for MMR, and enabling it is simple.
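Intuitively, MMR scores each candidate result by trading off its similarity to the query against its similarity to results already selected. The following is a minimal sketch of that scoring rule for intuition only; it is not Zep's internal implementation:

# A sketch of the MMR scoring rule, for intuition only.
# query_sim: similarity between the candidate and the query.
# max_selected_sim: the candidate's highest similarity to any
# result already selected.
# mmr_lambda=1.0 ranks purely by relevance to the query;
# mmr_lambda=0.0 ranks purely by diversity from prior results.
def mmr_score(query_sim: float, max_selected_sim: float, mmr_lambda: float) -> float:
    return mmr_lambda * query_sim - (1 - mmr_lambda) * max_selected_sim

The mmr_lambda of 0.5 used in the examples below weighs relevance and diversity equally.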

Info: Constructing Search Queries

Zep’s Collection and Memory search support semantic search queries, JSONPath-based metadata filters, and a combination of both. Memory search also supports querying by message creation date.

Read more about constructing search queries.
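For example, a search restricted to a message creation date range might look like the sketch below. The start_date and end_date metadata keys are an assumption here; confirm them against the search-query documentation linked above before relying on this:

# A sketch of a date-bounded search using the MemorySearchPayload
# from the examples below. The start_date/end_date keys are assumed
# creation-date filters; verify against the search-query docs.
search_payload = MemorySearchPayload(
    text="science fiction books",
    metadata={
        "start_date": "2023-06-01",
        "end_date": "2023-06-30",
    },
)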

Python

from zep_python import (
    MemorySearchPayload,
    ZepClient,
)

# This uniquely identifies the user's session
session_id = "my_session_id"

# Initialize the Zep client before running this code
search_payload = MemorySearchPayload(
    text="Is Lauren Olamina a character in a book?",
    search_scope="summary",  # This could be "messages" or "summary"
    search_type="mmr",  # remove this if you'd prefer not to rerank results
    mmr_lambda=0.5,  # tune diversity vs. relevance
)

search_results = await client.memory.asearch_memory(session_id, search_payload)

for search_result in search_results:
    # Uncomment for message search
    # print(search_result.message.dict())
    print(search_result.summary.dict())
{
  "summary": {
    "uuid": "b47b83da-16ae-49c8-bacb-f7d049f9df99",
    "created_at": "2023-11-02T18:22:10.103867Z",
    "content": "The human asks the AI to explain the book Parable of the Sower by Octavia Butler. The AI responds by explaining that Parable of the Sower is a science fiction novel by Octavia Butler. The book follows the story of Lauren Olamina, a young woman living in a dystopian future where society has collapsed due to environmental disasters, poverty, and violence.",
    "token_count": 66
  },
  "metadata": null,
  "dist": 0.8440576791763306
}

TypeScript

import { MemorySearchPayload, ZepClient } from "@getzep/zep-js";

// This uniquely identifies the user's session
const sessionID = "my_session_id";
const searchText = "Is Lauren Olamina a character in a book?";

// Initialize the ZepClient before running this code

// Create a new MemorySearchPayload with the search text, scope, type, and MMR lambda
const searchPayload = new MemorySearchPayload({
  text: searchText,
  search_scope: "summary", // This could be "messages" or "summary"
  search_type: "mmr", // remove this if you'd prefer not to rerank results
  mmr_lambda: 0.5, // tune diversity vs. relevance
});

// Perform the memory search with the session ID, search payload, and a limit of 3 results
const searchResults = await client.memory.searchMemory(
  sessionID,
  searchPayload,
  3
);

searchResults.forEach((searchResult) => {
  console.debug(JSON.stringify(searchResult));
});
{
  "summary": {
    "uuid": "b47b83da-16ae-49c8-bacb-f7d049f9df99",
    "created_at": "2023-11-02T18:22:10.103867Z",
    "content": "The human asks the AI to explain the book Parable of the Sower by Octavia Butler. The AI responds by explaining that Parable of the Sower is a science fiction novel by Octavia Butler. The book follows the story of Lauren Olamina, a young woman living in a dystopian future where society has collapsed due to environmental disasters, poverty, and violence.",
    "token_count": 66
  },
  "metadata": null,
  "dist": 0.8440576791763306
}

Hybrid Search for Chat History using Metadata Filters

In addition to vector similarity search over the Messages and Summaries stored in Zep, you can also search using metadata filters. This allows you to find Messages or Summaries that match a combination of search text and metadata filters. You can also query solely by metadata.

Python

zep_client.memory.search_memory(
    session_id=session_id,
    search_payload=MemorySearchPayload(
        text="I enjoy reading science fiction.",
        metadata={
            "where": {"jsonpath": '$[*] ? (@.foo == "bar")'},
        },
    ),
)
{
  "dist": 0.7170433826192629,
  "message": {
    "content": "I've read many books written by Octavia Butler.",
    "created_at": "2023-06-03T22:00:43.034056Z",
    "metadata": {
      "foo": "bar",
      "system": {
        "entities": [
          {
            "Label": "PERSON",
            "Matches": [
              {
                "End": 46,
                "Start": 32,
                "Text": "Octavia Butler"
              }
            ],
            "Name": "Octavia Butler"
          }
        ]
      }
    },
    "role": "human",
    "token_count": 13,
    "uuid": "8f3a06dd-0625-41da-a2af-b549f2056b3f"
  },
  "metadata": null,
  "summary": null
}

TypeScript

const searchText = "I enjoy reading science fiction.";

const searchPayload = new MemorySearchPayload({
  metadata: {
    where: { jsonpath: '$[*] ? (@.foo == "bar")' },
  },
  text: searchText,
});

const searchResults = await zepClient.memory.searchMemory(sessionID, searchPayload);
{
  "dist": 0.7170433826192629,
  "message": {
    "content": "I've read many books written by Octavia Butler.",
    "created_at": "2023-06-03T22:00:43.034056Z",
    "metadata": {
      "foo": "bar",
      "system": {
        "entities": [
          {
            "Label": "PERSON",
            "Matches": [
              {
                "End": 46,
                "Start": 32,
                "Text": "Octavia Butler"
              }
            ],
            "Name": "Octavia Butler"
          }
        ]
      }
    },
    "role": "human",
    "token_count": 13,
    "uuid": "8f3a06dd-0625-41da-a2af-b549f2056b3f"
  },
  "metadata": null,
  "summary": null
}
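To query solely by metadata, as noted above, the search text can be left out entirely. A minimal sketch in Python, assuming MemorySearchPayload accepts a payload with the text field omitted:

# A sketch of a metadata-only search; assumes text may be omitted
# when filtering purely by metadata.
search_payload = MemorySearchPayload(
    metadata={
        "where": {"jsonpath": '$[*] ? (@.foo == "bar")'},
    },
)
search_results = zep_client.memory.search_memory(session_id, search_payload)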

Search Ranking and Limits

Vector Indexes

Where available, Zep uses pgvector v0.5's HNSW index for vector search over messages and summaries, with cosine distance as the distance function.

If you are using a version of pgvector prior to v0.5, Zep will fall back to using an exact nearest neighbor search.

If you don’t have access to pgvector v0.5, it is possible to manually create IVFFlat indexes to improve search performance.

-- Create IVFFlat indexes over the message and summary embedding tables.
-- Per the pgvector docs, a reasonable starting point for lists is
-- rows / 1000 for up to 1M rows and sqrt(rows) for larger tables.
CREATE INDEX ON message_embedding USING ivfflat (embedding vector_cosine_ops) WITH (lists = 100);
CREATE INDEX ON summary_embedding USING ivfflat (embedding vector_cosine_ops) WITH (lists = 100);

Please see the pgvector documentation for information on selecting the size of the lists parameter.

Limitations

Zep returns all matching messages from a search, up to a default limit. This limit can be overridden by passing a limit query string argument to the search API. Given the sparsity issue discussed below, we suggest using only the top 2-3 results in your prompts. Alternatively, analyze your search results and use a distance threshold to filter out irrelevant matches.
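For example, the sketch below caps the result count and drops weak matches. It assumes the Python client's search_memory accepts an optional limit argument (mirroring the TypeScript example above), and that dist behaves like the similarity score in the sample output, where higher values indicate closer matches:

# A sketch: request extra results, then keep only strong matches.
search_results = zep_client.memory.search_memory(session_id, search_payload, limit=10)

MIN_SCORE = 0.75  # hypothetical threshold; tune against your own data
relevant = [r for r in search_results if r.dist and r.dist >= MIN_SCORE][:3]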

Note: Embedding short texts

Contextual search over chat histories is challenging: chat messages are typically short and may carry little information on their own. When combined with high-dimensional embedding vectors, short texts can produce very sparse vectors.

This sparsity causes many vectors to appear close to each other in the embedding space, which in turn can produce false positives when searching for relevant messages.

Embedding Models

Docker Container Deployments

By default, Zep uses OpenAI’s 1536-dimensional Ada v2 (text-embedding-ada-002) embeddings for Docker deployments.

All other deployments

By default, Zep uses a built-in Sentence Transformers model, all-MiniLM-L6-v2, for message embedding. The all-MiniLM-L6-v2 model offers a very low latency search experience when deployed on suitable infrastructure.

Note: all-MiniLM-L6-v2 Model Limitations

The all-MiniLM-L6-v2 model has a 256 word-piece input limit; longer messages are truncated. If your messages are likely to exceed this, it is recommended you select an alternative model.
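To check whether your messages fit within this limit, you can count word pieces with the model's own tokenizer. A sketch using the Hugging Face transformers library (assumed to be installed; this is not part of Zep itself):

# A sketch: count word pieces with all-MiniLM-L6-v2's tokenizer to see
# whether a message would be truncated by the default embedder.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("sentence-transformers/all-MiniLM-L6-v2")

def exceeds_limit(text: str, limit: int = 256) -> bool:
    return len(tokenizer.encode(text, add_special_tokens=True)) > limit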

Selecting alternative models

Other embedding models and services, such as OpenAI, may be configured. See the Zep NLP Service configuration.