Document Collections
Document Collections are deprecated and have been removed from Zep Community Edition. We will be removing this feature from Zep Cloud in a future release.
Zep’s document vector store lets you embed and search documents using vector similarity search, Maximum Marginal Relevance Re-Ranking, and metadata filtering.
You can manage collections, ingest documents, and search using Zep’s SDKs, LangChain, or LlamaIndex.
zep-python supports asynchronous operations. All methods come in sync and async flavors, with async methods prefixed with the letter a. For instance, zep-python offers both zep_client.memory.add_memory and zep_client.memory.aadd_memory.
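As a rough illustration of this naming convention, the sketch below pairs a synchronous method with an a-prefixed coroutine of the same name. ToyMemoryClient, its in-memory store, and the messages helper are hypothetical stand-ins written for this example; they are not part of zep-python, and the real client's signatures may differ.

```python
import asyncio


class ToyMemoryClient:
    """Hypothetical stand-in used only to show the sync/async naming pattern."""

    def __init__(self) -> None:
        self._sessions: dict[str, list[str]] = {}

    def add_memory(self, session_id: str, message: str) -> None:
        # Synchronous variant: returns once the message is stored.
        self._sessions.setdefault(session_id, []).append(message)

    async def aadd_memory(self, session_id: str, message: str) -> None:
        # Async variant: same name with an "a" prefix. The sleep is a
        # placeholder for a non-blocking network call.
        await asyncio.sleep(0)
        self.add_memory(session_id, message)

    def messages(self, session_id: str) -> list[str]:
        # Return a copy so callers cannot mutate the internal store.
        return list(self._sessions.get(session_id, []))


client = ToyMemoryClient()
client.add_memory("session-1", "hello")                 # sync call
asyncio.run(client.aadd_memory("session-1", "world"))   # async call
```

In an application that already runs an event loop, the async variant would be awaited directly rather than wrapped in asyncio.run.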
Key Concepts
Collections
A Collection is a group of documents that use the same embedding strategy and model. Zep automatically creates embeddings for the documents you provide.
Documents
Documents are the texts you want to embed and search. You can add documents to collections and optionally assign them a unique ID and metadata. If you add metadata, it can help filter search results.
Initializing the Zep Client
For details on initializing the Zep client, check out the SDK documentation.
Creating a Collection
Python
TypeScript
Loading an Existing Collection
Python
TypeScript
Adding Documents to a Collection
Python
TypeScript
Langchain
document_id is an optional identifier you can assign to each document. It’s handy for linking a document chunk with a specific ID you choose.
The metadata is an optional dictionary that holds metadata related to your document. Zep leverages this metadata for hybrid searches across a collection, enabling you to filter search results more effectively.
When you use document.add_documents, it returns a list of Zep UUIDs corresponding to the documents you’ve added to the collection.
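To make the roles of the optional ID and metadata concrete, here is a minimal sketch of preparing document payloads before upload. DocumentRecord and build_records are hypothetical names invented for this example and are not the SDK's own classes; the sketch only shows one way to give each chunk an ID of your choosing and metadata that can later drive filtering.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class DocumentRecord:
    """Hypothetical stand-in for a document payload; not the SDK's class."""

    content: str
    document_id: Optional[str] = None
    metadata: dict[str, Any] = field(default_factory=dict)


def build_records(source: str, chunks: list[str]) -> list[DocumentRecord]:
    # Give each chunk a caller-chosen ID plus metadata for later filtering.
    return [
        DocumentRecord(
            content=chunk,
            document_id=f"{source}-{index}",
            metadata={"source": source, "chunk_index": index},
        )
        for index, chunk in enumerate(chunks)
    ]


records = build_records("handbook", ["First chunk.", "Second chunk."])
```

Deriving the ID from the source name and chunk position is just one possible scheme; any identifier that lets you map a chunk back to its origin would serve.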
Chunking your documents
Choosing the right chunking strategy is crucial and depends heavily on your specific needs. A variety of third-party libraries, including LangChain, support processing documents from numerous sources and dividing them into smaller segments suitable for embedding.
We recommend experimenting with various extractors, chunking strategies, sizes, and overlaps to discover the optimal approach for your project.
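As a starting point for such experiments, the following is a minimal character-based chunker with a configurable size and overlap. It is a sketch written for this page, not a Zep or LangChain function; chunk_text and its parameters are assumed names, and real projects often split on sentences or tokens instead of raw characters.

```python
def chunk_text(text: str, chunk_size: int, overlap: int) -> list[str]:
    """Split text into chunks of at most chunk_size characters, where
    consecutive chunks share `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far the window advances each time
    chunks: list[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the window already reached the end of the text
    return chunks


chunks = chunk_text("abcdefghij", chunk_size=4, overlap=2)
```

Varying chunk_size and overlap and comparing search quality on a handful of representative queries is a simple way to evaluate the trade-off.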
Monitoring Embedding Progress
The process of embedding documents in Zep is asynchronous. To keep track of your collection’s embedding progress, you can periodically check the collection’s status:
Python
TypeScript
Once the collection’s status changes to ready, it means all documents have been successfully embedded and are now searchable.
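The periodic check can be sketched as a small polling loop. The get_status callable below is a hypothetical placeholder for whatever call returns the collection's current status; only the ready value comes from this page, and the pending value used in the simulation is an assumption.

```python
import time
from typing import Callable


def wait_until_ready(
    get_status: Callable[[], str],
    poll_interval: float = 1.0,
    timeout: float = 60.0,
) -> bool:
    """Poll get_status until it returns "ready" or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        if get_status() == "ready":
            return True
        if time.monotonic() >= deadline:
            return False  # give up rather than poll forever
        time.sleep(poll_interval)


# Simulated status source: reports "pending" twice, then "ready".
_statuses = iter(["pending", "pending", "ready"])
is_ready = wait_until_ready(lambda: next(_statuses), poll_interval=0.01, timeout=5.0)
```

A timeout keeps the caller from blocking indefinitely if embedding stalls; the interval and timeout values here are arbitrary and should be tuned to the size of the collection.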
Searching a Collection with Hybrid Vector Search
Zep enables hybrid vector search across your collections, allowing you to pinpoint the most relevant documents based on semantic similarity. You can also refine a search by filtering on document metadata.
You can initiate a search using either a text query or an embedding vector, depending on your needs.
Zep’s Collection and Memory search support semantic search queries, JSONPath-based metadata filters, and a combination of both. Memory search also supports querying by message creation date.
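Maximum Marginal Relevance re-ranking, listed among the search features at the top of this page, balances relevance to the query against redundancy among the results already chosen. The sketch below is a generic greedy MMR over plain Python lists, not necessarily how Zep implements it; mmr_rerank and lambda_mult are assumed names.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr_rerank(
    query: list[float],
    embeddings: list[list[float]],
    k: int,
    lambda_mult: float = 0.5,
) -> list[int]:
    """Return indices of up to k embeddings, picked greedily by MMR.

    lambda_mult=1.0 ranks purely by relevance; lower values penalize
    candidates that resemble results already selected.
    """
    selected: list[int] = []
    remaining = list(range(len(embeddings)))

    def score(i: int) -> float:
        relevance = cosine(query, embeddings[i])
        redundancy = max(
            (cosine(embeddings[i], embeddings[j]) for j in selected),
            default=0.0,
        )
        return lambda_mult * relevance - (1.0 - lambda_mult) * redundancy

    while remaining and len(selected) < k:
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


query = [1.0, 0.0]
embeddings = [[1.0, 0.0], [1.0, 0.0], [0.6, 0.8]]  # index 1 duplicates index 0
diverse = mmr_rerank(query, embeddings, k=2, lambda_mult=0.3)
relevance_only = mmr_rerank(query, embeddings, k=2, lambda_mult=1.0)
```

With the diversity-weighted setting, the exact duplicate is skipped in favor of the less similar third vector, while the relevance-only setting returns both copies.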
Python
TypeScript
Langchain
metadata is an optional dictionary of JSONPath filters used to match on metadata associated with your documents.
limit is an optional integer indicating the maximum number of results to return.
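The interplay of similarity ranking, metadata filtering, and the result limit can be sketched in plain Python. This is an illustration under simplifying assumptions, not Zep's implementation: the search function is a hypothetical name, the embeddings are hand-written two-dimensional vectors, and the metadata filter is reduced to exact key/value matching rather than JSONPath.

```python
import math
from typing import Any, Optional


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(
    query_embedding: list[float],
    documents: list[dict[str, Any]],
    metadata: Optional[dict[str, Any]] = None,
    limit: Optional[int] = None,
) -> list[dict[str, Any]]:
    # 1. Keep only documents whose metadata contains every requested pair
    #    (a simplification of JSONPath filtering).
    wanted = metadata or {}
    candidates = [
        doc for doc in documents
        if all(doc["metadata"].get(key) == value for key, value in wanted.items())
    ]
    # 2. Rank the survivors by similarity to the query, best first.
    ranked = sorted(
        candidates,
        key=lambda doc: cosine(query_embedding, doc["embedding"]),
        reverse=True,
    )
    # 3. Apply the optional cap on the number of results.
    return ranked if limit is None else ranked[:limit]


documents = [
    {"content": "Cats purr.", "embedding": [0.9, 0.1], "metadata": {"topic": "pets"}},
    {"content": "Dogs bark.", "embedding": [0.7, 0.3], "metadata": {"topic": "pets"}},
    {"content": "Stocks fell.", "embedding": [1.0, 0.0], "metadata": {"topic": "finance"}},
]
results = search([1.0, 0.0], documents, metadata={"topic": "pets"}, limit=1)
```

Without the filter, the finance document would rank first for this query; the filter removes it before ranking, which is why filtering and similarity are described as working together.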
Retrieving Documents by UUID
Zep supports retrieving a list of documents by Zep UUID:
Python
TypeScript
Other Common Operations
This section covers additional common operations you might need to perform, such as listing all collections within your client’s scope. The examples for this section demonstrate how to create an index on a collection and list all collections in both Python and TypeScript.
Updating a Collection’s Description or Metadata
Python
TypeScript
Batch Update Documents’ ID or Metadata
Python
TypeScript
Deleting Documents
Zep supports deleting documents from a collection by UUID:
Python
TypeScript
Deleting a Collection
Deleting a collection will delete all documents in the collection, as well as the collection itself.