Performance Optimization Guide

This guide covers best practices for optimizing Zep’s performance in production environments.

Reuse the Zep SDK Client

The Zep SDK client maintains an HTTP connection pool that enables connection reuse, significantly reducing latency by avoiding the overhead of establishing new connections. To optimize performance:

  • Create a single client instance and reuse it across your application
  • Avoid creating new client instances for each request or function
  • Consider implementing a client singleton pattern in your application
  • For serverless environments, initialize the client outside the handler function
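
As a minimal sketch of the singleton and serverless advice above, the client can be built once at module level behind a cached factory. `StandInClient`, `get_zep_client`, `handler`, and the `ZEP_API_KEY` environment variable are hypothetical names used only so the example runs without network access; in a real application you would construct the Zep SDK client inside the factory instead.

```python
import os
from functools import lru_cache


class StandInClient:
    """Hypothetical stand-in for the Zep SDK client so this sketch runs offline."""

    def __init__(self, api_key: str) -> None:
        self.api_key = api_key


@lru_cache(maxsize=1)
def get_zep_client() -> StandInClient:
    """Build the client on first use and return the same instance afterwards."""
    # Assumption: the API key is read from an environment variable.
    return StandInClient(api_key=os.environ.get("ZEP_API_KEY", ""))


def handler(event: dict) -> dict:
    """Serverless-style handler that reuses the cached client on every call."""
    client = get_zep_client()  # No new client (or connection pool) per invocation
    return {"client_id": id(client)}
```

Because the factory lives at module scope, warm invocations of the handler share one client and therefore one connection pool.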

Optimizing Memory Operations

The memory.add and memory.get methods are optimized for conversational messages and low-latency retrieval. For optimal performance:

  • Keep individual messages under 10,000 characters
  • Use graph.add for larger documents, tool outputs, or business data
  • Chunk large documents before adding them to the graph (the graph.add endpoint has a 10,000-character limit)
  • Remove unnecessary metadata or content before persistence
  • For bulk document ingestion, process documents in parallel while respecting rate limits
```python
# Recommended for conversations
# memory.add takes a list of messages
await zep_client.memory.add(
    session_id="session_123",
    messages=[
        {
            "role": "human",
            "content": "What's the weather like today?",
        }
    ],
)

# Recommended for large documents
await zep_client.graph.add(
    data=document_content,  # Your chunked document content
    user_id=user_id,  # Or group_id for group graphs
    type="text",  # Can be "text", "message", or "json"
)
```
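
The chunking and parallel-ingestion advice above could be sketched as follows. This is an illustration under stated assumptions, not part of the Zep SDK: `chunk_text` and `ingest_chunks` are hypothetical helpers, `add_chunk` is a coroutine you supply (for example a thin wrapper around `zep_client.graph.add`), and the default concurrency of 5 is an arbitrary placeholder that should be tuned to your actual rate limits.

```python
import asyncio
from collections.abc import Awaitable, Callable

MAX_CHARS = 10_000  # graph.add character limit stated in this guide


def chunk_text(text: str, max_chars: int = MAX_CHARS) -> list[str]:
    """Split text into chunks of at most max_chars, preferring paragraph breaks."""
    chunks: list[str] = []
    current = ""
    for paragraph in text.split("\n\n"):
        if not paragraph.strip():
            continue  # Skip empty paragraphs
        # Hard-split any paragraph that is longer than max_chars on its own.
        for start in range(0, len(paragraph), max_chars):
            piece = paragraph[start:start + max_chars]
            candidate = f"{current}\n\n{piece}" if current else piece
            if len(candidate) <= max_chars:
                current = candidate
            else:
                chunks.append(current)
                current = piece
    if current:
        chunks.append(current)
    return chunks


async def ingest_chunks(
    add_chunk: Callable[[str], Awaitable[object]],
    chunks: list[str],
    max_concurrency: int = 5,  # Assumption: tune to your rate limits
) -> list[object]:
    """Send chunks concurrently, never exceeding max_concurrency in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def worker(chunk: str) -> object:
        async with semaphore:
            return await add_chunk(chunk)

    # gather returns results in the same order as the input chunks.
    return await asyncio.gather(*(worker(chunk) for chunk in chunks))
```

A caller would typically pass something like `lambda chunk: zep_client.graph.add(data=chunk, user_id=user_id, type="text")` as `add_chunk`. The semaphore keeps the number of in-flight requests bounded, which is one simple way to stay within rate limits while still ingesting in parallel.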

Get the memory context string sooner

You can request the memory context directly in the response to the memory.add() call. This eliminates the need for a separate memory.get() call when the context is all you need. Read more about Memory Context.

To do this, pass return_context=True to the memory.add() method. Zep then performs a user graph search immediately after persisting the messages and returns the context relevant to them.

```python
memory_response = await zep_client.memory.add(
    session_id=session_id,
    messages=messages,
    return_context=True,
)

context = memory_response.context
```
Read more in the Memory SDK Reference

Optimizing Search Queries

Zep uses hybrid search combining semantic similarity and BM25 full-text search. For optimal performance:

  • Keep queries concise and specific. Queries are automatically truncated to 8,192 tokens (approximately 32,000 Latin characters), and longer queries may not improve search quality but will increase latency
  • Break complex searches into smaller, focused queries
  • Use natural language queries that target the relevant information, for better semantic matching
  • Consider the scope of your search (user vs group graphs)
```python
# Recommended - concise query
results = await zep_client.graph.search(
    user_id=user_id,  # Or group_id for group graphs
    query="project requirements discussion",
)

# Not recommended - overly long query
results = await zep_client.graph.search(
    user_id=user_id,
    query="very long text with multiple paragraphs...",  # Will be truncated
)
```
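
One way to apply the advice about breaking a complex search into smaller queries is to run the focused queries concurrently and merge the results. The sketch below is illustrative only: `search_focused` is a hypothetical helper, and `search` is a coroutine you supply that wraps `zep_client.graph.search` and returns a list of hashable items (for example fact strings extracted from the response). The character budget is the approximate figure quoted above, not an exact token count.

```python
import asyncio
from collections.abc import Awaitable, Callable, Hashable

APPROX_CHAR_BUDGET = 32_000  # Roughly 8,192 tokens of Latin text, per this guide


async def search_focused(
    search: Callable[[str], Awaitable[list[Hashable]]],
    queries: list[str],
) -> list[Hashable]:
    """Run several short queries concurrently and merge results without duplicates."""
    # Fail early instead of letting an oversized query be silently truncated.
    for query in queries:
        if len(query) > APPROX_CHAR_BUDGET:
            raise ValueError("query is likely to exceed the token limit")

    result_lists = await asyncio.gather(*(search(query) for query in queries))

    merged: list[Hashable] = []
    seen: set[Hashable] = set()
    for results in result_lists:
        for item in results:
            if item not in seen:  # Keep the first occurrence, preserve order
                seen.add(item)
                merged.append(item)
    return merged
```

For example, instead of one long query describing an entire project, you might pass `["project requirements discussion", "project deadline changes"]` and let the helper combine the results.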

Summary

  • Reuse Zep SDK client instances to optimize connection management
  • Use appropriate methods for different types of content (memory.add for conversations, graph.add for large documents)
  • Keep search queries focused and under the token limit for optimal performance