NVIDIA NeMo Agent Toolkit | Zep Documentation

What is NeMo Agent Toolkit?

NVIDIA NeMo Agent Toolkit (NAT) is a framework-agnostic library for building AI agents. It uses a configuration-driven approach where you define agents, tools, and workflows in YAML files. NAT works alongside existing frameworks like LangChain and LlamaIndex, adding capabilities like memory and observability without modifying your agent code.

Zep integration

See NVIDIA’s official documentation: Auto Memory Wrapper

The Zep integration for NAT uses the automatic memory wrapper — a general-purpose wrapper that adds memory capabilities to any NAT agent. Rather than requiring agents to explicitly call memory tools, the wrapper intercepts agent invocations and handles memory operations transparently.

This approach guarantees that all conversations are captured and relevant context is retrieved, regardless of which agent type you use or how the agent is implemented.

Why use automatic memory

Traditional tool-based memory requires agents to explicitly invoke memory tools, which can be unreliable. The auto memory wrapper provides:

Guaranteed capture of all user messages and agent responses
Automatic retrieval of relevant context before each agent call
Zero agent configuration — memory operations happen transparently
Universal compatibility with any agent type (ReAct, ReWOO, Tool Calling, Reasoning)

Install dependencies

$ pip install nvidia-nat-zep-cloud

Package information:

Package: nvidia-nat-zep-cloud
Python: >=3.11, <3.13

Quick start

Set your API key

$ export ZEP_API_KEY="your-zep-api-key"

Configure Zep memory

Create a configuration file that defines the Zep memory backend and wraps your agent with automatic memory:

1 memory:
2   zep_memory:
3     _type: nat.plugins.zep_cloud/zep_memory
4 
5 llm:
6   nim_llm:
7     _type: nim
8     model_name: meta/llama-3.3-70b-instruct
9 
10 functions:
11   my_react_agent:
12     _type: react_agent
13     llm_name: nim_llm
14     tool_names: [calculator]
15 
16 workflow:
17   _type: auto_memory_agent
18   inner_agent_name: my_react_agent
19   memory_name: zep_memory
20   llm_name: nim_llm

This configuration wraps a ReAct agent with automatic memory. Every user message and agent response is captured in Zep, and relevant context is retrieved before each agent call.

How it works

The auto memory wrapper intercepts agent invocations and handles memory operations in this sequence:

User message received — incoming message captured
Memory retrieval — relevant context fetched from Zep and injected as a system message
User message stored — message saved to Zep’s thread memory
Agent invocation — wrapped agent processes request with memory context
Response stored — agent response saved to Zep
Response returned — final response sent to user

The wrapped agent is unaware of memory operations — it simply receives enriched context and produces responses.

Configuration reference

Required parameters

Parameter	Description
`inner_agent_name`	Name of the agent function to wrap
`memory_name`	Name of the memory backend (e.g., `zep_memory`)
`llm_name`	Name of the LLM for memory operations

Optional feature flags

All flags default to true:

Parameter	Description
`save_user_messages_to_memory`	Store user messages in Zep
`retrieve_memory_for_every_response`	Fetch relevant context before each agent call
`save_ai_messages_to_memory`	Store agent responses in Zep

Zep-specific parameters

Configure memory retrieval and storage behavior:

1 workflow:
2   _type: auto_memory_agent
3   inner_agent_name: my_react_agent
4   memory_name: zep_memory
5   llm_name: nim_llm
6 
7   search_params:
8     top_k: 5         # Number of memory results to retrieve
9 
10   add_params:
11     ignore_roles: ["assistant"]  # Roles to exclude from graph memory

Multi-tenant memory isolation

Zep automatically isolates memory by user. User IDs are extracted in this priority:

user_manager.get_id() — production with custom auth middleware (recommended)
X-User-ID HTTP header — testing without middleware
"default_user" — fallback for local development

For production deployments, implement a custom user_manager that extracts user IDs from your authentication system.

Full configuration example

1 telemetry:
2   tracer:
3     _type: phoenix
4 
5 llm:
6   nim_llm:
7     _type: nim
8     model_name: meta/llama-3.3-70b-instruct
9     temperature: 0.0
10     max_tokens: 1024
11 
12 memory:
13   zep_memory:
14     _type: nat.plugins.zep_cloud/zep_memory
15 
16 function_groups:
17   calculator:
18     - add
19     - subtract
20     - multiply
21     - divide
22 
23 functions:
24   my_react_agent:
25     _type: react_agent
26     llm_name: nim_llm
27     tool_names: [calculator]
28     system_prompt: "You are a helpful assistant with memory capabilities."
29 
30 workflow:
31   _type: auto_memory_agent
32   inner_agent_name: my_react_agent
33   memory_name: zep_memory
34   llm_name: nim_llm
35 
36   # Feature flags
37   save_user_messages_to_memory: true
38   retrieve_memory_for_every_response: true
39   save_ai_messages_to_memory: true
40 
41   # Zep-specific parameters
42   search_params:
43     top_k: 5
44   add_params:
45     ignore_roles: ["assistant"]

Wrapping different agent types

The auto memory wrapper works with any NeMo agent type:

ReAct Agent

Tool Calling Agent

ReWOO Agent

1 functions:
2   my_agent:
3     _type: react_agent
4     llm_name: nim_llm
5     tool_names: [calculator, search]
6 
7 workflow:
8   _type: auto_memory_agent
9   inner_agent_name: my_agent
10   memory_name: zep_memory
11   llm_name: nim_llm