NVIDIA NeMo Agent Toolkit

Use Zep for automatic memory in NVIDIA NeMo Agent Toolkit agents.

What is NeMo Agent Toolkit?

NVIDIA NeMo Agent Toolkit (NAT) is a framework-agnostic library for building AI agents. It uses a configuration-driven approach where you define agents, tools, and workflows in YAML files. NAT works alongside existing frameworks like LangChain and LlamaIndex, adding capabilities like memory and observability without modifying your agent code.

Zep integration

The Zep integration for NAT uses the automatic memory wrapper — a general-purpose wrapper that adds memory capabilities to any NAT agent. Rather than requiring agents to explicitly call memory tools, the wrapper intercepts agent invocations and handles memory operations transparently.

This approach guarantees that all conversations are captured and relevant context is retrieved, regardless of which agent type you use or how the agent is implemented.

Why use automatic memory

Traditional tool-based memory requires agents to explicitly invoke memory tools, which can be unreliable. The auto memory wrapper provides:

  • Guaranteed capture of all user messages and agent responses
  • Automatic retrieval of relevant context before each agent call
  • Zero agent configuration — memory operations happen transparently
  • Universal compatibility with any agent type (ReAct, ReWOO, Tool Calling, Reasoning)

Install dependencies

$pip install nvidia-nat-zep-cloud

Package information:

  • Package: nvidia-nat-zep-cloud
  • Python: >=3.11, <3.13

Quick start

Set your API key

$export ZEP_API_KEY="your-zep-api-key"

Configure Zep memory

Create a configuration file that defines the Zep memory backend and wraps your agent with automatic memory:

1memory:
2 zep_memory:
3 _type: nat.plugins.zep_cloud/zep_memory
4
5llm:
6 nim_llm:
7 _type: nim
8 model_name: meta/llama-3.3-70b-instruct
9
10functions:
11 my_react_agent:
12 _type: react_agent
13 llm_name: nim_llm
14 tool_names: [calculator]
15
16workflow:
17 _type: auto_memory_agent
18 inner_agent_name: my_react_agent
19 memory_name: zep_memory
20 llm_name: nim_llm

This configuration wraps a ReAct agent with automatic memory. Every user message and agent response is captured in Zep, and relevant context is retrieved before each agent call.

How it works

The auto memory wrapper intercepts agent invocations and handles memory operations in this sequence:

  1. User message received — incoming message captured
  2. Memory retrieval — relevant context fetched from Zep and injected as a system message
  3. User message stored — message saved to Zep’s thread memory
  4. Agent invocation — wrapped agent processes request with memory context
  5. Response stored — agent response saved to Zep
  6. Response returned — final response sent to user

The wrapped agent is unaware of memory operations — it simply receives enriched context and produces responses.

Configuration reference

Required parameters

ParameterDescription
inner_agent_nameName of the agent function to wrap
memory_nameName of the memory backend (e.g., zep_memory)
llm_nameName of the LLM for memory operations

Optional feature flags

All flags default to true:

ParameterDescription
save_user_messages_to_memoryStore user messages in Zep
retrieve_memory_for_every_responseFetch relevant context before each agent call
save_ai_messages_to_memoryStore agent responses in Zep

Zep-specific parameters

Configure memory retrieval and storage behavior:

1workflow:
2 _type: auto_memory_agent
3 inner_agent_name: my_react_agent
4 memory_name: zep_memory
5 llm_name: nim_llm
6
7 search_params:
8 mode: "summary" # "basic" (fast) or "summary" (comprehensive)
9 top_k: 5 # Number of memory results to retrieve
10
11 add_params:
12 ignore_roles: ["assistant"] # Roles to exclude from graph memory

Search modes:

  • basic — fast retrieval, P95 latency under 200ms
  • summary — comprehensive retrieval including summaries and context

Multi-tenant memory isolation

Zep automatically isolates memory by user. User IDs are extracted in this priority:

  1. user_manager.get_id() — production with custom auth middleware (recommended)
  2. X-User-ID HTTP header — testing without middleware
  3. "default_user" — fallback for local development

For production deployments, implement a custom user_manager that extracts user IDs from your authentication system.

Full configuration example

1telemetry:
2 tracer:
3 _type: phoenix
4
5llm:
6 nim_llm:
7 _type: nim
8 model_name: meta/llama-3.3-70b-instruct
9 temperature: 0.0
10 max_tokens: 1024
11
12memory:
13 zep_memory:
14 _type: nat.plugins.zep_cloud/zep_memory
15
16function_groups:
17 calculator:
18 - add
19 - subtract
20 - multiply
21 - divide
22
23functions:
24 my_react_agent:
25 _type: react_agent
26 llm_name: nim_llm
27 tool_names: [calculator]
28 system_prompt: "You are a helpful assistant with memory capabilities."
29
30workflow:
31 _type: auto_memory_agent
32 inner_agent_name: my_react_agent
33 memory_name: zep_memory
34 llm_name: nim_llm
35
36 # Feature flags
37 save_user_messages_to_memory: true
38 retrieve_memory_for_every_response: true
39 save_ai_messages_to_memory: true
40
41 # Zep-specific parameters
42 search_params:
43 mode: "summary"
44 top_k: 5
45 add_params:
46 ignore_roles: ["assistant"]

Wrapping different agent types

The auto memory wrapper works with any NeMo agent type:

1functions:
2 my_agent:
3 _type: react_agent
4 llm_name: nim_llm
5 tool_names: [calculator, search]
6
7workflow:
8 _type: auto_memory_agent
9 inner_agent_name: my_agent
10 memory_name: zep_memory
11 llm_name: nim_llm

Resources