NVIDIA NeMo Agent Toolkit
What is NeMo Agent Toolkit?
NVIDIA NeMo Agent Toolkit (NAT) is a framework-agnostic library for building AI agents. It uses a configuration-driven approach where you define agents, tools, and workflows in YAML files. NAT works alongside existing frameworks like LangChain and LlamaIndex, adding capabilities like memory and observability without modifying your agent code.
Zep integration
The Zep integration for NAT uses the automatic memory wrapper — a general-purpose wrapper that adds memory capabilities to any NAT agent. Rather than requiring agents to explicitly call memory tools, the wrapper intercepts agent invocations and handles memory operations transparently.
This approach guarantees that all conversations are captured and relevant context is retrieved, regardless of which agent type you use or how the agent is implemented.
Why use automatic memory
Traditional tool-based memory requires agents to explicitly invoke memory tools, which can be unreliable. The auto memory wrapper provides:
- Guaranteed capture of all user messages and agent responses
- Automatic retrieval of relevant context before each agent call
- Zero agent configuration — memory operations happen transparently
- Universal compatibility with any agent type (ReAct, ReWOO, Tool Calling, Reasoning)
Install dependencies
Package information:
- Package:
nvidia-nat-zep-cloud - Python:
>=3.11, <3.13
Quick start
Set your API key
Configure Zep memory
Create a configuration file that defines the Zep memory backend and wraps your agent with automatic memory:
This configuration wraps a ReAct agent with automatic memory. Every user message and agent response is captured in Zep, and relevant context is retrieved before each agent call.
How it works
The auto memory wrapper intercepts agent invocations and handles memory operations in this sequence:
- User message received — incoming message captured
- Memory retrieval — relevant context fetched from Zep and injected as a system message
- User message stored — message saved to Zep’s thread memory
- Agent invocation — wrapped agent processes request with memory context
- Response stored — agent response saved to Zep
- Response returned — final response sent to user
The wrapped agent is unaware of memory operations — it simply receives enriched context and produces responses.
Configuration reference
Required parameters
Optional feature flags
All flags default to true:
Zep-specific parameters
Configure memory retrieval and storage behavior:
Search modes:
basic— fast retrieval, P95 latency under 200mssummary— comprehensive retrieval including summaries and context
Multi-tenant memory isolation
Zep automatically isolates memory by user. User IDs are extracted in this priority:
user_manager.get_id()— production with custom auth middleware (recommended)X-User-IDHTTP header — testing without middleware"default_user"— fallback for local development
For production deployments, implement a custom user_manager that extracts user IDs from your authentication system.
Full configuration example
Wrapping different agent types
The auto memory wrapper works with any NeMo agent type: