Vercel AI SDK integration

Add long-term agent memory to Vercel AI SDK applications

The @getzep/zep-vercel-ai package adds long-term memory to the Vercel AI SDK (v6), backed by Zep’s temporal knowledge graph. It exposes Zep through three layers so you can pick the integration point that fits your call: middleware, helpers, and tools.

Core benefits

  • Automatic context injection: Middleware prepends Zep’s context block as a system message on each new user turn
  • One write per turn: An onFinish callback persists the full turn once, even across a multi-step tool loop
  • On-demand tools: Let the model search and persist memory explicitly inside a tool loop
  • Works with generateText and streamText: The same inject-and-persist pattern applies to both
  • Graceful degradation: A Zep outage degrades to “no memory” and never crashes the host call

How it works

The recommended pattern is to inject the context block via middleware and persist the whole turn via onFinish. The tool loop calls the model once per step, so persisting from a per-step hook would fragment one turn across many writes.

The package exposes three layers:

Layer | Export | Use when
Middleware | createZepMiddleware | You want the context block injected automatically as a system message on each new user turn (injection only)
Helpers | getZepContext, persistZepTurn, createZepOnFinish | You want explicit control over fetching context and persisting turns
Tools | createZepTools | You want the model to retrieve and persist on demand inside a tool loop

createZepOnFinish fires exactly once per turn with the final assistant text, so persistence lives there.
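
To make the fragmentation argument concrete, here is a small simulation. It makes no Zep or AI SDK calls: the step texts, the Write type, and both persist functions are hypothetical stand-ins, used only to count how many writes each strategy would produce for one turn.

```typescript
// Illustrative simulation only: no Zep or AI SDK calls are made.
type Write = { user?: string; assistant?: string };

// Hypothetical text produced on each step of one multi-step tool loop:
// tool-call preamble on the early steps, then the final answer.
const stepTexts = [
  "Let me check memory.",
  "Searching facts.",
  "You have a beagle named Cooper.",
];

// Per-step strategy: one write for every model call in the loop.
function persistPerStep(userInput: string, steps: string[]): Write[] {
  return steps.map((text, i) =>
    i === 0 ? { user: userInput, assistant: text } : { assistant: text },
  );
}

// Per-turn strategy: one write holding the user input and the final text.
function persistOnFinish(userInput: string, steps: string[]): Write[] {
  return [{ user: userInput, assistant: steps[steps.length - 1] }];
}

const perStep = persistPerStep("What pets do I have?", stepTexts);
const perTurn = persistOnFinish("What pets do I have?", stepTexts);
console.log(perStep.length, perTurn.length); // prints "3 1"
```

The per-step variant also stores the intermediate preamble as if it were assistant output, which is the noise the per-turn write avoids.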

Installation

$ npm install @getzep/zep-vercel-ai @getzep/zep-cloud ai zod

Requires Node.js 20+, ai>=6 (the Vercel AI SDK v6; not compatible with v5), zod 3 or 4, @getzep/zep-cloud>=3.23.0, and a Zep Cloud API key. You’ll also want a model provider such as @ai-sdk/openai. Get your API key from app.getzep.com.

Set up your environment variables:

$ export ZEP_API_KEY="your-zep-api-key"
$ export OPENAI_API_KEY="your-openai-api-key"

Usage with generateText

Provision the Zep user and thread, wrap the model to inject context, optionally add tools, and persist the turn via onFinish:

TypeScript
import { ZepClient } from "@getzep/zep-cloud";
import { openai } from "@ai-sdk/openai";
import { generateText, stepCountIs, wrapLanguageModel } from "ai";
import {
  createZepMiddleware,
  createZepOnFinish,
  createZepTools,
  ensureZepUserAndThread,
} from "@getzep/zep-vercel-ai";

const client = new ZepClient({ apiKey: process.env.ZEP_API_KEY! });

// 1. Provision the Zep user + thread before the first turn.
await ensureZepUserAndThread({ client, userId: "u1", threadId: "t1", firstName: "Jane" });

// 2. Wrap the model: inject the context block on each new user turn (inject-only).
const model = wrapLanguageModel({
  model: openai("gpt-4o-mini"),
  middleware: createZepMiddleware({ client, threadId: "t1" }),
});

// 3. Optionally let the model search/store memory explicitly.
const tools = createZepTools(client, { binding: { userId: "u1", threadId: "t1" } });

// 4. Persist the whole turn once per turn via onFinish.
const prompt = "What do you remember about me?";
const { text } = await generateText({
  model,
  tools,
  stopWhen: stepCountIs(5),
  prompt,
  onFinish: createZepOnFinish({ client, threadId: "t1", user: prompt }),
});

If your OpenAI organization enforces Zero Data Retention, use openai.chat('gpt-4o-mini') (Chat Completions API) instead of openai('gpt-4o-mini'). The Responses API references server-persisted item IDs across a multi-step tool loop, which ZDR organizations reject. This is an OpenAI account constraint, not a Zep issue.

Usage with streamText

The same pattern works unchanged for streaming — inject via middleware, persist via onFinish:

TypeScript
import { ZepClient } from "@getzep/zep-cloud";
import { streamText, wrapLanguageModel } from "ai";
import { openai } from "@ai-sdk/openai";
import { createZepMiddleware, createZepOnFinish } from "@getzep/zep-vercel-ai";

// Reuse a single ZepClient; thread "t1" is assumed to be provisioned already
// (see ensureZepUserAndThread in the generateText example).
const client = new ZepClient({ apiKey: process.env.ZEP_API_KEY! });

const userInput = "I just adopted a beagle named Cooper.";

const model = wrapLanguageModel({
  model: openai("gpt-4o-mini"),
  middleware: createZepMiddleware({ client, threadId: "t1" }),
});

const result = streamText({
  model,
  prompt: userInput,
  onFinish: createZepOnFinish({ client, threadId: "t1", user: userInput }),
});

// Stream the response to stdout as chunks arrive.
for await (const chunk of result.textStream) process.stdout.write(chunk);

To set the system prompt yourself instead of using the middleware, fetch the block with getZepContext and persist with persistZepTurn (or createZepOnFinish) directly.

The layers in detail

createZepMiddleware

Returns a Vercel AI SDK LanguageModelMiddleware for wrapLanguageModel. Injection only — it does not persist. Its transformParams fetches the context block and prepends it as a system message, but only on a genuine new user turn (detected by the last prompt message being a user message). On tool-loop continuation steps it injects nothing, so the block is fetched at most once per turn. Options include formatContext, templateId, and logger.
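
The new-turn check described above can be sketched in isolation. This is not the package's implementation: PromptMessage is a simplified local type (the AI SDK's prompt messages are richer), injectContext is a hypothetical helper, and skipping an empty context block is an assumption added for the sketch.

```typescript
// Simplified local types for illustration; not the AI SDK's own types.
type Role = "system" | "user" | "assistant" | "tool";
interface PromptMessage {
  role: Role;
  content: string;
}

// A step counts as a new user turn when the last message is a user message.
function isNewUserTurn(prompt: PromptMessage[]): boolean {
  const last = prompt[prompt.length - 1];
  return last !== undefined && last.role === "user";
}

// Prepend the context block as a system message on a new user turn only.
// Assumption: an empty block is skipped rather than injected.
function injectContext(prompt: PromptMessage[], contextBlock: string): PromptMessage[] {
  if (!isNewUserTurn(prompt) || contextBlock.trim() === "") return prompt;
  return [{ role: "system", content: contextBlock }, ...prompt];
}

const firstStep: PromptMessage[] = [
  { role: "user", content: "What do you remember about me?" },
];
// On a tool-loop continuation the prompt ends with a tool result.
const continuation: PromptMessage[] = [
  ...firstStep,
  { role: "assistant", content: "(tool call)" },
  { role: "tool", content: "(tool result)" },
];

const injected = injectContext(firstStep, "FACTS: Jane adopted a beagle named Cooper.");
const untouched = injectContext(continuation, "FACTS: Jane adopted a beagle named Cooper.");
```

Here injected gains a leading system message, while untouched is returned as-is because the last message is a tool result rather than user input.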

createZepOnFinish

Returns an onFinish callback that persists the whole turn once — the user’s input plus the final assistant text — via thread.addMessages. Because onFinish fires exactly once per turn for both generateText and streamText, it records exactly one user message and one assistant message and never writes intermediate tool-call preamble. Supply the user side via user (a string or a (event) => string resolver); the assistant side is taken from event.text.
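
The two ways of supplying the user side (a plain string or a resolver) can be sketched as follows. FinishEvent, TurnMessage, and buildTurn are hypothetical local names; only event.text and the string-or-resolver shape of user come from the description above, and dropping an empty side is an assumption of the sketch.

```typescript
// Hypothetical local shapes; the real onFinish event carries more fields.
interface FinishEvent {
  text: string;
}
type UserSource = string | ((event: FinishEvent) => string);
interface TurnMessage {
  role: "user" | "assistant";
  content: string;
}

// Accept either a literal string or a function that derives it from the event.
function resolveUser(user: UserSource, event: FinishEvent): string {
  return typeof user === "function" ? user(event) : user;
}

// Build the messages for one turn: user input first, final assistant text second.
// Assumption: an empty side is omitted instead of being written as a blank message.
function buildTurn(user: UserSource, event: FinishEvent): TurnMessage[] {
  const messages: TurnMessage[] = [];
  const userText = resolveUser(user, event);
  if (userText) messages.push({ role: "user", content: userText });
  if (event.text) messages.push({ role: "assistant", content: event.text });
  return messages;
}

const event: FinishEvent = { text: "Congratulations on adopting Cooper!" };
const fromString = buildTurn("I just adopted a beagle named Cooper.", event);
const fromResolver = buildTurn(() => "I just adopted a beagle named Cooper.", event);
```

A resolver is useful when the user input is not known at the point where the callback is created, for example when it is read from request state.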

getZepContext and persistZepTurn

Plain async functions with no framework coupling. getZepContext returns the prompt-ready context block string. persistZepTurn writes a { user?, assistant? } turn; pass { returnContext: true } to fold persist and retrieval into one round-trip.
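
If you assemble the system prompt yourself, you may want the same fail-open behavior the package advertises. The sketch below is a defensive pattern, not the package's code: fetchContext is a hypothetical stand-in for whatever call returns the context block (for example a thin wrapper around getZepContext, whose exact signature is not shown here), and whether the helper already catches errors internally is not assumed either way.

```typescript
// `fetchContext` is a hypothetical stand-in for the call that returns the
// context block; any rejected promise degrades to an empty string.
async function contextOrEmpty(fetchContext: () => Promise<string>): Promise<string> {
  try {
    return await fetchContext();
  } catch {
    return ""; // degrade to "no memory" instead of failing the host call
  }
}

// Append the context block to a base system prompt, or return the base
// prompt unchanged when there is nothing to add.
function buildSystemPrompt(base: string, contextBlock: string): string {
  return contextBlock.trim() === "" ? base : `${base}\n\n${contextBlock}`;
}
```

The resulting string can then be passed as the system prompt of a generateText or streamText call, with persistence handled separately by persistZepTurn or createZepOnFinish.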

createZepTools

Returns { zepSearch, zepRemember, zepContext } built with the AI SDK’s tool() and Zod schemas. Spread them into a generateText / streamText tools record so the model decides when to retrieve or persist. Each tool is also exported as a standalone factory (createZepSearchTool, createZepRememberTool, createZepContextTool).

The tools return typed results: zepSearch returns { facts: string[], found: boolean }, zepRemember returns { stored: boolean, message: string }, and zepContext returns { context: string, found: boolean }. zepSearch defaults to the edges scope (facts and relationships), the most useful scope for an agent recalling discrete claims. Its facts are extracted strings tailored to the search scope (edge facts, "name: summary" for entities, episode content, and so on).
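
As an illustration of consuming these results, the sketch below declares the zepSearch result shape locally and renders it for display. The interface mirrors the fields listed above; renderFacts and its fallback message are hypothetical and not part of the package.

```typescript
// Local declaration mirroring the zepSearch result fields listed above.
interface ZepSearchResult {
  facts: string[];
  found: boolean;
}

// Hypothetical helper: turn a search result into a short bulleted block,
// with an explicit fallback when nothing was found.
function renderFacts(result: ZepSearchResult): string {
  if (!result.found || result.facts.length === 0) return "No relevant memory found.";
  return result.facts.map((fact) => `- ${fact}`).join("\n");
}

const hit = renderFacts({ found: true, facts: ["Jane adopted a beagle", "The beagle is named Cooper"] });
const miss = renderFacts({ found: false, facts: [] });
```

Checking found before reading facts keeps the empty case explicit instead of relying on an empty array.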

Binding: user graph vs standalone graph

createZepTools is bound to a graph via a ZepBinding:

  • userId targets a user graph — the home for personalized agent memory. zepContext and the middleware also need a threadId (the thread scopes relevance; retrieval still spans the whole user graph).
  • graphId targets a standalone graph — shared or domain knowledge such as a product knowledge base or runbooks.

If both are set, userId wins. If neither is set, tools return a graceful “not configured” result instead of throwing.
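
The precedence rule can be sketched as a small resolver. BindingSketch and Target are local, hypothetical types that approximate the binding for illustration; they are not the package's ZepBinding definition, and resolveTarget is not a package export.

```typescript
// Local approximation of a binding, for illustration only.
interface BindingSketch {
  userId?: string;
  graphId?: string;
  threadId?: string;
}

type Target =
  | { kind: "user"; userId: string }
  | { kind: "graph"; graphId: string }
  | { kind: "unconfigured" };

// userId takes precedence over graphId; with neither set, report an
// unconfigured target instead of throwing.
function resolveTarget(binding: BindingSketch): Target {
  if (binding.userId) return { kind: "user", userId: binding.userId };
  if (binding.graphId) return { kind: "graph", graphId: binding.graphId };
  return { kind: "unconfigured" };
}

const both = resolveTarget({ userId: "u1", graphId: "product-kb" });
const graphOnly = resolveTarget({ graphId: "product-kb" });
const neither = resolveTarget({});
```

Returning a tagged result for the unconfigured case lets a tool report "not configured" to the model rather than aborting the tool loop.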

Best practices

  • Inject via middleware, persist via onFinish — this records exactly one user and one assistant message per turn
  • Call ensureZepUserAndThread once before the first turn, then reuse a single ZepClient
  • Use AI SDK v6 — this package is not compatible with v5
  • Don’t read-after-write within a turn — Zep builds the graph asynchronously, so a just-stored fact is not instantly retrievable

Next steps