Graphiti works best with LLM services that support Structured Output (such as OpenAI and Gemini). Using other services may result in incorrect output schemas and ingestion failures, particularly when using smaller models.
Graphiti defaults to using OpenAI for LLM inference and embeddings, but supports multiple LLM providers including Azure OpenAI, Google Gemini, Anthropic, Groq, and local models via Ollama. This guide covers configuring Graphiti with alternative LLM providers.
Azure OpenAI v1 API Opt-in Required for Structured Outputs
Graphiti uses structured outputs via the client.beta.chat.completions.parse() method, which requires Azure OpenAI deployments to opt into the v1 API. Without this opt-in, you’ll encounter 404 Resource not found errors during episode ingestion.
To enable v1 API support in your Azure OpenAI deployment, follow Microsoft’s guide: Azure OpenAI API version lifecycle.
Azure OpenAI deployments often require different endpoints for LLM and embedding services, and separate deployments for default and small models.
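As a rough sketch of how these settings might be organized, the snippet below keeps the LLM and embedding endpoints separate and derives a v1 base URL for each. The `AzureOpenAISettings` class, the `v1_base_url` helper, and every endpoint and deployment name are hypothetical placeholders. The `/openai/v1/` path suffix follows Microsoft's v1 API convention, so check it against your own resource before relying on it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AzureOpenAISettings:
    """Hypothetical container for the values an Azure OpenAI setup needs."""

    llm_endpoint: str          # resource endpoint that hosts the LLM deployments
    embedding_endpoint: str    # may be a different resource from the LLM endpoint
    llm_deployment: str        # deployment used for the default model
    small_llm_deployment: str  # deployment used for the small model
    embedding_deployment: str  # deployment used for embeddings


def v1_base_url(endpoint: str) -> str:
    """Append the v1 API path to a resource endpoint, tolerating a trailing slash."""
    return endpoint.rstrip("/") + "/openai/v1/"


# Placeholder values only; replace with your own resource and deployment names.
settings = AzureOpenAISettings(
    llm_endpoint="https://your-llm-resource.openai.azure.com",
    embedding_endpoint="https://your-embedding-resource.openai.azure.com/",
    llm_deployment="your-llm-deployment",
    small_llm_deployment="your-small-llm-deployment",
    embedding_deployment="your-embedding-deployment",
)

llm_base_url = v1_base_url(settings.llm_endpoint)
embedding_base_url = v1_base_url(settings.embedding_endpoint)
```

The two base URLs would then be passed to whichever OpenAI-compatible client objects you construct for the LLM and the embedder.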
Make sure to replace the placeholder values with your actual Azure OpenAI credentials and deployment names.
Azure OpenAI can also be configured using environment variables:
AZURE_OPENAI_ENDPOINT - Azure OpenAI LLM endpoint URL
AZURE_OPENAI_DEPLOYMENT_NAME - Azure OpenAI LLM deployment name
AZURE_OPENAI_API_VERSION - Azure OpenAI API version
AZURE_OPENAI_EMBEDDING_API_KEY - Azure OpenAI Embedding deployment key (if different from OPENAI_API_KEY)
AZURE_OPENAI_EMBEDDING_ENDPOINT - Azure OpenAI Embedding endpoint URL
AZURE_OPENAI_EMBEDDING_DEPLOYMENT_NAME - Azure OpenAI embedding deployment name
AZURE_OPENAI_EMBEDDING_API_VERSION - Azure OpenAI embedding API version
AZURE_OPENAI_USE_MANAGED_IDENTITY - Use Azure Managed Identities for authentication
Google’s Gemini models provide excellent structured output support and can be used for LLM inference, embeddings, and cross-encoding/reranking.
The Gemini reranker uses the gemini-2.0-flash-exp model by default, which is optimized for cost-effective and low-latency classification tasks.
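A minimal sketch of wiring Gemini into all three roles might look like the following. The `build_gemini_graphiti` helper is hypothetical, and the `graphiti_core` module paths and constructor arguments reflect the package layout as best understood at the time of writing, so treat them as assumptions and verify them against your installed version. No model names are passed, which leaves each client on its library default.

```python
def build_gemini_graphiti(api_key: str, neo4j_uri: str, neo4j_user: str, neo4j_password: str):
    """Return a Graphiti instance that uses Gemini for LLM, embeddings, and reranking."""
    if not api_key:
        raise ValueError("A Google API key is required")

    # Imported lazily so this module still loads when the optional Gemini
    # dependencies are missing. These import paths are assumptions.
    from graphiti_core import Graphiti
    from graphiti_core.cross_encoder.gemini_reranker_client import GeminiRerankerClient
    from graphiti_core.embedder.gemini import GeminiEmbedder, GeminiEmbedderConfig
    from graphiti_core.llm_client.config import LLMConfig
    from graphiti_core.llm_client.gemini_client import GeminiClient

    return Graphiti(
        neo4j_uri,
        neo4j_user,
        neo4j_password,
        # LLM inference
        llm_client=GeminiClient(config=LLMConfig(api_key=api_key)),
        # Embeddings
        embedder=GeminiEmbedder(config=GeminiEmbedderConfig(api_key=api_key)),
        # Cross-encoding / reranking
        cross_encoder=GeminiRerankerClient(config=LLMConfig(api_key=api_key)),
    )
```

A call such as `build_gemini_graphiti(os.environ["GOOGLE_API_KEY"], "bolt://localhost:7687", "neo4j", "password")` would use placeholder database credentials that you replace with your own.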
Google Gemini can be configured using:
GOOGLE_API_KEY - Your Google API key
Anthropic’s Claude models can be used for LLM inference with OpenAI embeddings and reranking.
When using Anthropic for LLM inference, you still need an OpenAI API key for embeddings and reranking functionality. Make sure to set both ANTHROPIC_API_KEY and OPENAI_API_KEY environment variables.
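Because a missing key tends to surface only once ingestion starts, it can help to check for both variables up front. The snippet below is a small sketch of such a check; `missing_vars` and `REQUIRED_FOR_ANTHROPIC` are hypothetical names and are not part of Graphiti.

```python
import os
from typing import Iterable, Mapping

# Both keys are needed: Anthropic for inference, OpenAI for embeddings and reranking.
REQUIRED_FOR_ANTHROPIC = ("ANTHROPIC_API_KEY", "OPENAI_API_KEY")


def missing_vars(required: Iterable[str], environ: Mapping[str, str] = os.environ) -> list:
    """Return the names in `required` that are unset or empty in `environ`."""
    return [name for name in required if not environ.get(name)]


def require_vars(required: Iterable[str], environ: Mapping[str, str] = os.environ) -> None:
    """Raise a single error that lists every missing variable."""
    missing = missing_vars(required, environ)
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
```

Calling `require_vars(REQUIRED_FOR_ANTHROPIC)` before constructing any clients reports all missing keys at once. The same helper works for any other provider's list of required variables.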
Anthropic can be configured using:
ANTHROPIC_API_KEY - Your Anthropic API key
OPENAI_API_KEY - Required for embeddings and reranking
Groq provides fast inference with various open-source models, using OpenAI for embeddings and reranking.
When using Groq, avoid smaller models as they may not accurately extract data or output the correct JSON structures required by Graphiti. Use larger, more capable models like Llama 3.1 70B for best results.
Groq can be configured using:
GROQ_API_KEY - Your Groq API key
OPENAI_API_KEY - Required for embeddings
Ollama enables running local LLMs and embedding models via its OpenAI-compatible API, ideal for privacy-focused applications or avoiding API costs.
When using Ollama, avoid smaller local models as they may not accurately extract data or output the correct JSON structures required by Graphiti. Use larger, more capable models and ensure they support structured output for reliable knowledge graph construction.
Ollama provides an OpenAI-compatible API, but does not support the /v1/responses endpoint that OpenAIClient uses. Use OpenAIGenericClient instead, which uses the /v1/chat/completions endpoint with response_format for structured outputs. Ollama supports both.
First, install Ollama and pull the models you want to use. Then make sure Ollama is running (ollama serve).
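With the server running, a configuration along these lines could point Graphiti at the local endpoint through OpenAIGenericClient. This is a sketch under stated assumptions: `build_ollama_graphiti` is a hypothetical helper, the `graphiti_core` import paths and constructor arguments should be checked against your installed version, and the model names and embedding dimension are left as parameters because they depend on which models you pulled. Pointing the OpenAI reranker at the same endpoint is also an assumption, since whether a given local model supports what the reranker needs is not confirmed here.

```python
OLLAMA_BASE_URL = "http://localhost:11434/v1"  # Ollama's default port, OpenAI-compatible path
OLLAMA_API_KEY = "ollama"  # placeholder value; a local Ollama server ignores it


def build_ollama_graphiti(
    llm_model: str,
    embedding_model: str,
    embedding_dim: int,
    neo4j_uri: str,
    neo4j_user: str,
    neo4j_password: str,
):
    """Return a Graphiti instance backed by a local Ollama server."""
    if embedding_dim <= 0:
        raise ValueError("embedding_dim must match the embedding model's output size")

    # Imported lazily so this module loads without graphiti_core installed.
    # These import paths are assumptions.
    from graphiti_core import Graphiti
    from graphiti_core.cross_encoder.openai_reranker_client import OpenAIRerankerClient
    from graphiti_core.embedder.openai import OpenAIEmbedder, OpenAIEmbedderConfig
    from graphiti_core.llm_client.config import LLMConfig
    from graphiti_core.llm_client.openai_generic_client import OpenAIGenericClient

    llm_config = LLMConfig(
        api_key=OLLAMA_API_KEY,
        model=llm_model,
        small_model=llm_model,  # reuse one local model for both roles
        base_url=OLLAMA_BASE_URL,
    )
    embedder_config = OpenAIEmbedderConfig(
        api_key=OLLAMA_API_KEY,
        embedding_model=embedding_model,
        embedding_dim=embedding_dim,
        base_url=OLLAMA_BASE_URL,
    )
    return Graphiti(
        neo4j_uri,
        neo4j_user,
        neo4j_password,
        # Chat completions client rather than OpenAIClient, which targets /v1/responses
        llm_client=OpenAIGenericClient(config=llm_config),
        embedder=OpenAIEmbedder(config=embedder_config),
        cross_encoder=OpenAIRerankerClient(config=llm_config),
    )
```

Pass the exact model tags shown by `ollama list`, and set `embedding_dim` to the vector size your embedding model produces.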
Many LLM providers offer OpenAI-compatible APIs. Use the OpenAIGenericClient for these services, which ensures proper schema injection for JSON output since most providers don’t support OpenAI’s structured output format.
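To make "schema injection" concrete, the sketch below shows the general idea: the expected JSON schema is appended to the prompt, and the reply is parsed and checked afterwards. This illustrates the technique only and is not Graphiti's actual implementation; `inject_schema`, `parse_reply`, and the example schema are hypothetical.

```python
import json


def inject_schema(system_prompt: str, schema: dict) -> str:
    """Append the expected JSON schema to the system prompt."""
    return (
        f"{system_prompt}\n\n"
        "Respond with a single JSON object that matches this schema:\n"
        f"{json.dumps(schema, indent=2)}"
    )


def parse_reply(reply: str, schema: dict) -> dict:
    """Parse a model reply and check that the schema's required keys are present."""
    data = json.loads(reply)  # raises json.JSONDecodeError on malformed JSON
    if not isinstance(data, dict):
        raise ValueError("Expected a JSON object")
    missing = [key for key in schema.get("required", []) if key not in data]
    if missing:
        raise ValueError("Reply is missing required keys: " + ", ".join(missing))
    return data


# Hypothetical schema for an extracted entity.
entity_schema = {
    "type": "object",
    "properties": {"name": {"type": "string"}, "summary": {"type": "string"}},
    "required": ["name"],
}

prompt = inject_schema("Extract the main entity from the text.", entity_schema)
```

A reply that is not valid JSON, or that omits a required key, fails in `parse_reply` rather than propagating into the graph. That failure mode is the one smaller models tend to trigger.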
When using OpenAI-compatible services, avoid smaller models as they may not accurately extract data or output the correct JSON structures required by Graphiti. Choose larger, more capable models that can handle complex reasoning and structured output.
Replace the placeholder values with your actual service credentials and model names.