OpenAI
Installation
npm i llamaindex @llamaindex/openai
import { OpenAI } from "@llamaindex/openai";
import { Settings } from "llamaindex";

Settings.llm = new OpenAI({
  model: "gpt-3.5-turbo",
  temperature: 0,
  apiKey: "<YOUR_API_KEY>",
});
Alternatively, you can set the API key via an environment variable:
export OPENAI_API_KEY="<YOUR_API_KEY>"
You can optionally set a custom base URL, like:
export OPENAI_BASE_URL="https://api.scaleway.ai/v1"
or
Settings.llm = new OpenAI({ model: "gpt-3.5-turbo", temperature: 0, apiKey: <YOUR_API_KEY>, baseURL: "https://api.scaleway.ai/v1" });
Using OpenAI Responses API
The OpenAI Responses API provides enhanced functionality for handling complex interactions, including built-in tools, annotations, and streaming responses. Here’s how to use it:
Basic Setup
import { openaiResponses } from "@llamaindex/openai";

const llm = openaiResponses({
  model: "gpt-4o",
  temperature: 0.1,
  maxOutputTokens: 1000,
});
Message Content Types
The API supports different types of message content, including text and images:

const response = await llm.chat({
  messages: [
    {
      role: "user",
      content: [
        { type: "input_text", text: "What's in this image?" },
        {
          type: "input_image",
          image_url: "https://example.com/image.jpg",
          detail: "auto", // Optional: can be "auto", "low", or "high"
        },
      ],
    },
  ],
});
Advanced Features
Built-in Tools

const llm = openaiResponses({
  model: "gpt-4o",
  builtInTools: [
    {
      type: "function",
      name: "search_files",
      description: "Search through available files",
    },
  ],
  strict: true, // Enable strict mode for tool calls
});
Response Tracking and Storage
const llm = openaiResponses({
  trackPreviousResponses: true, // Enable response tracking
  store: true, // Store responses for future reference
  user: "user-123", // Associate responses with a user
  callMetadata: {
    // Add custom metadata
    sessionId: "session-123",
    context: "customer-support",
  },
});
Streaming Responses
const response = await llm.chat({
  messages: [{ role: "user", content: "Generate a long response" }],
  stream: true, // Enable streaming
});

for await (const chunk of response) {
  console.log(chunk.delta); // Process each chunk of the response
}
Configuration Options
The OpenAI Responses API supports various configuration options:
const llm = openaiResponses({
  // Model and basic settings
  model: "gpt-4o",
  temperature: 0.1,
  topP: 1,
  maxOutputTokens: 1000,

  // API configuration
  apiKey: "your-api-key",
  baseURL: "custom-endpoint",
  maxRetries: 10,
  timeout: 60000,

  // Response handling
  trackPreviousResponses: false,
  store: false,
  strict: false,

  // Additional options
  instructions: "Custom instructions for the model",
  truncation: "auto", // Can be "auto", "disabled", or null
  include: ["citations", "reasoning"], // Specify what to include in responses
});
Response Structure
The API returns responses with rich metadata and optional annotations:

interface ResponseStructure {
  message: {
    content: string;
    role: "assistant";
    options: {
      built_in_tool_calls: Array<ToolCall>;
      annotations?: Array<Citation | URLCitation | FilePath>;
      refusal?: string;
      reasoning?: ReasoningItem;
      usage?: ResponseUsage;
      toolCall?: Array<PartialToolCall>;
    };
  };
}
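As a rough sketch of how this metadata might be inspected after a chat call (the field access below simply follows the interface above; treat the exact runtime shape as an assumption to verify against your installed version):

const res = await llm.chat({
  messages: [{ role: "user", content: "Summarize the attached report" }],
});

// Follows the ResponseStructure interface above
const options = res.message.options as
  | ResponseStructure["message"]["options"]
  | undefined;

// Log any citations or file references attached to the response
for (const annotation of options?.annotations ?? []) {
  console.log("annotation:", annotation);
}

// Token accounting, if usage data was returned
if (options?.usage) {
  console.log("usage:", options.usage);
}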
Best Practices
- Use trackPreviousResponses when you need conversation continuity
- Enable strict mode when using tools to ensure accurate function calls
- Set appropriate maxOutputTokens to control response length
- Use annotations to track citations and references in responses
- Implement error handling for potential API failures and retries (see the sketch below)
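A minimal sketch of that last point, wrapping a chat call in retries with exponential backoff; the helper name and retry counts are illustrative, not part of the library:

import type { ChatMessage } from "llamaindex";

// Hypothetical helper: retry a chat call with exponential backoff
async function chatWithRetries(
  llm: { chat(params: { messages: ChatMessage[] }): Promise<unknown> },
  messages: ChatMessage[],
  maxAttempts = 3,
) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await llm.chat({ messages });
    } catch (error) {
      if (attempt === maxAttempts) throw error;
      // Back off before retrying: 1s, 2s, 4s, ...
      await new Promise((r) => setTimeout(r, 1000 * 2 ** (attempt - 1)));
    }
  }
}

Note that openaiResponses already accepts a maxRetries option; a wrapper like this is mainly useful when you want custom backoff or logging.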
Using JSON Response Format
You can configure OpenAI to return responses in JSON format:

Settings.llm = new OpenAI({
  model: "gpt-4o",
  temperature: 0,
  responseFormat: { type: "json_object" },
});
// You can also use a Zod schema to validate the response structure
import { z } from "zod";

const responseSchema = z.object({
  summary: z.string(),
  topics: z.array(z.string()),
  sentiment: z.enum(["positive", "negative", "neutral"]),
});

Settings.llm = new OpenAI({
  model: "gpt-4o",
  temperature: 0,
  responseFormat: responseSchema,
});
Response Formats
The OpenAI LLM supports different response formats to structure the output in specific ways. There are two main approaches to formatting responses:
1. JSON Object Format
The simplest way to get structured JSON responses is using the json_object response format:
Settings.llm = new OpenAI({ model: "gpt-4o", temperature: 0, responseFormat: { type: "json_object" }});
const response = await llm.chat({ messages: [ { role: "system", content: "You are a helpful assistant that outputs JSON." }, { role: "user", content: "Summarize this meeting transcript" } ]});
// Response will be valid JSONconsole.log(response.message.content);
2. Schema Validation with Zod
For more robust type safety and validation, you can use Zod schemas to define the expected response structure:
import { z } from "zod";
// Define the response schema
const meetingSchema = z.object({
  summary: z.string(),
  participants: z.array(z.string()),
  actionItems: z.array(z.string()),
  nextSteps: z.string(),
});

// Configure the LLM with the schema
Settings.llm = new OpenAI({
  model: "gpt-4o",
  temperature: 0,
  responseFormat: meetingSchema,
});

const response = await llm.chat({
  messages: [{ role: "user", content: "Summarize this meeting transcript" }],
});

// Response will be typed and validated according to the schema
const result = response.message.content;
console.log(result.summary);
console.log(result.actionItems);
Response Format Options
The response format can be configured in two ways:
- At LLM initialization:
const llm = new OpenAI({ model: "gpt-4o", responseFormat: { type: "json_object" } // or a Zod schema});
- Per request:
const response = await llm.chat({
  messages: [...],
  responseFormat: { type: "json_object" }, // or a Zod schema
});
The response format options are:
{ type: "json_object" }
- Returns responses as JSON objectszodSchema
- A Zod schema that defines and validates the response structure
Best Practices
- Use JSON object format for simple structured responses
- Use Zod schemas when you need:
- Type safety
- Response validation
- Complex nested structures
- Specific field constraints
- Set a low temperature (e.g. 0) when using structured outputs for more reliable formatting
- Include clear instructions in system or user messages about the expected response format
- Handle potential parsing errors when working with JSON responses (see the sketch below)
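A minimal sketch of that last point, assuming the json_object format (where content arrives as a JSON string) and reusing the meetingSchema defined above; safeParse is Zod's real API:

// Defensive parsing of a JSON-formatted response
const raw = response.message.content as string;

let parsed: unknown;
try {
  parsed = JSON.parse(raw);
} catch {
  throw new Error(`Model returned invalid JSON: ${raw.slice(0, 100)}`);
}

// Validate the shape against the Zod schema defined earlier
const validated = meetingSchema.safeParse(parsed);
if (!validated.success) {
  console.error("Response failed validation:", validated.error.issues);
} else {
  console.log(validated.data.summary);
}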
Load and index documents
For this example, we will use a single document. In a real-world scenario, you would have multiple documents to index.
import { Document, VectorStoreIndex } from "llamaindex";
const document = new Document({ text: essay, id_: "essay" });
const index = await VectorStoreIndex.fromDocuments([document]);
const queryEngine = index.asQueryEngine();
const query = "What is the meaning of life?";
const results = await queryEngine.query({ query });
Full Example
Section titled “Full Example”import { OpenAI } from "@llamaindex/openai";import { Document, Settings, VectorStoreIndex } from "llamaindex";
// Use the OpenAI LLMSettings.llm = new OpenAI({ model: "gpt-3.5-turbo", temperature: 0 });
async function main() { const document = new Document({ text: essay, id_: "essay" });
// Load and index documents const index = await VectorStoreIndex.fromDocuments([document]);
// get retriever const retriever = index.asRetriever();
// Create a query engine const queryEngine = index.asQueryEngine({ retriever, });
const query = "What is the meaning of life?";
// Query const response = await queryEngine.query({ query, });
// Log the response console.log(response.response);}
API Reference
OpenAI Live LLM
The OpenAI Live LLM integration in LlamaIndex provides real-time chat capabilities with support for audio streaming and tool calling.
Basic Usage
import { openai } from "@llamaindex/openai";
import { tool, ModalityType } from "llamaindex";

// Create a server-side LLM instance with your real API key
const serverllm = openai({
  apiKey: "your-api-key",
  model: "gpt-4o-realtime-preview-2025-06-03",
});

// Get an ephemeral key
// Usually this code is run on the server and the ephemeral key is passed to the
// client - the ephemeral key can be securely used on the client side
const ephemeralKey = await serverllm.live.getEphemeralKey();

// Create a client-side LLM instance with the ephemeral key
const llm = openai({
  apiKey: ephemeralKey,
  model: "gpt-4o-realtime-preview-2025-06-03",
});

// Create a live session
const session = await llm.live.connect({
  systemInstruction: "You are a helpful assistant.",
});

// Send a message
session.sendMessage({
  content: "Hello!",
  role: "user",
});
Tool Integration
Tools are handled server-side, making it simple to pass them to the live session:

import { z } from "zod";

// Define your tools
const weatherTool = tool({
  name: "weather",
  description: "Get the weather for a location",
  parameters: z.object({
    location: z.string().describe("The location to get weather for"),
  }),
  execute: async ({ location }) => {
    return `The weather in ${location} is sunny`;
  },
});

// Create session with tools
const session = await llm.live.connect({
  systemInstruction: "You are a helpful assistant.",
  tools: [weatherTool],
});
Audio Support
For audio capabilities:

// Get microphone access
const userStream = await navigator.mediaDevices.getUserMedia({
  audio: true,
});

// Create session with audio
const session = await llm.live.connect({
  audioConfig: {
    stream: userStream,
    onTrack: (remoteStream) => {
      // Handle incoming audio
      audioElement.srcObject = remoteStream;
    },
  },
});
Event Handling
Listen to events from the session:

for await (const event of session.streamEvents()) {
  if (liveEvents.open.include(event)) {
    // Connection established
    console.log("Connected!");
  } else if (liveEvents.text.include(event)) {
    // Received text response
    console.log("Assistant:", event.text);
  }
}
Capabilities
The OpenAI Live LLM supports:
- Real-time text chat
- Audio streaming (if configured)
- Tool calling (server-side execution)
- Ephemeral key generation for secure sessions
API Reference
LiveLLM Methods
connect(config?: LiveConnectConfig)
Creates a new live session.

interface LiveConnectConfig {
  systemInstruction?: string;
  tools?: BaseTool[];
  audioConfig?: AudioConfig;
  responseModality?: ModalityType[];
}
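For example, a text-only session might be requested like this; whether ModalityType.TEXT is the correct member name in your installed version is an assumption worth verifying:

// Sketch: connect with an explicit response modality
const session = await llm.live.connect({
  systemInstruction: "You are a helpful assistant.",
  responseModality: [ModalityType.TEXT], // assumed enum member
});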
getEphemeralKey()
Gets a temporary key for the session.
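Since ephemeral keys are meant to be minted server-side and handed to the browser, a typical pattern is a small endpoint like this sketch (the Express route and response shape are illustrative, not part of LlamaIndex):

import express from "express";
import { openai } from "@llamaindex/openai";

const app = express();

// Hypothetical endpoint: mint a short-lived key for the browser client
app.get("/api/ephemeral-key", async (_req, res) => {
  const serverllm = openai({
    apiKey: process.env.OPENAI_API_KEY!, // the real key never leaves the server
    model: "gpt-4o-realtime-preview-2025-06-03",
  });
  const key = await serverllm.live.getEphemeralKey();
  res.json({ key });
});

app.listen(3000);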
LiveLLMSession Methods
sendMessage(message: ChatMessage)
Sends a message to the assistant.

interface ChatMessage {
  content: string | MessageContentDetail[];
  role: "user" | "assistant";
}
disconnect()
Closes the session and cleans up resources.
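A small usage sketch, assuming the session and userStream from the earlier examples; the try/finally wrapper is a common cleanup pattern rather than a library requirement:

try {
  session.sendMessage({ content: "Goodbye!", role: "user" });
} finally {
  // Release the session and any captured audio when done
  session.disconnect();
  userStream.getTracks().forEach((track) => track.stop());
}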
Error Handling
try {
  const session = await llm.live.connect();
} catch (error) {
  if (error instanceof Error) {
    console.error("Connection failed:", error.message);
  }
}
Best Practices
- Tool Definition
  - Keep tool implementations server-side
  - Use clear descriptions for tools
  - Handle tool errors gracefully
- Session Management
  - Always disconnect sessions when done
  - Clean up audio resources
  - Handle reconnection scenarios (see the sketch below)
- Security
  - Use ephemeral keys for sessions
  - Validate tool inputs
  - Secure API key handling
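A rough sketch of reconnection handling, under the assumption that a dropped session simply needs a fresh live.connect call; the retry limit and delays are illustrative:

// Hypothetical helper: reconnect with capped retries
async function connectWithRetry(maxAttempts = 3) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await llm.live.connect({
        systemInstruction: "You are a helpful assistant.",
      });
    } catch (error) {
      console.warn(`Connect attempt ${attempt} failed`, error);
      if (attempt === maxAttempts) throw error;
      await new Promise((r) => setTimeout(r, 1000 * attempt));
    }
  }
}

const session = await connectWithRetry();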