Low-Level LLM Execution
Sometimes you need more control over LLM interactions than what high-level agents provide. The llm.exec method makes it simple to make a single LLM call with tools while hiding the complexity of executing the tools and generating the tool messages.
When to Use llm.exec

Use llm.exec when you need to:
- Build custom agent logic in workflow steps
- Have precise control over message handling and tool execution
- Extract structured data from LLM responses
Basic Usage

The llm.exec method takes messages and tools as parameters and executes one LLM call.
The LLM may either request one or more tool calls or generate an assistant message as the result. For each requested tool call, llm.exec executes it and generates the two tool messages (the call and its result). If no tool call is requested, only the assistant message is returned.
```typescript
import { openai } from "@llamaindex/openai";
import { ChatMessage, tool } from "llamaindex";
import z from "zod";

const llm = openai({ model: "gpt-4.1-mini" });
const messages = [
  {
    content: "What's the weather like in San Francisco?",
    role: "user",
  } as ChatMessage,
];

const { newMessages, toolCalls } = await llm.exec({
  messages,
  tools: [
    tool({
      name: "get_weather",
      description: "Get the current weather for a location",
      parameters: z.object({
        address: z.string().describe("The address"),
      }),
      execute: ({ address }) => {
        return `It's sunny in ${address}!`;
      },
    }),
  ],
});

// Add the new messages (including tool calls and responses) to your conversation
messages.push(...newMessages);
```
newMessages is an array because each tool call generates two messages: a tool call message and a tool call result message.
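To see the pairing, you can log what llm.exec appended (a minimal sketch reusing newMessages from the example above):

```typescript
// Each requested tool call contributes a pair of messages, in order:
// the tool call itself, then its result.
for (const message of newMessages) {
  console.log(message.role, message.content);
}
```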
Structured Output

You can use responseFormat with a Zod schema to get structured data from the LLM response:
```typescript
import { openai } from "@llamaindex/openai";
import { ChatMessage } from "llamaindex";
import z from "zod";

const llm = openai({ model: "gpt-4.1-mini" });

const schema = z.object({
  title: z.string(),
  author: z.string(),
  year: z.number(),
});

const messages = [
  {
    role: "user",
    content:
      "I have been reading La Divina Commedia by Dante Alighieri, published in 1321",
  } as ChatMessage,
];

const { newMessages, toolCalls, object } = await llm.exec({
  messages,
  responseFormat: schema,
});

console.log(object); // { title: "La Divina Commedia", author: "Dante Alighieri", year: 1321 }
```
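Because object is parsed against the schema, you can use it directly. If you want an explicit runtime guard anyway (a defensive sketch, not something the API requires), the same Zod schema can re-validate it:

```typescript
// Optional defensive check: re-validate the parsed object with the schema.
const parsed = schema.safeParse(object);
if (parsed.success) {
  console.log(`${parsed.data.title} (${parsed.data.year}) by ${parsed.data.author}`);
}
```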
Agent Loop Pattern

A common pattern is to call llm.exec in a loop until the LLM stops making tool calls:
```typescript
import { openai } from "@llamaindex/openai";
import { ChatMessage, tool } from "llamaindex";
import z from "zod";

async function runAgentLoop() {
  const llm = openai({ model: "gpt-4.1-mini" });
  const messages = [
    {
      content: "What's the weather like in San Francisco?",
      role: "user",
    } as ChatMessage,
  ];

  let exit = false;
  do {
    const { newMessages, toolCalls } = await llm.exec({
      messages,
      tools: [
        tool({
          name: "get_weather",
          description: "Get the current weather for a location",
          parameters: z.object({
            address: z.string().describe("The address"),
          }),
          execute: ({ address }) => {
            return `It's sunny in ${address}!`;
          },
        }),
      ],
    });

    console.log(newMessages);
    messages.push(...newMessages);

    // Exit when no more tool calls are made
    exit = toolCalls.length === 0;
  } while (!exit);
}
```
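The do/while form guarantees at least one LLM call before the exit condition is checked; the loop then continues only as long as the model keeps requesting tools.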
Streaming Support

For real-time responses, use the stream option to get the assistant's response as streamed tokens:
```typescript
import { openai } from "@llamaindex/openai";
import { ChatMessage, tool } from "llamaindex";
import z from "zod";

async function streamingAgentLoop() {
  const llm = openai({ model: "gpt-4o-mini" });
  const messages = [
    {
      content: "What's the weather like in San Francisco?",
      role: "user",
    } as ChatMessage,
  ];

  let exit = false;
  do {
    const { stream, newMessages, toolCalls } = await llm.exec({
      messages,
      tools: [
        tool({
          name: "get_weather",
          description: "Get the current weather for a location",
          parameters: z.object({
            address: z.string().describe("The address"),
          }),
          execute: ({ address }) => {
            return `It's sunny in ${address}!`;
          },
        }),
      ],
      stream: true,
    });

    // Stream the response token by token
    for await (const chunk of stream) {
      process.stdout.write(chunk.delta);
    }

    messages.push(...newMessages());

    exit = toolCalls.length === 0;
  } while (!exit);
}
```
When streaming, newMessages is a function because the final messages are only available after the stream has been consumed. Calling it before the stream finishes throws an error.
Return Values

llm.exec returns an object with:
- newMessages: Array of new chat messages, including the LLM response and any tool call messages (call and result). When streaming, this is a function that returns the array.
- toolCalls: Array of tool calls made by the LLM
- object: The structured object when using responseFormat with a Zod schema (undefined if no schema is provided)
- stream: Async iterable for streaming responses (only when stream: true)
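As a quick orientation (a sketch assuming llm, messages, and tools are defined as in the earlier examples), the two call shapes look like this:

```typescript
// Non-streaming: newMessages is a plain array, available immediately.
const plain = await llm.exec({ messages, tools });
plain.newMessages; // ChatMessage[]
plain.toolCalls;   // tool calls the LLM requested (empty if none)
plain.object;      // undefined here, since no responseFormat was passed

// Streaming: stream is an async iterable; newMessages is a function
// that must only be called after the stream has been consumed.
const streamed = await llm.exec({ messages, tools, stream: true });
for await (const chunk of streamed.stream) process.stdout.write(chunk.delta);
messages.push(...streamed.newMessages());
```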
Best Practices

When using llm.exec in an agent loop, take care to:
- Maintain message history: Always add newMessages to your conversation history
- Set exit conditions: Implement proper logic to avoid infinite loops (see the sketch after this list)
- Handle structured output: When using responseFormat, the object property contains your parsed data
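Putting the first two points together, a minimal sketch (assuming llm, messages, and a tools array as in the earlier examples; MAX_ITERATIONS is a hypothetical cap, not part of the API):

```typescript
// Bound the loop so a model that keeps requesting tools can't run forever.
const MAX_ITERATIONS = 10; // hypothetical limit — tune for your use case
let iterations = 0;
let exit = false;
do {
  const { newMessages, toolCalls } = await llm.exec({ messages, tools });
  messages.push(...newMessages); // maintain message history
  iterations += 1;
  exit = toolCalls.length === 0 || iterations >= MAX_ITERATIONS;
} while (!exit);
```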