Low-Level LLM Execution
Sometimes you need more control over LLM interactions than what high-level agents provide. The llm.exec method makes it simple to make a single LLM call with tools while hiding the complexity of executing the tools and generating the tool messages.
When to Use llm.exec

Use llm.exec when you need to:
- Build custom agent logic in workflow steps
- Have precise control over message handling and tool execution
- Extract structured data from LLM responses
Basic Usage

The llm.exec method takes messages and tools as parameters and executes one LLM call.
The LLM may either request one or more tool calls or generate an assistant message as the result. For each requested tool call, llm.exec executes it and generates the two tool messages (the call and its result). If no tool call is requested, only the assistant message is returned.
```typescript
import { openai } from "@llamaindex/openai";
import { ChatMessage, tool } from "llamaindex";
import z from "zod";

const llm = openai({ model: "gpt-4.1-mini" });
const messages = [
  {
    content: "What's the weather like in San Francisco?",
    role: "user",
  } as ChatMessage,
];

const { newMessages, toolCalls } = await llm.exec({
  messages,
  tools: [
    tool({
      name: "get_weather",
      description: "Get the current weather for a location",
      parameters: z.object({
        address: z.string().describe("The address"),
      }),
      execute: ({ address }) => {
        return `It's sunny in ${address}!`;
      },
    }),
  ],
});

// Add the new messages (including tool calls and responses) to your conversation
messages.push(...newMessages);
```
newMessages is an array because each tool call generates two messages: a tool call message and a tool call result message.
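To see the pairing, you can log what llm.exec appended (a minimal sketch reusing newMessages from the example above):

```typescript
// Each requested tool call contributes a pair of messages, in order:
// the tool call itself, then its result.
for (const message of newMessages) {
  console.log(message.role, message.content);
}
```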
Structured Output

You can use responseFormat with a Zod schema to get structured data from the LLM response:
```typescript
import { openai } from "@llamaindex/openai";
import { ChatMessage } from "llamaindex";
import z from "zod";

const llm = openai({ model: "gpt-4.1-mini" });

const schema = z.object({
  title: z.string(),
  author: z.string(),
  year: z.number(),
});

const messages = [
  {
    role: "user",
    content:
      "I have been reading La Divina Commedia by Dante Alighieri, published in 1321",
  } as ChatMessage,
];

const { newMessages, toolCalls, object } = await llm.exec({
  messages,
  responseFormat: schema,
});

console.log(object); // { title: "La Divina Commedia", author: "Dante Alighieri", year: 1321 }
```
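Because object is parsed against the schema, you can use it directly. If you want an explicit runtime guard anyway (a defensive sketch, not something the API requires), the same Zod schema can re-validate it:

```typescript
// Optional defensive check: re-validate the parsed object with the schema.
const parsed = schema.safeParse(object);
if (parsed.success) {
  console.log(`${parsed.data.title} (${parsed.data.year}) by ${parsed.data.author}`);
}
```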
Agent Loop Pattern

A common pattern is to call llm.exec in a loop until the LLM stops making tool calls:
```typescript
import { openai } from "@llamaindex/openai";
import { ChatMessage, tool } from "llamaindex";
import z from "zod";

async function runAgentLoop() {
  const llm = openai({ model: "gpt-4.1-mini" });
  const messages = [
    {
      content: "What's the weather like in San Francisco?",
      role: "user",
    } as ChatMessage,
  ];

  let exit = false;
  do {
    const { newMessages, toolCalls } = await llm.exec({
      messages,
      tools: [
        tool({
          name: "get_weather",
          description: "Get the current weather for a location",
          parameters: z.object({
            address: z.string().describe("The address"),
          }),
          execute: ({ address }) => {
            return `It's sunny in ${address}!`;
          },
        }),
      ],
    });

    console.log(newMessages);
    messages.push(...newMessages);

    // Exit when no more tool calls are made
    exit = toolCalls.length === 0;
  } while (!exit);
}
```
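The do/while form guarantees at least one LLM call before the exit condition is checked; the loop then continues only as long as the model keeps requesting tools.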
Streaming Support

For real-time responses, use the stream option to get the assistant's response as streamed tokens:
```typescript
import { openai } from "@llamaindex/openai";
import { ChatMessage, tool } from "llamaindex";
import z from "zod";

async function streamingAgentLoop() {
  const llm = openai({ model: "gpt-4o-mini" });
  const messages = [
    {
      content: "What's the weather like in San Francisco?",
      role: "user",
    } as ChatMessage,
  ];

  let exit = false;
  do {
    const { stream, newMessages, toolCalls } = await llm.exec({
      messages,
      tools: [
        tool({
          name: "get_weather",
          description: "Get the current weather for a location",
          parameters: z.object({
            address: z.string().describe("The address"),
          }),
          execute: ({ address }) => {
            return `It's sunny in ${address}!`;
          },
        }),
      ],
      stream: true,
    });

    // Stream the response token by token
    for await (const chunk of stream) {
      process.stdout.write(chunk.delta);
    }

    messages.push(...newMessages());

    exit = toolCalls.length === 0;
  } while (!exit);
}
```
When streaming, newMessages is a function because the final messages are only available after the stream has been consumed. Calling it before the stream finishes throws an error.
Return Values

llm.exec returns an object with:
- newMessages: Array of new chat messages, including the LLM response and any tool call messages (call and result). When streaming, this is a function that returns the array.
- toolCalls: Array of tool calls made by the LLM
- object: The structured object when using responseFormat with a Zod schema (undefined if no schema is provided)
- stream: Async iterable for streaming responses (only when stream: true)
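As a quick orientation (a sketch assuming llm, messages, and tools are defined as in the earlier examples), the two call shapes look like this:

```typescript
// Non-streaming: newMessages is a plain array, available immediately.
const plain = await llm.exec({ messages, tools });
plain.newMessages; // ChatMessage[]
plain.toolCalls;   // tool calls the LLM requested (empty if none)
plain.object;      // undefined here, since no responseFormat was passed

// Streaming: stream is an async iterable; newMessages is a function
// that must only be called after the stream has been consumed.
const streamed = await llm.exec({ messages, tools, stream: true });
for await (const chunk of streamed.stream) process.stdout.write(chunk.delta);
messages.push(...streamed.newMessages());
```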
Best Practices

When using llm.exec in an agent loop, take care to:
- Maintain message history: Always add newMessages to your conversation history
- Set exit conditions: Implement proper logic to avoid infinite loops (see the sketch after this list)
- Handle structured output: When using responseFormat, the object property contains your parsed data
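Putting the first two points together, a minimal sketch (assuming llm, messages, and a tools array as in the earlier examples; MAX_ITERATIONS is a hypothetical cap, not part of the API):

```typescript
// Bound the loop so a model that keeps requesting tools can't run forever.
const MAX_ITERATIONS = 10; // hypothetical limit — tune for your use case
let iterations = 0;
let exit = false;
do {
  const { newMessages, toolCalls } = await llm.exec({ messages, tools });
  messages.push(...newMessages); // maintain message history
  iterations += 1;
  exit = toolCalls.length === 0 || iterations >= MAX_ITERATIONS;
} while (!exit);
```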