Structured data extraction

Make sure you have installed LlamaIndex.TS and have an OpenAI key. If you haven’t, check out the installation guide.

You can use other LLMs via their APIs; if you would prefer to use local models, check out our local LLM example.

In a new folder:

npm init
npm i -D typescript @types/node
npm i @llamaindex/openai zod

Create the file example.ts. This code will:

  • Set up an LLM connection to GPT-4
  • Give an example of the data structure we wish to generate
  • Prompt the LLM with instructions and the example, plus a sample transcript
../../examples/misc/jsonExtract.ts

To run the code:

npx tsx example.ts

You should see output something like this:

{
  "summary": "Sarah from XYZ Company called John to introduce the XYZ Widget, a tool designed to automate tasks and improve productivity. John expressed interest and requested case studies and a product demo. Sarah agreed to send the information and follow up to schedule the demo.",
  "products": ["XYZ Widget"],
  "rep_name": "Sarah",
  "prospect_name": "John",
  "action_items": [
    "Send case studies and additional product information to John",
    "Follow up with John to schedule a product demo"
  ]
}

Many LLMs do not natively support structured output, and often rely exclusively on prompt or context engineering.

For these cases, we provide an alternative approach to structured data extraction, using the exec method with responseFormat.

For example, in a new folder, install our Anthropic integration and zod v3:

npm init
npm i -D typescript @types/node
npm i @llamaindex/anthropic zod@3.25.76

And then try extracting data with this code:

../../examples/agents/tools/response-format-exec.ts

The output should look like this:

{
  "title": "La Divina Commedia",
  "author": "Dante Alighieri",
  "year": 1321
}