Structured Input for LLMs

It has been observed that most LLMs perform better when prompted with XML-like content (you can see this in Anthropic’s prompting guide, for instance).

We can refer to this kind of prompting as structured input, and LlamaIndex gives you the possibility of chatting with LLMs through exactly this technique - let’s go through an example in this notebook!

Make sure to have llama-index>=0.12.34 installed if you wish to follow along with this tutorial without any problem 😄

! pip install -q llama-index

! pip show llama-index | grep "Version"
Version: 0.12.50

In order to use structured input, we need to create a prompt template containing a Jinja expression (recognizable by the double curly braces, {{ }}) with a specific filter (to_xml) that turns inputs such as Pydantic BaseModel subclasses, dictionaries, or JSON-like strings into XML representations.

from llama_index.core.prompts import RichPromptTemplate
template_str = "Please extract from the following XML code the contact details of the user:\n\n```xml\n{{ data | to_xml }}\n```\n\n"
prompt = RichPromptTemplate(template_str)

Let’s now try to format the input as a string, using different objects as data.

# Using a BaseModel
from pydantic import BaseModel
from typing import Dict
from IPython.display import Markdown, display


class User(BaseModel):
    name: str
    surname: str
    age: int
    email: str
    phone: str
    social_accounts: Dict[str, str]


user = User(
    name="John",
    surname="Doe",
    age=30,
    email="john.doe@example.com",
    phone="123-456-7890",
    social_accounts={"bluesky": "john.doe", "instagram": "johndoe1234"},
)

display(Markdown(prompt.format(data=user)))

Please extract from the following XML code the contact details of the user:

<user>
  <name>John</name>
  <surname>Doe</surname>
  <age>30</age>
  <email>john.doe@example.com</email>
  <phone>123-456-7890</phone>
  <social_accounts>{'bluesky': 'john.doe', 'instagram': 'johndoe1234'}</social_accounts>
</user>

# Using a dictionary
user_dict = {
    "name": "John",
    "surname": "Doe",
    "age": 30,
    "email": "john.doe@example.com",
    "phone": "123-456-7890",
    "social_accounts": {"bluesky": "john.doe", "instagram": "johndoe1234"},
}

display(Markdown(prompt.format(data=user_dict)))

Please extract from the following XML code the contact details of the user:

<input>
  <name>John</name>
  <surname>Doe</surname>
  <age>30</age>
  <email>john.doe@example.com</email>
  <phone>123-456-7890</phone>
  <social_accounts>{'bluesky': 'john.doe', 'instagram': 'johndoe1234'}</social_accounts>
</input>

# Using a JSON-like string
user_str = '{"name":"John","surname":"Doe","age":30,"email":"john.doe@example.com","phone":"123-456-7890","social_accounts":{"bluesky":"john.doe","instagram":"johndoe1234"}}'
display(Markdown(prompt.format(data=user_str)))

Please extract from the following XML code the contact details of the user:

<input>
  <name>John</name>
  <surname>Doe</surname>
  <age>30</age>
  <email>john.doe@example.com</email>
  <phone>123-456-7890</phone>
  <social_accounts>{'bluesky': 'john.doe', 'instagram': 'johndoe1234'}</social_accounts>
</input>
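
As a quick sanity check (a small sketch based on the renderings above): the dictionary and the JSON-like string produce identical prompts under the generic <input> root tag, while the BaseModel is rendered under a root tag derived from its class name.

# The dict and the JSON string render to the same prompt; the BaseModel
# differs only in its root tag, which comes from the class name.
assert prompt.format(data=user_dict) == prompt.format(data=user_str)
assert "<user>" in prompt.format(data=user)
assert "<input>" in prompt.format(data=user_dict)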

Now that we know how to produce structured input, let’s employ it to chat with an LLM!

import os
from getpass import getpass
os.environ["OPENAI_API_KEY"] = getpass()
··········
from llama_index.llms.openai import OpenAI
llm = OpenAI(model="gpt-4.1-mini")
response = await llm.achat(prompt.format_messages(data=user))
print(response.message.content)
The contact details of the user are:

- Email: john.doe@example.com
- Phone: 123-456-7890
- Social Accounts:
  - Bluesky: john.doe
  - Instagram: johndoe1234
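
If you are not in an async environment (for example, a plain Python script rather than a notebook), the synchronous chat method works the same way - a minimal sketch:

# Synchronous equivalent of the achat call above
response = llm.chat(prompt.format_messages(data=user))
print(response.message.content)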

Combining structured input with structured output can really boost the reliability of your LLM’s outputs - so let’s give it a go!

from pydantic import Field
from typing import Optional


class SocialAccounts(BaseModel):
    instagram: Optional[str] = Field(default=None)
    bluesky: Optional[str] = Field(default=None)
    x: Optional[str] = Field(default=None)
    mastodon: Optional[str] = Field(default=None)


class ContactDetails(BaseModel):
    email: str
    phone: str
    social_accounts: SocialAccounts


sllm = llm.as_structured_llm(ContactDetails)
structured_response = await sllm.achat(prompt.format_messages(data=user))
print(structured_response.raw.email)
print(structured_response.raw.phone)
print(structured_response.raw.social_accounts.instagram)
print(structured_response.raw.social_accounts.bluesky)
john.doe@example.com
123-456-7890
johndoe1234
john.doe
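
Since structured_response.raw is the validated ContactDetails instance, the usual Pydantic methods are available as well - for example, dumping the whole extraction as JSON (a minimal sketch):

# .raw is a ContactDetails object, so standard Pydantic serialization works
contact = structured_response.raw
print(contact.model_dump_json(indent=2))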