Structured Input for LLMs
It has been observed that most LLMs perform better when prompted with XML-like content (you can see this in Anthropic's prompting guide, for instance).
We can refer to this kind of prompting as structured input, and LlamaIndex lets you chat with LLMs exactly through this technique. Let's walk through an example in this notebook!
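To give a flavor of what this looks like, here is a minimal, hand-written illustration of an XML-structured prompt (the tag names are arbitrary, not a LlamaIndex convention):

```python
# A plain string prompt that uses XML tags to separate instructions from data.
# Any consistent tagging scheme works; these tag names are made up for illustration.
prompt_text = """
<instructions>
Summarize the document below in one sentence.
</instructions>
<document>
LlamaIndex is a framework for building context-augmented LLM applications.
</document>
"""
```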
1. Install Needed Dependencies
Make sure to have `llama-index>=0.12.34` installed if you wish to follow this tutorial along without any problems!
```python
! pip install -q llama-index
```
```python
! pip show llama-index | grep "Version"
```

```
Version: 0.12.50
```
2. Create a Prompt Template
In order to use structured input, we need to create a prompt template containing a Jinja expression (recognizable by the double curly braces, `{{ }}`) with a specific filter (`to_xml`) that turns inputs such as Pydantic `BaseModel` subclasses, dictionaries, or JSON-like strings into XML representations.
```python
from llama_index.core.prompts import RichPromptTemplate

template_str = "Please extract from the following XML code the contact details of the user:\n\n```xml\n{{ data | to_xml }}\n```\n\n"
prompt = RichPromptTemplate(template_str)
```
Let's now try to format the input as a string, using different objects as `data`.
```python
# Using a BaseModel
from pydantic import BaseModel
from typing import Dict

from IPython.display import Markdown, display


class User(BaseModel):
    name: str
    surname: str
    age: int
    email: str
    phone: str
    social_accounts: Dict[str, str]


user = User(
    name="John",
    surname="Doe",
    age=30,
    email="john.doe@example.com",
    phone="123-456-7890",
    social_accounts={"bluesky": "john.doe", "instagram": "johndoe1234"},
)

display(Markdown(prompt.format(data=user)))
```
Please extract from the following XML code the contact details of the user:
```xml
<user>
  <name>John</name>
  <surname>Doe</surname>
  <age>30</age>
  <email>john.doe@example.com</email>
  <phone>123-456-7890</phone>
  <social_accounts>{'bluesky': 'john.doe', 'instagram': 'johndoe1234'}</social_accounts>
</user>
```
```python
# With a dictionary
user_dict = {
    "name": "John",
    "surname": "Doe",
    "age": 30,
    "email": "john.doe@example.com",
    "phone": "123-456-7890",
    "social_accounts": {"bluesky": "john.doe", "instagram": "johndoe1234"},
}

display(Markdown(prompt.format(data=user_dict)))
```
Please extract from the following XML code the contact details of the user:
```xml
<input>
  <name>John</name>
  <surname>Doe</surname>
  <age>30</age>
  <email>john.doe@example.com</email>
  <phone>123-456-7890</phone>
  <social_accounts>{'bluesky': 'john.doe', 'instagram': 'johndoe1234'}</social_accounts>
</input>
```
```python
# Using a JSON-like string
user_str = '{"name":"John","surname":"Doe","age":30,"email":"john.doe@example.com","phone":"123-456-7890","social_accounts":{"bluesky":"john.doe","instagram":"johndoe1234"}}'

display(Markdown(prompt.format(data=user_str)))
```
Please extract from the following XML code the contact details of the user:
```xml
<input>
  <name>John</name>
  <surname>Doe</surname>
  <age>30</age>
  <email>john.doe@example.com</email>
  <phone>123-456-7890</phone>
  <social_accounts>{'bluesky': 'john.doe', 'instagram': 'johndoe1234'}</social_accounts>
</input>
```

Note that when the input is a `BaseModel`, the root tag is derived from the class name (`<user>`), whereas dictionaries and JSON-like strings are wrapped in a generic `<input>` tag.
3. Chat With an LLM
Now that we know how to produce structured input, let's employ it to chat with an LLM!
```python
import os
from getpass import getpass

os.environ["OPENAI_API_KEY"] = getpass()
```
Β·Β·Β·Β·Β·Β·Β·Β·Β·Β·
```python
from llama_index.llms.openai import OpenAI

llm = OpenAI(model="gpt-4.1-mini")

response = await llm.achat(prompt.format_messages(data=user))
print(response.message.content)
```
```
The contact details of the user are:

- Email: john.doe@example.com
- Phone: 123-456-7890
- Social Accounts:
  - Bluesky: john.doe
  - Instagram: johndoe1234
```
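If you are not in an async context, the synchronous `chat` method should work just as well; here is a minimal sketch of the equivalent call:

```python
# Synchronous equivalent of the async call above
response = llm.chat(prompt.format_messages(data=user))
print(response.message.content)
```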
4. Use Structured Input and Structured Output
Combining structured input with structured output can significantly boost the reliability of your LLM's outputs, so let's give it a go!
```python
from pydantic import Field
from typing import Optional


class SocialAccounts(BaseModel):
    instagram: Optional[str] = Field(default=None)
    bluesky: Optional[str] = Field(default=None)
    x: Optional[str] = Field(default=None)
    mastodon: Optional[str] = Field(default=None)


class ContactDetails(BaseModel):
    email: str
    phone: str
    social_accounts: SocialAccounts
```
```python
sllm = llm.as_structured_llm(ContactDetails)

structured_response = await sllm.achat(prompt.format_messages(data=user))

print(structured_response.raw.email)
print(structured_response.raw.phone)
print(structured_response.raw.social_accounts.instagram)
print(structured_response.raw.social_accounts.bluesky)
```
```
john.doe@example.com
123-456-7890
johndoe1234
john.doe
```
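Since `structured_response.raw` is an instance of our `ContactDetails` model, the standard Pydantic serialization methods should also be available, for example:

```python
# structured_response.raw is a ContactDetails (Pydantic) instance,
# so standard Pydantic serialization methods apply
print(structured_response.raw.model_dump())       # as a plain dict
print(structured_response.raw.model_dump_json())  # as a JSON string
```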