LLM Pydantic Program - NVIDIA

This guide shows you how to generate structured data with our LLMTextCompletionProgram. Given an LLM and an output Pydantic class, the program generates a structured Pydantic object.

For the target object, you can either specify output_cls directly, or provide a PydanticOutputParser (or any other BaseOutputParser that produces a Pydantic object).

In the examples below, we show you different ways of extracting data into the Album object (which can contain a list of Song objects).

This is a simple example of parsing an output into an Album schema, which can contain multiple songs.

Just pass Album into the output_cls argument when initializing the LLMTextCompletionProgram.

%pip install llama-index-readers-file llama-index-embeddings-nvidia llama-index-llms-nvidia
import getpass
import os

# del os.environ['NVIDIA_API_KEY']  ## delete key and reset
if os.environ.get("NVIDIA_API_KEY", "").startswith("nvapi-"):
    print("Valid NVIDIA_API_KEY already in environment. Delete to reset")
else:
    nvapi_key = getpass.getpass("NVAPI Key (starts with nvapi-): ")
    assert nvapi_key.startswith(
        "nvapi-"
    ), f"{nvapi_key[:5]}... is not a valid key"
    os.environ["NVIDIA_API_KEY"] = nvapi_key
from pydantic import BaseModel
from typing import List

from llama_index.core import Settings
from llama_index.llms.nvidia import NVIDIA
from llama_index.embeddings.nvidia import NVIDIAEmbedding
from llama_index.core.program import LLMTextCompletionProgram
from llama_index.core.program import FunctionCallingProgram

llm = NVIDIA()
embedder = NVIDIAEmbedding(model="NV-Embed-QA", truncate="END")
Settings.embed_model = embedder
Settings.llm = llm


class Song(BaseModel):
    """Data model for a song."""

    title: str
    length_seconds: int


class Album(BaseModel):
    """Data model for an album."""

    name: str
    artist: str
    songs: List[Song]


prompt_template_str = """\
Generate an example album, with an artist and a list of songs. \
Using the movie {movie_name} as inspiration.\
"""
program = LLMTextCompletionProgram.from_defaults(
    output_cls=Album,
    prompt_template_str=prompt_template_str,
    verbose=True,
)

Run program to get structured output.

output = program(movie_name="The Shining")

The output is a valid Pydantic object that we can then use to call functions/APIs.

output
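
Since output is an Album instance, its fields and nested Song objects are ordinary attributes. A quick sketch of working with it (assuming Pydantic v2; on v1, use .dict() instead of .model_dump()):

# access fields on the validated Album instance
print(output.name, "by", output.artist)
for song in output.songs:
    print(f"- {song.title} ({song.length_seconds}s)")

# serialize for downstream functions/APIs (Pydantic v2)
album_dict = output.model_dump()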
You can also initialize the program with a PydanticOutputParser instead:

from llama_index.core.output_parsers import PydanticOutputParser

program = LLMTextCompletionProgram.from_defaults(
    output_parser=PydanticOutputParser(output_cls=Album),
    prompt_template_str=prompt_template_str,
    verbose=True,
)
output = program(movie_name="Lord of the Rings")
output
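
Under the hood, PydanticOutputParser injects the JSON schema of Album into the prompt as format instructions. You can inspect them via the parser's format method (the query string below is illustrative):

parser = PydanticOutputParser(output_cls=Album)
# show the query wrapped with the JSON-schema format instructions
print(parser.format("Generate an example album."))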

Sometimes you may want to parse the output your own way. Below, a custom BaseOutputParser turns a simple plain-text format into the Album object.

from llama_index.core.output_parsers import BaseOutputParser


class CustomAlbumOutputParser(BaseOutputParser):
    """Custom Album output parser.

    Assume first line is name and artist.

    Assume each subsequent line is the song.
    """

    def __init__(self, verbose: bool = False):
        self.verbose = verbose

    def parse(self, output: str) -> Album:
        """Parse output."""
        if self.verbose:
            print(f"> Raw output: {output}")

        lines = output.split("\n")
        lines = list(filter(None, (line.strip() for line in lines)))
        # first line is the album name and artist
        name, artist = lines[0].split(",")
        songs = []
        # each subsequent line is a song
        for i in range(1, len(lines)):
            title, length_seconds = lines[i].split(",")
            songs.append(Song(title=title, length_seconds=length_seconds))

        return Album(name=name, artist=artist, songs=songs)
prompt_template_str = """\
Generate an example album, with an artist and a list of songs. \
Using the movie {movie_name} as inspiration.\

Return answer in following format.
The first line is:
<album_name>, <album_artist>
Every subsequent line is a song with format:
<song_title>, <song_length_in_seconds>
"""
program = LLMTextCompletionProgram.from_defaults(
    output_parser=CustomAlbumOutputParser(verbose=True),
    output_cls=Album,
    prompt_template_str=prompt_template_str,
    verbose=True,
)
output = program(movie_name="The Dark Knight")
print(output)
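
You can also exercise the parser in isolation by feeding it a hand-written string in the expected format (the album text below is made up for illustration):

# made-up model output in the format the prompt requests
sample_output = """\
The Dark Knight Suite,Hans Zimmer
Why So Serious?,552
Like a Dog Chasing Cars,315
"""
album = CustomAlbumOutputParser().parse(sample_output)
print(album.name, "-", album.artist, "-", len(album.songs), "songs")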

Function Calling Program for Structured Extraction

This guide shows you how to do structured data extraction with our FunctionCallingProgram. Given a function-calling LLM as well as an output Pydantic class, generate a structured Pydantic object.

In the examples below, we show you different ways of extracting data into the Album object (which can contain a list of Song objects).

NOTE: The FunctionCallingProgram only works with LLMs that natively support function calling, by inserting the schema of the Pydantic object as the “tool parameters” for a tool. For all other LLMs, please use our LLMTextCompletionProgram, which will directly prompt the model through text to get back a structured output.

llm = NVIDIA(model="meta/llama-3.1-8b-instruct")
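
FunctionCallingProgram needs a model with native function-calling support. As a quick sanity check, llama-index exposes an is_function_calling_model flag on the LLM metadata:

# confirm the selected model advertises native function calling
assert llm.metadata.is_function_calling_model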

class Song(BaseModel):
    title: str
    length_seconds: int


class Album(BaseModel):
    name: str
    artist: str
    songs: List[Song]

Define the Pydantic program. Note that Song and Album are defined above without docstrings, so a description will be passed in when the program is run.

prompt_template_str = """\
Generate an example album, with an artist and a list of songs. \
Using the movie {movie_name} as inspiration.\
"""
program = FunctionCallingProgram.from_defaults(
    output_cls=Album,
    prompt_template_str=prompt_template_str,
    verbose=True,
    llm=llm,
)

Run the program to get structured output. Because the Song and Album models above were defined without docstrings, a description for the output class is passed in at call time.

output = program(
    movie_name="The Shining", description="Data model for an album."
)

Now redefine the models with docstrings, so the description no longer needs to be passed in when the program is run.

class Song(BaseModel):
    """Data model for a song."""

    title: str
    length_seconds: int


class Album(BaseModel):
    """Data model for an album."""

    name: str
    artist: str
    songs: List[Song]
prompt_template_str = """\
Generate an example album, with an artist and a list of songs. \
Using the movie {movie_name} as inspiration.\
"""
program = FunctionCallingProgram.from_defaults(
    output_cls=Album,
    prompt_template_str=prompt_template_str,
    verbose=True,
    llm=llm,
)

Run program to get structured output.

output = program(movie_name="The Shining")

The output is a valid Pydantic object that we can then use to call functions/APIs.

output
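
For example, the validated object can be serialized and handed to downstream code (send_to_catalog below is a hypothetical stand-in for a real API call):

# Pydantic v2 serialization of the structured output
payload = output.model_dump_json()


def send_to_catalog(album_json: str) -> None:
    """Hypothetical downstream consumer of the album payload."""
    print("Sending album payload:", album_json[:80], "...")


send_to_catalog(payload)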

Download Data

!mkdir -p 'data/paul_graham/'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/examples/data/paul_graham/paul_graham_essay.txt' -O 'data/paul_graham/paul_graham_essay.txt'

Load documents, build the VectorStoreIndex

import logging
import sys
logging.basicConfig(stream=sys.stdout, level=logging.INFO)
logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from IPython.display import Markdown, display
# load documents
documents = SimpleDirectoryReader("./data/paul_graham/").load_data()
index = VectorStoreIndex.from_documents(documents, chunk_size=512)
from llama_index.core.output_parsers import LangchainOutputParser
from langchain.output_parsers import StructuredOutputParser, ResponseSchema

Define custom QA and Refine Prompts

response_schemas = [
    ResponseSchema(
        name="Education",
        description=(
            "Describes the author's educational experience/background."
        ),
    ),
    ResponseSchema(
        name="Work",
        description="Describes the author's work experience/background.",
    ),
]
lc_output_parser = StructuredOutputParser.from_response_schemas(
    response_schemas
)
output_parser = LangchainOutputParser(lc_output_parser)
from llama_index.core.prompts.default_prompts import (
    DEFAULT_TEXT_QA_PROMPT_TMPL,
)

# take a look at the new QA template!
fmt_qa_tmpl = output_parser.format(DEFAULT_TEXT_QA_PROMPT_TMPL)
print(fmt_qa_tmpl)
Query the index, attaching the output parser to the LLM so the schema's format instructions are applied to the prompts:

# attach the output parser to the LLM so responses follow the schema
llm = NVIDIA(
    model="meta/llama-3.1-8b-instruct", output_parser=output_parser
)
query_engine = index.as_query_engine(
    llm=llm,
)
response = query_engine.query(
    "What are a few things the author did growing up?",
)
print(response)