LlamaExtract Python SDK
For a more programmatic approach, the Python SDK is the recommended way to experiment with different schemas and run extractions at scale. The GitHub repo for the Python SDK is here.
First, get an API key. We recommend putting your key in a file called `.env` that looks like this:

```
LLAMA_CLOUD_API_KEY=llx-xxxxxx
```
Set up a new Python environment using the tool of your choice (we used `poetry init`). Then install the dependencies you'll need:

```
pip install llama-cloud-services python-dotenv
```
Now that we have our libraries and our API key available, let's create an `extract.py` file and extract data from files. In this case, we're using some sample resumes from our example:
Quick Start
```python
from llama_cloud_services import LlamaExtract
from pydantic import BaseModel, Field

# bring in our LLAMA_CLOUD_API_KEY
from dotenv import load_dotenv
load_dotenv()

# Initialize client
extractor = LlamaExtract()

# Define schema using Pydantic
class Resume(BaseModel):
    name: str = Field(description="Full name of candidate")
    email: str = Field(description="Email address")
    skills: list[str] = Field(description="Technical skills and technologies")

# Create extraction agent
agent = extractor.create_agent(name="resume-parser", data_schema=Resume)

# Extract data from document
result = agent.extract("resume.pdf")
print(result.data)
```
Now run it like any Python file. This will print the results of the extraction.

```
python extract.py
```
Defining Schemas
Schemas can be defined using either Pydantic models or JSON Schema. Refer to the Schemas page for more details.
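For example, the `Resume` model from the quick start could equivalently be expressed as a JSON Schema dict (the field names here mirror the Pydantic example above; passing a dict in place of the model class is assumed to follow the same `create_agent` call):

```python
# JSON Schema equivalent of the Resume Pydantic model
resume_schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string", "description": "Full name of candidate"},
        "email": {"type": "string", "description": "Email address"},
        "skills": {
            "type": "array",
            "items": {"type": "string"},
            "description": "Technical skills and technologies",
        },
    },
    "required": ["name", "email", "skills"],
}

# The dict can then be passed where the Pydantic class was used, e.g.:
# agent = extractor.create_agent(name="resume-parser", data_schema=resume_schema)
```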
Other Extraction APIs
Extraction over bytes or text
Section titled “Extraction over bytes or text”You can use the SourceText
class to extract from bytes or text directly without using a file. If passing the file bytes,
you will need to pass the filename to the SourceText
class.
```python
with open("resume.pdf", "rb") as f:
    file_bytes = f.read()

result = agent.extract(SourceText(file=file_bytes, filename="resume.pdf"))

result = agent.extract(SourceText(text_content="Candidate Name: Jane Doe"))
```
Batch Processing
Process multiple files asynchronously:

```python
# Queue multiple files for extraction
jobs = await agent.queue_extraction(["resume1.pdf", "resume2.pdf"])

# Check job status
for job in jobs:
    status = agent.get_extraction_job(job.id).status
    print(f"Job {job.id}: {status}")

# Get results when complete
results = [agent.get_extraction_run_for_job(job.id) for job in jobs]
```
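Because `queue_extraction` is awaited, running this from a plain script requires an event loop. A minimal sketch of the wrapper (the agent calls from the snippet above are shown as commented placeholders):

```python
import asyncio

async def main():
    # The awaited batch calls go here, e.g.:
    # jobs = await agent.queue_extraction(["resume1.pdf", "resume2.pdf"])
    ...

asyncio.run(main())
```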
Updating Schemas
Schemas can be modified and updated after creation:

```python
# Update schema
agent.data_schema = new_schema

# Save changes
agent.save()
```
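As a concrete (hypothetical) example, `new_schema` might be an extended version of the earlier `Resume` model, keeping the existing fields and adding a new one:

```python
from typing import Optional

from pydantic import BaseModel, Field

class ResumeV2(BaseModel):
    name: str = Field(description="Full name of candidate")
    email: str = Field(description="Email address")
    skills: list[str] = Field(description="Technical skills and technologies")
    # hypothetical field added after the agent was created
    years_of_experience: Optional[int] = Field(
        default=None, description="Total years of professional experience"
    )

new_schema = ResumeV2
```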
Managing Agents
```python
# List all agents
agents = extractor.list_agents()

# Get specific agent
agent = extractor.get_agent(name="resume-parser")

# Delete agent
extractor.delete_agent(agent.id)
```