Classify Python SDK
This guide shows how to classify documents using the Python SDK. You will:
- Create classification rules
- Upload files
- Submit a classify job
- Read predictions (type, confidence, reasoning)
The SDK is available in llama-cloud-services.
First, get an API key (see "Get an API key") and put it in a .env file:
LLAMA_CLOUD_API_KEY=llx-xxxxxx
Install dependencies:
pip install llama-cloud-services python-dotenv
or with uv:
uv add llama-cloud-services python-dotenv
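Before creating a client, it can help to fail fast when the key is missing. The helper below is a minimal sketch using only the standard library; require_api_key is a hypothetical name and not part of the SDK.

```python
import os
from typing import Mapping, Optional


def require_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return LLAMA_CLOUD_API_KEY or raise a clear error (hypothetical helper)."""
    # Default to the process environment; a plain dict can be passed in tests.
    env = os.environ if env is None else env
    key = env.get("LLAMA_CLOUD_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "LLAMA_CLOUD_API_KEY is not set; add it to your .env file "
            "or export it in your shell."
        )
    return key
```

One way to use it is to call it right after load_dotenv() and pass the returned value as the token when constructing the client.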
Quick start
The snippet below uses a convenience ClassifyClient wrapper from llama-cloud-services that uploads files, creates a classify job, polls for completion, and returns results.
import os
from dotenv import load_dotenv
from llama_cloud.client import AsyncLlamaCloud
from llama_cloud.types import ClassifierRule, ClassifyParsingConfiguration, ParserLanguages
from llama_cloud_services.beta.classifier.client import ClassifyClient  # helper wrapper

load_dotenv()

client = AsyncLlamaCloud(token=os.environ["LLAMA_CLOUD_API_KEY"])
project_id = "your-project-id"
organization_id = "your-organization-id"
classify = ClassifyClient(client, project_id=project_id, organization_id=organization_id)

rules = [
    ClassifierRule(
        type="invoice",
        description="Documents that contain an invoice number, invoice date, bill-to section, and line items with totals.",
    ),
    ClassifierRule(
        type="receipt",
        description="Short purchase receipts, typically from POS systems, with merchant, items and total, often a single page.",
    ),
]

parsing = ClassifyParsingConfiguration(
    lang=ParserLanguages.EN,
    max_pages=5,  # optional, parse at most 5 pages
    # target_pages=[1]  # optional, parse only specific pages (1-indexed), can't be used with max_pages
)

# for async usage, use `await classify.aclassify_file_paths(...)`
results = classify.classify_file_paths(
    rules=rules,
    file_input_paths=["/path/to/doc1.pdf", "/path/to/doc2.pdf"],
    parsing_configuration=parsing,
)

for item in results.items:
    # in cases of partial success, some of the items may not have a result
    if item.result is None:
        print(f"Classification job {item.classify_job_id} error-ed on file {item.file_id}")
        continue
    print(item.file_id, item.result.type, item.result.confidence)
    print(item.result.reasoning)
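The loop above prints each prediction as it goes. When results feed a downstream pipeline, it may be more convenient to sort them into buckets first. The sketch below uses stand-in dataclasses (FakeItem and FakeResult are hypothetical, not SDK types) that mirror only the attributes used in the loop. It assumes a higher confidence means a more certain prediction, and the 0.8 threshold is an arbitrary illustration.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass
class FakeResult:  # stand-in for item.result; not an SDK type
    type: str
    confidence: float
    reasoning: str


@dataclass
class FakeItem:  # stand-in for one entry of results.items; not an SDK type
    file_id: str
    classify_job_id: str
    result: Optional[FakeResult]


def triage(
    items: Iterable[FakeItem], min_confidence: float = 0.8
) -> Tuple[List[FakeItem], List[FakeItem], List[FakeItem]]:
    """Split items into (accepted, needs_review, failed)."""
    accepted, needs_review, failed = [], [], []
    for item in items:
        if item.result is None:
            # Partial failure: the job produced no result for this file.
            failed.append(item)
        elif item.result.confidence < min_confidence:
            # Assumed threshold; low-confidence predictions go to a human.
            needs_review.append(item)
        else:
            accepted.append(item)
    return accepted, needs_review, failed


items = [
    FakeItem("file-1", "job-1", FakeResult("invoice", 0.95, "Has invoice number and line items.")),
    FakeItem("file-2", "job-1", FakeResult("receipt", 0.55, "Single page with a total.")),
    FakeItem("file-3", "job-1", None),
]
accepted, needs_review, failed = triage(items)
print(len(accepted), len(needs_review), len(failed))
```

Because triage only reads file_id, result, and result.confidence, the same function should work on the real results.items as long as those attributes behave as shown in the quick start.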
Notes:
- ClassifierRule requires a type and a descriptive description that the model can follow.
- ClassifyParsingConfiguration is optional; set lang, max_pages, or target_pages to control parsing.
- In cases of partial failure, some of the items may not have a result (i.e. results.items[*].result could be None).
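Since max_pages and target_pages cannot be combined and target_pages is 1-indexed, a small pre-flight check can catch mistakes before a job is submitted. This is a sketch under stated assumptions: parsing_options is a hypothetical helper, the requirement that max_pages be at least 1 is an assumption, and whether the SDK performs the same validation itself is not covered here.

```python
from typing import Any, Dict, List, Optional


def parsing_options(
    max_pages: Optional[int] = None,
    target_pages: Optional[List[int]] = None,
) -> Dict[str, Any]:
    """Validate page options and return them as keyword arguments (hypothetical helper)."""
    if max_pages is not None and target_pages is not None:
        raise ValueError("max_pages and target_pages cannot be used together")
    if max_pages is not None and max_pages < 1:
        # Assumption: a page limit below 1 is never meaningful.
        raise ValueError("max_pages must be at least 1")
    if target_pages is not None and any(page < 1 for page in target_pages):
        raise ValueError("target_pages is 1-indexed; page numbers start at 1")

    # Only include the options that were actually set.
    options: Dict[str, Any] = {}
    if max_pages is not None:
        options["max_pages"] = max_pages
    if target_pages is not None:
        options["target_pages"] = list(target_pages)
    return options


print(parsing_options(max_pages=5))
print(parsing_options(target_pages=[1, 3]))
```

The returned dictionary could then be unpacked into the configuration, for example ClassifyParsingConfiguration(lang=ParserLanguages.EN, **parsing_options(max_pages=5)).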
Tips for writing good rules
- Be specific about content features that distinguish the type.
- Include key fields the document usually contains (e.g., invoice number, total amount).
- Add multiple rules when needed to cover distinct patterns.
- Start simple, test on a small set, then refine.
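To make the "test on a small set, then refine" step concrete, predictions can be compared against a handful of hand-labeled files. The following is a minimal sketch using only the standard library; evaluate and the sample file names and labels are hypothetical, and the predicted mapping is assumed to have been built from item.result.type, with None for files that produced no result.

```python
from collections import Counter
from typing import Dict, Optional, Tuple


def evaluate(
    expected: Dict[str, str], predicted: Dict[str, Optional[str]]
) -> Tuple[float, Counter]:
    """Return (accuracy, confusion counts) over a small labeled set."""
    confusions: Counter = Counter()
    correct = 0
    for file_key, want in expected.items():
        got = predicted.get(file_key)  # None means no prediction for this file
        if got == want:
            correct += 1
        else:
            # Count each (expected, predicted) mismatch to see which rules collide.
            confusions[(want, got)] += 1
    accuracy = correct / len(expected) if expected else 0.0
    return accuracy, confusions


# Hypothetical labeled sample and predictions, for illustration only.
expected = {"a.pdf": "invoice", "b.pdf": "receipt", "c.pdf": "invoice", "d.pdf": "receipt"}
predicted = {"a.pdf": "invoice", "b.pdf": "invoice", "c.pdf": "invoice", "d.pdf": None}

accuracy, confusions = evaluate(expected, predicted)
print(f"accuracy: {accuracy:.2f}")
for (want, got), count in confusions.most_common():
    print(f"expected {want!r}, got {got!r}: {count}")
```

A pair that shows up often in the confusion counts, such as receipts predicted as invoices, suggests the two rule descriptions need more distinguishing detail.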