Serving your Workflows
LlamaDeploy runs your LlamaIndex workflows locally and in the cloud. Author your workflows, add minimal configuration, and llamactl
wraps them in an application server that exposes them as HTTP APIs.
Learn the basics (LlamaIndex Workflows)
LlamaDeploy is built on top of LlamaIndex workflows. If you're new to workflows, start here: LlamaIndex Workflows.
Author a workflow (quick example)
```python
from llama_index.core.workflow import Workflow, step, StartEvent, StopEvent


class QuestionFlow(Workflow):
    @step
    async def generate(self, ev: StartEvent) -> StopEvent:
        question = ev.question
        return StopEvent(result=f"Answer to {question}")


qa_workflow = QuestionFlow(timeout=120)
```
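Workflows are plain Python objects, so you can smoke-test one before serving it. A minimal sketch; keyword arguments passed to `run()` become attributes on the `StartEvent`:

```python
import asyncio

async def main():
    # run() packs its keyword arguments into the StartEvent,
    # so ev.question in generate() receives the value below.
    result = await qa_workflow.run(question="What is the capital of France?")
    print(result)  # -> "Answer to What is the capital of France?"

asyncio.run(main())
```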
Configure workflows for LlamaDeploy to serve
LlamaDeploy reads workflows configured in your pyproject.toml and makes them available under their configured names.
Define workflow instances in your code, then reference them in your config.
[project]name = "app"# ...[tool.llamadeploy.workflows]answer-question = "app.workflows:qa_workflow"
How serving works (local and cloud)
- llamactl serve discovers your config (see llamactl serve).
- The app server loads your workflows.
- HTTP routes are exposed under /deployments/{name}. In development, {name} defaults to your Python project name and is configurable. On deploy, you can set a new name; a short random suffix may be appended to ensure uniqueness.
- Workflow instances are registered under the specified name. For example, POST /deployments/app/workflows/answer-question/run runs the workflow above (a Python sketch follows below).
- If you configure a UI, it runs alongside your API (proxied in dev, static in preview). For details, see UI build and dev integration.
During development, the API is available at http://localhost:4501. After you deploy to LlamaCloud, it is available at https://api.cloud.llamaindex.ai.
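As an example of these routes during development, here is a minimal sketch using the `requests` library; the `start_event` payload mirrors the curl call in the Authorization section below, and the exact response shape is best confirmed against the OpenAPI reference:

```python
import requests

# Run the "answer-question" workflow on the local dev server.
# Fields in start_event become attributes on the workflow's StartEvent,
# so "question" is what ev.question reads in the generate() step.
resp = requests.post(
    "http://localhost:4501/deployments/app/workflows/answer-question/run",
    json={"start_event": {"question": "What is the capital of France?"}},
)
resp.raise_for_status()
print(resp.json())  # exact response shape: see /deployments/app/docs
```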
Authorization
During local development, the API is unprotected. After deployment, your API uses the same authorization as LlamaCloud. Create an API token in the same project as the agent to make requests. For example:
```sh
curl 'https://api.cloud.llamaindex.ai/deployments/app-xyz123/workflows/answer-question/run' \
  -H 'Authorization: Bearer llx-xxx' \
  -H 'Content-Type: application/json' \
  --data '{"start_event": {"question": "What is the capital of France?"}}'
```
Workflow HTTP API
When using a WorkflowServer, the app server exposes your workflows as an API. View the OpenAPI reference at /deployments/<name>/docs.
This API allows you to:
- Retrieve details about registered workflows
- Trigger runs of your workflows
- Stream events published by your workflows and retrieve their final results
- Send events to in-progress workflows (for example, human-in-the-loop scenarios)
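The run-and-stream pattern then looks roughly like the sketch below. The route names (`run-nowait`, `handlers/{id}/events`) and the `handler_id` field are illustrative assumptions, not confirmed paths; check the OpenAPI reference at /deployments/<name>/docs for the routes your server actually exposes:

```python
import json

import requests

BASE = "http://localhost:4501/deployments/app"

# Start a run without blocking (illustrative route name; verify
# against the OpenAPI reference), then stream published events.
handler = requests.post(
    f"{BASE}/workflows/answer-question/run-nowait",
    json={"start_event": {"question": "What is the capital of France?"}},
).json()

events = requests.get(
    f"{BASE}/workflows/answer-question/handlers/{handler['handler_id']}/events",
    stream=True,
)
for line in events.iter_lines():
    if line:
        print(json.loads(line))  # one published event per line
```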
During development, visit http://localhost:4501/debugger to test and observe your workflows in a UI.