
Workflows API

LlamaDeploy runs your LlamaIndex workflows locally and in the cloud. Author your workflows, then export them so the app server can discover and expose them as HTTP APIs.

If you’re new to LlamaIndex workflows, start here: LlamaIndex Workflows v1.

Key ideas:

  • Define workflows by subclassing Workflow and composing @step functions
  • Use StartEvent to pass inputs and StopEvent to end the run
  • Optionally define typed events and manage state/context across steps

```python
from llama_index.core.workflow import Workflow, step, StartEvent, StopEvent


class JokeFlow(Workflow):
    @step
    async def generate(self, ev: StartEvent) -> StopEvent:
        topic = ev.topic
        return StopEvent(result=f"Joke about {topic}")


workflow = JokeFlow(timeout=60)
```
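
You can exercise a workflow directly before deploying anything. A minimal sketch using the instance above; keyword arguments to .run() populate the StartEvent:

```python
import asyncio

from my_package.jokes import workflow  # assumed module holding the instance above


async def main() -> None:
    # Keyword args to .run() become attributes on the StartEvent (ev.topic)
    result = await workflow.run(topic="pirates")
    print(result)  # -> "Joke about pirates"


asyncio.run(main())
```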

Export workflows so LlamaDeploy can serve them


You can expose workflows in two ways: map individual workflows, or register them in a WorkflowServer and export that as a single app entrypoint.

Option A: Map individual workflows (simple)


Define workflow instances in your code, then reference them from config. See examples in the Deployment Config Reference.
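
Option A just needs a module-level instance the config can import. A sketch of my_package/jokes.py; the import-path form in the comment is an assumption, and the authoritative syntax lives in the Deployment Config Reference:

```python
# my_package/jokes.py
from my_package.flows import JokeFlow  # hypothetical: wherever your Workflow subclass lives

# Module-level instance; the deployment config references it by import path
# (something like "my_package.jokes:workflow" — treat this as illustrative;
# see the Deployment Config Reference for the exact syntax)
workflow = JokeFlow(timeout=60)
```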

Option B: Bundle with WorkflowServer (advanced)


Aggregate multiple workflows and export a single WorkflowServer instance.

my_package/app.py
```python
from workflows.server import WorkflowServer

from my_package.jokes import workflow as joke_workflow
from my_package.other import workflow as other_workflow

app = WorkflowServer()
app.add_workflow("joke", joke_workflow)
app.add_workflow("other", other_workflow)
```

Reference the app entrypoint in your config (this option is mutually exclusive with mapping individual workflows); see the Deployment Config Reference.

How serving works (locally and in the cloud)

  • llamactl serve discovers your config (pyproject.toml or llama_deploy.yaml). See llamactl serve.
  • The app server loads your workflows (or your WorkflowServer).
  • HTTP routes are exposed under /deployments/{name} for your app name (see the quick check after this list)
  • If you configured a UI, it runs alongside your API (proxy in dev, static in preview). See UI build and dev integration.
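
Once llamactl serve is running, you can sanity-check a deployment over HTTP. A sketch assuming the app server listens on http://localhost:4501 (use the address llamactl serve prints) and a deployment named my-app:

```python
import httpx

BASE = "http://localhost:4501"  # assumed port; use the address llamactl serve prints
NAME = "my-app"                 # placeholder deployment name

# App server health, namespaced under the deployment prefix
print(httpx.get(f"{BASE}/deployments/{NAME}/health").status_code)  # 200 when up
```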

To persist structured outputs from your workflows (extractions, events, etc.), see the Agent Data Overview and the language guides for TypeScript and Python.

When using a WorkflowServer, the app server exposes these routes (prefixed by /deployments/{name}):

  • GET /workflows: List registered workflow names
  • POST /workflows/{workflowName}/run: Run and wait (see the example after this list); body supports:
    • start_event: JSON string of a serialized StartEvent
    • context: object for workflow context
    • kwargs: object of keyword args passed to .run()
  • POST /workflows/{workflowName}/run-nowait: Start async; returns handler_id (end-to-end example after the notes below)
  • GET /results/{handler_id}: Fetch result if finished; 202 if not ready
  • GET /events/{handler_id}?sse=true|false: Stream events as SSE (text/event-stream) or NDJSON (application/x-ndjson)
  • GET /health: App server health
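
For instance, a blocking run that forwards keyword arguments to the workflow's .run(). A sketch assuming the app server at http://localhost:4501 (use the address llamactl serve prints), a deployment named my-app, and a workflow registered as "joke":

```python
import httpx

BASE = "http://localhost:4501/deployments/my-app"  # assumed address and deployment name

# Run and wait; "kwargs" are forwarded to the workflow's .run()
resp = httpx.post(
    f"{BASE}/workflows/joke/run",
    json={"kwargs": {"topic": "pirates"}},
    timeout=60.0,
)
resp.raise_for_status()
print(resp.json())
```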

Notes:

  • start_event uses LlamaIndex’s JSON serializer; pass a JSON string of the serialized event when you need a custom StartEvent
  • Event streams include serializer metadata (qualified name and value) to reconstruct types
  • All endpoints are namespaced per deployment at /deployments/{name}
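
Putting the async routes together: start a run, stream its events as NDJSON, then poll for the result. A sketch under the same assumptions as above; the handler_id key in the run-nowait response is inferred from the route description:

```python
import time

import httpx

BASE = "http://localhost:4501/deployments/my-app"  # assumed address and deployment name

# Start without waiting; the response carries the handler_id
resp = httpx.post(
    f"{BASE}/workflows/joke/run-nowait",
    json={"kwargs": {"topic": "pirates"}},
)
handler_id = resp.json()["handler_id"]  # assumed response key, per the route above

# Stream events as NDJSON (sse=false); each line is a serialized event
with httpx.stream("GET", f"{BASE}/events/{handler_id}", params={"sse": "false"}) as events:
    for line in events.iter_lines():
        if line:
            print("event:", line)

# Fetch the result; 202 means the run hasn't finished yet
result = httpx.get(f"{BASE}/results/{handler_id}")
while result.status_code == 202:
    time.sleep(0.5)
    result = httpx.get(f"{BASE}/results/{handler_id}")
print(result.json())
```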

Next: Learn how to integrate and build your UI in UI build and dev integration.