# Agent Deployments
Users of your LlamaCloud install can build agents in code, deploy them through the UI or API, and get an HTTP endpoint at /deployments/<agent-name> that runs their agent as a pod in the cluster. LlamaCloud’s parent ingress already routes /deployments/* to the backend, so there’s no agent-specific ingress setup to do. But the components that actually build and run agent pods (a control plane API, a Kubernetes operator that reconciles LlamaDeployment custom resources, and an S3-compatible bucket for build artifacts and backups) live in a separate llama-agents Helm chart.
As a cluster operator you have two choices: run llama-agents as an in-cluster subchart of LlamaCloud, or manage it separately. In-cluster is the common case.
Set llamaAgents.deploy: true and the LlamaCloud chart pulls the subchart in, wires the backend feature flag, and renders a NetworkPolicy allowing agent pods to reach the backend on port 8000.
```yaml
llamaAgents:
  deploy: true
```

## Subchart values

Pass-through values go under `llama-agents-subchart`. The full surface lives in the upstream llama-agents chart README; the values below are the common ones.
### Object storage

The subchart needs an S3-compatible bucket for build artifacts, backups, and git repos. Credentials come from a K8s Secret containing `S3_ACCESS_KEY` and `S3_SECRET_KEY`.
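A minimal sketch of creating that Secret with kubectl; the secret name is arbitrary, but it must match `secretRef` in the values below:

```bash
# Hypothetical names throughout; substitute your namespace and credentials.
kubectl create secret generic llama-agents-s3-credentials \
  --namespace <release-namespace> \
  --from-literal=S3_ACCESS_KEY=<access-key> \
  --from-literal=S3_SECRET_KEY=<secret-key>
```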
```yaml
llama-agents-subchart:
  controlPlane:
    objectStorage:
      s3:
        bucket: "my-llama-agents-bucket"
        region: "us-east-1"
        endpointUrl: "" # leave empty for AWS; set for MinIO/s3proxy/etc.
        secretRef: "llama-agents-s3-credentials"
        backupEncryptionSecretRef: "llama-agents-backup-password" # optional
```

Everything is namespaced inside the bucket under three configurable prefixes, so one bucket can be shared with other workloads (including LlamaCloud's own file-storage buckets) as long as the prefixes don't collide. A dedicated bucket is simpler if you want S3 lifecycle rules or IAM policies scoped only to agent artifacts.
| Key | Default | Purpose |
|---|---|---|
| `controlPlane.objectStorage.buildKeyPrefix` | `builds/` | Built agent artifacts |
| `controlPlane.objectStorage.backupKeyPrefix` | `backups/` | Backup archives (encrypted when `backupEncryptionSecretRef` is set) |
| `controlPlane.objectStorage.codeRepoKeyPrefix` | `git/` | Code repositories |
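If you do share a bucket, a sketch of moving all three prefixes under one dedicated subtree (the prefix values here are illustrative):

```yaml
llama-agents-subchart:
  controlPlane:
    objectStorage:
      buildKeyPrefix: "agents/builds/"   # illustrative; any non-colliding prefixes work
      backupKeyPrefix: "agents/backups/"
      codeRepoKeyPrefix: "agents/git/"
```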
### App namespace

Agent pods run in a separate llama-agents namespace by default, not the LlamaCloud release namespace. The control plane and operator themselves stay in the release namespace, and the LlamaCloud chart creates the apps namespace automatically.

```yaml
llama-agents-subchart:
  apps:
    namespace: my-agents-ns
```

An empty string runs agent pods in the release namespace alongside the control plane and operator.
### Agent pod resources

The operator creates pods for each LlamaDeployment. The requests and limits applied to those pods come from these defaults:

```yaml
llama-agents-subchart:
  operator:
    defaultAppRequests:
      cpu: "750m"
      memory: "2Gi"
    defaultAppLimits:
      cpu: "" # empty = no limit
      memory: "4096Mi"
    maxDeployments: 0 # per-namespace cap, 0 = unlimited
```

For anything beyond uniform defaults (nodeSelector, tolerations, affinity, container-level overrides), set operator.llamaDeploymentTemplate.spec.podSpec. The operator uses it as the base PodSpec for every managed pod.
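A sketch of that template, assuming agent pods should land on a dedicated node pool (the node label and taint are hypothetical):

```yaml
llama-agents-subchart:
  operator:
    llamaDeploymentTemplate:
      spec:
        podSpec:
          nodeSelector:
            workload: llama-agents # hypothetical node label
          tolerations:
            - key: "llama-agents"  # hypothetical taint on the agent node pool
              operator: "Exists"
              effect: "NoSchedule"
```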
The subchart ships two CRDs, LlamaDeployment and LlamaDeploymentTemplate. Helm’s CRD lifecycle is the awkward part: CRDs bundled in a chart’s crds/ directory install on first helm install, but helm upgrade never touches them and helm uninstall never removes them. So fresh installs work out of the box, but when the subchart bumps a CRD schema nothing updates unless you do it yourself.
For that, install the companion llama-agents-crds chart separately and upgrade it on its own cadence. The LlamaCloud chart advertises the version it’s validated against via llamaAgents.crdVersion in its values.yaml; pin to that when installing:
```bash
helm upgrade --install llama-agents-crds \
  --version <llamaAgents.crdVersion> \
  oci://registry-1.docker.io/llamaindex/llama-agents-crds
```

The CRD chart uses helm.sh/resource-policy: keep, so helm uninstall on it leaves the CRDs in place and won't cascade-delete your LlamaDeployment resources.
For production BYOC we recommend installing llama-agents-crds from the start rather than relying on the first-install bundling, so CRD upgrades stay decoupled from LlamaCloud releases.
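To confirm what a cluster actually has installed, a quick check works (the full CRD names carry a group suffix, hence the pattern match):

```bash
# Lists the agent CRDs; helm upgrade of the main chart never touches these.
kubectl get crds | grep -i llamadeployment
```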
## Separately-managed control plane

Skip deploy: true if you want to run the llama-agents chart on its own release cadence, in a different namespace, or in a different cluster with pod-level connectivity back to this one. Leave deploy: false and point LlamaCloud at the control plane's service address:
```yaml
llamaAgents:
  controlPlaneUrl: "http://llama-agents-control-plane.llama-agents.svc.cluster.local:8000"
```

In this mode the LlamaCloud chart doesn't deploy the subchart or render the NetworkPolicy, and CRDs don't need to be present in this cluster. Use an internal address. The control plane API is not designed to be internet-exposed.
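One way to sanity-check that address from inside the cluster (a throwaway curl pod; any printed HTTP status code confirms reachability):

```bash
kubectl run curl-test --rm -it --restart=Never --image=curlimages/curl --command -- \
  curl -s -o /dev/null -w '%{http_code}\n' \
  http://llama-agents-control-plane.llama-agents.svc.cluster.local:8000
```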
## Escape hatches

Two knobs change how agent pods talk to the backend. Most installs never need either.
| Value | Default | When to set |
|---|---|---|
| `llamaAgents.allowBackendEgress` | `true` | Set to false to disable the chart's default egress NetworkPolicy when you author your own. The policy only renders when deploy: true. |
| `llamaAgents.useBackendPublicUrl` | `false` | Route the agents' `LLAMA_CLOUD_BASE_URL` through the public ingress host instead of the in-cluster ClusterIP. Set this when your cluster's network policy stack blocks agent-to-backend ClusterIP traffic. |
`useBackendPublicUrl` is applied at deployment creation time, so existing deployments keep their original URL until they are recreated.
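If you do set allowBackendEgress: false, your replacement policy needs to cover at least the agent-to-backend path. A sketch, assuming the default llama-agents apps namespace (the policy name and selectors are illustrative):

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: agents-to-backend   # hypothetical name
  namespace: llama-agents   # the apps namespace
spec:
  podSelector: {}           # all agent pods in the apps namespace
  policyTypes:
    - Egress
  egress:
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: <release-namespace>
      ports:
        - protocol: TCP
          port: 8000        # backend port, matching the chart's default policy
```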