Troubleshooting Guide
This guide helps you diagnose and resolve common issues when deploying LlamaCloud on Azure. Use this after completing the Azure Setup Guide if you encounter problems.
General Debugging Commands
Section titled “General Debugging Commands”Pod Status and Logs
Section titled “Pod Status and Logs”# Check all pod statuskubectl get pods -o wide
# Describe problematic podskubectl describe pod <pod-name>
# Check logs for specific serviceskubectl logs deployment/llamacloud-backendkubectl logs deployment/llamacloud-jobs-workerkubectl logs deployment/llamacloud-s3proxy
Service and Secret Status
Section titled “Service and Secret Status”# Check serviceskubectl get svc
# Verify secrets existkubectl get secrets
# Check configmapskubectl get configmaps
Database Connection Issues
Section titled “Database Connection Issues”PostgreSQL Connection Problems
Section titled “PostgreSQL Connection Problems”Symptoms:
- Backend pods failing to start
- Database connection errors in logs
- “connection refused” or “timeout” errors
Solutions:
-
Verify database connection:
Terminal window # Test connection from AKSkubectl run -it --rm debug --image=postgres:15 --restart=Never -- psql "postgresql://username:password@server.postgres.database.azure.com:5432/llamacloud" -
Check secret values:
Terminal window kubectl get secret postgresql-secret -o yaml# Verify DATABASE_HOST, DATABASE_USER, etc. are correct -
Common fixes:
- Add AKS subnet to PostgreSQL firewall rules
- Verify SSL is enabled (required by Azure Database for PostgreSQL)
- Check database name exists
- Verify user permissions
Redis Connection Issues
Section titled “Redis Connection Issues”Symptoms:
- “Redis connection failed” in backend logs
- Authentication errors
- SSL/TLS errors
Solutions:
-
Test Redis connectivity:
Terminal window kubectl run -it --rm redis-test --image=redis:7 --restart=Never -- redis-cli -h your-redis.redis.cache.windows.net -p 6380 --tls -a your-access-key ping -
Check SSL configuration:
- Azure Redis requires SSL on port 6380
- Verify
REDIS_SCHEME: "rediss"
in secret - Ensure
REDIS_PORT: "6380"
for SSL
-
Verify access key:
- Copy primary access key exactly from Azure Portal
- No extra spaces or characters
Service Bus Connection Issues
Section titled “Service Bus Connection Issues”Symptoms:
- Jobs worker fails to start
- “Service Bus connection failed” errors
- Queue creation errors
Solutions:
-
Verify connection string format:
Endpoint=sb://namespace.servicebus.windows.net/;SharedAccessKeyName=policy;SharedAccessKey=key -
Check permissions:
- Shared access policy must have Manage, Send, and Listen rights
- Standard tier or higher required (Basic not supported)
-
Test connectivity:
Terminal window # From Azure Portal, test connection using Service Bus Explorer
Cosmos DB (MongoDB) Issues
Section titled “Cosmos DB (MongoDB) Issues”Symptoms:
- MongoDB connection errors
- “SSL/TLS handshake failed”
- “API type not supported”
Solutions:
-
Verify MongoDB API:
- Must use MongoDB API, not SQL API
- Check API type in Cosmos DB Overview
-
Check connection string:
mongodb://account:key@account.mongo.cosmos.azure.com:10255/?ssl=true&replicaSet=globaldb&retrywrites=false&maxIdleTimeMS=120000&appName=@account@ -
SSL requirements:
- SSL is required for Cosmos DB
- Connection string includes
ssl=true
Storage Issues
Section titled “Storage Issues”Blob Storage / S3Proxy Problems
Section titled “Blob Storage / S3Proxy Problems”Symptoms:
- File upload failures
- S3Proxy pod crashlooping
- “Access denied” errors
Solutions:
-
Check s3proxy logs:
Terminal window kubectl logs deployment/llamacloud-s3proxy -
Verify container names:
- All required containers must exist
- Names are case-sensitive
- Check containers in Azure Portal
-
Required containers:
llama-platform-parsed-documentsllama-platform-etlllama-platform-external-componentsllama-platform-file-parsingllama-platform-raw-filesllama-cloud-parse-outputllama-platform-file-screenshotsllama-platform-extract-output -
Check s3proxy configuration:
- Review s3proxy configuration docs
Azure OpenAI Issues
Section titled “Azure OpenAI Issues”Model Deployment Problems
Section titled “Model Deployment Problems”Symptoms:
- “Model not found” errors
- “Deployment not found” errors
- API version errors
Solutions:
-
Check job service logs:
Terminal window kubectl logs deployment/llamacloud-jobs-serviceWe run LLM integration validators on pod startup. You can find useful error logs for LLM integrations.
-
Verify deployment names:
- Use deployment name, not model name
- Check in Azure Portal → Model deployments
-
Check quotas:
- Ensure sufficient TPM quota allocated
- Verify deployment is not paused
-
API version:
- Use supported version:
2024-12-01-preview
- Check Azure OpenAI documentation for latest
- Use supported version:
-
Test direct access:
Terminal window curl -H "api-key: YOUR_KEY" \"https://YOUR_RESOURCE.openai.azure.com/openai/deployments/YOUR_DEPLOYMENT/completions?api-version=2024-12-01-preview"
Authentication Issues
Section titled “Authentication Issues”Microsoft Entra ID OIDC Problems
Section titled “Microsoft Entra ID OIDC Problems”Symptoms:
- Authentication redirects fail
- “Invalid client” errors
- OIDC discovery errors
Solutions:
-
Verify app registration:
- Check client ID is correct
- Verify redirect URIs are configured
- Ensure client secret is valid (not expired)
-
Check discovery URL:
https://login.microsoftonline.com/{tenant-id}/v2.0/.well-known/openid-configuration -
Test OIDC endpoint:
Terminal window curl https://login.microsoftonline.com/{tenant-id}/v2.0/.well-known/openid-configuration
Pod-Specific Issues
Section titled “Pod-Specific Issues”Backend Pod Issues
Section titled “Backend Pod Issues”Common problems:
- Environment variable errors
- Secret mounting failures
- Database migration failures
Debug steps:
kubectl logs deployment/llamacloud-backend --tail=100kubectl describe deployment llamacloud-backendkubectl get events --sort-by='.lastTimestamp'
Frontend Pod Issues
Section titled “Frontend Pod Issues”Common problems:
- Build failures
- Configuration errors
- Ingress connectivity
Debug steps:
kubectl logs deployment/llamacloud-frontend --tail=100kubectl port-forward svc/llamacloud-frontend 3000:3000
Jobs Worker Issues
Section titled “Jobs Worker Issues”Common problems:
- Queue connectivity
- Job processing failures
- Memory/CPU limits
Debug steps:
kubectl logs deployment/llamacloud-jobs-worker --tail=100kubectl top pod -l app=llamacloud-jobs-worker
Network and Security Issues
Section titled “Network and Security Issues”AKS Networking Problems
Section titled “AKS Networking Problems”Symptoms:
- Pods cannot reach Azure services
- DNS resolution failures
- Intermittent connectivity
Solutions:
-
Check network security groups:
- Verify outbound rules allow Azure service connections
- Check subnet NSG rules
-
Verify DNS:
Terminal window kubectl run -it --rm nslookup --image=busybox --restart=Never -- nslookup your-postgres.postgres.database.azure.com -
Test private endpoints:
- If using private endpoints, verify routing
- Check private DNS zones
Ingress Issues
Section titled “Ingress Issues”Symptoms:
- Cannot access LlamaCloud UI externally
- SSL certificate errors
- Load balancer failures
Solutions:
-
Check ingress controller:
Terminal window kubectl get ingresskubectl logs -n ingress-nginx deployment/nginx-ingress-controller -
Verify DNS configuration:
- Domain points to load balancer IP
- SSL certificates are valid
-
Test load balancer:
Terminal window kubectl get svc -n ingress-nginx
Performance Issues
Section titled “Performance Issues”Slow Performance
Section titled “Slow Performance”Common causes:
- Insufficient resources
- Database performance issues
- Network latency
Solutions:
-
Check resource usage:
Terminal window kubectl top podskubectl top nodes -
Scale resources:
Terminal window kubectl scale deployment llamacloud-backend --replicas=3 -
Optimize Azure services:
- Increase PostgreSQL compute tier
- Use Premium Redis tier
- Enable auto-scaling for Cosmos DB
Memory/CPU Issues
Section titled “Memory/CPU Issues”Symptoms:
- Pod restarts
- OOMKilled events
- High CPU usage
Solutions:
-
Check resource limits:
Terminal window kubectl describe pod <pod-name> -
Increase limits in values.yaml:
backend:resources:limits:memory: 4Gicpu: 2
Error Code Reference
Section titled “Error Code Reference”Common HTTP Errors
Section titled “Common HTTP Errors”- 500 Internal Server Error: Check backend logs, database connectivity
- 502 Bad Gateway: Check if backend pods are running
- 503 Service Unavailable: Check service health, scaling issues
- 401 Unauthorized: OIDC configuration issues
- 403 Forbidden: Azure service permission issues
Common Database Errors
Section titled “Common Database Errors”- Connection refused: Firewall or network issues
- Authentication failed: Wrong credentials
- SSL required: Missing SSL configuration
- Database does not exist: Database name mismatch
Getting Help
Section titled “Getting Help”Collect Diagnostic Information
Section titled “Collect Diagnostic Information”Before contacting support, gather:
# Basic cluster infokubectl get pods -o widekubectl get svckubectl get secretskubectl get configmaps
# Logs from all serviceskubectl logs deployment/llamacloud-backend > backend.logkubectl logs deployment/llamacloud-frontend > frontend.logkubectl logs deployment/llamacloud-jobs-worker > jobs.logkubectl logs deployment/llamacloud-s3proxy > s3proxy.log
# Cluster eventskubectl get events --sort-by='.lastTimestamp'
# Resource usagekubectl top podskubectl top nodes
Contact Support
Section titled “Contact Support”- LlamaCloud Support: support@llamaindex.ai
- Include: Deployment configuration, error logs, Azure resource details
- Avoid: Sharing secrets, credentials, or sensitive data