FastAPI + LlamaIndex RAG Example (Ollama)
This example demonstrates how to build a simple Retrieval-Augmented Generation (RAG) API using LlamaIndex and FastAPI, powered by a local LLM via Ollama.
Features
- Uses llama3 for text generation and nomic-embed-text for embeddings via Ollama
- Local, free LLM (no API keys required)
- Document ingestion and indexing
- Simple query API endpoint
- Clean, production-style structure
Prerequisites
- Python 3.9+
- Ollama installed locally
Pull the generation and embedding models using Ollama:

    ollama pull llama3
    ollama pull nomic-embed-text

Install dependencies:

    pip install -r requirements.txt

Start the API server:

    uvicorn app:app --reload

Once the server is running, you can also test the API through FastAPI’s built-in Swagger UI. Open your browser and visit: http://localhost:8000/docs
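Before starting the server, it can help to confirm that both models are available locally. The following is a minimal sketch, assuming Ollama is listening on its default address (http://localhost:11434) and that its /api/tags endpoint returns a JSON object with a "models" list. The helper names are hypothetical and not part of this example's code.

```python
import json
import urllib.request

# Assumption: Ollama's default local address; adjust if yours differs.
OLLAMA_TAGS_URL = "http://localhost:11434/api/tags"
REQUIRED_MODELS = ("llama3", "nomic-embed-text")


def missing_models(tags_payload, required=REQUIRED_MODELS):
    """Return the required model names absent from an /api/tags payload."""
    # Names come back tagged, e.g. "llama3:latest"; compare on the part before ":".
    installed = {m["name"].split(":")[0] for m in tags_payload.get("models", [])}
    return [name for name in required if name not in installed]


def fetch_tags(url=OLLAMA_TAGS_URL, timeout=5):
    """Fetch the list of locally pulled models from a running Ollama server."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)
```

With Ollama running, `missing_models(fetch_tags())` returns the names that still need an `ollama pull`.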
Example Request

    curl -X POST "http://localhost:8000/query" \
      -H "Content-Type: application/json" \
      -d '{"query": "What is this example about?"}'

Notes
- This example uses a local LLM by default.
- It is intended as a minimal, beginner-friendly demonstration of integrating LlamaIndex with FastAPI.
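The curl request shown under Example Request can also be sent from Python. This is a small sketch using only the standard library. It assumes the server is running at http://localhost:8000 and that the reply is a JSON object with a "response" field, as in the endpoint code in this example. The function names are hypothetical.

```python
import json
import urllib.request


def build_query_request(question, base_url="http://localhost:8000"):
    """Build the POST request for the /query endpoint without sending it."""
    body = json.dumps({"query": question}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/query",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(question, base_url="http://localhost:8000", timeout=120):
    """Send the question and return the "response" field of the JSON reply."""
    request = build_query_request(question, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)["response"]
```

Calling `ask("What is this example about?")` mirrors the curl command. The generous timeout is a guess, since a local model may take a while to answer.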

    ! pip install fastapi uvicorn llama-index-llms-ollama llama-index-embeddings-ollama ollama

    from fastapi import FastAPI
    from pydantic import BaseModel

    from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
    from llama_index.core.settings import Settings
    from llama_index.llms.ollama import Ollama
    from llama_index.embeddings.ollama import OllamaEmbedding

    app = FastAPI(title="LlamaIndex FastAPI RAG (Ollama)")

    # Configure local LLM and embedding model via Ollama
    Settings.llm = Ollama(model="llama3")
    Settings.embed_model = OllamaEmbedding(model_name="nomic-embed-text")

    # Load documents and build index at startup
    documents = SimpleDirectoryReader("data").load_data()
    index = VectorStoreIndex.from_documents(documents)
    query_engine = index.as_query_engine()

    class QueryRequest(BaseModel):
        query: str

    @app.post("/query")
    def query_documents(request: QueryRequest):
        # Query indexed documents using a local LLM via Ollama.
        response = query_engine.query(request.query)
        return {"response": str(response)}
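To make the retrieve-then-generate idea behind this example more concrete, here is a toy sketch using only the standard library. It is not how LlamaIndex implements retrieval: word-count vectors stand in for the nomic-embed-text embeddings, and the augmented prompt is returned rather than sent to llama3. All function names are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, chunks, top_k=1):
    """Return the top_k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:top_k]


def build_prompt(query, chunks, top_k=1):
    """Augment the question with retrieved context, ready for an LLM."""
    context = "\n".join(retrieve(query, chunks, top_k))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
```

In the real application, the index and query engine take over both steps: retrieval runs against embeddings of the files in the data directory, and the resulting prompt is answered by the local model.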