FastAPI + LlamaIndex RAG Example (Ollama)

This example demonstrates how to build a simple Retrieval-Augmented Generation (RAG) API using LlamaIndex and FastAPI, powered by a local LLM via Ollama.

Features

Uses llama3 for text generation and nomic-embed-text for embeddings via Ollama
Local, free LLM (no API keys required)
Document ingestion and indexing
Simple query API endpoint
Small, single-file application structure

Prerequisites

Python 3.9+
Ollama installed locally

Setup

Pull the generation and embedding models using Ollama:

ollama pull llama3
ollama pull nomic-embed-text
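The app relies on both llama3 and nomic-embed-text, so it can help to confirm they are available before starting the server. The following is a minimal sketch, assuming Ollama is serving on its default address (http://localhost:11434) and that its /api/tags endpoint lists local models. The helper names `missing_models` and `fetch_tags` are hypothetical and not part of this example's code.

```python
import json
import urllib.request

# Models the example app expects (taken from the Features list).
REQUIRED_MODELS = ("llama3", "nomic-embed-text")


def missing_models(tags_payload, required=REQUIRED_MODELS):
    """Return the required models that are absent from an /api/tags payload."""
    # Ollama reports tagged names such as "llama3:latest"; compare base names.
    installed = {m["name"].split(":")[0] for m in tags_payload.get("models", [])}
    return [name for name in required if name not in installed]


def fetch_tags(host="http://localhost:11434", timeout=5):
    """Fetch the list of locally available models from a running Ollama server."""
    # Assumption: Ollama is running locally on its default port.
    with urllib.request.urlopen(f"{host}/api/tags", timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With Ollama running, `missing_models(fetch_tags())` returns the names that still need an `ollama pull`.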

Install dependencies:

pip install -r requirements.txt

Add documents:

Place the files you want to index in a data directory inside the directory you start the server from. The app loads them from there at startup.

Run

uvicorn app:app --reload

Once the server is running, you can test the API using FastAPI’s built-in Swagger UI:

Open your browser and visit: http://localhost:8000/docs

Example Request

curl -X POST "http://localhost:8000/query" \
-H "Content-Type: application/json" \
-d '{"query": "What is this example about?"}'
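
The same request can also be sent from Python. The following is a minimal sketch using only the standard library, assuming the server is running at http://localhost:8000 as in the curl example. The helper names `build_query_request` and `ask` are hypothetical, and the generous timeout is an assumption, since local generation can be slow.

```python
import json
import urllib.request


def build_query_request(question, base_url="http://localhost:8000"):
    """Build the POST /query request that the curl example sends."""
    body = json.dumps({"query": question}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/query",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(question, base_url="http://localhost:8000", timeout=120):
    """Send the question and return the "response" field of the JSON reply."""
    request = build_query_request(question, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))["response"]
```

Calling `ask("What is this example about?")` with the server running returns the generated answer as a string.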

Notes

This example uses a local LLM by default.
It is intended as a minimal, beginner-friendly demonstration of integrating LlamaIndex with FastAPI.

! pip install fastapi uvicorn llama-index-llms-ollama llama-index-embeddings-ollama ollama

from fastapi import FastAPI
from pydantic import BaseModel

from llama_index.core import Settings, SimpleDirectoryReader, VectorStoreIndex
from llama_index.llms.ollama import Ollama
from llama_index.embeddings.ollama import OllamaEmbedding

app = FastAPI(title="LlamaIndex FastAPI RAG (Ollama)")

# Configure local LLM and embedding model via Ollama
Settings.llm = Ollama(model="llama3")
Settings.embed_model = OllamaEmbedding(model_name="nomic-embed-text")

# Load documents from the ./data directory (it must exist and contain at least
# one file) and build the in-memory index once, at startup
documents = SimpleDirectoryReader("data").load_data()
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()


class QueryRequest(BaseModel):
    query: str


@app.post("/query")
def query_documents(request: QueryRequest):
    """Query the indexed documents using a local LLM via Ollama."""
    response = query_engine.query(request.query)
    # The response object is converted to plain text for the JSON payload.
    return {"response": str(response)}
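
Conceptually, the query engine retrieves the document chunks most similar to the question and passes them to the LLM as context. The toy sketch below illustrates that retrieve-then-generate flow using only the standard library. It is not how LlamaIndex is implemented: the bag-of-words `embed` function stands in for the nomic-embed-text embeddings, and `retrieve` and `build_prompt` are hypothetical helpers, not LlamaIndex APIs.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, chunks, top_k=2):
    """Return the top_k chunks ranked by similarity to the query."""
    query_vec = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, embed(c)), reverse=True)
    return ranked[:top_k]


def build_prompt(query, context_chunks):
    """Assemble the retrieved chunks and the question into one prompt string."""
    context = "\n".join(f"- {chunk}" for chunk in context_chunks)
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
```

In the real app, the prompt assembled this way would be sent to llama3. Here the point is only that retrieval narrows the documents down before generation.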
