FastAPI + LlamaIndex RAG Example (Ollama)
This example demonstrates how to build a simple Retrieval-Augmented Generation (RAG) API using LlamaIndex and FastAPI, powered by a local LLM via Ollama.
Features
- Uses llama3 for text generation and nomic-embed-text for embeddings via Ollama
- Local, free LLM (no API keys required)
- Document ingestion and indexing
- Simple query API endpoint
- Clean, production-style structure
Prerequisites
- Python 3.9+
- Ollama installed locally
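The two prerequisites above can be checked from Python before going further. This is a minimal sketch, not part of the example itself: the function name check_prerequisites is hypothetical, and it assumes the Ollama CLI is installed under the name ollama and is on your PATH. It does not verify that the Ollama server is running or that any model has been pulled.

```python
import shutil
import sys


def check_prerequisites() -> dict:
    """Report whether the interpreter and the Ollama CLI look usable.

    Hypothetical helper: it only checks the Python version and whether an
    executable named "ollama" can be found on PATH.
    """
    return {
        "python_ok": sys.version_info >= (3, 9),
        "ollama_on_path": shutil.which("ollama") is not None,
    }


# Example usage:
# print(check_prerequisites())
```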
Pull the models using Ollama (llama3 for generation, nomic-embed-text for embeddings):

    ollama pull llama3
    ollama pull nomic-embed-text

Install dependencies:

    pip install -r requirements.txt

Start the server:

    uvicorn app:app --reload

You can also test the API using FastAPI’s built-in Swagger UI. Open your browser and visit: http://localhost:8000/docs
Example Request
    curl -X POST "http://localhost:8000/query" \
      -H "Content-Type: application/json" \
      -d '{"query": "What is this example about?"}'

Notes
- This example uses a local LLM by default.
- It is intended as a minimal, beginner-friendly demonstration of integrating LlamaIndex with FastAPI.
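The example request to /query can also be sent from Python. The sketch below uses only the standard library and assumes the server is running at http://localhost:8000 and returns a JSON object with a "response" field, as the endpoint in this example does. The helper names build_query_request and ask, and the 120-second timeout, are illustrative choices rather than part of the example.

```python
import json
import urllib.request


def build_query_request(
    query: str, base_url: str = "http://localhost:8000"
) -> urllib.request.Request:
    """Build the POST /query request without sending it."""
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/query",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(
    query: str, base_url: str = "http://localhost:8000", timeout: float = 120.0
) -> str:
    """Send the query and return the "response" field of the JSON reply.

    The generous timeout is an assumption: local models can be slow to answer.
    """
    request = build_query_request(query, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as reply:
        payload = json.load(reply)
    return payload["response"]


# Example usage (requires the server to be running):
# print(ask("What is this example about?"))
```

Keeping request construction separate from sending makes it possible to inspect the request without a running server.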
Install the packages used by the example (the leading ! is only needed when running the command from a notebook cell):

    ! pip install fastapi uvicorn llama-index-llms-ollama llama-index-embeddings-ollama ollama

Application code (saved as app.py so that uvicorn app:app can import it):

    from fastapi import FastAPI
    from pydantic import BaseModel

    from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
    from llama_index.core.settings import Settings
    from llama_index.llms.ollama import Ollama
    from llama_index.embeddings.ollama import OllamaEmbedding

    app = FastAPI(title="LlamaIndex FastAPI RAG (Ollama)")

    # Configure local LLM and embedding model via Ollama
    Settings.llm = Ollama(model="llama3")
    Settings.embed_model = OllamaEmbedding(model_name="nomic-embed-text")

    # Load documents from the "data" directory and build the index at startup
    documents = SimpleDirectoryReader("data").load_data()
    index = VectorStoreIndex.from_documents(documents)
    query_engine = index.as_query_engine()


    class QueryRequest(BaseModel):
        query: str


    @app.post("/query")
    def query_documents(request: QueryRequest):
        # Query indexed documents using a local LLM via Ollama.
        response = query_engine.query(request.query)
        return {"response": str(response)}
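To make the retrieve-then-generate flow behind the query engine more concrete, here is a toy, standard-library-only sketch of the idea. It is not how LlamaIndex implements it: the bag-of-words embed function stands in for a real embedding model such as nomic-embed-text, and the names retrieve and build_prompt are hypothetical. It only illustrates the general pattern of ranking text chunks by similarity to the query and placing the best matches into the prompt sent to the LLM.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, chunks: list, top_k: int = 2) -> list:
    """Return the top_k chunks most similar to the query."""
    query_vector = embed(query)
    ranked = sorted(
        chunks, key=lambda chunk: cosine(query_vector, embed(chunk)), reverse=True
    )
    return ranked[:top_k]


def build_prompt(query: str, context: list) -> str:
    """Assemble the retrieved chunks and the question into one prompt string."""
    joined = "\n".join(f"- {chunk}" for chunk in context)
    return f"Context:\n{joined}\n\nQuestion: {query}\nAnswer:"
```

In the real application the prompt would be passed to the LLM; here the sketch stops at building it, so it runs without Ollama.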