Ingestion Pipeline
An IngestionPipeline applies a series of Transformations to your input data, in order.
The resulting nodes are returned and, if a vector database is given, also inserted into it.
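The sketch below is a minimal, self-contained TypeScript model of that idea. Each transformation takes a list of nodes and returns a new list. The transformations run in order, and the final nodes are returned and optionally handed to a store. Every name in it (`SketchNode`, `Transformation`, `NodeSink`, `wordSplitter`, `toyEmbedder`, `runPipeline`) is hypothetical and is not part of the llamaindex API. `wordSplitter` counts words rather than tokens, so it only approximates what `chunkSize`/`chunkOverlap` mean for a real splitter. `toyEmbedder` stands in for a real embedding model.

```typescript
// Hypothetical, self-contained model of an ingestion pipeline.
// None of these names are part of the llamaindex API.

interface SketchNode {
  text: string;
  embedding?: number[];
}

// A transformation takes a list of nodes and returns a new list.
type Transformation = (nodes: SketchNode[]) => Promise<SketchNode[]>;

// Stand-in for a vector database that receives the final nodes.
interface NodeSink {
  add(nodes: SketchNode[]): Promise<void>;
}

// Split text into windows of `chunkSize` words, each window starting
// `chunkSize - chunkOverlap` words after the previous one. Counts words,
// not tokens, so it only approximates a real splitter.
function wordSplitter(chunkSize: number, chunkOverlap: number): Transformation {
  if (chunkOverlap >= chunkSize) {
    throw new Error("chunkOverlap must be smaller than chunkSize");
  }
  const step = chunkSize - chunkOverlap;
  return async (nodes) => {
    const out: SketchNode[] = [];
    for (const node of nodes) {
      const words = node.text.split(/\s+/).filter((w) => w.length > 0);
      for (let start = 0; start < words.length; start += step) {
        out.push({ text: words.slice(start, start + chunkSize).join(" ") });
        // Stop once a window reaches the end of the text.
        if (start + chunkSize >= words.length) break;
      }
    }
    return out;
  };
}

// Toy "embedding" ([character count, word count]) in place of a model call.
const toyEmbedder: Transformation = async (nodes) =>
  nodes.map((n) => ({ ...n, embedding: [n.text.length, n.text.split(" ").length] }));

// Run the transformations in order, hand the result to the sink if one is
// given, and return the final nodes in either case.
async function runPipeline(
  input: SketchNode[],
  transformations: Transformation[],
  sink?: NodeSink,
): Promise<SketchNode[]> {
  let nodes = input;
  for (const transform of transformations) {
    nodes = await transform(nodes);
  }
  if (sink) {
    await sink.add(nodes);
  }
  return nodes;
}

// Usage: split into 3-word chunks overlapping by 1 word, then "embed".
const collected: SketchNode[] = [];
const sink: NodeSink = {
  add: async (ns) => {
    collected.push(...ns);
  },
};
runPipeline([{ text: "a b c d e f g" }], [wordSplitter(3, 1), toyEmbedder], sink)
  .then((nodes) => console.log(nodes.map((n) => n.text))); // chunks: "a b c", "c d e", "e f g"
```

Modelling a transformation as a plain async function keeps the sketch small. The point it illustrates is that the pipeline is just ordered composition plus an optional final write.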
Installation
```shell
npm i llamaindex @llamaindex/openai @llamaindex/qdrant
```
Usage Pattern
The simplest usage is to instantiate an IngestionPipeline like so:
```typescript
import fs from "node:fs/promises";

import { OpenAIEmbedding } from "@llamaindex/openai";
import {
  Document,
  IngestionPipeline,
  MetadataMode,
  TitleExtractor,
  SentenceSplitter,
} from "llamaindex";

async function main() {
  // Load essay from abramov.txt in Node
  const path = "node_modules/llamaindex/examples/abramov.txt";

  const essay = await fs.readFile(path, "utf-8");

  // Create Document object with essay
  const document = new Document({ text: essay, id_: path });
  const pipeline = new IngestionPipeline({
    transformations: [
      // split the text into chunks of up to 1024 tokens, overlapping by 20
      new SentenceSplitter({ chunkSize: 1024, chunkOverlap: 20 }),
      // use an LLM to extract a document title into the node metadata
      new TitleExtractor(),
      // compute an embedding for each node
      new OpenAIEmbedding(),
    ],
  });

  // run the pipeline
  const nodes = await pipeline.run({ documents: [document] });

  // print out the result of the pipeline run
  for (const node of nodes) {
    console.log(node.getContent(MetadataMode.NONE));
  }
}

main().catch(console.error);
```
Connecting to Vector Databases
When running an ingestion pipeline, you can also choose to automatically insert the resulting nodes into a remote vector store.
Then, you can construct an index from that vector store later on.
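The benefit of writing nodes into a vector store is that querying later needs only the store, not another pipeline run. The following sketch illustrates that split with a hypothetical `InMemoryVectorStore`. It upserts nodes by id and answers top-k queries by cosine similarity. It is not the Qdrant or llamaindex API. A real vector store persists its data and uses an index for fast search instead of scanning every vector as this sketch does.

```typescript
// Hypothetical in-memory stand-in for a vector store such as Qdrant.
// Not the llamaindex or Qdrant API; it only shows "write now, query later".

interface StoredNode {
  id: string;
  text: string;
  embedding: number[];
}

// Cosine similarity of two equal-length vectors (0 if either is all zeros).
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

class InMemoryVectorStore {
  private nodes = new Map<string, StoredNode>();

  // Ingestion time: insert or replace nodes by id.
  add(nodes: StoredNode[]): void {
    for (const node of nodes) this.nodes.set(node.id, node);
  }

  // Query time: return the topK nodes most similar to the query embedding.
  query(embedding: number[], topK: number): StoredNode[] {
    return Array.from(this.nodes.values())
      .map((node) => ({ node, score: cosineSimilarity(node.embedding, embedding) }))
      .sort((x, y) => y.score - x.score)
      .slice(0, topK)
      .map((entry) => entry.node);
  }
}

// Ingest once...
const store = new InMemoryVectorStore();
store.add([
  { id: "a", text: "about cats", embedding: [1, 0] },
  { id: "b", text: "about dogs", embedding: [0, 1] },
]);
// ...and query later, without re-running any transformations.
console.log(store.query([0.9, 0.1], 1).map((n) => n.id)); // logs [ 'a' ]
```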
```typescript
import fs from "node:fs/promises";

import { OpenAIEmbedding } from "@llamaindex/openai";
import { QdrantVectorStore } from "@llamaindex/qdrant";
import {
  Document,
  IngestionPipeline,
  TitleExtractor,
  SentenceSplitter,
  VectorStoreIndex,
} from "llamaindex";

async function main() {
  // Load essay from abramov.txt in Node
  const path = "node_modules/llamaindex/examples/abramov.txt";

  const essay = await fs.readFile(path, "utf-8");

  // connect to a Qdrant instance running locally
  const vectorStore = new QdrantVectorStore({
    host: "http://localhost:6333",
  });

  // Create Document object with essay
  const document = new Document({ text: essay, id_: path });
  const pipeline = new IngestionPipeline({
    transformations: [
      new SentenceSplitter({ chunkSize: 1024, chunkOverlap: 20 }),
      new TitleExtractor(),
      new OpenAIEmbedding(),
    ],
    // the resulting nodes are also inserted into this vector store
    vectorStore,
  });

  // run the pipeline
  const nodes = await pipeline.run({ documents: [document] });
  console.log(`Ingested ${nodes.length} nodes`);

  // create an index backed by the same vector store
  const index = await VectorStoreIndex.fromVectorStore(vectorStore);
}

main().catch(console.error);
```
API Reference