Ingestion Pipeline
An IngestionPipeline applies a sequence of Transformations to your input data.
The resulting nodes are either returned or, if a vector store is given, inserted into that vector database.
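Conceptually, each Transformation receives the nodes produced by the previous step and hands its output to the next one. The following is a toy model of that flow, not the llamaindex implementation: `PipelineNode`, `Transformation`, `VectorStoreLike`, `runPipeline`, `splitOnBlankLines` and `addCharCount` are all hypothetical names, and real transformations are classes such as `SentenceSplitter`, `TitleExtractor` and `OpenAIEmbedding`.

```typescript
// Toy model of an ingestion pipeline. All names here are hypothetical
// illustrations, not part of the llamaindex API.
interface PipelineNode {
  text: string;
  metadata: Record<string, string>;
  embedding?: number[];
}

// A transformation takes a list of nodes and returns a new list.
type Transformation = (nodes: PipelineNode[]) => Promise<PipelineNode[]>;

// Minimal stand-in for a vector store: it only needs to accept nodes.
interface VectorStoreLike {
  add(nodes: PipelineNode[]): Promise<void>;
}

async function runPipeline(
  documents: PipelineNode[],
  transformations: Transformation[],
  vectorStore?: VectorStoreLike,
): Promise<PipelineNode[]> {
  let nodes = documents;
  // Apply each transformation in order; each one sees the previous output.
  for (const transform of transformations) {
    nodes = await transform(nodes);
  }
  // If a store was given, insert the final nodes into it as well.
  if (vectorStore) {
    await vectorStore.add(nodes);
  }
  return nodes;
}

// Hypothetical splitter: one node per blank-line-separated paragraph.
const splitOnBlankLines: Transformation = async (nodes) =>
  nodes.flatMap((node) =>
    node.text
      .split(/\n\s*\n/)
      .map((part) => part.trim())
      .filter((part) => part.length > 0)
      .map((part) => ({ text: part, metadata: { ...node.metadata } })),
  );

// Hypothetical metadata extractor: records each node's character count.
const addCharCount: Transformation = async (nodes) =>
  nodes.map((node) => ({
    ...node,
    metadata: { ...node.metadata, charCount: String(node.text.length) },
  }));
```

Running `runPipeline` on one document containing `"first para\n\nsecond"` with `[splitOnBlankLines, addCharCount]` yields two nodes that keep the document's metadata and gain a `charCount` entry; passing a store additionally hands those same nodes to `store.add`.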
Installation
npm i llamaindex @llamaindex/openai @llamaindex/qdrant
Usage Pattern
The simplest usage is to instantiate an IngestionPipeline like so:
import fs from "node:fs/promises";

import { OpenAI, OpenAIEmbedding } from "@llamaindex/openai";
import {
  Document,
  IngestionPipeline,
  MetadataMode,
  TitleExtractor,
  SentenceSplitter,
} from "llamaindex";

async function main() {
  // Load essay from abramov.txt in Node
  const path = "node_modules/llamaindex/examples/abramov.txt";

  const essay = await fs.readFile(path, "utf-8");

  // Create Document object with essay
  const document = new Document({ text: essay, id_: path });
  const pipeline = new IngestionPipeline({
    transformations: [
      new SentenceSplitter({ chunkSize: 1024, chunkOverlap: 20 }),
      new TitleExtractor(),
      new OpenAIEmbedding(),
    ],
  });

  // run the pipeline
  const nodes = await pipeline.run({ documents: [document] });

  // print out the result of the pipeline run
  for (const node of nodes) {
    console.log(node.getContent(MetadataMode.NONE));
  }
}
main().catch(console.error);
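In the pipeline above, `SentenceSplitter({ chunkSize: 1024, chunkOverlap: 20 })` cuts the document into overlapping chunks. The sketch below is a simplified, hypothetical `chunkWords` helper that only illustrates what the two parameters control. It counts whitespace-separated words, whereas the real splitter measures size in tokens and tries to keep sentences intact, so it is not how `SentenceSplitter` is implemented.

```typescript
// Simplified illustration of chunkSize / chunkOverlap (hypothetical helper).
// Counts words rather than tokens and ignores sentence boundaries.
function chunkWords(text: string, chunkSize: number, chunkOverlap: number): string[] {
  if (chunkSize <= 0 || chunkOverlap < 0 || chunkOverlap >= chunkSize) {
    throw new Error("require chunkSize > 0 and 0 <= chunkOverlap < chunkSize");
  }
  const words = text.split(/\s+/).filter((w) => w.length > 0);
  const chunks: string[] = [];
  // Each chunk starts (chunkSize - chunkOverlap) words after the previous one,
  // so consecutive chunks share chunkOverlap words.
  const step = chunkSize - chunkOverlap;
  for (let start = 0; start < words.length; start += step) {
    chunks.push(words.slice(start, start + chunkSize).join(" "));
    // Stop once this chunk already reaches the end of the text.
    if (start + chunkSize >= words.length) break;
  }
  return chunks;
}
```

For example, `chunkWords("a b c d e f g", 3, 1)` returns `["a b c", "c d e", "e f g"]`: each chunk holds at most three words and shares one word with its neighbour, which keeps some context across chunk boundaries.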
Connecting to Vector Databases
When running an ingestion pipeline, you can also choose to automatically insert the resulting nodes into a remote vector store.
Then, you can construct an index from that vector store later on.
import fs from "node:fs/promises";
import { OpenAIEmbedding } from "@llamaindex/openai";
import { QdrantVectorStore } from "@llamaindex/qdrant";
import {
  Document,
  IngestionPipeline,
  MetadataMode,
  TitleExtractor,
  SentenceSplitter,
  VectorStoreIndex,
} from "llamaindex";

async function main() {
  // Load essay from abramov.txt in Node
  const path = "node_modules/llamaindex/examples/abramov.txt";
const essay = await fs.readFile(path, "utf-8");
  // Connect to a Qdrant server running locally
  const vectorStore = new QdrantVectorStore({
    url: "http://localhost:6333",
  });
  // Create Document object with essay
  const document = new Document({ text: essay, id_: path });
  const pipeline = new IngestionPipeline({
    transformations: [
      new SentenceSplitter({ chunkSize: 1024, chunkOverlap: 20 }),
      new TitleExtractor(),
      new OpenAIEmbedding(),
    ],
    vectorStore,
  });

  // run the pipeline; the resulting nodes are also inserted into Qdrant
  const nodes = await pipeline.run({ documents: [document] });
  console.log(`Ingested ${nodes.length} nodes`);

  // create an index from the populated vector store
  // (fromVectorStore is async, so its result must be awaited)
  const index = await VectorStoreIndex.fromVectorStore(vectorStore);
}
main().catch(console.error);
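Building an index from the vector store later works because the store keeps each node's embedding and can rank the stored nodes against a query embedding. The sketch below shows that idea with a hypothetical in-memory store; `StoredNode`, `InMemoryVectorStore` and `cosineSimilarity` are illustrative names, not the `QdrantVectorStore` API, and cosine similarity is an assumed metric (Qdrant lets you choose among several distance functions).

```typescript
// Hypothetical in-memory vector store, for illustration only.
interface StoredNode {
  id: string;
  text: string;
  embedding: number[];
}

function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("embedding dimensions differ");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // Treat zero vectors as unrelated to everything.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

class InMemoryVectorStore {
  private nodes: StoredNode[] = [];

  // Ingestion time: store the embedded nodes produced by the pipeline.
  add(nodes: StoredNode[]): void {
    this.nodes.push(...nodes);
  }

  // Query time (possibly much later): return the topK most similar nodes.
  query(queryEmbedding: number[], topK: number): StoredNode[] {
    return this.nodes
      .map((node) => ({ node, score: cosineSimilarity(queryEmbedding, node.embedding) }))
      .sort((x, y) => y.score - x.score)
      .slice(0, topK)
      .map((entry) => entry.node);
  }
}
```

Because ingestion (`add`) and retrieval (`query`) only share the stored embeddings, the two steps can run in separate processes, which is what allows the index to be constructed from an already-populated store.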