Loading Data

Before you can start indexing your documents, you need to load them into memory. A reader is a module that loads data from a file into a Document object.

To use the reader module, install the @llamaindex/readers package:

npm i @llamaindex/readers

We offer readers for different file formats.

import { CSVReader } from '@llamaindex/readers/csv';
import { DocxReader } from '@llamaindex/readers/docx';
import { HTMLReader } from '@llamaindex/readers/html';
import { ImageReader } from '@llamaindex/readers/image';
import { JSONReader } from '@llamaindex/readers/json';
import { MarkdownReader } from '@llamaindex/readers/markdown';
import { ObsidianReader } from '@llamaindex/readers/obsidian';
import { PDFReader } from '@llamaindex/readers/pdf';
import { TextFileReader } from '@llamaindex/readers/text';

LlamaIndex.TS supports easy loading of files from folders using the SimpleDirectoryReader class.

It is a simple reader that reads all files from a directory and its subdirectories and delegates the actual reading to the reader specified in the fileExtToReader map.

../../examples/readers/src/simple-directory-reader.ts

Currently, the following readers are mapped to specific file types:

You can modify the reader in three different ways:

  • overrideReader overrides the reader for all file types, including unsupported ones.
  • fileExtToReader maps a reader to a specific file type. Can override reader for existing file types or add support for new file types.
  • defaultReader sets a fallback reader for files with unsupported extensions. By default it is TextFileReader.
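
The precedence described by the three options above can be sketched as a small lookup function. This is an illustrative model only, not the library's implementation: SketchReader, SketchOptions, extensionOf, and pickReader are hypothetical names, and lowercasing the extension is an assumption made for the sketch.

```typescript
// Hypothetical stand-in for a reader; real readers load files into Document objects.
interface SketchReader {
  name: string;
}

// Hypothetical options object mirroring the three option names in the text.
interface SketchOptions {
  overrideReader?: SketchReader;
  fileExtToReader?: Record<string, SketchReader>;
  defaultReader?: SketchReader | null;
}

// Returns the extension without the dot ("" if the file has none).
// Lowercasing is an assumption of this sketch.
function extensionOf(filePath: string): string {
  const base = filePath.split("/").pop() ?? "";
  const dot = base.lastIndexOf(".");
  return dot > 0 ? base.slice(dot + 1).toLowerCase() : "";
}

// Picks a reader: override first, then the extension map, then the fallback.
function pickReader(
  filePath: string,
  options: SketchOptions,
): SketchReader | null {
  if (options.overrideReader) {
    return options.overrideReader; // applies to every file type
  }
  const ext = extensionOf(filePath);
  const map = options.fileExtToReader;
  if (map && Object.prototype.hasOwnProperty.call(map, ext)) {
    return map[ext]; // reader registered for this extension
  }
  return options.defaultReader ?? null; // fallback for unmapped extensions
}
```

In this model, a file such as data/report.csv resolves to the reader registered under csv unless overrideReader is set, and a file with an unmapped extension falls through to defaultReader.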

SimpleDirectoryReader can process files concurrently. Use the numWorkers option to set the number of concurrent requests, up to a maximum of 9. The default is 1, which reads files sequentially.
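
To illustrate what a bounded number of workers means in practice, here is a minimal sketch of a worker pool that processes items with at most numWorkers tasks in flight. It is not the library's code: mapWithWorkers is a hypothetical helper, and rejecting values outside 1 to 9 is an assumption based on the limit stated above.

```typescript
// Hypothetical helper: runs `task` over `items` with at most `numWorkers`
// tasks in flight, and returns results in the original item order.
async function mapWithWorkers<T, R>(
  items: readonly T[],
  task: (item: T) => Promise<R>,
  numWorkers = 1, // sequential by default
): Promise<R[]> {
  // Assumption: values outside 1..9 are rejected rather than clamped.
  if (!Number.isInteger(numWorkers) || numWorkers < 1 || numWorkers > 9) {
    throw new RangeError("numWorkers must be an integer between 1 and 9");
  }

  const results = new Array<R>(items.length);
  let next = 0; // index of the next unclaimed item, shared by all workers

  // Each worker claims the next index, processes it, and repeats until
  // no items remain.
  async function worker(): Promise<void> {
    while (next < items.length) {
      const index = next++;
      results[index] = await task(items[index]);
    }
  }

  const poolSize = Math.min(numWorkers, items.length);
  await Promise.all(Array.from({ length: poolSize }, () => worker()));
  return results;
}
```

With numWorkers left at 1 the items are processed one after another; a higher value lets several files be read at the same time while still capping the load.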

../../examples/readers/src/custom-simple-directory-reader.ts

Tips when using in non-Node.js environments

When you use @llamaindex/readers in a non-Node.js environment (such as Vercel Edge or Cloudflare Workers), some classes are not exported from the top-level entry file.

The reason is that some classes, such as PDFReader, are only compatible with the Node.js runtime because they use Node.js-specific APIs (like fs, child_process, and crypto).

If you need any of those classes, import them directly through their file path in the package instead.

Because PDFReader does not work with the Edge runtime, here’s how to use SimpleDirectoryReader with LlamaParseReader to load PDFs:

import { SimpleDirectoryReader } from "@llamaindex/readers/directory";
import { LlamaParseReader } from "llama-cloud-services";

export const DATA_DIR = "./data";

export async function getDocuments() {
  const reader = new SimpleDirectoryReader();
  // Load PDFs using LlamaParseReader
  return await reader.loadData({
    directoryPath: DATA_DIR,
    fileExtToReader: {
      pdf: new LlamaParseReader({ resultType: "markdown" }),
    },
  });
}

Note: Reader classes have to be added explicitly to the fileExtToReader map in the Edge version of the SimpleDirectoryReader.

You’ll find a complete example with LlamaIndexTS here: https://github.com/run-llama/create_llama_projects/tree/main/nextjs-edge-llamaparse

Load file natively using Node.js Customization Hooks

We provide a helper utility that lets you import a file directly in a Node.js script.

node --import @llamaindex/readers/node ./script.js

// script.js
import csv from './path/to/data.csv';
const text = csv.getText();