JSON Mode
In JSON mode, LlamaParse will return a data structure representing the parsed object.
Installation
npm i llamaindex llama-cloud-services
For JSON mode, use the loadJson method. The resultType is set automatically by this method.
More information about indexing the results is on the next page.
import { LlamaParseReader } from "llama-cloud-services";

const reader = new LlamaParseReader();

async function main() {
  // Load the file and return an array of JSON objects
  const jsonObjs = await reader.loadJson("../data/uber_10q_march_2022.pdf");
  // Access the "pages" array of the first object (one object = a single parsed file)
  const jsonList = jsonObjs[0]["pages"];
  // Further process the jsonList object as needed.
}
Output
Each object in the response array, written to jsonObjs in the example, follows this structure:
{
  "pages": [ ..page objects.. ],
  "job_metadata": {
    "credits_used": int,
    "credits_max": int,
    "job_credits_usage": int,
    "job_pages": int,
    "job_is_cache_hit": boolean
  },
  "job_id": string,
  "file_path": string
}
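As a rough sketch, the structure above could be typed in TypeScript as follows. The names JobMetadata, ParseResult, and summarizeJob are hypothetical and are not exported by llama-cloud-services. The sample values are made up for illustration, and int is assumed to map to number.

```typescript
// Hypothetical types mirroring the documented response keys.
// These names are illustrative; they are not part of llama-cloud-services.
interface JobMetadata {
  credits_used: number;
  credits_max: number;
  job_credits_usage: number;
  job_pages: number;
  job_is_cache_hit: boolean;
}

interface ParseResult {
  pages: unknown[]; // page objects; their keys depend on the document
  job_metadata: JobMetadata;
  job_id: string;
  file_path: string;
}

// Build a one-line summary of a parse job from its metadata.
function summarizeJob(result: ParseResult): string {
  const { job_pages, job_credits_usage, job_is_cache_hit } = result.job_metadata;
  const cache = job_is_cache_hit ? "cache hit" : "fresh parse";
  return `${result.file_path}: ${job_pages} pages, ${job_credits_usage} credits (${cache})`;
}

// Made-up sample data, for illustration only.
const sampleResult: ParseResult = {
  pages: [],
  job_metadata: {
    credits_used: 10,
    credits_max: 1000,
    job_credits_usage: 3,
    job_pages: 3,
    job_is_cache_hit: false,
  },
  job_id: "example-job-id",
  file_path: "../data/example.pdf",
};

console.log(summarizeJob(sampleResult));
```

Typing the response this way is optional. It mainly helps the compiler catch misspelled keys when you read job_metadata.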
Page objects
Within page objects, the following keys may be present depending on your document.
- page: The page number within the document.
- text: The text extracted from the page.
- md: The markdown version of the extracted text.
- images: Any images extracted from the page.
- items: An array of heading, text and table objects in the order they appear on the page.
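Because any of these keys may be missing, code that consumes page objects should treat them as optional. The sketch below shows one way to do that, assuming md and text are strings. ParsedPage and pagesToMarkdown are hypothetical names, and the sample pages are invented.

```typescript
// Hypothetical page shape using only the keys listed above. Every key is
// optional because its presence depends on the document.
interface ParsedPage {
  page?: number;
  text?: string;
  md?: string;
  images?: unknown[];
  items?: unknown[];
}

// Join all pages into one string, preferring markdown and falling back
// to plain text. Pages with neither key are skipped.
function pagesToMarkdown(pages: ParsedPage[]): string {
  return pages
    .map((p) => p.md ?? p.text ?? "")
    .filter((chunk) => chunk.length > 0)
    .join("\n\n");
}

// Invented sample pages, for illustration only.
const samplePages: ParsedPage[] = [
  { page: 1, text: "Intro", md: "# Intro" },
  { page: 2, text: "Plain text only" },
  { page: 3 },
];

console.log(pagesToMarkdown(samplePages));
```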
JSON Mode with SimpleDirectoryReader
All readers share a loadData method with SimpleDirectoryReader that promises to return a uniform Document with metadata. Because loadJson returns raw JSON objects rather than Documents, JSON mode cannot be used directly with SimpleDirectoryReader.
However, a simple workaround is to create a new reader class that extends LlamaParseReader and either adds a new method or overrides loadData. The method wraps JSON mode, extracts the required values, and returns Document objects.
import { Document } from "llamaindex";
import { LlamaParseReader } from "llama-cloud-services";

class LlamaParseReaderWithJson extends LlamaParseReader {
  // Override the loadData method
  override async loadData(filePath: string): Promise<Document[]> {
    // Call the loadJson method inherited from LlamaParseReader
    const jsonObjs = await super.loadJson(filePath);
    let documents: Document[] = [];

    jsonObjs.forEach((jsonObj) => {
      // Make sure pages is an array before iterating over it
      if (Array.isArray(jsonObj.pages)) {
        // Create one Document per page, keeping the page number as metadata
        const docs = jsonObj.pages.map(
          (page: { text: string; page: number }) =>
            new Document({ text: page.text, metadata: { page: page.page } }),
        );
        documents = documents.concat(docs);
      }
    });
    return documents;
  }
}
Now we have documents with the page number as metadata. This new reader can be used like any other and can be integrated with SimpleDirectoryReader. Since it extends LlamaParseReader, you can use the same params.
You can assign any other values of the JSON response to the Document as needed.
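For instance, the job_id and file_path of each result could be carried into the metadata alongside the page number. The sketch below is a plain helper with no library dependency. PageLike, ResultLike, and toDocumentInputs are hypothetical names, and the sample data is invented. Each returned object has the { text, metadata } shape that the Document constructor receives in the reader above.

```typescript
// Hypothetical minimal shapes; only the keys used here are declared.
interface PageLike {
  page: number;
  text: string;
}

interface ResultLike {
  pages: PageLike[];
  job_id: string;
  file_path: string;
}

// Build one { text, metadata } pair per page. Inside an overridden loadData,
// each pair could be passed to `new Document(...)`.
function toDocumentInputs(result: ResultLike) {
  return result.pages.map((page) => ({
    text: page.text,
    metadata: {
      page: page.page,
      job_id: result.job_id,
      file_path: result.file_path,
    },
  }));
}

// Invented sample result, for illustration only.
const exampleResult: ResultLike = {
  pages: [
    { page: 1, text: "First page" },
    { page: 2, text: "Second page" },
  ],
  job_id: "example-job-id",
  file_path: "../data/example.pdf",
};

console.log(toDocumentInputs(exampleResult));
```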