Custom Vector Stores

Configure an index to export raw parsed output instead of writing to a managed vector store, then load the data into your own database.

By default, an Index V2 index parses, chunks, embeds, and writes to a managed vector store automatically. If you’d rather own the vector store (because you’re self-hosting the platform with a database we don’t manage, or you simply want to keep your data in your own infrastructure), you can disable the managed export and download the raw parsed output instead.

In this mode, LlamaCloud still handles directory management, parsing, and sync orchestration. You take over chunking, embedding, and writing to your database.

Pass vector_target="DISABLED" when creating the index. The pipeline falls back to a download-only export, so parsed output is written to object storage instead of a managed vector store.

index = await client.beta.indexes.create(
    source_directory_id=directory.id,
    vector_target="DISABLED",
)

Wait for the index to reach the ready state, the same way as for a normal index (see the getting started guide).

Each file in the source directory produces a JSON payload containing per-page markdown (header, text, footer), structured items, attachments, and provenance metadata. List the files in the directory and fetch each payload via a presigned URL.

import httpx

# List files in the source directory
files_resp = await client.beta.directories.files.list(
    directory_id=directory.id,
)

# Each directory file ID points at a parsed-output JSON object in storage.
# Resolve its presigned URL and download.
async with httpx.AsyncClient() as http:
    for f in files_resp.items:
        presigned = await client.files.get(f.file_id)
        response = await http.get(presigned.url)
        response.raise_for_status()
        parsed = response.json()
        # parsed["parse"] holds the parsed output payload

Re-run this loop after each sync to pick up changes. The content_fingerprint_hash and source_modified_at fields on each payload let you skip files that haven’t changed since the last export.
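One way to use the fingerprint is to record the last value seen per file and compare it before re-processing. The sketch below is a minimal illustration, not part of the SDK: the `is_unchanged` helper and the `seen` dictionary are hypothetical, and it assumes `content_fingerprint_hash` is readable as a top-level key on the downloaded payload.

```python
def is_unchanged(file_id: str, payload: dict, seen: dict[str, str]) -> bool:
    """Return True when the payload's fingerprint matches the last one recorded.

    Assumption: the fingerprint is a top-level key on the payload. Adjust the
    lookup if it lives elsewhere in your export.
    """
    fingerprint = payload.get("content_fingerprint_hash")
    if fingerprint is None:
        return False  # no fingerprint available, so treat the file as changed
    return seen.get(file_id) == fingerprint


# Hypothetical usage inside the download loop:
#   if is_unchanged(f.file_id, parsed, seen):
#       continue
#   ... chunk, embed, write ...
#   seen[f.file_id] = parsed["content_fingerprint_hash"]  # record only after a successful write
```

Recording the fingerprint only after the write succeeds means a failed export is retried on the next run instead of being skipped.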

From there, the steps are the same as any custom RAG pipeline:

  1. Concatenate per-page markdown while tracking page boundaries by character offset.
  2. Chunk the text with your preferred chunker.
  3. Embed the chunks with your preferred embedding model.
  4. Write the chunks to your database, keyed on (parsed_directory_file_id, export_config_id) so re-parsed files fully replace earlier exports rather than duplicating.
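Steps 1 and 2 can be sketched as follows. This is an illustration under stated assumptions rather than the reference implementation: it takes the per-page markdown as a plain list of strings (extracting those strings from the payload is left to you), and it uses a naive fixed-size character chunker as a stand-in for whichever chunker you prefer. The names `Chunk`, `concat_pages`, and `chunk_text` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    start: int        # character offset into the concatenated document
    end: int          # exclusive end offset
    pages: list[int]  # 1-based page numbers the chunk overlaps


def concat_pages(pages: list[str], sep: str = "\n\n") -> tuple[str, list[tuple[int, int]]]:
    """Join page texts and record each page's (start, end) character span."""
    parts: list[str] = []
    spans: list[tuple[int, int]] = []
    cursor = 0
    for i, page in enumerate(pages):
        if i > 0:
            parts.append(sep)
            cursor += len(sep)
        spans.append((cursor, cursor + len(page)))
        parts.append(page)
        cursor += len(page)
    return "".join(parts), spans


def chunk_text(text: str, spans: list[tuple[int, int]],
               size: int = 1000, overlap: int = 100) -> list[Chunk]:
    """Split text into fixed-size windows and map each window back to its pages."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    chunks: list[Chunk] = []
    for start in range(0, len(text), size - overlap):
        end = min(start + size, len(text))
        # A page belongs to the chunk when the two character ranges intersect.
        pages = [n for n, (s, e) in enumerate(spans, start=1) if s < end and e > start]
        chunks.append(Chunk(text[start:end], start, end, pages))
        if end == len(text):
            break
    return chunks
```

Keeping the page numbers on each chunk lets you store them as metadata next to the embedding, so retrieval results can cite the pages they came from.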

The index-v2-data-sinks repo contains complete, runnable reference exporters for several databases:

  • MongoDB (Atlas vector search)
  • PostgreSQL + pgvector
  • Qdrant
  • Pinecone
  • Turbopuffer
  • Azure AI Search

Each exporter handles index/collection provisioning, deterministic IDs for idempotent re-exports, per-file delete-then-insert, and snapshot listing for incremental syncs. The shared Exporter protocol in export/base.py is the contract to implement when adding a new sink:

from typing import Protocol


class Exporter(Protocol):
    async def export(self, outputs, *, project_id, embeddings=None, embedding_model=None): ...
    async def delete_file(self, *, parsed_directory_file_id, export_config_id): ...
    async def delete_files(self, *, parsed_directory_file_ids, export_config_id): ...
    async def list_snapshots(self, *, export_config_id): ...
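To make the contract concrete, here is a toy in-memory sink with the same method signatures. It is a sketch, not one of the reference exporters. The shape of `outputs` and the return value of `list_snapshots` are not specified in this guide, so the code assumes each output is a dict with `parsed_directory_file_id`, `export_config_id`, and `chunks` keys, and it returns the stored file IDs as the snapshot. The `chunk_id` helper and `InMemoryExporter` class are hypothetical names.

```python
import asyncio
import uuid


def chunk_id(parsed_directory_file_id: str, export_config_id: str, index: int) -> str:
    """Deterministic ID: the same file, config, and position always give the same ID."""
    name = f"{parsed_directory_file_id}:{export_config_id}:{index}"
    return str(uuid.uuid5(uuid.NAMESPACE_URL, name))


class InMemoryExporter:
    """Toy sink that keeps rows in a dict. Embeddings are accepted but ignored here."""

    def __init__(self) -> None:
        self.rows: dict[str, dict] = {}

    async def export(self, outputs, *, project_id, embeddings=None, embedding_model=None):
        for output in outputs:  # assumed shape, see the paragraph above
            file_id = output["parsed_directory_file_id"]
            config_id = output["export_config_id"]
            # Delete-then-insert so a re-parsed file fully replaces its earlier rows.
            await self.delete_file(parsed_directory_file_id=file_id, export_config_id=config_id)
            for i, text in enumerate(output["chunks"]):
                self.rows[chunk_id(file_id, config_id, i)] = {
                    "project_id": project_id,
                    "parsed_directory_file_id": file_id,
                    "export_config_id": config_id,
                    "text": text,
                }

    async def delete_file(self, *, parsed_directory_file_id, export_config_id):
        stale = [
            key for key, row in self.rows.items()
            if row["parsed_directory_file_id"] == parsed_directory_file_id
            and row["export_config_id"] == export_config_id
        ]
        for key in stale:
            del self.rows[key]

    async def delete_files(self, *, parsed_directory_file_ids, export_config_id):
        for file_id in parsed_directory_file_ids:
            await self.delete_file(parsed_directory_file_id=file_id, export_config_id=export_config_id)

    async def list_snapshots(self, *, export_config_id):
        # Assumption: a snapshot is the sorted list of file IDs stored for this config.
        return sorted({
            row["parsed_directory_file_id"] for row in self.rows.values()
            if row["export_config_id"] == export_config_id
        })
```

Deleting before inserting matters when a re-parsed file yields fewer chunks than before: deterministic IDs alone would overwrite the first rows but leave the trailing ones orphaned.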

Point your coding agent at the repo as a worked example, copy a sink you like as a starting point, or implement the protocol against the store of your choice.

  • Retrieval, chat, and file operations on client.beta.* query the managed vector store. With vector_target="DISABLED", those endpoints have nothing to query, so you serve retrieval from your own database.
  • You’re responsible for cleaning up records in your database when files are removed from the source directory. The list_snapshots() pattern in the reference implementations shows how to diff against the current directory state.
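The cleanup described in the last point comes down to a set difference between what your database holds and what the directory currently lists. The helper below is a hypothetical sketch: it assumes you have already collected both sides as file IDs that refer to the same identifier space (for example, the IDs returned by your sink's snapshot listing and the IDs from the current directory listing).

```python
from collections.abc import Iterable


def removed_file_ids(stored_ids: Iterable[str], current_ids: Iterable[str]) -> set[str]:
    """IDs present in your database but no longer present in the source directory."""
    return set(stored_ids) - set(current_ids)
```

The resulting set can be passed to a bulk delete such as the `delete_files` method of the Exporter protocol.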