
File Operations

Search for files, grep within parsed content, and read file text from an index.

File operations let you work with individual files inside an index: you can search for files by name, grep through their parsed content with a regex, and read a file's full text, all without access to the original documents.

These endpoints are available under client.beta.retrieval.* in the SDK.

Find files by exact name or substring match.

# Exact name match
results = await client.beta.retrieval.find(
    index_id="<your-index-id>",
    file_name="quarterly-report.pdf",
)

# Substring match (case-insensitive)
results = await client.beta.retrieval.find(
    index_id="<your-index-id>",
    file_name_contains="quarterly",
)

for f in results.items:
    print(f"{f.file_id}: {f.file_name}")
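
The difference between the two matching modes can be pictured with a small local sketch. This is not the service's implementation: `find_local` and the sample file names are hypothetical, and the sketch assumes that `file_name` is compared for exact, case-sensitive equality (the text above only states that the substring mode is case-insensitive).

```python
from typing import Optional


def find_local(
    names: list[str],
    file_name: Optional[str] = None,
    file_name_contains: Optional[str] = None,
) -> list[str]:
    """Hypothetical local stand-in for the two matching modes."""
    if file_name is not None:
        # Exact match: the whole name must be equal (case sensitivity is assumed).
        return [n for n in names if n == file_name]
    if file_name_contains is not None:
        # Substring match, compared case-insensitively.
        needle = file_name_contains.lower()
        return [n for n in names if needle in n.lower()]
    # No filter given: return everything.
    return list(names)


names = ["quarterly-report.pdf", "Quarterly-Summary.docx", "annual-report.pdf"]
exact = find_local(names, file_name="quarterly-report.pdf")
partial = find_local(names, file_name_contains="quarterly")
print(exact)
print(partial)
```

The exact lookup is the better fit when the full file name is already known, while the substring form is handy for discovering files that share a naming prefix.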

Search for a regex pattern within a file’s parsed text content. This is useful for finding specific terms, patterns, or sections within a document.

results = await client.beta.retrieval.grep(
    index_id="<your-index-id>",
    file_id="<file-id>",
    pattern="revenue|profit",
    context_chars=100,  # characters of context around each match
)

for match in results.items:
    print(f"Offset {match.start_char}-{match.end_char}: {match.content}")

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| index_id | string | required | ID of the index. |
| file_id | string | required | ID of the file to search within. |
| pattern | string | required | Regex pattern to match. |
| context_chars | int | null | Characters of context to include around each match. |
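
To make the roles of `pattern` and `context_chars` concrete, here is a minimal local sketch built on Python's `re` module. It is an illustration under assumptions, not the service's behavior: `grep_local` and the `Match` class are hypothetical, the sketch assumes that `start_char` and `end_char` delimit the match itself while `content` carries the match plus its surrounding context, and the service's regex dialect may differ from Python's.

```python
import re
from dataclasses import dataclass


@dataclass
class Match:
    """Hypothetical shape mirroring the fields used in the example above."""
    start_char: int
    end_char: int
    content: str


def grep_local(text: str, pattern: str, context_chars: int = 0) -> list[Match]:
    """Find every regex match and attach up to context_chars on each side."""
    matches = []
    for m in re.finditer(pattern, text):
        # Clamp the context window to the bounds of the text.
        lo = max(0, m.start() - context_chars)
        hi = min(len(text), m.end() + context_chars)
        matches.append(Match(m.start(), m.end(), text[lo:hi]))
    return matches


text = "Q3 revenue rose 12% while profit stayed flat."
hits = grep_local(text, "revenue|profit", context_chars=5)
for h in hits:
    print(f"Offset {h.start_char}-{h.end_char}: {h.content!r}")
```

A larger `context_chars` makes each hit easier to interpret on its own, at the cost of longer results when a pattern matches many times.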

Read the parsed text content of a file. Supports pagination via offset and max_length for large documents.

# Read the full file
result = await client.beta.retrieval.read(
    index_id="<your-index-id>",
    file_id="<file-id>",
)
print(result.content)

# Read a specific range
result = await client.beta.retrieval.read(
    index_id="<your-index-id>",
    file_id="<file-id>",
    offset=1000,
    max_length=5000,
)
print(result.content)

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| index_id | string | required | ID of the index. |
| file_id | string | required | ID of the file to read. |
| offset | int | 0 | Starting character offset. |
| max_length | int | null | Maximum characters to return from the offset. Returns the full file if omitted. |
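
For large documents, `offset` and `max_length` can be combined into a paging loop. The sketch below is one possible approach, not a documented recipe: `read_all`, `ReadChunk`, and `fake_read` are hypothetical names, and the loop assumes that a read past the end of the file returns empty content and that offsets advance by the number of characters returned. A fake reader stands in for the SDK call so the example runs on its own.

```python
import asyncio
from typing import Awaitable, Callable

# Hypothetical signature: (offset, max_length) -> text of that slice.
ReadChunk = Callable[[int, int], Awaitable[str]]


async def read_all(read_chunk: ReadChunk, chunk_size: int = 5000) -> str:
    """Page through a file chunk by chunk until an empty chunk comes back."""
    parts = []
    offset = 0
    while True:
        chunk = await read_chunk(offset, chunk_size)
        if not chunk:
            # Assumption: empty content means the end of the file was reached.
            break
        parts.append(chunk)
        offset += len(chunk)
    return "".join(parts)


# Stand-in for the real call so the sketch is self-contained.
DOCUMENT = "abcdefghij" * 3  # 30 characters


async def fake_read(offset: int, max_length: int) -> str:
    return DOCUMENT[offset : offset + max_length]


full_text = asyncio.run(read_all(fake_read, chunk_size=8))
print(len(full_text))
```

With the SDK, `read_chunk` could be a small wrapper that calls `client.beta.retrieval.read(...)` with the given `offset` and `max_length` and returns `result.content`. Paging this way keeps each response bounded instead of pulling a very large file in one request.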