File Operations
Search for files, grep within parsed content, and read file text from an index.
Overview
File operations let you work with individual files inside an index. You can search for files by name, grep through their parsed content with a regex, and read the full text of a file, all without accessing the original documents.
These endpoints are available under client.beta.retrieval.* in the SDK.
Search files
Find files by exact name or by case-insensitive substring match.
```python
# Exact name match
results = await client.beta.retrieval.find(
    index_id="<your-index-id>",
    file_name="quarterly-report.pdf",
)

# Substring match (case-insensitive)
results = await client.beta.retrieval.find(
    index_id="<your-index-id>",
    file_name_contains="quarterly",
)

for f in results.items:
    print(f"{f.file_id}: {f.file_name}")
```

```typescript
// Substring match
const results = await client.beta.retrieval.find({
  index_id: "<your-index-id>",
  file_name_contains: "quarterly",
});

for (const f of results.items) {
  console.log(`${f.file_id}: ${f.file_name}`);
}
```

Grep within a file
Search for a regex pattern within a file’s parsed text content. This is useful for finding specific terms, patterns, or sections within a document.
```python
results = await client.beta.retrieval.grep(
    index_id="<your-index-id>",
    file_id="<file-id>",
    pattern="revenue|profit",
    context_chars=100,  # characters of context around each match
)

for match in results.items:
    print(f"Offset {match.start_char}-{match.end_char}: {match.content}")
```

```typescript
const results = await client.beta.retrieval.grep({
  index_id: "<your-index-id>",
  file_id: "<file-id>",
  pattern: "revenue|profit",
  context_chars: 100,
});

for (const match of results.items) {
  console.log(`Offset ${match.start_char}-${match.end_char}: ${match.content}`);
}
```

Grep parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| `index_id` | string | required | ID of the index. |
| `file_id` | string | required | ID of the file to search within. |
| `pattern` | string | required | Regex pattern to match. |
| `context_chars` | int | null | Characters of context to include around each match. |
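
To make the roles of `pattern`, `context_chars`, and the match offsets concrete, here is a minimal local sketch built on Python's standard `re` module. It is not the service's implementation: the `Match` dataclass and `local_grep` function are hypothetical names, and the sketch assumes that `end_char` is exclusive and that `content` holds the matched text together with its surrounding context.

```python
import re
from dataclasses import dataclass


@dataclass
class Match:
    start_char: int  # offset of the first matched character
    end_char: int    # offset one past the last matched character (assumed exclusive)
    content: str     # matched text plus surrounding context (assumed)


def local_grep(text: str, pattern: str, context_chars: int = 0) -> list[Match]:
    """Illustrative local analogue: regex matches with character context."""
    matches = []
    for m in re.finditer(pattern, text):
        # Widen the window by context_chars on each side, clamped to the text bounds.
        lo = max(0, m.start() - context_chars)
        hi = min(len(text), m.end() + context_chars)
        matches.append(Match(m.start(), m.end(), text[lo:hi]))
    return matches


text = "Q3 revenue rose 12%, while profit held steady."
results = local_grep(text, "revenue|profit", context_chars=5)
for match in results:
    print(f"Offset {match.start_char}-{match.end_char}: {match.content!r}")
```

A larger `context_chars` widens the window around each match without changing the reported offsets, which always refer to the match itself.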
Read a file
Read the parsed text content of a file. Supports pagination via `offset` and `max_length` for large documents.
```python
# Read the full file
result = await client.beta.retrieval.read(
    index_id="<your-index-id>",
    file_id="<file-id>",
)
print(result.content)

# Read a specific range
result = await client.beta.retrieval.read(
    index_id="<your-index-id>",
    file_id="<file-id>",
    offset=1000,
    max_length=5000,
)
print(result.content)
```

```typescript
// Read the full file
const result = await client.beta.retrieval.read({
  index_id: "<your-index-id>",
  file_id: "<file-id>",
});
console.log(result.content);

// Read a specific range
const partial = await client.beta.retrieval.read({
  index_id: "<your-index-id>",
  file_id: "<file-id>",
  offset: 1000,
  max_length: 5000,
});
console.log(partial.content);
```

Read parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| `index_id` | string | required | ID of the index. |
| `file_id` | string | required | ID of the file to read. |
| `offset` | int | 0 | Starting character offset. |
| `max_length` | int | null | Maximum characters to return from the offset. Returns the full file if omitted. |
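
One way to page through a large document with `offset` and `max_length` is sketched below. The helper `read_in_chunks` and the `ReadPage` callable type are hypothetical names. The loop assumes that a read past the end of the file returns empty content and that a chunk shorter than `max_length` marks the end of the file; neither behavior is confirmed here. An in-memory stand-in replaces the service so the sketch runs on its own.

```python
import asyncio
from typing import Awaitable, Callable

# Hypothetical page reader: (offset, max_length) -> text of that slice.
ReadPage = Callable[[int, int], Awaitable[str]]


async def read_in_chunks(read_page: ReadPage, chunk_size: int = 5000) -> str:
    """Collect a document by requesting consecutive character ranges."""
    parts: list[str] = []
    offset = 0
    while True:
        chunk = await read_page(offset, chunk_size)
        if not chunk:
            break  # assumed: empty content means the offset is past the end
        parts.append(chunk)
        offset += len(chunk)
        if len(chunk) < chunk_size:
            break  # assumed: a short chunk means the end was reached
    return "".join(parts)


# In-memory stand-in for the service, used only to exercise the loop.
DOCUMENT = "x" * 12_000 + "END"


async def fake_read_page(offset: int, max_length: int) -> str:
    return DOCUMENT[offset : offset + max_length]


full_text = asyncio.run(read_in_chunks(fake_read_page, chunk_size=5000))
print(len(full_text))
```

With the SDK, `read_page` could be a small wrapper that awaits `client.beta.retrieval.read(...)` with the given `offset` and `max_length` and returns `result.content`. Advancing by `len(chunk)` rather than by `chunk_size` keeps the next offset correct even if the service returns fewer characters than requested.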