# Documents

## Create Batch Pipeline Documents

`List<CloudDocument> pipelines().documents().create(DocumentCreateParams params, RequestOptions requestOptions = RequestOptions.none())`

**post** `/api/v1/pipelines/{pipeline_id}/documents`

Batch create documents for a pipeline.

### Parameters

- `DocumentCreateParams params`

  - `Optional<String> pipelineId`

  - `List<CloudDocumentCreate> body`

    - `Metadata metadata`

    - `String text`

    - `Optional<String> id`

    - `Optional<List<String>> excludedEmbedMetadataKeys`

    - `Optional<List<String>> excludedLlmMetadataKeys`

    - `Optional<List<Long>> pagePositions`

      Indices in `CloudDocument.text` where a new page begins. For example, the second page starts at the index given by `page_positions[1]`.
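The relationship between `text` and `pagePositions` can be sketched as follows: a minimal, SDK-independent helper that concatenates per-page strings and records where each page starts. `PagePositionsBuilder`, `PagedText`, and `fromPages` are hypothetical names, not part of the SDK. The sketch assumes that the first entry is `0` and that offsets are counted in Java `char` units; the service may count characters differently for text outside the Basic Multilingual Plane.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class PagePositionsBuilder {
    /** Hypothetical holder: concatenated text plus the start index of each page. */
    static final class PagedText {
        final String text;
        final List<Long> pagePositions;

        PagedText(String text, List<Long> pagePositions) {
            this.text = text;
            this.pagePositions = pagePositions;
        }
    }

    /** Joins the pages in order and records the offset at which each one begins. */
    static PagedText fromPages(List<String> pages) {
        StringBuilder text = new StringBuilder();
        List<Long> positions = new ArrayList<>();
        for (String page : pages) {
            // Assumption: offsets are Java char offsets into the joined text.
            positions.add((long) text.length());
            text.append(page);
        }
        return new PagedText(text.toString(), positions);
    }

    public static void main(String[] args) {
        PagedText doc = fromPages(Arrays.asList("First page. ", "Second page. ", "Third page."));
        System.out.println(doc.pagePositions); // prints [0, 12, 25]
    }
}
```

The resulting `text` and `pagePositions` values are what you would then pass to the `CloudDocumentCreate` builder.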

### Example

```java
package com.llamacloud_prod.api.example;

import com.llamacloud_prod.api.client.LlamaCloudClient;
import com.llamacloud_prod.api.client.okhttp.LlamaCloudOkHttpClient;
import com.llamacloud_prod.api.core.JsonValue;
import com.llamacloud_prod.api.models.pipelines.documents.CloudDocument;
import com.llamacloud_prod.api.models.pipelines.documents.CloudDocumentCreate;
import com.llamacloud_prod.api.models.pipelines.documents.DocumentCreateParams;
import java.util.List;

public final class Main {
    private Main() {}

    public static void main(String[] args) {
        LlamaCloudClient client = LlamaCloudOkHttpClient.fromEnv();

        DocumentCreateParams params = DocumentCreateParams.builder()
            .pipelineId("182bd5e5-6e1a-4fe4-a799-aa6d9a6ab26e")
            .addBody(CloudDocumentCreate.builder()
                .metadata(CloudDocumentCreate.Metadata.builder()
                    .putAdditionalProperty("foo", JsonValue.from("bar"))
                    .build())
                .text("text")
                .build())
            .build();
        List<CloudDocument> cloudDocuments = client.pipelines().documents().create(params);
    }
}
```

#### Response

```json
[
  {
    "id": "id",
    "metadata": {
      "foo": "bar"
    },
    "text": "text",
    "excluded_embed_metadata_keys": [
      "string"
    ],
    "excluded_llm_metadata_keys": [
      "string"
    ],
    "page_positions": [
      0
    ],
    "status_metadata": {
      "foo": "bar"
    }
  }
]
```

## Paginated List Pipeline Documents

`DocumentListPage pipelines().documents().list(DocumentListParams params = DocumentListParams.none(), RequestOptions requestOptions = RequestOptions.none())`

**get** `/api/v1/pipelines/{pipeline_id}/documents/paginated`

Return a list of documents for a pipeline.

### Parameters

- `DocumentListParams params`

  - `Optional<String> pipelineId`

  - `Optional<String> fileId`

  - `Optional<Long> limit`

  - `Optional<Boolean> onlyApiDataSourceDocuments`

  - `Optional<Boolean> onlyDirectUpload`

  - `Optional<Long> skip`

  - `Optional<StatusRefreshPolicy> statusRefreshPolicy`

    - `CACHED("cached")`

    - `TTL("ttl")`

### Returns

- `class CloudDocument:`

  Cloud document stored in S3.

  - `String id`

  - `Metadata metadata`

  - `String text`

  - `Optional<List<String>> excludedEmbedMetadataKeys`

  - `Optional<List<String>> excludedLlmMetadataKeys`

  - `Optional<List<Long>> pagePositions`

    Indices in `CloudDocument.text` where a new page begins. For example, the second page starts at the index given by `page_positions[1]`.

  - `Optional<StatusMetadata> statusMetadata`

### Example

```java
package com.llamacloud_prod.api.example;

import com.llamacloud_prod.api.client.LlamaCloudClient;
import com.llamacloud_prod.api.client.okhttp.LlamaCloudOkHttpClient;
import com.llamacloud_prod.api.models.pipelines.documents.DocumentListPage;
import com.llamacloud_prod.api.models.pipelines.documents.DocumentListParams;

public final class Main {
    private Main() {}

    public static void main(String[] args) {
        LlamaCloudClient client = LlamaCloudOkHttpClient.fromEnv();

        DocumentListPage page = client.pipelines().documents().list("182bd5e5-6e1a-4fe4-a799-aa6d9a6ab26e");
    }
}
```

#### Response

```json
{
  "documents": [
    {
      "id": "id",
      "metadata": {
        "foo": "bar"
      },
      "text": "text",
      "excluded_embed_metadata_keys": [
        "string"
      ],
      "excluded_llm_metadata_keys": [
        "string"
      ],
      "page_positions": [
        0
      ],
      "status_metadata": {
        "foo": "bar"
      }
    }
  ],
  "limit": 0,
  "offset": 0,
  "total_count": 0
}
```
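The `skip` and `limit` parameters together with `total_count` in the response suggest offset-based pagination. The loop below is a minimal sketch of that pattern and does not call the SDK: `SkipLimitPager`, `Page`, and `fetchAll` are hypothetical names, and the `fetchPage` function stands in for whatever call you use to retrieve one page. The generated `DocumentListPage` type may already offer its own iteration helpers, so check it before writing a loop like this.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;

final class SkipLimitPager {
    /** Hypothetical stand-in for one page of the paginated response. */
    static final class Page<T> {
        final List<T> documents;
        final long totalCount;

        Page(List<T> documents, long totalCount) {
            this.documents = documents;
            this.totalCount = totalCount;
        }
    }

    /**
     * Collects every item by advancing skip until total_count is reached.
     * fetchPage receives (skip, limit) and returns one page.
     */
    static <T> List<T> fetchAll(BiFunction<Long, Long, Page<T>> fetchPage, long limit) {
        List<T> all = new ArrayList<>();
        long skip = 0;
        while (true) {
            Page<T> page = fetchPage.apply(skip, limit);
            all.addAll(page.documents);
            skip += page.documents.size();
            // Stop on an empty page as well, so a stale total_count cannot loop forever.
            if (page.documents.isEmpty() || skip >= page.totalCount) {
                break;
            }
        }
        return all;
    }
}
```

In practice `fetchPage` would wrap `client.pipelines().documents().list(...)` with `skip` and `limit` set on `DocumentListParams`; the exact builder calls are an assumption here.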

## Get Pipeline Document

`CloudDocument pipelines().documents().get(DocumentGetParams params, RequestOptions requestOptions = RequestOptions.none())`

**get** `/api/v1/pipelines/{pipeline_id}/documents/{document_id}`

Return a single document for a pipeline.

### Parameters

- `DocumentGetParams params`

  - `String pipelineId`

  - `Optional<String> documentId`

### Returns

- `class CloudDocument:`

  Cloud document stored in S3.

  - `String id`

  - `Metadata metadata`

  - `String text`

  - `Optional<List<String>> excludedEmbedMetadataKeys`

  - `Optional<List<String>> excludedLlmMetadataKeys`

  - `Optional<List<Long>> pagePositions`

    Indices in `CloudDocument.text` where a new page begins. For example, the second page starts at the index given by `page_positions[1]`.

  - `Optional<StatusMetadata> statusMetadata`

### Example

```java
package com.llamacloud_prod.api.example;

import com.llamacloud_prod.api.client.LlamaCloudClient;
import com.llamacloud_prod.api.client.okhttp.LlamaCloudOkHttpClient;
import com.llamacloud_prod.api.models.pipelines.documents.CloudDocument;
import com.llamacloud_prod.api.models.pipelines.documents.DocumentGetParams;

public final class Main {
    private Main() {}

    public static void main(String[] args) {
        LlamaCloudClient client = LlamaCloudOkHttpClient.fromEnv();

        DocumentGetParams params = DocumentGetParams.builder()
            .pipelineId("182bd5e5-6e1a-4fe4-a799-aa6d9a6ab26e")
            .documentId("document_id")
            .build();
        CloudDocument cloudDocument = client.pipelines().documents().get(params);
    }
}
```

#### Response

```json
{
  "id": "id",
  "metadata": {
    "foo": "bar"
  },
  "text": "text",
  "excluded_embed_metadata_keys": [
    "string"
  ],
  "excluded_llm_metadata_keys": [
    "string"
  ],
  "page_positions": [
    0
  ],
  "status_metadata": {
    "foo": "bar"
  }
}
```

## Delete Pipeline Document

`pipelines().documents().delete(DocumentDeleteParams params, RequestOptions requestOptions = RequestOptions.none())`

**delete** `/api/v1/pipelines/{pipeline_id}/documents/{document_id}`

Delete a document from a pipeline. Deletion runs asynchronously: the document's vectors are removed first, then its MongoDB record.

### Parameters

- `DocumentDeleteParams params`

  - `String pipelineId`

  - `Optional<String> documentId`

### Example

```java
package com.llamacloud_prod.api.example;

import com.llamacloud_prod.api.client.LlamaCloudClient;
import com.llamacloud_prod.api.client.okhttp.LlamaCloudOkHttpClient;
import com.llamacloud_prod.api.models.pipelines.documents.DocumentDeleteParams;

public final class Main {
    private Main() {}

    public static void main(String[] args) {
        LlamaCloudClient client = LlamaCloudOkHttpClient.fromEnv();

        DocumentDeleteParams params = DocumentDeleteParams.builder()
            .pipelineId("182bd5e5-6e1a-4fe4-a799-aa6d9a6ab26e")
            .documentId("document_id")
            .build();
        client.pipelines().documents().delete(params);
    }
}
```

## Get Pipeline Document Status

`ManagedIngestionStatusResponse pipelines().documents().getStatus(DocumentGetStatusParams params, RequestOptions requestOptions = RequestOptions.none())`

**get** `/api/v1/pipelines/{pipeline_id}/documents/{document_id}/status`

Return the ingestion status of a single document for a pipeline.

### Parameters

- `DocumentGetStatusParams params`

  - `String pipelineId`

  - `Optional<String> documentId`

### Returns

- `class ManagedIngestionStatusResponse:`

  - `Status status`

    Status of the ingestion.

    - `NOT_STARTED("NOT_STARTED")`

    - `IN_PROGRESS("IN_PROGRESS")`

    - `SUCCESS("SUCCESS")`

    - `ERROR("ERROR")`

    - `PARTIAL_SUCCESS("PARTIAL_SUCCESS")`

    - `CANCELLED("CANCELLED")`

  - `Optional<LocalDateTime> deploymentDate`

    Date of the deployment.

  - `Optional<LocalDateTime> effectiveAt`

    When the status is effective.

  - `Optional<List<Error>> error`

    List of errors that occurred during ingestion.

    - `String jobId`

      ID of the job that failed.

    - `String message`

      Error message describing the failure.

    - `Step step`

      Name of the step that failed.

      - `MANAGED_INGESTION("MANAGED_INGESTION")`

      - `DATA_SOURCE("DATA_SOURCE")`

      - `FILE_UPDATER("FILE_UPDATER")`

      - `PARSE("PARSE")`

      - `TRANSFORM("TRANSFORM")`

      - `INGESTION("INGESTION")`

      - `METADATA_UPDATE("METADATA_UPDATE")`

  - `Optional<String> jobId`

    ID of the latest job.

### Example

```java
package com.llamacloud_prod.api.example;

import com.llamacloud_prod.api.client.LlamaCloudClient;
import com.llamacloud_prod.api.client.okhttp.LlamaCloudOkHttpClient;
import com.llamacloud_prod.api.models.pipelines.ManagedIngestionStatusResponse;
import com.llamacloud_prod.api.models.pipelines.documents.DocumentGetStatusParams;

public final class Main {
    private Main() {}

    public static void main(String[] args) {
        LlamaCloudClient client = LlamaCloudOkHttpClient.fromEnv();

        DocumentGetStatusParams params = DocumentGetStatusParams.builder()
            .pipelineId("182bd5e5-6e1a-4fe4-a799-aa6d9a6ab26e")
            .documentId("document_id")
            .build();
        ManagedIngestionStatusResponse managedIngestionStatusResponse = client.pipelines().documents().getStatus(params);
    }
}
```

#### Response

```json
{
  "status": "NOT_STARTED",
  "deployment_date": "2019-12-27T18:11:19.117Z",
  "effective_at": "2019-12-27T18:11:19.117Z",
  "error": [
    {
      "job_id": "182bd5e5-6e1a-4fe4-a799-aa6d9a6ab26e",
      "message": "message",
      "step": "MANAGED_INGESTION"
    }
  ],
  "job_id": "182bd5e5-6e1a-4fe4-a799-aa6d9a6ab26e"
}
```
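Because ingestion progresses through the `Status` values listed above, callers typically poll until a final state is reached. The following is a minimal polling sketch that does not call the SDK. `IngestionStatusPoller` and its methods are hypothetical names, and treating `SUCCESS`, `ERROR`, `PARTIAL_SUCCESS`, and `CANCELLED` as terminal is an assumption based on the enum names, not something this reference states.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.function.Supplier;

final class IngestionStatusPoller {
    // Assumption: these values mean ingestion will not change state again.
    static final Set<String> TERMINAL =
        new HashSet<>(Arrays.asList("SUCCESS", "ERROR", "PARTIAL_SUCCESS", "CANCELLED"));

    static boolean isTerminal(String status) {
        return TERMINAL.contains(status);
    }

    /**
     * Calls fetchStatus up to maxAttempts times, sleeping delayMillis between calls,
     * and returns the last status seen (which may still be non-terminal).
     */
    static String pollUntilTerminal(Supplier<String> fetchStatus, int maxAttempts, long delayMillis) {
        String status = fetchStatus.get();
        for (int attempt = 1; attempt < maxAttempts && !isTerminal(status); attempt++) {
            try {
                Thread.sleep(delayMillis);
            } catch (InterruptedException e) {
                // Preserve the interrupt flag and hand back what we have so far.
                Thread.currentThread().interrupt();
                return status;
            }
            status = fetchStatus.get();
        }
        return status;
    }
}
```

The supplier would wrap `client.pipelines().documents().getStatus(params)` and return the name of the response's status value. A fixed delay keeps the sketch short; a production caller would likely add backoff and an overall timeout.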

## Sync Pipeline Document

`DocumentSyncResponse pipelines().documents().sync(DocumentSyncParams params, RequestOptions requestOptions = RequestOptions.none())`

**post** `/api/v1/pipelines/{pipeline_id}/documents/{document_id}/sync`

Sync a specific document for a pipeline.

### Parameters

- `DocumentSyncParams params`

  - `String pipelineId`

  - `Optional<String> documentId`

### Returns

- `class DocumentSyncResponse:`

### Example

```java
package com.llamacloud_prod.api.example;

import com.llamacloud_prod.api.client.LlamaCloudClient;
import com.llamacloud_prod.api.client.okhttp.LlamaCloudOkHttpClient;
import com.llamacloud_prod.api.models.pipelines.documents.DocumentSyncParams;
import com.llamacloud_prod.api.models.pipelines.documents.DocumentSyncResponse;

public final class Main {
    private Main() {}

    public static void main(String[] args) {
        LlamaCloudClient client = LlamaCloudOkHttpClient.fromEnv();

        DocumentSyncParams params = DocumentSyncParams.builder()
            .pipelineId("182bd5e5-6e1a-4fe4-a799-aa6d9a6ab26e")
            .documentId("document_id")
            .build();
        DocumentSyncResponse response = client.pipelines().documents().sync(params);
    }
}
```

#### Response

```json
{}
```

## List Pipeline Document Chunks

`List<TextNode> pipelines().documents().getChunks(DocumentGetChunksParams params, RequestOptions requestOptions = RequestOptions.none())`

**get** `/api/v1/pipelines/{pipeline_id}/documents/{document_id}/chunks`

Return a list of chunks for a pipeline document.

### Parameters

- `DocumentGetChunksParams params`

  - `String pipelineId`

  - `Optional<String> documentId`

### Example

```java
package com.llamacloud_prod.api.example;

import com.llamacloud_prod.api.client.LlamaCloudClient;
import com.llamacloud_prod.api.client.okhttp.LlamaCloudOkHttpClient;
import com.llamacloud_prod.api.models.pipelines.documents.DocumentGetChunksParams;
import com.llamacloud_prod.api.models.pipelines.documents.TextNode;
import java.util.List;

public final class Main {
    private Main() {}

    public static void main(String[] args) {
        LlamaCloudClient client = LlamaCloudOkHttpClient.fromEnv();

        DocumentGetChunksParams params = DocumentGetChunksParams.builder()
            .pipelineId("182bd5e5-6e1a-4fe4-a799-aa6d9a6ab26e")
            .documentId("document_id")
            .build();
        List<TextNode> textNodes = client.pipelines().documents().getChunks(params);
    }
}
```

#### Response

```json
[
  {
    "class_name": "class_name",
    "embedding": [
      0
    ],
    "end_char_idx": 0,
    "excluded_embed_metadata_keys": [
      "string"
    ],
    "excluded_llm_metadata_keys": [
      "string"
    ],
    "extra_info": {
      "foo": "bar"
    },
    "id_": "id_",
    "metadata_seperator": "metadata_seperator",
    "metadata_template": "metadata_template",
    "mimetype": "mimetype",
    "relationships": {
      "foo": {
        "node_id": "node_id",
        "class_name": "class_name",
        "hash": "hash",
        "metadata": {
          "foo": "bar"
        },
        "node_type": "1"
      }
    },
    "start_char_idx": 0,
    "text": "text",
    "text_template": "text_template"
  }
]
```

## Upsert Batch Pipeline Documents

`List<CloudDocument> pipelines().documents().upsert(DocumentUpsertParams params, RequestOptions requestOptions = RequestOptions.none())`

**put** `/api/v1/pipelines/{pipeline_id}/documents`

Batch create or update documents for a pipeline.

### Parameters

- `DocumentUpsertParams params`

  - `Optional<String> pipelineId`

  - `List<CloudDocumentCreate> body`

    - `Metadata metadata`

    - `String text`

    - `Optional<String> id`

    - `Optional<List<String>> excludedEmbedMetadataKeys`

    - `Optional<List<String>> excludedLlmMetadataKeys`

    - `Optional<List<Long>> pagePositions`

      Indices in `CloudDocument.text` where a new page begins. For example, the second page starts at the index given by `page_positions[1]`.

### Example

```java
package com.llamacloud_prod.api.example;

import com.llamacloud_prod.api.client.LlamaCloudClient;
import com.llamacloud_prod.api.client.okhttp.LlamaCloudOkHttpClient;
import com.llamacloud_prod.api.core.JsonValue;
import com.llamacloud_prod.api.models.pipelines.documents.CloudDocument;
import com.llamacloud_prod.api.models.pipelines.documents.CloudDocumentCreate;
import com.llamacloud_prod.api.models.pipelines.documents.DocumentUpsertParams;
import java.util.List;

public final class Main {
    private Main() {}

    public static void main(String[] args) {
        LlamaCloudClient client = LlamaCloudOkHttpClient.fromEnv();

        DocumentUpsertParams params = DocumentUpsertParams.builder()
            .pipelineId("182bd5e5-6e1a-4fe4-a799-aa6d9a6ab26e")
            .addBody(CloudDocumentCreate.builder()
                .metadata(CloudDocumentCreate.Metadata.builder()
                    .putAdditionalProperty("foo", JsonValue.from("bar"))
                    .build())
                .text("text")
                .build())
            .build();
        List<CloudDocument> cloudDocuments = client.pipelines().documents().upsert(params);
    }
}
```

#### Response

```json
[
  {
    "id": "id",
    "metadata": {
      "foo": "bar"
    },
    "text": "text",
    "excluded_embed_metadata_keys": [
      "string"
    ],
    "excluded_llm_metadata_keys": [
      "string"
    ],
    "page_positions": [
      0
    ],
    "status_metadata": {
      "foo": "bar"
    }
  }
]
```

## Domain Types

### Cloud Document

- `class CloudDocument:`

  Cloud document stored in S3.

  - `String id`

  - `Metadata metadata`

  - `String text`

  - `Optional<List<String>> excludedEmbedMetadataKeys`

  - `Optional<List<String>> excludedLlmMetadataKeys`

  - `Optional<List<Long>> pagePositions`

    Indices in `CloudDocument.text` where a new page begins. For example, the second page starts at the index given by `page_positions[1]`.

  - `Optional<StatusMetadata> statusMetadata`
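To illustrate how `pagePositions` relates to `text` on a returned document, here is a minimal sketch that slices the text back into pages. `PageSplitter` and `splitPages` are hypothetical names, not SDK members. It assumes the positions are sorted in ascending order, lie within the text, and are Java `char` offsets; none of this is confirmed by the reference.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class PageSplitter {
    /**
     * Returns one string per page. Page i runs from pagePositions[i] up to
     * pagePositions[i + 1], and the last page runs to the end of the text.
     */
    static List<String> splitPages(String text, List<Long> pagePositions) {
        if (pagePositions.isEmpty()) {
            // No page information: treat the whole text as a single page.
            return Collections.singletonList(text);
        }
        List<String> pages = new ArrayList<>();
        for (int i = 0; i < pagePositions.size(); i++) {
            int start = Math.toIntExact(pagePositions.get(i));
            int end = (i + 1 < pagePositions.size())
                ? Math.toIntExact(pagePositions.get(i + 1))
                : text.length();
            pages.add(text.substring(start, end));
        }
        return pages;
    }
}
```

For example, `splitPages("abcdef", Arrays.asList(0L, 2L, 5L))` yields the pages `"ab"`, `"cde"`, and `"f"`.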

### Cloud Document Create

- `class CloudDocumentCreate:`

  Create a new cloud document.

  - `Metadata metadata`

  - `String text`

  - `Optional<String> id`

  - `Optional<List<String>> excludedEmbedMetadataKeys`

  - `Optional<List<String>> excludedLlmMetadataKeys`

  - `Optional<List<Long>> pagePositions`

    Indices in `CloudDocument.text` where a new page begins. For example, the second page starts at the index given by `page_positions[1]`.

### Text Node

- `class TextNode:`

  Provided for backward compatibility.

  - `Optional<String> className`

  - `Optional<List<Double>> embedding`

    Embedding of the node.

  - `Optional<Long> endCharIdx`

    End char index of the node.

  - `Optional<List<String>> excludedEmbedMetadataKeys`

    Metadata keys that are excluded from text for the embed model.

  - `Optional<List<String>> excludedLlmMetadataKeys`

    Metadata keys that are excluded from text for the LLM.

  - `Optional<ExtraInfo> extraInfo`

    A flat dictionary of metadata fields.

  - `Optional<String> id`

    Unique ID of the node.

  - `Optional<String> metadataSeperator`

    Separator between metadata fields when converting to string.

  - `Optional<String> metadataTemplate`

    Template for how metadata is formatted, with {key} and {value} placeholders.

  - `Optional<String> mimetype`

    MIME type of the node content.

  - `Optional<Relationships> relationships`

    A mapping of relationships to other node information.

    - `class RelatedNodeInfo:`

      - `String nodeId`

      - `Optional<String> className`

      - `Optional<String> hash`

      - `Optional<Metadata> metadata`

      - `Optional<NodeType> nodeType`

        - `_1("1")`

        - `_2("2")`

        - `_3("3")`

        - `_4("4")`

        - `_5("5")`

    - `List<RelatedNodeInfo>`

      - `String nodeId`

      - `Optional<String> className`

      - `Optional<String> hash`

      - `Optional<Metadata> metadata`

      - `Optional<NodeType> nodeType`

        - `_1("1")`

        - `_2("2")`

        - `_3("3")`

        - `_4("4")`

        - `_5("5")`

  - `Optional<Long> startCharIdx`

    Start char index of the node.

  - `Optional<String> text`

    Text content of the node.

  - `Optional<String> textTemplate`

    Template for how text is formatted, with {content} and {metadata_str} placeholders.
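The template and exclusion fields on `TextNode` describe how metadata and text are combined into a single string. The sketch below shows one plausible reading of that description using plain string substitution. `NodeTextRenderer` and its methods are hypothetical, the template values used in the usage note are illustrative rather than service defaults, and the service's actual rendering may differ.

```java
import java.util.Map;
import java.util.Set;
import java.util.StringJoiner;

final class NodeTextRenderer {
    /**
     * Formats each non-excluded metadata entry with metadataTemplate
     * ({key} and {value} placeholders) and joins the results with separator.
     */
    static String metadataString(
            Map<String, String> metadata,
            Set<String> excludedKeys,
            String metadataTemplate,
            String separator) {
        StringJoiner joined = new StringJoiner(separator);
        for (Map.Entry<String, String> entry : metadata.entrySet()) {
            if (excludedKeys.contains(entry.getKey())) {
                continue; // e.g. keys listed in excludedLlmMetadataKeys
            }
            // Simplification: a key that itself contains "{value}" would also be substituted.
            joined.add(metadataTemplate
                .replace("{key}", entry.getKey())
                .replace("{value}", entry.getValue()));
        }
        return joined.toString();
    }

    /** Fills the {metadata_str} and {content} placeholders of textTemplate. */
    static String render(String content, String metadataStr, String textTemplate) {
        return textTemplate
            .replace("{metadata_str}", metadataStr)
            .replace("{content}", content);
    }
}
```

With a metadata template of `"{key}: {value}"`, a newline separator, and a text template of `"{metadata_str}\n\n{content}"`, the metadata `{title=Report, author=Ada, internal_id=42}` with `internal_id` excluded renders as `title: Report`, `author: Ada`, a blank line, and then the node text.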
