255 changes: 204 additions & 51 deletions _api-reference/document-apis/index-document.md
@@ -11,64 +11,56 @@
**Introduced 1.0**
{: .label .label-purple}

You can use the `Index document` operation to add a single document to your index.
The Index Document API adds a JSON document to a specified index and makes it searchable. If a document with the same ID already exists, the API updates the document and increments its version number.


## Endpoints

```json
PUT <index>/_doc/<_id>
POST <index>/_doc
PUT {index}/_doc/{id}
POST {index}/_doc

PUT <index>/_create/<_id>
POST <index>/_create/<_id>
PUT {index}/_create/{id}
POST {index}/_create/{id}
```

- PUT adds or updates documents in the index with a specified ID. Used for controlled document creation or updates.
- POST adds documents with auto-generated IDs to the index. Useful for adding new documents without specifying IDs.
- `_create` is a type identifier indicating that document creation should only occur if the document with the specified ID doesn't already exist.
- `<index>` represents the name of the index to which the document will be added.
- `<_id>` represents the unique identifier of the document.
Use the following endpoint combinations to control how documents are indexed:

## Adding a sample index

Sample data can be added to the index with curl commands in the terminal or through the API.

To test the Document APIs, add a document by following these steps:
1. Open OpenSearch Dashboards.
2. Navigate to the actions menu.
3. In the **Management** section, choose **Dev Tools**.
4. Enter a command, and then select the green triangle play button to send the request. The following are some example commands.
- `PUT {index}/_doc/{id}`: Adds a new document with a specified ID or updates an existing document with the same ID.
- `POST {index}/_doc`: Adds a new document and automatically generates a unique ID.
- `PUT {index}/_create/{id}` or `POST {index}/_create/{id}`: Adds a new document with a specified ID only if a document with that ID does not already exist. If the document exists, the operation fails.
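
As a rough illustration of the endpoint combinations above, the following Python sketch picks the HTTP method and path for each case. The `build_index_request` helper and its arguments are hypothetical names used only for this example; they are not part of any OpenSearch client.

```python
from urllib.parse import quote


def build_index_request(index, doc_id=None, create_only=False):
    """Return the (method, path) pair for an Index Document request.

    Hypothetical helper for illustration; not part of an OpenSearch client.
    """
    index_part = quote(index, safe="")
    if doc_id is None:
        if create_only:
            # The _create endpoints always take an explicit document ID.
            raise ValueError("_create requires a document ID")
        # No ID supplied: POST lets OpenSearch generate one.
        return "POST", f"/{index_part}/_doc"
    endpoint = "_create" if create_only else "_doc"
    # ID supplied: PUT adds or updates (_doc) or only adds (_create).
    return "PUT", f"/{index_part}/{endpoint}/{quote(doc_id, safe='')}"


print(build_index_request("sample-index", "1"))
print(build_index_request("sample-index"))
print(build_index_request("sample-index", "1", create_only=True))
```

The sketch percent-encodes the index name and ID so that characters such as `/` in an ID do not change the path structure.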


## Path parameters

Parameter | Type | Description | Required
:--- | :--- | :--- | :---
&lt;index&gt; | String | Name of the index. | Yes
&lt;id&gt; | String | A unique identifier to attach to the document. To automatically generate an ID, use `POST <index>/_doc` in your request instead of PUT. | No
The following table lists the available path parameters.

| Parameter | Data type | Description |
| :--- | :--- | :--- |
| `index` | String | The name of the index. If the index does not exist, OpenSearch creates it automatically unless automatic index creation is disabled. Required. |
| `id` | String | The unique document ID. Required when using PUT. Omit this parameter when using POST to let OpenSearch automatically generate a unique ID. |

## Query parameters

In your request, you must specify the index you want to add your document to. If the index doesn't already exist, OpenSearch automatically creates the index and adds in your document. All other parameters are optional.

Parameter | Type | Description | Required
:--- | :--- | :--- | :---
if_seq_no | Integer | Only perform the index operation if the document has the specified sequence number. | No
if_primary_term | Integer | Only perform the index operation if the document has the specified primary term.| No
op_type | Enum | Specifies the type of operation to complete with the document. Valid values are `create` (index a document only if it doesn't exist) and `index`. If a document ID is included in the request, then the default is `index`. Otherwise, the default is `create`. | No
pipeline | String | Route the index operation to a certain pipeline. | No
routing | String | Value used to assign the index operation to a specific shard. | No
refresh | Enum | If true, OpenSearch refreshes shards to make the operation visible to searching. Valid options are `true`, `false`, and `wait_for`, which tells OpenSearch to wait for a refresh before executing the operation. Default is `false`. | No
timeout | Time | How long to wait for a response from the cluster. Default is `1m`. | No
version | Integer | The document's version number. | No
version_type | Enum | Controls external version conflict handling for the index operation. Valid values are `external` (index the document only if the specified version number is greater than the document's current version) and `external_gte` (index the document only if the specified version number is greater than or equal to the document's current version). For example, to index version 3 of a document, use `/_doc/1?version=3&version_type=external`. | No
wait_for_active_shards | String | The number of active shards that must be available before OpenSearch processes the request. Default is 1 (only the primary shard). Set to `all` or a positive integer. Values greater than 1 require replicas. For example, if you specify a value of 3, the index must have two replicas distributed across two additional nodes for the operation to succeed. | No
require_alias | Boolean | Specifies whether the target index must be an index alias. Default is `false`. | No
The following table lists the available query parameters. All query parameters are optional.

| Parameter | Data type | Description |
| :--- | :--- | :--- |
| `if_seq_no` | Integer | Only performs the operation if the document's current sequence number matches the specified value. Used for optimistic concurrency control. See [Optimistic concurrency control](#optimistic-concurrency-control). |
| `if_primary_term` | Integer | Only performs the operation if the document's current primary term matches the specified value. Used for optimistic concurrency control. See [Optimistic concurrency control](#optimistic-concurrency-control). |
| `op_type` | Enum | The operation type. Valid values are `create` (indexes a document only if it does not already exist) and `index` (creates a new document or updates an existing document). If a document ID is specified, the default is `index`. Otherwise, the default is `create`. |
| `pipeline` | String | The ID of the ingest pipeline to use for preprocessing the document before indexing. |
| `routing` | String | A custom routing value used to route the operation to a specific shard. See [Routing](#routing). |
| `refresh` | Enum | Whether to refresh the affected shards after the operation. Valid values are `true` (refresh immediately), `false` (do not refresh), and `wait_for` (wait for a refresh to occur before responding). Default is `false`. See [Refresh](#refresh). |
| `timeout` | Time | The amount of time to wait for the primary shard to become available if it is unavailable. Default is `1m`. See [Timeout](#timeout). |
| `version` | Integer | The explicit version number for concurrency control. The document is only indexed if its current version matches this value. See [Versioning](#versioning). |
| `version_type` | Enum | The version type for external versioning. Valid values are `external` (only indexes if the specified version is greater than the stored version) and `external_gte` (only indexes if the specified version is greater than or equal to the stored version). Default is `internal`. See [Versioning](#versioning). |
| `wait_for_active_shards` | String | The number of active shard copies required before proceeding with the operation. Valid values are `all` or a positive integer up to the total number of shard copies (`number_of_replicas + 1`). Default is `1` (only the primary shard). See [Wait for active shards](#wait-for-active-shards). |
| `require_alias` | Boolean | Whether the target index name must be an index alias. If `true` and the target is not an alias, the request fails. Default is `false`. |

## Example requests

The following example requests create a sample index document for an index named `sample_index`:
The following example requests create a sample index document for an index named `sample_index`.


### Example PUT request
@@ -172,15 +164,176 @@

## Response body fields

Field | Description
:--- | :---
_index | The name of the index.
_id | The document's ID.
_version | The document's version.
result | The result of the index operation.
_shards | Detailed information about the cluster's shards.
total | The total number of shards.
successful | The number of shards OpenSearch successfully added the document to.
failed | The number of shards OpenSearch failed to add the document to.
_seq_no | The sequence number assigned when the document was indexed.
_primary_term | The primary term assigned when the document was indexed.
The following table lists all response body fields.

| Field | Data type | Description |
| :--- | :--- | :--- |
| `_index` | String | The name of the index to which the document was added. |
| `_id` | String | The document's unique identifier. |
| `_version` | Integer | The document's version number. Incremented each time the document is updated. |
| `result` | String | The result of the indexing operation. Possible values are `created` (a new document was created) and `updated` (an existing document was updated). |
| `_shards` | Object | Information about the replication process. |
| `_shards.total` | Integer | The number of shard copies (primary and replicas) on which the operation should be executed. |
| `_shards.successful` | Integer | The number of shard copies on which the operation succeeded. When the operation succeeds, this value is at least 1 (the primary shard). |
| `_shards.failed` | Integer | The number of shard copies on which the operation failed. If the operation succeeds, this value is 0. |
| `_seq_no` | Integer | The sequence number assigned to the document for this indexing operation. Sequence numbers are used to ensure that an older version of a document does not overwrite a newer version. See [Optimistic concurrency control](#optimistic-concurrency-control). |
| `_primary_term` | Integer | The primary term assigned to the document for this indexing operation. See [Optimistic concurrency control](#optimistic-concurrency-control). |


## Automatic index creation

By default, if the specified index does not exist, the Index Document API automatically creates it and applies any configured index templates. The API also creates a dynamic mapping for new fields if no explicit mapping exists.

Automatic index creation is controlled by the `action.auto_create_index` setting. By default, this setting is `true`, allowing any index to be created automatically. You can modify this setting to allow or block index creation based on specific patterns or disable automatic index creation entirely. For more information, see [Create index]({{site.url}}{{site.baseurl}}/api-reference/index-apis/create-index/).

## Optimistic concurrency control

You can use the `if_seq_no` and `if_primary_term` parameters to perform conditional indexing based on the document's current sequence number and primary term. This ensures that the operation only succeeds if the document has not been modified since you last retrieved it.

For example, to update a document only if it has sequence number 3 and primary term 1, include these parameters in your request:

```json
PUT sample-index/_doc/1?if_seq_no=3&if_primary_term=1
{
"name": "Updated Example",
"price": 39.99
}
```

If the sequence number or primary term does not match the current values, OpenSearch returns a version conflict error (HTTP 409), allowing you to retrieve the latest version and retry the operation.
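
The retrieve-and-retry pattern described above can be sketched with a small in-memory model. `ToyIndex`, `VersionConflict`, and `update_price` are hypothetical names invented for this example. The model assumes a single shard and that both conditional parameters are passed together; it only imitates the conditional check and is not how OpenSearch is implemented.

```python
class VersionConflict(Exception):
    """Stands in for the HTTP 409 version conflict response."""


class ToyIndex:
    """In-memory, single-shard stand-in for an index (illustration only)."""

    def __init__(self):
        self.docs = {}          # doc_id -> (source, seq_no, primary_term)
        self.next_seq_no = 0
        self.primary_term = 1

    def get(self, doc_id):
        return self.docs[doc_id]

    def index(self, doc_id, source, if_seq_no=None, if_primary_term=None):
        if (if_seq_no is None) != (if_primary_term is None):
            raise ValueError("pass if_seq_no and if_primary_term together")
        if if_seq_no is not None:
            current = self.docs.get(doc_id)
            # Reject the write if the document changed since it was read.
            if current is None or current[1:] != (if_seq_no, if_primary_term):
                raise VersionConflict(doc_id)
        seq_no = self.next_seq_no
        self.next_seq_no += 1
        self.docs[doc_id] = (dict(source), seq_no, self.primary_term)
        return seq_no, self.primary_term


def update_price(index, doc_id, new_price, max_retries=3):
    """Read-modify-write loop that re-reads and retries after a conflict."""
    for _ in range(max_retries):
        source, seq_no, primary_term = index.get(doc_id)
        updated = {**source, "price": new_price}
        try:
            return index.index(doc_id, updated,
                               if_seq_no=seq_no, if_primary_term=primary_term)
        except VersionConflict:
            continue  # another writer got there first; fetch the latest copy
    raise VersionConflict(doc_id)


idx = ToyIndex()
idx.index("1", {"name": "Example", "price": 29.99})
_, stale_seq_no, stale_term = idx.get("1")
idx.index("1", {"name": "Example", "price": 34.99})  # a concurrent write
try:
    idx.index("1", {"name": "Stale"},
              if_seq_no=stale_seq_no, if_primary_term=stale_term)
except VersionConflict:
    print("conflict: document changed since it was read")
print(update_price(idx, "1", 39.99))
```

In this model, the write that uses the stale sequence number is rejected, while `update_price` succeeds because it reads the current values immediately before writing.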

## Automatic ID generation

When you use the POST method without specifying a document ID, OpenSearch automatically generates a unique ID for the document. The `op_type` is automatically set to `create`, ensuring a new document is always created.

The following example indexes a document without specifying an ID, allowing OpenSearch to generate one automatically:

```json
POST sample-index/_doc
{
"user": "john_doe",
"post_date": "2024-01-15T10:30:00",
"message": "Hello, OpenSearch!"
}
```

The response includes the automatically generated ID:

```json
{
"_index": "sample-index",
"_id": "W0tpsmIBdwcYyG50zbta",
"_version": 1,
"_seq_no": 0,
"_primary_term": 1,
"result": "created",
"_shards": {
"total": 2,
"successful": 1,
"failed": 0
}
}
```

The generated ID is a Base64-encoded UUID that ensures uniqueness across your cluster.
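
A client usually needs to read the generated ID back from the response. The following Python sketch parses the example response shown above; `summarize_index_response` is a hypothetical helper name, and the chosen summary fields are only one possible way to inspect the response.

```python
import json

# The example response shown above, as a client would receive it.
response_body = """
{
  "_index": "sample-index",
  "_id": "W0tpsmIBdwcYyG50zbta",
  "_version": 1,
  "_seq_no": 0,
  "_primary_term": 1,
  "result": "created",
  "_shards": {
    "total": 2,
    "successful": 1,
    "failed": 0
  }
}
"""


def summarize_index_response(body):
    """Pull out the generated ID and the replication counters."""
    response = json.loads(body)
    shards = response["_shards"]
    return {
        "id": response["_id"],
        "created": response["result"] == "created",
        "failed_copies": shards["failed"],
        "all_copies_written": shards["successful"] == shards["total"],
    }


print(summarize_index_response(response_body))
```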

## Routing

By default, OpenSearch determines which shard stores a document by computing a hash of the document's ID. You can override this behavior by providing a custom `routing` parameter value.

The following example routes the document to a shard based on the routing value `user123`:

```json
POST sample-index/_doc?routing=user123
{
"user": "john_doe",
"message": "Hello, world!"
}
```

When you use custom routing during indexing, you must provide the same routing value when retrieving, updating, or deleting the document. Otherwise, OpenSearch cannot locate the document.
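
To see why the same routing value is needed on later requests, consider the following simplified Python model of shard selection. It assumes the shard is chosen as a hash of the routing value modulo the number of primary shards, and it uses `zlib.crc32` purely as a stand-in hash; OpenSearch uses its own hash function and additional index settings, so the shard numbers here are illustrative only. The function names are hypothetical.

```python
import zlib


def pick_shard(routing_value, number_of_primary_shards):
    """Map a routing value to a shard number (simplified model).

    zlib.crc32 is a stand-in hash for this sketch, not the hash OpenSearch uses.
    """
    return zlib.crc32(routing_value.encode("utf-8")) % number_of_primary_shards


def shard_for(doc_id, number_of_primary_shards, routing=None):
    # Without a custom routing value, the document ID is the routing value.
    effective_routing = routing if routing is not None else doc_id
    return pick_shard(effective_routing, number_of_primary_shards)


# Two documents indexed with the same routing value share a shard.
print(shard_for("doc-1", 5, routing="user123"))
print(shard_for("doc-2", 5, routing="user123"))
# A lookup that omits the routing value hashes the ID instead,
# so it may be sent to a different shard than the one holding the document.
print(shard_for("doc-1", 5))
```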

## Distributed model

The index operation is directed to the primary shard based on the document's routing value (either the document ID or a custom routing value). Once the primary shard completes the operation, OpenSearch distributes the update to all applicable replica shards in the replication group.

This distributed approach ensures that all shard copies remain synchronized. The primary shard coordinates the replication process and waits for responses from the in-sync replica copies before acknowledging success to the client.

## Wait for active shards

To improve write operation resiliency, you can configure the Index Document API to wait for a certain number of active shard copies before proceeding. By default, the operation waits only for the primary shard to be active (`wait_for_active_shards=1`).

You can set `wait_for_active_shards` to `all` or any positive integer up to the total number of shard copies (`number_of_replicas + 1`). If the required number of active shards is not available, the operation waits and retries until the shards become available or a timeout occurs.

For example, consider a cluster with three nodes (A, B, and C) and an index with `number_of_replicas` set to 3, resulting in 4 shard copies (one primary and three replicas). By default, an indexing operation proceeds as long as the primary shard is available. For instance, if node A hosts the primary shard copy, the operation proceeds even when nodes B and C are down.

If you set `wait_for_active_shards=3` on the request, the indexing operation requires 3 active shard copies before proceeding. This requirement can be met when all 3 nodes are running, with each node containing a copy of the shard. However, if you set `wait_for_active_shards=all` (or `4`), the indexing operation does not proceed: all 4 copies must be active, but the cluster has only 3 nodes and a node cannot hold more than one copy of the same shard. The operation times out unless a new node joins the cluster to host the fourth shard copy.

The following example requires at least 2 active shard copies (the primary and one replica) before proceeding:

```json
PUT sample-index/_doc/1?wait_for_active_shards=2
{
"name": "Example",
"price": 29.99
}
```

This setting reduces the risk of writing to an insufficient number of shard copies but does not eliminate it entirely. The check occurs before the write operation begins. Once the operation is underway, replication can still fail on some replicas while succeeding on the primary. The `_shards` section of the response indicates how many shard copies succeeded or failed.
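
The arithmetic behind the three-node example above can be checked with a short Python sketch. The function names are hypothetical, and the model covers only the pre-write check described in this section, not replication itself.

```python
def required_active_copies(wait_for_active_shards, number_of_replicas):
    """Translate a wait_for_active_shards value into a number of shard copies."""
    total_copies = number_of_replicas + 1  # one primary plus its replicas
    if wait_for_active_shards == "all":
        return total_copies
    required = int(wait_for_active_shards)
    if not 1 <= required <= total_copies:
        raise ValueError(f"value must be between 1 and {total_copies}, or 'all'")
    return required


def can_proceed(active_copies, wait_for_active_shards, number_of_replicas):
    """Return True if enough shard copies are active for the write to start."""
    needed = required_active_copies(wait_for_active_shards, number_of_replicas)
    return active_copies >= needed


# Three nodes, number_of_replicas=3: at most 3 of the 4 copies can be active.
print(can_proceed(active_copies=3, wait_for_active_shards=3, number_of_replicas=3))
print(can_proceed(active_copies=3, wait_for_active_shards="all", number_of_replicas=3))
# Only node A is up and holds the primary: the default of 1 is still satisfied.
print(can_proceed(active_copies=1, wait_for_active_shards=1, number_of_replicas=3))
```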

## Refresh

The `refresh` parameter controls when indexed documents become visible to search operations. For most use cases, use the default value (`false`) for optimal performance.

Valid options are:

- `false` (default): The document becomes visible according to the index refresh interval (by default, 1 second).
- `true`: Forces an immediate refresh after indexing, making the document immediately searchable. Use sparingly, as frequent refreshes can significantly impact performance.
- `wait_for`: Waits for the next scheduled refresh before responding. More efficient than `true` for batch operations.

## Timeout

If the primary shard is unavailable when you submit an index request (for example, during recovery or relocation), the operation waits for up to 1 minute by default before failing. You can adjust this behavior using the `timeout` parameter:

```json
PUT sample-index/_doc/1?timeout=5m
{
"name": "Example",
"price": 29.99
}
```

## Versioning

Every indexed document has a version number. By default, OpenSearch uses internal versioning, starting at 1 and incrementing with each update or delete operation.

For external versioning (such as maintaining version numbers in a separate database), set the `version_type` parameter to control how OpenSearch handles version conflicts. The following table lists the available version types.

| Version type | Description |
| :--- | :--- |
| `internal` | Only indexes the document if the specified version is identical to the version of the stored document. This is the default version type. |
| `external` or `external_gt` | Only indexes the document if the specified version is strictly greater than the version of the stored document or if there is no existing document. The specified version is used as the new version and stored with the document. The supplied version must be a non-negative long integer. |
| `external_gte` | Only indexes the document if the specified version is greater than or equal to the version of the stored document. If there is no existing document, the operation succeeds. The specified version is used as the new version and stored with the document. The supplied version must be a non-negative long integer. |

The `external_gte` version type is intended for special use cases and should be used with care. If used incorrectly, it can result in data loss.

For example, to index a document using external versioning:

```json
PUT sample-index/_doc/1?version=5&version_type=external
{
"name": "Example",
"price": 29.99,
"description": "Updated from external system"
}
```

If the provided version does not meet the requirements of the specified version type, OpenSearch returns a version conflict error. Versioning is completely real time and is not affected by the near-real-time aspects of search operations.
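
The rules in the version type table can be summarized as a small decision function. The following Python sketch is a model of those rules only; `version_allows_write` is a hypothetical name, and treating a missing document as a conflict for `internal` is an assumption of this sketch because the table does not cover that case.

```python
def version_allows_write(version_type, specified, stored):
    """Return True if a write with the specified version would be accepted.

    `stored` is the version of the existing document, or None if there is none.
    """
    if version_type == "internal":
        # Assumption: with no stored document, there is nothing to match.
        return stored is not None and specified == stored
    if version_type in ("external", "external_gt"):
        return stored is None or specified > stored
    if version_type == "external_gte":
        return stored is None or specified >= stored
    raise ValueError(f"unknown version_type: {version_type}")


# Stored version is 4; the request from the example above supplies version=5.
print(version_allows_write("external", 5, 4))
# Re-sending the same version is rejected by external but accepted by external_gte.
print(version_allows_write("external", 5, 5))
print(version_allows_write("external_gte", 5, 5))
```

The last two calls show why `external_gte` needs care: a write with an equal version is accepted and replaces the stored document, even though the version number does not change.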

## Noop updates

Check failure on line 333 in _api-reference/document-apis/index-document.md (GitHub Actions / vale): [OpenSearch.Spelling] Error: Noop. If you are referencing a setting, variable, format, function, or repository, surround it with tic marks.

When you update a document using the Index Document API, OpenSearch always creates a new version of the document, even if the document content has not changed. This behavior can be inefficient if you frequently reindex documents with the same content.

If you need to avoid creating unnecessary document versions, use the [Update Document API]({{site.url}}{{site.baseurl}}/api-reference/document-apis/update-document/) with the `detect_noop` parameter set to `true`. The Update API fetches the existing document, compares it to the new content, and only creates a new version if the content has changed.

The Index Document API does not support noop detection because it does not fetch the old source for comparison. Whether noop updates are problematic depends on several factors, including how frequently your data source sends updates that do not change the document and the query load on the shard receiving the updates.
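
The difference between the two behaviors can be illustrated with a toy in-memory model in Python. Both functions are hypothetical stand-ins that operate on a plain dictionary; they imitate only the versioning outcome described in this section, not the actual implementation of either API.

```python
def index_document(store, doc_id, source):
    """Model of the Index Document API: every call writes a new version."""
    _, version = store.get(doc_id, (None, 0))
    store[doc_id] = (dict(source), version + 1)
    return "created" if version == 0 else "updated"


def update_document(store, doc_id, partial, detect_noop=True):
    """Model of an update: merge, compare, and skip the write if nothing changed."""
    current, version = store[doc_id]
    merged = {**current, **partial}
    if detect_noop and merged == current:
        return "noop"
    store[doc_id] = (merged, version + 1)
    return "updated"


store = {}
print(index_document(store, "1", {"name": "Example"}), store["1"][1])
# Reindexing identical content still bumps the version.
print(index_document(store, "1", {"name": "Example"}), store["1"][1])
# The update path detects that nothing changed and leaves the version alone.
print(update_document(store, "1", {"name": "Example"}), store["1"][1])
```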

Check failure on line 339 in _api-reference/document-apis/index-document.md (GitHub Actions / vale): [OpenSearch.Spelling] Error: noop. If you are referencing a setting, variable, format, function, or repository, surround it with tic marks.