
Document --max-shard-size-bytes support for shards larger than 80 GiB #12204

Open

sumobrian wants to merge 1 commit into opensearch-project:main from sumobrian:doc/max-shard-size-bytes

Conversation

@sumobrian
Member

Description

Documents how to configure RFS to migrate shards larger than the default 80 GiB limit using the --max-shard-size-bytes flag.

Changes

  • is-migration-assistant-right-for-you.md: Simplified the 80 GiB bullet to note that larger shards can be configured, with a link to the configuration options page for details.
  • configuration-options.md: Added a "Configuring large shard support" subsection under the RFS backfill config section, with a reindexFromSnapshotExtraArgs example and disk space guidance.
  • backfill.md: Added a Troubleshooting section with a "Shards appear stuck with no errors" entry describing the symptom and linking to the configuration fix.

Context

Users migrating clusters with shards exceeding 80 GiB encounter a situation in which the shard appears stuck indefinitely, with no errors in the backfill status output. The RFS worker silently rejects the shard because it exceeds the default --max-shard-size-bytes limit (80 GiB), and the work item is repeatedly acquired and failed without progress. This documentation update makes the configuration path discoverable.

Issues Resolved

N/A

Check List

  • New functionality includes testing
  • New functionality has been documented
  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@github-actions

github-actions bot commented Apr 7, 2026

Thank you for submitting your PR. The PR states are In progress (or Draft) -> Tech review -> Doc review -> Merged.

Before you submit your PR for doc review, make sure the content is technically accurate. If you need help finding a tech reviewer, tag a maintainer.

When you're ready for doc review, tag the assignee of this PR. The doc reviewer may push edits to the PR directly or leave comments and editorial suggestions for you to address (let us know in a comment if you have a preference).

- `include_global_state: true` – Ensures that global cluster state is included.
- `compress: false` – Disables metadata compression, which is required for compatibility with RFS.
- Shards of up to **80 GiB** are supported by default. Larger shard sizes can be configured, **except in AWS GovCloud (US)**, where 80 GiB is the maximum.
- Shards of up to **80 GiB** are supported by default. Larger shard sizes can be configured. For details, see [Backfill migration using RFS]({{site.url}}{{site.baseurl}}/migration-assistant/migration-phases/deploy/configuration-options/#backfill-migration-using-rfs). **In AWS GovCloud (US)**, 80 GiB is the maximum supported shard size.
Contributor


How about: "In AWS GovCloud (US) with the ECS deployment, 80 GiB is the maximum supported shard size"?

Member Author


This is a good callout, but I don’t want to fragment the documentation down to individual leaf nodes. Instead, the documentation should clearly direct users: for ECS, go here; for EKS, go here.

Contributor


The cross link anchor should change to #configuring-large-shard-support

By default, RFS supports shards of up to **80 GiB**. To migrate larger shards, pass the `--max-shard-size-bytes` flag through `reindexFromSnapshotExtraArgs`. For example, to support shards up to 200 GiB:

```json
"reindexFromSnapshotExtraArgs": "--max-shard-size-bytes 200000000000"
```

Contributor

The RFS code uses binary GiB internally (`80 * 1024 * 1024 * 1024L`), so the flag value should also use binary:

```json
"reindexFromSnapshotExtraArgs": "--max-shard-size-bytes 214748364800"
```

Ensure that your worker nodes have sufficient local disk space, because RFS requires approximately **2x the shard size** in local storage to unpack and process the Lucene index. For more information about available RFS arguments, see the [DocumentsFromSnapshotMigration README](https://github.com/opensearch-project/opensearch-migrations/blob/main/DocumentsFromSnapshotMigration/README.md#arguments).
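The binary arithmetic behind these flag values, and the 2x disk-space rule of thumb, can be sanity-checked with a short Python sketch (the helper names here are illustrative, not part of RFS):

```python
# Convert a limit in binary gibibytes (GiB) to the byte value expected by
# --max-shard-size-bytes, and estimate the local scratch space RFS needs.
GIB = 1024 ** 3  # 1 GiB = 1,073,741,824 bytes

def max_shard_size_bytes(gib: int) -> int:
    """Flag value for a shard-size limit of `gib` binary GiB."""
    return gib * GIB

def required_local_disk_bytes(shard_size_bytes: int) -> int:
    """RFS needs roughly 2x the shard size locally to unpack the Lucene index."""
    return 2 * shard_size_bytes

print(max_shard_size_bytes(80))    # the 80 GiB default: 85899345920
print(max_shard_size_bytes(200))   # a 200 GiB limit: 214748364800
```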

Contributor


I think we are missing a {% include copy.html %} block here


### Shards appear stuck with no errors

If `console backfill status --deep-check` shows shards that remain in progress indefinitely with no errors in the logs, the shard may exceed the default **80 GiB** size limit. Shards larger than this limit are silently rejected by RFS workers and will never complete. To resolve this, increase the `--max-shard-size-bytes` value in your deployment configuration. For details, see [Configuring large shard support]({{site.url}}{{site.baseurl}}/migration-assistant/migration-phases/deploy/configuration-options/#configuring-large-shard-support).
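The failure mode comes down to a single size comparison; a minimal sketch of the behavior (hypothetical names, not RFS internals):

```python
DEFAULT_MAX_SHARD_SIZE_BYTES = 80 * 1024 ** 3  # RFS default limit: 80 GiB (binary)

def shard_accepted(shard_size_bytes: int,
                   limit_bytes: int = DEFAULT_MAX_SHARD_SIZE_BYTES) -> bool:
    """An oversized shard is skipped by the worker, so its work item never
    completes and the deep-check status shows it stuck in progress."""
    return shard_size_bytes <= limit_bytes

GIB = 1024 ** 3
print(shard_accepted(100 * GIB))                         # False: stuck under the default
print(shard_accepted(100 * GIB, limit_bytes=200 * GIB))  # True: completes once the limit is raised
```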
Contributor


"silently rejected" seems too harsh here. How about: "are skipped by RFS workers without surfacing an error in the backfill status output"?

- Update is-migration-assistant-right-for-you.md to note larger shards can be configured and link to configuration options
- Add 'Configuring large shard support' section to configuration-options.md with --max-shard-size-bytes usage and disk space requirements
- Add troubleshooting entry to backfill.md for shards that appear stuck due to exceeding the default size limit

Signed-off-by: Brian Presley <bjpres@amazon.com>
@sumobrian sumobrian force-pushed the doc/max-shard-size-bytes branch from 91053e8 to 6f3d00b on April 8, 2026 at 21:12

Labels

backport 3.6 Tech review PR: Tech review in progress



3 participants