[BUG] Scale down failure leaves leaked temporary cluster block (id=20), index permanently write-blocked #21188

@HUSTERGS

Description

Describe the bug

When performing a scale-down operation via prepareScaleSearchOnly(index, true), the operation adds a temporary index cluster block (id=20, 'preparing to scale down'). If the subsequent shard sync phase fails (for example, due to a transport/network failure, a node-side rejection, or an exception on the data node), the temporary block is never cleaned up, leaving the index write-blocked but not search-only. This makes the index unusable for normal writes, and the block persists until the entire cluster is restarted or a manual state-repair routine is applied. Index settings updates (including index.blocks.search_only and index.blocks.read_only) cannot clear the temporary block, since it has a different UUID than the setting-based blocks.

Related component

Search

To Reproduce

  1. Create an index with segment replication and remote store enabled, plus at least one search replica.
  2. Ensure cluster is green.
  3. Set up a test (e.g., using MockTransportService) to intercept and force the TransportScaleIndexAction.NAME transport request on data node(s) to throw an exception.
  4. Trigger scale down: client().admin().indices().prepareScaleSearchOnly(index, true).get()
  5. Observe that the operation fails as expected (shard sync phase). Examine the cluster state:
    • clusterState.blocks().hasIndexBlockWithId(index, 20) returns true: The block lingers.
    • The index is not marked as search_only in settings (search_only setting is false).
    • Normal writes are now blocked (ClusterBlockException with block id 20).
  6. Try updating index.blocks.search_only or index.blocks.read_only to false; the block remains.
  7. Only a full cluster restart removes the temporary block from state.
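The failure sequence above can be sketched as a minimal, dependency-free model. This is illustrative only, not OpenSearch code: the class, the `shardSync` method, and the plain `Set` standing in for cluster-state blocks are all simplified stand-ins.

```java
import java.util.HashSet;
import java.util.Set;

// Minimal model of the failure path (NOT OpenSearch code): the scale-down
// flow adds a temporary block (id=20), the shard sync step throws, and
// nothing on the failure path removes the block again.
public class LeakedBlockDemo {
    static final int SCALE_DOWN_BLOCK_ID = 20; // 'preparing to scale down'

    final Set<Integer> indexBlocks = new HashSet<>();
    boolean searchOnly = false;

    void scaleDownWithoutRollback() {
        indexBlocks.add(SCALE_DOWN_BLOCK_ID);        // step 1: temporary block
        try {
            shardSync();                             // step 2: fails
            searchOnly = true;                       // never reached
            indexBlocks.remove(SCALE_DOWN_BLOCK_ID); // never reached
        } catch (RuntimeException e) {
            // bug: no cleanup on failure, so the block leaks
        }
    }

    void shardSync() {
        throw new RuntimeException("simulated transport failure");
    }

    public static void main(String[] args) {
        LeakedBlockDemo cluster = new LeakedBlockDemo();
        cluster.scaleDownWithoutRollback();
        // Mirrors the observed cluster state after step 5 above.
        System.out.println("block 20 leaked: " + cluster.indexBlocks.contains(SCALE_DOWN_BLOCK_ID));
        System.out.println("search_only: " + cluster.searchOnly);
    }
}
```

Running the model ends in the same stuck state the reproduction observes: the block is present, yet the index was never switched to search-only.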

Expected behavior

If scale down fails (e.g., due to shard sync failure), the temporary block should be removed and the index should return to normal operation. Write operations should be allowed, and normal state/setting updates should work. There should never be a persistent write-block with no way to recover except restart.

Additional Details

OpenSearch version: 2.x through latest main.

Plugins:
None required to reproduce.

Impact:

  • If scale down fails, the index becomes permanently write-blocked because a temporary block is leaked.
  • Even index.blocks.read_only and index.blocks.search_only cannot clear it, since the UUID is different.
  • Only a cluster restart (or dirty state update tool) will clear the block.
  • This can be reproduced deterministically in IT using MockTransportService's addRequestHandlingBehavior to throw on TransportScaleIndexAction.NAME.

Potential fix:

The scale-down flow should guarantee that the temporary block is removed whenever the shard sync fails, e.g., by rolling back in the onFailure callback of the listener passed from proceedWithScaleDown.
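A sketch of that rollback pattern, under stated assumptions: `ActionListener` here is a stand-in interface, and `proceedWithScaleDown`/`shardSync` are simplified placeholders for the real methods in TransportScaleIndexAction, not their actual signatures. The point is only that the caller's listener is wrapped so onFailure always removes the temporary block before propagating the error.

```java
import java.util.HashSet;
import java.util.Set;

// Illustrative rollback sketch (NOT the actual OpenSearch API):
// pair the block-adding step with guaranteed cleanup on the failure path.
public class ScaleDownRollbackSketch {
    static final int SCALE_DOWN_BLOCK_ID = 20;

    interface ActionListener<T> {
        void onResponse(T result);
        void onFailure(Exception e);
    }

    final Set<Integer> indexBlocks = new HashSet<>();
    boolean searchOnly = false;

    void proceedWithScaleDown(ActionListener<Void> listener) {
        indexBlocks.add(SCALE_DOWN_BLOCK_ID); // temporary block for the sync phase
        ActionListener<Void> withRollback = new ActionListener<Void>() {
            @Override public void onResponse(Void r) {
                searchOnly = true;                       // finalize: flip the setting
                indexBlocks.remove(SCALE_DOWN_BLOCK_ID); // temp block no longer needed
                listener.onResponse(r);
            }
            @Override public void onFailure(Exception e) {
                indexBlocks.remove(SCALE_DOWN_BLOCK_ID); // rollback: drop the temp block
                listener.onFailure(e);
            }
        };
        shardSync(withRollback);
    }

    void shardSync(ActionListener<Void> listener) {
        // Simulate the shard sync phase failing, as in the reproduction.
        listener.onFailure(new RuntimeException("simulated shard sync failure"));
    }
}
```

With the wrapper in place, a failed sync leaves no block behind and the index stays writable, while a successful sync still finalizes the search-only transition.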

See the test example in this report; it relates to code in TransportScaleIndexAction, AddBlockClusterStateUpdateTask, and ScaleIndexShardSyncManager.

    Labels

Search (search query, autocomplete, etc.), bug (something isn't working), untriaged