Add spec.shards (named shards) API for MongoDB sharded clusters#1014

Draft
lsierant wants to merge 4 commits into master from lsierant/named-shards

@lsierant lsierant commented Apr 23, 2026

Summary

Introduces spec.shards: [{shardName, shardId?}] as an alternative to spec.shardCount for declaring shards with stable, explicit identities. The two forms are mutually exclusive and collapsed internally via a new ResolvedShards() helper, so existing shardCount-based clusters are unaffected and no YAML change is required after upgrade.
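A minimal sketch of how the two forms could collapse into one list, for readers who want to see the idea in code. Only `ResolvedShards`, `shardName`, `shardId`, and the implicit `<mdb-name>-<i>` naming come from this PR; the `Spec` and `Shard` types, the lowercase function names, and the choice to fill in `ShardId` on the legacy path are assumptions made for illustration and are not the operator's actual types.

```go
package main

import "fmt"

// Shard mirrors one spec.shards entry; ShardId is optional.
// Hypothetical type, not the operator's real API struct.
type Shard struct {
	ShardName string
	ShardId   string
}

// Spec is a trimmed-down, hypothetical stand-in for the sharded-cluster spec.
// The webhook is assumed to have already rejected specs that set both fields.
type Spec struct {
	ShardCount int
	Shards     []Shard
}

// synthesizedShardName returns the implicit legacy name "<mdb-name>-<i>".
func synthesizedShardName(mdbName string, i int) string {
	return fmt.Sprintf("%s-%d", mdbName, i)
}

// resolvedShards collapses spec.shardCount and spec.shards into one list.
func resolvedShards(mdbName string, spec Spec) []Shard {
	if len(spec.Shards) > 0 {
		out := make([]Shard, len(spec.Shards))
		for i, s := range spec.Shards {
			if s.ShardId == "" {
				s.ShardId = s.ShardName // shardId defaults to shardName
			}
			out[i] = s
		}
		return out
	}
	// Legacy path: synthesize names from the count.
	// Assumption: the sketch also fills ShardId so both paths compare equal.
	out := make([]Shard, spec.ShardCount)
	for i := range out {
		name := synthesizedShardName(mdbName, i)
		out[i] = Shard{ShardName: name, ShardId: name}
	}
	return out
}

func main() {
	legacy := resolvedShards("mdbs", Spec{ShardCount: 2})
	named := resolvedShards("mdbs", Spec{Shards: []Shard{
		{ShardName: "mdbs-0"}, {ShardName: "mdbs-1"},
	}})
	fmt.Println(legacy)
	fmt.Println(named)
}
```

With identity-preserving names, both calls in `main` yield the same resolved list, which is the property the reconcile tests rely on.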

Webhook validation guards migrations: going from spec.shardCount to spec.shards must preserve identity (each shardName at position i equals the previously implicit <mdb-name>-<i>) or the update is rejected. Subsequent shards -> shards updates enforce shardId immutability per shardName. The deprecated spec.shardSpecificPodSpec is forbidden in named mode.
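The shardCount -> shards migration rule can be sketched roughly as below. The function name and error wording are hypothetical, and the sketch assumes that every previously implicit shard must still be present at its original position; the actual webhook may treat shorter lists differently.

```go
package main

import "fmt"

// Shard mirrors one spec.shards entry (hypothetical type for this sketch).
type Shard struct {
	ShardName string
	ShardId   string
}

// validateMigrationFromShardCount checks that position i of the new
// spec.shards list carries the previously implicit name "<mdb-name>-<i>".
// Both a typo and a reorder break that positional match and are rejected.
func validateMigrationFromShardCount(mdbName string, oldShardCount int, newShards []Shard) error {
	for i := 0; i < oldShardCount; i++ {
		want := fmt.Sprintf("%s-%d", mdbName, i)
		if i >= len(newShards) {
			// Assumption: dropping a shard during the migration is not allowed.
			return fmt.Errorf("spec.shards[%d] is missing, expected shardName %q", i, want)
		}
		if got := newShards[i].ShardName; got != want {
			return fmt.Errorf("spec.shards[%d].shardName is %q, expected %q", i, got, want)
		}
	}
	return nil
}

func main() {
	preserved := []Shard{{ShardName: "mdbs-0"}, {ShardName: "mdbs-1"}}
	reordered := []Shard{{ShardName: "mdbs-1"}, {ShardName: "mdbs-0"}}
	typo := []Shard{{ShardName: "mdbs-0"}, {ShardName: "mdb-1"}}

	fmt.Println(validateMigrationFromShardCount("mdbs", 2, preserved))
	fmt.Println(validateMigrationFromShardCount("mdbs", 2, reordered))
	fmt.Println(validateMigrationFromShardCount("mdbs", 2, typo))
}
```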

Opens the door to two follow-up features not implemented here: VM-to-k8s migration (via shardId != shardName) and shard-removal-by-name (drain state machine).

Proof of Work

  • api/v1/mdb/named_shards_validation_test.go — validation cases: mutex, DNS-1123, uniqueness, immutability, migration-typo rejection, reorder rejection
  • controllers/operator/mongodbshardedcluster_controller_named_shards_test.go — full-reconcile tests against the fake client + mocked OM proving the core invariant: flipping shardCount -> spec.shards with identity-preserving names produces byte-identical StatefulSet specs and OM sharded-cluster configuration
  • docker/mongodb-kubernetes-tests/tests/shardedcluster/sharded_cluster_named_shards.py — e2e task e2e_sharded_cluster_named_shards covering: create with shardCount, scale up, rejected migrations (typo + reorder), fixed identity-preserving migration (asserts STS .metadata.generation and AC version unchanged), appending a custom-named shard, removing an index-based shard

Checklist

  • Have you linked a Jira ticket and/or is the ticket in the title?
  • Have you checked whether your Jira ticket requires DOCSP changes?
  • Have you added a changelog file?
    • skip-changelog label applied — design/feature under investigation, changelog will be added before merge

Introduces `spec.shards: [{shardName, shardId?}]` as an alternative to
`spec.shardCount` for declaring shards with stable, explicit identities.
The two forms are mutually exclusive and collapsed internally via a new
`ResolvedShards()` helper, so existing shardCount-based clusters are
unaffected and no YAML change is required after upgrade.

Webhook validation guards migrations: going from spec.shardCount to
spec.shards must preserve identity (each shardName at position i equals
the previously implicit "<mdb-name>-<i>") or the update is rejected.
Subsequent shards -> shards updates enforce shardId immutability per
shardName. The deprecated spec.shardSpecificPodSpec is forbidden in
named mode.

Testing:
 - validation unit tests for mutex, DNS-1123, uniqueness, immutability,
   migration-typo rejection, and reorder rejection
 - full-reconcile tests against the fake client + mocked OM proving
   that flipping from shardCount to spec.shards with identity-
   preserving names does not change any StatefulSet spec or the OM
   sharded-cluster configuration (the core safety invariant)
 - e2e test `e2e_sharded_cluster_named_shards` covering: create with
   shardCount, scale up, rejected migrations (typo + reorder), fixed
   identity-preserving migration (asserts STS generation and AC version
   unchanged), appending a custom-named shard, and removing an
   index-based shard
@lsierant lsierant added the skip-changelog label (Use this label in Pull Request to not require new changelog entry file) Apr 23, 2026
@github-actions

⚠️ (this preview might not be accurate if the PR is not rebased on current master branch)

MCK 1.8.1 Release Notes

…rsistent shard state, OM shardId plumbing

- Rewrite removeUnusedStatefulsets to diff deployed shards vs desired by name.
  Tail-based removal deleted the wrong STS when removing a middle named
  shard; it now iterates names that disappeared and deletes each by
  shard-name + member-cluster. Trigger widened from count comparison to
  hasShardsToRemove so same-count swaps also run the cleanup.
- Persist ShardStateEntry list in ShardedClusterDeploymentState as the
  source of truth for deployed shard names across reconciles. Falls back
  to LastAchievedSpec.Shards and then to synthesised names from
  Status.ShardCount so legacy state written by older operator versions
  continues to produce byte-identical results.
- Split OM newShard into (id, rsName) and thread ShardIds through
  DeploymentShardedClusterMergeOptions so resolved ShardId lands in the
  automation config _id while the replica-set name remains the STS name.
  Legacy spec.shardCount path passes nil IDs — behaviour unchanged.
- Export SynthesizedShardName from api/v1/mdb for reconciler use.
- New unit test TestNamedShards_RemoveMiddleShardDeletesCorrectSts proves
  the fix; TestMigrateToNewDeploymentState updated for the new
  `shards` key in persisted state JSON.
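The name-based diff described in the first bullet can be illustrated with a small sketch. `hasShardsToRemove` is named in this commit; `shardsToRemove` and the plain string slices are assumptions for illustration, and the real `removeUnusedStatefulsets` additionally iterates member clusters and deletes the StatefulSets.

```go
package main

import "fmt"

// shardsToRemove returns the deployed shard names that no longer appear in
// the desired list, in deployment order. Unlike tail-based removal, this
// finds a dropped middle shard and also detects same-count swaps.
func shardsToRemove(deployed, desired []string) []string {
	keep := make(map[string]struct{}, len(desired))
	for _, name := range desired {
		keep[name] = struct{}{}
	}
	var removed []string
	for _, name := range deployed {
		if _, ok := keep[name]; !ok {
			removed = append(removed, name)
		}
	}
	return removed
}

// hasShardsToRemove is the widened trigger: it compares names, not counts.
func hasShardsToRemove(deployed, desired []string) bool {
	return len(shardsToRemove(deployed, desired)) > 0
}

func main() {
	deployed := []string{"mdbs-0", "mdbs-1", "mdbs-2", "extra-shard-alpha"}
	desired := []string{"mdbs-0", "mdbs-2", "extra-shard-alpha"}
	fmt.Println(shardsToRemove(deployed, desired)) // prints [mdbs-1]

	// Same count, different names: a count comparison would miss this.
	fmt.Println(hasShardsToRemove([]string{"a", "b"}, []string{"a", "c"})) // prints true
}
```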
…producer

The e2e test TestRemoveIndexBasedShard.test_remove_middle_shard failed on
ex=0 because TestCreateWithShardCount.shard_collection(shards_count=2) had
previously assigned zone-0/zone-1 with pinned chunk ranges to the first
two shards. MongoDB then blocks removeShard sh-named-shards-1 with
ZoneStillInUse ("only shard for zone zone-1 which has a chunk range"),
so the mongos agent can never acknowledge the new automation-config
version and the operator sits in Pending until the 1400s timeout.

- e2e: call `mongod_tester.prepare_for_shard_removal(...)` before
  updating spec.shards to drop sh-named-shards-1 — this flattens zone
  membership across the remaining shards so chunks in zone-1 can
  migrate off.
- unit: add TestNamedShards_RemoveMiddleShardDoesNotCreateSpuriousSts,
  the focused reproducer for the exact e2e scenario. It starts from
  spec.shards=[A,B,C,extra-shard-alpha] (with a custom-named shard so
  the synthesised tail name would no longer coincide with any real
  deployed shard), drops the middle, and asserts:
    * no spurious "mdbs-3" STS is ever created
    * the correct STS (mdbs-1) is deleted
    * OM shards[] array is exactly the desired set
    * no mdbs-1 or mdbs-3 processes remain after finalize
The existing shardIdentityImmutable validator caught shardId rewrites
(same shardName, different shardId) but not renames (same shardId,
different shardName). Case 2 now builds a map by shardId as well and
rejects a new shard whose shardId matches an old entry with a different
shardName, with an explicit "in-place renames are not supported" message.

Tests cover both the explicit-shardId rename case and the implicit case
(old side defaults shardId from shardName, new side pins the old shardName
as shardId while the shardName itself changes).
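A rough sketch of the two checks, assuming one lookup map per direction. The type, the function names, and the first error message are hypothetical; only the rule itself, the shardId-defaults-to-shardName behaviour, and the "in-place renames are not supported" wording come from the commit message above.

```go
package main

import "fmt"

// Shard mirrors one spec.shards entry (hypothetical type for this sketch).
type Shard struct {
	ShardName string
	ShardId   string
}

// effectiveShardId applies the default: an empty shardId means shardName.
func effectiveShardId(s Shard) string {
	if s.ShardId != "" {
		return s.ShardId
	}
	return s.ShardName
}

// validateShardIdentityImmutable rejects shardId rewrites (same shardName,
// different shardId) and renames (same shardId, different shardName).
func validateShardIdentityImmutable(oldShards, newShards []Shard) error {
	idByName := make(map[string]string, len(oldShards))
	nameByID := make(map[string]string, len(oldShards))
	for _, s := range oldShards {
		id := effectiveShardId(s)
		idByName[s.ShardName] = id
		nameByID[id] = s.ShardName
	}
	for _, s := range newShards {
		id := effectiveShardId(s)
		if oldID, ok := idByName[s.ShardName]; ok && oldID != id {
			return fmt.Errorf("shard %q: shardId is immutable (was %q, now %q)", s.ShardName, oldID, id)
		}
		if oldName, ok := nameByID[id]; ok && oldName != s.ShardName {
			return fmt.Errorf("shardId %q moved from shardName %q to %q: in-place renames are not supported", id, oldName, s.ShardName)
		}
	}
	return nil
}

func main() {
	// Implicit rename: the old side defaults shardId to "a"; the new side
	// pins "a" as shardId while changing the shardName to "b".
	oldShards := []Shard{{ShardName: "a"}}
	newShards := []Shard{{ShardName: "b", ShardId: "a"}}
	fmt.Println(validateShardIdentityImmutable(oldShards, newShards))
}
```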