feat: topology-aware drift disruption ordering via DriftPolicy #2963

Open
BrunoChauvet wants to merge 17 commits into kubernetes-sigs:main from BrunoChauvet:feat/drift-topology-policy

@BrunoChauvet BrunoChauvet commented Apr 11, 2026

Description

When many nodes are drifted at once, Karpenter fans out disruption across all availability zones simultaneously. This is fast, but it can reduce availability for workloads that are sensitive to how replacements roll across failure domains.

For instance, stateful workloads that replicate data across topology domains may be impacted if nodes holding the same partition in distinct domains are disrupted concurrently.

This PR adds an opt-in DriftPolicy field to NodePool.Spec.Disruption that lets cluster operators control how drift disruption proceeds across topology domains. When topologyKey is set, Karpenter disrupts one domain at a time — finishing replacements in us-east-1a before moving to us-east-1b, for example — rather than spreading disruption across all zones at once. An optional maxConcurrentPerDomain field (absolute or percentage) controls how many nodes can be replaced simultaneously within the active domain.

The active domain is selected alphabetically, which gives stable and predictable behaviour across controller restarts and as replacements progress. Nodes that do not carry the topology label are unaffected and remain eligible for disruption regardless. Without DriftPolicy set, behaviour is unchanged from today.
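The selection rule described above can be sketched in a few lines of Go. This is only an illustration under stated assumptions: the `Node` type and the `partitionByActiveDomain` function are hypothetical names, not code from this PR. The sketch groups drifted nodes by the value of the topology label, picks the alphabetically first domain, and sets aside nodes without the label, which stay eligible regardless.

```go
package main

import (
	"fmt"
	"sort"
)

// Node is a hypothetical, simplified stand-in for a drifted candidate.
type Node struct {
	Name   string
	Labels map[string]string
}

// partitionByActiveDomain returns the alphabetically first domain among the
// drifted nodes that carry topologyKey, the nodes in that domain, and the
// nodes that lack the label (these remain eligible regardless of the policy).
func partitionByActiveDomain(drifted []Node, topologyKey string) (active string, inDomain, unlabelled []Node) {
	byDomain := map[string][]Node{}
	for _, n := range drifted {
		d, ok := n.Labels[topologyKey]
		if !ok {
			unlabelled = append(unlabelled, n)
			continue
		}
		byDomain[d] = append(byDomain[d], n)
	}
	if len(byDomain) == 0 {
		return "", nil, unlabelled
	}
	domains := make([]string, 0, len(byDomain))
	for d := range byDomain {
		domains = append(domains, d)
	}
	sort.Strings(domains) // alphabetical order keeps the choice stable across restarts
	active = domains[0]
	return active, byDomain[active], unlabelled
}

func main() {
	const zone = "topology.kubernetes.io/zone"
	drifted := []Node{
		{Name: "n1", Labels: map[string]string{zone: "us-east-1b"}},
		{Name: "n2", Labels: map[string]string{zone: "us-east-1a"}},
		{Name: "n3", Labels: map[string]string{}},
	}
	active, inDomain, unlabelled := partitionByActiveDomain(drifted, zone)
	fmt.Println(active, len(inDomain), len(unlabelled))
}
```

Because the choice depends only on the set of domains that still have drifted nodes, the same domain keeps being selected until it has none left.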

# Disrupt one zone at a time, one node at a time
disruption:
  driftPolicy:
    topologyKey: topology.kubernetes.io/zone

# Disrupt one zone at a time, up to 3 nodes at once
disruption:
  driftPolicy:
    topologyKey: topology.kubernetes.io/zone
    maxConcurrentPerDomain: "3"
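As a rough illustration of how a `maxConcurrentPerDomain` value (absolute or percentage) might be turned into a node count, here is a minimal Go sketch. The function name `resolveMaxConcurrent` is hypothetical. The default of 1 follows the first example above; rounding percentages up is an assumption, since the PR description does not state the rounding rule.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// resolveMaxConcurrent converts a value such as "3" or "25%" into a node
// count for a domain that currently holds domainSize nodes.
// Assumption (not confirmed by the PR): percentages round up.
// An empty value means one node at a time, matching the first example.
func resolveMaxConcurrent(value string, domainSize int) (int, error) {
	if value == "" {
		return 1, nil
	}
	if strings.HasSuffix(value, "%") {
		pct, err := strconv.Atoi(strings.TrimSuffix(value, "%"))
		if err != nil || pct < 0 || pct > 100 {
			return 0, fmt.Errorf("invalid percentage %q", value)
		}
		// Integer ceiling of pct% of domainSize.
		return (pct*domainSize + 99) / 100, nil
	}
	n, err := strconv.Atoi(value)
	if err != nil || n < 0 {
		return 0, fmt.Errorf("invalid value %q", value)
	}
	return n, nil
}

func main() {
	for _, v := range []string{"", "3", "25%"} {
		n, err := resolveMaxConcurrent(v, 10)
		fmt.Printf("value=%q limit=%d err=%v\n", v, n, err)
	}
}
```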

DriftPolicy is intentionally separate from Budget — budgets control how many disruptions are allowed in total; DriftPolicy controls the order in which domains are disrupted. The two are orthogonal and can be combined.

Documentation update: aws/karpenter-provider-aws#9073

How was this change tested?

  • Unit tests for DriftPolicy API validation (valid keys, invalid keys, percentage values)
  • Integration tests for domain selection ordering, per-domain budget gating, configurable concurrency, unlabelled node fall-through, and the empty-first invariant within the active domain
  • CRDs regenerated and verified
  • make presubmit

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

linux-foundation-easycla bot commented Apr 11, 2026

CLA Signed

The committers listed above are authorized under a signed CLA.

k8s-ci-robot added the needs-rebase (Indicates a PR cannot be merged because it has merge conflicts with HEAD) and needs-ok-to-test (Indicates a PR that requires an org member to verify it is safe to test) labels on Apr 11, 2026
@k8s-ci-robot
Contributor

Hi @BrunoChauvet. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work.

Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: BrunoChauvet
Once this PR has been reviewed and has the lgtm label, please assign jonathan-innis for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

k8s-ci-robot added the size/XXL (Denotes a PR that changes 1000+ lines, ignoring generated files) and cncf-cla: no (Indicates the PR's author has not signed the CNCF CLA) labels on Apr 11, 2026
BrunoChauvet force-pushed the feat/drift-topology-policy branch from 38a8f18 to a87e395 on April 11, 2026 16:50
k8s-ci-robot removed the needs-rebase (Indicates a PR cannot be merged because it has merge conflicts with HEAD) label on Apr 11, 2026
Expand the field-level and type-level comments to explain the
what, why, and how of DriftPolicy:
- Clarify the relationship with Budgets (orthogonal, not overlapping)
- Document the alphabetical domain selection rationale
- Note that unlabelled nodes are unaffected
- Add examples to TopologyKey and MaxConcurrentPerDomain
BrunoChauvet force-pushed the feat/drift-topology-policy branch from baa307a to 355d0f6 on April 11, 2026 17:32
@BrunoChauvet
Author

/easycla

k8s-ci-robot added the cncf-cla: yes (Indicates the PR's author has signed the CNCF CLA) label and removed the cncf-cla: no (Indicates the PR's author has not signed the CNCF CLA) label on Apr 11, 2026
…yclo limit

Extract per-domain budget check into domainBudgetOK helper to bring
applyDriftPolicies complexity from 14 down to <= 11 (gocyclo limit).
…reformat

- Fix British spelling "behaviour" → "behavior" in nodepool.go godoc
  (golangci-lint misspell linter, locale: US)
- Remove extra column-alignment whitespace in drift_test.go struct literals
  (golangci-lint goimports formatter)
- Regenerate CRDs with updated YAML indentation style from controller-gen
  (go generate ./... + cp pkg/apis/crds kwok/charts after spelling fix)

The root cause was a sequencing issue: go generate ran before golangci-lint
fixed the spelling in nodepool.go, so previously committed CRDs still had the
British spelling. Running go generate after the spelling fix produces stable,
idempotent output that satisfies make verify.
Two DriftPolicy tests had wrong expectations:

1. "should respect maxConcurrentPerDomain budget gate":
   - nodeClaim1 has DeletionTimestamp (in-flight); budget=1 is exhausted
   - Controller correctly returns 0 new commands
   - Test was asserting HaveLen(1) — fixed to HaveLen(0)

2. "should allow second disruption when maxConcurrentPerDomain=2 with only 1 in-flight":
   - nodeClaim1 has DeletionTimestamp; SimulateScheduling skips it (errCandidateDeleting)
   - Controller correctly disrupts nodeClaim2 (the non-deleting candidate)
   - Test was asserting nodeClaim1.Name — fixed to nodeClaim2.Name
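The two corrected expectations follow from a simple rule: candidates that are already being deleted count against the per-domain limit and are never selected again. A minimal Go sketch of that rule follows. `Candidate` and `nextDisruptions` are hypothetical names, and the real controller returns disruption commands rather than names.

```go
package main

import "fmt"

// Candidate is a hypothetical stand-in for a drifted NodeClaim.
type Candidate struct {
	Name     string
	Deleting bool // stands in for a non-nil DeletionTimestamp
}

// nextDisruptions returns the names of candidates that may start disruption
// in the active domain. Deleting candidates are treated as in-flight: they
// consume the limit and are skipped as new targets.
func nextDisruptions(candidates []Candidate, maxConcurrent int) []string {
	inFlight := 0
	for _, c := range candidates {
		if c.Deleting {
			inFlight++
		}
	}
	remaining := maxConcurrent - inFlight
	var out []string
	for _, c := range candidates {
		if remaining <= 0 {
			break
		}
		if c.Deleting {
			continue
		}
		out = append(out, c.Name)
		remaining--
	}
	return out
}

func main() {
	candidates := []Candidate{
		{Name: "nodeClaim1", Deleting: true},
		{Name: "nodeClaim2"},
	}
	fmt.Println(len(nextDisruptions(candidates, 1))) // limit exhausted by nodeClaim1
	fmt.Println(nextDisruptions(candidates, 2))      // nodeClaim2 may proceed
}
```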
These were internal working documents, not part of the upstream contribution.
k8s-ci-robot added the size/XL (Denotes a PR that changes 500-999 lines, ignoring generated files) label and removed the size/XXL (Denotes a PR that changes 1000+ lines, ignoring generated files) label on Apr 11, 2026
BrunoChauvet added a commit to BrunoChauvet/karpenter-provider-aws that referenced this pull request Apr 11, 2026
Add a new "Topology-Aware Drift Ordering (DriftPolicy)" subsection under
the Drift section of the disruption concepts page.

Covers:
- What problem DriftPolicy solves (fan-out blast radius during fleet-wide drift)
- The two fields: topologyKey and maxConcurrentPerDomain
- How active-domain selection works (in-flight first, then alphabetical)
- Interaction with existing NodePool Disruption Budgets
- Example NodePool YAML
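The "in-flight first, then alphabetical" rule mentioned in the list above could look roughly like the following Go sketch. `selectActiveDomain` is a hypothetical name, both maps are assumed to hold node counts per domain, and breaking ties between several in-flight domains alphabetically is an assumption rather than something the commit message states.

```go
package main

import (
	"fmt"
	"sort"
)

// selectActiveDomain prefers a domain that already has in-flight disruptions,
// and otherwise falls back to the alphabetically first domain that still has
// drifted nodes. Both maps go from domain name to node count.
func selectActiveDomain(inFlight, drifted map[string]int) (string, bool) {
	first := func(m map[string]int) (string, bool) {
		var domains []string
		for d, n := range m {
			if n > 0 {
				domains = append(domains, d)
			}
		}
		if len(domains) == 0 {
			return "", false
		}
		sort.Strings(domains)
		return domains[0], true
	}
	if d, ok := first(inFlight); ok {
		return d, true
	}
	return first(drifted)
}

func main() {
	drifted := map[string]int{"us-east-1a": 2, "us-east-1b": 1}
	inFlight := map[string]int{"us-east-1b": 1}
	d, _ := selectActiveDomain(inFlight, drifted)
	fmt.Println(d)
}
```

Preferring the in-flight domain keeps a rollout pinned to the zone it started in, even if a domain with an earlier name becomes drifted midway.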

Related upstream PR: kubernetes-sigs/karpenter#2963
k8s-ci-robot added the cncf-cla: no (Indicates the PR's author has not signed the CNCF CLA) label and removed the cncf-cla: yes (Indicates the PR's author has signed the CNCF CLA) label on Apr 14, 2026
Adds designs/drift-topology-policy.md per upstream maintainer request.
The RFC leads with Kafka/Cassandra use cases (quorum loss, rebalance
storms), explains why Budget.Nodes is insufficient, and covers the
four key design choices: Budget scope, CRD placement, alphabetical
domain selection, and drift-only scope.
BrunoChauvet force-pushed the feat/drift-topology-policy branch from 04cf692 to d5176cb on April 16, 2026 16:11
@BrunoChauvet
Author

/easycla

k8s-ci-robot added the cncf-cla: yes (Indicates the PR's author has signed the CNCF CLA) label and removed the cncf-cla: no (Indicates the PR's author has not signed the CNCF CLA) label on Apr 16, 2026
- Add EBS single-attach volume sequence to explain why concurrent
  cross-zone replacement creates simultaneous partition vacancies
- Explicitly address why PDB/surge eviction doesn't solve the problem:
  it controls eviction pacing (how), not node selection (which)
- Note custom eviction webhooks as complementary but insufficient
Labels

cncf-cla: yes (Indicates the PR's author has signed the CNCF CLA), needs-ok-to-test (Indicates a PR that requires an org member to verify it is safe to test), size/XL (Denotes a PR that changes 500-999 lines, ignoring generated files)

2 participants