
feat: Add annotation-based priority control for static NodePool deprovisioning#2842

Open
adriananeci wants to merge 8 commits into kubernetes-sigs:main from adriananeci:annotation-based_priority_deprovisioning

@adriananeci adriananeci commented Feb 5, 2026

Fixes #2841

Description

This PR adds support for the karpenter.sh/deprovisioning-priority annotation on NodeClaims, allowing users to explicitly control which nodes are removed first when scaling down static NodePools.

When scaling down static NodePools, users need control over which specific nodes are removed. The current behavior selects nodes based on internal heuristics (empty nodes, disruption cost) but doesn't let users target specific nodes. This is problematic when:

  1. Specific nodes are manually cordoned/drained and should be removed
  2. Certain nodes have known issues or degraded performance
  3. Users want to implement custom deprovisioning strategies
  4. Nodes need to be removed in a specific order during maintenance

Example scenarios from issue #2841:

  1. Without manual NodeClaim deletion
  • Initial state: replicas=4, NodeClaims=[A, B, C, D], nodes don't have do-not-disrupt pods

  • User actions:

    • Drain, cordon and taint A, B manually
    • Scale down replicas=2
  • Expected:

    • Remove A, B nodes, keep C, D.
  • Reality:

    • Karpenter: Decides to remove C, D (based on disruption selection)
  2. With manual NodeClaim deletion
  • Initial state: replicas=4, NodeClaims=[A, B, C, D], nodes don't have do-not-disrupt pods

  • User actions:

    • Drain, cordon and taint A, B manually
    • Delete NodeClaims A, B manually
    • Scale down replicas=2
  • Expected:

    • Remove A, B nodes, keep C, D.
  • Reality:

    • Karpenter: Creates replacements E, F (sees replicas=4, actual=2) in the window between the NodeClaim deletions and the scale-down to replicas=2
    • Karpenter: Removes C, D (based on disruption cost heuristics)
    • The NodePool ends up with new nodes (E, F), disrupting more nodes than expected.

This PR introduces karpenter.sh/deprovisioning-priority annotation with integer values (higher = deprovisioned first, default = 0). Priority is applied across all candidate selection tiers while maintaining safety guarantees.

Deprovisioning priority hierarchy:

  1. Do-not-disrupt status (always evaluated first; protects critical workloads)
  2. Priority annotation (higher values removed first)
  3. Disruption cost (existing heuristic as tiebreaker)
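The three-level hierarchy above maps naturally onto a multi-key sort comparator. The Go sketch below is illustrative only: the `candidate` struct, its field names, and the helpers `less` and `orderForRemoval` are hypothetical stand-ins, not the controller's actual types. It assumes that nodes running do-not-disrupt pods are ordered after all other nodes regardless of priority (consistent with the "Do-not-disrupt takes precedence over priority" test case), and that a lower disruption cost means a node is cheaper to remove.

```go
package main

import (
	"fmt"
	"sort"
)

// candidate is a hypothetical, simplified deprovisioning candidate.
type candidate struct {
	Name            string
	HasDoNotDisrupt bool    // node runs pods that must not be disrupted
	Priority        int     // karpenter.sh/deprovisioning-priority, 0 if unset
	DisruptionCost  float64 // existing heuristic; assumed lower = cheaper to remove
}

// less reports whether a should be deprovisioned before b:
// do-not-disrupt status first, then priority, then disruption cost.
func less(a, b candidate) bool {
	if a.HasDoNotDisrupt != b.HasDoNotDisrupt {
		return !a.HasDoNotDisrupt // nodes without do-not-disrupt pods go first
	}
	if a.Priority != b.Priority {
		return a.Priority > b.Priority // higher priority is removed first
	}
	return a.DisruptionCost < b.DisruptionCost // cheaper removal breaks ties
}

// orderForRemoval returns a sorted copy without mutating the input.
func orderForRemoval(cs []candidate) []candidate {
	out := append([]candidate(nil), cs...)
	sort.SliceStable(out, func(i, j int) bool { return less(out[i], out[j]) })
	return out
}

func main() {
	nodes := []candidate{
		{Name: "A", Priority: 100, DisruptionCost: 5},
		{Name: "B", Priority: 90, DisruptionCost: 1},
		{Name: "C", DisruptionCost: 0.5},
		{Name: "D", HasDoNotDisrupt: true, Priority: 200},
	}
	for _, c := range orderForRemoval(nodes) {
		fmt.Println(c.Name) // D is last despite its high priority
	}
}
```

A stable sort keeps the input order for candidates that compare equal on all three keys, which makes the result deterministic for a given input.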

This hierarchy ensures safety-critical workloads remain protected while giving users fine-grained control over deprovisioning order.

Also updated the design doc at designs/static-capacity.md.

Usage Example

  # Annotate NodeClaims to prioritize for removal (higher = removed first)
  kubectl annotate nodeclaim unwanted-node-1 karpenter.sh/deprovisioning-priority=100
  kubectl annotate nodeclaim unwanted-node-2 karpenter.sh/deprovisioning-priority=90

  # Scale down - annotated nodes removed first
  kubectl scale nodepool my-pool --replicas=5
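To show how annotations interact with a replica decrease, the Go sketch below picks the NodeClaims to remove when a static NodePool shrinks, replaying scenario 1 from the description (A and B annotated, replicas 4 to 2). The `nodeClaim` type and the `selectForScaleDown` helper are hypothetical; the sketch orders by priority alone and deliberately ignores the other selection tiers and tiebreakers the PR applies.

```go
package main

import (
	"fmt"
	"sort"
)

// nodeClaim is a hypothetical, minimal view of a NodeClaim.
type nodeClaim struct {
	Name     string
	Priority int // value of karpenter.sh/deprovisioning-priority, 0 if unset
}

// selectForScaleDown returns the NodeClaims to remove when a static NodePool
// goes from len(claims) to desired replicas, highest priority first.
// Equal priorities keep their input order (stable sort).
func selectForScaleDown(claims []nodeClaim, desired int) []nodeClaim {
	if desired < 0 {
		desired = 0
	}
	excess := len(claims) - desired
	if excess <= 0 {
		return nil // nothing to remove
	}
	sorted := append([]nodeClaim(nil), claims...)
	sort.SliceStable(sorted, func(i, j int) bool {
		return sorted[i].Priority > sorted[j].Priority
	})
	return sorted[:excess]
}

func main() {
	// Scenario 1: A and B were drained manually and annotated for removal.
	claims := []nodeClaim{
		{Name: "C"},
		{Name: "A", Priority: 100},
		{Name: "D"},
		{Name: "B", Priority: 90},
	}
	for _, c := range selectForScaleDown(claims, 2) {
		fmt.Println("remove", c.Name)
	}
}
```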

Use Cases

  1. Targeted removal: Mark specific nodes for removal when scaling down
  2. Manual intervention: Remove nodes that were cordoned/drained
  3. Rolling updates: Control replacement order during updates
  4. Zone rebalancing: Influence which zones lose capacity first

These changes should be fully backward compatible: nodes without the annotation default to priority 0, existing deprovisioning behavior is unchanged when the annotation is not used, and invalid annotation values gracefully default to 0.
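The fallback rules can be expressed as a small parser. In this Go sketch, `deprovisioningPriority` and the constant name are hypothetical; parsing with `strconv.Atoi` into an `int` is an assumption (the PR does not state the integer width), and this sketch accepts negative values, which the PR description neither allows nor forbids explicitly.

```go
package main

import (
	"fmt"
	"strconv"
)

// DeprovisioningPriorityAnnotation is the annotation key described in this PR.
const DeprovisioningPriorityAnnotation = "karpenter.sh/deprovisioning-priority"

// deprovisioningPriority reads the priority from a NodeClaim's annotations.
// A missing annotation or a value that is not an integer yields 0.
func deprovisioningPriority(annotations map[string]string) int {
	v, ok := annotations[DeprovisioningPriorityAnnotation]
	if !ok {
		return 0 // no annotation: default priority
	}
	p, err := strconv.Atoi(v)
	if err != nil {
		return 0 // invalid value: gracefully default to 0
	}
	return p
}

func main() {
	for _, ann := range []map[string]string{
		{DeprovisioningPriorityAnnotation: "100"},
		{DeprovisioningPriorityAnnotation: "high"},
		{},
	} {
		fmt.Println(deprovisioningPriority(ann))
	}
}
```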

The annotation-based deprovisioning priority was inspired by the ReplicaSet Pod deletion cost annotation: https://kubernetes.io/docs/concepts/workloads/controllers/replicaset/#pod-deletion-cost

How was this change tested?

New test coverage includes:

  • Priority sorting for unresolved NodeClaims
  • Priority sorting for empty nodes
  • Do-not-disrupt takes precedence over priority
  • Priority used as tiebreaker for non-empty nodes

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

linux-foundation-easycla bot commented Feb 5, 2026

CLA Signed

The committers listed above are authorized under a signed CLA.

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: adriananeci
Once this PR has been reviewed and has the lgtm label, please assign njtran for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. labels Feb 5, 2026
@adriananeci adriananeci force-pushed the annotation-based_priority_deprovisioning branch from 370f683 to b578bd7 February 5, 2026 12:03
@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. and removed cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. labels Feb 5, 2026
@adriananeci
Author

Note for reviewers: the initial changes were minimal, but they introduced a cyclomatic complexity issue, so I had to refactor the code a bit.

coveralls commented Feb 5, 2026

Pull Request Test Coverage Report for Build 21879795034

Details

  • 53 of 57 (92.98%) changed or added relevant lines in 1 file are covered.
  • 4 unchanged lines in 1 file lost coverage.
  • Overall coverage increased (+0.02%) to 80.547%

  Changes Missing Coverage                               Covered Lines   Changed/Added Lines   %
  pkg/controllers/static/deprovisioning/controller.go   53              57                    92.98%

  Files with Coverage Reduction                  New Missed Lines   %
  pkg/controllers/disruption/consolidation.go   4                  91.26%

  Totals
  Change from base Build 21847703137: +0.02%
  Covered Lines: 11900
  Relevant Lines: 14774

💛 - Coveralls

@ellistarn
Contributor

From a naming perspective, do we think that https://kubernetes.io/docs/concepts/workloads/controllers/replicaset/#pod-deletion-cost is an appropriate analog? cc: @jamesmt-aws. I think we may want to support something like this for dynamic provisioning as well, to provide customers with a way to give Karpenter hints about their preferred consolidation choices.

@adriananeci
Author

/retest

Labels

cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Allow specific nodes to be removed while decreasing nodepool replicas in static mode

4 participants