# RFC: Allow disrupting nodes before creating replacements for static NodePools

## Overview

This design proposes an optional setting on static NodePools that allows disruption actions to be applied without first spinning up a replacement Node. Currently, Karpenter's drift resolution workflow requires a replacement Node to be spun up and available before the drifted Node is terminated, which is not viable in certain static capacity scenarios.

Fixes https://github.com/kubernetes-sigs/karpenter/issues/2905

## User stories

1. As a cluster operator, I want Karpenter to automate drift resolution for limited or rare capacity types (e.g. GPU instances, constrained ODCRs) without manual intervention.

## Problem statement

Cluster operators use Karpenter to safely automate drift resolution when rolling out new node software (e.g. OS upgrades, new Kubernetes versions). Karpenter provides multiple safety controls, such as Node disruption budgets and do-not-disrupt annotations on Nodes and pods. In addition, Karpenter spins up replacement nodes before starting eviction and termination of the drifted node. However, in some scenarios users have a very static pool of capacity to choose from. Two common cases are expensive instance types (e.g. GPUs) and capacity reservations (e.g. Amazon ODCRs), where it is not desirable to spin up additional high-cost or limited capacity before scaling down the drifted node.

## Proposed design

The proposed design is a fairly simple extension of the current NodePool API and disruption controller.

**API changes**

The API introduces a new field, `replacementPolicy`, modeled in a similar manner to `consolidationPolicy`. Keeping this as an enum leaves room for extending drift resolution with other policies in the future, such as in-place upgrades (example: https://github.com/aws/karpenter-provider-aws/issues/8735).

```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: expensive-gpu-pool
spec:
  disruption:
    # ... existing fields

    # New field
    replacementPolicy: CreateReplacement | DoNotCreateReplacement # Default: CreateReplacement
```
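For context, a static NodePool using the new field might look like the following. This is a sketch only: the `replicas` field and its placement are assumed from the static capacity proposal, and the final API shape may differ.

```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: reserved-gpu-pool
spec:
  replicas: 4 # assumed static capacity field: hold exactly four nodes
  disruption:
    # Drain and terminate drifted nodes first; the static controller
    # presumably backfills afterwards to maintain the replica count.
    replacementPolicy: DoNotCreateReplacement
```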
**Design considerations**

One consideration is whether there is value in opening up the DoNotCreateReplacement policy to static NodePools only versus all NodePools.

Option 1: Static NodePools only

Pros:
* Simplifies rollout and works for the currently identified use cases
* Keeps changes limited to the static drift controller logic, which is quite simple
* Avoids complicating the standard NodePool drift controller, which doesn't always need to spin up a replacement since existing nodes in the cluster may already be able to accommodate the evicted pods

Cons:
* There are likely use cases that could benefit from this behaviour without using static NodePools, such as dynamic NodePools backed by shallow reserved pools. In these cases, it may be undesirable to exceed the reserved pool's maximum node count solely for drift resolution

Option 2: Implement for all NodePool types (Recommended)

Pros:
* Works for all cases that prefer avoiding spinning up node replacements
* Based on a quick read of the code, this should be a relatively easy change for all NodePool types

Cons:
* Higher risk exposure when rolling out the feature, especially on standard NodePools. However, this should be mitigable with proper feature flagging

**Implementation notes**

* For static drift, this should be a relatively easy check [here](https://github.com/kubernetes-sigs/karpenter/blob/9136011b9a9c1840cd0e844e2eee57716e757f15/pkg/controllers/disruption/staticdrift.go#L94-L96) to avoid creating replacement NodeClaims when the DoNotCreateReplacement policy is set
* For standard NodePool drift, the change is also likely straightforward: the [scheduling simulation and replacement check](https://github.com/kubernetes-sigs/karpenter/blob/9136011b9a9c1840cd0e844e2eee57716e757f15/pkg/controllers/disruption/drift.go#L81) can be skipped entirely when the DoNotCreateReplacement policy is set
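The gate described in both bullets can be sketched as follows. This is a self-contained illustration rather than actual Karpenter code: the `ReplacementPolicy` type, its constants, and `replacementsFor` are hypothetical stand-ins for the proposed API and controller logic.

```go
package main

import "fmt"

// ReplacementPolicy mirrors the proposed NodePool API field.
// Names here are illustrative, not the final API.
type ReplacementPolicy string

const (
	CreateReplacement      ReplacementPolicy = "CreateReplacement"
	DoNotCreateReplacement ReplacementPolicy = "DoNotCreateReplacement"
)

// replacementsFor returns how many replacement NodeClaims the disruption
// controller should request before terminating drifted nodes. Under
// DoNotCreateReplacement the scheduling simulation is skipped and zero
// replacements are requested, so drifted nodes are drained first.
func replacementsFor(policy ReplacementPolicy, driftedNodes int) int {
	if policy == DoNotCreateReplacement {
		return 0
	}
	return driftedNodes
}

func main() {
	fmt.Println(replacementsFor(CreateReplacement, 3))      // 3
	fmt.Println(replacementsFor(DoNotCreateReplacement, 3)) // 0
}
```

The point of keeping this a single early-return check is that both the static and standard drift paths can share the same decision without touching the rest of the disruption flow.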