Feature Request: Configurable Drift Detection Cadence / Interval #2914

@royceshen

Description

What problem are you trying to solve?

Currently, Karpenter's drift remediation triggers node replacement immediately upon detecting drift (e.g. AMI change, NodePool spec update). There is no way to control the cadence or interval at which detected drift is acted upon.

In production environments, this causes unplanned pod disruptions outside maintenance windows whenever a new AMI is released or a NodePool spec changes. Our current workaround is to manually set disruption.budgets to nodes: "0" to block all disruption, then raise it again during maintenance windows, which is operationally cumbersome and error-prone.
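
For reference, the workaround looks roughly like the fragment below. This is a sketch, not our exact manifest: the NodePool name general is assumed (reused from the example in this issue), and the rest of the spec (template, limits, and so on) is omitted.

```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: general          # assumed name, for illustration only
spec:
  disruption:
    budgets:
      # Outside maintenance windows: a budget of "0" blocks all voluntary
      # disruption for this NodePool, drift remediation included.
      - nodes: "0"
      # At the start of a maintenance window this value is edited by hand
      # (for example back to "20%") and reverted to "0" afterwards.
```

Every window therefore needs two manual edits per NodePool, and a missed revert leaves the pool either fully blocked or fully open.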

What we want is a drift.interval or similar field on the NodePool disruption spec that defers drift remediation to a configured cadence, while allowing drift detection to continue running as normal. For example:

apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: general
spec:
  disruption:
    budgets:
      - nodes: "20%"
    drift:
      interval: 24h   # only remediate drift once every 24 hours

This would allow teams to:

  • Contain AMI drift remediation to planned maintenance windows
  • Coordinate node replacements with application release schedules
  • Reduce blast radius of large-scale drift events in production clusters

Omitting the field should preserve existing behavior (immediate remediation) for full backwards compatibility.


How important is this feature to you?

High. We operate multiple production EKS clusters and uncontrolled drift remediation timing is one of the main friction points when managing node lifecycle at scale. Without this, we are forced into manual operational toil every time an AMI update rolls out.


Metadata

Labels: kind/feature, needs-priority, triage/needs-information
