
AEP 9726 Capacity-Aware In-Place Updates #9757

Draft
omerap12 wants to merge 1 commit into kubernetes:master from omerap12:inplace-capacity-aware-aep

omerap12 commented Jun 5, 2026

Member

What type of PR is this?

/kind documentation

What this PR does / why we need it:

AEP for #9726

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

Does this PR introduce a user-facing change?

NONE

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


Signed-off-by: Omer Aplatony <omerap12@gmail.com>
k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. kind/documentation Categorizes issue or PR as related to documentation. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jun 5, 2026
@k8s-ci-robot

Contributor

This issue is currently awaiting triage.

If SIG Autoscaling contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

k8s-ci-robot added area/vertical-pod-autoscaler Issues or PRs related to the Vertical Pod Autoscaler component do-not-merge/needs-area Indicates that a PR should not merge because it lacks an area label. labels Jun 5, 2026

omerap12 commented Jun 5, 2026

Member Author

/cc @adrianmoisey @maxcao13

@k8s-ci-robot

Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: omerap12

The full list of commands accepted by this bot can be found here.

The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

k8s-ci-robot removed the do-not-merge/needs-area Indicates that a PR should not merge because it lacks an area label. label Jun 5, 2026
k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 5, 2026
k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Jun 5, 2026
The function then evaluates whether the update mode is InPlace, a node is known, the InPlaceCapacityAware feature gate is enabled, and the node has sufficient allocatable capacity for the recommendation:
```go
if updateMode == vpa_types.UpdateModeInPlace && node != nil && features.Enabled(features.InPlaceCapacityAware) && !checkAllocatableNodeForInPlace(pod, recommendation, node.Status.Allocatable) {
	// ... (the rest of this block is outside the diff context shown in this review thread)
}
```

Member

node.Status.Allocatable contains the allocatable resources, not the available resources, so this value isn't sufficient to determine how much space is available on the node.

Member Author

https://kubernetes.io/docs/tasks/administer-cluster/reserve-compute-resources/#node-allocatable

'Allocatable' on a Kubernetes node is defined as the amount of compute resources that are available for pods.

Member

Correct, but that value doesn't change as pods are scheduled on the node, so it is only useful for nodes that don't have other pods (with requests defined) scheduled on them.

Member Author

Member

The original issue says:

Currently, the updater is unaware of the capacity constraints of the node on which a pod is running. As a result, it may attempt an in-place resize without verifying whether the node has sufficient available resources.

I'm questioning what "sufficient available resources" means.

node.Status.Allocatable is what is available for pods, after Kubernetes and the system have taken their share.
Normally this value is close-ish to the total size of the node.

For example, on my 4-CPU, 20GB-RAM VM, I have these values:

```
Capacity:
  cpu:                4
  memory:             20484252Ki
Allocatable:
  cpu:                3920m
  memory:             17445020Ki
```

GKE has taken away 80 millicores and 2968MiB from my node for various tasks.
This means that the largest pod I can have is 3920 millicores and about 17036MiB.
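The arithmetic above can be checked with a small Go sketch. It assumes the values are copied by hand from `kubectl describe node` (CPU in millicores, memory in KiB) and uses plain int64 fields instead of the `resource.Quantity` type from k8s.io/apimachinery to stay dependency-free; `nodeResources` and `reserved` are hypothetical names for illustration only.

```go
package main

import "fmt"

// nodeResources holds CPU in millicores and memory in KiB, matching the
// units printed by `kubectl describe node` in the output above.
type nodeResources struct {
	cpuMilli int64
	memKi    int64
}

// reserved returns capacity minus allocatable: what kube-reserved,
// system-reserved, and the eviction threshold take away from the node.
func reserved(capacity, allocatable nodeResources) nodeResources {
	return nodeResources{
		cpuMilli: capacity.cpuMilli - allocatable.cpuMilli,
		memKi:    capacity.memKi - allocatable.memKi,
	}
}

func main() {
	capacity := nodeResources{cpuMilli: 4000, memKi: 20484252}
	allocatable := nodeResources{cpuMilli: 3920, memKi: 17445020}

	r := reserved(capacity, allocatable)
	// Integer division by 1024 converts KiB to whole MiB.
	fmt.Printf("reserved: %dm CPU, %dMiB memory\n", r.cpuMilli, r.memKi/1024)
	fmt.Printf("largest pod: %dm CPU, %dMiB memory\n", allocatable.cpuMilli, allocatable.memKi/1024)
}
```

This prints `reserved: 80m CPU, 2968MiB memory` and `largest pod: 3920m CPU, 17036MiB memory`. Note that neither number accounts for other pods already running on the node.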

If I schedule some pods on this node, the "node.Status.Allocatable" value doesn't change, but my available resources decrease (and available resources aren't a field stored on the node).

If you run kubectl describe on a node, you see 3 sets of values: capacity, allocatable, and allocated.
I had assumed that the plan was to check whether there was "available" space (allocatable minus allocated).

This AEP seems to be doing its calculation on the allocatable value (which is what I think the PodReasonInfeasible status comes from).

The problem with this is that if we cap the pod's resize at the allocatable value, we'll stop getting PodReasonInfeasible from the API server (which is good), but we will likely start getting PodReasonDeferred instead. And I assume that if even a single other pod with allocated resources is scheduled on the node, the likelihood of this deferred situation ever resolving is very low, especially if the node has a DaemonSet pod scheduled on it (which I assume is a common use case).
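To make the Infeasible vs. Deferred distinction concrete, here is a simplified Go model. It is a sketch, not the kubelet's actual resize admission logic: it looks at a single resource, ignores limits, pod overhead, and other resources, and the names (`resizeOutcome`, `classifyResize`) are hypothetical. The point it illustrates is that "available" is allocatable minus the requests of the other pods on the node, which is not stored on the Node object.

```go
package main

import "fmt"

// resizeOutcome is a hypothetical, simplified stand-in for what the kubelet
// reports for an in-place resize request.
type resizeOutcome string

const (
	fits       resizeOutcome = "Fits"
	deferred   resizeOutcome = "Deferred"   // fits allocatable, but not right now
	infeasible resizeOutcome = "Infeasible" // exceeds node allocatable outright
)

// classifyResize compares a desired request (one resource, e.g. CPU
// millicores) with the node's allocatable and the requests of the OTHER pods
// on the node. The resizing pod's own current request is excluded because the
// new value replaces it.
func classifyResize(allocatable int64, otherPodRequests []int64, desired int64) resizeOutcome {
	if desired > allocatable {
		return infeasible
	}
	var allocated int64
	for _, r := range otherPodRequests {
		allocated += r
	}
	available := allocatable - allocated
	if desired > available {
		return deferred
	}
	return fits
}

func main() {
	const allocatable = 3920          // millicores, from the node above
	others := []int64{100, 250}       // hypothetical DaemonSet and app pods
	fmt.Println(classifyResize(allocatable, others, 5000)) // Infeasible
	fmt.Println(classifyResize(allocatable, others, 3920)) // Deferred
	fmt.Println(classifyResize(allocatable, others, 3000)) // Fits
}
```

As soon as any other pod on the node requests resources, a request capped exactly at allocatable falls into the Deferred branch, which is the scenario described above.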

Member Author

As a user I want the VPA to resize the pods to fill up the available space, so that my workloads are protected.

That totally makes sense.

I say let's bring this topic to sig-node; maybe there is another solution here that we are missing. If there is none, we can then debate the best way forward (using a cluster-wide pod informer vs. direct API calls to the kube-apiserver). Does that work for you?
@maxcao13

Member

I say let's bring this topic to sig-node; maybe there is another solution here that we are missing.

Agreed, or if there isn't a solution currently, maybe k/k can provide one (i.e., a status field on the pod with details about what size it could be)?

If there is none, we can then debate the best way forward (using a cluster-wide pod informer vs. direct API calls to the kube-apiserver). Does that work for you?

I think we should take a step back and debate whether this is even a problem that needs solving before we debate what the solution is.

Member Author

Agreed, or if there isn't a solution currently, maybe k/k can provide one (i.e., a status field on the pod with details about what size it could be)?

Yeah, I agree.

Member

I think I agree with what Adrian is saying here: this may be a premature optimization. IMO, I'd like to avoid making changes or giving more work to people (sig-node) without a clear benefit or a PoC showing the effects of the mitigation.

Member Author

  1. As Adrian mentioned, it makes sense for the VPA to fill available resource "holes" on a node. For example, if a node has 1 CPU available, the pod recommendation is 2 CPUs, and the VPA is configured with updateMode: InPlace, the updater could attempt an in-place resize to 1 CPU instead of simply marking the recommendation as infeasible and leaving that available capacity unused.

  2. We could also reduce unnecessary API calls. Currently, we only discover that a resize attempt is infeasible by checking the status or receiving an admission error. Wouldn't it make sense for the updater to determine ahead of time that the resize would be infeasible and skip the attempt altogether (or cap it as mentioned above)?
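Points 1 (filling the hole) and 2 (skipping doomed requests) can be sketched together in Go. This is a hypothetical helper, not VPA updater code: `planInPlaceResize` works on one resource in millicores, and `headroom` is assumed to be computed elsewhere (for example, from a pod informer) as allocatable minus the requests of every other pod on the node, so it is the largest request this pod could have there.

```go
package main

import "fmt"

// planInPlaceResize is a hypothetical helper. All values are CPU millicores.
// headroom = allocatable minus the requests of all OTHER pods on the node.
// It returns the request to send and whether an API call is worth making.
func planInPlaceResize(current, recommended, headroom int64) (int64, bool) {
	switch {
	case recommended == current:
		return current, false // already at the recommendation
	case recommended < current:
		return recommended, true // scaling down frees node capacity, so it fits
	case headroom <= current:
		return current, false // no room to grow: skip the API call entirely
	case recommended > headroom:
		return headroom, true // partial scale-up that fills the remaining "hole"
	default:
		return recommended, true // the full recommendation fits
	}
}

func main() {
	// Pod at 500m, recommendation 2000m, room for only 1000m on the node.
	target, attempt := planInPlaceResize(500, 2000, 1000)
	fmt.Println(target, attempt) // 1000 true

	// No headroom beyond the current request: don't send a doomed resize.
	target, attempt = planInPlaceResize(500, 2000, 500)
	fmt.Println(target, attempt) // 500 false
}
```

Capping like this only helps if `headroom` is accurate, which is the allocatable-vs-available question raised earlier in this thread; with an allocatable-only value, the capped request could still end up Deferred.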

Again, I'm not convinced these features should be implemented. This PR is also intended to start a discussion around whether they make sense and are desirable.

Comment on lines +176 to +178
- Tests are stable for 3 releases.
- No open bugs against the feature gate.
- Positive user feedback.

Member

The description here says that this section needs to describe graduation from alpha to beta and then to GA.
The plan isn't clear from these bullet points.

Member Author

Agreed. I should have opened this as a draft; this is just a quick AEP for discussion.

Comment on lines +80 to +81
1. Reduce CPU cycles spent by the updater on infeasible resize attempts.
2. Reduce API server load incurred by admission checks for infeasible in-place resize requests.

Member

If the goal here is to reduce CPU cycles and API load, I think we need before-and-after measurements to ensure that we're achieving this goal.

Member Author

Yeah, I should have opened this as a draft, since this PR is mainly for discussion.

Member

Not sure I understand; this is a discussion about the AEP.

Member Author

Yeah, what I meant is that this is just a draft.

omerap12 marked this pull request as draft June 5, 2026 22:41
k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 5, 2026

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. area/vertical-pod-autoscaler Issues or PRs related to the Vertical Pod Autoscaler component cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. kind/documentation Categorizes issue or PR as related to documentation. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. release-note-none Denotes a PR that doesn't merit a release note. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

4 participants