[Resource quotas] Redistribute unclaimed capacity to similar node groups #9567

@norbertcyran

Description

/area cluster-autoscaler

#9494 discovered and fixed a bug in granular resource quotas: balancing across similar node groups didn't respect the resource quotas of those node groups. The fix was to cap the scale ups in the similar node groups to their corresponding quotas after balancing.

As discussed in #9494 (comment), that leads to suboptimal results. Example scenario: we have CapacityQuotas set to 3 nodes per zone across three zones, and CA grabs unschedulable pods that need 9 new nodes. Theoretically, this can be satisfied within one scale up loop, but applyLimits will limit the node count to 3. If I'm not mistaken, if node groups' max sizes were used instead of capacity quotas, each node group would get 3 new nodes.

Similarly, if zone a has 5 nodes remaining in its quota, and zones b and c each have 1 remaining node, the current scale up logic will:

  • pick some node group as the best option (honestly I'm not sure which one, probably none of them will score better than the others)
  • if zone a is picked, scale up will be capped to 5 due to quotas
  • balancing will balance the scale up across the zones, so we will get something like (2, 2, 1)
  • scale up in zone b will be capped to 1 due to quotas, so the final scale up will be (2, 1, 1)
  • if zone b or c is picked instead in the 1st step, we get only 1 node in the scale up

We can see that the optimal scenario would be to claim all the remaining quota, and initiate a (5, 1, 1) scale up. This is how the NodeGroup.MaxSize() logic works. We should probably throw away applyLimits, and handle quotas the same way we handle node groups' max size:

Metadata

Labels

  • area/cluster-autoscaler: Issues or PRs related to the Cluster Autoscaler component
  • kind/feature: Categorizes issue or PR as related to a new feature.
  • triage/accepted: Indicates an issue or PR is ready to be actively worked on.
