Fix the equivalence.PodGroup's mutation during scale up simulations for skipped node groups#9827
Conversation
|
This issue is currently awaiting triage. If SIG Autoscaling contributors determines this is a relevant issue, they will accept it by applying the The DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
Hi @shaikenov. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with Tip We noticed you've done this a few times! Consider joining the org to skip this step and gain Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: shaikenov The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
…ions. In kubernetes#9346 there is a bug which allows the mutation of equivalence pod group (eg) state to become Schedulable if there is a skipped node group which satisfies the pod predicates. Later down the line this eg is considered schedulable which is not true. The fix is to do the extra SchedulablePodGroups() simulations on the cloned egs.
41d3b51 to
ac5472f
Compare
| SchedulableGroups: clonedGroups, | ||
| Schedulable: eg.Schedulable, | ||
| } | ||
| } |
There was a problem hiding this comment.
Can you add a unit test for this func? To catch any cases that the orchestrator_test.go change will miss
| SchedulingErrors: clonedErrors, | ||
| SchedulableGroups: clonedGroups, | ||
| Schedulable: eg.Schedulable, | ||
| } |
There was a problem hiding this comment.
what if a new field is added to PodGroup struct in the future? A future author may not realize they also have to add the field here and it will get dropped. Maybe you can start with a shallow copy of the struct and then overwrite the SchedulingErrors and SchedulableGroups?
There was a problem hiding this comment.
@rrangith I think that it might actually be more unsafe than forgetting to add the field here. If someone forgets to add a new field here, that ideally should fail fast and get caught by the tests, even if it slips through the UTs, it will be in general way more obvious where the problem is. With a shallow copy we might end up in the exact same situation we're fixing in this PR.
| podsAwaitEvaluation := []string{} | ||
| for _, pod := range scaleUpStatus.PodsAwaitEvaluation { | ||
| podsAwaitEvaluation = append(podsAwaitEvaluation, pod.Name) | ||
| } | ||
| // This assertion ensures that the skipped node group simulation did not mutate the Schedulable status of the original equivalence groups. | ||
| // Without cloning the "partial scale-up successful" case would fail as p1 would be marked as schedulable on ng1 and would be added to podsAwaitEvaluation. | ||
| assert.Empty(t, podsAwaitEvaluation) |
There was a problem hiding this comment.
correct me if I'm wrong, but isn't this just checking assert.Empty(t, scaleUpStatus.PodsAwaitEvaluation?
What type of PR is this?
/kind bug
What this PR does / why we need it:
In #9346 there is a bug which allows the mutation of equivalence pod group (
eg) state to become Schedulable if there is a skipped node group which satisfies the pod predicates. Later down the line thisegis considered schedulable which is not true.The fix is to do the extra
SchedulablePodGroups()simulations on the clonedeg(s).Which issue(s) this PR fixes:
Fixes #
Special notes for your reviewer:
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: