test/integration: synchronize scheduler shutdown to fix metrics data race #138312
mm4tt wants to merge 1 commit into kubernetes:master
…race

Wait for the scheduler goroutine to exit before resetting global metrics. Previously, metrics updates from a still-running scheduler could race with legacyregistry.Reset() at the end of a performance test workload.

This change refactors StartScheduler into StartSchedulerWithDone to provide a synchronization channel that is closed when the scheduler actually stops, ensuring a clean teardown before registry cleanup.

Fixes kubernetes#137328
|
Keywords which can automatically close issues and hashtag(#) mentions are not allowed in commit messages. The list of commits with invalid commit messages:
|
Please note that we're already in Code Freeze for the upcoming v1.36.0 release. Adding the milestone to this PR is strictly prohibited without proper approval. If this PR needs to be included in the v1.36.0 release:
We're also in Test Freeze for the Fast forwards are scheduled to happen every 6 hours, whereas the most recent run was: Fri Apr 10 05:33:33 UTC 2026. |
|
This issue is currently awaiting triage. If a SIG or subproject determines this is a relevant issue, they will accept it by applying the … |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED
This pull-request has been approved by: mm4tt |
|
/assign @macsko |
|
/test pull-kubernetes-scheduler-perf |
|
/retest |
What type of PR is this?
/kind failing-test
/kind flake
What this PR does / why we need it:
This PR fixes a data race detected in TestSchedulerPerf (e.g., during preemption scenarios, as reported in #137328).
The race occurred because legacyregistry.Reset() was called at the end of a workload while the scheduler goroutine (launched via sched.Run(tCtx)) was still running. Even after workload evaluation finishes, the scheduler can still perform a few more operations, such as updating metrics. If those updates ran concurrently with Reset(), the race detector flagged them.
Which issue(s) this PR is related to:
Fixes #137328.
Special notes for your reviewer:
Key changes:
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: