
Fix device-plugin-test flaky test #138289

Open
esotsal wants to merge 1 commit into kubernetes:master from esotsal:fix-device-plugin-test-flaky

Conversation

@esotsal
Contributor

@esotsal esotsal commented Apr 9, 2026

What type of PR is this?

/kind bug
/kind failing-test

What this PR does / why we need it:

If assignedStr and expectedStr are both empty, this test fails.

This commit tries to fix this by comparing assignedStr and expectedStr instead of assigned and expected.

Which issue(s) this PR is related to:

N/A

Special notes for your reviewer:

A failed test demonstrating the problem can be found here

STEP: Verifying the device assignment after kubelet restart using podresources API - k8s.io/kubernetes/test/e2e_node/device_plugin_test.go:989 @ 04/09/26 09:16:30.057
I0409 09:16:30.059739    1155 device_plugin_test.go:1106] device-plugin-errors-4242/device-plugin-test-757aaa59-ef30-4304-a015-af221203c423/device-plugin-test-757aaa59-ef30-4304-a015-af221203c423: devices expected "" assigned ""
I0409 09:16:32.062366    1155 device_plugin_test.go:1106] device-plugin-errors-4242/device-plugin-test-757aaa59-ef30-4304-a015-af221203c423/device-plugin-test-757aaa59-ef30-4304-a015-af221203c423: devices expected "" assigned ""
I0409 09:16:34.064608    1155 device_plugin_test.go:1106] device-plugin-errors-4242/device-plugin-test-757aaa59-ef30-4304-a015-af221203c423/device-plugin-test-757aaa59-ef30-4304-a015-af221203c423: devices expected "" assigned ""
I0409 09:16:36.066967    1155 device_plugin_test.go:1106] device-plugin-errors-4242/device-plugin-test-757aaa59-ef30-4304-a015-af221203c423/device-plugin-test-757aaa59-ef30-4304-a015-af221203c423: devices expected "" assigned ""
I0409 09:16:38.069218    1155 device_plugin_test.go:1106] device-plugin-errors-4242/device-plugin-test-757aaa59-ef30-4304-a015-af221203c423/device-plugin-test-757aaa59-ef30-4304-a015-af221203c423: devices expected "" assigned ""
I0409 09:16:40.071822    1155 device_plugin_test.go:1106] device-plugin-errors-4242/device-plugin-test-757aaa59-ef30-4304-a015-af221203c423/device-plugin-test-757aaa59-ef30-4304-a015-af221203c423: devices expected "" assigned ""
I0409 09:16:42.075605    1155 device_plugin_test.go:1106] device-plugin-errors-4242/device-plugin-test-757aaa59-ef30-4304-a015-af221203c423/device-plugin-test-757aaa59-ef30-4304-a015-af221203c423: devices expected "" assigned ""
I0409 09:16:44.078898    1155 device_plugin_test.go:1106] device-plugin-errors-4242/device-plugin-test-757aaa59-ef30-4304-a015-af221203c423/device-plugin-test-757aaa59-ef30-4304-a015-af221203c423: devices expected "" assigned ""
I0409 09:16:46.082287    1155 device_plugin_test.go:1106] device-plugin-errors-4242/device-plugin-test-757aaa59-ef30-4304-a015-af221203c423/device-plugin-test-757aaa59-ef30-4304-a015-af221203c423: devices expected "" assigned ""
I0409 09:16:48.085617    1155 device_plugin_test.go:1106] device-plugin-errors-4242/device-plugin-test-757aaa59-ef30-4304-a015-af221203c423/device-plugin-test-757aaa59-ef30-4304-a015-af221203c423: devices expected "" assigned ""
I0409 09:16:50.088669    1155 device_plugin_test.go:1106] device-plugin-errors-4242/device-plugin-test-757aaa59-ef30-4304-a015-af221203c423/device-plugin-test-757aaa59-ef30-4304-a015-af221203c423: devices expected "" assigned ""
I0409 09:16:52.090570    1155 device_plugin_test.go:1106] device-plugin-errors-4242/device-plugin-test-757aaa59-ef30-4304-a015-af221203c423/device-plugin-test-757aaa59-ef30-4304-a015-af221203c423: devices expected "" assigned ""
I0409 09:16:54.092691    1155 device_plugin_test.go:1106] device-plugin-errors-4242/device-plugin-test-757aaa59-ef30-4304-a015-af221203c423/device-plugin-test-757aaa59-ef30-4304-a015-af221203c423: devices expected "" assigned ""
I0409 09:16:56.096538    1155 device_plugin_test.go:1106] device-plugin-errors-4242/device-plugin-test-757aaa59-ef30-4304-a015-af221203c423/device-plugin-test-757aaa59-ef30-4304-a015-af221203c423: devices expected "" assigned ""
I0409 09:16:58.099280    1155 device_plugin_test.go:1106] device-plugin-errors-4242/device-plugin-test-757aaa59-ef30-4304-a015-af221203c423/device-plugin-test-757aaa59-ef30-4304-a015-af221203c423: devices expected "" assigned ""
[FAILED] Timed out after 30.000s.
device-plugin-errors-4242/device-plugin-test-757aaa59-ef30-4304-a015-af221203c423/device-plugin-test-757aaa59-ef30-4304-a015-af221203c423 failed admission, so it must not appear in podresources list

Does this PR introduce a user-facing change?

NONE

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. kind/bug Categorizes issue or PR as related to a bug. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Apr 9, 2026
@k8s-ci-robot
Contributor

Please note that we're already in Code Freeze for the upcoming v1.36.0 release.

Adding the milestone to this PR is strictly prohibited without proper approval. If this PR needs to be included in the v1.36.0 release:

  1. Technical review: get the PR reviewed and approved as usual (/lgtm and /approve)
  2. Inclusion in release: ping @sig-release-leads on the #sig-release Slack channel and suggest to add the v1.36.0 milestone to the PR

We're also in Test Freeze for the release-1.36 branch. This means every merged PR will be automatically fast-forwarded via the periodic ci-fast-forward job to the release branch of the upcoming v1.36.0 release.

Fast forwards are scheduled to happen every 6 hours, whereas the most recent run was: Thu Apr 9 11:32:12 UTC 2026.

@k8s-ci-robot k8s-ci-robot added kind/failing-test Categorizes issue or PR as related to a consistently or frequently failing test. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Apr 9, 2026
@k8s-ci-robot
Contributor

This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added needs-priority Indicates a PR lacks a `priority/foo` label and requires one. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. area/test sig/node Categorizes an issue or PR as relevant to SIG Node. sig/testing Categorizes an issue or PR as relevant to SIG Testing. labels Apr 9, 2026
@k8s-ci-robot k8s-ci-robot removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Apr 9, 2026
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: esotsal
Once this PR has been reviewed and has the lgtm label, please assign endocrimes for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot requested review from ffromani and matthyx April 9, 2026 12:02
@esotsal
Contributor Author

esotsal commented Apr 9, 2026

/cc @kannon92 @pohly

@k8s-ci-robot k8s-ci-robot requested review from kannon92 and pohly April 9, 2026 12:06
@esotsal
Contributor Author

esotsal commented Apr 9, 2026

/test pull-kubernetes-node-kubelet-serial-device-manager

Contributor

@ffromani ffromani left a comment

Thanks for the PR! I'm not sure I got the intent correctly though, question inline.

  assignedStr := strings.Join(assigned.UnsortedList(), ",")
  framework.Logf("%s: devices expected %q assigned %q", ident, expectedStr, assignedStr)
- if !assigned.Equal(expected) {
+ if assignedStr != expectedStr {
Contributor

The strings are built from an unsorted list (arguably a UX mistake), so they can be syntactically different while expressing the same set. OTOH, the previously called Equal() was supposed to compare the two sets semantically.
I'm not sure this is the right fix.
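
To illustrate the concern, here is a minimal sketch. It assumes a hypothetical stringSet type standing in for the set used by the test (it is not the real sets package), and a hypothetical SortedJoin helper. Set equality is independent of order, while a string joined from an unsorted listing is only safe to compare after sorting.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// stringSet is a hypothetical stand-in for the set type used by the test.
type stringSet map[string]struct{}

// newStringSet builds a set from the given items.
func newStringSet(items ...string) stringSet {
	s := make(stringSet, len(items))
	for _, it := range items {
		s[it] = struct{}{}
	}
	return s
}

// Equal reports whether both sets hold exactly the same members,
// regardless of insertion or iteration order.
func (s stringSet) Equal(other stringSet) bool {
	if len(s) != len(other) {
		return false
	}
	for k := range s {
		if _, ok := other[k]; !ok {
			return false
		}
	}
	return true
}

// SortedJoin renders the set deterministically. Joining an unsorted
// listing instead could yield "a,b" for one set and "b,a" for an equal one.
func (s stringSet) SortedJoin(sep string) string {
	keys := make([]string, 0, len(s))
	for k := range s {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	return strings.Join(keys, sep)
}

func main() {
	expected := newStringSet("dev-1", "dev-2")
	assigned := newStringSet("dev-2", "dev-1")

	// Semantic comparison: order does not matter.
	fmt.Println(assigned.Equal(expected))
	// String comparison is only reliable on a sorted rendering.
	fmt.Println(assigned.SortedJoin(",") == expected.SortedJoin(","))
}
```

If a string comparison were kept, it would need a sorted rendering like the one above; comparing the sets directly avoids the question.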

Contributor Author

@esotsal esotsal Apr 9, 2026

Good point. Checking the Go library used, the calls will use the following:

[image]

I don't get why the printed assignedStr and expectedStr are empty but Equal fails. I will continue digging into this. I see that other device-plugin tests do a cleanup before the tests run; perhaps this is worth investigating.

Contributor

@ffromani ffromani Apr 9, 2026

A common gotcha we encountered is when one object is nil and the other is empty but non-nil. The string representation is the same (""), but the equality check fails, depending on the implementation. I didn't check whether this is the case here.
If it is, the fix is to make sure we always compare non-nil (but possibly empty) objects.

Contributor Author

@esotsal esotsal Apr 9, 2026

Thanks for explaining.

@esotsal esotsal force-pushed the fix-device-plugin-test-flaky branch 2 times, most recently from 9102ae9 to 7be1f6d Compare April 9, 2026 13:27
@esotsal
Contributor Author

esotsal commented Apr 9, 2026

/test pull-kubernetes-node-kubelet-serial-device-manager

  assignedStr := strings.Join(assigned.UnsortedList(), ",")
  framework.Logf("%s: devices expected %q assigned %q", ident, expectedStr, assignedStr)
- if !assigned.Equal(expected) {
+ if !expected.Equal(assigned) {
Contributor

If a == b, then certainly b == a? What is the type that is being used here?

@esotsal esotsal force-pushed the fix-device-plugin-test-flaky branch from 7be1f6d to 6c23520 Compare April 10, 2026 04:46
@k8s-ci-robot
Contributor

@esotsal: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name | Commit | Details | Required | Rerun command
pull-kubernetes-node-kubelet-serial-device-manager | 7be1f6d | link | false | /test pull-kubernetes-node-kubelet-serial-device-manager
pull-kubernetes-node-kubelet-serial-containerd | 6c23520 | link | false | /test pull-kubernetes-node-kubelet-serial-containerd

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

If assignedStr and expectedStr are empty, this test
fails. This commit tries to fix this.
@esotsal esotsal force-pushed the fix-device-plugin-test-flaky branch from 6c23520 to f96b34d Compare April 10, 2026 12:45

Labels

area/test cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. kind/failing-test Categorizes issue or PR as related to a consistently or frequently failing test. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. release-note-none Denotes a PR that doesn't merit a release note. sig/node Categorizes an issue or PR as relevant to SIG Node. sig/testing Categorizes an issue or PR as relevant to SIG Testing. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files.

4 participants