Skip to content

pkg/lightning: fix TestRegionJobBaseWorker waitgroup race#66703

Merged
ti-chi-bot[bot] merged 2 commits intopingcap:masterfrom
D3Hunter:codex/fix-issue-66702-regression
Mar 5, 2026
Merged

pkg/lightning: fix TestRegionJobBaseWorker waitgroup race#66703
ti-chi-bot[bot] merged 2 commits intopingcap:masterfrom
D3Hunter:codex/fix-issue-66702-regression

Conversation

@D3Hunter
Copy link
Copy Markdown
Contributor

@D3Hunter D3Hunter commented Mar 5, 2026

What problem does this PR solve?

Issue Number: close #66702

Problem Summary:

TestRegionJobBaseWorker/if_the_region_has_no_leader,_rescan_the_region is flaky in nextgen CI. In test helper prepareAndExecute, jobInCh <- job happened before jobWg.Add(1), so worker-side Done could race ahead and trigger sync: negative WaitGroup counter.

What changed and how does it work?

  • Reordered test helper accounting to call jobWg.Add(1) before jobInCh <- job.
  • This guarantees waitgroup accounting is established before workers can decrement it.

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No need to test
    • I checked and no code files have been changed.

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

Please refer to Release Notes Language Style Guide to write a quality release note.

None

Summary by CodeRabbit

  • Tests
    • Improved job worker test to tighten synchronization around enqueue timing, reducing flakiness for timing-sensitive scenarios and ensuring more reliable verification of job enqueueing and processing behavior.

@ti-chi-bot
Copy link
Copy Markdown

ti-chi-bot bot commented Mar 5, 2026

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@ti-chi-bot ti-chi-bot bot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. release-note-none Denotes a PR that doesn't merit a release note. labels Mar 5, 2026
@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Mar 5, 2026

📝 Walkthrough

Walkthrough

Moved the WaitGroup increment to occur before enqueuing the job and added an optional after-send callback to the test helper to control timing, preventing a race that could cause a negative WaitGroup counter in the job worker tests.

Changes

Cohort / File(s) Summary
Job Worker Test
pkg/lightning/backend/local/job_worker_test.go
Moved jobWg.Add(1) to execute before jobInCh <- job; added an afterSendFn callback parameter to prepareAndExecute to allow deferred actions after sending; updated imports and test usage accordingly to avoid WaitGroup race that could lead to negative counter.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

I am a rabbit in the CI night,
I nudge the counter to set it right,
Send then wait, no panic in sight,
Tests now hop steady, calm, and light. 🐇✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and specifically identifies the main fix: reordering WaitGroup operations to eliminate a race condition in the TestRegionJobBaseWorker test.
Linked Issues check ✅ Passed The PR directly addresses the root cause identified in #66702 by reordering jobWg.Add(1) before jobInCh <- job, preventing the negative WaitGroup counter panic described in the issue.
Out of Scope Changes check ✅ Passed All changes are scoped to fixing the identified race condition in TestRegionJobBaseWorker; no unrelated modifications are present in the test file changes.
Description check ✅ Passed The pull request description follows the required template with complete sections: issue number, problem summary, what changed, test checklist, and release notes.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@ti-chi-bot ti-chi-bot bot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Mar 5, 2026
@tiprow
Copy link
Copy Markdown

tiprow bot commented Mar 5, 2026

Hi @D3Hunter. Thanks for your PR.

PRs from untrusted users cannot be marked as trusted with /ok-to-test in this repo meaning untrusted PR authors can never trigger tests themselves. Collaborators can still trigger tests on the PR using /test all.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@D3Hunter D3Hunter marked this pull request as ready for review March 5, 2026 07:08
@ti-chi-bot ti-chi-bot bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 5, 2026
@ti-chi-bot ti-chi-bot bot added approved needs-1-more-lgtm Indicates a PR needs 1 more LGTM. labels Mar 5, 2026
Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/lightning/backend/local/job_worker_test.go`:
- Around line 164-171: The test uses time.Sleep in the t.Run case to wait for
worker startup; replace that with a channel handshake to make startup
deterministic: create a chan struct{} in the test, pass a preRunFn to
prepareAndExecute that closes/signals the channel when the worker is ready, and
before enqueuing/waiting on the job block until that channel is signaled; update
references to prepareAndExecute and the preRunFn callback (and the test case
name "track waitgroup before enqueue to avoid negative counter") so the
worker-first scheduling is explicit and no sleep is used.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 67f83917-17ad-4d23-808b-c01597fdfea1

📥 Commits

Reviewing files that changed from the base of the PR and between c909831 and 6549f6f.

📒 Files selected for processing (1)
  • pkg/lightning/backend/local/job_worker_test.go

@pantheon-ai
Copy link
Copy Markdown

pantheon-ai bot commented Mar 5, 2026

Review Complete

Findings: 0 issues
Posted: 0
Duplicates/Skipped: 0

ℹ️ Learn more details on Pantheon AI.

@ti-chi-bot ti-chi-bot bot added size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Mar 5, 2026
@codecov
Copy link
Copy Markdown

codecov bot commented Mar 5, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 78.2657%. Comparing base (c909831) to head (f4b1b1e).
⚠️ Report is 2 commits behind head on master.

Additional details and impacted files
@@               Coverage Diff                @@
##             master     #66703        +/-   ##
================================================
+ Coverage   77.6902%   78.2657%   +0.5755%     
================================================
  Files          2008       1938        -70     
  Lines        549382     536987     -12395     
================================================
- Hits         426816     420277      -6539     
+ Misses       120901     116275      -4626     
+ Partials       1665        435      -1230     
Flag Coverage Δ
integration 44.0730% <ø> (-4.1172%) ⬇️
unit 76.6859% <ø> (+0.3404%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Components Coverage Δ
dumpling 56.7974% <ø> (ø)
parser ∅ <ø> (∅)
br 48.8279% <ø> (-12.0552%) ⬇️
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@D3Hunter
Copy link
Copy Markdown
Contributor Author

D3Hunter commented Mar 5, 2026

/retest

@tiprow
Copy link
Copy Markdown

tiprow bot commented Mar 5, 2026

@D3Hunter: PRs from untrusted users cannot be marked as trusted with /ok-to-test in this repo meaning untrusted PR authors can never trigger tests themselves. Collaborators can still trigger tests on the PR using /test.

Details

In response to this:

/retest

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@ti-chi-bot
Copy link
Copy Markdown

ti-chi-bot bot commented Mar 5, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: GMHDBJD, joechenrh

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot ti-chi-bot bot added lgtm and removed needs-1-more-lgtm Indicates a PR needs 1 more LGTM. labels Mar 5, 2026
@ti-chi-bot
Copy link
Copy Markdown

ti-chi-bot bot commented Mar 5, 2026

[LGTM Timeline notifier]

Timeline:

  • 2026-03-05 07:14:49.436589407 +0000 UTC m=+428734.014668591: ☑️ agreed by GMHDBJD.
  • 2026-03-05 08:40:05.602090826 +0000 UTC m=+433850.180170020: ☑️ agreed by joechenrh.

Copy link
Copy Markdown

@pantheon-ai pantheon-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Code looks good. No issues found.

@D3Hunter
Copy link
Copy Markdown
Contributor Author

D3Hunter commented Mar 5, 2026

/retest

@tiprow
Copy link
Copy Markdown

tiprow bot commented Mar 5, 2026

@D3Hunter: PRs from untrusted users cannot be marked as trusted with /ok-to-test in this repo meaning untrusted PR authors can never trigger tests themselves. Collaborators can still trigger tests on the PR using /test.

Details

In response to this:

/retest

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@D3Hunter
Copy link
Copy Markdown
Contributor Author

D3Hunter commented Mar 5, 2026

/retest

@tiprow
Copy link
Copy Markdown

tiprow bot commented Mar 5, 2026

@D3Hunter: PRs from untrusted users cannot be marked as trusted with /ok-to-test in this repo meaning untrusted PR authors can never trigger tests themselves. Collaborators can still trigger tests on the PR using /test.

Details

In response to this:

/retest

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@ti-chi-bot ti-chi-bot bot merged commit 4d4043c into pingcap:master Mar 5, 2026
31 of 32 checks passed
@D3Hunter D3Hunter deleted the codex/fix-issue-66702-regression branch March 5, 2026 13:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved lgtm release-note-none Denotes a PR that doesn't merit a release note. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

lightning: TestRegionJobBaseWorker is flaky in nextgen CI (sync: negative WaitGroup counter)

3 participants