pkg/lightning: fix TestRegionJobBaseWorker waitgroup race by D3Hunter · Pull Request #66703 · pingcap/tidb

D3Hunter · 2026-03-05T07:05:19Z

What problem does this PR solve?

Issue Number: close #66702

Problem Summary:

TestRegionJobBaseWorker/if_the_region_has_no_leader,_rescan_the_region is flaky in nextgen CI. In test helper prepareAndExecute, jobInCh <- job happened before jobWg.Add(1), so worker-side Done could race ahead and trigger sync: negative WaitGroup counter.

What changed and how does it work?

Reordered test helper accounting to call jobWg.Add(1) before jobInCh <- job.
This guarantees waitgroup accounting is established before workers can decrement it.

Check List

Tests

Unit test
Integration test
Manual test (add detailed scripts or steps below)
No need to test
- I checked and no code files have been changed.

Side effects

Performance regression: Consumes more CPU
Performance regression: Consumes more Memory
Breaking backward compatibility

Documentation

Release note

Please refer to Release Notes Language Style Guide to write a quality release note.

None

Summary by CodeRabbit

Tests
- Improved job worker test to tighten synchronization around enqueue timing, reducing flakiness for timing-sensitive scenarios and ensuring more reliable verification of job enqueueing and processing behavior.

ti-chi-bot · 2026-03-05T07:05:21Z

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

coderabbitai · 2026-03-05T07:05:28Z

📝 Walkthrough

Walkthrough

Moved the WaitGroup increment to occur before enqueuing the job and added an optional after-send callback to the test helper to control timing, preventing a race that could cause a negative WaitGroup counter in the job worker tests.

Changes

Cohort / File(s)	Summary
Job Worker Test `pkg/lightning/backend/local/job_worker_test.go`	Moved `jobWg.Add(1)` to execute before `jobInCh <- job`; added an `afterSendFn` callback parameter to `prepareAndExecute` to allow deferred actions after sending; updated imports and test usage accordingly to avoid WaitGroup race that could lead to negative counter.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

I am a rabbit in the CI night,
I nudge the counter to set it right,
Send then wait, no panic in sight,
Tests now hop steady, calm, and light. 🐇✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title clearly and specifically identifies the main fix: reordering WaitGroup operations to eliminate a race condition in the TestRegionJobBaseWorker test.
Linked Issues check	✅ Passed	The PR directly addresses the root cause identified in `#66702` by reordering jobWg.Add(1) before jobInCh <- job, preventing the negative WaitGroup counter panic described in the issue.
Out of Scope Changes check	✅ Passed	All changes are scoped to fixing the identified race condition in TestRegionJobBaseWorker; no unrelated modifications are present in the test file changes.
Description check	✅ Passed	The pull request description follows the required template with complete sections: issue number, problem summary, what changed, test checklist, and release notes.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

tiprow · 2026-03-05T07:05:42Z

Hi @D3Hunter. Thanks for your PR.

PRs from untrusted users cannot be marked as trusted with /ok-to-test in this repo meaning untrusted PR authors can never trigger tests themselves. Collaborators can still trigger tests on the PR using /test all.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

pkg/lightning/backend/local/job_worker_test.go

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/lightning/backend/local/job_worker_test.go`:
- Around line 164-171: The test uses time.Sleep in the t.Run case to wait for
worker startup; replace that with a channel handshake to make startup
deterministic: create a chan struct{} in the test, pass a preRunFn to
prepareAndExecute that closes/signals the channel when the worker is ready, and
before enqueuing/waiting on the job block until that channel is signaled; update
references to prepareAndExecute and the preRunFn callback (and the test case
name "track waitgroup before enqueue to avoid negative counter") so the
worker-first scheduling is explicit and no sleep is used.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 67f83917-17ad-4d23-808b-c01597fdfea1

📥 Commits

Reviewing files that changed from the base of the PR and between c909831 and 6549f6f.

📒 Files selected for processing (1)

pkg/lightning/backend/local/job_worker_test.go

pkg/lightning/backend/local/job_worker_test.go

pantheon-ai · 2026-03-05T07:29:21Z

Review Complete

Findings: 0 issues
Posted: 0
Duplicates/Skipped: 0

_{ℹ️ Learn more details on Pantheon AI.}

codecov · 2026-03-05T07:52:36Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 78.2657%. Comparing base (c909831) to head (f4b1b1e).
⚠️ Report is 2 commits behind head on master.

Additional details and impacted files

@@               Coverage Diff                @@
##             master     #66703        +/-   ##
================================================
+ Coverage   77.6902%   78.2657%   +0.5755%     
================================================
  Files          2008       1938        -70     
  Lines        549382     536987     -12395     
================================================
- Hits         426816     420277      -6539     
+ Misses       120901     116275      -4626     
+ Partials       1665        435      -1230

Flag	Coverage Δ
integration	`44.0730% <ø> (-4.1172%)`	⬇️
unit	`76.6859% <ø> (+0.3404%)`	⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Components	Coverage Δ
dumpling	`56.7974% <ø> (ø)`
parser	`∅ <ø> (∅)`
br	`48.8279% <ø> (-12.0552%)`	⬇️

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

D3Hunter · 2026-03-05T07:57:45Z

/retest

tiprow · 2026-03-05T07:58:08Z

@D3Hunter: PRs from untrusted users cannot be marked as trusted with /ok-to-test in this repo meaning untrusted PR authors can never trigger tests themselves. Collaborators can still trigger tests on the PR using /test.

Details

In response to this:

/retest

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

ti-chi-bot · 2026-03-05T08:40:01Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: GMHDBJD, joechenrh

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details

Needs approval from an approver in each of these files:

~~pkg/lightning/OWNERS~~ [GMHDBJD]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

ti-chi-bot · 2026-03-05T08:40:07Z

[LGTM Timeline notifier]

Timeline:

2026-03-05 07:14:49.436589407 +0000 UTC m=+428734.014668591: ☑️ agreed by GMHDBJD.
2026-03-05 08:40:05.602090826 +0000 UTC m=+433850.180170020: ☑️ agreed by joechenrh.

pantheon-ai

✅ Code looks good. No issues found.

D3Hunter · 2026-03-05T09:52:05Z

/retest

tiprow · 2026-03-05T09:52:28Z

@D3Hunter: PRs from untrusted users cannot be marked as trusted with /ok-to-test in this repo meaning untrusted PR authors can never trigger tests themselves. Collaborators can still trigger tests on the PR using /test.

Details

In response to this:

/retest

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

D3Hunter · 2026-03-05T11:42:22Z

/retest

tiprow · 2026-03-05T11:42:46Z

@D3Hunter: PRs from untrusted users cannot be marked as trusted with /ok-to-test in this repo meaning untrusted PR authors can never trigger tests themselves. Collaborators can still trigger tests on the PR using /test.

Details

In response to this:

/retest

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

lightning: fix TestRegionJobBaseWorker waitgroup race

6549f6f

ti-chi-bot bot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. release-note-none Denotes a PR that doesn't merit a release note. labels Mar 5, 2026

ti-chi-bot bot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Mar 5, 2026

D3Hunter marked this pull request as ready for review March 5, 2026 07:08

ti-chi-bot bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 5, 2026

GMHDBJD approved these changes Mar 5, 2026

View reviewed changes

ti-chi-bot bot added approved needs-1-more-lgtm Indicates a PR needs 1 more LGTM. labels Mar 5, 2026

joechenrh reviewed Mar 5, 2026

View reviewed changes

pkg/lightning/backend/local/job_worker_test.go Outdated Show resolved Hide resolved

coderabbitai bot reviewed Mar 5, 2026

View reviewed changes

pkg/lightning/backend/local/job_worker_test.go Outdated Show resolved Hide resolved

lightning: keep minimal waitgroup race fix

f4b1b1e

ti-chi-bot bot added size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Mar 5, 2026

joechenrh approved these changes Mar 5, 2026

View reviewed changes

ti-chi-bot bot added lgtm and removed needs-1-more-lgtm Indicates a PR needs 1 more LGTM. labels Mar 5, 2026

pantheon-ai bot reviewed Mar 5, 2026

View reviewed changes

ti-chi-bot bot merged commit 4d4043c into pingcap:master Mar 5, 2026
31 of 32 checks passed

D3Hunter deleted the codex/fix-issue-66702-regression branch March 5, 2026 13:39

Conversation

D3Hunter commented Mar 5, 2026 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What problem does this PR solve?

What changed and how does it work?

Check List

Release note

Summary by CodeRabbit

Uh oh!

ti-chi-bot bot commented Mar 5, 2026

Uh oh!

coderabbitai bot commented Mar 5, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Poem

❌ Failed checks (1 warning)

Uh oh!

tiprow bot commented Mar 5, 2026

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

pantheon-ai bot commented Mar 5, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

codecov bot commented Mar 5, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

D3Hunter commented Mar 5, 2026

Uh oh!

tiprow bot commented Mar 5, 2026

Uh oh!

ti-chi-bot bot commented Mar 5, 2026

Uh oh!

ti-chi-bot bot commented Mar 5, 2026

[LGTM Timeline notifier]

Uh oh!

pantheon-ai bot left a comment

Choose a reason for hiding this comment

Uh oh!

D3Hunter commented Mar 5, 2026

Uh oh!

tiprow bot commented Mar 5, 2026

Uh oh!

D3Hunter commented Mar 5, 2026

Uh oh!

tiprow bot commented Mar 5, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

D3Hunter commented Mar 5, 2026 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Mar 5, 2026 •

edited

Loading

pantheon-ai bot commented Mar 5, 2026 •

edited

Loading

codecov bot commented Mar 5, 2026 •

edited

Loading