Bug Report
1. Minimal reproduce step (Required)
Observed in CI run: pull_unit_test_next_gen #11975 (2026-03-05)
Failure happens in:
//pkg/lightning/backend/local:local_test (shard 27 of 50)
TestRegionJobBaseWorker/if_the_region_has_no_leader,_rescan_the_region
The first attempt failed with panic:
sync: negative WaitGroup counter
Stack includes:
- pkg/lightning/backend/local/job_worker.go:95 (mockJobWgDone failpoint path)
- pkg/lightning/backend/local/region_job.go:312 (regionJob.done -> wg.Done)
Same shard then passed on retry and Bazel marked it FLAKY.
Root-cause analysis from code path:
- In the test helper prepareAndExecute, jobInCh <- job is executed before jobWg.Add(1).
- In the no-leader subtest, the failpoint mockJobWgDone is set to return(3).
- The worker can process the job immediately and execute w.jobWg.Add(-3) before the producer goroutine executes jobWg.Add(1).
- If this interleaving happens, the WaitGroup counter goes negative and panics.
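The ordering problem can be shown in isolation with a minimal sketch. It is a simplified model, not the TiDB code: applyInOrder is a hypothetical helper, and the deltas -1 and 1 stand in for a worker-side decrement and the producer's Add(1). The actual counter arithmetic around mockJobWgDone is not reproduced here.

```go
package main

import (
	"fmt"
	"sync"
)

// applyInOrder applies each delta to a fresh WaitGroup in the given order and
// returns the panic message, or "" if no panic occurred.
// Hypothetical helper for illustration; it is not part of the TiDB code.
func applyInOrder(deltas ...int) (msg string) {
	defer func() {
		if r := recover(); r != nil {
			msg = fmt.Sprint(r)
		}
	}()
	var wg sync.WaitGroup
	for _, d := range deltas {
		wg.Add(d)
	}
	return ""
}

func main() {
	// Worker-side decrement wins the race: the counter drops below zero.
	fmt.Printf("decrement before Add(1): %q\n", applyInOrder(-1, 1))
	// Producer accounts for the job first: the counter never goes negative.
	fmt.Printf("Add(1) before decrement: %q\n", applyInOrder(1, -1))
}
```

Only the first ordering panics, which is consistent with a failure that appears on one attempt and disappears on retry, since the goroutine schedule decides which ordering occurs.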
Relevant code:
pkg/lightning/backend/local/job_worker_test.go:134-135
pkg/lightning/backend/local/job_worker.go:93-96
pkg/lightning/backend/local/region_job.go:312
2. What did you expect to see? (Required)
TestRegionJobBaseWorker should be deterministic and should not panic with sync: negative WaitGroup counter.
3. What did you see instead (Required)
A flaky panic in CI:
- The first attempt fails with sync: negative WaitGroup counter.
- The retry passes, and the test target is reported as FLAKY.
4. What is your TiDB version? (Required)
N/A for SQL runtime (this is a unit-test-only failure in CI).
Test context:
- TiDB repo master nextgen unit test pipeline
- CI run: pull_unit_test_next_gen #11975 on 2026-03-05
Potential fix direction:
- In prepareAndExecute, move jobWg.Add(1) before jobInCh <- job to avoid the race between producer accounting and the worker-side decrement/failpoint behavior.