lightning: TestRegionJobBaseWorker is flaky in nextgen CI (sync: negative WaitGroup counter) #66702

@D3Hunter

Bug Report

1. Minimal reproduce step (Required)

Observed in CI run pull_unit_test_next_gen #11975 (see Test context below).

The failure happens in:

  • //pkg/lightning/backend/local:local_test (shard 27 of 50)
  • TestRegionJobBaseWorker/if_the_region_has_no_leader,_rescan_the_region

The first attempt failed with panic:

  • sync: negative WaitGroup counter
  • stack includes:
    • pkg/lightning/backend/local/job_worker.go:95 (mockJobWgDone failpoint path)
    • pkg/lightning/backend/local/region_job.go:312 (regionJob.done -> wg.Done)

The same shard then passed on retry, and Bazel marked the target FLAKY.

Root-cause analysis from code path:

  • In test helper prepareAndExecute:
    • jobInCh <- job is executed before jobWg.Add(1).
  • In the no-leader subtest, failpoint mockJobWgDone is set to return(3).
  • The worker can pick up the job immediately and execute w.jobWg.Add(-3) before the producer goroutine executes jobWg.Add(1).
  • If this interleaving happens, the WaitGroup counter goes negative and the sync package panics.
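The interleaving described above can be replayed deterministically in isolation. The following is a minimal sketch, not the actual test code: `simulate` is a hypothetical helper, and it uses a plain `Done()` instead of the `mockJobWgDone` failpoint's `Add(-3)`. It only shows that a worker-side decrement that runs before the producer's `Add(1)` makes `sync.WaitGroup` panic.

```go
package main

import (
	"fmt"
	"sync"
)

// simulate replays one producer/worker ordering on a fresh WaitGroup,
// sequentially, so the outcome does not depend on the scheduler.
// workerFirst=true models the worker decrementing before the producer's Add(1).
// It returns the recovered panic message, or "" if nothing panicked.
// (Hypothetical helper for illustration; not part of the TiDB code base.)
func simulate(workerFirst bool) (msg string) {
	defer func() {
		if r := recover(); r != nil {
			msg = fmt.Sprint(r)
		}
	}()

	var wg sync.WaitGroup
	if workerFirst {
		wg.Done() // counter would go from 0 to -1, so this panics
		wg.Add(1) // the producer's accounting arrives too late
	} else {
		wg.Add(1) // producer accounts for the job first
		wg.Done() // worker's decrement brings the counter back to 0
	}
	wg.Wait()
	return ""
}

func main() {
	fmt.Printf("worker first:   %q\n", simulate(true))
	fmt.Printf("producer first: %q\n", simulate(false))
}
```

In the real test the two operations run in different goroutines, so the failing order only shows up under some schedules, which matches the flaky behavior seen in CI.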

Relevant code:

  • pkg/lightning/backend/local/job_worker_test.go:134-135
  • pkg/lightning/backend/local/job_worker.go:93-96
  • pkg/lightning/backend/local/region_job.go:312

2. What did you expect to see? (Required)

TestRegionJobBaseWorker should be deterministic and should not panic with sync: negative WaitGroup counter.

3. What did you see instead (Required)

A flaky panic in CI:

  • The first attempt fails with sync: negative WaitGroup counter.
  • The retry passes, and the test target is reported as FLAKY.

4. What is your TiDB version? (Required)

N/A for SQL runtime (this is a unit-test-only failure in CI).

Test context:

  • TiDB repo master nextgen unit test pipeline
  • CI run pull_unit_test_next_gen #11975 on 2026-03-05

Potential fix direction:

  • In prepareAndExecute, move jobWg.Add(1) before jobInCh <- job to avoid race between producer accounting and worker-side decrement/failpoint behavior.
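The proposed ordering can be sketched with a standalone producer and worker. This is an illustration under assumptions, not a patch for `prepareAndExecute`: `runJobs` and `jobCh` are hypothetical names, and the worker calls a plain `Done()` rather than going through the failpoint path. The point is only that `Add(1)` happens before the channel send, so the worker can never decrement a job that has not been counted yet.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// runJobs sends n jobs to a single worker and returns how many were processed.
// (Hypothetical helper for illustration; names are not from the TiDB code base.)
func runJobs(n int) int64 {
	var (
		wg        sync.WaitGroup
		processed int64
	)
	jobCh := make(chan int)

	// Worker: handle each job, then mark it done.
	go func() {
		for range jobCh {
			atomic.AddInt64(&processed, 1)
			wg.Done()
		}
	}()

	// Producer: account for the job BEFORE handing it to the worker.
	for i := 0; i < n; i++ {
		wg.Add(1)
		jobCh <- i
	}

	wg.Wait()    // returns only after every counted job was marked done
	close(jobCh) // lets the worker goroutine exit
	return atomic.LoadInt64(&processed)
}

func main() {
	fmt.Println("processed:", runJobs(100))
}
```

With this order, the counter is already incremented by the time the worker can receive the job, regardless of how the goroutines are scheduled.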
