lightning: restore two-level split/scatter while keeping limiter behavior | tidb-test=release-8.1.2 by OliverS929 · Pull Request #66312 · pingcap/tidb

OliverS929 · 2026-02-19T14:00:34Z

What problem does this PR solve?

Issue Number: ref #66311

Problem Summary:

#59609 introduced two-level split/scatter (coarse + fine) to reduce region concentration during large import workloads.
#62419 introduced split/ingest rate limiting but removed the coarse split/scatter stage.
With large split key sets, this can increase split concentration and skew region distribution.

What changed and how does it work?

Restore two-level split/scatter in pkg/lightning/backend/local/localhelper.go:
- run coarse split/scatter first when len(splitKeys) > 100
- then run fine-grained split/scatter on all split keys
Keep #62419 limiter behavior and apply it to both levels.
- both coarse and fine stages share the same limiter path
- batch cap by burst is still preserved
Add/extend unit tests in pkg/lightning/backend/local/localhelper_test.go:
- large key set triggers coarse + fine
- small key set remains fine-only
- coarse-stage failure returns immediately
- limiter is still enforced after two-level restoration

Check List

Tests

Unit test
Integration test
Manual test (add detailed scripts or steps below)
No need to test

Side effects

Performance regression: Consumes more CPU
Performance regression: Consumes more Memory
Breaking backward compatibility

Documentation

Release note

Lightning: restore two-level split/scatter before ingest and keep rate limiting for both coarse and fine stages.


tiprow · 2026-02-19T14:00:54Z

Hi @OliverS929. Thanks for your PR.

PRs from untrusted users cannot be marked as trusted with /ok-to-test in this repo meaning untrusted PR authors can never trigger tests themselves. Collaborators can still trigger tests on the PR using /test all.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

OliverS929 · 2026-02-19T14:28:27Z

/test mysql-test

tiprow · 2026-02-19T14:28:51Z

@OliverS929: PRs from untrusted users cannot be marked as trusted with /ok-to-test in this repo meaning untrusted PR authors can never trigger tests themselves. Collaborators can still trigger tests on the PR using /test.

Details

In response to this:

/test mysql-test

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

codecov · 2026-02-19T14:36:01Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
⚠️ Please upload report for BASE (feature/release-8.1-gsort-test@93fa53f). Learn more about missing BASE report.

Additional details and impacted files

@@                         Coverage Diff                         @@
##             feature/release-8.1-gsort-test     #66312   +/-   ##
===================================================================
  Coverage                                  ?   71.0029%           
===================================================================
  Files                                     ?       1479           
  Lines                                     ?     427509           
  Branches                                  ?          0           
===================================================================
  Hits                                      ?     303544           
  Misses                                    ?     103338           
  Partials                                  ?      20627

Flag	Coverage Δ
unit	`71.0029% <100.0000%> (?)`

Flags with carried forward coverage won't be shown. Click here to find out more.

Components	Coverage Δ
dumpling	`52.9656% <0.0000%> (?)`
parser	`∅ <0.0000%> (?)`
br	`41.5992% <0.0000%> (?)`



D3Hunter · 2026-02-24T02:20:44Z

@pantheon-ai review

lance6716 · 2026-02-24T02:33:18Z

@pantheon-ai review

this one https://github.com/apps/pantheon-ai

D3Hunter · 2026-02-24T03:10:06Z

pkg/lightning/backend/local/localhelper.go

 		limiter = rate.NewLimiter(rate.Limit(eventLimit), burstPerSec*ratePerSecMultiplier)
 		batchCnt = min(batchCnt, burstPerSec)
 	}
+	if len(splitKeys) > 100 {


maybe smaller, such as 64, in nextgen, each region is about 1g, our current subtask size is around 100g, so nearly 100 regions, and it might not trigger this

also make it constant

I changed it to a named threshold constant and lowered it to 64. PTAL.

D3Hunter · 2026-02-24T03:11:10Z

pkg/lightning/backend/local/localhelper_test.go

+type splitAndScatterCounterClient struct {
+	split.SplitClient
+	splitCount atomic.Int32
+}


maybe generate mock impl for split.SplitClient to make adding test easier, see example in gen_mock of Makefile

I took the approach of removing the one-off split client test structs and reusing/extending the existing testSplitClient, with callback injection added. PTAL.

lance6716 · 2026-02-24T03:15:15Z

@pantheon-ai[bot] PTAL

-- update

Don't know how to trigger it. The [bot] part of the name is not included 🤔 @wjhuang2016 Can you tell us?

D3Hunter · 2026-02-24T03:17:29Z

remember to pick to master branch which also doesn't have it now

D3Hunter · 2026-02-24T03:18:49Z

@pantheon-bot review this pr

pantheon-ai · 2026-02-24T03:18:53Z

Hi @D3Hunter,

You're not in the allow list for this repository. Only users in the allow list can trigger Pantheon here. Please contact the repo maintainer if you need access.


wjhuang2016 · 2026-02-24T03:23:55Z

@pantheon-bot review this pr

pantheon-ai · 2026-02-24T03:23:59Z

Hi @wjhuang2016,

You're not in the allow list for this repository. Only users in the allow list can trigger Pantheon here. Please contact the repo maintainer if you need access.


OliverS929 · 2026-02-24T03:36:50Z

remember to pick to master branch which also doesn't have it now

Yeah will do once this PR is merged.

8rosemary8 · 2026-02-24T03:44:06Z

@pantheon-bot review this pr

pantheon-ai · 2026-02-24T03:44:09Z

Hi @8rosemary8,

You're not in the allow list for this repository. Only users in the allow list can trigger Pantheon here. Please contact the repo maintainer if you need access.


D3Hunter · 2026-02-24T03:52:59Z

@pantheon-bot review

pantheon-ai · 2026-02-24T03:53:05Z

This PR restores Lightning’s two-level split/scatter (coarse + fine) in localhelper.go: coarse split/scatter runs first when len(splitKeys) > 100, followed by fine-grained split/scatter on all keys, to reduce region concentration for large imports. It preserves the split/ingest limiter behavior from #62419 and applies it consistently to both stages (including burst caps). Unit tests were added/extended to cover large vs small key sets, coarse-stage failure handling, and limiter enforcement after the restoration.



D3Hunter

rest lgtm

D3Hunter · 2026-02-24T05:59:08Z

pkg/lightning/backend/local/localhelper.go

 	"golang.org/x/time/rate"
 )

+const coarseGrainedSplitKeysThreshold = 64


maybe comment the reason why we choose it
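
A commented constant along these lines would capture the reasoning from the earlier thread. The wording of the comment and the helper name `needsCoarseStage` are illustrative assumptions; the sizes are the approximate figures quoted in the review, not measured values.

```go
package main

import "fmt"

// coarseGrainedSplitKeysThreshold is the key count above which the coarse
// stage runs. Rationale taken from the review thread: with roughly 1 GiB
// regions and roughly 100 GiB subtasks, a subtask yields about 100 split
// keys, so a threshold of 100 might never trigger; 64 leaves headroom.
const coarseGrainedSplitKeysThreshold = 64

// needsCoarseStage is a hypothetical helper name wrapping the guard.
func needsCoarseStage(splitKeyCount int) bool {
	return splitKeyCount > coarseGrainedSplitKeysThreshold
}

func main() {
	const subtaskGiB, regionGiB = 100, 1 // approximate sizes quoted in the review
	approxKeys := subtaskGiB / regionGiB
	// With the old threshold, "approxKeys > 100" is false for exactly 100 keys.
	fmt.Println(approxKeys, approxKeys > 100, needsCoarseStage(approxKeys))
}
```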

D3Hunter · 2026-02-24T05:59:30Z

/cherry-pick master

ti-chi-bot · 2026-02-24T05:59:32Z

@D3Hunter: once the present PR merges, I will cherry-pick it on top of master in the new PR and assign it to you.

Details

In response to this:

/cherry-pick master

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository.

D3Hunter · 2026-02-24T06:29:01Z

@lance6716 ptal

pantheon-ai · 2026-02-24T07:11:03Z

pkg/lightning/backend/local/localhelper.go

+	return local.splitAndScatterRegionInBatchesWithLimiter(ctx, splitKeys, batchCnt, limiter)
+}
+
+func getCoarseGrainedSplitKeys(splitKeys [][]byte) [][]byte {


Corner Case Analysis: getCoarseGrainedSplitKeys

I've analyzed this function for corner cases and found one critical bug and several observations:

🔴 P0 - Critical: Empty Input Panic

Issue: When splitKeys is empty (nil or len==0), the function panics with index out of range:

// When len(splitKeys) == 0:
sqrtCnt := int(math.Sqrt(0)) // = 0
// Loop doesn't execute (i < 0 is false)
// But then this executes:
if i-sqrtCnt != len(splitKeys)-1 // 0 - 0 != -1 is true
splitKeys[len(splitKeys)-1] // splitKeys[-1] → PANIC!

Test confirmation: I created and ran comprehensive tests that confirmed this panic.

Current protection: The caller only invokes this when len(splitKeys) > 100, so it's not reachable in production currently. However, this is fragile if:

The function is called directly elsewhere

Future refactoring changes the guard condition

Unit tests call it directly

Recommendation: Add defensive guard:

func getCoarseGrainedSplitKeys(splitKeys [][]byte) [][]byte {
	if len(splitKeys) == 0 {
		return nil
	}
	// ... rest of function
}

✅ All Other Cases Work Correctly

Tested extensively with inputs ranging from n=1 to n=122:

✅ No hangs detected

✅ Last key always included exactly once

✅ No duplicate keys introduced

✅ Output maintains sorted order

📝 Minor: Test Comment Needs Update

In the test comment:

// 121 keys => coarse pass(11 keys, 1 batch)

Should be:

// 121 keys => coarse pass(12 keys, 1 batch)

The function returns 12 coarse keys for n=121, not 11 (indices: 0,11,22,...,110,120).

📊 Test Coverage

The existing test covers the "last key not duplicated" case (n=122) well. Consider adding:

Empty input test (currently would panic)

Case where last key must be appended (e.g., n=100, n=121)

I can provide the test code if needed.

In the current call path, empty splitKeys cannot reach this helper because the caller has the len(splitKeys) > coarseGrainedSplitKeysThreshold guard. I can still add an explicit len==0 guard here to improve the helper's robustness and future-proofing, if other reviewers consider that a must.
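
To make the corner cases concrete, here is a sketch of the sampling logic as it can be reconstructed from the analysis above, with the suggested guard included. It is not the merged code: `coarseKeys` and `makeKeys` are hypothetical names, and the loop shape is inferred from the panic trace in the review comment.

```go
package main

import (
	"fmt"
	"math"
)

// coarseKeys keeps every sqrt(n)-th key and makes sure the last key is
// included exactly once (reconstruction for illustration only).
func coarseKeys(splitKeys [][]byte) [][]byte {
	if len(splitKeys) == 0 {
		return nil // defensive guard: avoids indexing splitKeys[-1]
	}
	step := int(math.Sqrt(float64(len(splitKeys)))) // >= 1 because len >= 1
	out := make([][]byte, 0, len(splitKeys)/step+1)
	i := 0
	for ; i < len(splitKeys); i += step {
		out = append(out, splitKeys[i])
	}
	// i-step is the last index the loop picked; append the tail key if missed.
	if i-step != len(splitKeys)-1 {
		out = append(out, splitKeys[len(splitKeys)-1])
	}
	return out
}

// makeKeys builds n sorted dummy keys for the demo.
func makeKeys(n int) [][]byte {
	keys := make([][]byte, n)
	for i := range keys {
		keys[i] = []byte(fmt.Sprintf("k%04d", i))
	}
	return keys
}

func main() {
	// Prints the input size and the number of coarse keys picked.
	for _, n := range []int{0, 1, 100, 121, 122} {
		fmt.Println(n, len(coarseKeys(makeKeys(n))))
	}
}
```

For n=121 the loop picks indices 0, 11, ..., 110 and the tail key 120 is appended, giving 12 keys; for n=122 the loop already lands on index 121, so nothing is appended.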

OliverS929 · 2026-02-24T08:56:28Z

/retest

tiprow · 2026-02-24T08:56:52Z

@OliverS929: PRs from untrusted users cannot be marked as trusted with /ok-to-test in this repo meaning untrusted PR authors can never trigger tests themselves. Collaborators can still trigger tests on the PR using /test.

Details

In response to this:

/retest

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

ti-chi-bot · 2026-02-24T12:57:55Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: D3Hunter, lance6716

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details

Needs approval from an approver in each of these files:

~~pkg/lightning/OWNERS~~ [D3Hunter,lance6716]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

ti-chi-bot · 2026-02-24T12:57:58Z

[LGTM Timeline notifier]

Timeline:

2026-02-24 05:59:17.197677109 +0000 UTC m=+161829.712471718: ☑️ agreed by D3Hunter.
2026-02-24 12:57:56.610609328 +0000 UTC m=+186949.125403956: ☑️ agreed by lance6716.

ti-chi-bot · 2026-02-24T13:07:10Z

@D3Hunter: new pull request created to branch master: #66354.
But this PR has conflicts, please resolve them!

Details

In response to this:

/cherry-pick master

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository.

Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>

lightning: restore two-level split/scatter with limiter and add related regression tests.

8263c0e

ti-chi-bot bot added release-note Denotes a PR that will be considered when it comes time to generate release notes. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Feb 19, 2026

OliverS929 requested review from D3Hunter, lance6716 and tangenta February 19, 2026 14:03

OliverS929 changed the title ~~lightning: restore two-level split/scatter while keeping limiter behavior~~ lightning: restore two-level split/scatter while keeping limiter behavior | tidb-test=release-8.1.2 Feb 19, 2026

lance6716 reviewed Feb 21, 2026


lightning: fix duplicate tail key in coarse split keys.

289be2f

D3Hunter reviewed Feb 24, 2026


lightning: Changed to use a named threshold constant and lowered to 64, and avoid ad-hoc one-off stubs.

a524a09

D3Hunter approved these changes Feb 24, 2026


ti-chi-bot bot added the needs-1-more-lgtm Indicates a PR needs 1 more LGTM. label Feb 24, 2026

ti-chi-bot bot added the approved label Feb 24, 2026

pantheon-ai bot reviewed Feb 24, 2026


lightning: document rationale for coarse split threshold 64

c1be232

lance6716 approved these changes Feb 24, 2026


ti-chi-bot bot added the lgtm label Feb 24, 2026

ti-chi-bot bot removed the needs-1-more-lgtm Indicates a PR needs 1 more LGTM. label Feb 24, 2026

ti-chi-bot bot merged commit 4ffecda into pingcap:feature/release-8.1-gsort-test Feb 24, 2026
18 of 19 checks passed

ti-chi-bot mentioned this pull request Feb 24, 2026

ingest: use 2 level split & scatter #66354

Merged

13 tasks

ti-chi-bot pushed a commit to ti-chi-bot/tidb that referenced this pull request Feb 24, 2026

This is an automated cherry-pick of pingcap#66312

b996066

Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>

OliverS929 deleted the feature/release-8.1-gsort-test branch February 27, 2026 03:47
