lightning: restore two-level split/scatter while keeping limiter behavior | tidb-test=release-8.1.2#66312
Conversation
|
Hi @OliverS929. Thanks for your PR. PRs from untrusted users cannot be marked as trusted with I understand the commands that are listed here. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
/test mysql-test |
|
@OliverS929: PRs from untrusted users cannot be marked as trusted with DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## feature/release-8.1-gsort-test #66312 +/- ##
===================================================================
Coverage ? 71.0029%
===================================================================
Files ? 1479
Lines ? 427509
Branches ? 0
===================================================================
Hits ? 303544
Misses ? 103338
Partials ? 20627
Flags with carried forward coverage won't be shown. Click here to find out more.
🚀 New features to boost your workflow:
|
|
@pantheon-ai review |
this one https://github.com/apps/pantheon-ai |
| limiter = rate.NewLimiter(rate.Limit(eventLimit), burstPerSec*ratePerSecMultiplier) | ||
| batchCnt = min(batchCnt, burstPerSec) | ||
| } | ||
| if len(splitKeys) > 100 { |
There was a problem hiding this comment.
maybe smaller, such as 64, in nextgen, each region is about 1g, our current subtask size is around 100g, so nearly 100 regions, and it might not trigger this
also make it constant
There was a problem hiding this comment.
I changed it to a named threshold constant and lowered to 64. PTAL.
| type splitAndScatterCounterClient struct { | ||
| split.SplitClient | ||
| splitCount atomic.Int32 | ||
| } |
There was a problem hiding this comment.
maybe generate mock impl for split.SplitClient to make adding test easier, see example in gen_mock of Makefile
There was a problem hiding this comment.
I take the approach to remove one-off split client test structs and reused/extened existing testSplitClient with callback injection added. PTAL.
|
@pantheon-ai[bot] PTAL -- update
Don't know how to trigger it. The |
|
remember to pick to master branch which also doesn't have it now |
|
@pantheon-bot review this pr |
|
Hi @D3Hunter, You're not in the allow list for this repository. Only users in the allow list can trigger Pantheon here. Please contact the repo maintainer if you need access. Learn more about Pantheon AI |
|
@pantheon-bot review this pr |
|
Hi @wjhuang2016, You're not in the allow list for this repository. Only users in the allow list can trigger Pantheon here. Please contact the repo maintainer if you need access. Learn more about Pantheon AI |
Yeah will do once this PR is merged. |
|
@pantheon-bot review this pr |
|
Hi @8rosemary8, You're not in the allow list for this repository. Only users in the allow list can trigger Pantheon here. Please contact the repo maintainer if you need access. Learn more about Pantheon AI |
|
@pantheon-bot review |
|
This PR restores Lightning’s two-level split/scatter (coarse + fine) in
|
and avoid ad-hoc one-off stubs.
| "golang.org/x/time/rate" | ||
| ) | ||
|
|
||
| const coarseGrainedSplitKeysThreshold = 64 |
There was a problem hiding this comment.
maybe comment the reason why we choose it
|
/cherry-pick master |
|
@D3Hunter: once the present PR merges, I will cherry-pick it on top of master in the new PR and assign it to you. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository. |
|
@lance6716 ptal |
| return local.splitAndScatterRegionInBatchesWithLimiter(ctx, splitKeys, batchCnt, limiter) | ||
| } | ||
|
|
||
| func getCoarseGrainedSplitKeys(splitKeys [][]byte) [][]byte { |
There was a problem hiding this comment.
Corner Case Analysis: getCoarseGrainedSplitKeys
I've analyzed this function for corner cases and found one critical bug and several observations:
🔴 P0 - Critical: Empty Input Panic
Issue: When splitKeys is empty (nil or len==0), the function panics with index out of range:
// When len(splitKeys) == 0:
sqrtCnt := int(math.Sqrt(0)) = 0
// Loop doesn't execute (i < 0 is false)
// But then this executes:
if i-sqrtCnt != len(splitKeys)-1 // 0 - 0 != -1 is true
splitKeys[len(splitKeys)-1] // splitKeys[-1] → PANIC!Test confirmation: I created and ran comprehensive tests that confirmed this panic.
Current protection: The caller only invokes this when len(splitKeys) > 100, so it's not reachable in production currently. However, this is fragile if:
- The function is called directly elsewhere
- Future refactoring changes the guard condition
- Unit tests call it directly
Recommendation: Add defensive guard:
func getCoarseGrainedSplitKeys(splitKeys [][]byte) [][]byte {
if len(splitKeys) == 0 {
return nil
}
// ... rest of function
}✅ All Other Cases Work Correctly
Tested extensively with inputs ranging from n=1 to n=122:
- ✅ No hangs detected
- ✅ Last key always included exactly once
- ✅ No duplicate keys introduced
- ✅ Output maintains sorted order
📝 Minor: Test Comment Needs Update
In the test comment:
// 121 keys => coarse pass(11 keys, 1 batch)Should be:
// 121 keys => coarse pass(12 keys, 1 batch)The function returns 12 coarse keys for n=121, not 11 (indices: 0,11,22,...,110,120).
📊 Test Coverage
The existing test covers the "last key not duplicated" case (n=122) well. Consider adding:
- Empty input test (currently would panic)
- Case where last key must be appended (e.g., n=100, n=121)
I can provide the test code if needed.
There was a problem hiding this comment.
In current call path, empty splitKeys cannot reach this helper because caller has len(splitKeys) > coarseGrainedSplitKeysThreshold guard. But I can add an explicit len==0 guard here improves helper robustness and future-proofing, if that's a must for other reviewers.
|
/retest |
|
@OliverS929: PRs from untrusted users cannot be marked as trusted with DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: D3Hunter, lance6716 The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
4ffecda
into
pingcap:feature/release-8.1-gsort-test
|
@D3Hunter: new pull request created to branch DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository. |
Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>

What problem does this PR solve?
Issue Number: ref #66311
Problem Summary:
#59609introduced two-level split/scatter (coarse + fine) to reduce region concentration during large import workloads.#62419introduced split/ingest rate limiting but removed the coarse split/scatter stage.What changed and how does it work?
pkg/lightning/backend/local/localhelper.go:len(splitKeys) > 100#62419limiter behavior and apply it to both levels.pkg/lightning/backend/local/localhelper_test.go:Check List
Tests
Side effects
Documentation
Release note