ingest: use 2 level split & scatter#66354

Merged
ti-chi-bot[bot] merged 4 commits into pingcap:master from ti-chi-bot:cherry-pick-66312-to-master
Mar 5, 2026

Conversation

@ti-chi-bot
Member

@ti-chi-bot ti-chi-bot commented Feb 24, 2026

This is an automated cherry-pick of #66312

### What problem does this PR solve?

Issue Number: ref #66311

Problem Summary:

NOTE: master doesn't have this feature, so there is nothing to RESTORE on this branch.

The original description follows.

  • #59609 ("lightning: use 2 level split & scatter | tidb-test=release-8.1.2") introduced two-level split/scatter (coarse + fine) to reduce region concentration during large import workloads.
  • #62419 introduced split/ingest rate limiting but removed the coarse split/scatter stage.
  • With large split key sets, this can increase split concentration and skew region distribution.

What changed and how does it work?

  • Restore two-level split/scatter in pkg/lightning/backend/local/localhelper.go:
    • run coarse split/scatter first when len(splitKeys) > 100
    • then run fine-grained split/scatter on all split keys
  • Keep #62419 limiter behavior and apply it to both levels.
    • both coarse and fine stages share the same limiter path
    • batch cap by burst is still preserved
  • Add/extend unit tests in pkg/lightning/backend/local/localhelper_test.go:
    • large key set triggers coarse + fine
    • small key set remains fine-only
    • coarse-stage failure returns immediately
    • limiter is still enforced after two-level restoration
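
The control flow listed above can be sketched roughly as follows. This is a minimal illustration, not the code in localhelper.go: `twoLevelSplit`, `splitFunc`, `everyNth`, and `errInjected` are hypothetical names, and the every-10th-key sampler is only a placeholder for whatever selection the PR actually uses. The threshold of 100 comes from the description.

```go
package main

import (
	"errors"
	"fmt"
)

// coarseThreshold mirrors the "len(splitKeys) > 100" rule in the description.
const coarseThreshold = 100

// errInjected is a made-up error used only to demonstrate the failure path.
var errInjected = errors.New("injected coarse failure")

// splitFunc is a hypothetical stand-in for the real split-and-scatter call.
type splitFunc func(keys [][]byte) error

// everyNth is a placeholder sampler; the PR's key selection may differ.
func everyNth(keys [][]byte, n int) [][]byte {
	var out [][]byte
	for i := n - 1; i < len(keys); i += n {
		out = append(out, keys[i])
	}
	return out
}

// twoLevelSplit runs a coarse pass only for large key sets, then always runs
// the fine pass over all keys. A coarse failure returns immediately.
func twoLevelSplit(keys [][]byte, split splitFunc) error {
	if len(keys) > coarseThreshold {
		if err := split(everyNth(keys, 10)); err != nil {
			return fmt.Errorf("coarse split/scatter: %w", err)
		}
	}
	return split(keys)
}

func main() {
	keys := make([][]byte, 400)
	for i := range keys {
		keys[i] = []byte{byte(i >> 8), byte(i)}
	}
	var sizes []int
	record := func(k [][]byte) error {
		sizes = append(sizes, len(k))
		return nil
	}

	err := twoLevelSplit(keys, record) // large set: coarse pass, then fine pass
	fmt.Println(err, sizes)

	sizes = nil
	err = twoLevelSplit(keys[:50], record) // small set: fine pass only
	fmt.Println(err, sizes)

	fail := func([][]byte) error { return errInjected }
	fmt.Println(twoLevelSplit(keys, fail)) // stops at the coarse stage
}
```

Because both passes go through the same `split` callback, any limiter applied inside that callback would cover the coarse and fine stages alike, which matches the intent stated in the list.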

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No need to test
    • I checked and no code files have been changed.

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

Lightning: restore two-level split/scatter before ingest and keep rate limiting for both coarse and fine stages.
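
For readers unfamiliar with the "batch cap by burst" behavior mentioned in the description, here is one way such batching could look. It is a sketch under assumptions: `keyLimiter`, `recordingLimiter`, and `splitInBatches` are hypothetical names, and the interface is a simplified stand-in, not the limiter type used in the PR.

```go
package main

import "fmt"

// keyLimiter is a hypothetical, simplified limiter interface.
type keyLimiter interface {
	Burst() int
	WaitN(n int) error
}

// recordingLimiter is a test double that records each requested batch size.
type recordingLimiter struct {
	burst  int
	waited []int
}

func (l *recordingLimiter) Burst() int { return l.burst }

func (l *recordingLimiter) WaitN(n int) error {
	l.waited = append(l.waited, n)
	return nil
}

// splitInBatches waits on the limiter before each batch and never sends more
// keys at once than min(maxBatch, limiter burst).
func splitInBatches(keys [][]byte, maxBatch int, lim keyLimiter, split func([][]byte) error) error {
	batch := maxBatch
	if b := lim.Burst(); b > 0 && b < batch {
		batch = b
	}
	if batch < 1 {
		batch = 1 // guard against a non-positive batch size
	}
	for start := 0; start < len(keys); start += batch {
		end := start + batch
		if end > len(keys) {
			end = len(keys)
		}
		if err := lim.WaitN(end - start); err != nil {
			return err
		}
		if err := split(keys[start:end]); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	lim := &recordingLimiter{burst: 32}
	total := 0
	err := splitInBatches(make([][]byte, 100), 64, lim, func(b [][]byte) error {
		total += len(b)
		return nil
	})
	fmt.Println(err, lim.waited, total)
}
```

Calling the same helper for the coarse key subset and for the full key set would keep both stages under one limiter, which is the behavior the release note describes.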

Summary by CodeRabbit

  • Refactor

    • Added a two-level batching approach that first processes a coarse subset of split keys, then completes the full split to improve efficiency for very large operations while preserving prior behavior for smaller sets.
    • Enhanced rate limiting during batched splitting to better manage resources and stability.
  • Tests

    • Added tests covering two-level batching, coarse-grained sampling, limiter enforcement, and related edge/error cases.

Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>
@ti-chi-bot ti-chi-bot added do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. type/cherry-pick-for-master labels Feb 24, 2026
@ti-chi-bot
Member Author

@D3Hunter This PR has conflicts, so I have put it on hold.
Please resolve them or ask others to resolve them, then comment /unhold to remove the hold label.

@pantheon-ai

pantheon-ai bot commented Feb 24, 2026

I've accepted your request and will start reviewing the pull request. I'll conduct a thorough review covering code quality, potential issues, and implementation details.

⏳ This process typically takes 10-30 minutes depending on the complexity of the changes.


@ti-chi-bot

ti-chi-bot bot commented Feb 24, 2026

@ti-chi-bot: If you want to know how to resolve it, please read the guide in the TiDB Dev Guide.


@ti-chi-bot

ti-chi-bot bot commented Feb 24, 2026

@pantheon-ai[bot]: adding LGTM is restricted to approvers and reviewers in OWNERS files.


@D3Hunter D3Hunter changed the title from "lightning: restore two-level split/scatter while keeping limiter behavior | tidb-test=release-8.1.2 (#66312)" to "lightning: restore two-level split/scatter while keeping limiter behavior" Feb 25, 2026
@D3Hunter D3Hunter changed the title from "lightning: restore two-level split/scatter while keeping limiter behavior" to "ingest: use 2 level split & scatter" Feb 25, 2026
@pantheon-ai

pantheon-ai bot commented Mar 4, 2026

Review of updated commits could not be completed at this time due to a temporary infrastructure outage (execution providers unavailable). Please re-trigger the review once the platform recovers.


@coderabbitai

coderabbitai bot commented Mar 4, 2026

Walkthrough

Adds a two-level splitting path: for large split-key sets, compute a coarse-grained subset (~√n), scatter those keys first, then continue batched splitting with rate limiting; includes helper functions and tests. No public API changes.
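
To make the "~√n" sampling concrete, here is a hedged sketch of how a coarse subset could be chosen. `coarseKeys` is a hypothetical function, not the PR's `getCoarseGrainedSplitKeys`; the step size of floor(√n) and the rule of always including the last key without appending it twice are assumptions based on this walkthrough and the test names.

```go
package main

import (
	"fmt"
	"math"
)

// coarseKeys picks roughly sqrt(n) evenly spaced keys from an ordered key
// list. It always includes the last key, but never appends it twice.
func coarseKeys(keys [][]byte) [][]byte {
	n := len(keys)
	if n == 0 {
		return nil
	}
	step := int(math.Sqrt(float64(n)))
	if step < 1 {
		step = 1
	}
	out := make([][]byte, 0, n/step+1)
	last := -1
	for i := step - 1; i < n; i += step {
		out = append(out, keys[i])
		last = i
	}
	// Append the final key only if the loop did not already select it.
	if last != n-1 {
		out = append(out, keys[n-1])
	}
	return out
}

func main() {
	makeKeys := func(n int) [][]byte {
		keys := make([][]byte, n)
		for i := range keys {
			keys[i] = []byte{byte(i >> 8), byte(i)}
		}
		return keys
	}
	fmt.Println(len(coarseKeys(makeKeys(400)))) // loop ends exactly on the last key
	fmt.Println(len(coarseKeys(makeKeys(101)))) // last key appended after the loop
}
```

Scattering this smaller subset first spreads a handful of wide regions across stores before the fine pass splits each of them further, which is the stated goal of reducing region concentration.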

Changes

| Cohort / File(s) | Summary |
|---|---|
| Two-Level Split Optimization: pkg/lightning/backend/local/localhelper.go | Introduces a coarse-grained splitting threshold, getCoarseGrainedSplitKeys, and splitAndScatterRegionInBatchesWithLimiter. Implements a path that samples ~√n keys, scatters them first, then proceeds with full-key batched splitting and rate limiting. |
| Test Coverage & Mocks: pkg/lightning/backend/local/localhelper_test.go | Extends testSplitClient with splitKeysAndScatterF and modifies SplitKeysAndScatter to allow injected behavior. Adds TestSplitAndScatterRegionInBatchesTwoLevel, TestGetCoarseGrainedSplitKeys, helpers, and cases for large/small key sets, coarse-layer errors, limiter behavior, and duplicate-last-key checks. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐇 I nibble keys both big and small,

First coarse, then fine, I split them all.
I scatter seeds across the ground,
Then stitch the rows in batches round.
Hop, hop — tidy regions found! 🥕

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title 'ingest: use 2 level split & scatter' directly relates to the main change of restoring two-level (coarse + fine) split/scatter functionality in the Lightning ingest process. |
| Description check | ✅ Passed | The PR description follows the template structure with all required sections completed: issue reference, problem summary, what changed and how it works, comprehensive check list with unit tests marked, and a release note. |


…o-level split/scatter tests and coarse threshold
@pantheon-ai

pantheon-ai bot commented Mar 4, 2026

Review of updated commits (dea115a) could not be completed — execution infrastructure is temporarily unavailable (CR2 provider overloaded). Please re-trigger the review once the platform recovers.



@coderabbitai coderabbitai bot left a comment


Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
pkg/lightning/backend/local/localhelper_test.go (1)

374-378: ⚠️ Potential issue | 🔴 Critical

Add missing } to close TestTuneStoreWriteLimiter function.

TestTuneStoreWriteLimiter starting at line 343 is missing its closing brace. Line 378 begins TestSplitAndScatterRegionInBatchesTwoLevel while the previous function block remains unclosed, causing a compilation error.

🐛 Proposed fix
 	limiter.UpdateLimit(200)
 	ctx1, cancel1 := context.WithTimeout(context.Background(), time.Second*2)
 	defer cancel1()
 	testLimiter(ctx1, 200)
+}
+
 func TestSplitAndScatterRegionInBatchesTwoLevel(t *testing.T) {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/lightning/backend/local/localhelper_test.go` around lines 374 - 378, The
test function TestTuneStoreWriteLimiter is missing its closing brace; add a
single `}` to terminate TestTuneStoreWriteLimiter (the function that calls
limiter.UpdateLimit(200), context.WithTimeout and testLimiter(ctx1, 200)) so the
subsequent TestSplitAndScatterRegionInBatchesTwoLevel function is not nested and
the file compiles.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 959632cc-cd27-49af-94c3-81ebd373f009

📥 Commits

Reviewing files that changed from the base of the PR and between fc2d6e6 and dea115a.

📒 Files selected for processing (1)
  • pkg/lightning/backend/local/localhelper_test.go

@codecov

codecov bot commented Mar 4, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 78.2835%. Comparing base (5f3cbfe) to head (93d4472).
⚠️ Report is 45 commits behind head on master.

Additional details and impacted files
@@               Coverage Diff                @@
##             master     #66354        +/-   ##
================================================
+ Coverage   77.6724%   78.2835%   +0.6110%     
================================================
  Files          2006       1939        -67     
  Lines        548699     544168      -4531     
================================================
- Hits         426188     425994       -194     
+ Misses       120851     117722      -3129     
+ Partials       1660        452      -1208     
| Flag | Coverage Δ |
|---|---|
| integration | 44.2626% <10.0000%> (-3.9225%) ⬇️ |
| unit | 76.6828% <100.0000%> (+0.3625%) ⬆️ |

Flags with carried forward coverage won't be shown.

| Components | Coverage Δ |
|---|---|
| dumpling | 56.7974% <ø> (ø) |
| parser | ∅ <ø> (∅) |
| br | 48.7794% <ø> (-12.0872%) ⬇️ |

@pantheon-ai

pantheon-ai bot commented Mar 4, 2026

Review Complete

Findings: 0 issues
Posted: 0
Duplicates/Skipped: 0



@coderabbitai coderabbitai bot left a comment


🧹 Nitpick comments (1)
pkg/lightning/backend/local/localhelper_test.go (1)

381-387: Extract duplicated makeSplitKeys helper to file scope.

The same helper is declared twice; pulling it into one file-level helper will keep future test edits centralized.

♻️ Proposed refactor
+func makeSplitKeys(n int) [][]byte {
+	keys := make([][]byte, n)
+	for i := 0; i < n; i++ {
+		keys[i] = []byte{byte(i >> 8), byte(i)}
+	}
+	return keys
+}
+
 func TestSplitAndScatterRegionInBatchesTwoLevel(t *testing.T) {
-	makeSplitKeys := func(n int) [][]byte {
-		keys := make([][]byte, n)
-		for i := 0; i < n; i++ {
-			keys[i] = []byte{byte(i >> 8), byte(i)}
-		}
-		return keys
-	}
-
 	t.Run("large split keys trigger coarse and fine layers", func(t *testing.T) {
 		...
 	})
 }

 func TestGetCoarseGrainedSplitKeys(t *testing.T) {
-	makeSplitKeys := func(n int) [][]byte {
-		keys := make([][]byte, n)
-		for i := 0; i < n; i++ {
-			keys[i] = []byte{byte(i >> 8), byte(i)}
-		}
-		return keys
-	}
-
 	t.Run("last key selected in loop is not appended twice", func(t *testing.T) {
 		...
 	})
 }

Also applies to: 443-449

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/lightning/backend/local/localhelper_test.go` around lines 381 - 387,
Extract the duplicated makeSplitKeys helper into a single file-scoped function
named makeSplitKeys in localhelper_test.go and remove the other duplicate
declarations; specifically, replace the two local copies (the one shown and the
separate copy around lines 443-449) with one top-level function so tests reuse
the same helper, keeping the function signature makeSplitKeys(n int) [][]byte
and its existing implementation that builds keys with []byte{byte(i >> 8),
byte(i)}.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 0545410e-10f8-4ad3-a6b4-7a3e8657e85b

📥 Commits

Reviewing files that changed from the base of the PR and between dea115a and 93d4472.

📒 Files selected for processing (1)
  • pkg/lightning/backend/local/localhelper_test.go


@pantheon-ai pantheon-ai bot left a comment


✅ Code looks good. No issues found.

@OliverS929
Contributor

/retest

@OliverS929 OliverS929 assigned OliverS929 and unassigned D3Hunter Mar 5, 2026
@ti-chi-bot ti-chi-bot bot added approved needs-1-more-lgtm Indicates a PR needs 1 more LGTM. labels Mar 5, 2026
@ti-chi-bot

ti-chi-bot bot commented Mar 5, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: D3Hunter, OliverS929


Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot ti-chi-bot bot added lgtm and removed needs-1-more-lgtm Indicates a PR needs 1 more LGTM. labels Mar 5, 2026
@ti-chi-bot

ti-chi-bot bot commented Mar 5, 2026

[LGTM Timeline notifier]

Timeline:

  • 2026-03-05 02:52:17 UTC: ☑️ agreed by D3Hunter.
  • 2026-03-05 04:21:55 UTC: ☑️ agreed by OliverS929.

@OliverS929
Contributor

/unhold

@ti-chi-bot ti-chi-bot bot removed do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. do-not-merge/needs-tests-checked labels Mar 5, 2026
@ti-chi-bot ti-chi-bot bot merged commit c909831 into pingcap:master Mar 5, 2026
32 checks passed
@ti-chi-bot ti-chi-bot bot deleted the cherry-pick-66312-to-master branch March 5, 2026 06:06

Labels

approved lgtm release-note Denotes a PR that will be considered when it comes time to generate release notes. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. type/cherry-pick-for-master

3 participants