
pkg/ddl, pkg/tici: add TiCI pre-split for DDL global sort ingest | tidb-test=13ccf8de48e8db2290ff884598444d0508606bbf tiflash=feature-fts #67313

Merged
ti-chi-bot[bot] merged 1 commit into pingcap:release-fts-202602 from 3pointer:release-fts-202602
Mar 26, 2026

Conversation

@3pointer
Contributor

@3pointer 3pointer commented Mar 26, 2026

What problem does this PR solve?

Issue Number: close #xxx

Problem Summary:

When TiDB builds FULLTEXT or HYBRID indexes with the TiCI backend in global sort mode, TiCI does not get an early pre-split signal before the write-and-ingest stage starts. As a result, TiCI cannot use the aggregated SortedKV metadata to analyze shard distribution and split internal shards ahead of ingest.

In addition, the TiDB side does not define the corresponding PreSplitImportShards RPC yet, so the pre-split request cannot be sent with the required metadata.

What changed and how does it work?

This PR is scoped to DDL backfill only (ADD FULLTEXT INDEX / ADD HYBRID INDEX). IMPORT INTO is not supported in this PR; follow-up changes can be made separately if needed.

This PR adds a TiCI pre-split hook in generateGlobalSortIngestPlan for TiCI-backed FULLTEXT and HYBRID index jobs.

The main changes are:

  • Add PreSplitImportShards RPC and related request/response messages to pkg/tici/tici.proto, then regenerate pkg/tici/tici.pb.go.
  • Add TiCI client support in pkg/tici/tici_manager_client.go to call PreSplitImportShards, including keyspace propagation and test failpoint hooks.
  • In generateGlobalSortIngestPlan, detect ActionAddFullTextIndex and ActionAddHybridIndex, aggregate merged SortedKVMeta groups, and build a TiCI pre-split request with:
    • global start_key / end_key
    • total_kv_size / total_kv_cnt
    • data_file_count / stat_file_count
    • per-group metadata in meta_groups
  • Call the TiCI pre-split RPC synchronously, with a 1-minute timeout, before generating the final ingest plan.
  • If the TiCI pre-split call fails, log the error and degrade to the existing global-sort ingest flow instead of failing the whole DDL job.
  • Add a guard in FULLTEXT index creation to require cloud storage, because TiCI pre-split only applies to the global-sort ingest path.
  • Add unit tests for both the DDL scheduler flow and the TiCI client request path.

Check List

Tests

  • Unit test

    Suggested / used test commands:

    • make failpoint-enable && (cd pkg/ddl && go test -run TestBackfillingSchedulerGlobalSortModeTiCIPreSplit --tags=intest; rc=$?; cd ../..; make failpoint-disable; exit $rc)
    • go test -run TestPreSplitImportShards --tags=intest ./pkg/tici
    • go test -run TestPreSplitImportShardsMock --tags=intest ./pkg/tici
  • Integration test

  • Manual test (add detailed scripts or steps below)

  • No need to test

    • I checked and no code files have been changed.

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

Support calling TiCI PreSplitImportShards before global-sort ingest for FULLTEXT and HYBRID index backfill jobs.


<!-- This is an auto-generated comment: release notes by coderabbit.ai -->
## Summary by CodeRabbit

## Release Notes

* **New Features**
  * Added pre-split import shards support for full-text and hybrid index creation to optimize shard distribution during index backfilling operations.
  * Enhanced distributed backfilling scheduler with TiCI integration for improved metadata-driven planning.

* **Tests**
  * Added comprehensive test coverage for pre-split import shards functionality and backfilling scheduler workflows.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

@ti-chi-bot

ti-chi-bot bot commented Mar 26, 2026

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@ti-chi-bot ti-chi-bot bot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. release-note-none Denotes a PR that doesn't merit a release note. labels Mar 26, 2026
@coderabbitai

coderabbitai bot commented Mar 26, 2026

📝 Walkthrough

Walkthrough

The changes integrate a new TiCI PreSplitImportShards RPC into the DDL backfilling scheduler to support pre-splitting import shards before full index building. The RangeSplitter API is extended to return group size and key count metrics, enabling accurate tracking of split groups. TiCI client code is added with failpoint test support, and DDL backfilling logic now conditionally invokes TiCI pre-split for full-text and hybrid index jobs.
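One way to picture how per-group metrics roll up into the request-level fields (global start_key / end_key, total_kv_size, total_kv_cnt) is the sketch below. The `metaGroup` and `summary` types and the `summarize` helper are hypothetical; the generated protobuf types in pkg/tici may be shaped differently.

```go
package main

import (
	"bytes"
	"fmt"
)

// metaGroup is a hypothetical, simplified view of one merged SortedKVMeta group.
type metaGroup struct {
	StartKey, EndKey        []byte
	TotalKVSize, TotalKVCnt uint64
}

// summary holds the request-level totals derived from all groups.
type summary struct {
	StartKey, EndKey        []byte
	TotalKVSize, TotalKVCnt uint64
}

// summarize takes the smallest start key, the largest end key, and the sums
// of sizes and key counts across all groups.
func summarize(groups []metaGroup) summary {
	var s summary
	for i, g := range groups {
		if i == 0 || bytes.Compare(g.StartKey, s.StartKey) < 0 {
			s.StartKey = g.StartKey
		}
		if i == 0 || bytes.Compare(g.EndKey, s.EndKey) > 0 {
			s.EndKey = g.EndKey
		}
		s.TotalKVSize += g.TotalKVSize
		s.TotalKVCnt += g.TotalKVCnt
	}
	return s
}

func main() {
	s := summarize([]metaGroup{
		{StartKey: []byte("k3"), EndKey: []byte("k7"), TotalKVSize: 300, TotalKVCnt: 3},
		{StartKey: []byte("k1"), EndKey: []byte("k4"), TotalKVSize: 200, TotalKVCnt: 2},
	})
	fmt.Printf("%s %s %d %d\n", s.StartKey, s.EndKey, s.TotalKVSize, s.TotalKVCnt)
}
```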

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| **TiCI Proto & RPC Definition**<br>pkg/tici/tici.proto | Added PreSplitImportShards RPC and supporting message types (PreSplitImportShardsRequest, PreSplitImportShardsResponse, PreSplitImportShardMeta, PreSplitImportIndexResult) to enable meta-driven import shard pre-splitting. |
| **TiCI Client & Manager**<br>pkg/tici/tici_manager_client.go, pkg/tici/tici_manager_client_test.go, pkg/tici/BUILD.bazel | Implemented PreSplitImportShards method with failpoint interception for testing, exposed test helpers for request capture, added keyspace codec interface, and updated build dependencies and test sharding. |
| **RangeSplitter Signature & Tests**<br>pkg/lightning/backend/external/split.go, pkg/lightning/backend/external/split_test.go, pkg/lightning/backend/external/testutil.go, pkg/lightning/backend/external/merge_v2.go, pkg/dxf/importinto/planner.go | Extended SplitOneRangesGroup() to return groupSize and groupKeyCnt; updated all call sites to handle expanded return values; adjusted test assertions to validate new metrics. |
| **DDL Backfilling TiCI Integration**<br>pkg/ddl/backfilling_dist_scheduler.go, pkg/ddl/backfilling_dist_scheduler_internal_test.go, pkg/ddl/backfilling_dist_scheduler_test.go, pkg/ddl/BUILD.bazel | Added storageWithPDAndCodec interface validation, conditional TiCI pre-split execution for specific job types, range-group aggregation logic (1GiB grouping), failpoint hooks, and comprehensive test coverage including mock store validation and TiCI request assertion. |
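The 1GiB grouping mentioned in the summary could work roughly like the sketch below, which folds consecutive split groups into one report group until the accumulated size reaches a threshold. The `splitGroup` type, the `aggregate` helper, and the exact cut-off rule are assumptions for illustration; the scheduler's actual logic (including file-count deduplication) is not shown in this conversation.

```go
package main

import "fmt"

// reportGroupSize is the 1 GiB grouping size named in the walkthrough.
const reportGroupSize uint64 = 1 << 30

// splitGroup is a hypothetical record of one range group returned by the
// splitter: its key bounds plus the size and key count it covers.
type splitGroup struct {
	StartKey, EndKey []byte
	Size, KeyCnt     uint64
}

// aggregate merges consecutive groups until the running size reaches the
// threshold, then emits the merged group and starts a new one. A trailing
// partial group is emitted as-is.
func aggregate(groups []splitGroup, threshold uint64) []splitGroup {
	var out []splitGroup
	var cur *splitGroup
	for _, g := range groups {
		if cur == nil {
			c := g
			cur = &c
		} else {
			cur.EndKey = g.EndKey
			cur.Size += g.Size
			cur.KeyCnt += g.KeyCnt
		}
		if cur.Size >= threshold {
			out = append(out, *cur)
			cur = nil
		}
	}
	if cur != nil {
		out = append(out, *cur)
	}
	return out
}

func main() {
	groups := []splitGroup{
		{StartKey: []byte("a"), EndKey: []byte("b"), Size: 600 << 20, KeyCnt: 6},
		{StartKey: []byte("b"), EndKey: []byte("c"), Size: 600 << 20, KeyCnt: 6},
		{StartKey: []byte("c"), EndKey: []byte("d"), Size: 100 << 20, KeyCnt: 1},
	}
	for _, g := range aggregate(groups, reportGroupSize) {
		fmt.Printf("[%s, %s) size=%d keys=%d\n", g.StartKey, g.EndKey, g.Size, g.KeyCnt)
	}
}
```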

Sequence Diagram

sequenceDiagram
    participant Scheduler as DDL Backfilling<br/>Scheduler
    participant Storage as Storage<br/>(PD + Codec)
    participant RangeSplit as RangeSplitter
    participant TiCI as TiCI Client
    participant MetaService as TiCI Meta<br/>Service

    Scheduler->>Storage: validateStorage<br/>as storageWithPDAndCodec
    Storage-->>Scheduler: ✓ codec available

    Scheduler->>RangeSplit: SplitOneRangesGroup()
    RangeSplit-->>Scheduler: endKey, dataFiles,<br/>groupSize, groupKeyCnt

    Scheduler->>Scheduler: Aggregate groups<br/>to 1GiB report groups<br/>Deduplicate file counts

    Scheduler->>TiCI: buildTiCIPreSplitRequest<br/>(task, table, index IDs,<br/>aggregated KV stats,<br/>report groups)

    TiCI->>MetaService: PreSplitImportShards<br/>Request (timeout: 1min)
    MetaService-->>TiCI: Response (split_keys,<br/>shard_counts)

    TiCI-->>Scheduler: ✓ pre-split complete<br/>or log error & continue

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Possibly related PRs

Suggested labels

release-note

Suggested reviewers

  • wjhuang2016
  • OliverS929
  • GMHDBJD

Poem

🐰 Whiskers twitching with delight,
Pre-split shards now get it right!
TiCI hops through grouped-up ranges,
Splitting import schemes arranges,
Metrics tracked from split to shore!

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 44.44% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description check | ✅ Passed | The PR description is comprehensive and well-structured, covering the problem statement, solution approach, implementation details, testing, and release notes as required by the template. |
| Title check | ✅ Passed | The title clearly and specifically describes the main change: adding TiCI pre-split functionality for DDL global sort ingest operations, with affected packages listed. |


@ti-chi-bot ti-chi-bot bot added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label Mar 26, 2026
@tiprow

tiprow bot commented Mar 26, 2026

Hi @3pointer. Thanks for your PR.

PRs from untrusted users cannot be marked as trusted with /ok-to-test in this repo meaning untrusted PR authors can never trigger tests themselves. Collaborators can still trigger tests on the PR using /test all.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@3pointer 3pointer changed the title from "pkg/ddl, pkg/tici: add TiCI pre-split for DDL global sort ingest (#67…" to "pkg/ddl, pkg/tici: add TiCI pre-split for DDL global sort ingest" Mar 26, 2026
@3pointer 3pointer marked this pull request as ready for review March 26, 2026 02:55
@ti-chi-bot ti-chi-bot bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 26, 2026
@3pointer
Contributor Author

/ok-to-test

@ti-chi-bot ti-chi-bot bot added the ok-to-test Indicates a PR is ready to be tested. label Mar 26, 2026
@ti-chi-bot ti-chi-bot bot added approved needs-1-more-lgtm Indicates a PR needs 1 more LGTM. labels Mar 26, 2026

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
pkg/tici/tici.proto (1)

600-629: Document the start_key / end_key contract.

KeyRange above explicitly defines its bounds, but these new messages do not. Since the scheduler is feeding split boundaries straight into this RPC, it would help to state whether end_key is exclusive so TiCI does not have to infer it.

Proposed comment update
 // One merged SortedKVMeta group used for import pre-split analysis.
 message PreSplitImportShardMeta {
   int64 ele_id = 1;
+  // Inclusive lower bound of this meta group.
   bytes start_key = 2;
+  // Exclusive upper bound of this meta group.
   bytes end_key = 3;
   uint64 total_kv_size = 4;
   uint64 total_kv_cnt = 5;
   int32 data_file_count = 6;
   int32 stat_file_count = 7;
 }
 
 message PreSplitImportShardsRequest {
   // TiDB unique task ID for this Import Into/Index Backfilling job.
   string tidb_task_id = 1;
   // Table ID of the target table.
   int64 table_id = 2;
   // Index ID of the target index. If this is an Import Into job that relates
   // to multiple indexes, this field should contain all the index IDs.
   repeated int64 index_ids = 3;
   uint64 scan_snapshot_ts = 4;
+  // Inclusive lower bound across all meta_groups.
   bytes start_key = 5;
+  // Exclusive upper bound across all meta_groups.
   bytes end_key = 6;
   uint64 total_kv_size = 7;

As per coding guidelines, "Comments SHOULD explain non-obvious intent, constraints, invariants, concurrency guarantees, SQL/compatibility contracts, or important performance trade-offs."
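To make the proposed contract concrete, here is a small sketch of half-open `[start_key, end_key)` semantics. It assumes the inclusive-lower / exclusive-upper convention suggested in the comment above; whether TiCI actually adopts that convention is not confirmed in this conversation, and `keyRange` is a hypothetical type.

```go
package main

import (
	"bytes"
	"fmt"
)

// keyRange is a hypothetical half-open range: Start is inclusive, End is exclusive.
type keyRange struct {
	Start, End []byte
}

// contains reports whether key falls inside [Start, End) under bytewise ordering.
func (r keyRange) contains(key []byte) bool {
	return bytes.Compare(key, r.Start) >= 0 && bytes.Compare(key, r.End) < 0
}

func main() {
	a := keyRange{Start: []byte("a"), End: []byte("m")}
	b := keyRange{Start: []byte("m"), End: []byte("z")}
	boundary := []byte("m")
	// With an exclusive upper bound, the shared boundary key belongs to
	// exactly one of two adjacent groups.
	fmt.Println(a.contains(boundary), b.contains(boundary))
}
```

Under this convention adjacent meta groups can share a boundary key without double-counting it, which is why stating the bound explicitly matters when split boundaries are passed straight through.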

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/tici/tici.proto` around lines 600 - 629, The proto messages
PreSplitImportShardMeta and PreSplitImportShardsRequest lack a clear contract
for the start_key/end_key semantics; update their comments to state the exact
bounds (e.g., whether start_key is inclusive and end_key is exclusive, how
empty/null keys are treated, and any required prefix/encoding assumptions) so
callers (and TiCI scheduler) don't have to infer behavior from KeyRange; add
this clarifying text to the comments above the start_key/end_key fields in both
PreSplitImportShardMeta and PreSplitImportShardsRequest and reference KeyRange
only to note consistency with its inclusive/exclusive convention.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 11723cfd-b5c0-4d14-a299-6f6b509f63e1

📥 Commits

Reviewing files that changed from the base of the PR and between 45244ec and cf54bd8.

⛔ Files ignored due to path filters (1)
  • pkg/tici/tici.pb.go is excluded by !**/*.pb.go
📒 Files selected for processing (13)
  • pkg/ddl/BUILD.bazel
  • pkg/ddl/backfilling_dist_scheduler.go
  • pkg/ddl/backfilling_dist_scheduler_internal_test.go
  • pkg/ddl/backfilling_dist_scheduler_test.go
  • pkg/dxf/importinto/planner.go
  • pkg/lightning/backend/external/merge_v2.go
  • pkg/lightning/backend/external/split.go
  • pkg/lightning/backend/external/split_test.go
  • pkg/lightning/backend/external/testutil.go
  • pkg/tici/BUILD.bazel
  • pkg/tici/tici.proto
  • pkg/tici/tici_manager_client.go
  • pkg/tici/tici_manager_client_test.go

Comment on lines +431 to +439
func (t *ManagerCtx) PreSplitImportShards(ctx context.Context, req *PreSplitImportShardsRequest) error {
	if handled, err := maybeMockPreSplitImportShards(req); handled {
		return err
	}
	if req == nil {
		return errors.New("pre split import shards request is nil")
	}
	req.KeyspaceId = t.getKeyspaceID()


⚠️ Potential issue | 🟡 Minor

Move the mock interception after request enrichment.

Both entry points run maybeMockPreSplitImportShards(req) before req.KeyspaceId is filled, so the captured payload in the new failpoint-based tests is not the same request that the real RPC sends. That makes the mock path blind to keyspace-propagation regressions.

Suggested fix
 func (t *ManagerCtx) PreSplitImportShards(ctx context.Context, req *PreSplitImportShardsRequest) error {
-	if handled, err := maybeMockPreSplitImportShards(req); handled {
-		return err
-	}
 	if req == nil {
 		return errors.New("pre split import shards request is nil")
 	}
 	req.KeyspaceId = t.getKeyspaceID()
+	if handled, err := maybeMockPreSplitImportShards(req); handled {
+		return err
+	}
 
 	t.mu.RLock()
 	defer t.mu.RUnlock()
 	...
 }
 func PreSplitImportShards(ctx context.Context, store keyspaceStorage, req *PreSplitImportShardsRequest) error {
-	if handled, err := maybeMockPreSplitImportShards(req); handled {
-		return err
-	}
+	if req == nil {
+		return errors.New("pre split import shards request is nil")
+	}
+	if store != nil {
+		req.KeyspaceId = uint32(store.GetCodec().GetKeyspaceID())
+	}
+	if handled, err := maybeMockPreSplitImportShards(req); handled {
+		return err
+	}
 	etcdClient, err := getEtcdClientFunc()
 	...
 }

Also applies to: 1038-1060
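The ordering issue can be reproduced in isolation with the sketch below. It is a simplified model, not the PR's code: `request`, `mockHook`, and `preSplit` are hypothetical stand-ins for the protobuf request, the failpoint-driven interception, and the client entry point.

```go
package main

import (
	"errors"
	"fmt"
)

// request is a hypothetical, trimmed-down version of the pre-split request.
type request struct {
	KeyspaceID uint32
}

// mockHook stands in for the failpoint-based interception used in tests.
var mockHook func(req *request) (handled bool, err error)

// preSplit validates and enriches the request first, and only then offers it
// to the mock, so the mock observes the same payload the real RPC would send.
func preSplit(req *request, keyspaceID uint32) error {
	if req == nil {
		return errors.New("pre split import shards request is nil")
	}
	req.KeyspaceID = keyspaceID
	if mockHook != nil {
		if handled, err := mockHook(req); handled {
			return err
		}
	}
	return errors.New("real RPC is not wired up in this sketch")
}

func main() {
	var captured request
	mockHook = func(req *request) (bool, error) {
		captured = *req
		return true, nil
	}
	err := preSplit(&request{}, 42)
	fmt.Println(err, captured.KeyspaceID)
}
```

If the hook ran before the enrichment step, `captured.KeyspaceID` would stay at its zero value, and a test asserting on the captured request could not catch a keyspace-propagation regression.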

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/tici/tici_manager_client.go` around lines 431 - 439, The mock
interception call maybeMockPreSplitImportShards is invoked before the request is
enriched with KeyspaceId, so tests that exercise the mock path see a different
payload than the real RPC; move the maybeMockPreSplitImportShards(req)
invocation to after the request is normalized/enriched (i.e., after
req.KeyspaceId = t.getKeyspaceID()) in the PreSplitImportShards method (and the
other similar entry point referenced), ensuring ManagerCtx.getKeyspaceID() is
applied before calling maybeMockPreSplitImportShards so the mock sees the same
request as the real RPC.

@codecov

codecov bot commented Mar 26, 2026

Codecov Report

❌ Patch coverage is 67.04871% with 115 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (release-fts-202602@45244ec). Learn more about missing BASE report.

Additional details and impacted files
@@                   Coverage Diff                   @@
##             release-fts-202602     #67313   +/-   ##
=======================================================
  Coverage                      ?   76.7314%           
=======================================================
  Files                         ?       1962           
  Lines                         ?     558695           
  Branches                      ?          0           
=======================================================
  Hits                          ?     428695           
  Misses                        ?     128538           
  Partials                      ?       1462           
| Flag | Coverage Δ |
| --- | --- |
| integration | 45.3443% <0.0000%> (?) |
| unit | 73.9331% <67.0487%> (?) |

Flags with carried forward coverage won't be shown. Click here to find out more.

| Components | Coverage Δ |
| --- | --- |
| dumpling | 56.7974% <0.0000%> (?) |
| parser | ∅ <0.0000%> (?) |
| br | 66.2237% <0.0000%> (?) |

@3pointer
Contributor Author

/retest

@3pointer 3pointer changed the title from "pkg/ddl, pkg/tici: add TiCI pre-split for DDL global sort ingest" to "pkg/ddl, pkg/tici: add TiCI pre-split for DDL global sort ingest | tidb-test=13ccf8de48e8db2290ff884598444d0508606bbf tiflash=feature-fts" Mar 26, 2026
@3pointer
Contributor Author

/retest

@3pointer
Contributor Author

/test pull-error-log-review

@tiprow

tiprow bot commented Mar 26, 2026

@3pointer: No presubmit jobs available for pingcap/tidb@release-fts-202602

Details

In response to this:

/test pull-error-log-review

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@ti-chi-bot

ti-chi-bot bot commented Mar 26, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: GMHDBJD, wjhuang2016

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot ti-chi-bot bot added lgtm and removed needs-1-more-lgtm Indicates a PR needs 1 more LGTM. labels Mar 26, 2026
@ti-chi-bot

ti-chi-bot bot commented Mar 26, 2026

[LGTM Timeline notifier]

Timeline:

  • 2026-03-26 02:57:05.260705849 +0000 UTC m=+409821.296776109: ☑️ agreed by GMHDBJD.
  • 2026-03-26 08:03:05.335101135 +0000 UTC m=+428181.371171395: ☑️ agreed by wjhuang2016.

@ti-chi-bot

ti-chi-bot bot commented Mar 26, 2026

@3pointer: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| pull-error-log-review | cf54bd8 | link | false | /test pull-error-log-review |

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@ti-chi-bot ti-chi-bot bot merged commit 2338f7d into pingcap:release-fts-202602 Mar 26, 2026
25 of 26 checks passed

Labels

approved lgtm ok-to-test Indicates a PR is ready to be tested. release-note-none Denotes a PR that doesn't merit a release note. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.

3 participants