importer: sample a portion of compressed files to speed up import spec generation (#64769) #67654
Conversation
Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>
@D3Hunter This PR has conflicts, I have put it on hold.
@ti-chi-bot: If you want to know how to resolve it, please read the guide in the TiDB Dev Guide. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository.
📝 Walkthrough

Adds sampling-based, compression-aware real-size estimation to importer initialization, using per-format sampling (with a Parquet special case) and a cached harmonic-mean compression estimator; also increases a test shard count and adds a test for scanning many compressed files.

Changes
Sequence Diagram

```mermaid
sequenceDiagram
    participant Client
    participant InitDataFiles
    participant Detector
    participant Sampler
    participant Estimator
    participant SizeCalc
    Client->>InitDataFiles: InitDataFiles(globPattern)
    InitDataFiles->>Detector: detectAndUpdateFormat() [sync.Once]
    Detector-->>InitDataFiles: sourceType
    InitDataFiles->>Sampler: sample files (bounded)
    Sampler-->>Estimator: sampled stats
    Estimator->>Estimator: compute harmonic mean ratio
    Estimator-->>InitDataFiles: sizeExpansionRatio
    InitDataFiles->>SizeCalc: estimate(file) * fileSize * sizeExpansionRatio
    SizeCalc-->>InitDataFiles: estimated real sizes
    InitDataFiles-->>Client: completed
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs
Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches

🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
/unhold |
Actionable comments posted: 4
🧹 Nitpick comments (1)
pkg/executor/importer/import_test.go (1)
325-328: Trim the fixture count to the actual sampling boundary.

Creating 2048 files is heavier than needed, and the magic number will drift if `maxSampledCompressedFiles` changes. `maxSampledCompressedFiles + 1` is enough to cross the new cutoff and keeps the test targeted.

♻️ Proposed fix

```diff
-	for i := range 2048 {
+	for i := 0; i < maxSampledCompressedFiles+1; i++ {
 		fileName := filepath.Join(tempDir, fmt.Sprintf("test_%d.csv.gz", i))
 		require.NoError(t, os.WriteFile(fileName, []byte{}, 0o644))
 	}
```

As per coding guidelines: "Keep test changes minimal and deterministic; avoid broad golden/testdata churn unless required."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/executor/importer/import_test.go` around lines 325 - 328, The test currently creates 2048 files unnecessarily; replace the hardcoded range(2048) with a minimal deterministic value based on the sampling boundary (use maxSampledCompressedFiles + 1) so the test only produces one more than the cutoff and remains correct if maxSampledCompressedFiles changes; update the loop that builds fileName and writes empty files (the block creating test_%d.csv.gz) to iterate up to maxSampledCompressedFiles+1 instead of 2048.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@pkg/executor/importer/import_test.go`:
- Line 330: The test uses testfailpoint.Enable (in import_test.go) but the
package isn't imported; add the missing import for the testfailpoint package
(e.g., import "github.com/pingcap/tidb/util/testfailpoint") to the test's import
block so testfailpoint.Enable is defined and the test compiles.
In `@pkg/executor/importer/import.go`:
- Around line 1454-1485: The RealSize calculation is inconsistent between glob
and exact-path imports: the glob branch uses ce.estimate(...) combined with
sizeExpansionRatio (from
detectAndUpdateFormat/getSourceType/estimateCompressionRatio) while the
exact-path branch still calls mydump.EstimateRealSizeForFile; factor the new
logic into a shared helper (e.g., computeRealSize(ctx, ce, path, size,
sourceType, s) or similar) that calls
detectAndUpdateFormat/getSourceType/estimateCompressionRatio as needed, uses
mydump.ParseCompressionOnFileExtension and ce.estimate to compute base size and
then applies sizeExpansionRatio, and replace both usages of
mydump.EstimateRealSizeForFile and the inline ce.estimate/sizeExpansionRatio
code in the ParallelProcess lambda and the exact-path branch to call this helper
and set SourceFileMeta.RealSize.
- Around line 1308-1310: The fast-path returns the just-sampled per-file
compressRatio if another worker stores r.ratio[compressTp] between the initial
r.ratio.Load(compressTp) check and acquiring the mutex; to fix this, after
acquiring the mutex (r.mu) re-check r.ratio.Load(compressTp) and if an aggregate
is now present return that cached aggregate instead of the per-file
compressRatio; otherwise proceed to initialize and publish the aggregate as
before (ensure you use the same compressTp/compressRatio symbols and release the
mutex after).
- Around line 1227-1229: The parquet sampling error should be handled as a
best-effort fallback instead of returning an error that aborts InitDataFiles: in
estimateCompressionRatio, catch errors from mydump.SampleStatisticsFromParquet
(the call that currently returns rows, rowSize, err) and on failure log or warn
and fall back to using FileSize (or a default compression ratio) to compute and
return the compression estimate rather than propagating the error; ensure the
once.Do path in InitDataFiles no longer receives an error from
estimateCompressionRatio so one unreadable/corrupt parquet file won't abort spec
generation.
---
Nitpick comments:
In `@pkg/executor/importer/import_test.go`:
- Around line 325-328: The test currently creates 2048 files unnecessarily;
replace the hardcoded range(2048) with a minimal deterministic value based on
the sampling boundary (use maxSampledCompressedFiles + 1) so the test only
produces one more than the cutoff and remains correct if
maxSampledCompressedFiles changes; update the loop that builds fileName and
writes empty files (the block creating test_%d.csv.gz) to iterate up to
maxSampledCompressedFiles+1 instead of 2048.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: cdafe954-5d67-439b-840b-23f2deb13b79
📒 Files selected for processing (3)
- pkg/executor/importer/BUILD.bazel
- pkg/executor/importer/import.go
- pkg/executor/importer/import_test.go
```go
rows, rowSize, err := mydump.SampleStatisticsFromParquet(ctx, filePath, store)
if err != nil {
	return 1.0, err
```
Keep parquet size estimation best-effort.

`estimateCompressionRatio` now returns an error when parquet sampling fails, and the `once.Do` path propagates that out of `InitDataFiles`. The old path degraded to `FileSize` on estimation errors, so one unreadable or corrupt sampled parquet file now aborts spec generation instead of just losing the optimization.
🐛 Proposed fix
```diff
 rows, rowSize, err := mydump.SampleStatisticsFromParquet(ctx, filePath, store)
 if err != nil {
-	return 1.0, err
+	logutil.Logger(ctx).Warn("fail to sample parquet statistics, fallback to file size",
+		zap.String("path", filePath),
+		zap.Error(err),
+	)
+	return 1.0, nil
 }
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```go
rows, rowSize, err := mydump.SampleStatisticsFromParquet(ctx, filePath, store)
if err != nil {
	logutil.Logger(ctx).Warn("fail to sample parquet statistics, fallback to file size",
		zap.String("path", filePath),
		zap.Error(err),
	)
	return 1.0, nil
}
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pkg/executor/importer/import.go` around lines 1227 - 1229, The parquet
sampling error should be handled as a best-effort fallback instead of returning
an error that aborts InitDataFiles: in estimateCompressionRatio, catch errors
from mydump.SampleStatisticsFromParquet (the call that currently returns rows,
rowSize, err) and on failure log or warn and fall back to using FileSize (or a
default compression ratio) to compute and return the compression estimate rather
than propagating the error; ensure the once.Do path in InitDataFiles no longer
receives an error from estimateCompressionRatio so one unreadable/corrupt
parquet file won't abort spec generation.
```go
if _, ok := r.ratio.Load(compressTp); ok {
	return compressRatio
}
```
Return the published aggregate after cache initialization.

If another worker stores `r.ratio[compressTp]` between the fast-path `Load` and taking `mu`, Line 1309 returns the just-sampled per-file ratio instead of the cached aggregate. That leaves a small tail of files with mixed estimates after the cache is initialized.
🐛 Proposed fix
```diff
-	if _, ok := r.ratio.Load(compressTp); ok {
-		return compressRatio
+	if v, ok := r.ratio.Load(compressTp); ok {
+		return v.(float64)
 	}
```

🤖 Prompt for AI Agents
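The locked re-check described in this comment is the classic double-checked pattern over a `sync.Map` plus mutex: after taking the lock, prefer the published aggregate over the locally sampled value. A minimal self-contained sketch of that flow follows; the type and method names are illustrative, not the actual `import.go` symbols.

```go
package main

import (
	"fmt"
	"sync"
)

type ratioCache struct {
	mu    sync.Mutex
	ratio sync.Map // compression type -> aggregate ratio (float64)
}

// getRatio returns the published aggregate ratio for compressTp,
// or publishes `sampled` as the aggregate if none exists yet.
func (r *ratioCache) getRatio(compressTp string, sampled float64) float64 {
	// Fast path: an aggregate has already been published.
	if v, ok := r.ratio.Load(compressTp); ok {
		return v.(float64)
	}
	r.mu.Lock()
	defer r.mu.Unlock()
	// Re-check under the lock: another goroutine may have published
	// the aggregate between Load and Lock. Returning the cached value
	// here (not `sampled`) is exactly the fix the review suggests.
	if v, ok := r.ratio.Load(compressTp); ok {
		return v.(float64)
	}
	r.ratio.Store(compressTp, sampled)
	return sampled
}

func main() {
	c := &ratioCache{}
	fmt.Println(c.getRatio("gzip", 3.5)) // first call publishes 3.5
	fmt.Println(c.getRatio("gzip", 9.9)) // later samples see the cached 3.5
}
```

Without the locked re-check, the second caller in a race could return its own per-file ratio, leaving a tail of files sized with a different estimate than the rest.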
Verify each finding against the current code and only fix it if needed.
In `@pkg/executor/importer/import.go` around lines 1308 - 1310, The fast-path
returns the just-sampled per-file compressRatio if another worker stores
r.ratio[compressTp] between the initial r.ratio.Load(compressTp) check and
acquiring the mutex; to fix this, after acquiring the mutex (r.mu) re-check
r.ratio.Load(compressTp) and if an aggregate is now present return that cached
aggregate instead of the per-file compressRatio; otherwise proceed to initialize
and publish the aggregate as before (ensure you use the same
compressTp/compressRatio symbols and release the mutex after).
♻️ Duplicate comments (3)
pkg/executor/importer/import.go (3)
1231-1233: ⚠️ Potential issue | 🟠 Major

Keep parquet expansion sampling best-effort.

`estimateCompressionRatio` now returns the `SampleParquetRowSize` error, and the `once.Do` path propagates that out of `InitDataFiles`. One unreadable/corrupt sampled parquet file now aborts spec generation even though this value is only used to estimate `RealSize`. Falling back to `1.0`/file size with a warning would preserve the old behavior.

🐛 Proposed fix

```diff
 rows, rowSize, err := mydump.SampleParquetRowSize(ctx, fileMeta, store)
 if err != nil {
-	return 1.0, err
+	logutil.Logger(ctx).Warn("fail to sample parquet statistics, fallback to file size",
+		zap.String("path", filePath),
+		zap.Error(err),
+	)
+	return 1.0, nil
 }
```

Also applies to: 1457-1464
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/executor/importer/import.go` around lines 1231 - 1233, The parquet sampling call in estimateCompressionRatio (calling mydump.SampleParquetRowSize) currently returns its error and causes the once.Do path in InitDataFiles to abort spec generation; change estimateCompressionRatio to treat SampleParquetRowSize failures as non-fatal: catch the error, log a warning including fileMeta/err, and fall back to using compression ratio = 1.0 (or file size-based RealSize) instead of returning the error so InitDataFiles/once.Do won't propagate the failure; update both call sites (around the rows,rowSize assignment and the duplicate block at the other location) to preserve the old best-effort behavior.
1295-1314: ⚠️ Potential issue | 🟡 Minor

Return the cached aggregate after the locked re-check.

If another worker stores `r.ratio[compressTp]` between the fast-path `Load` and taking `mu`, Lines 1312-1314 still return this file's sampled ratio instead of the published harmonic mean. That leaves a small tail of files with mixed estimates.

🐛 Proposed fix

```diff
-	if _, ok := r.ratio.Load(compressTp); ok {
-		return compressRatio
+	if v, ok := r.ratio.Load(compressTp); ok {
+		return v.(float64)
 	}
```
Verify each finding against the current code and only fix it if needed. In `@pkg/executor/importer/import.go` around lines 1295 - 1314, The code does a second r.ratio.Load(compressTp) check under r.mu but still returns the local compressRatio; change the locked re-check in the function handling file sampling so that if r.ratio.Load(compressTp) is present after acquiring r.mu it returns the cached aggregate (the value stored in r.ratio for compressTp) instead of returning the just-sampled compressRatio; locate symbols r.ratio, compressTp, r.mu, and compressRatio in the import.go sampling/ratio logic and update the control flow to return the stored value when present, otherwise continue to store/use compressRatio.
1404-1413: ⚠️ Potential issue | 🟠 Major

Use the same `RealSize` calculation for exact-path imports.

The glob branch now applies `ce.estimate(...) * sizeExpansionRatio`, but the exact-path branch still calls `mydump.EstimateRealSizeForFile` at Line 1413. `IMPORT INTO '/a.parquet'` and `IMPORT INTO '/a*.parquet'` can therefore derive different `RealSize` and chunk sizing for the same source. Please factor the new logic into a shared helper and call it from both branches.

Also applies to: 1452-1474
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/executor/importer/import.go` around lines 1404 - 1413, The exact-path branch is still calling mydump.EstimateRealSizeForFile while the glob branch uses the new ce.estimate(...) * sizeExpansionRatio logic, causing inconsistent RealSize and chunking; extract the new RealSize computation into a shared helper (e.g., computeRealSize(ctx, engineContext, fileMeta, sizeExpansionRatio, s)) that encapsulates the ce.estimate(...) * sizeExpansionRatio fallback to mydump.EstimateRealSizeForFile, then replace the direct call to mydump.EstimateRealSizeForFile in the exact-path code that sets fileMeta.RealSize (after detectAndUpdateFormat/getSourceType/ParseCompressionOnFileExtension) with a call to this helper, and make the same replacement in the other affected block (around the 1452-1474 region) so both glob and exact-path use the identical RealSize logic.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@pkg/executor/importer/import.go`:
- Around line 1231-1233: The parquet sampling call in estimateCompressionRatio
(calling mydump.SampleParquetRowSize) currently returns its error and causes the
once.Do path in InitDataFiles to abort spec generation; change
estimateCompressionRatio to treat SampleParquetRowSize failures as non-fatal:
catch the error, log a warning including fileMeta/err, and fall back to using
compression ratio = 1.0 (or file size-based RealSize) instead of returning the
error so InitDataFiles/once.Do won't propagate the failure; update both call
sites (around the rows,rowSize assignment and the duplicate block at the other
location) to preserve the old best-effort behavior.
- Around line 1295-1314: The code does a second r.ratio.Load(compressTp) check
under r.mu but still returns the local compressRatio; change the locked re-check
in the function handling file sampling so that if r.ratio.Load(compressTp) is
present after acquiring r.mu it returns the cached aggregate (the value stored
in r.ratio for compressTp) instead of returning the just-sampled compressRatio;
locate symbols r.ratio, compressTp, r.mu, and compressRatio in the import.go
sampling/ratio logic and update the control flow to return the stored value when
present, otherwise continue to store/use compressRatio.
- Around line 1404-1413: The exact-path branch is still calling
mydump.EstimateRealSizeForFile while the glob branch uses the new
ce.estimate(...) * sizeExpansionRatio logic, causing inconsistent RealSize and
chunking; extract the new RealSize computation into a shared helper (e.g.,
computeRealSize(ctx, engineContext, fileMeta, sizeExpansionRatio, s)) that
encapsulates the ce.estimate(...) * sizeExpansionRatio fallback to
mydump.EstimateRealSizeForFile, then replace the direct call to
mydump.EstimateRealSizeForFile in the exact-path code that sets
fileMeta.RealSize (after
detectAndUpdateFormat/getSourceType/ParseCompressionOnFileExtension) with a call
to this helper, and make the same replacement in the other affected block
(around the 1452-1474 region) so both glob and exact-path use the identical
RealSize logic.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 2f2788ff-222b-4b12-b814-2199797e5e27
📒 Files selected for processing (2)
- pkg/executor/importer/import.go
- pkg/executor/importer/import_test.go
🚧 Files skipped from review as they are similar to previous changes (1)
- pkg/executor/importer/import_test.go
/retest
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: Benjamin2037, joechenrh. The full list of commands accepted by this bot can be found here. The pull request process is described here.

Details: Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
[LGTM Timeline notifier]

Timeline:
Codecov Report

❌ Patch coverage is

Additional details and impacted files

```
@@              Coverage Diff               @@
##    release-nextgen-20251011   #67654   +/-   ##
=============================================================
  Coverage          ?   71.8595%
=============================================================
  Files             ?       1833
  Lines             ?     493020
  Branches          ?          0
=============================================================
  Hits              ?     354282
  Misses            ?     115390
  Partials          ?      23348
```

Flags with carried forward coverage won't be shown. Click here to find out more.
133f195 into pingcap:release-nextgen-20251011
This is an automated cherry-pick of #64769
What problem does this PR solve?
Issue Number: close #64770
Problem Summary:
What changed and how does it work?
For compressed files, it may be time-consuming to get the compression ratio for each file. Since the ratio we get is only a rough value anyway, we sample just the first 512 files (this could be made configurable) for each compression type and use the harmonic mean to get the average compression ratio.
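The harmonic-mean aggregation described above can be sketched as follows. This is a simplified standalone version, not the actual implementation in pkg/executor/importer/import.go; the function name and the sample values are illustrative.

```go
package main

import "fmt"

// harmonicMean aggregates per-file compression ratios
// (uncompressed size / compressed size). The harmonic mean equals
// the true aggregate ratio exactly when the sampled files have
// equal uncompressed sizes, and it downweights outlier ratios.
func harmonicMean(ratios []float64) float64 {
	if len(ratios) == 0 {
		return 1.0 // no samples: assume no expansion
	}
	var sumInv float64
	for _, r := range ratios {
		sumInv += 1.0 / r
	}
	return float64(len(ratios)) / sumInv
}

func main() {
	// Hypothetical ratios sampled from a few .gz files.
	ratios := []float64{4.0, 2.0, 4.0, 4.0}
	fmt.Printf("%.4f\n", harmonicMean(ratios)) // prints 3.2000
}
```

Note how the single low ratio (2.0) pulls the harmonic mean (3.2) below the arithmetic mean (3.5), which makes the estimate more conservative when a few files compress much worse than the rest.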
Check List
Tests
Create 10,000 zstd files on ks3, and import with a 8C instance.
Before:
After:
Side effects
Documentation
Release note
Please refer to Release Notes Language Style Guide to write a quality release note.
Summary by CodeRabbit
New Features
Tests