importsdk, importer: fix sampled source size in import estimate #67492

Merged
ti-chi-bot[bot] merged 4 commits into pingcap:master from
GMHDBJD:fix/importsdk-estimate-source-size-20260401
Apr 2, 2026
Conversation

@GMHDBJD
Collaborator

@GMHDBJD GMHDBJD commented Apr 1, 2026

What problem does this PR solve?

Issue Number: ref #67240

Problem Summary:

EstimateImportDataSize sampled source bytes from buffered reader progress for SQL/CSV files. That inflated the sampled source-size denominator and could collapse the final totalTiKVSize estimate to an unrealistically small value. For s3://tidbcloud-samples/sp500insight_new/, the estimate was totalSourceSize=211686583 but totalTiKVSize=147600.
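
To make the collapse concrete, here is a minimal sketch of ratio-based extrapolation. It assumes the estimator scales the total source size by sampled KV bytes divided by sampled source bytes; the function name `extrapolateTiKVSize` and the sample numbers are hypothetical and are not taken from the TiDB code.

```go
package main

import (
	"fmt"
	"math"
)

// extrapolateTiKVSize scales the full source size by the sampled
// KV-bytes-per-source-byte ratio. Hypothetical helper, not TiDB's API.
func extrapolateTiKVSize(totalSourceSize, sampledKVSize, sampledSourceSize int64) int64 {
	if sampledSourceSize <= 0 {
		return 0 // no usable sample, so no estimate
	}
	est := float64(totalSourceSize) * float64(sampledKVSize) / float64(sampledSourceSize)
	return int64(math.Round(est))
}

func main() {
	const total = 1_000_000 // made-up total source bytes
	const sampledKV = 1_500 // made-up KV bytes produced by the sampled rows

	// Denominator counts only the bytes the parser consumed for the sampled rows.
	fmt.Println(extrapolateTiKVSize(total, sampledKV, 1_000))
	// Denominator counts buffered read-ahead: same rows, far more "source" bytes.
	fmt.Println(extrapolateTiKVSize(total, sampledKV, 1_000_000))
}
```

With the same sampled rows, a denominator that is 1000 times larger yields an estimate that is 1000 times smaller, which is the shape of the totalTiKVSize=147600 result described above.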

What changed and how does it work?

  • Use parser-consumed Pos() deltas as sampled source bytes for SQL/CSV files instead of buffered ScannedPos() progress.
  • Keep parquet on a Row.Length fallback because parquet Pos() is row-count based.
  • Add a regression test covering short-row SQL input so sampled source size stays close to the actual SQL content size instead of buffered read-ahead.

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No need to test
    • I checked and no code files have been changed.

Manual test details:

  • Run go test ./pkg/executor/importer -run 'TestSampleIndexSizeRatio' -tags=intest,deadlock
  • Run go test ./pkg/importsdk -run 'TestFileScanner/EstimateImportDataSize' -tags=intest,deadlock
  • Run make lint
  • Re-run the estimator against s3://tidbcloud-samples/sp500insight_new/
  • Confirm the estimate changes from totalTiKVSize=147600 to totalTiKVSize=348537047

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

Please refer to Release Notes Language Style Guide to write a quality release note.

None

Summary by CodeRabbit

  • Bug Fixes

    • Improved data import sampling to more accurately measure source byte sizes across formats, reducing size-estimation errors during import.
    • Better handling of end-of-file and read errors to avoid incorrect size accounting.
  • New Features

    • Improved schema detection for imports with multiple SQL statements so the correct CREATE TABLE is selected for estimates.
  • Tests

    • Added tests validating size measurements for SQL imports and multi-statement schema scenarios.

@ti-chi-bot ti-chi-bot bot added the release-note-none Denotes a PR that doesn't merit a release note. label Apr 1, 2026
@pantheon-ai

pantheon-ai bot commented Apr 1, 2026

@GMHDBJD I've received your pull request and will start the review. I'll conduct a thorough review covering code quality, potential issues, and implementation details.

⏳ This process typically takes 10-30 minutes depending on the complexity of the changes.

ℹ️ Learn more details on Pantheon AI.

@ti-chi-bot ti-chi-bot bot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Apr 1, 2026
@tiprow

tiprow bot commented Apr 1, 2026

Hi @GMHDBJD. Thanks for your PR.

PRs from untrusted users cannot be marked as trusted with /ok-to-test in this repo meaning untrusted PR authors can never trigger tests themselves. Collaborators can still trigger tests on the PR using /test all.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@coderabbitai

coderabbitai bot commented Apr 1, 2026

📝 Walkthrough


kvSizeSampler.sampleOneFile now iterates rows via parser.ReadRow()/parser.LastRow() and reuses rows with parser.RecycleRow(). Per-row source size is computed by a new sampledRowSourceSize helper (format-aware). EOF/offset termination and error-wrapping use the current start position. Added tests validating SQL-import source-size measurement and updated SQL schema parsing to select the appropriate CREATE TABLE among multiple statements.
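
The row-driven loop described in this walkthrough can be sketched as follows. This is an illustration under assumptions, not the code in sampler.go: `rowParser`, `fakeParser`, and `sampleSourceSize` are hypothetical stand-ins, and the `LastRow()`/`RecycleRow()` handling is omitted to keep the example small.

```go
package main

import (
	"errors"
	"fmt"
	"io"
)

// rowParser is a hypothetical stand-in for the parser used by the sampler.
type rowParser interface {
	ReadRow() error // advances to the next row; io.EOF at end of input
	Pos() int64     // byte offset consumed so far
}

// fakeParser serves rows of fixed byte lengths from memory.
type fakeParser struct {
	rowLens []int64
	next    int
	pos     int64
}

func (p *fakeParser) ReadRow() error {
	if p.next >= len(p.rowLens) {
		return io.EOF
	}
	p.pos += p.rowLens[p.next]
	p.next++
	return nil
}

func (p *fakeParser) Pos() int64 { return p.pos }

// sampleSourceSize reads at most maxRows rows and sums per-row Pos() deltas.
func sampleSourceSize(p rowParser, maxRows int) (int64, int, error) {
	var sourceSize int64
	rows := 0
	for rows < maxRows {
		startPos := p.Pos()
		if err := p.ReadRow(); err != nil {
			if errors.Is(err, io.EOF) {
				break // ran out of input before reaching the row limit
			}
			// Wrap the error with the offset where the failing row started.
			return 0, 0, fmt.Errorf("read row at offset %d: %w", startPos, err)
		}
		sourceSize += p.Pos() - startPos
		rows++
	}
	return sourceSize, rows, nil
}

func main() {
	p := &fakeParser{rowLens: []int64{26, 26, 30, 26}}
	size, rows, err := sampleSourceSize(p, 3)
	fmt.Println(size, rows, err)
}
```

Capturing the start position before each read is what lets both the per-row delta and the error message refer to the row being processed.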

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Sampler Logic: pkg/executor/importer/sampler.go | Drive sampling by parser.ReadRow()/parser.LastRow(); add sampledRowSourceSize for format-aware per-row byte accounting; use parser.RecycleRow(); adjust EOF/offset termination and wrap errors with current startPos. |
| Sampler Tests: pkg/executor/importer/sampler_test.go | Add subtest to TestSampleIndexSizeRatio that writes a temporary SQL file with INSERT statements, runs SampleFileImportKVSize for SQL import, and asserts SourceSize and TotalKVSize() are positive and within bounds relative to on-disk file length. |
| File scanner logic: pkg/importsdk/file_scanner.go | Change schema parsing to parse all SQL statements (ParseSQL), add buildEstimateCreateTableStmt and estimateCreateTableStmtMatchesMeta to choose the matching CREATE TABLE when multiple statements exist. |
| File scanner tests: pkg/importsdk/file_scanner_test.go | Add subtest EstimateImportDataSizeMultiStatementSchema with multi-statement .sql (CREATE DATABASE/USE/DROP/CREATE TABLE) plus CSV data; assert a single table estimate is returned and sizes are positive and consistent. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

Suggested labels

component/import, size/XXL, ok-to-test

Suggested reviewers

  • joechenrh
  • OliverS929

Poem

🐇
I nibble bytes from row to row,
I read, recycle, then I go.
A hop, a count, a tiny cheer—
Sampled sizes, loud and clear. 🥕

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title accurately and concisely summarizes the main fix: correcting how sampled source size is computed during import estimation for SQL/CSV files. |
| Description check | ✅ Passed | The description includes the required issue reference (ref #67240), a clear problem statement with specific example impact, detailed explanation of changes, test verification steps, and properly marked checklist items. |



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/executor/importer/sampler_test.go`:
- Around line 242-290: The test
"sql_source_size_uses_consumed_bytes_not_buffered_progress" currently has only
20 rows so SampleFileImportKVSize reads to EOF; update the fixture so the
sampler only reads a prefix by either (a) increasing the number of inserted rows
to exceed the sampler's rowsPerFile threshold (e.g. >30 short rows) or (b)
temporarily reducing the sampler limit such as maxSampleFileSize before calling
SampleFileImportKVSize, then keep the assertions but change the expectation to
ensure sampled.SourceSize reflects the consumed prefix (not full file) — target
symbols: the test function name string, the created content builder,
NewLoadDataController/Plan usage, and the SampleFileImportKVSize call to force a
partial sample.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: c006d2c9-e94c-4023-b132-79c0413df7d5

📥 Commits

Reviewing files that changed from the base of the PR and between dcf6f03 and feed1c7.

📒 Files selected for processing (2)
  • pkg/executor/importer/sampler.go
  • pkg/executor/importer/sampler_test.go

Comment thread pkg/executor/importer/sampler_test.go
@codecov

codecov bot commented Apr 1, 2026

Codecov Report

❌ Patch coverage is 70.00000% with 21 lines in your changes missing coverage. Please review.
✅ Project coverage is 77.5790%. Comparing base (8412422) to head (b62a461).
⚠️ Report is 5 commits behind head on master.

Additional details and impacted files
@@               Coverage Diff                @@
##             master     #67492        +/-   ##
================================================
- Coverage   77.7173%   77.5790%   -0.1383%     
================================================
  Files          1959       1943        -16     
  Lines        543377     543573       +196     
================================================
- Hits         422298     421699       -599     
- Misses       120238     121872      +1634     
+ Partials        841          2       -839     
| Flag | Coverage Δ |
| --- | --- |
| integration | 40.9992% <0.0000%> (+4.8244%) ⬆️ |
| unit | 76.7519% <70.0000%> (+0.4088%) ⬆️ |

| Components | Coverage Δ |
| --- | --- |
| dumpling | 61.5065% <ø> (ø) |
| parser | ∅ <ø> (∅) |
| br | 48.9178% <ø> (-12.0623%) ⬇️ |

@GMHDBJD
Collaborator Author

GMHDBJD commented Apr 1, 2026

/retest

@tiprow

tiprow bot commented Apr 1, 2026

@GMHDBJD: PRs from untrusted users cannot be marked as trusted with /ok-to-test in this repo meaning untrusted PR authors can never trigger tests themselves. Collaborators can still trigger tests on the PR using /test.

Details

In response to this:

/retest

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

Comment thread pkg/executor/importer/sampler.go Outdated
Comment on lines +389 to +391
// Sampling needs per-row source bytes, not buffered reader progress.
// SQL/CSV parsers expose byte offsets through Pos(), while parquet Pos()
// is row-count based and must fall back to the row-size estimate.
Contributor


This will also overestimate the source size for compressed files. Do we need to add a comment for this?


@GMHDBJD
Collaborator Author

GMHDBJD commented Apr 1, 2026

/retest

Contributor

@D3Hunter D3Hunter left a comment


Summary

  • Total findings: 1
  • Inline comments: 1
  • Summary-only findings (no inline anchor): 0
Findings (highest risk first)

🟡 [Minor] (1)

  1. Parquet and fallback source-size branches lack targeted regression coverage (pkg/executor/importer/sampler.go:388; pkg/executor/importer/sampler_test.go:238)

return sourceSize, dataKVSize, indexKVSize, nil
}

func (s *kvSizeSampler) sampledRowSourceSize(parser mydump.Parser, startPos int64, row mydump.Row) int64 {
Contributor


🟡 [Minor] Parquet and fallback source-size branches lack targeted regression coverage

Why
The patch introduces format-specific source-size logic in sampledRowSourceSize, but the new test only validates the SQL consumed-bytes path and does not exercise the newly added Parquet or non-positive delta fallback branches.

Scope
pkg/executor/importer/sampler.go:388; pkg/executor/importer/sampler_test.go:238

Risk if unchanged
Future parser position behavior changes can silently skew sampled source-size estimation for Parquet or edge parser offsets, which may mis-tune IMPORT resource planning without an obvious failure signal.

Evidence
sampledRowSourceSize adds if s.cfg.Format == DataFormatParquet { return int64(row.Length) } and if rowDelta := endPos - startPos; rowDelta > 0 { ... } fallback logic, while the added test sql_source_size_uses_consumed_bytes_not_buffered_progress covers only SQL input.

Change request
Add unit tests for it: a case for DataFormatParquet and a case for the non-parquet endPos <= startPos fallback path, so that each new sizing branch is pinned by deterministic assertions.

@ti-chi-bot ti-chi-bot bot added approved needs-1-more-lgtm Indicates a PR needs 1 more LGTM. labels Apr 1, 2026
@ti-chi-bot ti-chi-bot bot added lgtm and removed needs-1-more-lgtm Indicates a PR needs 1 more LGTM. labels Apr 1, 2026
@ti-chi-bot

ti-chi-bot bot commented Apr 1, 2026

[LGTM Timeline notifier]

Timeline:

  • 2026-04-01 13:22:34.715598014 +0000 UTC m=+357759.920958071: ☑️ agreed by D3Hunter.
  • 2026-04-01 13:44:46.583825993 +0000 UTC m=+359091.789186050: ☑️ agreed by joechenrh.

@GMHDBJD
Collaborator Author

GMHDBJD commented Apr 1, 2026

/retest

@GMHDBJD
Collaborator Author

GMHDBJD commented Apr 1, 2026

/retest

@GMHDBJD
Collaborator Author

GMHDBJD commented Apr 1, 2026

/retest

@GMHDBJD
Collaborator Author

GMHDBJD commented Apr 2, 2026

//retest

@hawkingrei
Member

/retest

1 similar comment
@GMHDBJD
Collaborator Author

GMHDBJD commented Apr 2, 2026

/retest

@GMHDBJD
Collaborator Author

GMHDBJD commented Apr 2, 2026

/retest

@GMHDBJD
Collaborator Author

GMHDBJD commented Apr 2, 2026

/retest

@hawkingrei
Member

/retest

@GMHDBJD
Collaborator Author

GMHDBJD commented Apr 2, 2026

/hold

@ti-chi-bot ti-chi-bot bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Apr 2, 2026
@GMHDBJD
Collaborator Author

GMHDBJD commented Apr 2, 2026

/unhold

@ti-chi-bot ti-chi-bot bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Apr 2, 2026
@GMHDBJD
Collaborator Author

GMHDBJD commented Apr 2, 2026

/hold

@ti-chi-bot ti-chi-bot bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Apr 2, 2026
@ti-chi-bot

ti-chi-bot bot commented Apr 2, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: D3Hunter, joechenrh

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@GMHDBJD
Collaborator Author

GMHDBJD commented Apr 2, 2026

/unhold

@ti-chi-bot ti-chi-bot bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Apr 2, 2026

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
pkg/importsdk/file_scanner_test.go (1)

270-303: Add one more case for >1 CREATE TABLE selection logic.

This subtest is good, but it only covers the single-CREATE TABLE path. Please add a sibling case with multiple CREATE TABLE statements to directly validate table/schema matching branch behavior in buildEstimateCreateTableStmt.

As per coding guidelines "Prefer extending existing test suites and fixtures over creating new scaffolding."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/importsdk/file_scanner_test.go` around lines 270 - 303, Add a sibling
subtest to EstimateImportDataSizeMultiStatementSchema that exercises the
multi-CREATE TABLE branch in buildEstimateCreateTableStmt: create files under a
new temp dir including a schema file containing multiple CREATE TABLE statements
(e.g., CREATE TABLE users... and CREATE TABLE orders...), associated CSVs for
each table, then instantiate NewFileScanner with defaultSDKConfig
(skipInvalidFiles=true) and call EstimateImportDataSize; assert that
estimate.Tables contains entries for both "users" and "orders" with positive
SourceSize/TiKVSize and that TotalSourceSize/TotalTiKVSize equal the sum of the
tables. Reference the existing subtest, NewFileScanner, EstimateImportDataSize,
and buildEstimateCreateTableStmt when adding the case.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/importsdk/file_scanner.go`:
- Around line 444-451: The matching logic in estimateCreateTableStmtMatchesMeta
incorrectly rejects matches when tblMeta.DB is empty but createStmt.Table.Schema
is non-empty; update estimateCreateTableStmtMatchesMeta so that after verifying
the table name (createStmt.Table.Name) it treats an empty tblMeta.DB as a
wildcard and returns true (i.e., if tblMeta.DB == "" return true) before
comparing createStmt.Table.Schema.String() to tblMeta.DB; keep the existing
behavior for empty createStmt.Table.Schema.String().


ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 939e3083-5fcc-43ee-b447-3da77be33169

📥 Commits

Reviewing files that changed from the base of the PR and between ce12082 and b62a461.

📒 Files selected for processing (2)
  • pkg/importsdk/file_scanner.go
  • pkg/importsdk/file_scanner_test.go

Comment on lines +444 to +451
func estimateCreateTableStmtMatchesMeta(createStmt *ast.CreateTableStmt, tblMeta *mydump.MDTableMeta) bool {
	if !strings.EqualFold(createStmt.Table.Name.String(), tblMeta.Name) {
		return false
	}
	if createStmt.Table.Schema.String() == "" {
		return true
	}
	return strings.EqualFold(createStmt.Table.Schema.String(), tblMeta.DB)


⚠️ Potential issue | 🟠 Major

Handle empty tblMeta.DB when matching schema-qualified CREATE TABLE statements.

If tblMeta.DB is empty, current logic rejects otherwise valid matches (CREATE TABLE db.tbl ...) and can fail estimation when multiple CREATE TABLE statements exist.

Suggested fix
 func estimateCreateTableStmtMatchesMeta(createStmt *ast.CreateTableStmt, tblMeta *mydump.MDTableMeta) bool {
 	if !strings.EqualFold(createStmt.Table.Name.String(), tblMeta.Name) {
 		return false
 	}
-	if createStmt.Table.Schema.String() == "" {
+	if createStmt.Table.Schema.String() == "" || tblMeta.DB == "" {
 		return true
 	}
 	return strings.EqualFold(createStmt.Table.Schema.String(), tblMeta.DB)
 }
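
The effect of the suggested rule can be checked in isolation with plain strings. This is a sketch of the matching rule as suggested in this comment, not necessarily what was merged; `matchesMeta` and its parameters are hypothetical stand-ins for the AST and table-metadata fields.

```go
package main

import (
	"fmt"
	"strings"
)

// matchesMeta mirrors the suggested rule: table names must match
// case-insensitively, and an empty schema on either side acts as a wildcard.
func matchesMeta(stmtSchema, stmtTable, metaDB, metaTable string) bool {
	if !strings.EqualFold(stmtTable, metaTable) {
		return false
	}
	if stmtSchema == "" || metaDB == "" {
		return true
	}
	return strings.EqualFold(stmtSchema, metaDB)
}

func main() {
	fmt.Println(matchesMeta("db", "tbl", "", "tbl"))      // empty metadata DB is a wildcard
	fmt.Println(matchesMeta("", "TBL", "db", "tbl"))      // unqualified statement, case-insensitive name
	fmt.Println(matchesMeta("other", "tbl", "db", "tbl")) // both schemas set and different
}
```

Without the `metaDB == ""` clause, the first call would compare "db" against an empty string and report no match, which is the rejection this comment describes.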

@GMHDBJD
Collaborator Author

GMHDBJD commented Apr 2, 2026

/retest

@hawkingrei
Member

/retest

1 similar comment
@hawkingrei
Member

/retest

@ti-chi-bot ti-chi-bot bot merged commit 2098e75 into pingcap:master Apr 2, 2026
35 checks passed
Labels

approved lgtm release-note-none Denotes a PR that doesn't merit a release note. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

4 participants