importsdk, importer: fix sampled source size in import estimate #67492
Merged: ti-chi-bot merged 4 commits into pingcap:master from GMHDBJD:fix/importsdk-estimate-source-size-20260401 on Apr 2, 2026 (+167 −20)
Commits
feed1c7 importsdk, importer: fix sampled source size in import estimate (GMHDBJD)
ce12082 importer: clarify sampled source size comment (GMHDBJD)
a6bf749 Merge remote-tracking branch 'upstream/master' into pr-67492 (GMHDBJD)
b62a461 importsdk: support multi-statement schema in size estimator (GMHDBJD)
🟡 [Minor] Parquet and fallback source-size branches lack targeted regression coverage
Why
The patch introduces format-specific source-size logic in `sampledRowSourceSize`, but the new test only validates the SQL consumed-bytes path and does not exercise the newly added Parquet branch or the non-positive-delta fallback branch.
Scope
pkg/executor/importer/sampler.go:388; pkg/executor/importer/sampler_test.go:238
Risk if unchanged
Future changes to parser position behavior can silently skew sampled source-size estimation for Parquet input or for edge-case parser offsets, which may mis-tune IMPORT resource planning without an obvious failure signal.
Evidence
`sampledRowSourceSize` adds `if s.cfg.Format == DataFormatParquet { return int64(row.Length) }` and `if rowDelta := endPos - startPos; rowDelta > 0 { ... }` fallback logic, while the added test `sql_source_size_uses_consumed_bytes_not_buffered_progress` covers only SQL input.
Change request
Add unit tests for it: one case for `DataFormatParquet` and one case for the non-Parquet `endPos <= startPos` fallback path, so that each new sizing branch is pinned by deterministic assertions.
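As a rough illustration of the requested coverage, the sketch below models the branches quoted under Evidence with hypothetical stand-in types and runs one table-driven case per branch. It is not the code in `sampler.go`: the names `dataFormat`, `sampledRow`, and the free-function signature are invented for this example, and the final fallback returning `row.Length` is an assumption, because the review does not quote what the fallback actually returns.

```go
package main

import "fmt"

// dataFormat and sampledRow are hypothetical stand-ins for the importer's
// real types; only the branch structure follows the review's Evidence text.
type dataFormat string

const (
	formatSQL     dataFormat = "sql"
	formatParquet dataFormat = "parquet"
)

type sampledRow struct {
	Length int // row size as reported by the parser
}

// sampledRowSourceSize models the branches named in the review.
func sampledRowSourceSize(format dataFormat, row sampledRow, startPos, endPos int64) int64 {
	// Parquet branch: use the row length directly.
	if format == formatParquet {
		return int64(row.Length)
	}
	// Consumed-bytes branch: trust the parser position delta when positive.
	if rowDelta := endPos - startPos; rowDelta > 0 {
		return rowDelta
	}
	// ASSUMPTION: the review does not show the fallback body; this sketch
	// falls back to the row length when the delta is zero or negative.
	return int64(row.Length)
}

func main() {
	cases := []struct {
		name             string
		format           dataFormat
		row              sampledRow
		startPos, endPos int64
		want             int64
	}{
		{"parquet_uses_row_length", formatParquet, sampledRow{Length: 128}, 0, 4096, 128},
		{"sql_uses_consumed_bytes", formatSQL, sampledRow{Length: 40}, 100, 164, 64},
		{"zero_delta_falls_back", formatSQL, sampledRow{Length: 40}, 200, 200, 40},
		{"negative_delta_falls_back", formatSQL, sampledRow{Length: 40}, 300, 250, 40},
	}
	for _, c := range cases {
		got := sampledRowSourceSize(c.format, c.row, c.startPos, c.endPos)
		status := "ok"
		if got != c.want {
			status = "MISMATCH"
		}
		fmt.Printf("%s: got=%d want=%d %s\n", c.name, got, c.want, status)
	}
}
```

In the real test file these cases would presumably sit beside `sql_source_size_uses_consumed_bytes_not_buffered_progress` as subtests, with expected values taken from the actual fallback behavior rather than from the assumption made here.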