pkg/importsdk: fix wildcard generation for subdir csv files#67472
pkg/importsdk: fix wildcard generation for subdir csv files#67472ti-chi-bot[bot] merged 2 commits intopingcap:masterfrom
Conversation
|
@GMHDBJD I've received your pull request and will start the review. I'll conduct a thorough review covering code quality, potential issues, and implementation details. ⏳ This process typically takes 10-30 minutes depending on the complexity of the changes. ℹ️ Learn more details on Pantheon AI. |
|
Hi @GMHDBJD. Thanks for your PR. PRs from untrusted users cannot be marked as trusted with I understand the commands that are listed here. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
📝 WalkthroughWalkthroughThe change enhances wildcard pattern generation to handle CSV files stored in subdirectories. The implementation splits file paths by "/" and generates per-segment prefix/suffix patterns when all paths have the same number of components, enabling patterns like Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Warning There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure. 🔧 golangci-lint (2.11.4)Command failed Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
pkg/importsdk/pattern.go (1)
185-189: Soften the contract in this comment.
generatePrefixSuffixPatternstill returns a candidate that needsisValidPatternvalidation. In mixed-depth cases it can intentionally fall back to a flat pattern that does not satisfy “all and only” on its own, so the current wording overstates the guarantee.✏️ Suggested wording
-// generatePrefixSuffixPattern returns a wildcard pattern that matches all and only the given paths. +// generatePrefixSuffixPattern builds a candidate wildcard pattern for the given paths. +// The caller must still validate that it matches all and only the intended files. // When all paths have the same number of '/'-separated components, it generates the wildcard // component by component so every '*' stays within a single path segment, which is required by // filepath.Match.As per coding guidelines, "Comments SHOULD explain non-obvious intent, constraints, invariants, concurrency guarantees, SQL/compatibility contracts, or important performance trade-offs, and SHOULD NOT restate what the code already makes clear."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/importsdk/pattern.go` around lines 185 - 189, Update the doc comment for generatePrefixSuffixPattern to soften its contract: state that it produces a candidate wildcard pattern which may require validation via isValidPattern rather than guaranteeing it matches “all and only” the given paths; explicitly mention that in mixed-depth cases the function intentionally falls back to a flat pattern that might not satisfy the stricter guarantee and therefore must be checked with isValidPattern, and keep the rest of the implementation and behavior unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@pkg/importsdk/pattern.go`:
- Around line 185-189: Update the doc comment for generatePrefixSuffixPattern to
soften its contract: state that it produces a candidate wildcard pattern which
may require validation via isValidPattern rather than guaranteeing it matches
“all and only” the given paths; explicitly mention that in mixed-depth cases the
function intentionally falls back to a flat pattern that might not satisfy the
stricter guarantee and therefore must be checked with isValidPattern, and keep
the rest of the implementation and behavior unchanged.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 99badddb-1bdf-46db-abe2-cea14d9f9769
📒 Files selected for processing (2)
pkg/importsdk/pattern.gopkg/importsdk/pattern_test.go
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## master #67472 +/- ##
================================================
- Coverage 77.7173% 77.6075% -0.1098%
================================================
Files 1959 1943 -16
Lines 543377 543523 +146
================================================
- Hits 422298 421815 -483
- Misses 120238 121706 +1468
+ Partials 841 2 -839
Flags with carried forward coverage won't be shown. Click here to find out more.
🚀 New features to boost your workflow:
|
|
🔍 Starting code review for this PR... |
|
🔍 New commits detected — starting re-review... |
1 similar comment
|
🔍 New commits detected — starting re-review... |
D3Hunter
left a comment
There was a problem hiding this comment.
Summary
- Total findings: 1
- Inline comments: 1
- Summary-only findings (no inline anchor): 0
Findings (highest risk first)
🟡 [Minor] (1)
- generatePrefixSuffixPattern comment overstates guarantee and hides required post-validation (
pkg/importsdk/pattern.go:185)
| return prefix + "*" + suffix | ||
| } | ||
|
|
||
| // generatePrefixSuffixPattern returns a wildcard pattern that matches all and only the given paths. |
There was a problem hiding this comment.
🟡 [Minor] generatePrefixSuffixPattern comment overstates guarantee and hides required post-validation
Why
The doc says the function returns a wildcard that matches all and only the given paths, but this helper can return an unchecked candidate and relies on caller-side isValidPattern for that exactness guarantee.
Scope
pkg/importsdk/pattern.go:185
Risk if unchanged
A future callsite can treat this helper as self-validating and skip isValidPattern, which risks producing patterns that do not match intended files or that match an unintended set.
Evidence
The mixed-depth fallback returns generateFlatPrefixSuffixPattern(paths) at pkg/importsdk/pattern.go:200-201, and generatePrefixSuffixPattern itself performs no validity check. Exactness is enforced only where callers run isValidPattern, such as pkg/importsdk/pattern.go:56.
Change request
Please add a short why-comment for this contract: describe this function as generating a candidate pattern, and state that callers must run isValidPattern before treating it as an all-and-only match.
joechenrh
left a comment
There was a problem hiding this comment.
Rest LGTM
(So the implicit assumption is that all file paths of the same table need to have the same directory depth, since both SDK and IMPORT INTO can't handle such case. 🤔
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: D3Hunter, joechenrh The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
/retest |
|
@GMHDBJD: PRs from untrusted users cannot be marked as trusted with DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
/retest |
|
/retest |
5 similar comments
|
/retest |
|
/retest |
|
/retest |
|
/retest |
|
/retest |
|
@GMHDBJD: PRs from untrusted users cannot be marked as trusted with DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
/retest |
|
@GMHDBJD: PRs from untrusted users cannot be marked as trusted with DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
/retest |
1 similar comment
|
/retest |
|
@GMHDBJD: PRs from untrusted users cannot be marked as trusted with DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
/retest |
1 similar comment
|
/retest |
What problem does this PR solve?
Issue Number: close #67471
Problem Summary:
CSV data files that belong to the same table can be stored under sibling subdirectories such as
dir/subdir1/*.csvanddir/subdir2/*.csv. The old generic wildcard fallback built one flatprefix*suffixpattern across the whole path, butfilepath.Matchdoes not allow*to match/, so TiDB could not derive a valid unique wildcard for these files.What changed and how does it work?
The generic prefix/suffix fallback is now directory-aware when all candidate paths have the same number of
/-separated components. It generates the wildcard component by component so every*stays within a single path segment. For the subdirectory CSV case above, TiDB now derivesdir/subdir*/*.csv, which can be validated correctly byfilepath.Match.This PR also adds regression coverage for both wildcard validation and wildcard generation.
Check List
Tests
Validation commands:
pushd pkg/importsdk >/dev/null && go test -run '^(TestValidatePattern|TestGenerateWildcardPath)$' -tags=intest,deadlock && popd >/dev/nullmake lintSide effects
Documentation
Release note
Please refer to Release Notes Language Style Guide to write a quality release note.
Summary by CodeRabbit
Bug Fixes
Tests