lightning: enhance import into backend for dm #65473

Merged
ti-chi-bot[bot] merged 13 commits into pingcap:master from GMHDBJD:fixImportInto
Jan 20, 2026
Conversation

@GMHDBJD
Collaborator

@GMHDBJD GMHDBJD commented Jan 8, 2026

What problem does this PR solve?

Issue Number: ref #65092

Problem Summary:

What changed and how does it work?

  • Updated JobOrchestrator to handle job submission errors gracefully, ensuring submitted jobs are cancelled if submission fails.
  • Introduced job progress estimator to calculate job progress based on processed and total sizes.
  • Implemented logging redaction for sensitive information in SQL commands during job submission.
  • Fixed a MySQL checkpoint bug.
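
As a rough illustration of the progress estimator bullet above, the following sketch computes a ratio from processed and total sizes. The function name `estimateProgress` and the clamping behavior are assumptions for illustration; the PR's actual estimator also accounts for job phases, which this sketch omits.

```go
package main

import "fmt"

// estimateProgress is a hypothetical helper: it returns processed/total
// clamped to the range [0, 1]. A non-positive total yields 0 because the
// ratio is undefined in that case.
func estimateProgress(processedBytes, totalBytes int64) float64 {
	if totalBytes <= 0 {
		return 0
	}
	ratio := float64(processedBytes) / float64(totalBytes)
	if ratio < 0 {
		return 0
	}
	if ratio > 1 {
		return 1
	}
	return ratio
}

func main() {
	fmt.Printf("%.2f\n", estimateProgress(512, 2048))
}
```

Clamping keeps the reported value sane when the processed size briefly exceeds the estimated total.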

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No need to test
    • I checked and no code files have been changed.

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

Please refer to Release Notes Language Style Guide to write a quality release note.

None

@ti-chi-bot ti-chi-bot bot added do-not-merge/needs-linked-issue release-note-none Denotes a PR that doesn't merit a release note. do-not-merge/needs-triage-completed size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Jan 8, 2026
@tiprow

tiprow bot commented Jan 8, 2026

Hi @GMHDBJD. Thanks for your PR.

PRs from untrusted users cannot be marked as trusted with /ok-to-test in this repo meaning untrusted PR authors can never trigger tests themselves. Collaborators can still trigger tests on the PR using /test all.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@codecov

codecov bot commented Jan 8, 2026

Codecov Report

❌ Patch coverage is 76.39257% with 89 lines in your changes missing coverage. Please review.
✅ Project coverage is 78.3966%. Comparing base (7f129e5) to head (433dd1a).
⚠️ Report is 10 commits behind head on master.

Additional details and impacted files
@@               Coverage Diff                @@
##             master     #65473        +/-   ##
================================================
+ Coverage   77.8399%   78.3966%   +0.5567%     
================================================
  Files          1983       1914        -69     
  Lines        542760     532288     -10472     
================================================
- Hits         422484     417296      -5188     
+ Misses       118616     114556      -4060     
+ Partials       1660        436      -1224     
Flag Coverage Δ
integration 44.3453% <41.7867%> (-3.8424%) ⬇️
unit 76.8102% <73.4748%> (+0.3463%) ⬆️

Components Coverage Δ
dumpling 56.7974% <ø> (ø)
parser ∅ <ø> (∅)
br 48.7502% <ø> (-12.2964%) ⬇️

@ti-chi-bot ti-chi-bot bot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Jan 9, 2026
@ti-chi-bot ti-chi-bot bot added the needs-1-more-lgtm Indicates a PR needs 1 more LGTM. label Jan 9, 2026
Contributor

@D3Hunter D3Hunter left a comment

rest lgtm

@GMHDBJD
Collaborator Author

GMHDBJD commented Jan 9, 2026

/hold Progress regression detected.

@ti-chi-bot ti-chi-bot bot added do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Jan 9, 2026
…riod and progress tracking

- Added cancellation grace period and polling interval for job cancellation by group key.
- Updated JobOrchestrator to handle job submission errors gracefully, ensuring submitted jobs are cancelled if submission fails.
- Introduced job progress estimator to calculate job progress based on processed and total sizes.
- Implemented logging redaction for sensitive information in SQL commands during job submission.
- Added tests for job progress estimation, job submission error handling, and job cancellation scenarios.
- Created a new ProgressUpdater interface to facilitate progress updates during job execution.
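
The grace period and polling interval mentioned in the commit message above could work roughly as sketched below. This is an assumption-laden illustration, not the PR's code: `cancelLateJobs`, `listJobs`, and `cancelJob` are hypothetical names, and the real orchestrator looks jobs up by group key in TiDB rather than through callbacks.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// cancelLateJobs (hypothetical) polls listJobs every interval until the grace
// period elapses or ctx is done, calling cancelJob once for each job ID it
// has not seen before. It returns the number of jobs cancelled.
func cancelLateJobs(ctx context.Context, grace, interval time.Duration,
	listJobs func() []int64, cancelJob func(int64)) int {
	deadline := time.Now().Add(grace)
	seen := make(map[int64]bool)
	for {
		for _, id := range listJobs() {
			if !seen[id] {
				seen[id] = true
				cancelJob(id)
			}
		}
		if !time.Now().Before(deadline) {
			return len(seen)
		}
		select {
		case <-ctx.Done():
			return len(seen)
		case <-time.After(interval):
		}
	}
}

// demo simulates a job that only becomes visible on the second poll.
func demo() []int64 {
	polls := 0
	listJobs := func() []int64 {
		polls++
		if polls == 1 {
			return []int64{1}
		}
		return []int64{1, 2}
	}
	var cancelled []int64
	cancelJob := func(id int64) { cancelled = append(cancelled, id) }
	cancelLateJobs(context.Background(), 200*time.Millisecond, 10*time.Millisecond, listJobs, cancelJob)
	return cancelled
}

func main() {
	fmt.Println(demo())
}
```

Polling past the first pass is what catches a job that was submitted just before cancellation but had not yet become visible.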
}

if err := o.recordSubmission(egCtx, job); err != nil {
cpCtx, cancel := context.WithTimeout(context.WithoutCancel(ctx), cancelTimeout)
Contributor

What's the difference between WithoutCancel and a background context?

And why do we need a timeout for recording the submission?

Collaborator Author

WithoutCancel retains context values such as logging and tracing fields. I've changed it to a background context to avoid confusion.

Collaborator Author

The timeout keeps the import from hanging on network, database, or lock problems, which is why I added it here.
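
To make the distinction discussed in this thread concrete, here is a small sketch (Go 1.21 or later, where `context.WithoutCancel` was added). It shows a detached context that keeps the parent's values, ignores the parent's cancellation, and still carries its own timeout. The helper names and the `trace` key are illustrative assumptions; per the author's reply, the final code uses a background context instead.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

type ctxKey string

// detachedWithTimeout (hypothetical) returns a context that keeps the
// parent's values, ignores the parent's cancellation, and expires after d.
func detachedWithTimeout(parent context.Context, d time.Duration) (context.Context, context.CancelFunc) {
	return context.WithTimeout(context.WithoutCancel(parent), d)
}

// demo reports whether the detached context kept the parent's value and
// whether it is still usable after the parent has been cancelled.
func demo() (keptValue bool, stillAlive bool) {
	base := context.WithValue(context.Background(), ctxKey("trace"), "abc")
	parent, cancelParent := context.WithCancel(base)
	cancelParent() // the parent is cancelled, e.g. because submission failed

	ctx, cancel := detachedWithTimeout(parent, time.Minute)
	defer cancel()

	keptValue = ctx.Value(ctxKey("trace")) == "abc"
	stillAlive = ctx.Err() == nil
	return keptValue, stillAlive
}

func main() {
	kept, alive := demo()
	fmt.Println(kept, alive)
}
```

A plain `context.Background()` would also survive the parent's cancellation but would drop the `trace` value, which is the trade-off the reply describes.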

return &jobProgressEstimator{logger: logger}
}

func (e *jobProgressEstimator) parseHumanSize(jobID int64, sizeText string, warnMsg string) (int64, bool) {
Contributor

We can avoid this parsing if SHOW RAW IMPORT JOB returns the number directly as a JSON field.
We can enhance this later.
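
For context on what such parsing involves, the sketch below converts a human-readable size string to bytes. It is a simplified, hypothetical stand-in: the real `parseHumanSize` takes a job ID and a warning message and may rely on a different unit format or library, none of which is shown in this thread.

```go
package main

import (
	"fmt"
	"math"
	"strconv"
	"strings"
)

// unitFactors lists binary units, longest suffix first so that "GiB" is
// matched before the bare "B" suffix. The accepted units are an assumption.
var unitFactors = []struct {
	suffix string
	factor float64
}{
	{"TiB", 1 << 40},
	{"GiB", 1 << 30},
	{"MiB", 1 << 20},
	{"KiB", 1 << 10},
	{"B", 1},
}

// parseHumanSize converts text such as "1.5GiB" to a byte count.
// It returns false when the text has no known unit or an invalid number.
func parseHumanSize(text string) (int64, bool) {
	text = strings.TrimSpace(text)
	for _, u := range unitFactors {
		if !strings.HasSuffix(text, u.suffix) {
			continue
		}
		num := strings.TrimSpace(strings.TrimSuffix(text, u.suffix))
		v, err := strconv.ParseFloat(num, 64)
		// !(v >= 0) also rejects NaN; the upper bound guards the int64 conversion.
		if err != nil || !(v >= 0) || v*u.factor >= float64(math.MaxInt64) {
			return 0, false
		}
		return int64(v * u.factor), true
	}
	return 0, false
}

func main() {
	n, ok := parseHumanSize("1.5GiB")
	fmt.Println(n, ok)
}
```

Returning the raw byte count as a JSON field, as the reviewer suggests, would remove the rounding loss that any such parser inherits from the formatted text.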


Copilot AI left a comment

Pull request overview

This PR enhances the import-into backend for DM (Data Migration) by adding robust job management capabilities. It introduces progress tracking, secure logging with credential redaction, graceful error handling during job submission, and failover-aware cancellation support.

Changes:

  • Added job progress estimator to calculate and report import progress based on job phases and step completion
  • Implemented SQL redaction to hide sensitive credentials (access keys, secret keys) in logs
  • Enhanced job orchestrator with graceful cancellation including grace period and polling for late-appearing jobs
  • Added failover-aware cancellation that preserves running jobs during DM worker transitions
  • Fixed MySQL checkpoint schema bug by removing redundant db_name column and using composite table_name
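
The SQL redaction item in the list above can be sketched with a regular expression that masks credential parameters in a storage URL. The parameter names (`access-key`, `secret-access-key`), the mask text, and the function name `redactSQL` are assumptions for illustration; the pattern used in job_submitter.go is not shown in this conversation.

```go
package main

import (
	"fmt"
	"regexp"
)

// secretParam matches URL query parameters whose names look like credentials
// and captures the parameter name. The list of names is an assumption.
var secretParam = regexp.MustCompile(
	`(?i)(secret[-_]?access[-_]?key|access[-_]?key|secret[-_]?key)=[^&'"\s]*`)

// redactSQL replaces the value of every matched parameter with a fixed mask
// so that the statement can be logged safely.
func redactSQL(sql string) string {
	return secretParam.ReplaceAllString(sql, "${1}=xxxxxx")
}

func main() {
	stmt := "IMPORT INTO t FROM 's3://bucket/data/*.csv?access-key=AKIAEXAMPLE&secret-access-key=s3cr3t'"
	fmt.Println(redactSQL(stmt))
}
```

The value pattern stops at `&`, quotes, and whitespace, so the rest of the statement is left intact.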

Reviewed changes

Copilot reviewed 15 out of 15 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| lightning/pkg/importinto/job_progress.go | Implements progress estimation logic for global-sort and non-global-sort import phases |
| lightning/pkg/importinto/job_progress_test.go | Tests for progress calculation with various job states and phase transitions |
| lightning/pkg/server/lightning.go | Integrates progress adapter to bridge LightningStatus with ProgressUpdater interface |
| lightning/pkg/importinto/job_submitter.go | Adds SQL redaction using regex to mask credentials in cloud storage URLs |
| lightning/pkg/importinto/job_submitter_test.go | Validates that sensitive credentials are properly redacted in logs |
| lightning/pkg/importinto/job_orchestrator.go | Enhances error handling with automatic cancellation of submitted jobs on failure and adds grace period polling |
| lightning/pkg/importinto/job_orchestrator_test.go | Tests cancellation scenarios including late-appearing jobs and submission errors |
| lightning/pkg/importinto/job_monitor.go | Integrates progress estimator and removes inline cancellation logic (delegated to orchestrator) |
| lightning/pkg/importinto/job_monitor_test.go | Updates tests to verify progress tracking and non-rollback behavior |
| lightning/pkg/importinto/importer.go | Adds failover cancellation support via ErrFailoverCancel to preserve jobs during transitions |
| lightning/pkg/importinto/importer_test.go | Tests both user-initiated and failover-triggered cancellation paths |
| lightning/pkg/importinto/checkpoint.go | Fixes schema by removing db_name column and making table_name the sole primary key |
| lightning/pkg/importinto/BUILD.bazel | Adds new dependencies and increases test shard count for new test files |
| lightning/pkg/importinto/mock/import_mock.go | Adds MockProgressUpdater for testing progress tracking |
| Makefile | Updates mock generation command to include ProgressUpdater interface |


Copilot AI left a comment

Pull request overview

Copilot reviewed 15 out of 15 changed files in this pull request and generated 1 comment.

Comment on lines 391 to 399
  createTableSQL := fmt.Sprintf(`CREATE TABLE IF NOT EXISTS %s.%s (
- db_name VARCHAR(64) NOT NULL,
- table_name VARCHAR(64) NOT NULL,
+ table_name VARCHAR(256) NOT NULL,
  job_id BIGINT NOT NULL,
  status TINYINT NOT NULL,
  message TEXT,
  group_key VARCHAR(128),
  update_time TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
- PRIMARY KEY (db_name, table_name)
+ PRIMARY KEY (table_name)
  )`, common.EscapeIdentifier(m.schemaName), common.EscapeIdentifier(m.tableName))
Copilot AI Jan 19, 2026

The checkpoint table schema has been changed from having separate db_name and table_name columns with a composite primary key to a single table_name column. However, the CREATE TABLE IF NOT EXISTS statement won't alter existing tables. If users upgrade from a previous version with the old checkpoint schema, they'll encounter issues because the code expects the new schema but the existing table has the old schema. Consider adding migration logic or documenting this as a breaking change that requires manual checkpoint table recreation.
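
One way to detect the upgrade situation described above is to inspect the existing table's columns before relying on the new schema. The sketch below is purely illustrative and not part of the PR: `columnsQuery` uses the standard `information_schema.columns` view, while `hasLegacySchema` is a hypothetical helper applied to the column names that query returns.

```go
package main

import (
	"fmt"
	"strings"
)

// columnsQuery lists the columns of one table. Run it with the checkpoint
// schema name and table name as parameters.
const columnsQuery = `SELECT column_name FROM information_schema.columns
WHERE table_schema = ? AND table_name = ?`

// hasLegacySchema reports whether the column list still contains the
// db_name column that the old checkpoint schema used.
func hasLegacySchema(columns []string) bool {
	for _, c := range columns {
		if strings.EqualFold(c, "db_name") {
			return true
		}
	}
	return false
}

func main() {
	oldCols := []string{"db_name", "table_name", "job_id", "status"}
	newCols := []string{"table_name", "job_id", "status"}
	fmt.Println(hasLegacySchema(oldCols), hasLegacySchema(newCols))
}
```

What to do once a legacy table is found (migrate, recreate, or fail with a clear message) is a separate decision that this sketch leaves open.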

Replace local clamp01 helper with mathutil.Clamp.
@ti-chi-bot ti-chi-bot bot added lgtm approved and removed needs-1-more-lgtm Indicates a PR needs 1 more LGTM. labels Jan 19, 2026
@ti-chi-bot

ti-chi-bot bot commented Jan 19, 2026

[LGTM Timeline notifier]

Timeline:

  • 2026-01-09 02:40:51.051519219 +0000 UTC m=+929206.869827651: ☑️ agreed by joechenrh.
  • 2026-01-19 09:57:10.940084696 +0000 UTC m=+401458.554041542: ☑️ agreed by D3Hunter.

Copilot AI review requested due to automatic review settings January 19, 2026 10:02

Copilot AI left a comment

Pull request overview

Copilot reviewed 15 out of 15 changed files in this pull request and generated 1 comment.

  createTableSQL := fmt.Sprintf(`CREATE TABLE IF NOT EXISTS %s.%s (
- db_name VARCHAR(64) NOT NULL,
- table_name VARCHAR(64) NOT NULL,
+ table_name VARCHAR(256) NOT NULL,
Copilot AI Jan 19, 2026

The table_name column is being changed from VARCHAR(64) to VARCHAR(256), which may not be sufficient for all cases. The UniqueTable function formats table names as backtick-quoted identifiers in the form `schema`.`table`.

In MySQL, identifiers can be up to 64 characters each, so a fully qualified table name with backticks is at most `{64 chars}`.`{64 chars}` = 64 + 1 + 64 + 4 backticks = 133 characters when no escaping is needed. However, backticks within identifiers are escaped as double backticks, so in the worst case (all 64 characters are backticks) the name becomes `{128 chars}`.`{128 chars}` = 128 + 1 + 128 + 4 = 261 characters, which exceeds VARCHAR(256).

Consider increasing the column size to VARCHAR(300) or VARCHAR(512) to safely accommodate escaped identifiers.

Suggested change
- table_name VARCHAR(256) NOT NULL,
+ table_name VARCHAR(512) NOT NULL,
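
The length arithmetic in this comment can be checked with a short sketch. `quoteIdent` and `qualifiedName` are hypothetical stand-ins that follow the quoting rule described above (wrap in backticks, double any embedded backtick); they are not the actual UniqueTable implementation.

```go
package main

import (
	"fmt"
	"strings"
)

// quoteIdent wraps name in backticks and doubles any embedded backtick.
func quoteIdent(name string) string {
	return "`" + strings.ReplaceAll(name, "`", "``") + "`"
}

// qualifiedName joins a quoted schema and a quoted table with a dot.
func qualifiedName(schema, table string) string {
	return quoteIdent(schema) + "." + quoteIdent(table)
}

// worstCaseLen returns the qualified-name length for identifiers of maxIdent
// characters, first with no escaping needed and then with every character
// being a backtick.
func worstCaseLen(maxIdent int) (plain, escaped int) {
	letters := strings.Repeat("a", maxIdent)
	ticks := strings.Repeat("`", maxIdent)
	return len(qualifiedName(letters, letters)), len(qualifiedName(ticks, ticks))
}

func main() {
	plain, escaped := worstCaseLen(64)
	fmt.Println(plain, escaped) // 133 261
}
```

With 64-character identifiers the escaped case reaches 261 characters, matching the figure in the comment and exceeding VARCHAR(256).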

@GMHDBJD
Collaborator Author

GMHDBJD commented Jan 19, 2026

/retest

@tiprow

tiprow bot commented Jan 19, 2026

@GMHDBJD: PRs from untrusted users cannot be marked as trusted with /ok-to-test in this repo meaning untrusted PR authors can never trigger tests themselves. Collaborators can still trigger tests on the PR using /test.

Details

In response to this:

/retest

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

Collaborator

@Benjamin2037 Benjamin2037 left a comment

LGTM

@ti-chi-bot

ti-chi-bot bot commented Jan 20, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Benjamin2037, D3Hunter, joechenrh

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@Benjamin2037
Collaborator

/retest

@GMHDBJD
Collaborator Author

GMHDBJD commented Jan 20, 2026

/unhold

@ti-chi-bot ti-chi-bot bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jan 20, 2026
@GMHDBJD
Collaborator Author

GMHDBJD commented Jan 20, 2026

/retest

@ti-chi-bot ti-chi-bot bot merged commit 6dd3a58 into pingcap:master Jan 20, 2026
37 of 38 checks passed

Labels

approved lgtm release-note-none Denotes a PR that doesn't merit a release note. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.

5 participants