Skip to content

importinto: ensure engine cleanup on import error#65663

Merged
ti-chi-bot[bot] merged 3 commits intopingcap:masterfrom
D3Hunter:fix-dir-locked
Jan 20, 2026
Merged

importinto: ensure engine cleanup on import error#65663
ti-chi-bot[bot] merged 3 commits intopingcap:masterfrom
D3Hunter:fix-dir-locked

Conversation

@D3Hunter
Copy link
Copy Markdown
Contributor

What problem does this PR solve?

Issue Number: close #65645

Problem Summary:

What changed and how does it work?

as title

Why it happens:

  • RunSubtask opens both data + index engines and doesn’t defer cleanup on error. If anything fails before
    onFinished, both engines can stay open. See pkg/dxf/importinto/task_executor.go.
  • In onFinished, the data engine is imported/cleaned first. If that import fails (PD “not leader” /
    timeouts), the function returns immediately and skips closing/cleaning the index engine. See pkg/dxf/
    importinto/task_executor.go.
  • The log’s stack trace is from TableImporter.OpenIndexEngine → pebble.LockDirectory, i.e., it’s the
    index engine reopening and hitting a lock that’s still held.

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)

without this pr, the case failes, after, it success

        	Error:      	Received unexpected error:
        	            	[0]lock held by current process
  • No need to test
    • I checked and no code files have been changed.

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

Please refer to Release Notes Language Style Guide to write a quality release note.

None

Copilot AI review requested due to automatic review settings January 20, 2026 05:36
@ti-chi-bot ti-chi-bot bot added release-note-none Denotes a PR that doesn't merit a release note. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jan 20, 2026
@tiprow
Copy link
Copy Markdown

tiprow bot commented Jan 20, 2026

Hi @D3Hunter. Thanks for your PR.

PRs from untrusted users cannot be marked as trusted with /ok-to-test in this repo meaning untrusted PR authors can never trigger tests themselves. Collaborators can still trigger tests on the PR using /test all.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

// finishWrite flushes all data and waits for the background routine to finish.
// after calling this method, the engine cannot be written anymore. the underlying
// pebble DB is still kept open for reading.
func (e *Engine) finishWrite(ctx context.Context) error {
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

split from closeEngine

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes a resource leak where local engines (data and index engines) were not being properly cleaned up when import operations failed. The fix ensures that both engines are cleaned up even if the data engine import fails.

Changes:

  • Added error-path cleanup in RunSubtask via a defer to close all local engines when local sort fails
  • Refactored engine closing logic by extracting finishWrite method for reuse
  • Added CleanupAllLocalEngines methods to close all engines during error recovery
  • Added integration test to verify engines are properly cleaned up and can be reopened on retry

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
tests/realtikvtest/importintotest3/cleanup_on_error_test.go New integration test that verifies engine cleanup on data engine import error with retry
tests/realtikvtest/importintotest3/BUILD.bazel Added test file and driver error dependency to build configuration
pkg/dxf/importinto/task_executor.go Added defer cleanup logic to handle engine cleanup on error path
pkg/executor/importer/table_import.go Added failpoint for simulating data engine import errors in tests
pkg/lightning/backend/local/local.go Added public CleanupAllLocalEngines method to backend interface
pkg/lightning/backend/local/engine_mgr.go Refactored closeEngine to use new finishWrite method and added cleanupAllLocalEngines implementation
pkg/lightning/backend/local/engine.go Extracted finishWrite method for reusable engine write finalization logic

@ti-chi-bot ti-chi-bot bot added the needs-1-more-lgtm Indicates a PR needs 1 more LGTM. label Jan 20, 2026
@D3Hunter
Copy link
Copy Markdown
Contributor Author

/retest

@tiprow
Copy link
Copy Markdown

tiprow bot commented Jan 20, 2026

@D3Hunter: PRs from untrusted users cannot be marked as trusted with /ok-to-test in this repo meaning untrusted PR authors can never trigger tests themselves. Collaborators can still trigger tests on the PR using /test.

Details

In response to this:

/retest

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@ti-chi-bot
Copy link
Copy Markdown

ti-chi-bot bot commented Jan 20, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: GMHDBJD, joechenrh

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot ti-chi-bot bot added approved lgtm and removed needs-1-more-lgtm Indicates a PR needs 1 more LGTM. labels Jan 20, 2026
@ti-chi-bot
Copy link
Copy Markdown

ti-chi-bot bot commented Jan 20, 2026

[LGTM Timeline notifier]

Timeline:

  • 2026-01-20 05:55:47.74716596 +0000 UTC m=+473375.361122816: ☑️ agreed by joechenrh.
  • 2026-01-20 06:10:33.312752866 +0000 UTC m=+474260.926709722: ☑️ agreed by GMHDBJD.

@codecov
Copy link
Copy Markdown

codecov bot commented Jan 20, 2026

Codecov Report

❌ Patch coverage is 38.00000% with 31 lines in your changes missing coverage. Please review.
✅ Project coverage is 78.3242%. Comparing base (a6fc928) to head (e21bff5).
⚠️ Report is 6 commits behind head on master.

Additional details and impacted files
@@               Coverage Diff                @@
##             master     #65663        +/-   ##
================================================
+ Coverage   77.8491%   78.3242%   +0.4750%     
================================================
  Files          1983       1913        -70     
  Lines        542774     531252     -11522     
================================================
- Hits         422545     416099      -6446     
+ Misses       118569     114709      -3860     
+ Partials       1660        444      -1216     
Flag Coverage Δ
integration 44.2764% <28.5714%> (-3.9113%) ⬇️
unit 76.7485% <38.0000%> (+0.2708%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Components Coverage Δ
dumpling 56.7974% <ø> (ø)
parser ∅ <ø> (∅)
br 48.7476% <ø> (-12.2966%) ⬇️
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@D3Hunter
Copy link
Copy Markdown
Contributor Author

/retest

1 similar comment
@D3Hunter
Copy link
Copy Markdown
Contributor Author

/retest

@ti-chi-bot ti-chi-bot bot merged commit 8e961f7 into pingcap:master Jan 20, 2026
31 of 32 checks passed
@D3Hunter D3Hunter deleted the fix-dir-locked branch January 20, 2026 08:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved lgtm release-note-none Denotes a PR that doesn't merit a release note. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

import into failed with error "lock held by current process" after injection pd leader network partition

4 participants