br: load necessary db infos for backup and restore #64982

Merged
ti-chi-bot[bot] merged 22 commits into pingcap:master from YuJuncen:restore_skip_full_load
Jan 17, 2026

@YuJuncen
Contributor

What problem does this PR solve?

Issue Number: close #64833

Problem Summary:
See the issue.

What changed and how does it work?

Added a filter to the loader, which lets the caller load only a subset of tables.
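
To make the idea concrete, here is a minimal sketch of a loader that takes a filter. It is not the code from this PR: `TableInfo`, `TableFilter`, and `loadTables` are hypothetical names chosen for illustration, and the real loader works on TiDB's schema metadata rather than a slice.

```go
package main

import "fmt"

// TableInfo is a hypothetical stand-in for the schema metadata a loader handles.
type TableInfo struct {
	ID   int64
	DB   string
	Name string
}

// TableFilter is a hypothetical predicate; the loader keeps a table only when it returns true.
type TableFilter func(t TableInfo) bool

// loadTables returns only the tables accepted by keep.
// A nil filter keeps everything, which matches a loader without any filter.
func loadTables(all []TableInfo, keep TableFilter) []TableInfo {
	if keep == nil {
		return all
	}
	loaded := make([]TableInfo, 0, len(all))
	for _, t := range all {
		if keep(t) {
			loaded = append(loaded, t)
		}
	}
	return loaded
}

func main() {
	all := []TableInfo{{1, "app", "users"}, {2, "app", "orders"}, {3, "logs", "events"}}
	onlyApp := func(t TableInfo) bool { return t.DB == "app" }
	// Only the tables of database "app" are loaded.
	fmt.Println(len(loadTables(all, onlyApp)))
}
```

The caller decides what counts as "necessary", so the loader itself stays unaware of backup or restore semantics.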

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No need to test
    • I checked and no code files have been changed.

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

Please refer to Release Notes Language Style Guide to write a quality release note.

Optimized DDL performance during backup and restore.

3pointer and others added 13 commits August 29, 2025 13:50
Signed-off-by: Juncen Yu <yujuncen@pingcap.com>
…nter/63278

@ti-chi-bot ti-chi-bot bot added release-note Denotes a PR that will be considered when it comes time to generate release notes. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Dec 11, 2025
@tiprow

tiprow bot commented Dec 11, 2025

Hi @YuJuncen. Thanks for your PR.

PRs from untrusted users cannot be marked as trusted with /ok-to-test in this repo meaning untrusted PR authors can never trigger tests themselves. Collaborators can still trigger tests on the PR using /test all.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

Signed-off-by: Juncen Yu <yujuncen@pingcap.com>
@ti-chi-bot ti-chi-bot bot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Dec 19, 2025
@codecov

codecov bot commented Dec 19, 2025

Codecov Report

❌ Patch coverage is 82.11382% with 22 lines in your changes missing coverage. Please review.
✅ Project coverage is 77.8492%. Comparing base (01916ad) to head (a3c5b2d).
⚠️ Report is 110 commits behind head on master.

Additional details and impacted files
@@               Coverage Diff                @@
##             master     #64982        +/-   ##
================================================
+ Coverage   70.7690%   77.8492%   +7.0801%     
================================================
  Files          1902       2001        +99     
  Lines        519022     562563     +43541     
================================================
+ Hits         367307     437951     +70644     
+ Misses       127178     121568      -5610     
+ Partials      24537       3044     -21493     
Flag          Coverage Δ
integration   48.1712% <58.8235%> (-0.0074%) ⬇️
unit          77.1519% <39.0243%> (+11.6078%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Components   Coverage Δ
dumpling     56.7974% <ø> (+3.9273%) ⬆️
parser       ∅ <ø> (∅)
br           58.3967% <79.7468%> (+0.0807%) ⬆️

@YuJuncen
Contributor Author

/retest-required

@tiprow

tiprow bot commented Dec 24, 2025

@YuJuncen: Cannot trigger testing until a trusted user reviews the PR and leaves an /ok-to-test message.

Details

In response to this:

/retest-required

}

// IsBRRelatedDB checks whether dbOriginName is a temporary database created by BR.
func IsBRRelatedDB(dbOriginName string) bool {
Contributor

this can be moved to BR pkg

Contributor Author

Some test cases still rely on it. Moving it to br may cause a cyclic dependency. (It is also hard to move those test cases to the BR package...)

Comment on lines +62 to +64
if !isAllowed {
	return false
}
Contributor

Is this meant to implement "if any table ID is allowed, then we shouldn't skip"? If so, we shouldn't negate isAllowed.

Contributor Author

It seems I forgot about this code. I have rewritten it.
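
For readers following this thread, the rule the reviewer describes ("if any table ID is allowed, then we shouldn't skip") can be sketched as below. This only illustrates that reading of the rule; it is not the rewritten code from the PR, and `shouldSkip` and `allowed` are hypothetical names.

```go
package main

import "fmt"

// shouldSkip reports whether a schema diff touching tableIDs can be skipped.
// allowed is a hypothetical predicate that says whether a table ID is wanted.
func shouldSkip(tableIDs []int64, allowed func(id int64) bool) bool {
	for _, id := range tableIDs {
		if allowed(id) {
			// At least one affected table is wanted, so the diff must be loaded.
			return false
		}
	}
	// No affected table is wanted (or the diff touches no tables at all).
	return true
}

func main() {
	wanted := map[int64]bool{7: true}
	allowed := func(id int64) bool { return wanted[id] }
	fmt.Println(shouldSkip([]int64{3, 7}, allowed)) // false: table 7 is wanted
	fmt.Println(shouldSkip([]int64{3, 4}, allowed)) // true: no table is wanted
}
```

Note the direction of the early return: the loop exits with "do not skip" on the first allowed table, which is the opposite of returning early on `!isAllowed`.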

…_skip_full_load

Signed-off-by: Juncen Yu <yujuncen@pingcap.com>
Signed-off-by: Juncen Yu <yujuncen@pingcap.com>
@pantheon-bot

P0 Issue Found

BR can block cluster DDL (MDL-enabled) by skipping per-job schema-version updates

  • Impact: While br backup runs (or br restore with filtered schemas), any concurrent DDL on tables outside BR's InfoSchema filter can hang/timeout cluster-wide. The DDL owner waits indefinitely for BR's MDL sync acknowledgment that never arrives, blocking all DDL operations.

  • Root Cause: This PR introduces InfoSchema/MDL filtering that causes BR to skip MDL checks for filtered-out tables. When BR's SkipMDLCheck returns true, the issyncer drops these jobs from MDL processing and never calls UpdateSelfVersion(jobID, ver). With MDL enabled (default), the DDL owner requires per-job version updates from ALL servers, including BR, causing indefinite blocking.

  • Evidence:

    • br/pkg/gluetidb/infoschema_filter.go:55-67: BR filter returns true (skip) when table is not in filtered InfoSchema
    • pkg/infoschema/issyncer/syncer.go:189: Skipped jobs are dropped via continue, never reaching version sync
    • pkg/ddl/schemaver/syncer.go:310-314: With MDL enabled, only per-job keys count (jobID=0 is no-op)
    • pkg/ddl/schemaver/syncer.go:422-444: Owner waits for ALL servers' per-job versions, including BR
  • Suggested Fix: Ensure 'skip MDL check' does not mean 'skip reporting'. Either:

    1. Make brInfoSchemaFilter.SkipMDLCheck always return false, OR
    2. Modify issyncer to call UpdateSelfVersion(jobID, ver) immediately for skipped jobs

Files Affected by This PR:

  • pkg/infoschema/issyncer/filter.go (new)
  • pkg/infoschema/issyncer/syncer.go (modified)
  • br/pkg/gluetidb/infoschema_filter.go (modified)
  • br/cmd/br/backup.go, br/cmd/br/restore.go (enable filter)
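
The second suggested fix (report the version even for skipped jobs) could look roughly like the sketch below. Everything here is a simplified assumption made for illustration: `job`, `versionReporter`, `recorder`, and `processJobs` are hypothetical, and `UpdateSelfVersion` only borrows the name used in the comment above, so the real signature in TiDB may well differ.

```go
package main

import "fmt"

// job is a hypothetical, trimmed-down DDL job: an ID and the schema version it produced.
type job struct {
	ID      int64
	Version int64
}

// versionReporter is a simplified stand-in for whatever tells the DDL owner
// that this server has caught up with a job. The real signature may differ.
type versionReporter interface {
	UpdateSelfVersion(jobID, ver int64)
}

// recorder remembers every report so the behaviour can be inspected.
type recorder struct{ reported map[int64]int64 }

func (r *recorder) UpdateSelfVersion(jobID, ver int64) { r.reported[jobID] = ver }

// processJobs returns the IDs of jobs that went through the full MDL check.
// Jobs the filter skips are still reported, so the owner never waits on them.
func processJobs(jobs []job, skipMDLCheck func(job) bool, rep versionReporter) []int64 {
	var checked []int64
	for _, j := range jobs {
		if !skipMDLCheck(j) {
			// A real syncer would wait here until no transaction uses the old schema.
			checked = append(checked, j.ID)
		}
		// Report in both cases: skipping the check must not mean skipping the report.
		rep.UpdateSelfVersion(j.ID, j.Version)
	}
	return checked
}

func main() {
	rec := &recorder{reported: map[int64]int64{}}
	jobs := []job{{ID: 1, Version: 100}, {ID: 2, Version: 101}}
	skipOdd := func(j job) bool { return j.ID%2 == 1 }
	checked := processJobs(jobs, skipOdd, rec)
	fmt.Println(checked, len(rec.reported)) // prints "[2] 2"
}
```

The point of the sketch is the separation of concerns: the filter decides whether the expensive check runs, while the acknowledgment to the owner is sent unconditionally.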

D3Hunter
D3Hunter previously approved these changes Jan 4, 2026
@YuJuncen
Contributor Author

/retest

@tiprow

tiprow bot commented Jan 12, 2026

@YuJuncen: Cannot trigger testing until a trusted user reviews the PR and leaves an /ok-to-test message.

Details

In response to this:

/retest

@@ -0,0 +1,37 @@
// Copyright 2025 PingCAP, Inc.
Member

2026

func (f *brInfoSchemaFilter) SkipLoadDiff(diff *model.SchemaDiff, latestIS infoschema.InfoSchema) (skip bool) {
	defer func() {
		if skip {
			log.Warn("skip load a schema diff due to configuration.", zap.Any("diff", diff), zap.Int64("version", diff.Version))
Member

log.Warn together with zap.Any("diff", diff) may cause noise?
Consider using INFO, or logging only diff.Type/SchemaID/TableID/OldSchemaID.
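
A sketch of what the narrower log line might carry is shown below. It uses only the standard library so that it runs on its own; `schemaDiff` and `summarize` are hypothetical stand-ins, and in BR the same values would more likely be passed to the logger as individual structured fields.

```go
package main

import "fmt"

// schemaDiff is a hypothetical stand-in carrying only the fields named in the review comment.
type schemaDiff struct {
	Version     int64
	Type        int
	SchemaID    int64
	TableID     int64
	OldSchemaID int64
}

// summarize renders the identifying fields of a diff instead of dumping the whole struct.
func summarize(d schemaDiff) string {
	return fmt.Sprintf("version=%d type=%d schemaID=%d tableID=%d oldSchemaID=%d",
		d.Version, d.Type, d.SchemaID, d.TableID, d.OldSchemaID)
}

func main() {
	d := schemaDiff{Version: 120, Type: 3, SchemaID: 10, TableID: 42}
	// Emitted with an INFO-style prefix; a real logger would take these as structured fields.
	fmt.Println("[INFO] skip loading schema diff:", summarize(d))
}
```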

return &brInfoSchemaFilter{allow: allow}
}

func (f *brInfoSchemaFilter) SkipLoadDiff(diff *model.SchemaDiff, latestIS infoschema.InfoSchema) (skip bool) {
Member

SkipLoadDiff treats SchemaDiff.SchemaID as a database ID and calls latestIS.SchemaByID(SchemaID), but for several DDLs SchemaID is not a DB ID, so BR may incorrectly skip their diffs and keep stale global metadata:
- Placement policy diffs: SchemaID is the policy ID for create/alter/drop. BR only special-cases ActionCreatePlacementPolicy, not alter/drop.
- Resource group diffs: SchemaID is the resource group ID. BR has no special cases for them, so create/alter/drop can be skipped.
- Impact: BR’s embedded domain/infoSchema can become inconsistent for these objects during diff-load, causing incorrect behavior if BR code relies on up-to-date policy/resource-group info.
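
One way to address this, sketched under assumptions, is to classify the action before interpreting SchemaID as a database ID. The constants below are local, hypothetical stand-ins that mirror the action names mentioned above; they are not TiDB's actual values, and `touchesGlobalObject` and `skipLoadDiff` are illustrative names.

```go
package main

import "fmt"

// actionType is a hypothetical stand-in for the DDL action carried by a schema diff.
type actionType int

const (
	actionCreateTable actionType = iota
	actionCreatePlacementPolicy
	actionAlterPlacementPolicy
	actionDropPlacementPolicy
	actionCreateResourceGroup
	actionAlterResourceGroup
	actionDropResourceGroup
)

// touchesGlobalObject reports whether SchemaID in the diff names a global object
// (a placement policy or resource group) rather than a database.
func touchesGlobalObject(a actionType) bool {
	switch a {
	case actionCreatePlacementPolicy, actionAlterPlacementPolicy, actionDropPlacementPolicy,
		actionCreateResourceGroup, actionAlterResourceGroup, actionDropResourceGroup:
		return true
	}
	return false
}

// skipLoadDiff never skips diffs for global objects; for everything else it
// consults the database filter, where schemaID really is a database ID.
func skipLoadDiff(a actionType, schemaID int64, dbAllowed func(int64) bool) bool {
	if touchesGlobalObject(a) {
		return false
	}
	return !dbAllowed(schemaID)
}

func main() {
	onlyDB1 := func(id int64) bool { return id == 1 }
	fmt.Println(skipLoadDiff(actionCreateTable, 2, onlyDB1))        // true: database 2 is filtered out
	fmt.Println(skipLoadDiff(actionAlterResourceGroup, 2, onlyDB1)) // false: 2 is a resource group ID here
}
```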


func (dm *domainMap) getWithEtcdClient(store kv.Storage, etcdClient *clientv3.Client) (d *domain.Domain, err error) {
func (dm *domainMap) GetOrCreateWithFilter(store kv.Storage, filter issyncer.Filter) (d *domain.Domain, err error) {
return dm.getWithEtcdClient(store, nil, filter)
Member

Domain reuse ignores filter changes: the domain cache key is only store.UUID(), so GetOrCreateDomainWithFilter only applies the filter when the domain is first created. If a domain already exists (with a different/no filter), subsequent calls won’t recreate or update it.
Is it expected?

Contributor Author

Yes, as the name OrCreateWithFilter implies. Perhaps a filter ID could be added to the domain map's key, but for now that doesn't seem very useful. I will add a comment to clarify this.
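
The get-or-create behaviour discussed in this thread can be illustrated with a small cache keyed only by store UUID. This is a sketch under assumptions, not the domainMap from the PR: `domain`, `domainMap`, and `getOrCreateWithFilter` here are simplified, hypothetical versions, and the filter is reduced to a name.

```go
package main

import (
	"fmt"
	"sync"
)

// domain is a hypothetical stand-in that only remembers which filter it was built with.
type domain struct{ filterName string }

// domainMap caches one domain per store, keyed by the store's UUID only.
type domainMap struct {
	mu      sync.Mutex
	domains map[string]*domain
}

// getOrCreateWithFilter returns the cached domain for storeUUID if one exists.
// The filter is applied only when the domain is first created; later calls
// with a different filter get the existing domain unchanged.
func (dm *domainMap) getOrCreateWithFilter(storeUUID, filterName string) *domain {
	dm.mu.Lock()
	defer dm.mu.Unlock()
	if d, ok := dm.domains[storeUUID]; ok {
		return d
	}
	d := &domain{filterName: filterName}
	dm.domains[storeUUID] = d
	return d
}

func main() {
	dm := &domainMap{domains: map[string]*domain{}}
	first := dm.getOrCreateWithFilter("store-1", "backup-filter")
	second := dm.getOrCreateWithFilter("store-1", "restore-filter")
	fmt.Println(first == second, second.filterName) // prints "true backup-filter"
}
```

Adding the filter's identity to the map key, as mentioned in the reply, would instead create a separate domain per (store, filter) pair.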

Signed-off-by: Juncen Yu <yujuncen@pingcap.com>
@ti-chi-bot ti-chi-bot bot added lgtm and removed needs-1-more-lgtm Indicates a PR needs 1 more LGTM. labels Jan 14, 2026
@ti-chi-bot

ti-chi-bot bot commented Jan 14, 2026

[LGTM Timeline notifier]

Timeline:

  • 2026-01-04 02:40:51.267735331 +0000 UTC m=+497207.086043763: ☑️ agreed by D3Hunter.
  • 2026-01-14 12:47:21.581972916 +0000 UTC m=+448085.643837824: ☑️ agreed by wjhuang2016.

@YuJuncen
Contributor Author

/retest

@tiprow

tiprow bot commented Jan 15, 2026

@YuJuncen: PRs from untrusted users cannot be marked as trusted with /ok-to-test in this repo meaning untrusted PR authors can never trigger tests themselves. Collaborators can still trigger tests on the PR using /test.

Details

In response to this:

/retest

@ti-chi-bot

ti-chi-bot bot commented Jan 15, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: 3pointer, D3Hunter, wjhuang2016

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot ti-chi-bot bot added the approved label Jan 15, 2026
@YuJuncen
Contributor Author

/retest

@tiprow

tiprow bot commented Jan 16, 2026

@YuJuncen: PRs from untrusted users cannot be marked as trusted with /ok-to-test in this repo meaning untrusted PR authors can never trigger tests themselves. Collaborators can still trigger tests on the PR using /test.

Details

In response to this:

/retest

@YuJuncen
Contributor Author

/ok-to-test

@ti-chi-bot ti-chi-bot bot added the ok-to-test Indicates a PR is ready to be tested. label Jan 16, 2026
@YuJuncen
Contributor Author

/retest

3 similar comments
@BornChanger
Contributor

/retest

@BornChanger
Contributor

/retest

@BornChanger
Contributor

/retest

@tiprow

tiprow bot commented Jan 17, 2026

@YuJuncen: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name          Commit    Details   Required   Rerun command
tidb_parser_test   a3c5b2d   link      true       /test tidb_parser_test
fast_test_tiprow   a3c5b2d   link      true       /test fast_test_tiprow

Full PR test history. Your PR dashboard.

Details

@ti-chi-bot ti-chi-bot bot merged commit 9c0773b into pingcap:master Jan 17, 2026
73 of 87 checks passed
@BornChanger BornChanger added the needs-cherry-pick-release-8.5 Should cherry pick this PR to release-8.5 branch. label Jan 18, 2026
ti-chi-bot pushed a commit to ti-chi-bot/tidb that referenced this pull request Jan 18, 2026
Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>
@ti-chi-bot
Member

In response to a cherrypick label: new pull request created to branch release-8.5: #65627.
But this PR has conflicts, please resolve them!


Labels

approved lgtm needs-cherry-pick-release-8.5 Should cherry pick this PR to release-8.5 branch. ok-to-test Indicates a PR is ready to be tested. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.

Development

Successfully merging this pull request may close these issues.

Make BR Backup/Restore avoid impacting online DDL

8 participants