Skip to content

gc: Optimize GetAllKeyspacesGCStates for scenarios where there are too many keyspaces#9777

Merged
ti-chi-bot[bot] merged 7 commits intotikv:masterfrom
MyonKeminta:m/get-all-keyspaces-gc-states-opt
Oct 20, 2025
Merged

gc: Optimize GetAllKeyspacesGCStates for scenarios where there are too many keyspaces#9777
ti-chi-bot[bot] merged 7 commits intotikv:masterfrom
MyonKeminta:m/get-all-keyspaces-gc-states-opt

Conversation

@MyonKeminta
Copy link
Copy Markdown
Contributor

What problem does this PR solve?

Issue Number: Ref #8978

What is changed and how does it work?

This PR supports loading keyspaces by smaller batches in `GetAllKeyspacesGCStates`. This avoids it througing errors due to etcd txn size limit when there are too many keyspaces.
Also makes `GetAllKeyspacesGCStates` support cancelling.
This is not the ultimate solution for tracking GC states changes. The best way is to implement a streaming API that watches the changes of the GC states. However, this PR provides a simpler fix to make it able to run.

Check List

Tests

  • Unit test

Code changes

-

Side effects

  • Increased code complexity

Related changes

Release note

None.

…ort cancelling during execution

Signed-off-by: MyonKeminta <MyonKeminta@users.noreply.github.com>
@ti-chi-bot
Copy link
Copy Markdown
Contributor

ti-chi-bot bot commented Sep 24, 2025

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@ti-chi-bot ti-chi-bot bot added release-note-none Denotes a PR that doesn't merit a release note. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. dco-signoff: yes Indicates the PR's author has signed the dco. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Sep 24, 2025
Signed-off-by: MyonKeminta <MyonKeminta@users.noreply.github.com>
@ti-chi-bot ti-chi-bot bot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Sep 25, 2025
Signed-off-by: MyonKeminta <MyonKeminta@users.noreply.github.com>
@MyonKeminta MyonKeminta marked this pull request as ready for review September 25, 2025 08:48
@ti-chi-bot ti-chi-bot bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Sep 25, 2025
@codecov
Copy link
Copy Markdown

codecov bot commented Sep 25, 2025

Codecov Report

❌ Patch coverage is 72.10884% with 41 lines in your changes missing coverage. Please review.
✅ Project coverage is 76.88%. Comparing base (ec8e27c) to head (1272e7c).
⚠️ Report is 27 commits behind head on master.

❌ Your patch status has failed because the patch coverage (72.10%) is below the target coverage (74.00%). You can increase the patch coverage or adjust the target coverage.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #9777      +/-   ##
==========================================
+ Coverage   76.75%   76.88%   +0.13%     
==========================================
  Files         488      491       +3     
  Lines       77727    78547     +820     
==========================================
+ Hits        59658    60392     +734     
- Misses      14414    14464      +50     
- Partials     3655     3691      +36     
Flag Coverage Δ
unittests 76.88% <72.10%> (+0.13%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@MyonKeminta
Copy link
Copy Markdown
Contributor Author

/retest

Signed-off-by: MyonKeminta <MyonKeminta@users.noreply.github.com>
m.mu.Lock()
defer m.mu.Unlock()

ensureUnlocked := func() {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need it?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To ensure the mutex is unlocked if the function is aborted abnormally (e.g., panicking), or more return path is introduced in the future.

const sealedFlag int64 = 1 << 62

func newTask[TResult any]() *task[TResult] {
ctx, cancel := context.WithCancel(context.Background())
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to pass a context instead of using a background context?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is to make it able to be canceled if all calls waiting for the result is canceled, but it won't be canceled immediately if the call that triggers the execution of the inner function is canceled while there are still other calls waiting for the result. If it should be an externally provided context, which should it be? Sounds like it should be a context that is canceled when the PD server process exits, if there is one.

func newKeyspaceIterator(manager *Manager, startID uint32) *Iterator {
return &Iterator{
manager: manager,
startID: startID,
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where is it used?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's mistakenly left unused. As we don't need it for now, I removed it.
But I just noticed that there's no test that directly covers the IterateKeyspaces itself.. it's indirectly covered by the tests of GC states though.
Maybe I'd better add some?

Signed-off-by: MyonKeminta <MyonKeminta@users.noreply.github.com>
Signed-off-by: MyonKeminta <MyonKeminta@users.noreply.github.com>
@MyonKeminta
Copy link
Copy Markdown
Contributor Author

/retest

@ti-chi-bot
Copy link
Copy Markdown
Contributor

ti-chi-bot bot commented Oct 17, 2025

@MyonKeminta: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
non-block/pull-unit-test-next-gen 1272e7c link false /test pull-unit-test-next-gen

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@ti-chi-bot ti-chi-bot bot added needs-1-more-lgtm Indicates a PR needs 1 more LGTM. approved labels Oct 20, 2025
@rleungx
Copy link
Copy Markdown
Member

rleungx commented Oct 20, 2025

BTW, I suggest using a separate goroutine to get GC states periodically in the future.

@ti-chi-bot ti-chi-bot bot added lgtm and removed needs-1-more-lgtm Indicates a PR needs 1 more LGTM. labels Oct 20, 2025
@ti-chi-bot
Copy link
Copy Markdown
Contributor

ti-chi-bot bot commented Oct 20, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: JmPotato, rleungx

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot
Copy link
Copy Markdown
Contributor

ti-chi-bot bot commented Oct 20, 2025

[LGTM Timeline notifier]

Timeline:

  • 2025-10-20 02:59:10.853888126 +0000 UTC m=+667856.931140676: ☑️ agreed by rleungx.
  • 2025-10-20 09:47:41.614632827 +0000 UTC m=+692367.691885377: ☑️ agreed by JmPotato.

@ti-chi-bot ti-chi-bot bot merged commit 7cbc382 into tikv:master Oct 20, 2025
34 of 40 checks passed
@MyonKeminta MyonKeminta deleted the m/get-all-keyspaces-gc-states-opt branch October 21, 2025 05:10
@rleungx
Copy link
Copy Markdown
Member

rleungx commented Oct 27, 2025

/cherry-pick release-nextgen-20251011

ti-chi-bot pushed a commit to ti-chi-bot/pd that referenced this pull request Oct 27, 2025
ref tikv#8978

Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>
@ti-chi-bot
Copy link
Copy Markdown
Member

@rleungx: new pull request created to branch release-nextgen-20251011: #9872.
But this PR has conflicts, please resolve them!

Details

In response to this:

/cherry-pick release-nextgen-20251011

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository.

ti-chi-bot bot pushed a commit that referenced this pull request Oct 28, 2025
…o many keyspaces (#9777) (#9872)

ref #8978

This PR supports loading keyspaces by smaller batches in `GetAllKeyspacesGCStates`. This avoids it througing errors due to etcd txn size limit when there are too many keyspaces.
Also makes `GetAllKeyspacesGCStates` support cancelling.
This is not the ultimate solution for tracking GC states changes. The best way is to implement a streaming API that watches the changes of the GC states. However, this PR provides a simpler fix to make it able to run.

Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>
Signed-off-by: MyonKeminta <MyonKeminta@users.noreply.github.com>

Co-authored-by: MyonKeminta <9948422+MyonKeminta@users.noreply.github.com>
Co-authored-by: MyonKeminta <MyonKeminta@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved dco-signoff: yes Indicates the PR's author has signed the dco. lgtm release-note-none Denotes a PR that doesn't merit a release note. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants