
scheduler: graceful shutdown implement #9720

Merged
ti-chi-bot[bot] merged 18 commits into tikv:master from hujiatao0:graceful-shutdown-impl on Sep 28, 2025

@hujiatao0 (Contributor) commented Sep 4, 2025

Add tests for the scheduler.

What problem does this PR solve?

Issue Number: Close #9719

What is changed and how does it work?

This PR adds an is_stopping status to the StoreHeartbeat message, which TiKV sets when it receives a SIGTERM. It also adds a new evict-stopping-store-scheduler to PD, analogous to the existing evict-slow-store-scheduler, which inspects the is_stopping status in store heartbeats and proactively transfers leaders away from stores that are shutting down.

Check List

Tests

  • Unit test

Code changes

  • Added a new evict-stopping-store-scheduler, which inspects the is_stopping status in store heartbeats and evicts leaders from a TiKV node whose is_stopping flag is true.
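The leader-eviction step can be sketched as follows. This is a simplified model, not PD's implementation: Region, transferLeaderOp and evictLeadersFrom are hypothetical names, and a real scheduler would also respect store limits, placement constraints and operator concurrency. For each region whose leader sits on a stopping store, it picks the first follower on a store that is not stopping and emits a transfer-leader step.

```go
package main

import "fmt"

// Region is a hypothetical view of a region: the store holding its leader
// and the stores holding its followers.
type Region struct {
	ID             uint64
	LeaderStore    uint64
	FollowerStores []uint64
}

// transferLeaderOp is a hypothetical operator moving leadership between stores.
type transferLeaderOp struct {
	RegionID  uint64
	FromStore uint64
	ToStore   uint64
}

// evictLeadersFrom returns one transfer-leader operator per region led by a
// stopping store, choosing the first follower on a non-stopping store.
// Regions with no such follower are skipped.
func evictLeadersFrom(regions []Region, stopping map[uint64]bool) []transferLeaderOp {
	var ops []transferLeaderOp
	for _, r := range regions {
		if !stopping[r.LeaderStore] {
			continue
		}
		for _, s := range r.FollowerStores {
			if !stopping[s] {
				ops = append(ops, transferLeaderOp{RegionID: r.ID, FromStore: r.LeaderStore, ToStore: s})
				break
			}
		}
	}
	return ops
}

func main() {
	regions := []Region{
		{ID: 10, LeaderStore: 1, FollowerStores: []uint64{2, 3}},
		{ID: 11, LeaderStore: 2, FollowerStores: []uint64{1, 3}},
		{ID: 12, LeaderStore: 1, FollowerStores: []uint64{3, 2}},
	}
	stopping := map[uint64]bool{1: true}
	for _, op := range evictLeadersFrom(regions, stopping) {
		fmt.Printf("region %d: leader %d -> %d\n", op.RegionID, op.FromStore, op.ToStore)
	}
	// Prints transfers for regions 10 (1 -> 2) and 12 (1 -> 3); region 11 is untouched.
}
```

Skipping a follower on another stopping store avoids moving a leader onto a node that is itself about to shut down.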

Side effects

  • Increased code complexity

Related changes

Release note

None.

@ti-chi-bot added the release-note-none (Denotes a PR that doesn't merit a release note.), dco-signoff: yes (Indicates the PR's author has signed the dco.), and contribution (This PR is from a community contributor.) labels Sep 4, 2025
@ti-chi-bot (Contributor) commented Sep 4, 2025

Hi @hujiatao0. Thanks for your PR.

I'm waiting for a tikv member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@ti-chi-bot added the needs-ok-to-test (Indicates a PR created by contributors and need ORG member send '/ok-to-test' to start testing.) and first-time-contributor (Indicates that the PR was contributed by an external member and is a first-time contributor.) labels Sep 4, 2025
@ti-chi-bot (Contributor) commented Sep 4, 2025

Welcome @hujiatao0!

It looks like this is your first PR to tikv/pd 🎉.

I'm the bot to help you request reviewers, add labels and more. See available commands.

We want to make sure your contribution gets all the attention it needs!



Thank you, and welcome to tikv/pd. 😃

@ti-chi-bot added the size/XXL label (Denotes a PR that changes 1000+ lines, ignoring generated files.) Sep 4, 2025
@rleungx (Member) commented Sep 4, 2025

/ok-to-test

@ti-chi-bot added the ok-to-test label (Indicates a PR is ready to be tested.) and removed the needs-ok-to-test label Sep 4, 2025
@codecov (bot) commented Sep 4, 2025

Codecov Report

❌ Patch coverage is 76.10922% with 70 lines in your changes missing coverage. Please review.
✅ Project coverage is 76.87%. Comparing base (ec8e27c) to head (5f818ac).
⚠️ Report is 80 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #9720      +/-   ##
==========================================
+ Coverage   76.75%   76.87%   +0.12%     
==========================================
  Files         488      489       +1     
  Lines       77727    78019     +292     
==========================================
+ Hits        59658    59979     +321     
+ Misses      14414    14394      -20     
+ Partials     3655     3646       -9     
Flag Coverage Δ
unittests 76.87% <76.10%> (+0.12%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

@hujiatao0 force-pushed the graceful-shutdown-impl branch 3 times, most recently from 4faf212 to 3abfe2c on September 16, 2025 09:09
@hujiatao0 force-pushed the graceful-shutdown-impl branch 2 times, most recently from ac129a7 to 80045a7 on September 17, 2025 05:39
@okJiang (Member) commented Sep 23, 2025

Please fix the conflict, @hujiatao0.

new slow store format scheduler for graceful shutdown

Signed-off-by: hujiatao0 <hhjjtt110@gmail.com>

add unit test and remove some useless code

Signed-off-by: hujiatao0 <hhjjtt110@gmail.com>
@hujiatao0 force-pushed the graceful-shutdown-impl branch from 14fab10 to f752039 on September 23, 2025 07:32
Signed-off-by: hujiatao0 <hhjjtt110@gmail.com>
@hujiatao0 (Contributor, Author) commented
/retest

@ti-chi-bot added the needs-1-more-lgtm label (Indicates a PR needs 1 more LGTM.) Sep 26, 2025
@okJiang (Member) left a comment

The rest LGTM.

func (s *evictStoppingStoreScheduler) cleanupEvictLeader(cluster sche.SchedulerCluster) {
	evictStoppingStore, err := s.conf.clearEvictedAndPersist()
	if err != nil {
		log.Info("evict-stopping-store-scheduler persist config failed", zap.Uint64("store-id", evictStoppingStore))
Member

It is better to keep storage and memory consistent.

Suggested change
-		log.Info("evict-stopping-store-scheduler persist config failed", zap.Uint64("store-id", evictStoppingStore))
+		log.Warn("evict-stopping-store-scheduler persist config failed", zap.Uint64("store-id", evictStoppingStore))
+		return
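The reviewer's point is that a failed write to storage should not leave memory and storage disagreeing, nor let cleanup continue as if it had succeeded. One common way to get that is sketched here with hypothetical types and functions (kvStorage, schedulerConfig, clearAndPersistFirst, cleanupEvictLeaderSketch), not the code in this PR: persist the new state first, update memory only if the write succeeds, and have the caller log and return early on error, as the suggested change does. Go's standard log package stands in for zap.

```go
package main

import (
	"errors"
	"fmt"
	"log"
	"sync"
)

// kvStorage is a hypothetical persistence interface.
type kvStorage interface {
	SaveEvicted(ids []uint64) error
}

// memStorage is an in-memory storage used for the demo.
type memStorage struct{ saved []uint64 }

func (m *memStorage) SaveEvicted(ids []uint64) error {
	m.saved = append([]uint64(nil), ids...)
	return nil
}

// failingStorage always fails, to exercise the error path.
type failingStorage struct{}

func (failingStorage) SaveEvicted([]uint64) error { return errors.New("storage unavailable") }

// schedulerConfig (hypothetical) keeps evicted store IDs in memory and in storage.
type schedulerConfig struct {
	mu      sync.Mutex
	evicted []uint64
	storage kvStorage
}

// clearAndPersistFirst writes the cleared state to storage first and clears
// memory only on success, so a failed write leaves both copies unchanged.
// It returns the previously evicted store ID (0 if none).
func (c *schedulerConfig) clearAndPersistFirst() (uint64, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if len(c.evicted) == 0 {
		return 0, nil
	}
	old := c.evicted[0]
	if err := c.storage.SaveEvicted(nil); err != nil {
		return old, err
	}
	c.evicted = nil
	return old, nil
}

// cleanupEvictLeaderSketch follows the suggested control flow: on a
// persistence error it logs and returns without touching cluster state.
func cleanupEvictLeaderSketch(c *schedulerConfig, recovered func(storeID uint64)) {
	old, err := c.clearAndPersistFirst()
	if err != nil {
		log.Printf("persist config failed, store-id=%d: %v", old, err)
		return
	}
	if old == 0 {
		return
	}
	recovered(old)
}

func main() {
	ok := &schedulerConfig{evicted: []uint64{4}, storage: &memStorage{}}
	cleanupEvictLeaderSketch(ok, func(id uint64) { fmt.Println("store recovered:", id) })

	bad := &schedulerConfig{evicted: []uint64{5}, storage: failingStorage{}}
	cleanupEvictLeaderSketch(bad, func(id uint64) { fmt.Println("store recovered:", id) })
	fmt.Println("still evicted after failed persist:", bad.evicted)
}
```

Ordering the write before the in-memory update means a retry later sees the same state that storage holds.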

Signed-off-by: hujiatao0 <hhjjtt110@gmail.com>
@hujiatao0 force-pushed the graceful-shutdown-impl branch from 3a6c4dd to 5f818ac on September 26, 2025 07:23
@hujiatao0 (Contributor, Author) commented
/retest

@ti-chi-bot added the lgtm label and removed the needs-1-more-lgtm label Sep 26, 2025
@ti-chi-bot (Contributor) commented Sep 26, 2025

[LGTM Timeline notifier]

Timeline:

  • 2025-09-26 02:36:23.046685617 +0000 UTC m=+583393.117179299: ☑️ agreed by rleungx.
  • 2025-09-26 08:28:56.965460978 +0000 UTC m=+604547.035954661: ☑️ agreed by okJiang.

@hujiatao0 (Contributor, Author) commented
/retest

@ti-chi-bot (Contributor) commented Sep 28, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: niubell, okJiang, rleungx

The full list of commands accepted by this bot can be found here.

The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot added the approved label Sep 28, 2025
@hujiatao0 (Contributor, Author) commented
/retest

@ti-chi-bot (Contributor) commented Sep 28, 2025

@hujiatao0: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name                           Commit   Required  Rerun command
non-block/pull-unit-test-next-gen   5f818ac  false     /test pull-unit-test-next-gen

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@ti-chi-bot merged commit 9450168 into tikv:master Sep 28, 2025
28 of 29 checks passed
hujiatao0 added a commit to hujiatao0/pd that referenced this pull request Sep 29, 2025
close tikv#9719

Signed-off-by: hujiatao0 <hhjjtt110@gmail.com>
ti-chi-bot bot pushed a commit that referenced this pull request Sep 30, 2025
close #9719

Signed-off-by: hujiatao0 <hhjjtt110@gmail.com>
hujiatao0 added a commit to hujiatao0/pd that referenced this pull request Dec 2, 2025
close tikv#9719

Signed-off-by: hujiatao0 <hhjjtt110@gmail.com>
@ti-chi-bot added the needs-cherry-pick-release-8.5 label (Should cherry pick this PR to release-8.5 branch.) Dec 5, 2025
ti-chi-bot pushed a commit to ti-chi-bot/pd that referenced this pull request Dec 5, 2025
close tikv#9719

Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>
@ti-chi-bot (Member) commented

In response to a cherrypick label: new pull request created to branch release-8.5: #10000.
But this PR has conflicts, please resolve them!

Labels

approved, contribution, dco-signoff: yes, first-time-contributor, lgtm, needs-cherry-pick-release-8.5, ok-to-test, release-note-none, size/XXL

Development

Successfully merging this pull request may close these issues.

Graceful Shutdown of TiKV pods when SIGTERM is sent to the pod

6 participants