online recovery: fix online recovery timeout mechanism #6108
ti-chi-bot merged 4 commits into tikv:master
Signed-off-by: Connor1996 <zbk602423539@gmail.com>
PTAL @v01dstar
Can you please explain a little more about what the bug is? I can tell that with this change the whole process will exit faster when a timeout happens (the existing code also exits after a timeout, I believe?). Besides, I think that with this change we may leave some regions in the exit force leader state when a timeout happens?
Suppose that one TiKV always returns store heartbeats but, for some reason, never includes a store report. Then in the existing implementation it would never trigger the timeout and would stay in the collecting stage forever.
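The scenario above can be sketched as follows. This is a minimal illustration, not PD's actual code: `recoveryController`, `enterCollect`, `handleHeartbeat`, and `silentStoreTimesOut` are hypothetical names, and the 40-second wait is an assumed value. The point is that a single deadline is armed when the stage begins and is checked on every heartbeat, so a store that keeps sending heartbeats without a report cannot keep the controller in the collecting stage.

```go
package main

import (
	"fmt"
	"time"
)

// stage and recoveryController are hypothetical, simplified stand-ins.
type stage int

const (
	collectReport stage = iota
	failed
)

type recoveryController struct {
	stage    stage
	deadline time.Time       // one deadline per stage, armed on stage entry
	reports  map[uint64]bool // stores that have delivered a report
}

// enterCollect starts the collecting stage and arms a single deadline.
func (c *recoveryController) enterCollect(now time.Time, wait time.Duration) {
	c.stage = collectReport
	c.deadline = now.Add(wait)
	c.reports = make(map[uint64]bool)
}

// handleHeartbeat runs for every store heartbeat. The deadline is checked
// first, so a heartbeat without a report cannot postpone the timeout.
func (c *recoveryController) handleHeartbeat(now time.Time, storeID uint64, hasReport bool) {
	if c.stage != collectReport {
		return
	}
	if now.After(c.deadline) {
		c.stage = failed
		return
	}
	if hasReport {
		c.reports[storeID] = true
	}
}

// silentStoreTimesOut replays heartbeats (in seconds since stage entry) from
// a store that never attaches a report, and says whether the stage failed.
func silentStoreTimesOut(waitSeconds int, heartbeatSeconds []int) bool {
	start := time.Unix(0, 0)
	c := &recoveryController{}
	c.enterCollect(start, time.Duration(waitSeconds)*time.Second)
	for _, s := range heartbeatSeconds {
		c.handleHeartbeat(start.Add(time.Duration(s)*time.Second), 1, false)
	}
	return c.stage == failed
}

func main() {
	// Assumed 40s wait; the store heartbeats every 10s with no report.
	fmt.Println(silentStoreTimesOut(40, []int{10, 20, 30, 40, 50}))
}
```

In this sketch the heartbeat at 50 seconds is the first one after the deadline, so that is where the stage flips to `failed`.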
I think the existing code still exits? Just with a longer wait time,
Please check
// blocks reads and writes.
u.storePlanExpires = make(map[uint64]time.Time)
u.storeRecoveryPlans = make(map[uint64]*pdpb.RecoveryPlan)
u.timeout = time.Now().Add(storeRequestInterval)
Isn't one heartbeat interval too aggressive? Maybe *2 to make it more stable?
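To make the concern concrete, here is a small sketch of how the multiplier changes the outcome for a heartbeat that arrives slightly late. The helper `expiredAt` is hypothetical, and the 40-second interval and 42-second arrival are assumed values chosen for illustration; neither is taken from the PR.

```go
package main

import (
	"fmt"
	"time"
)

// expiredAt reports whether a heartbeat arriving arrivalSeconds after the
// stage began falls after a deadline of factor * interval.
func expiredAt(arrivalSeconds, intervalSeconds, factor int) bool {
	start := time.Unix(0, 0)
	interval := time.Duration(intervalSeconds) * time.Second
	deadline := start.Add(time.Duration(factor) * interval)
	arrival := start.Add(time.Duration(arrivalSeconds) * time.Second)
	return arrival.After(deadline)
}

func main() {
	// Assumed: 40s interval, heartbeat delayed by 2s of jitter.
	fmt.Println("1x interval expired:", expiredAt(42, 40, 1))
	fmt.Println("2x interval expired:", expiredAt(42, 40, 2))
}
```

With one interval, the delayed heartbeat already counts as a timeout; with two intervals it is still accepted, at the cost of waiting longer before giving up on a store that is really unresponsive.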
Codecov Report
@@            Coverage Diff             @@
##           master    #6108      +/-   ##
==========================================
+ Coverage   74.03%   74.12%   +0.08%
==========================================
  Files         385      385
  Lines       37952    37952
==========================================
+ Hits        28099    28131      +32
+ Misses       7377     7353      -24
+ Partials     2476     2468       -8
Signed-off-by: Connor1996 <zbk602423539@gmail.com>
/merge
This pull request has been accepted and is ready to merge. Commit hash: 22eec8a
@Connor1996: Your PR was out of date, so I have automatically updated it for you. If a CI test fails, just re-trigger the test that failed and the bot will merge the PR for you after CI passes.
In response to a cherrypick label: new pull request created to branch
close tikv#6107 Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>
In response to a cherrypick label: new pull request created to branch
close tikv#6107 fix online recovery timeout mechanism Signed-off-by: Connor1996 <zbk602423539@gmail.com> Co-authored-by: Ti Chi Robot <ti-community-prow-bot@tidb.io> Signed-off-by: Yang Zhang <yang.zhang@pingcap.com>
What problem does this PR solve?
Issue Number: Close #6107
What is changed and how does it work?
Check List
Tests
Release note