
server: fix the leader cannot election after pd leader lost while etcd leader intact #6447

Merged: ti-chi-bot[bot] merged 8 commits into tikv:master from nolouch:miss-leader on May 12, 2023


@nolouch (Contributor) commented May 11, 2023

What problem does this PR solve?

Issue Number: Close #6403

What is changed and how does it work?

server: fix the case where no new PD leader can be elected after the PD leader is lost while the etcd leader remains intact.

Check List

Tests

  • Unit test
  • Integration test

Release note

None.

@ti-chi-bot (Contributor) commented May 11, 2023

[REVIEW NOTIFICATION]

This pull request has been approved by:

  • JmPotato
  • rleungx

To complete the pull request process, please ask the reviewers in the list to review by writing /cc @reviewer in a comment.
After your PR has acquired the required number of LGTMs, you can assign this pull request to the committer in the list by writing /assign @committer in a comment to help you merge this pull request.

The full list of commands accepted by this bot can be found here.


Reviewer can indicate their review by submitting an approval review.
Reviewer can cancel approval by submitting a request changes review.

@ti-chi-bot ti-chi-bot bot added needs-cherry-pick-release-6.5 Should cherry pick this PR to release-6.5 branch. release-note-none Denotes a PR that doesn't merit a release note. needs-cherry-pick-release-7.1 Should cherry pick this PR to release-7.1 branch. labels May 11, 2023
@ti-chi-bot ti-chi-bot bot requested review from JmPotato and disksing May 11, 2023 18:24
@nolouch nolouch requested review from lhy1024 and rleungx and removed request for JmPotato and disksing May 11, 2023 18:24
…d leader intact

Signed-off-by: nolouch <nolouch@gmail.com>
@lhy1024 (Contributor) left a comment:

LGTM, but CI failed.

zap.String("current-leader-member-id", types.ID(etcdLeader).String()),
zap.String("transferee-member-id", types.ID(s.member.ID()).String()),
)
s.member.MoveEtcdLeader(s.ctx, etcdLeader, s.member.ID())
A reviewer (Member) commented:
Should we consider the leader's priority?

@nolouch (Contributor Author) replied May 12, 2023:

In most cases, no priority is configured. Even if one is, this change does not defeat it, because the higher-priority member will move the leadership again afterwards. So I think we can keep this implementation simple.
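The author's argument is that priorities still win in the end: the fix may hand leadership to a member regardless of priority, but a higher-priority member will then move it again. A minimal model of that convergence, assuming each priority check moves leadership to any member whose priority is strictly higher than the current leader's (`member` and `settle` are hypothetical names, not PD's API):

```go
package main

import "fmt"

// member is a hypothetical PD member with a configured leader priority.
type member struct {
	id       uint64
	priority int
}

// settle models repeated priority checks: whenever some member has a strictly
// higher priority than the current leader, leadership moves to it. It stops
// once no member outranks the leader and reports how many moves happened.
// Each move strictly raises the leader's priority, so the loop terminates.
func settle(leader member, members []member) (member, int) {
	moves := 0
	for {
		moved := false
		for _, m := range members {
			if m.priority > leader.priority {
				leader = m
				moves++
				moved = true
			}
		}
		if !moved {
			return leader, moves
		}
	}
}

func main() {
	members := []member{{id: 1, priority: 0}, {id: 2, priority: 0}, {id: 3, priority: 5}}
	// Suppose the fix moved leadership to member 2 without looking at priority.
	leader, moves := settle(members[1], members)
	fmt.Println(leader.id, moves)
}
```

With no priorities configured, `settle` makes no moves, which matches the "keep it simple" reasoning for the common case.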

re.NoError(failpoint.Enable("github.com/tikv/pd/server/exitCampaignLeader", fmt.Sprintf("return(\"%d\")", memberID)))
re.NoError(failpoint.Enable("github.com/tikv/pd/server/timeoutWaitPDLeader", `return(true)`))
leader2 := waitLeaderChange(re, cluster, leader1)
t.Log("leader2:", leader2)
A reviewer (Member) commented:
Suggested change: remove the line t.Log("leader2:", leader2).

re.NoError(failpoint.Enable("github.com/tikv/pd/server/timeoutWaitPDLeader", `return(true)`))
leader2 := waitLeaderChange(re, cluster, leader1)
t.Log("leader2:", leader2)
re.NotEqual(leader1, leader2)
A reviewer (Member) commented:

Do we need to check that the leader change does not take too long?

@nolouch (Contributor Author) replied:

Without this fix, this test fails: waitLeaderChange times out after 30s.

Signed-off-by: nolouch <nolouch@gmail.com>
@codecov bot commented May 12, 2023

Codecov Report

❗ No coverage uploaded for pull request base (master@511115f).
Patch coverage: 88.88% of modified lines in pull request are covered.

Additional details and impacted files
@@            Coverage Diff            @@
##             master    #6447   +/-   ##
=========================================
  Coverage          ?   74.85%           
=========================================
  Files             ?      410           
  Lines             ?    41718           
  Branches          ?        0           
=========================================
  Hits              ?    31227           
  Misses            ?     7782           
  Partials          ?     2709           
Flag Coverage Δ
unittests 74.85% <88.88%> (?)

Flags with carried forward coverage won't be shown.

Impacted Files Coverage Δ
pkg/tso/allocator_manager.go 66.61% <ø> (ø)
pkg/member/participant.go 51.93% <42.85%> (ø)
pkg/member/member.go 70.04% <100.00%> (ø)
server/server.go 74.91% <100.00%> (ø)


@ti-chi-bot ti-chi-bot bot added the status/LGT1 Indicates that a PR has LGTM 1. label May 12, 2023
@ti-chi-bot ti-chi-bot bot added status/LGT2 Indicates that a PR has LGTM 2. and removed status/LGT1 Indicates that a PR has LGTM 1. labels May 12, 2023
@JmPotato (Member) commented:

/merge

@ti-chi-bot (Contributor) commented May 12, 2023

@JmPotato: It seems you want to merge this PR, I will help you trigger all the tests:

/run-all-tests


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository.

@ti-chi-bot (Contributor) commented May 12, 2023

This pull request has been accepted and is ready to merge.

Commit hash: 9eaa004

@ti-chi-bot ti-chi-bot bot added the status/can-merge Indicates a PR has been approved by a committer. label May 12, 2023
@ti-chi-bot ti-chi-bot bot removed the status/can-merge Indicates a PR has been approved by a committer. label May 12, 2023
@nolouch (Contributor Author) commented May 12, 2023

/merge

@ti-chi-bot (Contributor) commented May 12, 2023

@nolouch: It seems you want to merge this PR, I will help you trigger all the tests:

/run-all-tests


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository.

@ti-chi-bot (Contributor) commented May 12, 2023

This pull request has been accepted and is ready to merge.

Commit hash: 14e9f16

@ti-chi-bot ti-chi-bot bot added the status/can-merge Indicates a PR has been approved by a committer. label May 12, 2023
}

leader, checkAgain := s.member.CheckLeader()
// add failpoint to test leader check go to stuck.
A reviewer (Member) commented:

The file permission has been changed.

@rleungx (Member) commented May 12, 2023

/hold

@ti-chi-bot ti-chi-bot bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 12, 2023
Signed-off-by: nolouch <nolouch@gmail.com>
@ti-chi-bot ti-chi-bot bot removed the status/can-merge Indicates a PR has been approved by a committer. label May 12, 2023
@rleungx (Member) commented May 12, 2023

/hold cancel

@ti-chi-bot ti-chi-bot bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 12, 2023
@rleungx (Member) commented May 12, 2023

/merge

@ti-chi-bot (Contributor) commented May 12, 2023

@rleungx: It seems you want to merge this PR, I will help you trigger all the tests:

/run-all-tests

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository.

@ti-chi-bot (Contributor) commented May 12, 2023

This pull request has been accepted and is ready to merge.

Commit hash: 6d810b4

@ti-chi-bot ti-chi-bot bot added the status/can-merge Indicates a PR has been approved by a committer. label May 12, 2023
@ti-chi-bot (Contributor) commented May 12, 2023

@nolouch: Your PR was out of date, I have automatically updated it for you.

If the CI test fails, you just re-trigger the test that failed and the bot will merge the PR for you after the CI passes.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository.

@ti-chi-bot (Member) commented:

In response to a cherrypick label: new pull request created to branch release-6.5: #6460.

ti-chi-bot pushed a commit to ti-chi-bot/pd that referenced this pull request May 12, 2023
close tikv#6403

Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>
ti-chi-bot pushed a commit to ti-chi-bot/pd that referenced this pull request May 12, 2023
close tikv#6403

Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>
@ti-chi-bot (Member) commented:

In response to a cherrypick label: new pull request created to branch release-7.1: #6461.

ti-chi-bot bot added a commit that referenced this pull request May 15, 2023
…d leader intact (#6447) (#6461)

close #6403, ref #6447

server: fix the leader cannot election after pd leader lost while etcd leader intact

Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>
Signed-off-by: nolouch <nolouch@gmail.com>

Co-authored-by: ShuNing <nolouch@gmail.com>
Co-authored-by: nolouch <nolouch@gmail.com>
Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
ti-chi-bot bot added a commit that referenced this pull request May 24, 2023
…d leader intact (#6447) (#6460)

close #6403, ref #6447

server: fix the leader cannot election after pd leader lost while etcd leader intact

Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>
Signed-off-by: nolouch <nolouch@gmail.com>

Co-authored-by: ShuNing <nolouch@gmail.com>
Co-authored-by: nolouch <nolouch@gmail.com>
Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
@nolouch nolouch deleted the miss-leader branch August 11, 2023 02:39

Labels

needs-cherry-pick-release-6.5: Should cherry pick this PR to release-6.5 branch.
needs-cherry-pick-release-7.1: Should cherry pick this PR to release-7.1 branch.
release-note-none: Denotes a PR that doesn't merit a release note.
status/can-merge: Indicates a PR has been approved by a committer.
status/LGT2: Indicates that a PR has LGTM 2.

Development

Successfully merging this pull request may close these issues.

Detect leader health and automatically do failover

5 participants