PD may repeatedly add learner to a region #5786

@HunDunDM

Bug Report

What did you do?

  • Start a cluster with 5 TiKV stores (referred to as tikv-1, tikv-2, tikv-3, tikv-4, tikv-5).
  • Pause the rule checker: curl -X POST -d '{"delay":3000}' "http://{PD_ADDRESS}/pd/api/v1/checker/rule"
  • Find a Region whose Peers are on tikv-1, tikv-2, and tikv-3.
  • Add a Learner to this Region on each of tikv-4 and tikv-5.
  • Kill tikv-2, then wait until the store is marked Down and the Region reports both a pending-peer and a down-peer.
  • Resume the rule checker: curl -X POST -d '{"delay":0}' "http://{PD_ADDRESS}/pd/api/v1/checker/rule"
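
For anyone scripting this reproduction, the pause and resume calls above can be built in Go. This is only a sketch. It uses the endpoint and JSON body from the curl commands in the steps; the helper name newRuleCheckerRequest, the checkerDelay type, and the address 127.0.0.1:2379 are assumptions made for illustration and do not come from PD's codebase. The program only builds and prints the requests; it does not send them.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// checkerDelay mirrors the JSON body used in the issue: {"delay": N}.
// (Hypothetical type name, chosen for this sketch.)
type checkerDelay struct {
	Delay int `json:"delay"`
}

// newRuleCheckerRequest builds, but does not send, the POST request from the
// reproduction steps. A non-zero delay pauses the rule checker; a delay of 0
// resumes it. pdAddr is a placeholder such as "http://127.0.0.1:2379".
func newRuleCheckerRequest(pdAddr string, delay int) (*http.Request, []byte, error) {
	body, err := json.Marshal(checkerDelay{Delay: delay})
	if err != nil {
		return nil, nil, err
	}
	req, err := http.NewRequest(http.MethodPost, pdAddr+"/pd/api/v1/checker/rule", bytes.NewReader(body))
	if err != nil {
		return nil, nil, err
	}
	// The curl commands in the issue do not set this header explicitly;
	// declaring JSON here is an assumption of this sketch.
	req.Header.Set("Content-Type", "application/json")
	return req, body, nil
}

func main() {
	// Same two calls as in the steps: pause with delay 3000, resume with delay 0.
	for _, delay := range []int{3000, 0} {
		req, body, err := newRuleCheckerRequest("http://127.0.0.1:2379", delay)
		if err != nil {
			panic(err)
		}
		fmt.Println(req.Method, req.URL.String(), string(body))
		// To actually send the request: http.DefaultClient.Do(req)
	}
}
```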

What did you expect to see?

The other Regions on tikv-2 are migrated away, and this Region also returns to normal.

What did you see instead?

The other Regions return to normal, but this Region does not.

What version of PD are you using (pd-server -V)?

v6.4.0

Note

The Region can be restored manually.

Solving the problem requires two steps:

  • If the redundant Learner is also Pending or Down, delete it first.
  • If a redundant Learner is available, promote it directly.
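
The two steps above amount to a per-learner decision, which could be sketched as follows. This is not PD's rule-checker code: the learner type, the action constants, and the decide function are hypothetical names introduced only to illustrate the rule described in this note.

```go
package main

import "fmt"

// learner is a hypothetical, simplified view of a redundant Learner peer.
type learner struct {
	StoreID uint64
	Pending bool // the peer is reported as a pending-peer
	Down    bool // the peer is reported as a down-peer
}

// action is the operation this sketch proposes for a redundant learner.
type action string

const (
	removeLearner  action = "remove"
	promoteLearner action = "promote"
)

// decide applies the two steps from the note:
//  1. a redundant learner that is Pending or Down is deleted first;
//  2. a redundant learner that is available is promoted directly.
func decide(l learner) action {
	if l.Pending || l.Down {
		return removeLearner
	}
	return promoteLearner
}

func main() {
	// Illustrative inputs only: one available learner and one that is down.
	learners := []learner{
		{StoreID: 4},
		{StoreID: 5, Down: true},
	}
	for _, l := range learners {
		fmt.Printf("store %d: %s\n", l.StoreID, decide(l))
	}
}
```

Deciding per learner, rather than adding yet another learner, is the point of the sketch: a learner that already exists is either cleaned up or reused.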

    Status

    Closed

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions