grpcproxy: new watchers may be added to a disconnected watchBroadcast and miss all subsequent events #21813

@maluyazi

Description

What happened?

In server/proxy/grpcproxy, when the etcd watch connection underlying a watchBroadcast breaks (its goroutine exits once the for wr := range wch loop ends), the watchBroadcast stays in watchBroadcasts.bcasts, and nothing marks it as no longer receiving events.

This causes two problems:

  1. add() can place new watchers into a dead broadcast: Since there is no stopped flag, add() happily accepts new watchers into a broadcast whose goroutine has already exited. These watchers will never receive any subsequent events.

  2. coalesce() can migrate watchers into a dead broadcast: The coalesce logic only checks nextrev and responses, not whether the target broadcast is still alive. Watchers migrated to a dead broadcast will also stop receiving events.

Additionally, there is no mechanism to reassign existing watchers (orphans) from a dead broadcast to a healthy one.
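
To make the failure mode concrete, here is a minimal, self-contained sketch of the pattern. It is a simplified model, not the grpcproxy code: the broadcast type, its fields, and lateWatcher are hypothetical names, and events are plain strings. It only shows that once the forwarding goroutine has exited, add() still succeeds and the new watcher has no path to any event.

```go
package main

import (
	"fmt"
	"sync"
)

// broadcast is a simplified stand-in for grpcproxy's watchBroadcast.
// It is NOT the real type; all field names here are hypothetical.
type broadcast struct {
	mu        sync.Mutex
	receivers map[string]chan string
	done      chan struct{} // closed when the forwarding goroutine exits
}

func newBroadcast(wch <-chan string) *broadcast {
	b := &broadcast{
		receivers: make(map[string]chan string),
		done:      make(chan struct{}),
	}
	go func() {
		defer close(b.done)
		for ev := range wch { // loop ends once the upstream channel closes
			b.mu.Lock()
			for _, r := range b.receivers {
				r <- ev
			}
			b.mu.Unlock()
		}
		// Nothing records that the loop ended, so callers cannot tell
		// that this broadcast is dead.
	}()
	return b
}

// add accepts a watcher unconditionally, mirroring the reported behavior.
func (b *broadcast) add(id string) chan string {
	b.mu.Lock()
	defer b.mu.Unlock()
	ch := make(chan string, 8)
	b.receivers[id] = ch
	return ch
}

// lateWatcher breaks the upstream, adds a watcher afterwards, and reports
// whether it was accepted and how many events it has received.
func lateWatcher() (accepted bool, events int) {
	wch := make(chan string)
	b := newBroadcast(wch)

	early := b.add("early")
	wch <- "put k=1"
	<-early // while the broadcast is live, events are forwarded

	close(wch) // simulate the broken upstream watch
	<-b.done   // the forwarding goroutine has now exited

	late := b.add("late")
	b.mu.Lock()
	_, accepted = b.receivers["late"]
	b.mu.Unlock()
	return accepted, len(late)
}

func main() {
	accepted, events := lateWatcher()
	fmt.Printf("accepted=%v events=%d\n", accepted, events)
	// prints: accepted=true events=0
}
```

In this model the late watcher is registered on a broadcast that nothing will ever write to again, which is the situation described for both add() and coalesce().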

What did you expect to happen?

  1. A disconnected watchBroadcast should be marked as stopped and refuse new watchers.
  2. coalesce() should not migrate watchers to a stopped broadcast.
  3. When a broadcast disconnects, its existing watchers should be automatically reassigned to a healthy broadcast or a newly created one.
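
One possible shape for this behavior is sketched below. It is an illustration under stated assumptions, not a proposed patch: watcher, broadcast, broadcasts, place, and onDisconnect are hypothetical names, and the sketch ignores the nextrev and responses checks that the real coalesce logic performs. A real fix would presumably also need to resume each orphan from its own revision so that no events are skipped.

```go
package main

import (
	"fmt"
	"sync"
)

// watcher and broadcast are hypothetical stand-ins, not the grpcproxy types.
type watcher struct{ id int }

type broadcast struct {
	mu        sync.Mutex
	stopped   bool
	receivers map[*watcher]struct{}
}

func newBroadcast() *broadcast {
	return &broadcast{receivers: make(map[*watcher]struct{})}
}

// add refuses new watchers once the broadcast has been stopped.
func (b *broadcast) add(w *watcher) bool {
	b.mu.Lock()
	defer b.mu.Unlock()
	if b.stopped {
		return false
	}
	b.receivers[w] = struct{}{}
	return true
}

// stop marks the broadcast dead and hands back its orphaned watchers.
func (b *broadcast) stop() []*watcher {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.stopped = true
	orphans := make([]*watcher, 0, len(b.receivers))
	for w := range b.receivers {
		orphans = append(orphans, w)
	}
	b.receivers = make(map[*watcher]struct{})
	return orphans
}

func (b *broadcast) size() int {
	b.mu.Lock()
	defer b.mu.Unlock()
	return len(b.receivers)
}

// broadcasts is a hypothetical registry standing in for watchBroadcasts.
type broadcasts struct {
	mu     sync.Mutex
	bcasts []*broadcast
}

// place puts w on the first live broadcast, creating one if none accepts it.
func (bs *broadcasts) place(w *watcher) *broadcast {
	bs.mu.Lock()
	defer bs.mu.Unlock()
	for _, b := range bs.bcasts {
		if b.add(w) { // stopped broadcasts return false and are skipped
			return b
		}
	}
	nb := newBroadcast()
	nb.add(w)
	bs.bcasts = append(bs.bcasts, nb)
	return nb
}

// onDisconnect is what the forwarding goroutine would call after its
// range loop ends: stop, unregister, then reassign the orphans.
func (bs *broadcasts) onDisconnect(dead *broadcast) {
	orphans := dead.stop()

	bs.mu.Lock()
	kept := bs.bcasts[:0]
	for _, b := range bs.bcasts {
		if b != dead {
			kept = append(kept, b)
		}
	}
	bs.bcasts = kept
	bs.mu.Unlock()

	for _, w := range orphans {
		bs.place(w)
	}
}

func main() {
	bs := &broadcasts{}
	b1 := bs.place(&watcher{id: 1})
	bs.place(&watcher{id: 2})

	bs.onDisconnect(b1)

	fmt.Println(b1.add(&watcher{id: 3}), len(bs.bcasts), bs.bcasts[0].size())
	// prints: false 1 2
}
```

After the disconnect, the stopped broadcast rejects the third watcher, and both original watchers end up together on a single newly created broadcast.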

How can we reproduce it (as minimally and precisely as possible)?

  1. Set up an etcd gRPC proxy with multiple client watchers coalesced on the same key range.
  2. Cause the backend etcd watch connection to break (e.g., network partition, etcd server restart).
  3. After the break, create a new watcher on the same key range through the proxy.
  4. Observe that the new watcher (and any existing watchers on the dead broadcast) never receives subsequent events.

Anything else we need to know?

No response

Etcd version (please run commands below)

$ etcd --version
v3.5.4

$ etcdctl version
v3.5.4

Etcd configuration (command line flags or environment variables)

paste your configuration here

Etcd debug information (please run commands below, feel free to obfuscate the IP address or FQDN in the output)

$ etcdctl member list -w table
# paste output here

$ etcdctl --endpoints=<member list> endpoint status -w table
# paste output here

Relevant log output
