
Continuous Gossip Warnings (Failed determining organization) causing disk space pressure after old certificates expire (new certificates have been updated) #5458

@sky030b

Description

Hi everyone,

Possibly related issue: #5111

We are experiencing an issue with our peers and could really use some expert guidance.

After the old certificates of other peers in our channel expired (note: these peers had already been successfully updated with new certificates), we started seeing the following warning in our logs:

WARN [gossip.gossip] func3 -> Unable to determine org of message tag:EMPTY alive_msg:<membership:<endpoint:...

[Image] The first peer whose credentials expired (the log starts half an hour after expiration).

Shortly after, a second type of warning started appearing continuously and in massive volumes, which is now causing severe disk space pressure on our machines:

WARN [gossip.gossip] func3 -> Failed determining organization of d5bfxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx5aab5ef8b
WARN [gossip.gossip] func3 -> Failed determining organization of b419xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx7f5ac3ad4
[Image] Later, the second warning starts appearing in the logs.
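
To quantify the storm, a small script along these lines can count how often each identity hash appears in the second warning. This is only a sketch: it assumes the log lines contain the exact phrase "Failed determining organization of" followed by the hash, as in the lines quoted above. The function name `count_unknown_identities` and the sample identifiers are hypothetical placeholders, not part of Fabric.

```python
import re
from collections import Counter

# Assumption: each warning contains this phrase followed by the identity
# hash as a single whitespace-free token, as in the log lines quoted above.
WARN_RE = re.compile(r"Failed determining organization of (\S+)")


def count_unknown_identities(lines):
    """Return a Counter mapping each reported identity hash to its warning count."""
    counts = Counter()
    for line in lines:
        match = WARN_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Placeholder log lines; "id-a" and "id-b" stand in for real hashes.
sample = [
    "WARN [gossip.gossip] func3 -> Failed determining organization of id-a",
    "WARN [gossip.gossip] func3 -> Failed determining organization of id-b",
    "WARN [gossip.gossip] func3 -> Failed determining organization of id-a",
    "INFO [gossip.gossip] some unrelated line",
]

# Print the identities, most frequently reported first.
for identity, n in count_unknown_identities(sample).most_common():
    print(n, identity)
```

For a real peer, the same function can be fed the lines of a saved log file (for example, the output of `docker logs` redirected to a file). A small, fixed set of hashes with very high counts would suggest that the same few stale identities are being re-reported rather than new ones appearing.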

Because this second warning storm only began after the first peer's old certificate actually expired, we suspect the two events are connected. However, we were unable to reproduce the issue in a fresh test environment, so we aren't sure whether these two logs are related. 🤔

Has anyone encountered this specific behavior where the gossip filter continues to spam these hashes even after a container restart? Any guidance on how to properly clear these ghost identities or stop the log spam would be greatly appreciated!

[Image] As the graph shows, the same warnings are logged repeatedly and continuously.

Steps to reproduce

Our Environment & Troubleshooting Steps:

Versions Affected: We checked our hyperledger/fabric-peer images. We have two separate environments running v2.5.10 and v2.5.13, and both are experiencing this exact same log spam.

Container Restarts: We have tried restarting the peer containers to clear the memory. We are aware that the v2.5.11 release notes mention a fix regarding "Gossip handling of expired certificates". However, our understanding is that even without that fix (e.g., on v2.5.10), simply restarting the container should clear the idMapper memory and stop the use of the old cache. Despite the restarts, the warnings persist continuously.

Failed Reproduction: To isolate the issue, we built a completely fresh environment from scratch using the v2.5.10 peer image and repeated the certificate update and expiration process, but we were unable to reproduce the issue there.

We are currently stuck and aren't entirely sure if our issue is directly related to the v2.5.11 fix, given that v2.5.13 is also affected and container restarts aren't mitigating the log spam.
