Refactor GaugeBucketCollector metrics to be homeserver-scoped #18715

Merged
MadLittleMods merged 17 commits into develop from madlittlemods/18592-GaugeBucketCollector
Jul 29, 2025

Conversation

@MadLittleMods MadLittleMods commented Jul 22, 2025

Refactor GaugeBucketCollector metrics to be homeserver-scoped

Part of #18592

Testing strategy

  1. Add the metrics listener in your homeserver.yaml
    listeners:
      # This is just showing how to configure metrics either way
      #
      # `http` `metrics` resource
      - port: 9322
        type: http
        bind_addresses: ['127.0.0.1']
        resources:
          - names: [metrics]
            compress: false
      # `metrics` listener
      - port: 9323
        type: metrics
        bind_addresses: ['127.0.0.1']
  2. Start the homeserver: poetry run synapse_homeserver --config-path homeserver.yaml
  3. Fetch http://localhost:9322/_synapse/metrics and/or http://localhost:9323/metrics
  4. Adjust the number of msecs in the looping_call so that _read_forward_extremities runs immediately instead of after an hour.
  5. Observe that the response includes the synapse_forward_extremities and synapse_excess_extremity_events metrics with the server_name label
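
The manual check in steps 3 and 5 could also be scripted. The following is a minimal sketch, not part of this PR: `fetch_metrics`, `samples_missing_label`, and the `EXAMPLE_EXPOSITION` text are hypothetical names and made-up data used only for illustration. It parses the Prometheus text exposition format with a simplified regular expression and reports any sample of a given metric that lacks the expected label.

```python
import re
import urllib.request
from typing import List

# Simplified matcher for a sample line such as: name{label="value"} 12.0
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)'
)
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')

# Made-up exposition text for illustration; the last sample deliberately
# lacks the `server_name` label.
EXAMPLE_EXPOSITION = """\
# TYPE synapse_forward_extremities gaugehistogram
synapse_forward_extremities_bucket{le="1.0",server_name="example.com"} 3.0
synapse_forward_extremities_bucket{le="+Inf",server_name="example.com"} 5.0
synapse_forward_extremities_gcount{server_name="example.com"} 5.0
synapse_forward_extremities_gsum{server_name="example.com"} 9.0
synapse_excess_extremity_events_bucket{le="+Inf"} 0.0
"""


def fetch_metrics(url: str) -> str:
    """Fetch the raw metrics text from a metrics endpoint."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8")


def samples_missing_label(text: str, metric_prefix: str, label: str) -> List[str]:
    """Return the sample lines for `metric_prefix*` that do not carry `label`."""
    missing = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and `# HELP` / `# TYPE` comments.
        if not line or line.startswith("#"):
            continue
        match = SAMPLE_RE.match(line)
        if match is None or not match.group("name").startswith(metric_prefix):
            continue
        labels = dict(LABEL_RE.findall(match.group("labels") or ""))
        if label not in labels:
            missing.append(line)
    return missing
```

Against a running homeserver configured as in step 1, one would pass `fetch_metrics("http://localhost:9322/_synapse/metrics")` as `text` and expect an empty list for both metric names.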

Pull Request Checklist

  • Pull request is based on the develop branch
  • Pull request includes a changelog file. The entry should:
    • Be a short description of your change which makes sense to users. "Fixed a bug that prevented receiving messages from other servers." instead of "Moved X method from EventStore to EventWorkerStore.".
    • Use markdown where necessary, mostly for code blocks.
    • End with either a period (.) or an exclamation mark (!).
    • Start with a capital letter.
    • Feel free to credit yourself, by adding a sentence "Contributed by @github_username." or "Contributed by [Your Name]." to the end of the entry.
  • Code style is correct (run the linters)

@MadLittleMods MadLittleMods marked this pull request as ready for review July 23, 2025 16:49
@MadLittleMods MadLittleMods requested a review from a team as a code owner July 23, 2025 16:49
@devonh devonh left a comment

What if people add metrics in the future that use the upstream GaugeHistogram type and we don't catch that it doesn't have the server_name label?

Comment thread synapse/metrics/__init__.py Outdated
Comment on lines +382 to +388
    def add_metric(
        self,
        labelvalues: StrSequence,
        buckets: Sequence[Tuple[str, float]],
        gsum_value: float,
        timestamp: Optional[Union[float, Timestamp]] = None,
    ) -> None:
Member

Is this function exactly the same as from the upstream code?
It looks like it is.

If so, could we not just inherit from GaugeHistogramMetricFamily directly (instead of Metric) to not need to duplicate this code?
It's already not ideal to need to almost duplicate the __init__ function.

@MadLittleMods MadLittleMods Jul 25, 2025

I've updated to inherit from GaugeHistogramMetricFamily. It seems a bit brittle as we're just relying on setting self._labelnames = tuple(labelnames) and hoping the super implementation does something with it.

I think I'd rather just have our own full-custom version but will go forward with your preference.

Member

I see your point. Both cases have a level of fragility: if the upstream version changes something else about how it adds metrics, we have no way of knowing that we should update our copied code snippet.

But either way, I think this PR is good to go. If we see this being too fragile the way it is, we can always revisit. Since it's just a metric label, we aren't dealing with a potential Synapse functionality breakage, so the stakes are low.

Comment thread changelog.d/18715.misc
@@ -0,0 +1 @@
Refactor `GaugeBucketCollector` metrics to be homeserver-scoped.
@MadLittleMods MadLittleMods Jul 25, 2025

What if people add metrics in the future that use the upstream GaugeHistogram type and we don't catch that it doesn't have the server_name label?

-- @devonh, #18715 (review)

The linting for this is coming as part of #18733

It doesn't really matter whether the lints are introduced along with these changes: when we do introduce the lints, they will catch whatever is left over. To be clear, I think this PR is exhaustive; I'm just pointing out that it isn't a hard requirement.

@MadLittleMods MadLittleMods requested a review from devonh July 25, 2025 20:37
@devonh devonh left a comment

Small request to add a comment about the risk of setting the label names.
Otherwise this looks good! Thanks for bearing with me :)

Comment thread synapse/metrics/__init__.py Outdated
        if len(labelvalues) != len(labelnames):
            raise ValueError("Incorrect label count")

        self._labelnames = tuple(labelnames)
Member

Since we are relying on the inheritance now to apply the labels, we should add a comment here to describe that risk / fragility.

Contributor Author

As an alternative, I've updated to use the super() constructor to set the internal label names field. This feels a bit better to use their stable API to get everything done.
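
To make the trade-off in this thread concrete, here is a rough sketch of the pattern, not the code from this PR. `UpstreamFamily` is a hypothetical, heavily simplified stand-in for an upstream metric family class (it is not the real `prometheus_client` API), and `LabelledFamily` is an illustrative name. The point is only that the subclass hands the label names to the parent constructor rather than assigning the private `_labelnames` field itself.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# (sample name, labels, value)
Sample = Tuple[str, Dict[str, str], float]


class UpstreamFamily:
    """Hypothetical stand-in for an upstream metric family (not the real class)."""

    def __init__(self, name: str, labels: Optional[Sequence[str]] = None) -> None:
        self.name = name
        # Private detail of the parent; subclasses should not need to touch it.
        self._labelnames: Tuple[str, ...] = tuple(labels or ())
        self.samples: List[Sample] = []

    def add_metric(
        self,
        labelvalues: Sequence[str],
        buckets: Sequence[Tuple[str, float]],
        gsum_value: float,
    ) -> None:
        labels = dict(zip(self._labelnames, labelvalues))
        for upper_bound, count in buckets:
            self.samples.append(
                (f"{self.name}_bucket", {**labels, "le": upper_bound}, count)
            )
        self.samples.append((f"{self.name}_gsum", dict(labels), gsum_value))


class LabelledFamily(UpstreamFamily):
    """Illustrative subclass that accepts label names and values up front."""

    def __init__(
        self,
        name: str,
        labelnames: Sequence[str],
        labelvalues: Sequence[str],
        buckets: Sequence[Tuple[str, float]],
        gsum_value: float,
    ) -> None:
        if len(labelvalues) != len(labelnames):
            raise ValueError("Incorrect label count")
        # Use the parent's public constructor argument instead of assigning
        # `self._labelnames` directly and hoping the parent reads it.
        super().__init__(name, labels=labelnames)
        self.add_metric(labelvalues, buckets, gsum_value)
```

With this shape, a change to how the parent stores label names internally would not silently drop the labels, although the subclass still depends on the parent's `add_metric` behaviour.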

@MadLittleMods MadLittleMods merged commit 5106818 into develop Jul 29, 2025
74 of 78 checks passed
@MadLittleMods MadLittleMods deleted the madlittlemods/18592-GaugeBucketCollector branch July 29, 2025 16:46
@MadLittleMods

Thanks for the review @devonh 🦚
