Skip to content

Fix cache metrics to collect from all servers#18748

Merged
MadLittleMods merged 5 commits intodevelopfrom
madlittlemods/fix-cache-metrics-per-tenant
Aug 1, 2025
Merged

Fix cache metrics to collect from all servers#18748
MadLittleMods merged 5 commits intodevelopfrom
madlittlemods/fix-cache-metrics-per-tenant

Conversation

@MadLittleMods
Copy link
Copy Markdown
Contributor

@MadLittleMods MadLittleMods commented Jul 29, 2025

Fix cache metrics to collect from all servers

Follow-up to #18604

Previously, our cache metrics did include the server_name label as expected but we were only seeing the last server being reported. This was caused because we would CACHE_METRIC_REGISTRY.register_hook(metric_name, metric.collect) where the metric_name only took into account the cache name so it would be overwritten every time we spawn a new server.

This PR updates the register logic to include the server_name so we have a hook for every cache on every server as expected.

I noticed this problem thanks to some tests in the Synapse Pro for Small Hosts repo that sanity check all metrics to ensure that we can see each metric includes data from multiple servers.

Testing strategy

  1. This is only noticeable when you run multiple Synapse instances in the same process.
  2. TODO

(see test_cache_metric_multiple_servers that was added in this PR)

Pull Request Checklist

  • Pull request is based on the develop branch
  • Pull request includes a changelog file. The entry should:
    • Be a short description of your change which makes sense to users. "Fixed a bug that prevented receiving messages from other servers." instead of "Moved X method from EventStore to EventWorkerStore.".
    • Use markdown where necessary, mostly for code blocks.
    • End with either a period (.) or an exclamation mark (!).
    • Start with a capital letter.
    • Feel free to credit yourself, by adding a sentence "Contributed by @github_username." or "Contributed by [Your Name]." to the end of the entry.
  • Code style is correct (run the linters)

# just before they are returned from the scrape endpoint.
CACHE_METRIC_REGISTRY = DynamicCollectorRegistry()

caches_by_name: Dict[str, Sized] = {}
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed because it was unused

Same changelog as #18604 so they merge
Comment thread changelog.d/18748.misc
@@ -0,0 +1 @@
Refactor cache metrics to be homeserver-scoped.
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same changelog as #18604 so they merge

@MadLittleMods MadLittleMods marked this pull request as ready for review July 31, 2025 01:29
@MadLittleMods MadLittleMods requested a review from a team as a code owner July 31, 2025 01:29
@MadLittleMods MadLittleMods merged commit e43a1ce into develop Aug 1, 2025
75 of 78 checks passed
@MadLittleMods MadLittleMods deleted the madlittlemods/fix-cache-metrics-per-tenant branch August 1, 2025 17:30
@MadLittleMods
Copy link
Copy Markdown
Contributor Author

Thanks for the review @devonh 🐠

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants