Merged
25 commits
b14eed7
Add in base linting for metrics
MadLittleMods Jul 23, 2025
4bb84a6
Fill in `synapse/app/phone_stats_home.py`
MadLittleMods Jul 23, 2025
80d5fd5
Support `labelnames` argument being a Tuple expression
MadLittleMods Jul 23, 2025
1504100
Fill in `synapse/federation/federation_server.py`
MadLittleMods Jul 23, 2025
608f72e
Fill in `synapse/federation/sender/transaction_manager.py`
MadLittleMods Jul 23, 2025
0a2877e
Fill in `synapse/metrics/__init__.py`
MadLittleMods Jul 23, 2025
0c93d85
Fill in `synapse/metrics/_gc.py`
MadLittleMods Jul 23, 2025
fe1a16f
Fill in `synapse/metrics/common_usage_metrics.py`
MadLittleMods Jul 23, 2025
944df9c
Fill in `synapse/push/pusherpool.py`
MadLittleMods Jul 23, 2025
54e2374
Fill in `synapse/replication/http/_base.py`
MadLittleMods Jul 23, 2025
883062b
Fill in `synapse/storage/databases/main/event_federation.py`
MadLittleMods Jul 23, 2025
90efa41
Fill in `synapse/storage/databases/main/events_worker.py`
MadLittleMods Jul 23, 2025
f4b6d35
Fill in `synapse/util/batching_queue.py`
MadLittleMods Jul 23, 2025
656c3ad
Fill in `synapse/util/caches/deferred_cache.py`
MadLittleMods Jul 23, 2025
c55e615
Add changelog
MadLittleMods Jul 23, 2025
0fd34a6
Add upgrade notes
MadLittleMods Jul 23, 2025
563f543
Fix `make_fake_db_pool`
MadLittleMods Jul 23, 2025
554d588
Fix `event_persisted_position` usage
MadLittleMods Jul 23, 2025
2536aaf
Fix lints
MadLittleMods Jul 23, 2025
7b55ffb
Fill in event `Gauge` metrics from `synapse/metrics/__init__.py`
MadLittleMods Jul 24, 2025
68061f9
Merge branch 'develop' into madlittlemods/18592-refactor-gauge
MadLittleMods Jul 24, 2025
d587aa9
Remove debug log
MadLittleMods Jul 24, 2025
def4eb5
Merge branch 'develop' into madlittlemods/18592-refactor-gauge
MadLittleMods Jul 25, 2025
650ce32
Fix grammar
MadLittleMods Jul 25, 2025
4387262
Fix duplicate word typo
MadLittleMods Jul 25, 2025
1 change: 1 addition & 0 deletions changelog.d/18725.misc
@@ -0,0 +1 @@
Refactor `Gauge` metrics to be homeserver-scoped.
4 changes: 2 additions & 2 deletions contrib/grafana/synapse.json
@@ -4396,7 +4396,7 @@
"exemplar": false,
"expr": "(time() - max without (job, index, host) (avg_over_time(synapse_federation_last_received_pdu_time[10m]))) / 60",
"instant": false,
"legendFormat": "{{server_name}} ",
"legendFormat": "{{origin_server_name}} ",
"range": true,
"refId": "A"
}
@@ -4518,7 +4518,7 @@
"exemplar": false,
"expr": "(time() - max without (job, index, host) (avg_over_time(synapse_federation_last_sent_pdu_time[10m]))) / 60",
"instant": false,
"legendFormat": "{{server_name}}",
"legendFormat": "{{destination_server_name}}",
"range": true,
"refId": "A"
}
19 changes: 19 additions & 0 deletions docs/upgrade.md
@@ -117,6 +117,25 @@ each upgrade are complete before moving on to the next upgrade, to avoid
stacking them up. You can monitor the currently running background updates with
[the Admin API](usage/administration/admin_api/background_updates.html#status).

# Upgrading to v1.136.0

## Metric labels have changed on `synapse_federation_last_received_pdu_time` and `synapse_federation_last_sent_pdu_time`

Previously, the `synapse_federation_last_received_pdu_time` and
`synapse_federation_last_sent_pdu_time` metrics both used the `server_name` label to
differentiate between the remote servers that we send events to and receive events from.

Since we're now using the `server_name` label to differentiate between different Synapse
homeserver instances running in the same process, these metrics have been changed as follows:

- `synapse_federation_last_received_pdu_time` now uses the `origin_server_name` label
- `synapse_federation_last_sent_pdu_time` now uses the `destination_server_name` label

The Grafana dashboard JSON in `contrib/grafana/synapse.json` has been updated to reflect
this change, but you will need to manually update any of your own existing Grafana
dashboards that use these metrics.
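For your own dashboards, the update amounts to renaming the label in any query or legend that references these two metrics. A rough sketch of what that rename looks like, assuming your panels store PromQL/legend pairs the way `contrib/grafana/synapse.json` does (`LABEL_RENAMES` and `migrate_legend` are illustrative names, not part of Synapse):

```python
# Map each renamed metric to its (old, new) label name, per the notes above.
LABEL_RENAMES = {
    "synapse_federation_last_received_pdu_time": ("server_name", "origin_server_name"),
    "synapse_federation_last_sent_pdu_time": ("server_name", "destination_server_name"),
}


def migrate_legend(expr: str, legend: str) -> str:
    """Rewrite a panel's legendFormat if its query references a renamed metric."""
    for metric, (old, new) in LABEL_RENAMES.items():
        if metric in expr:
            legend = legend.replace("{{%s}}" % old, "{{%s}}" % new)
    return legend


print(migrate_legend(
    "(time() - max without (job, index, host) "
    "(avg_over_time(synapse_federation_last_sent_pdu_time[10m]))) / 60",
    "{{server_name}}",
))  # -> {{destination_server_name}}
```

Legends on unrelated queries are left untouched, so the helper is safe to run over a whole dashboard.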


# Upgrading to v1.135.0

## `on_user_registration` module API callback may now run on any worker
9 changes: 5 additions & 4 deletions scripts-dev/mypy_synapse_plugin.py
@@ -28,7 +28,7 @@
import mypy.types
from mypy.erasetype import remove_instance_last_known_values
from mypy.errorcodes import ErrorCode
from mypy.nodes import ARG_NAMED_OPT, ListExpr, NameExpr, TempNode, Var
from mypy.nodes import ARG_NAMED_OPT, ListExpr, NameExpr, TempNode, TupleExpr, Var
from mypy.plugin import (
FunctionLike,
FunctionSigContext,
@@ -61,6 +61,7 @@ def get_function_signature_hook(
) -> Optional[Callable[[FunctionSigContext], FunctionLike]]:
if fullname in (
"prometheus_client.metrics.Counter",
"prometheus_client.metrics.Gauge",
# TODO: Add other prometheus_client metrics that need checking as we
# refactor, see https://github.com/element-hq/synapse/issues/18592
):
@@ -98,8 +99,8 @@ def check_prometheus_metric_instantiation(ctx: FunctionSigContext) -> CallableTy
ensures metrics are correctly separated by homeserver.

There are also some metrics that apply at the process level, such as CPU usage,
Python garbage collection, Twisted reactor tick time which shouldn't have the
`SERVER_NAME_LABEL`. In those cases, use use a type ignore comment to disable the
Python garbage collection, and Twisted reactor tick time, which shouldn't have the
`SERVER_NAME_LABEL`. In those cases, use a type ignore comment to disable the
check, e.g. `# type: ignore[missing-server-name-label]`.
"""
# The true signature, this isn't being modified so this is what will be returned.
@@ -136,7 +137,7 @@ def check_prometheus_metric_instantiation(ctx: FunctionSigContext) -> CallableTy
# ]
# ```
labelnames_arg_expression = ctx.args[2][0] if len(ctx.args[2]) > 0 else None
if isinstance(labelnames_arg_expression, ListExpr):
if isinstance(labelnames_arg_expression, (ListExpr, TupleExpr)):
# Check if the `labelnames` argument includes the `server_name` label (`SERVER_NAME_LABEL`).
for labelname_expression in labelnames_arg_expression.items:
if (
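The plugin change above widens the lint to accept a `labelnames` argument written as either a list or a tuple literal, while still requiring `SERVER_NAME_LABEL` to be present. A runtime restatement of that rule, for illustration only — the real plugin inspects mypy `ListExpr`/`TupleExpr` AST nodes at type-check time, and `labelnames_include_server_name` is a hypothetical helper, not part of the plugin:

```python
SERVER_NAME_LABEL = "server_name"  # mirrors the constant in synapse.metrics


def labelnames_include_server_name(labelnames: object) -> bool:
    # The rule the plugin enforces: `labelnames` must be a list or tuple
    # literal containing SERVER_NAME_LABEL. Anything else (missing argument,
    # a bare string, etc.) fails the check.
    if isinstance(labelnames, (list, tuple)):
        return SERVER_NAME_LABEL in labelnames
    return False


print(labelnames_include_server_name(("origin_server_name", SERVER_NAME_LABEL)))  # True
print(labelnames_include_server_name(["app_service"]))  # False
```

Process-level metrics (CPU, GC, reactor tick time) that legitimately lack the label opt out with `# type: ignore[missing-server-name-label]` rather than satisfying this check.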
8 changes: 6 additions & 2 deletions synapse/app/_base.py
@@ -525,8 +525,12 @@ async def start(hs: "HomeServer") -> None:
)

# Register the threadpools with our metrics.
register_threadpool("default", reactor.getThreadPool())
register_threadpool("gai_resolver", resolver_threadpool)
register_threadpool(
name="default", server_name=server_name, threadpool=reactor.getThreadPool()
)
register_threadpool(
name="gai_resolver", server_name=server_name, threadpool=resolver_threadpool
)

# Set up the SIGHUP machinery.
if hasattr(signal, "SIGHUP"):
34 changes: 26 additions & 8 deletions synapse/app/phone_stats_home.py
@@ -28,6 +28,7 @@

from twisted.internet import defer

from synapse.metrics import SERVER_NAME_LABEL
from synapse.metrics.background_process_metrics import (
run_as_background_process,
)
@@ -57,16 +58,25 @@
_stats_process: List[Tuple[int, "resource.struct_rusage"]] = []

# Gauges to expose monthly active user control metrics
current_mau_gauge = Gauge("synapse_admin_mau_current", "Current MAU")
current_mau_gauge = Gauge(
"synapse_admin_mau_current",
"Current MAU",
labelnames=[SERVER_NAME_LABEL],
)
current_mau_by_service_gauge = Gauge(
"synapse_admin_mau_current_mau_by_service",
"Current MAU by service",
["app_service"],
labelnames=["app_service", SERVER_NAME_LABEL],
)
max_mau_gauge = Gauge(
"synapse_admin_mau_max",
"MAU Limit",
labelnames=[SERVER_NAME_LABEL],
)
max_mau_gauge = Gauge("synapse_admin_mau_max", "MAU Limit")
registered_reserved_users_mau_gauge = Gauge(
"synapse_admin_mau_registered_reserved_users",
"Registered users with reserved threepids",
labelnames=[SERVER_NAME_LABEL],
)


@@ -237,13 +247,21 @@ async def _generate_monthly_active_users() -> None:
await store.get_monthly_active_count_by_service()
)
reserved_users = await store.get_registered_reserved_users()
current_mau_gauge.set(float(current_mau_count))
current_mau_gauge.labels(**{SERVER_NAME_LABEL: server_name}).set(
float(current_mau_count)
)

for app_service, count in current_mau_count_by_service.items():
current_mau_by_service_gauge.labels(app_service).set(float(count))

registered_reserved_users_mau_gauge.set(float(len(reserved_users)))
max_mau_gauge.set(float(hs.config.server.max_mau_value))
current_mau_by_service_gauge.labels(
app_service=app_service, **{SERVER_NAME_LABEL: server_name}
).set(float(count))

registered_reserved_users_mau_gauge.labels(
**{SERVER_NAME_LABEL: server_name}
).set(float(len(reserved_users)))
max_mau_gauge.labels(**{SERVER_NAME_LABEL: server_name}).set(
float(hs.config.server.max_mau_value)
)

return run_as_background_process(
"generate_monthly_active_users",
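The call sites above all use `.labels(**{SERVER_NAME_LABEL: server_name})` rather than spelling the label out as a keyword, so the label name stays tied to the shared constant. A minimal sketch of that idiom using a hypothetical stand-in for `prometheus_client.Gauge` (`_DemoGauge` is not a real Synapse or prometheus_client class):

```python
SERVER_NAME_LABEL = "server_name"


class _DemoGauge:
    """Stand-in for prometheus_client.Gauge, for illustration only."""

    def __init__(self, name: str, documentation: str, labelnames=()):
        self._labelnames = tuple(labelnames)
        self._values = {}

    def labels(self, **labelvalues):
        # Like prometheus_client, require every declared label to be supplied.
        assert set(labelvalues) == set(self._labelnames), "label mismatch"
        key = tuple(labelvalues[name] for name in self._labelnames)
        values = self._values

        class _Child:
            def set(self, value: float) -> None:
                values[key] = value

        return _Child()


current_mau_gauge = _DemoGauge(
    "synapse_admin_mau_current", "Current MAU", labelnames=[SERVER_NAME_LABEL]
)
# The **-unpacking keeps the label name sourced from SERVER_NAME_LABEL instead
# of hard-coding labels(server_name=...) at every call site.
current_mau_gauge.labels(**{SERVER_NAME_LABEL: "example.com"}).set(42.0)
print(current_mau_gauge._values)  # {('example.com',): 42.0}
```

If the constant were ever renamed, every call site would follow automatically, which is the point of routing the label through `SERVER_NAME_LABEL`.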
6 changes: 4 additions & 2 deletions synapse/federation/federation_server.py
@@ -127,7 +127,7 @@
last_pdu_ts_metric = Gauge(
"synapse_federation_last_received_pdu_time",
"The timestamp of the last PDU which was successfully received from the given domain",
labelnames=("server_name",),
labelnames=("origin_server_name", SERVER_NAME_LABEL),
@MadLittleMods (Contributor, Author) commented on Jul 23, 2025:

This metric was already using the `server_name` label, so I've had to rename it. I've updated `contrib/grafana/synapse.json`, but this is something we probably want to call out in the upgrade notes (added).

)


@@ -554,7 +554,9 @@ async def process_pdu(pdu: EventBase) -> JsonDict:
)

if newest_pdu_ts and origin in self._federation_metrics_domains:
last_pdu_ts_metric.labels(server_name=origin).set(newest_pdu_ts / 1000)
last_pdu_ts_metric.labels(
origin_server_name=origin, **{SERVER_NAME_LABEL: self.server_name}
).set(newest_pdu_ts / 1000)
MadLittleMods marked this conversation as resolved.

return pdu_results

8 changes: 5 additions & 3 deletions synapse/federation/sender/__init__.py
@@ -705,10 +705,12 @@ async def handle_room_events(events: List[EventBase]) -> None:
assert ts is not None

synapse.metrics.event_processing_lag.labels(
"federation_sender"
name="federation_sender",
**{SERVER_NAME_LABEL: self.server_name},
).set(now - ts)
synapse.metrics.event_processing_last_ts.labels(
"federation_sender"
name="federation_sender",
**{SERVER_NAME_LABEL: self.server_name},
).set(ts)

events_processed_counter.labels(
@@ -726,7 +728,7 @@ async def handle_room_events(events: List[EventBase]) -> None:
).inc()

synapse.metrics.event_processing_positions.labels(
"federation_sender"
name="federation_sender", **{SERVER_NAME_LABEL: self.server_name}
).set(next_token)

finally:
10 changes: 6 additions & 4 deletions synapse/federation/sender/transaction_manager.py
@@ -34,6 +34,7 @@
tags,
whitelisted_homeserver,
)
from synapse.metrics import SERVER_NAME_LABEL
from synapse.types import JsonDict
from synapse.util import json_decoder
from synapse.util.metrics import measure_func
@@ -47,7 +48,7 @@
last_pdu_ts_metric = Gauge(
"synapse_federation_last_sent_pdu_time",
"The timestamp of the last PDU which was successfully sent to the given domain",
labelnames=("server_name",),
labelnames=("destination_server_name", SERVER_NAME_LABEL),
@MadLittleMods (Contributor, Author) commented on Jul 23, 2025:

This metric was already using the `server_name` label, so I've had to rename it. I've updated `contrib/grafana/synapse.json`, but this is something we probably want to call out in the upgrade notes (added).

)


@@ -191,6 +192,7 @@ def json_data_cb() -> JsonDict:

if pdus and destination in self._federation_metrics_domains:
last_pdu = pdus[-1]
last_pdu_ts_metric.labels(server_name=destination).set(
last_pdu.origin_server_ts / 1000
)
last_pdu_ts_metric.labels(
destination_server_name=destination,
**{SERVER_NAME_LABEL: self.server_name},
).set(last_pdu.origin_server_ts / 1000)
9 changes: 6 additions & 3 deletions synapse/handlers/appservice.py
@@ -207,7 +208,8 @@ async def handle_room_events(events: Iterable[EventBase]) -> None:
await self.store.set_appservice_last_pos(upper_bound)

synapse.metrics.event_processing_positions.labels(
"appservice_sender"
name="appservice_sender",
**{SERVER_NAME_LABEL: self.server_name},
).set(upper_bound)

events_processed_counter.labels(
@@ -230,10 +231,12 @@ async def handle_room_events(events: Iterable[EventBase]) -> None:
assert ts is not None

synapse.metrics.event_processing_lag.labels(
"appservice_sender"
name="appservice_sender",
**{SERVER_NAME_LABEL: self.server_name},
).set(now - ts)
synapse.metrics.event_processing_last_ts.labels(
"appservice_sender"
name="appservice_sender",
**{SERVER_NAME_LABEL: self.server_name},
).set(ts)
finally:
self.is_processing = False
6 changes: 4 additions & 2 deletions synapse/handlers/delayed_events.py
@@ -22,7 +22,7 @@
from synapse.api.ratelimiting import Ratelimiter
from synapse.config.workers import MAIN_PROCESS_INSTANCE_NAME
from synapse.logging.opentracing import set_tag
from synapse.metrics import event_processing_positions
from synapse.metrics import SERVER_NAME_LABEL, event_processing_positions
from synapse.metrics.background_process_metrics import run_as_background_process
from synapse.replication.http.delayed_events import (
ReplicationAddedDelayedEventRestServlet,
@@ -191,7 +191,9 @@ async def _unsafe_process_new_event(self) -> None:
self._event_pos = max_pos

# Expose current event processing position to prometheus
event_processing_positions.labels("delayed_events").set(max_pos)
event_processing_positions.labels(
name="delayed_events", **{SERVER_NAME_LABEL: self.server_name}
).set(max_pos)

await self._store.update_delayed_events_stream_pos(max_pos)

6 changes: 3 additions & 3 deletions synapse/handlers/presence.py
@@ -1568,9 +1568,9 @@ async def _unsafe_process(self) -> None:
self._event_pos = max_pos

# Expose current event processing position to prometheus
synapse.metrics.event_processing_positions.labels("presence").set(
max_pos
)
synapse.metrics.event_processing_positions.labels(
name="presence", **{SERVER_NAME_LABEL: self.server_name}
).set(max_pos)

async def _handle_state_delta(self, room_id: str, deltas: List[StateDelta]) -> None:
"""Process current state deltas for the room to find new joins that need
6 changes: 4 additions & 2 deletions synapse/handlers/room_member.py
@@ -49,7 +49,7 @@
from synapse.handlers.state_deltas import MatchChange, StateDeltasHandler
from synapse.handlers.worker_lock import NEW_EVENT_DURING_PURGE_LOCK_NAME
from synapse.logging import opentracing
from synapse.metrics import event_processing_positions
from synapse.metrics import SERVER_NAME_LABEL, event_processing_positions
from synapse.metrics.background_process_metrics import run_as_background_process
from synapse.replication.http.push import ReplicationCopyPusherRestServlet
from synapse.storage.databases.main.state_deltas import StateDelta
@@ -2255,7 +2255,9 @@ async def _unsafe_process(self) -> None:
self.pos = max_pos

# Expose current event processing position to prometheus
event_processing_positions.labels("room_forgetter").set(max_pos)
event_processing_positions.labels(
name="room_forgetter", **{SERVER_NAME_LABEL: self.server_name}
).set(max_pos)

await self._store.update_room_forgetter_stream_pos(max_pos)

6 changes: 4 additions & 2 deletions synapse/handlers/stats.py
@@ -32,7 +32,7 @@
)

from synapse.api.constants import EventContentFields, EventTypes, Membership
from synapse.metrics import event_processing_positions
from synapse.metrics import SERVER_NAME_LABEL, event_processing_positions
from synapse.metrics.background_process_metrics import run_as_background_process
from synapse.storage.databases.main.state_deltas import StateDelta
from synapse.types import JsonDict
@@ -147,7 +147,9 @@ async def _unsafe_process(self) -> None:

logger.debug("Handled room stats to %s -> %s", self.pos, max_pos)

event_processing_positions.labels("stats").set(max_pos)
event_processing_positions.labels(
name="stats", **{SERVER_NAME_LABEL: self.server_name}
).set(max_pos)

self.pos = max_pos

7 changes: 4 additions & 3 deletions synapse/handlers/user_directory.py
@@ -35,6 +35,7 @@
)
from synapse.api.errors import Codes, SynapseError
from synapse.handlers.state_deltas import MatchChange, StateDeltasHandler
from synapse.metrics import SERVER_NAME_LABEL
from synapse.metrics.background_process_metrics import run_as_background_process
from synapse.storage.databases.main.state_deltas import StateDelta
from synapse.storage.databases.main.user_directory import SearchResult
@@ -262,9 +263,9 @@ async def _unsafe_process(self) -> None:
self.pos = max_pos

# Expose current event processing position to prometheus
synapse.metrics.event_processing_positions.labels("user_dir").set(
max_pos
)
synapse.metrics.event_processing_positions.labels(
name="user_dir", **{SERVER_NAME_LABEL: self.server_name}
).set(max_pos)

await self.store.update_user_directory_stream_pos(max_pos)
