Merged
36 commits
5f7d50d
Add total message and e2ee event counts to stats reporting
anoadragon453 Mar 21, 2025
b4e42a4
Test phone home stats
anoadragon453 Mar 21, 2025
5213c87
newsfile
anoadragon453 Mar 21, 2025
c8de2c5
wip
anoadragon453 Mar 26, 2025
cc9e31a
Merge branch 'develop' into anoa/export_total_message_count
MadLittleMods Apr 7, 2025
f0d6b26
Add `total_event_count`
MadLittleMods Apr 7, 2025
032ae5e
Add descriptions
MadLittleMods Apr 7, 2025
575bbe9
Fix some table/column mismatches
MadLittleMods Apr 7, 2025
9b8d9a9
Fill in trigger logic
MadLittleMods Apr 7, 2025
e320f2d
Can only run one statement at a time
MadLittleMods Apr 7, 2025
7c9fcb3
Adjust names
MadLittleMods Apr 7, 2025
8a04a08
Fix some background update lints
MadLittleMods Apr 7, 2025
90e57ff
Fix `builtins.IndexError: tuple index out of range`
MadLittleMods Apr 7, 2025
942d066
Iterate on background update
MadLittleMods Apr 7, 2025
ba40e0c
Better names
MadLittleMods Apr 7, 2025
5a3da6a
Refactor pattern to add triggers first in the background update
MadLittleMods Apr 7, 2025
6a1a03c
`backfill` -> `populate`
MadLittleMods Apr 7, 2025
4c65945
Make sure the trigger SQL can be run multiple times
MadLittleMods Apr 7, 2025
6cb50b0
Docs
MadLittleMods Apr 7, 2025
a5fedfb
Fix Postgres trigger syntax
MadLittleMods Apr 7, 2025
20f8dbe
Fix stats
MadLittleMods Apr 7, 2025
ab38bdb
Fix lints
MadLittleMods Apr 7, 2025
1e65719
Test `total_event_count`
MadLittleMods Apr 7, 2025
18d122e
We probably just need to wait for the background updates (not re-run it)
MadLittleMods Apr 7, 2025
f24c47d
Better description of the magic number
MadLittleMods Apr 7, 2025
7c26de5
Move things out of `prepare`
MadLittleMods Apr 7, 2025
845c29a
Fix lints
MadLittleMods Apr 7, 2025
8ad9664
Update changelog
MadLittleMods Apr 7, 2025
f713ea0
Test non-working background update with `events`
MadLittleMods Apr 7, 2025
eeb6dba
Working `_populate_txn`
MadLittleMods Apr 8, 2025
55cff0e
Add tests for the background updates
MadLittleMods Apr 8, 2025
1e68f6a
Merge branch 'develop' into anoa/export_total_message_count
MadLittleMods Apr 8, 2025
0985e14
Better doc comment
MadLittleMods Apr 8, 2025
0a06be3
Fix lints
MadLittleMods Apr 8, 2025
a67e185
Merge branch 'develop' into anoa/export_total_message_count
MadLittleMods Apr 11, 2025
6381d99
Use correct delta ordering number
MadLittleMods Apr 14, 2025
@@ -30,10 +30,12 @@ The following statistics are sent to the configured reporting endpoint:
| `python_version` | string | The Python version number in use (e.g "3.7.1"). Taken from `sys.version_info`. |
| `total_users` | int | The number of registered users on the homeserver. |
| `total_nonbridged_users` | int | The number of users, excluding those created by an Application Service. |
| `daily_user_type_native` | int | The number of native users created in the last 24 hours. |
| `daily_user_type_native` | int | The number of native, non-guest users created in the last 24 hours. |
| `daily_user_type_guest` | int | The number of guest users created in the last 24 hours. |
| `daily_user_type_bridged` | int | The number of users created by Application Services in the last 24 hours. |
| `total_room_count` | int | The total number of rooms present on the homeserver. |
| `total_message_count` | int | The total number of non-state events with type `m.room.message` present on the homeserver. |
| `total_e2ee_event_count` | int | The total number of non-state events with type `m.room.encrypted` present on the homeserver. This can be used as a slight over-estimate for the number of encrypted messages. |
| `daily_active_users` | int | The number of unique users[^1] that have used the homeserver in the last 24 hours. |
| `monthly_active_users` | int | The number of unique users[^1] that have used the homeserver in the last 30 days. |
| `daily_active_rooms` | int | The number of rooms that have had a (state) event with the type `m.room.message` sent in them in the last 24 hours. |
@@ -50,8 +52,8 @@ The following statistics are sent to the configured reporting endpoint:
| `cache_factor` | int | The configured [`global factor`](../../configuration/config_documentation.md#caching) value for caching. |
| `event_cache_size` | int | The configured [`event_cache_size`](../../configuration/config_documentation.md#caching) value for caching. |
| `database_engine` | string | The database engine that is in use. Either "psycopg2" meaning PostgreSQL is in use, or "sqlite3" for SQLite3. |
| `database_server_version` | string | The version of the database server. Examples being "10.10" for PostgreSQL server version 10.0, and "3.38.5" for SQLite 3.38.5 installed on the system. |
| `log_level` | string | The log level in use. Examples are "INFO", "WARNING", "ERROR", "DEBUG", etc. |
| `database_server_version` | string | The version of the database server. Examples being "10.10" for PostgreSQL server version 10.0, and "3.38.5" for SQLite 3.38.5 installed on the system. |
| `log_level` | string | The log level in use. Examples are "INFO", "WARNING", "ERROR", "DEBUG", etc. |


[^1]: Native matrix users and guests are always counted. If the
2 changes: 2 additions & 0 deletions synapse/app/phone_stats_home.py
@@ -121,6 +121,8 @@ async def phone_stats_home(

    room_count = await store.get_room_count()
    stats["total_room_count"] = room_count
    stats["total_message_count"] = await store.count_total_messages()
    stats["total_e2ee_event_count"] = await store.count_total_e2ee_events()

    stats["daily_active_users"] = common_metrics.daily_active_users
    stats["monthly_active_users"] = await store.count_monthly_users()
40 changes: 40 additions & 0 deletions synapse/storage/databases/main/metrics.py
@@ -126,6 +126,46 @@ def _count_messages(txn: LoggingTransaction) -> int:

        return await self.db_pool.runInteraction("count_e2ee_messages", _count_messages)

    async def count_total_messages(self) -> int:
        """
        Returns the total number of `m.room.message` events present on the
        server.
        """

        def _count_total_messages(txn: LoggingTransaction) -> int:
            sql = """
                SELECT COUNT(*) FROM events
                WHERE type = 'm.room.message'
                AND state_key IS NULL
Contributor

@MadLittleMods MadLittleMods Mar 21, 2025

Drive-by: There isn't an index for `type` or `state_key` on the `events` table. This probably won't play nice with the database.

Perhaps we're okay with the full table scan?

We do have similar queries for the daily counts, but they are restricted to scanning only the current day's worth of messages using `stream_ordering`.

Member Author

Very good point. I just tested on matrix.org, and it takes about 5 minutes to run this query. This isn't the worst thing for a metrics job that runs every 3 hours. However, it is concerning that a connection to the DB would be taken up for so long.

We could eliminate that concern by chunking the count query through batching on stream_ordering (which does have an index). But you'd still take significantly longer to generate the metrics than we do today.
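
A minimal sketch of that batching idea, assuming a simplified `events` table with only `stream_ordering`, `type`, and `state_key` columns, and using SQLite in place of Synapse's `db_pool`. The function name `count_events_in_batches` and the `batch_size` parameter are hypothetical and not part of this PR:

```python
import sqlite3


def count_events_in_batches(conn, event_type, batch_size=1000):
    """Count non-state events of one type by scanning fixed-size stream_ordering ranges."""
    lower, upper = conn.execute(
        "SELECT MIN(stream_ordering), MAX(stream_ordering) FROM events"
    ).fetchone()
    if lower is None:
        return 0  # empty table

    total = 0
    while lower <= upper:
        # Each query only touches one [lower, lower + batch_size) slice, so no
        # single statement holds a connection for the whole table scan.
        (count,) = conn.execute(
            """
            SELECT COUNT(*) FROM events
            WHERE stream_ordering >= ? AND stream_ordering < ?
              AND type = ?
              AND state_key IS NULL
            """,
            (lower, lower + batch_size, event_type),
        ).fetchone()
        total += count
        lower += batch_size
    return total


# Tiny in-memory demo table (illustrative data only).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (stream_ordering INTEGER PRIMARY KEY, type TEXT NOT NULL, state_key TEXT)"
)
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [
        (1, "m.room.message", None),
        (2, "m.room.member", "@a:example.org"),
        (3, "m.room.message", None),
        (4, "m.room.encrypted", None),
        (5, "m.room.message", None),
    ],
)
print(count_events_in_batches(conn, "m.room.message", batch_size=2))
```

The total work is the same as one big `COUNT(*)`; the only gain is that each individual query is short.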

We could also add a partial index on `WHERE state_key IS NULL AND (type = 'm.room.message' OR type = 'm.room.encrypted')`. This will take up significantly less disk space than a full index on both `type` and `state_key`, while still making the query extremely quick.

The `events` table on matrix.org right now is ~4300GB. `m.room.{encrypted,message}` events make up ~86.5% of the table, with the vast majority of those rows having `state_key = NULL`, so the index would be roughly 750GB:

Partial Index Size ≈ Table Size × % of Matching Rows × Index Ratio (10-30%)
750GB ≈ 4300GB × 0.865 × 0.20
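
The same estimate can be reproduced across the whole 10-30% range. This is only a sketch of the arithmetic above; the function name `estimate_partial_index_gb` is hypothetical, and the index ratio is the rule of thumb from this comment, not a measured value:

```python
def estimate_partial_index_gb(table_gb, matching_fraction, index_ratio):
    """Partial index size ≈ table size × fraction of matching rows × index ratio."""
    return table_gb * matching_fraction * index_ratio


# Figures quoted above: ~4300GB table, ~86.5% matching rows.
for ratio in (0.10, 0.20, 0.30):
    size_gb = estimate_partial_index_gb(4300, 0.865, ratio)
    print(f"index ratio {ratio:.0%}: about {size_gb:.0f}GB")
```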

A full index would be 1.5 - 2.5TB across both type and state_key. A partial index does reduce the flexibility of queries we can make, but I don't think we should add indexes with the hope of using them in the future, especially if it comes at the cost of a lot of disk space.

We'd need to add a background update to compute the index. And then, presumably, set these fields to 0 until the partial index has finished being added.

Does that sound like a reasonable path forward?

Contributor

Feels like a waste for numbers that probably won't be looked at. Are these stats interesting at all beyond our daily counts?

We do have `daily_user_type_xxx` vs `total_users` and `daily_active_rooms` vs `total_room_count`. So this is pretty much the equivalent for `daily_sent_messages` and `daily_sent_e2ee_messages` 👍


Overall, seems like an ok plan and a necessary evil if we want this feature.

Member Author

@erikjohnston mentioned internally that such a large index would cause significant strain on performance as you'd need to pull out the 750GB index from disk each time in order to perform the count.

@reivilibre suggested that instead we keep track of the `stream_ordering` when we scan. So we'd only need to do a full scan once, then subsequently we'd only need a very fast query to scan the rows that have been added since the last scan. This eliminates the need for an index, making every query other than the first very fast.

However, it doesn't account for the data becoming out of sync over time as rooms and events are deleted by users. To account for this, you could do a full rescan every so often to reset the error drift.
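
A rough sketch of that watermark approach, again against a simplified SQLite `events` table. The class `IncrementalEventCounter` and its methods are hypothetical, and the sketch assumes new rows always get a larger `stream_ordering` than existing ones:

```python
import sqlite3
from typing import Optional


class IncrementalEventCounter:
    """Running count of non-state events of one type (hypothetical helper)."""

    def __init__(self, conn: sqlite3.Connection, event_type: str) -> None:
        self.conn = conn
        self.event_type = event_type
        self.count = 0
        # Highest stream_ordering already included in `count`; None before the first scan.
        self.last_stream_ordering: Optional[int] = None

    def refresh(self) -> int:
        """Add rows inserted since the last scan to the running count."""
        (upper,) = self.conn.execute(
            "SELECT MAX(stream_ordering) FROM events"
        ).fetchone()
        if upper is None:
            return self.count  # empty table, nothing to scan

        if self.last_stream_ordering is None:
            # First call: one full scan up to the current maximum.
            range_sql, params = "stream_ordering <= ?", (upper,)
        else:
            # Later calls: only the rows past the watermark.
            range_sql = "stream_ordering > ? AND stream_ordering <= ?"
            params = (self.last_stream_ordering, upper)

        (new_rows,) = self.conn.execute(
            f"SELECT COUNT(*) FROM events WHERE {range_sql} "
            "AND type = ? AND state_key IS NULL",
            params + (self.event_type,),
        ).fetchone()
        self.count += new_rows
        self.last_stream_ordering = upper
        return self.count

    def full_rescan(self) -> int:
        """Reset the drift caused by deletions by counting everything again."""
        self.count = 0
        self.last_stream_ordering = None
        return self.refresh()
```

Deleted rows are never subtracted by `refresh()`, which is exactly the drift described above; `full_rescan()` is the occasional correction.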

Contributor

I am never sure how I feel about suggesting this, but you could also in theory use triggers. Perhaps using triggers for decrementing the count when an event is deleted would not be the worst thing ever.

Member Author

That would add slightly more processing time to each event insert and removal, but likely a negligible amount? And it would completely eliminate the massive query needed to count everything. I like this idea.

We'd still need a background update to initially populate the table, but that's reasonable.

I think I'll give that a shot. It's certainly much simpler than a batch job running on a timer. Thanks for suggesting it!

Contributor

Updated to add a background job which adds the triggers and populates the event_stats table from the existing events
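
For illustration only, here is a minimal SQLite sketch of the trigger pattern: add the triggers first, then populate `event_stats` from the rows that already exist. The simplified `events` schema, the trigger names, and the `message_count` column are assumptions made for this sketch and are not taken from the PR's actual schema delta:

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS events (
    stream_ordering INTEGER PRIMARY KEY,
    type TEXT NOT NULL,
    state_key TEXT
);
CREATE TABLE IF NOT EXISTS event_stats (
    total_event_count INTEGER NOT NULL DEFAULT 0,
    message_count INTEGER NOT NULL DEFAULT 0,
    e2ee_event_count INTEGER NOT NULL DEFAULT 0
);
-- Keep exactly one row of counters.
INSERT INTO event_stats (total_event_count, message_count, e2ee_event_count)
SELECT 0, 0, 0 WHERE NOT EXISTS (SELECT 1 FROM event_stats);
"""

# `IF NOT EXISTS` lets this script be run more than once.
TRIGGERS = """
CREATE TRIGGER IF NOT EXISTS event_stats_increment AFTER INSERT ON events
BEGIN
    UPDATE event_stats SET
        total_event_count = total_event_count + 1,
        message_count = message_count
            + CASE WHEN NEW.type = 'm.room.message' AND NEW.state_key IS NULL THEN 1 ELSE 0 END,
        e2ee_event_count = e2ee_event_count
            + CASE WHEN NEW.type = 'm.room.encrypted' AND NEW.state_key IS NULL THEN 1 ELSE 0 END;
END;
CREATE TRIGGER IF NOT EXISTS event_stats_decrement AFTER DELETE ON events
BEGIN
    UPDATE event_stats SET
        total_event_count = total_event_count - 1,
        message_count = message_count
            - CASE WHEN OLD.type = 'm.room.message' AND OLD.state_key IS NULL THEN 1 ELSE 0 END,
        e2ee_event_count = e2ee_event_count
            - CASE WHEN OLD.type = 'm.room.encrypted' AND OLD.state_key IS NULL THEN 1 ELSE 0 END;
END;
"""


def populate_event_stats(conn):
    """One-off population from existing rows, done in a single step for this sketch."""
    conn.execute(
        """
        UPDATE event_stats SET
            total_event_count = (SELECT COUNT(*) FROM events),
            message_count = (SELECT COUNT(*) FROM events
                             WHERE type = 'm.room.message' AND state_key IS NULL),
            e2ee_event_count = (SELECT COUNT(*) FROM events
                                WHERE type = 'm.room.encrypted' AND state_key IS NULL)
        """
    )
    conn.commit()


def read_event_stats(conn):
    """Return (total_event_count, message_count, e2ee_event_count)."""
    return conn.execute(
        "SELECT total_event_count, message_count, e2ee_event_count FROM event_stats"
    ).fetchone()
```

Reading the stats then becomes a single-row lookup instead of a table scan, at the cost of one extra `UPDATE` per event insert or delete.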

            """
            txn.execute(sql)
            (count,) = cast(Tuple[int], txn.fetchone())
            return count

        return await self.db_pool.runInteraction(
            "count_total_messages", _count_total_messages
        )

    async def count_total_e2ee_events(self) -> int:
        """
        Returns the total number of `m.room.encrypted` events present on the
        server.
        """

        def _count_total_e2ee_events(txn: LoggingTransaction) -> int:
            sql = """
                SELECT COUNT(*) FROM events
                WHERE type = 'm.room.encrypted'
                AND state_key IS NULL
            """
            txn.execute(sql)
            (count,) = cast(Tuple[int], txn.fetchone())
            return count

        return await self.db_pool.runInteraction(
            "count_total_e2ee_events", _count_total_e2ee_events
        )

    async def count_daily_sent_e2ee_messages(self) -> int:
        def _count_messages(txn: LoggingTransaction) -> int:
            # This is good enough as if you have silly characters in your own