-
Notifications
You must be signed in to change notification settings - Fork 522
Add total event, unencrypted message, and e2ee event counts to stats reporting #18260
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Merged
Changes from 1 commit
Commits
Show all changes
36 commits
Select commit
Hold shift + click to select a range
5f7d50d
Add total message and e2ee event counts to stats reporting
anoadragon453 b4e42a4
Test phone home stats
anoadragon453 5213c87
newsfile
anoadragon453 c8de2c5
wip
anoadragon453 cc9e31a
Merge branch 'develop' into anoa/export_total_message_count
MadLittleMods f0d6b26
Add `total_event_count`
MadLittleMods 032ae5e
Add descriptions
MadLittleMods 575bbe9
Fix some table/column mismatches
MadLittleMods 9b8d9a9
Fill in trigger logic
MadLittleMods e320f2d
Can only run one statement at a time
MadLittleMods 7c9fcb3
Adjust names
MadLittleMods 8a04a08
Fix some background update lints
MadLittleMods 90e57ff
Fix `builtins.IndexError: tuple index out of range`
MadLittleMods 942d066
Iterate on background update
MadLittleMods ba40e0c
Better names
MadLittleMods 5a3da6a
Refactor pattern to add triggers first in the background update
MadLittleMods 6a1a03c
`backfill` -> `populate`
MadLittleMods 4c65945
Make sure the trigger SQL can be run multiple times
MadLittleMods 6cb50b0
Docs
MadLittleMods a5fedfb
Fix Postgres trigger syntax
MadLittleMods 20f8dbe
Fix stats
MadLittleMods ab38bdb
Fix lints
MadLittleMods 1e65719
Test `total_event_count`
MadLittleMods 18d122e
We probably just need to wait for the background updates (not re-run it)
MadLittleMods f24c47d
Better description of the magic number
MadLittleMods 7c26de5
Move things out of `prepare`
MadLittleMods 845c29a
Fix lints
MadLittleMods 8ad9664
Update changelog
MadLittleMods f713ea0
Test non-working background update with `events`
MadLittleMods eeb6dba
Working `_populate_txn`
MadLittleMods 55cff0e
Add tests for the background updates
MadLittleMods 1e68f6a
Merge branch 'develop' into anoa/export_total_message_count
MadLittleMods 0985e14
Better doc comment
MadLittleMods 0a06be3
Fix lints
MadLittleMods a67e185
Merge branch 'develop' into anoa/export_total_message_count
MadLittleMods 6381d99
Use correct delta ordering number
MadLittleMods File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Drive-by: There isn't an index for
typeorstate_keyon theeventstable. This probably won't be play nice with the database.Perhaps we're okay with the full table scan?
We do have similar queries for the daily counts but they are restricted to only scan over the current days worth of messages using
stream_orderingThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very good point. I just tested on matrix.org, and it takes about ~5m to run this query. This isn't the worst thing for a metrics job that runs every 3hrs. However, it is concerning that a connection to the DB would be taken up for so long.
We could eliminate that concern by chunking the count query through batching on
stream_ordering(which does have an index). But you'd still take significantly longer to generate the metrics than we do today.We could also add a partial index on
WHERE state_key IS NULL AND (type = 'm.room.message OR type = 'm.room.encrypted'). This will take up significantly less disk space than a full index on bothtypeandstate_key, while still making the query extremely quick.The
eventstable on matrix.org right now is ~4300GB.m.room.{encrypted,message}make up ~86.5% of the table, with the extreme majority of those rows havingstate_key = NULL; so the index would be roughly ~750GB:A full index would be 1.5 - 2.5TB across both
typeandstate_key. A partial index does reduce the flexibility of queries we can make, but I don't think we should add indexes with the hope of using them in the future, especially if it comes at the cost of a lot of disk space.We'd need to add a background update to compute the index. And then, presumably, set these fields to
0until the partial index has finished being added.Does that sound like a reasonable path forward?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Feels like a waste for numbers that probably won't be looked at. Are these stats interesting at all beyond our daily counts?
We do have
daily_user_type_xxxvstotal_usersanddaily_active_roomsvstotal_room_count. So this is pretty much the equivalent fordaily_sent_messagesanddaily_sent_e2ee_messages👍Overall, seems like an ok plan and necessary evil if we want this feature
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@erikjohnston mentioned internally that such a large index would cause significant strain on performance as you'd need to pull out the 750GB index from disk each time in order to perform the count.
@reivilibre suggested that instead we keep track of the
stream_orderingwhen we scan. So we'd only need to do a full scan once, then subsequently we'd only need a very fast query to scan the rows that have been added since the last scan. This eliminates the need for an index, which making every query other than the first very fast.However, it doesn't account for the data becoming out of sync over time as rooms and events are deleted by users. To account for this, you could do a full rescan every so often to reset the error drift.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am never sure how I feel about suggesting this, but you could also in theory use triggers. Perhaps using triggers for decrementing the count when an event is deleted, would not be the worst thing ever.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That would add a slightly more processing time to each event insert and removal, but likely a negligible amount? And would completely eliminate the massive query needed to count everything. I like this idea.
We'd still need a background update to initially populate the table, but that's reasonable.
I think I'll give that a shot. It's certainly much simpler than a batch job running on a timer. Thanks for suggesting it!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated to add a background job which adds the triggers and populates the
event_statstable from the existingevents⏩