Prune old rows in device_lists_changes_in_room table. #19473

Merged
erikjohnston merged 32 commits into develop from erikj/remove_old_devices_in_rooms
Apr 17, 2026

Conversation

Member

@erikjohnston erikjohnston commented Feb 17, 2026

Fixes #13043

Most usages of the table already correctly handle the case where old entries are missing, as that was needed when we first added the table.

I arbitrarily set the prune time to 30 days. The only use for old entries is for sync streams that haven't synced since then, and we should very rarely see sync streams that haven't been used in 30 days.

Reviewable commit-by-commit.

@erikjohnston erikjohnston marked this pull request as ready for review February 17, 2026 11:33
@erikjohnston erikjohnston requested a review from a team as a code owner February 17, 2026 11:33
We called `txn.rowcount` *after* we used the txn for something else, so
we were no longer counting rows deleted by the prune but rows inserted
into the cache stream. This caused a tight loop.
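
A minimal sketch of the pitfall described in the commit message above, using Python's standard `sqlite3` module rather than Synapse's own transaction wrapper. The table names `changes` and `cache_stream` and all function names are hypothetical; only the ordering problem with `rowcount` is taken from the commit message.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Build a throwaway in-memory database (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE changes (stream_id INTEGER)")
    conn.execute("CREATE TABLE cache_stream (note TEXT)")
    conn.executemany(
        "INSERT INTO changes VALUES (?)", [(i,) for i in range(1, 6)]
    )
    return conn


def prune_batch_buggy(cur: sqlite3.Cursor, before: int) -> int:
    cur.execute("DELETE FROM changes WHERE stream_id < ?", (before,))
    # The cursor is reused for another statement before rowcount is read...
    cur.execute("INSERT INTO cache_stream VALUES ('invalidate')")
    # ...so this reports the one inserted row, not the deleted rows.
    return cur.rowcount


def prune_batch_fixed(cur: sqlite3.Cursor, before: int) -> int:
    cur.execute("DELETE FROM changes WHERE stream_id < ?", (before,))
    # Read rowcount immediately after the DELETE it belongs to.
    num_deleted = cur.rowcount
    cur.execute("INSERT INTO cache_stream VALUES ('invalidate')")
    return num_deleted
```

With the buggy version, a loop of the form "keep pruning while the last batch deleted something" would never see zero, because the trailing insert always reports one row. That is one plausible way to arrive at the tight loop the commit message mentions.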
@reivilibre reivilibre self-requested a review February 26, 2026 17:03
Contributor

@reivilibre reivilibre left a comment

Structurally sane :)

Comment thread synapse/storage/databases/main/devices.py Outdated
Comment thread synapse/storage/databases/main/devices.py
Comment thread synapse/storage/databases/main/devices.py Outdated
Comment thread synapse/storage/databases/main/devices.py Outdated
return num_deleted

num_rows_deleted = 0
while True:
Contributor

Are we happy that there isn't much in the way of rate control on these deletions?

Member Author

@erikjohnston erikjohnston Apr 1, 2026

Comment thread synapse/storage/databases/main/devices.py
Comment thread synapse/storage/schema/main/delta/93/05_device_lists_room_timestamp.sql Outdated
Comment thread synapse/storage/databases/main/devices.py Outdated
Comment thread synapse/storage/databases/main/devices.py
@erikjohnston erikjohnston requested a review from reivilibre April 1, 2026 15:54
Comment thread synapse/storage/databases/main/devices.py Outdated
Contributor

@reivilibre reivilibre left a comment

Seems like it's basically there, save for a bit of potential raciness and maybe a test?

Comment thread synapse/storage/databases/main/devices.py Outdated
Comment thread synapse/storage/databases/main/devices.py Outdated
if changes is not None:
    local_changes = {(u, d) for u, d in changes if self.hs.is_mine_id(u)}
else:
    # The `device_lists_stream_id` is too old, so we need to fall back
Contributor

How much of a pain would it be to stand up a test against this case? I worry this is probably untested code. Being an edge case, it will therefore probably go unnoticed if it breaks until a very awkward moment. Would be nice to have something.

Member Author

Right, managed to wrangle a test: bb6c559

Comment thread synapse/storage/databases/main/devices.py Outdated
erikjohnston and others added 4 commits April 10, 2026 11:37
Co-authored-by: Olivier 'reivilibre' <oliverw@element.io>
Co-authored-by: Olivier 'reivilibre' <oliverw@element.io>
Instead of checking it in a separate transaction, check in the
transaction we're reading the table from.
@erikjohnston erikjohnston force-pushed the erikj/remove_old_devices_in_rooms branch from bb6c559 to 756675c Compare April 13, 2026 09:49
Contributor

@reivilibre reivilibre left a comment

Thanks!

Comment thread synapse/storage/databases/main/devices.py Outdated
"synapse.storage.databases.main.devices.PRUNE_DEVICE_LISTS_CHANGES_IN_ROOM_AGE",
Duration(minutes=1),
)
def test_local_device_changes_sent_to_new_servers_on_un_partial_state(
Contributor

I imagine it was reasonably fiddly to get this test all pieced together; thanks for writing it!

@erikjohnston erikjohnston force-pushed the erikj/remove_old_devices_in_rooms branch from 68b2415 to 5e5c3f2 Compare April 14, 2026 14:22
Rather than trying to infer it from the minimum ID in the table.

We were seeing issues in CI because not all device list updates have
entries in the `device_lists_changes_in_room` table, e.g. when the user
is not in any rooms. This meant that the returned minimum ID was larger
than expected, causing failures when calling
`get_all_device_list_changes(..)`.

By explicitly tracking the max pruned ID, we don't have to worry about
problems with trying to infer the actual max pruned ID.
@erikjohnston erikjohnston force-pushed the erikj/remove_old_devices_in_rooms branch from 5e5c3f2 to fee4c3f Compare April 14, 2026 14:32
@erikjohnston erikjohnston requested a review from reivilibre April 14, 2026 15:01
@erikjohnston
Member Author

@reivilibre sorry, the worker integration tests raised an issue around the fact that taking the minimum stream ID of device_lists_changes_in_room is not a reliable way of determining the minimum safe stream ID. fee4c3f changes things so that we explicitly track, in a dedicated table, the maximum stream ID that we have pruned.

Contributor

@reivilibre reivilibre left a comment

Probably sane but would be good to have a really good idea of what the new table is for; it does feel like the kind of thing someone will dig up years down the line and have to scratch their head over it.

It doesn't sound insane so probably just a matter of getting the illustration right, e.g. with a step by step sequence to show what goes wrong with the 'simple' MIN(stream_id)

Comment thread synapse/storage/databases/main/devices.py
-- the table cannot provide a complete answer.
--
-- This replaces the previous approach of using MIN(stream_id) on the
-- device_lists_changes_in_room table, which incorrectly returned 0 when
Contributor

@reivilibre reivilibre Apr 14, 2026

Ironically this new table is more confusing to me, as in I don't see why this is preventing anything.

By my reading, COALESCE(MIN(stream_id), 0) is the reason the previous approach returned 0 when empty. (So I think the comment here is maybe a bit misleading, or doesn't quite reveal the reason we need this?)

Trying to understand the difference, here is a (step by step) comparison table as I see it:

|  | Old _get_min_device_lists_changes_in_room | New _get_max_pruned_device_lists_changes_in_room_txn |
| --- | --- | --- |
| Seed (empty table) | returns 0 | returns 0 |
| New rows added with stream_ids 2..=15 (empty table) | returns 2 | still returns 0 |
| Prune with prune_before_stream_id 10 | now returns MIN(stream_id) = 10 | now returns 10 (explicitly stored) |

|  | Old _get_min_device_lists_changes_in_room | New _get_max_pruned_device_lists_changes_in_room_txn |
| --- | --- | --- |
| Seed (populated table with stream_ids 2..=9) | returns MIN(stream_id) = 2 | returns MIN(stream_id) - 1 = 1 |
| New rows added with stream_ids 10..=15 (populated table) | still returns 2 | still returns 1 |
| Prune with prune_before_stream_id 10 | now returns MIN(stream_id) = 10 | now returns 10 (explicitly stored) |

Apart from the initial state before the first prune, it looks like both approaches behave the same between prunes?
What am I missing? Maybe the device_lists_changes_in_room rows get deleted as rooms get purged?

Member Author

> By my reading, COALESCE(MIN(stream_id), 0) is the reason the previous approach returned 0 when empty. (So I think the comment here is maybe a bit misleading, or doesn't quite reveal the reason we need this?)

Argh, sorry, that comment should have been deleted. I was experimenting with getting an LLM to generate the patch, and it misunderstood the rationale (but got the change correct). I fixed the comment locally, but the fix got swallowed.

The actual problem (as per the commit comment) arises when we generate stream positions that don't have associated data in the device_lists_changes_in_room table, e.g. because the user isn't in any rooms. In that case, if we insert a row into the device_lists_stream table but not into device_lists_changes_in_room, and later insert a row into both, then MIN(stream_id) will return a stream ID greater than that of the first row, even though the first row's data hasn't been pruned (there was just no associated data to fetch).
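
The sequence described above can be sketched step by step with Python's standard `sqlite3` module. This is an illustration under stated assumptions, not Synapse's code: the three table names and the `stream_id` column come from this thread, but the other columns, the helper names, the "`MIN(stream_id) - 1`" inference (modelled on the seeding shown in the comparison table) and the `from_stream_id < max_pruned` check (modelled on the SQL comment in the diff) are all hypothetical.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE device_lists_stream (stream_id INTEGER, user_id TEXT);
        CREATE TABLE device_lists_changes_in_room (stream_id INTEGER, room_id TEXT);
        CREATE TABLE device_lists_changes_in_room_max_pruned_stream_id (stream_id INTEGER);
        -- Nothing has been pruned yet.
        INSERT INTO device_lists_changes_in_room_max_pruned_stream_id VALUES (0);

        -- Stream IDs 1 and 2: the user is in no rooms, so no per-room rows exist.
        INSERT INTO device_lists_stream VALUES (1, '@alice:example.org');
        INSERT INTO device_lists_stream VALUES (2, '@alice:example.org');
        -- Stream ID 3: the user shares a room, so both tables get a row.
        INSERT INTO device_lists_stream VALUES (3, '@bob:example.org');
        INSERT INTO device_lists_changes_in_room VALUES (3, '!room:example.org');
        """
    )
    return conn


def inferred_max_pruned(conn: sqlite3.Connection) -> int:
    """Guess the prune point from the smallest surviving per-room row."""
    row = conn.execute(
        "SELECT COALESCE(MIN(stream_id) - 1, 0) FROM device_lists_changes_in_room"
    ).fetchone()
    return row[0]


def tracked_max_pruned(conn: sqlite3.Connection) -> int:
    """Read the explicitly recorded prune point."""
    row = conn.execute(
        "SELECT stream_id FROM device_lists_changes_in_room_max_pruned_stream_id"
    ).fetchone()
    return row[0]


def is_too_old(from_stream_id: int, max_pruned: int) -> bool:
    """True if the per-room table can no longer give a complete answer."""
    return from_stream_id < max_pruned
```

In this scenario nothing has been pruned, yet the inferred value is 2, so a reader resuming from stream ID 1 would be treated as too old. The explicitly tracked value stays at 0, so the same reader is served normally.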

@erikjohnston erikjohnston requested a review from reivilibre April 15, 2026 10:46
Contributor

@reivilibre reivilibre left a comment

LGTM otherwise, thanks!

Comment thread synapse/storage/databases/main/devices.py
-- it's safe to read from that table for a given stream_id — if the
-- requested stream_id is < the value here, the data has been pruned and
-- the table cannot provide a complete answer.
CREATE TABLE IF NOT EXISTS device_lists_changes_in_room_max_pruned_stream_id (
Contributor

Would be good to still have a little distilled comment about why we need this table, which is that device_lists_stream somehow ties into this (I don't have a great picture of what this is, otherwise I'd suggest some wording).

Member Author

Does e49192d help?

@erikjohnston erikjohnston merged commit 2a82859 into develop Apr 17, 2026
78 of 81 checks passed
@erikjohnston erikjohnston deleted the erikj/remove_old_devices_in_rooms branch April 17, 2026 10:54
erikjohnston added a commit that referenced this pull request Apr 20, 2026
Follows on from #19473.

We should be recording where we have deleted up to in the same
transaction as we perform the delete, rather than at the end.

Also let's log more regularly, as the initial set of deletions will
likely take a long time
erikjohnston added a commit that referenced this pull request Apr 21, 2026
Follows on from #19473.

We should be recording where we have deleted up to in the same
transaction as we perform the delete, rather than at the end. This code
only starts deleting rows after a month (and the original PR isn't in a
release yet), so no server should have run into this problem yet.

Also let's log more regularly, as the initial set of deletions will
likely take a long time.
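
As a rough illustration of the follow-up described above, the sketch below deletes in batches and records the prune point inside the same transaction as each delete, so a failure partway through cannot leave the two out of step. It uses Python's standard `sqlite3` module; the `room_id` column, the `prune` helper, the batch size and the choice to store the highest deleted stream ID are assumptions made for the example, not details of the actual commit.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE device_lists_changes_in_room (stream_id INTEGER, room_id TEXT);
        CREATE TABLE device_lists_changes_in_room_max_pruned_stream_id (stream_id INTEGER);
        INSERT INTO device_lists_changes_in_room_max_pruned_stream_id VALUES (0);
        """
    )
    conn.executemany(
        "INSERT INTO device_lists_changes_in_room VALUES (?, '!room:example.org')",
        [(i,) for i in range(1, 7)],
    )
    conn.commit()
    return conn


def prune(conn: sqlite3.Connection, prune_before: int, batch_size: int = 2) -> int:
    """Delete rows with stream_id < prune_before; return how many were deleted."""
    total = 0
    while True:
        # Find the upper bound of the next batch of old rows.
        row = conn.execute(
            "SELECT MAX(stream_id) FROM ("
            " SELECT stream_id FROM device_lists_changes_in_room"
            " WHERE stream_id < ? ORDER BY stream_id LIMIT ?)",
            (prune_before, batch_size),
        ).fetchone()
        if row[0] is None:
            return total  # Nothing left to prune.
        upto = row[0]

        # `with conn` commits on success and rolls back on error, so the
        # delete and the bookkeeping update succeed or fail together.
        with conn:
            cur = conn.execute(
                "DELETE FROM device_lists_changes_in_room WHERE stream_id <= ?",
                (upto,),
            )
            deleted = cur.rowcount  # Read before the cursor is reused.
            conn.execute(
                "UPDATE device_lists_changes_in_room_max_pruned_stream_id"
                " SET stream_id = MAX(stream_id, ?)",
                (upto,),
            )
        total += deleted
```

Calling `prune(make_db(), 5)` removes stream IDs 1 to 4 in two batches and leaves the recorded prune point at 4.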

Development

Successfully merging this pull request may close these issues.

Clear out device_lists_changes_in_room table periodically.

3 participants