Prune old rows in device_lists_changes_in_room table. #19473

erikjohnston merged 32 commits into develop
We called `txn.rowcount` *after* we used the txn for something else, so we were no longer counting rows deleted by the prune but rows inserted into the cache stream. This caused a tight loop.
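The pitfall can be reproduced in miniature with `sqlite3` (a sketch with simplified, hypothetical table names, not the actual Synapse code): `Cursor.rowcount` always reflects the most recent statement, so it must be read before the cursor is reused.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE device_lists_changes_in_room (stream_id INTEGER)")
conn.execute("CREATE TABLE cache_stream (name TEXT)")
conn.executemany(
    "INSERT INTO device_lists_changes_in_room VALUES (?)",
    [(i,) for i in range(5)],
)

txn = conn.cursor()
txn.execute("DELETE FROM device_lists_changes_in_room WHERE stream_id < 3")
num_deleted = txn.rowcount  # correct: 3 rows were deleted by the prune

# Reusing the same cursor overwrites rowcount with the INSERT's count.
txn.execute("INSERT INTO cache_stream VALUES ('invalidation')")
stale_count = txn.rowcount  # now 1: the INSERT, not the DELETE

print(num_deleted, stale_count)  # 3 1
```

Reading `rowcount` after the cache-stream write makes the prune loop believe it is still deleting rows, hence the tight loop.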
    return num_deleted

    num_rows_deleted = 0
    while True:
Are we happy that there isn't much in the way of rate control on these deletions?
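For illustration, a batched deletion loop with crude rate control might look like the following sketch (simplified schema, not the actual Synapse code; `batch_size` and `pause_s` are made-up knobs):

```python
import sqlite3
import time

def prune_in_batches(conn, prune_before, batch_size=100, pause_s=0.0):
    """Delete rows older than `prune_before` in small batches, sleeping
    between batches so each transaction stays short. (Hypothetical sketch.)"""
    total = 0
    while True:
        with conn:  # one transaction per batch
            cur = conn.execute(
                "DELETE FROM device_lists_changes_in_room "
                "WHERE rowid IN ("
                "  SELECT rowid FROM device_lists_changes_in_room"
                "  WHERE stream_id < ? LIMIT ?)",
                (prune_before, batch_size),
            )
        total += cur.rowcount
        if cur.rowcount < batch_size:
            return total  # a short batch means nothing older is left
        time.sleep(pause_s)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE device_lists_changes_in_room (stream_id INTEGER)")
conn.executemany(
    "INSERT INTO device_lists_changes_in_room VALUES (?)",
    [(i,) for i in range(250)],
)
deleted_total = prune_in_batches(conn, prune_before=200, batch_size=100)
print(deleted_total)  # 200
```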
Co-authored-by: Olivier 'reivilibre' <oliverw@element.io>
Co-authored-by: Olivier 'reivilibre' <olivier@librepush.net>
reivilibre
left a comment
Seems like it's basically there, save for a bit of potential raciness and maybe a test?
    if changes is not None:
        local_changes = {(u, d) for u, d in changes if self.hs.is_mine_id(u)}
    else:
        # The `device_lists_stream_id` is too old, so we need to fall back
how much of a pain would it be to stand up a test against this case? I worry this is probably untested code. Being an edge case, it will therefore probably go unnoticed if it breaks until a very awkward moment. Would be nice to have something
Co-authored-by: Olivier 'reivilibre' <oliverw@element.io>
Instead of checking it in a separate transaction, check in the transaction we're reading the table from.
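A sketch of what "check in the same transaction" could look like, with a simplified schema and a hypothetical helper name (not the actual Synapse function): reading the pruned-up-to marker inside the same transaction as the table read means a concurrent prune cannot slip in between the check and the query.

```python
import sqlite3

def get_changes_if_complete(txn, since_stream_id):
    """Hypothetical sketch: return changes since `since_stream_id`, or None
    if the table has been pruned past that point and cannot give a
    complete answer. The marker is read in the *same* transaction."""
    txn.execute(
        "SELECT stream_id FROM device_lists_changes_in_room_max_pruned_stream_id"
    )
    row = txn.fetchone()
    max_pruned = row[0] if row else 0
    if since_stream_id < max_pruned:
        return None  # data has been pruned; caller must fall back
    txn.execute(
        "SELECT user_id, device_id FROM device_lists_changes_in_room "
        "WHERE stream_id > ? ORDER BY stream_id",
        (since_stream_id,),
    )
    return txn.fetchall()

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE device_lists_changes_in_room
        (user_id TEXT, device_id TEXT, stream_id INTEGER);
    CREATE TABLE device_lists_changes_in_room_max_pruned_stream_id
        (stream_id INTEGER);
    INSERT INTO device_lists_changes_in_room VALUES
        ('@a:hs', 'DEV1', 11), ('@b:hs', 'DEV2', 12);
    INSERT INTO device_lists_changes_in_room_max_pruned_stream_id VALUES (10);
""")
txn = conn.cursor()
res_pruned = get_changes_if_complete(txn, 5)   # pruned past 5
res_ok = get_changes_if_complete(txn, 10)      # complete answer available
print(res_pruned, res_ok)
```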
bb6c559 to 756675c
    "synapse.storage.databases.main.devices.PRUNE_DEVICE_LISTS_CHANGES_IN_ROOM_AGE",
    Duration(minutes=1),
    )
    def test_local_device_changes_sent_to_new_servers_on_un_partial_state(
I imagine it was reasonably fiddly to get this test all pieced together; thanks for writing it!
68b2415 to 5e5c3f2
Rather than trying to infer it from the minimum ID in the table. We were seeing issues in CI due to not all device list updates having entries in the `device_lists_changes_in_room` table, because the user wasn't in any rooms. This meant that the returned minimum ID was larger than expected, causing failures when calling `get_all_device_list_changes(..)`. By explicitly tracking the max pruned ID, we don't have to worry about problems with trying to infer it.
5e5c3f2 to fee4c3f
@reivilibre sorry, the worker integration tests raised an issue around the fact that taking the minimum stream ID of …
reivilibre
left a comment
Probably sane, but it would be good to have a really good idea of what the new table is for; it feels like the kind of thing someone will dig up years down the line and have to scratch their head over.
It doesn't sound insane, so it's probably just a matter of getting the illustration right, e.g. with a step-by-step sequence showing what goes wrong with the 'simple' MIN(stream_id) approach.
    -- the table cannot provide a complete answer.
    --
    -- This replaces the previous approach of using MIN(stream_id) on the
    -- device_lists_changes_in_room table, which incorrectly returned 0 when
Ironically this new table is more confusing to me, as in I don't see why this is preventing anything.
By my reading, COALESCE(MIN(stream_id), 0) is the reason the previous approach returned 0 when empty. (So I think the comment here is maybe a bit misleading, or doesn't quite reveal the reason we need this?)
Trying to understand the difference, here is a step-by-step comparison table as I see it.

Starting from an empty table:

| Step | Old `_get_min_device_lists_changes_in_room` | New `_get_max_pruned_device_lists_changes_in_room_txn` |
|---|---|---|
| Seed (empty table) | returns 0 | returns 0 |
| New rows added with stream_ids 2..=15 | returns 2 | still returns 0 |
| Prune with prune_before_stream_id 10 | now returns MIN(stream_id) = 10 | now returns 10 (explicitly stored) |

Starting from a populated table:

| Step | Old `_get_min_device_lists_changes_in_room` | New `_get_max_pruned_device_lists_changes_in_room_txn` |
|---|---|---|
| Seed (populated table with stream_ids 2..=9) | returns MIN(stream_id) = 2 | returns MIN(stream_id) - 1 = 1 |
| New rows added with stream_ids 10..=15 | still returns 2 | still returns 1 |
| Prune with prune_before_stream_id 10 | now returns MIN(stream_id) = 10 | now returns 10 (explicitly stored) |
Apart from the initial state before the first prune, it looks like both approaches behave the same between prunes?
What am I missing? Maybe the device_lists_changes_in_room rows get deleted as rooms get purged?
> By my reading, COALESCE(MIN(stream_id), 0) is the reason the previous approach returned 0 when empty. (So I think the comment here is maybe a bit misleading, or doesn't quite reveal the reason we need this?)
Argh, sorry that comment should have been deleted. I was messing around with testing getting LLM to generate the patch and it misunderstood the rationale (but got the change correct). I changed it locally but it got swallowed.
The actual problem (as per the commit message) comes from when we generate stream positions that don't have associated data in the `device_lists_changes_in_room` table, e.g. because the user isn't in any rooms. In that case, if we insert a row into the `device_lists_stream` table but not into `device_lists_changes_in_room`, and then a later update inserts rows into both, then MIN(stream_id) will return a stream ID greater than the first row's, even though its data hasn't been pruned (there was just no associated data to fetch).
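That failure mode can be walked through with a toy schema (table names simplified from the real ones): a stream position is allocated in `device_lists_stream` with no matching row in `device_lists_changes_in_room`, so MIN(stream_id) on the latter overshoots even though nothing was pruned.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE device_lists_stream (stream_id INTEGER);
    CREATE TABLE device_lists_changes_in_room (stream_id INTEGER);
""")
# Update 1: user is in no rooms, so only device_lists_stream gets a row.
conn.execute("INSERT INTO device_lists_stream VALUES (1)")
# Update 2: user is in a room, so both tables get a row.
conn.execute("INSERT INTO device_lists_stream VALUES (2)")
conn.execute("INSERT INTO device_lists_changes_in_room VALUES (2)")

# Nothing has been pruned, yet the inferred "minimum unpruned" ID is 2,
# which wrongly implies stream position 1 was pruned away.
(inferred,) = conn.execute(
    "SELECT COALESCE(MIN(stream_id), 0) FROM device_lists_changes_in_room"
).fetchone()
print(inferred)  # 2
```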
reivilibre
left a comment
LGTM otherwise, thanks!
    -- it's safe to read from that table for a given stream_id — if the
    -- requested stream_id is < the value here, the data has been pruned and
    -- the table cannot provide a complete answer.
    CREATE TABLE IF NOT EXISTS device_lists_changes_in_room_max_pruned_stream_id (
would be good to still have a little distilled comment about why we need this table, which is that device_lists_stream somehow ties into this (I don't have a great picture of what this is, otherwise I'd suggest some wording)
Follows on from #19473. We should be recording where we have deleted up to in the same transaction as we perform the delete, rather than at the end. This code only starts deleting rows after a month (and the original PR isn't in a release yet), so no server should have run into this problem yet. Also let's log more regularly, as the initial set of deletions will likely take a long time.
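A sketch of recording the pruned-up-to position in the same transaction as each batched delete (simplified schema, hypothetical helper name, and assuming distinct stream_ids): if the process crashes between batches, the marker can never be ahead of what was actually deleted.

```python
import sqlite3

def prune_batch_txn(conn, prune_before, batch_size=100):
    """Hypothetical sketch: delete one batch of old rows and advance the
    pruned-up-to marker in the *same* transaction. Returns rows deleted."""
    with conn:  # one transaction per batch; commit (or rollback) atomically
        rows = conn.execute(
            "SELECT stream_id FROM device_lists_changes_in_room "
            "WHERE stream_id < ? ORDER BY stream_id LIMIT ?",
            (prune_before, batch_size),
        ).fetchall()
        if not rows:
            return 0
        max_deleted = rows[-1][0]
        conn.execute(
            "DELETE FROM device_lists_changes_in_room WHERE stream_id <= ?",
            (max_deleted,),
        )
        # Advance the marker only as far as this batch actually deleted.
        conn.execute(
            "UPDATE device_lists_changes_in_room_max_pruned_stream_id "
            "SET stream_id = ?",
            (max_deleted,),
        )
        return len(rows)

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE device_lists_changes_in_room (stream_id INTEGER);
    CREATE TABLE device_lists_changes_in_room_max_pruned_stream_id (stream_id INTEGER);
    INSERT INTO device_lists_changes_in_room_max_pruned_stream_id VALUES (0);
""")
conn.executemany(
    "INSERT INTO device_lists_changes_in_room VALUES (?)",
    [(i,) for i in range(1, 21)],
)
while prune_batch_txn(conn, prune_before=15, batch_size=5):
    pass
(marker,) = conn.execute(
    "SELECT stream_id FROM device_lists_changes_in_room_max_pruned_stream_id"
).fetchone()
print(marker)  # 14
```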
Fixes #13043
The usages of the table mostly already handle the case where we don't have old entries correctly, as that was needed when we first added the table.
I arbitrarily set the prune time to 30 days. The only use for old entries is for sync streams that haven't synced since then, and we should very rarely see sync streams that haven't been used in 30 days.
Reviewable commit-by-commit.