Description
Enabling `media_upload_limits` can leave junk files behind that take up server disk space. When a user triggers one of these limits, Synapse rejects the upload but does not clean up the media that was already written to disk, so the file lingers as junk. How much junk accumulates depends on whether users keep uploading after hitting the limit; over a long period the server's hard drive can fill up with these files.
Steps to reproduce
To make the problem easy to observe, ensure that no files larger than 512 MB exist in the media repository of the server instance before performing the following steps.
1. Configure Synapse with a `media_upload_limits` rule:

   ```yaml
   media_upload_limits:
     - time_period: 24h
       max_size: 1536M
   # This configuration means the server allows each user to upload at most 1536 MB of media in a 24-hour period.
   ```
2. Continue configuring Synapse so that large file uploads are allowed; here, raise the per-file limit to 512 MB with `max_upload_size: 512M`.
3. As a user, upload three 512 MB files to a room (an unencrypted room).
4. After the uploads complete, query the media uploaded by that user, bypassing Synapse and running SQL directly against the database:

   ```sql
   SELECT
       media_id,
       media_length,
       created_ts,
       user_id
   FROM
       public.local_media_repository
   WHERE
       user_id = '@someuser:example.com'
       AND media_length >= 524288000;
   ```
5. Count the files in the Synapse media repository that are 524288000 bytes (512 MB) or larger and note the number (see the sketch after this list).
6. Upload another 512 MB file (this upload triggers the limit).
7. Repeat steps 4 and 5.
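For step 5, a minimal Python sketch along these lines can be used to count the large files. The media store path is an assumption based on the default Docker layout and may need adjusting to your `media_store_path`:

```python
#!/usr/bin/env python3
"""Count files of 512 MB or more under the local media store."""
from pathlib import Path

# Assumption: default Docker layout; adjust to your media_store_path.
MEDIA_STORE = Path("/data/media_store/local_content")
THRESHOLD = 524_288_000  # 512 MB in bytes

large_files = [
    p for p in MEDIA_STORE.rglob("*")
    if p.is_file() and p.stat().st_size >= THRESHOLD
]

print(f"{len(large_files)} file(s) of at least {THRESHOLD} bytes")
for p in large_files:
    print(f"  {p} ({p.stat().st_size} bytes)")
```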
You will notice that the most recently uploaded file has no record in the database but is fully stored in the media repository: four 512 MB files exist on the filesystem, while only three rows are returned from the Synapse database.
This means the fourth uploaded file is not tracked by Synapse but still consumes disk space; it is effectively an orphan file. If a user keeps uploading media after reaching the limit within a period, more and more such untracked media will accumulate on the disk.
For Synapse instances with relatively relaxed media upload policies, this phenomenon can be easily exploited by malicious users.
Homeserver
Self-hosted
Synapse Version
Synapse 1.138.0
Installation Method
Docker (matrixdotorg/synapse)
Database
Postgres 16, n/a, no, no
Workers
Single process
Platform
Synology DiskStation Manager 6.2.3 (Linux 3.10.105, x64)
Configuration
```yaml
media_upload_limits:
  - time_period: 24h
    max_size: 1536M
max_upload_size: 512M
```
Relevant log output
Anything else that would be useful to know?
Currently known solutions:
a) Strictly restrict user uploads;
b) Periodically delete such media from a crontab job or other external tooling. This requires a "reverse match" approach: convert each file path back into its MXC media ID and look that ID up with a SQL query; if the ID is not found in the database, the corresponding file is considered orphaned and is ultimately deleted (a rough sketch follows below).
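A minimal sketch of that reverse match, assuming the default on-disk layout where a local upload with media ID `aabbcccc...` is stored at `local_content/aa/bb/cccc...`, and using placeholder database credentials; it only reports orphan candidates and deletes nothing:

```python
#!/usr/bin/env python3
"""Report media files that have no row in local_media_repository."""
from pathlib import Path

import psycopg2  # assumption: Postgres driver available

# Assumptions: default Docker media path and placeholder DB credentials.
LOCAL_CONTENT = Path("/data/media_store/local_content")
conn = psycopg2.connect(host="localhost", dbname="synapse",
                        user="synapse_user", password="...")

orphans = []
with conn, conn.cursor() as cur:
    for path in LOCAL_CONTENT.rglob("*"):
        if not path.is_file():
            continue
        # The media ID is the two directory levels plus the file name, concatenated.
        media_id = "".join(path.relative_to(LOCAL_CONTENT).parts)
        cur.execute("SELECT 1 FROM local_media_repository WHERE media_id = %s",
                    (media_id,))
        if cur.fetchone() is None:
            orphans.append((media_id, path))

for media_id, path in orphans:
    print(f"orphan candidate: {media_id} -> {path}")
```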
My (perhaps unprofessional) view of how this could be handled:
Before the upload begins, the client would send only the file size to the server for the calculation; if that value would trigger the limit, the upload is rejected, so the media never ends up occupying the server's hard drive. With the current implementation, as a user I have to transmit the data to the server again just to find out whether I am about to (or have already) triggered the limit.
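For comparison, the global `max_upload_size` is already discoverable before any bytes are transferred via the media config endpoint; as far as I can tell there is no equivalent way for a client to ask how much of a `media_upload_limits` quota remains, which is essentially what the suggestion above would need. A small sketch (homeserver URL and access token are placeholders):

```python
import requests

HOMESERVER = "https://example.com"  # placeholder
ACCESS_TOKEN = "syt_placeholder"    # placeholder

# GET /_matrix/media/v3/config exposes the server's maximum upload size
# ("m.upload.size"), so a client can refuse an oversized upload up front.
resp = requests.get(
    f"{HOMESERVER}/_matrix/media/v3/config",
    headers={"Authorization": f"Bearer {ACCESS_TOKEN}"},
    timeout=10,
)
resp.raise_for_status()
print("server max upload size:", resp.json().get("m.upload.size"), "bytes")
```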
I've written several scripts that work together to clean up these files and have iterated through several versions, but in testing their effectiveness has been mediocre:
The scripts are log-driven and therefore completely dependent on Synapse's logs. If there is a gap in the logs they rely on, or if hardware resources are limited, their workflow gets disrupted and they fail to work as expected. I would much rather Synapse handled this issue itself.