media_upload_limits: If a user triggers this limit, Synapse does not clean up user-uploaded media, which can lead to junk files occupying disk space. #18915

@YamatoRyou

Description

Enabling media_upload_limits can leave junk files on the server's disk. When a user triggers the limit, Synapse does not clean up the media that user uploaded, so the file stays in the media repository. How much junk accumulates depends on whether users keep uploading after hitting the limit. If this continues for a long time, the server's disk may fill up with junk files.

Steps to reproduce

In order to highlight the problem, ensure that no files larger than 512 MB exist in the media repository of the current server instance before performing the following steps.

  1. Configure Synapse to enable media_upload_limits:
     media_upload_limits:
       - time_period: 24h
         max_size: 1536M
     # With this configuration, the server allows each user to upload at most 1536 MB of media in a 24-hour period.
  2. Also configure Synapse to allow large file uploads. Here, the limit is raised to 512 MB:
     max_upload_size: 512M
  3. As a user, upload three 512 MB files to a room (use a non-encrypted room).
  4. After the uploads complete, query the media uploaded by the user (here I bypass Synapse and run SQL directly):
     SELECT
       media_id,
       media_length,
       created_ts,
       user_id
     FROM
       public.local_media_repository
     WHERE
       user_id = '@someuser:example.com'
       AND
       media_length >= 524288000;
  5. Count the files in the Synapse media repository that are 524288000 bytes or larger, and note the number.
  6. Upload another 512 MB file (this upload will trigger the limit).
  7. Repeat steps 4 and 5.

You will notice that the last uploaded media has no record in the database, yet it is fully stored in the media repository. Four 512 MB files appear at the file system level, but the Synapse database returns only three.
This means that the fourth uploaded file is not managed by Synapse but still consumes disk space (in other words, it is an orphaned file). If a user keeps uploading media after reaching the limit within a period, more such media will accumulate on disk, and it all remains unmanaged by Synapse.
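
The file-counting step above can be scripted. The following is a minimal sketch, not part of Synapse: the function name `count_large_files` is made up for illustration, and the media directory you pass in depends on your own deployment. It walks a directory tree and counts files at or above the same byte threshold used in the SQL query, so the result can be compared with the number of rows the query returns.

```python
import os

THRESHOLD = 524288000  # same byte threshold as the SQL query in the steps


def count_large_files(root, threshold=THRESHOLD):
    """Count regular files under root whose size is at least threshold bytes."""
    count = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip anything that is not a regular file (e.g. broken symlinks).
            if os.path.isfile(path) and os.path.getsize(path) >= threshold:
                count += 1
    return count
```

If the count on disk is higher than the number of rows in the database, the difference is the number of orphaned files of that size.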

For Synapse instances with relatively relaxed media upload policies, this phenomenon can be easily exploited by malicious users.

Homeserver

Self-hosted

Synapse Version

Synapse 1.138.0

Installation Method

Docker (matrixdotorg/synapse)

Database

Postgres 16, n/a, no, no

Workers

Single process

Platform

Synology DiskStation Manager 6.2.3 (Linux 3.10.105, x64)

Configuration

media_upload_limits:
  - time_period: 24h
    max_size: 1536M

max_upload_size: 512M

Relevant log output

n/a

Anything else that would be useful to know?

Currently known workarounds:
a) Strictly restrict user uploads;
b) Use crontab or a similar mechanism to delete such media periodically. This requires a "reverse match" approach: a rule converts each file path to an MXC ID, which is then fed into a SQL query. If the ID isn't found in the database, the corresponding file is considered orphaned and is deleted.
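
A minimal sketch of the "reverse match" idea in workaround b), under stated assumptions: it assumes local media files are laid out as `<root>/<aa>/<bb>/<rest>`, where the media ID is `aa + bb + rest`, and that `known_media_ids` has already been loaded from the `media_id` column of `local_media_repository`. The function names are hypothetical. The sketch only reports candidate orphans and does not delete anything.

```python
import os


def media_id_from_path(root, path):
    """Rebuild a media ID from a path shaped like <root>/<aa>/<bb>/<rest>."""
    parts = os.path.relpath(path, root).split(os.sep)
    if len(parts) != 3 or len(parts[0]) != 2 or len(parts[1]) != 2:
        return None  # not in the assumed layout; skip rather than guess
    return "".join(parts)


def find_orphans(root, known_media_ids):
    """Return sorted paths of files whose media ID is not in known_media_ids."""
    orphans = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            media_id = media_id_from_path(root, path)
            if media_id is not None and media_id not in known_media_ids:
                orphans.append(path)
    return sorted(orphans)
```

A real cleanup job would probably also skip very recent files, since an upload that is still in progress may not have a database row yet.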

My (perhaps unprofessional) view of how this could be handled:
Before the upload begins, only the file size is sent to the server for calculation. If this value would trigger the limit, the upload is blocked, which prevents such media from ever taking up space on the server's disk. With the current implementation, a user has to transmit the data to the server again to find out whether they are about to trigger (or have already triggered) the limit.
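
The size pre-check suggested above can be sketched as follows. This is an illustration of the idea only, not how Synapse implements its limits: `would_exceed_limit` is a hypothetical helper, and the sketch assumes the server already knows how many bytes the user has uploaded in the current period and that `M` in the configuration means 1024 * 1024 bytes.

```python
MIB = 1024 * 1024  # assumption: "M" in the config is interpreted as MiB


def would_exceed_limit(used_bytes, declared_size, max_bytes):
    """Return True if accepting declared_size more bytes would pass max_bytes."""
    return used_bytes + declared_size > max_bytes


limit = 1536 * MIB

# Third 512 MB upload: 1024 + 512 = 1536, which is still within the limit.
third_blocked = would_exceed_limit(2 * 512 * MIB, 512 * MIB, limit)

# Fourth 512 MB upload: 1536 + 512 exceeds the limit, so it is blocked
# before any file data has to be transmitted or written to disk.
fourth_blocked = would_exceed_limit(3 * 512 * MIB, 512 * MIB, limit)
```

Because the declared size comes from the client, a server using such a pre-check would still need to enforce the limit on the bytes it actually receives.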


I've written several scripts that work together to clean up these files and have iterated through several versions, but after testing, I've found their effectiveness to be mediocre:
The scripts are log-driven and therefore completely dependent on Synapse logs. If there's a gap in the logs the scripts rely on, or if hardware resources are limited, their workflow is disrupted and they fail to work as expected. I'd rather Synapse resolved this issue itself.
