
Extremely slow image uploads and thumbnail generation on v8.2.1 (GridFS) despite massive hardware scaling and DB indexing #40011

@wlazevedo

Description:

We are experiencing severe delays when users upload media files (especially heavy images). The upload and the subsequent thumbnail generation take an unreasonably long time before the image appears in the chat.

We have ruled out the obvious infrastructure bottlenecks. We scaled up our environment considerably and applied known community workarounds for GridFS and Node.js memory limits, but the application is still sluggish during media uploads. This suggests a bottleneck in UploadFS/GridFS or in the internal image processing (Sharp).

Steps to reproduce:

  1. Go to any channel or direct message.
  2. Upload a heavy image file (e.g., 5MB - 15MB).
  3. Observe the extreme delay during the upload phase and the time it takes for the thumbnail to finally render in the chat timeline.
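To make the reproduction repeatable across clients, a test image of a known size can be generated instead of picked by hand. The sketch below is a hypothetical helper (`make_noise_png` and `write_noise_png` are names introduced here, not part of Rocket.Chat) that uses only the Python standard library (3.9+ for `randbytes`) to build an RGB PNG of seeded random pixels. Because noise barely compresses, the file size stays close to width × height × 3 bytes.

```python
import random
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def _png_chunk(chunk_type: bytes, data: bytes) -> bytes:
    """Frame one PNG chunk: length, type, data, then CRC-32 over type + data."""
    crc = zlib.crc32(chunk_type + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + chunk_type + data + struct.pack(">I", crc)


def make_noise_png(width: int, height: int, seed: int = 0) -> bytes:
    """Return an 8-bit RGB PNG filled with seeded pseudo-random pixels."""
    rng = random.Random(seed)  # seeded so repeated runs produce identical files
    # Each scanline is a filter-type byte (0 = no filter) followed by RGB triples.
    raw = b"".join(b"\x00" + rng.randbytes(width * 3) for _ in range(height))
    # IHDR: width, height, bit depth 8, color type 2 (RGB), compression, filter, interlace.
    ihdr = struct.pack(">IIBBBBB", width, height, 8, 2, 0, 0, 0)
    return (
        PNG_SIGNATURE
        + _png_chunk(b"IHDR", ihdr)
        + _png_chunk(b"IDAT", zlib.compress(raw, 1))  # noise barely compresses anyway
        + _png_chunk(b"IEND", b"")
    )


def write_noise_png(path: str, width: int, height: int, seed: int = 0) -> int:
    """Write the PNG to `path` and return its size in bytes."""
    data = make_noise_png(width, height, seed)
    with open(path, "wb") as fh:
        fh.write(data)
    return len(data)
```

For example, `write_noise_png("upload-test.png", 2000, 1700)` writes a file of roughly 10 MB, in the middle of the 5 MB - 15 MB range above, and the same seed always produces the same bytes.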

Expected behavior:

Image uploads and OEmbed/thumbnail generation should complete quickly, using the available hardware resources without blocking the user.

Actual behavior:

The upload and thumbnail generation take a very long time (often 30+ seconds). During this time, the host OS shows plenty of idle CPU and abundant free RAM, which suggests the bottleneck is inside the Rocket.Chat container (the Node.js process) rather than in the host hardware.

Server Setup Information:

  • Version of Rocket.Chat Server: 8.2.1
  • License Type: Community
  • Number of Users: [50]
  • Operating System: AlmaLinux 10.1
  • Deployment Method: Docker Compose
  • Number of Running Instances: 1
  • DB Replicaset Oplog: Enabled (rs0)
  • NodeJS Version: Bundled with 8.2.1 Docker image
  • MongoDB Version: 8.0

Client Setup Information

  • Desktop App or Browser Version: Affects all clients (Desktop App and latest Chrome/Firefox browsers)
  • Operating System: Windows / Linux / macOS

Additional context

Troubleshooting steps we already took (without success):

  1. Hardware Scale-Up: Increased the VM to 4 vCPUs and 8 GB of RAM. The server is idle, with no swap usage and over 5.5 GB of RAM free.
  2. Node.js Tuning: Added NODE_OPTIONS=--max-old-space-size=4096 to the environment variables to prevent garbage collector thrashing and let V8 use the available RAM.
  3. MongoDB Indexing: We manually created the GridFS chunks index inside MongoDB as suggested in older community threads (db.rocketchat_uploads.chunks.createIndex( { files_id: 1, n: 1 }, { unique: true } )).
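For context on step 3: GridFS stores each file as a series of chunk documents, and readers fetch them by `files_id` and sequence number `n`, which is exactly what that index covers. The arithmetic below is a minimal sketch of how many chunk documents a single upload produces, assuming the 255 KiB default chunk size used by the MongoDB drivers. We have not confirmed which chunk size Rocket.Chat's UploadFS store configures, and `gridfs_chunk_count` is a hypothetical helper.

```python
# Assumption: the MongoDB drivers' default GridFS chunk size (255 KiB);
# Rocket.Chat's upload store may configure a different value.
DEFAULT_CHUNK_SIZE = 255 * 1024  # 261120 bytes


def gridfs_chunk_count(file_size_bytes: int, chunk_size: int = DEFAULT_CHUNK_SIZE) -> int:
    """Return how many documents one file occupies in the <bucket>.chunks collection."""
    if file_size_bytes < 0 or chunk_size <= 0:
        raise ValueError("sizes must be non-negative and chunk_size positive")
    # Ceiling division: the last chunk holds the remainder and may be shorter.
    return -(-file_size_bytes // chunk_size)


for mib in (5, 10, 15):
    print(f"{mib} MiB -> {gridfs_chunk_count(mib * 1024 * 1024)} chunks")
# 5 MiB -> 21 chunks
# 10 MiB -> 41 chunks
# 15 MiB -> 61 chunks
```

Under that assumption, even the largest files in this report amount to a few dozen chunk documents each.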

Even with this tuning and a largely idle host, image processing inside the container remains extremely slow. This suggests an application-level limitation in GridFS chunking or in the Sharp image processing library in this specific version.
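When judging whether the Sharp step is plausibly the slow part, it helps to remember that image-processing cost tracks pixel dimensions more than file size on disk. The hypothetical helper below (not from Rocket.Chat) computes the size of a fully decoded 8-bit raster. The 6000 × 4000 dimensions in the example are an illustrative assumption for a large photo, not a measurement from our uploads. libvips, which Sharp is built on, can often avoid materializing the whole image at once, so treat the result as an upper bound rather than the memory actually used.

```python
def decoded_raster_bytes(width: int, height: int, channels: int = 3, bytes_per_sample: int = 1) -> int:
    """Bytes needed to hold an uncompressed width x height image in memory."""
    return width * height * channels * bytes_per_sample


# Hypothetical 24-megapixel photo, 8 bits per sample.
rgb = decoded_raster_bytes(6000, 4000)               # 72_000_000 bytes
rgba = decoded_raster_bytes(6000, 4000, channels=4)  # 96_000_000 bytes
print(f"RGB: {rgb / 1e6:.0f} MB, RGBA: {rgba / 1e6:.0f} MB")  # RGB: 72 MB, RGBA: 96 MB
```

Compared with the 5 MB - 15 MB files in this report, the decoded pixel data can be several times larger, which is why timing the resize separately from the upload itself would help narrow down the bottleneck.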

Relevant logs:

There are no crash logs or stack traces during the upload, only the delay itself. Server logs show normal operation while the image takes an exceptionally long time to render.
(I can provide debug-level logs if the engineering team requests them.)
