Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior in your CodeRabbit review settings.
📝 Walkthrough

Adds tiered blob storage with optional encryption and configurable compression, object-storage backend settings, async offload tasks, DB model changes for storage location and key id, a management command for verification/key rotation, startup checks, admin exposure, and extensive tests.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client as Application
    participant Blob as Blob Model
    participant TierSvc as TieredStorageService
    participant Encrypt as AES-256-GCM
    participant ObjStor as Object Storage
    participant DB as PostgreSQL
    Client->>Blob: create_blob(raw_bytes)
    Blob->>TierSvc: encrypt(compressed_bytes)
    TierSvc->>Encrypt: encrypt(data, active_key)
    Encrypt-->>TierSvc: encrypted_bytes, key_id
    TierSvc-->>Blob: encrypted_bytes, key_id
    Blob->>DB: save(storage_location=OBJECT_STORAGE, encryption_key_id, raw_content=NULL)
    DB-->>Blob: saved
    Client->>Blob: get_content()
    Blob->>DB: read storage_location, encryption_key_id
    DB-->>Blob: location=OBJECT_STORAGE, key_id
    Blob->>TierSvc: download_blob(blob)
    TierSvc->>ObjStor: GET blobs/{key_id}/{shard}/{sha}
    ObjStor-->>TierSvc: encrypted_bytes
    TierSvc->>Encrypt: decrypt(encrypted_bytes, key_id)
    Encrypt-->>TierSvc: decrypted_bytes
    TierSvc-->>Blob: decompressed_bytes
    Blob-->>Client: decompressed_bytes
```
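The download path in the diagram reads objects at `blobs/{key_id}/{shard}/{sha}`. As a hedged illustration of that layout (the shard width is an assumption; the real `compute_storage_key` lives in `tiered_storage.py` and may differ), the key derivation might look like:

```python
import hashlib


def compute_storage_key(sha256_hex: str, key_id: int) -> str:
    """Sketch of the blobs/{key_id}/{shard}/{sha} layout from the diagram.

    The shard is assumed to be the first two hex characters of the
    SHA-256 digest, spreading objects across 256 prefixes; the actual
    implementation may choose a different shard width.
    """
    shard = sha256_hex[:2]
    return f"blobs/{key_id}/{shard}/{sha256_hex}"


sha = hashlib.sha256(b"example payload").hexdigest()
storage_key = compute_storage_key(sha, key_id=1)
```

Keying the path on `key_id` is what lets the re-encryption command below write the rotated object to a new location before deleting the old one.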
```mermaid
sequenceDiagram
    participant Beat as Celery Beat
    participant Task as offload_blobs_task
    participant DB as PostgreSQL
    participant BlobTask as offload_single_blob_task
    participant TierSvc as TieredStorageService
    participant ObjStor as Object Storage
    Beat->>Task: trigger (hourly)
    Task->>DB: SELECT eligible blobs (POSTGRES, aged, size)
    DB-->>Task: blob_ids
    Task->>BlobTask: enqueue per blob_id
    BlobTask->>DB: SELECT blob WHERE id=blob_id
    DB-->>BlobTask: blob
    BlobTask->>DB: acquire advisory_lock(sha256)
    DB-->>BlobTask: locked
    BlobTask->>TierSvc: upload_blob(raw_content)
    TierSvc->>TierSvc: encrypt(raw_content)
    TierSvc->>ObjStor: PUT blobs/{key_id}/{shard}/{sha}
    ObjStor-->>TierSvc: stored
    TierSvc-->>BlobTask: key_id
    BlobTask->>DB: UPDATE blob SET storage_location=OBJECT_STORAGE, encryption_key_id, raw_content=NULL
    DB-->>BlobTask: updated
```
```mermaid
sequenceDiagram
    participant Cmd as verify_tiered_storage --re-encrypt
    participant DB as PostgreSQL
    participant TierSvc as TieredStorageService
    participant ObjStor as Object Storage
    Cmd->>DB: SELECT blobs WHERE encryption_key_id != active_key_id
    DB-->>Cmd: blobs_to_rotate
    loop per blob
        Cmd->>DB: acquire advisory_lock(sha256)
        DB-->>Cmd: locked
        alt POSTGRES-backed
            Cmd->>DB: read raw_content
            DB-->>Cmd: encrypted_bytes
            Cmd->>TierSvc: decrypt(encrypted_bytes, old_key)
            TierSvc-->>Cmd: plaintext
            Cmd->>TierSvc: encrypt(plaintext, active_key)
            TierSvc-->>Cmd: new_encrypted
            Cmd->>DB: UPDATE blob raw_content=new_encrypted, encryption_key_id=active_key
        else OBJECT_STORAGE-backed
            Cmd->>ObjStor: GET blobs/{old_key}/{shard}/{sha}
            ObjStor-->>Cmd: encrypted_bytes
            Cmd->>TierSvc: decrypt(encrypted_bytes, old_key)
            TierSvc-->>Cmd: plaintext
            Cmd->>TierSvc: encrypt(plaintext, active_key)
            TierSvc->>ObjStor: PUT blobs/{active_key}/{shard}/{sha}
            Cmd->>DB: UPDATE blob encryption_key_id=active_key
            Cmd->>TierSvc: delete old object (best-effort)
        end
    end
    Cmd-->>Cmd: report summary
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes

🚥 Pre-merge checks: ✅ Passed checks (5 passed)
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@src/backend/core/management/commands/verify_tiered_storage.py`:
- Around line 466-495: The current flow writes the newly encrypted object via
self.service.storage.save(storage_key, ...) before updating
blob.encryption_key_id inside transaction.atomic(), risking storage/DB
inconsistency if the DB update fails; instead, write the new encrypted bytes to
a temporary object (e.g. derive a temp key from storage_key and new_key_id)
using self.service.storage.save(temp_key, ContentFile(encrypted)), then perform
the DB update inside transaction.atomic() (update blob.encryption_key_id and
save), and only after the transaction succeeds atomically remove/rename the temp
object to the final storage_key (or copy temp→final and delete temp) so storage
and DB remain consistent; reference symbols: self.service.storage.save,
storage_key, temp_key (create), self.service.encrypt, blob.encryption_key_id,
transaction.atomic.
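A minimal, storage-agnostic sketch of the "write aside, commit DB, then promote" ordering the comment asks for (the `DictStorage` class, the `db_update` callable, and the temp-key suffix are all illustrative assumptions, not the project's actual django-storages API):

```python
# Hedged sketch of re-encryption that keeps storage and DB consistent:
# 1. write the new ciphertext to a temporary object,
# 2. commit the DB change (transaction.atomic() in the real code),
# 3. only then promote temp -> final and remove the temp object.


class DictStorage:
    """Toy in-memory object store, used only to illustrate the ordering."""

    def __init__(self):
        self.objects = {}

    def save(self, key, data):
        self.objects[key] = data

    def delete(self, key):
        self.objects.pop(key, None)


def reencrypt_object(storage, db_update, storage_key, new_key_id, encrypted):
    temp_key = f"{storage_key}.rekey-{new_key_id}"  # assumed temp-key scheme
    storage.save(temp_key, encrypted)      # 1. write new bytes aside
    try:
        db_update(new_key_id)              # 2. persist encryption_key_id
    except Exception:
        storage.delete(temp_key)           # DB failed: old object stays valid
        raise
    storage.save(storage_key, encrypted)   # 3. copy temp -> final
    storage.delete(temp_key)


store = DictStorage()
store.save("blobs/1/ab/abc", b"old-ciphertext")
reencrypt_object(store, db_update=lambda kid: None,
                 storage_key="blobs/1/ab/abc", new_key_id=2,
                 encrypted=b"new-ciphertext")
```

If the DB update fails, the final object is untouched and the temp object is cleaned up, so a retry can start from a consistent state.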
In `@src/backend/core/services/tiered_storage.py`:
- Around line 31-43: In __init__, the enabled gate currently checks for an
OPTIONS.endpoint_url which wrongly disables valid S3 setups; instead set
self.enabled based on presence of the "message-blobs" storage config itself
(e.g. check that settings.STORAGES contains a non-empty "message-blobs" entry).
Update the assignment to self.enabled to use
settings.STORAGES.get("message-blobs") (or "message-blobs" in settings.STORAGES
and truthy) rather than digging for OPTIONS.endpoint_url so AWS S3 configs
without endpoint_url remain enabled.
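A hedged sketch of the suggested gate (the exact shape of the `STORAGES` dict is assumed from Django's standard `STORAGES` setting; only the truthiness check matters):

```python
# The gate should depend on whether a "message-blobs" backend is configured
# at all, not on a nested OPTIONS["endpoint_url"], because plain AWS S3
# configurations legitimately omit endpoint_url.


def tiered_storage_enabled(storages: dict) -> bool:
    """Return True when a non-empty "message-blobs" entry is configured."""
    return bool(storages.get("message-blobs"))


# A plain S3 config without endpoint_url stays enabled under this check:
aws_cfg = {
    "message-blobs": {
        "BACKEND": "storages.backends.s3.S3Storage",  # assumed backend path
        "OPTIONS": {"bucket_name": "msg-blobs"},
    }
}
```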
🧹 Nitpick comments (3)
src/backend/core/services/tiered_storage_tasks.py (1)
68-133: Consider adding retry for transient failures. The task handles lock contention gracefully by returning "locked" status, but transient failures (network issues, temporary S3 unavailability) at line 131 are logged and returned as errors without retry. The periodic `offload_blobs_task` will eventually re-queue these blobs, but adding explicit retry behavior for transient exceptions (e.g., `ConnectionError`, `Timeout`) could improve reliability.

💡 Optional: Add retry for transient failures

```diff
-@celery_app.task(bind=True)
+@celery_app.task(bind=True, autoretry_for=(ConnectionError, TimeoutError), retry_backoff=True, max_retries=3)
 def offload_single_blob_task(self, blob_id: str) -> Dict[str, Any]:
```

src/backend/core/models.py (1)
1536-1557: Enforce storage_location/raw_content invariants at the DB layer. With `raw_content` now nullable, inconsistent states (e.g., OBJECT_STORAGE + non-null content) become possible and will surface as runtime errors in `get_content`. A check constraint makes the invariant explicit and avoids silent drift. This will require a migration.

♻️ Proposed constraint

```diff
 constraints = [
     models.CheckConstraint(
         check=(
             models.Q(mailbox__isnull=False)
             | models.Q(maildomain__isnull=False)
         ),
         name="blob_has_owner",
     ),
+    models.CheckConstraint(
+        check=(
+            models.Q(
+                storage_location=BlobStorageLocationChoices.POSTGRES,
+                raw_content__isnull=False,
+            )
+            | models.Q(
+                storage_location=BlobStorageLocationChoices.OBJECT_STORAGE,
+                raw_content__isnull=True,
+            )
+        ),
+        name="blob_storage_location_matches_content",
+    ),
 ]
```

As per coding guidelines, enforce data integrity with model constraints.

Also applies to: 1583-1589
src/backend/core/services/tiered_storage.py (1)
244-281: Guard against orphan-delete races and capture delete errors. There is a TOCTOU window between the reference count (lines 259-263) and deletion (lines 274-275); a concurrent offload could add a reference after the count and still have its object deleted. Consider an advisory lock keyed by SHA256 or a transactional guard around the check+delete. Also, capture the storage deletion exception to Sentry so cleanup failures are observable.

♻️ Suggested Sentry capture

```diff
 from cryptography.fernet import Fernet
+from sentry_sdk import capture_exception
@@
-        except Exception as e:  # pylint: disable=broad-except
-            logger.warning("Failed to delete blob from storage %s: %s", key, e)
+        except Exception as exc:  # pylint: disable=broad-except
+            capture_exception(exc)
+            logger.warning("Failed to delete blob from storage %s: %s", key, exc)
         return False
```

As per coding guidelines, capture and report exceptions to Sentry.
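One way to close the window is a Postgres transaction-scoped advisory lock keyed off the SHA-256. A hedged sketch of deriving the 64-bit lock key (the fold into int64 is an assumption; the real code would issue `SELECT pg_advisory_xact_lock(%s)` inside `transaction.atomic()` before the count-and-delete):

```python
import struct


def advisory_lock_key(sha256_hex: str) -> int:
    """Fold a SHA-256 hex digest into the signed 64-bit integer that
    pg_advisory_xact_lock expects, using the first 8 bytes (assumed scheme).
    """
    return struct.unpack(">q", bytes.fromhex(sha256_hex)[:8])[0]


# In the real service this would guard the check+delete, e.g.:
#   with transaction.atomic():
#       cursor.execute("SELECT pg_advisory_xact_lock(%s)",
#                      [advisory_lock_key(sha)])
#       ... count references, delete only if still zero ...
key = advisory_lock_key("ff" * 32)
```

Because the lock is transaction-scoped, it is released automatically on commit or rollback, so no explicit unlock step is needed.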
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (22)
- compose.yaml
- env.d/development/backend.defaults
- src/backend/core/api/viewsets/config.py
- src/backend/core/enums.py
- src/backend/core/management/commands/verify_tiered_storage.py
- src/backend/core/migrations/0014_blob_encryption_key_id_blob_storage_location_and_more.py
- src/backend/core/models.py
- src/backend/core/services/search/search.py
- src/backend/core/services/tiered_storage.py
- src/backend/core/services/tiered_storage_tasks.py
- src/backend/core/signals.py
- src/backend/core/tests/commands/__init__.py
- src/backend/core/tests/commands/test_verify_tiered_storage.py
- src/backend/core/tests/conftest.py
- src/backend/core/tests/services/__init__.py
- src/backend/core/tests/services/test_tiered_storage.py
- src/backend/core/tests/tasks/__init__.py
- src/backend/core/tests/tasks/test_task_send_message.py
- src/backend/core/tests/tasks/test_tiered_storage_tasks.py
- src/backend/core/utils.py
- src/backend/messages/celery_app.py
- src/backend/messages/settings.py
🧰 Additional context used
📓 Path-based instructions (6)
src/backend/**/*.py
📄 CodeRabbit inference engine (.cursor/rules/django-python.mdc)
src/backend/**/*.py: Follow Django/PEP 8 style with a 100-character line limit
Use descriptive, snake_case names for variables and functions
Use Django ORM for database access; avoid raw SQL unless necessary for performance
Use Django’s built-in user model and authentication framework
Prefer try-except blocks to handle exceptions in business logic and views
Log expected and unexpected actions with appropriate log levels
Capture and report exceptions to Sentry; use capture_exception() for custom errors
Do not log sensitive information (tokens, passwords, financial/health data, PII)
Files:
- src/backend/messages/celery_app.py
- src/backend/core/tests/tasks/__init__.py
- src/backend/core/api/viewsets/config.py
- src/backend/core/tests/commands/test_verify_tiered_storage.py
- src/backend/core/services/search/search.py
- src/backend/core/management/commands/verify_tiered_storage.py
- src/backend/core/utils.py
- src/backend/core/signals.py
- src/backend/core/services/tiered_storage_tasks.py
- src/backend/core/tests/services/test_tiered_storage.py
- src/backend/core/tests/tasks/test_task_send_message.py
- src/backend/messages/settings.py
- src/backend/core/tests/conftest.py
- src/backend/core/models.py
- src/backend/core/tests/services/__init__.py
- src/backend/core/enums.py
- src/backend/core/services/tiered_storage.py
- src/backend/core/tests/commands/__init__.py
- src/backend/core/tests/tasks/test_tiered_storage_tasks.py
- src/backend/core/migrations/0014_blob_encryption_key_id_blob_storage_location_and_more.py
src/backend/**/{tests.py,tests/**/*.py}
📄 CodeRabbit inference engine (.cursor/rules/django-python.mdc)
src/backend/**/{tests.py,tests/**/*.py}: Use Django’s testing tools (pytest-django) to ensure code quality and reliability
Unit tests should focus on a single use case, keep assertions minimal, and cover all possible cases
Files:
- src/backend/core/tests/tasks/__init__.py
- src/backend/core/tests/commands/test_verify_tiered_storage.py
- src/backend/core/tests/services/test_tiered_storage.py
- src/backend/core/tests/tasks/test_task_send_message.py
- src/backend/core/tests/conftest.py
- src/backend/core/tests/services/__init__.py
- src/backend/core/tests/commands/__init__.py
- src/backend/core/tests/tasks/test_tiered_storage_tasks.py
src/backend/**/{settings.py,middleware.py}
📄 CodeRabbit inference engine (.cursor/rules/django-python.mdc)
Use middleware judiciously for cross-cutting concerns (authentication, logging, caching)
Files:
src/backend/messages/settings.py
src/backend/**/settings.py
📄 CodeRabbit inference engine (.cursor/rules/django-python.mdc)
src/backend/**/settings.py: Leverage Django’s caching framework (e.g., Redis/Memcached) where appropriate
Use Django’s cache framework with a backend like Redis or Memcached to reduce DB load
Optimize static file handling using Django’s staticfiles pipeline (e.g., WhiteNoise)
Files:
src/backend/messages/settings.py
src/backend/**/{models.py,forms.py,views.py}
📄 CodeRabbit inference engine (.cursor/rules/django-python.mdc)
Keep business logic in models and forms; keep views thin and focused on request handling
Files:
src/backend/core/models.py
src/backend/**/{models.py,migrations/**/*.py}
📄 CodeRabbit inference engine (.cursor/rules/django-python.mdc)
Implement database indexing and query optimization (Model Meta indexes, constraints)
Files:
- src/backend/core/models.py
- src/backend/core/migrations/0014_blob_encryption_key_id_blob_storage_location_and_more.py
🧠 Learnings (1)
📚 Learning: 2025-09-02T10:12:12.835Z
Learnt from: CR
Repo: suitenumerique/messages PR: 0
File: .cursor/rules/django-python.mdc:0-0
Timestamp: 2025-09-02T10:12:12.835Z
Learning: Applies to src/backend/**/{tests.py,tests/**/*.py} : Use Django’s testing tools (pytest-django) to ensure code quality and reliability
Applied to files:
- src/backend/core/tests/conftest.py
- src/backend/core/tests/commands/__init__.py
🧬 Code graph analysis (3)
src/backend/messages/settings.py (1)
src/backend/core/utils.py (1)
JSONValue(8-22)
src/backend/core/tests/conftest.py (1)
src/backend/core/services/tiered_storage.py (1)
storage(45-49)
src/backend/core/models.py (2)
src/backend/core/enums.py (2)
- BlobStorageLocationChoices (61-65)
- CompressionTypeChoices (54-58)

src/backend/core/services/tiered_storage.py (6)

- TieredStorageService (28-296)
- encrypt (68-91)
- compute_storage_key (52-66)
- decrypt (93-117)
- download_blob (208-242)
- delete_if_orphaned (244-280)
🪛 Ruff (0.14.11)
src/backend/core/tests/commands/test_verify_tiered_storage.py
31-31: import should be at the top-level of a file
(PLC0415)
180-180: import should be at the top-level of a file
(PLC0415)
208-208: import should be at the top-level of a file
(PLC0415)
275-275: import should be at the top-level of a file
(PLC0415)
357-357: import should be at the top-level of a file
(PLC0415)
397-397: import should be at the top-level of a file
(PLC0415)
415-415: import should be at the top-level of a file
(PLC0415)
442-442: import should be at the top-level of a file
(PLC0415)
470-470: import should be at the top-level of a file
(PLC0415)
506-506: import should be at the top-level of a file
(PLC0415)
552-552: import should be at the top-level of a file
(PLC0415)
596-596: import should be at the top-level of a file
(PLC0415)
630-630: import should be at the top-level of a file
(PLC0415)
src/backend/core/management/commands/verify_tiered_storage.py
280-280: import should be at the top-level of a file
(PLC0415)
282-282: import should be at the top-level of a file
(PLC0415)
src/backend/core/tests/services/test_tiered_storage.py
114-114: import should be at the top-level of a file
(PLC0415)
129-129: import should be at the top-level of a file
(PLC0415)
271-271: import should be at the top-level of a file
(PLC0415)
296-296: import should be at the top-level of a file
(PLC0415)
362-362: import should be at the top-level of a file
(PLC0415)
396-396: import should be at the top-level of a file
(PLC0415)
571-571: import should be at the top-level of a file
(PLC0415)
614-614: import should be at the top-level of a file
(PLC0415)
654-654: import should be at the top-level of a file
(PLC0415)
src/backend/core/tests/conftest.py
22-22: import should be at the top-level of a file
(PLC0415)
src/backend/core/models.py
1474-1474: import should be at the top-level of a file
(PLC0415)
1625-1625: import should be at the top-level of a file
(PLC0415)
1656-1656: import should be at the top-level of a file
(PLC0415)
src/backend/core/services/tiered_storage.py
132-132: import should be at the top-level of a file
(PLC0415)
149-149: import should be at the top-level of a file
(PLC0415)
257-257: import should be at the top-level of a file
(PLC0415)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
- GitHub Check: test-e2e
- GitHub Check: lint-back
- GitHub Check: build-front
- GitHub Check: test-front
- GitHub Check: check-api-state
- GitHub Check: lint-front
- GitHub Check: test-back
🔇 Additional comments (52)
src/backend/core/tests/tasks/__init__.py (1)

1-1: LGTM! Clean package initializer for the tasks test module.

src/backend/core/tests/commands/__init__.py (1)

1-1: LGTM! Appropriate package initializer for management command tests.

src/backend/core/tests/services/__init__.py (1)

1-1: LGTM! Appropriate package initializer for service tests.
src/backend/core/tests/tasks/test_task_send_message.py (3)

3-3: LGTM! Appropriate pylint disables for test file patterns: `no-value-for-parameter` for Celery task invocations without bound arguments, and `unused-argument` for pytest fixtures that establish state but aren't directly referenced.

45-45: Good hygiene. Using the `_mailbox` prefix correctly signals the variable is intentionally unused while still unpacking the fixture tuple.

99-99: Consistent with the pattern above.

src/backend/core/api/viewsets/config.py (1)
139-140: LGTM! Using `getattr(settings, setting, None)` ensures a consistent API response schema where all keys are always present, aligning with the OpenAPI specification that marks these fields as required. This is cleaner than conditional inclusion and provides predictable behavior for frontend consumers.

src/backend/core/utils.py (1)

14-22: LGTM! Returning `None` for empty/whitespace strings allows django-configurations to fall back to default values, which is appropriate for optional JSON configuration like encryption keys.

env.d/development/backend.defaults (1)

54-66: LGTM! The development defaults are well-documented with clear comments. Tiered storage is configured with a 3-day offload threshold, and credentials follow the existing pattern for `msg-imports`.

compose.yaml (1)

84-84: LGTM! The `msg-blobs` bucket creation follows the existing pattern. It correctly omits the ILM expiration rules since blobs are intended for long-term storage, unlike temporary imports.

src/backend/messages/celery_app.py (1)
48-52: LGTM! The new beat schedule entry follows the existing pattern, and an hourly interval is appropriate for blob offloading. The `core.services.tiered_storage_tasks.offload_blobs_task` is properly defined with the `@celery_app.task(bind=True)` decorator and has comprehensive test coverage.

src/backend/core/services/search/search.py (1)

40-42: No action required; direct access to `settings.OPENSEARCH_INDEX_THREADS` is safe. The setting is properly defined in `src/backend/messages/settings.py` as a `BooleanValue` with a default of `True`. It will always exist at runtime, and direct access does not risk `AttributeError`. This pattern is already used consistently throughout the codebase in multiple other files (`src/backend/core/signals.py`, `src/backend/core/services/search/tasks.py`).

Likely an incorrect or invalid review comment.
src/backend/core/enums.py (1)

61-66: LGTM! The new `BlobStorageLocationChoices` enum follows the existing patterns in this file with appropriate integer values and descriptive labels. Good placement in the logical order of the file.

src/backend/core/tests/conftest.py (1)

14-45: LGTM! The session-scoped fixture properly defers Django imports and handles missing storage configuration gracefully. The broad exception handling is appropriate here to prevent test setup failures when object storage isn't configured. The `pylint: disable` comment at line 3 correctly covers the in-function import pattern.

src/backend/core/services/tiered_storage_tasks.py (1)

26-65: LGTM! The task efficiently streams eligible blob IDs using `values_list` with `iterator()` to avoid memory pressure. The filtering criteria (age threshold + minimum size) appropriately limit the scope of each run.

src/backend/core/tests/commands/test_verify_tiered_storage.py (3)
1-7: LGTM! Comprehensive test coverage for the `verify_tiered_storage` management command. The tests appropriately cover disabled states, E2E verification modes, hash verification with corruption detection, and re-encryption workflows. The in-function imports are intentionally used for test isolation, and the `pylint: disable` comment at line 7 correctly covers this pattern.

78-80: Good cleanup pattern. Consistent use of `try/finally` with existence checks ensures test isolation and prevents storage pollution across test runs.

468-534: Thorough E2E test for object storage re-encryption. This test covers the complete workflow: encrypting with the old key, uploading to storage, rotating keys, re-encrypting, and verifying content integrity via download and decompression.

src/backend/core/signals.py (1)

51-52: The concern is unfounded. `OPENSEARCH_INDEX_THREADS` is always defined in settings with a default value (`True` in `src/backend/messages/settings.py:83-84` and `False` as a fallback at line 1090). Direct attribute access to `settings.OPENSEARCH_INDEX_THREADS` is safe and will not raise `AttributeError`. The change from `getattr()` to direct access is a valid simplification that removes unnecessary defensive programming.

Likely an incorrect or invalid review comment.
src/backend/core/migrations/0014_blob_encryption_key_id_blob_storage_location_and_more.py (1)

12-27: Migration structure looks correct. The migration properly:

- Adds `storage_location` with `db_index=True` for efficient filtering
- Makes `raw_content` nullable to support object storage blobs
- Uses hardcoded choices in the migration (correct Django practice)

One consideration: if key rotation queries will frequently filter by `encryption_key_id` (e.g., finding all blobs encrypted with a specific key), adding an index on that field could be beneficial. However, this can be deferred based on actual query patterns.

src/backend/core/tests/services/test_tiered_storage.py (5)
24-181: Comprehensive unit test coverage for encryption/decryption. The unit tests thoroughly cover:

- Storage key computation with different SHA256 prefixes
- Encryption passthrough when disabled (key_id=0)
- Proper error handling for invalid/missing keys and corrupted data
- Key rotation scenarios maintaining backward compatibility

Good separation of concerns with no DB or storage dependencies.

183-263: Good database-level test coverage. Tests properly validate:

- Default storage location behavior
- Content retrieval from PostgreSQL
- Error handling when content is missing
- SHA256-based storage key derivation
- Deduplication detection via `check_already_uploaded`

389-420: Critical regression test for the double-encryption bug. This test (`test_offload_with_encryption_roundtrip`) is valuable as it explicitly guards against the double-encryption bug mentioned in the docstring. The test verifies that encrypted blobs can be offloaded and read back correctly, which is a common failure point.

494-562: Important deduplication behavior documented in the test. The test correctly validates that when two blobs with identical content are encrypted with different keys, deduplication uses the first blob's encryption key_id. The inline comment at line 554 clarifies this is expected behavior until key rotation is complete.

565-667: Key rotation tests cover both storage locations. The tests properly validate the re-encryption workflow for:

- PostgreSQL-stored blobs (decrypt with old key, encrypt with new key, update in place)
- Object storage blobs (download, decrypt, re-encrypt, upload, update metadata)

Both tests verify content integrity after rotation, which is critical.
src/backend/messages/settings.py (2)
234-259: New storage configuration follows existing patterns. The `message-blobs` storage configuration mirrors the existing `message-imports` pattern. The default bucket name `msg-blobs` is provided.

One difference: `endpoint_url` has no default value, whereas some configurations might expect a default for local development. Verify this is intentional and that the development environment properly sets `STORAGE_MESSAGE_BLOBS_ENDPOINT_URL`.

376-398: Well-documented encryption and offload configuration. The configuration properly:

- Documents the key format and the key_id=0 convention
- Uses appropriate types (`JSONValue` for dict, `PositiveIntegerValue` for IDs)
- Defaults to encryption disabled (key_id=0), requiring explicit opt-in
- Allows fine-grained control over offload timing and size thresholds
src/backend/core/tests/tasks/test_tiered_storage_tasks.py (5)

32-48: Disabled state test properly mocks settings. The test correctly mocks the storage configuration to verify the task gracefully handles the disabled state.

55-120: Good coverage of blob eligibility criteria. The tests properly validate:

- Age-based filtering using `TIERED_STORAGE_OFFLOAD_AFTER_DAYS`
- Size-based filtering using `TIERED_STORAGE_OFFLOAD_MIN_SIZE`
- The use of `Blob.objects.filter().update()` correctly bypasses any auto-update timestamps

The conditional assertion at line 119 (`if settings.TIERED_STORAGE_OFFLOAD_MIN_SIZE > 0`) adapts to the test environment settings, which is acceptable.

156-171: Consistent disabled-state handling test. Follows the same mocking pattern as the batch task test.

242-264: Good error handling test with rollback verification. The test `test_handles_upload_error` properly verifies that:

- Upload errors are caught and reported
- The blob state is preserved (transaction rolled back)
- Storage location remains `POSTGRES` and `raw_content` is intact

This ensures data integrity during upload failures.

335-352: Idempotency test validates concurrent safety. The `test_concurrent_offload_idempotent` test verifies that repeated offload attempts for the same blob are handled gracefully, returning `already_offloaded` on subsequent calls. This is important for Celery task retry scenarios.

src/backend/core/management/commands/verify_tiered_storage.py (6)
24-88: Well-designed CLI interface with clear modes. The command provides:

- Multiple verification modes (`db-to-storage`, `storage-to-db`, `full`)
- Safety features (`--dry-run`, `--limit`)
- Key rotation capability (`--re-encrypt`)
- Proper early exit when storage is not configured

89-134: DB-to-storage verification handles scale well. The method:

- Uses `iterator(chunk_size=1000)` to avoid memory issues with large datasets
- Reports missing blobs to stderr (appropriate severity for potential data loss)
- Respects the `--limit` option for sampling

136-218: Storage-to-DB verification with orphan cleanup. The method properly:

- Validates storage path format before processing
- Detects orphans (objects without DB references)
- Optionally deletes orphans with `--fix`
- Optionally verifies hashes (wisely marked as "slow")

Progress is reported every 100 objects (line 201), which provides good feedback during long operations.

220-254: Storage listing handles multiple backends. The method:

- Prefers direct boto3 access with pagination for efficiency
- Falls back to Django's `listdir` for compatibility
- Raises a clear error when listing is not supported

256-305: Hash verification covers the complete pipeline. The method correctly:

- Downloads encrypted content from storage
- Decrypts using the blob's `encryption_key_id`
- Decompresses based on `compression` type
- Computes and compares the SHA256 hash

The broad exception handling is acceptable here since the goal is to report issues, not crash.

307-402: Re-encryption workflow with proper validation. The method:

- Validates encryption configuration before starting
- Uses a smaller chunk size (100) appropriate for the heavier re-encryption operation
- Supports `--dry-run` for safe preview
- Provides clear progress reporting and a final summary
src/backend/core/models.py (6)
31-33: No review needed for this import change.

1472-1487: Encryption is integrated cleanly into blob creation. Storing encrypted bytes and the key id alongside compression keeps retrieval consistent.

1595-1599: Conditional `size_compressed` update makes sense. This avoids clobbering the stored size after offload clears `raw_content`.

1601-1609: No issues spotted.

1624-1643: Content retrieval flow reads cleanly. The PostgreSQL vs object-storage branches and decryption/decompression steps are clear.

1645-1660: Deletion flow aligns with object-storage cleanup.

src/backend/core/services/tiered_storage.py (8)
1-24: No review needed for the module header/imports.

44-49: Lazy storage initialization looks good.

51-66: Storage key sharding is straightforward.

68-117: Encryption/decryption flow is consistent with key_id semantics. Please double-check that the configured key formats align with Fernet requirements.

119-155: DB-backed dedup lookups are clear.

208-243: Download path is clear and well-scoped.

282-296: Existence check is straightforward.

157-207: The return value is correctly handled in production code. The call site in `src/backend/core/services/tiered_storage_tasks.py` (line 113) properly captures the returned `key_id` and updates `blob.encryption_key_id` accordingly. Dedup-specific tests also confirm this pattern works correctly when the same content is uploaded with different encryption keys.
Actionable comments posted: 4
🤖 Fix all issues with AI agents
In `@src/backend/core/management/commands/verify_tiered_storage.py`:
- Around line 449-455: When re-encrypting Postgres blobs in
verify_tiered_storage.py (using self.service.decrypt and self.service.encrypt)
ensure you persist the recomputed size_compressed: after setting
blob.raw_content = encrypted and blob.encryption_key_id = new_key_id, call
blob.save(update_fields=["raw_content", "encryption_key_id", "size_compressed"])
so the recalculated size_compressed is written to the DB (match the approach
used in 0007_blob_size_compressed.py).
- Around line 280-282: Move the local imports of pyzstd and
CompressionTypeChoices out of the function and place them at the top of the
module alongside the existing BlobStorageLocationChoices import so they are
module-level imports; update the import section to include "import pyzstd" and
"from core.enums import CompressionTypeChoices" and remove the in-function
imports where pyzstd and CompressionTypeChoices are currently referenced in
verify_tiered_storage logic.
In `@src/backend/core/signals.py`:
- Around line 109-119: The post_delete signal handler cleanup_blob_storage
currently calls TieredStorageService().delete_if_orphaned() immediately, which
can run inside a transaction that may roll back; wrap the deletion call in
transaction.on_commit to defer it until after successful commit: import
django.db.transaction if needed, capture the instance.sha256 (or
bytes(instance.sha256)) and any required storage_location check inside
cleanup_blob_storage, create a small closure or lambda that calls
TieredStorageService().delete_if_orphaned(...) and register it with
transaction.on_commit, and ensure the existing guard checks
(BlobStorageLocationChoices.OBJECT_STORAGE and service.enabled) are preserved
before scheduling the on_commit callback.
In `@src/backend/core/tests/services/test_tiered_storage.py`:
- Line 114: Several intentional local test imports (e.g. "from
cryptography.fernet import InvalidToken" and the other test-only imports flagged
at the comment's listed locations) are meant to remain inside test functions;
add a trailing "# noqa: PLC0415" to each of those local import lines instead of
moving them to module level so Ruff/Pylint stops reporting
import-outside-toplevel while keeping the imports local to their tests. Ensure
you update each flagged import line (the ones referenced in the review) by
appending the exact comment "# noqa: PLC0415".
🧹 Nitpick comments (2)
src/backend/core/tests/services/test_tiered_storage.py (1)
265-563: Consider skipping E2E tests when object storage isn't configured.
The docstring says "when available," but these tests hard-fail if storage isn't configured. A `skipif` makes local/dev runs more resilient.
🔧 Suggested refactor
+# At module level
+_STORAGE_ENABLED = TieredStorageService().enabled
@@
-@pytest.mark.django_db
-class TestTieredStorageE2E:
+@pytest.mark.django_db
+@pytest.mark.skipif(
+    not _STORAGE_ENABLED, reason="Object storage not configured"
+)
+class TestTieredStorageE2E:
src/backend/core/tests/tasks/test_tiered_storage_tasks.py (1)
60-194: Consider skipping E2E task tests when object storage isn't configured.
These E2E tests will fail in environments without MinIO. A `skipif` keeps dev/test runs resilient while still exercising coverage in CI.
🔧 Suggested refactor
+# At module level
+_STORAGE_ENABLED = TieredStorageService().enabled
@@
-@pytest.mark.django_db
-class TestOffloadBlobsTaskE2E:
+@pytest.mark.django_db
+@pytest.mark.skipif(
+    not _STORAGE_ENABLED, reason="Object storage not configured"
+)
+class TestOffloadBlobsTaskE2E:
@@
-@pytest.mark.django_db
-class TestOffloadSingleBlobTaskE2E:
+@pytest.mark.django_db
+@pytest.mark.skipif(
+    not _STORAGE_ENABLED, reason="Object storage not configured"
+)
+class TestOffloadSingleBlobTaskE2E:
Also applies to: 221-399
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/backend/core/management/commands/verify_tiered_storage.py`:
- Around line 475-535: The code currently locks and updates only the single Blob
row but the storage object is shared by all Blobs with the same sha256; fix by
selecting and locking the entire cohort (all Blob rows that share the sha256 and
OBJECT_STORAGE) inside the transaction, verify none already have
encryption_key_id == target_key_id (or treat as already-promoted if they do),
then update every locked row's encryption_key_id to new_key_id before calling
transaction.on_commit(_promote_temp). Use a select_for_update() queryset
filtered by blob.sha256 (and storage type/field used for OBJECT_STORAGE) instead
of Blob.objects.select_for_update().get(...), and perform a bulk update or
iterate the locked rows to set and save encryption_key_id so all siblings remain
readable after promotion.
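The cohort-wide promotion can be sketched with a plain list of dicts standing in for the `select_for_update()` queryset; `promote_cohort` and the dict field names are illustrative, not the command's real API:

```python
def promote_cohort(rows, sha256, new_key_id, target_key_id):
    """Promote every row sharing the sha256 (and stored in object storage)
    together, so no sibling is left pointing at a stale key id."""
    cohort = [
        r for r in rows
        if r["sha256"] == sha256 and r["location"] == "OBJECT_STORAGE"
    ]
    if any(r["key_id"] == target_key_id for r in cohort):
        return "already_promoted"  # some sibling already points at the target
    for row in cohort:  # bulk update of every locked sibling
        row["key_id"] = new_key_id
    return "promoted"

rows = [
    {"sha256": "ab", "location": "OBJECT_STORAGE", "key_id": 1},
    {"sha256": "ab", "location": "OBJECT_STORAGE", "key_id": 1},
    {"sha256": "cd", "location": "OBJECT_STORAGE", "key_id": 1},
]
result = promote_cohort(rows, "ab", new_key_id=2, target_key_id=3)
```

After the call both `"ab"` rows carry the new key id while the unrelated `"cd"` row is untouched, which is the invariant the review asks for.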
In `@src/backend/core/services/tiered_storage.py`:
- Around line 179-190: The dedup fast-path returns the existing
encryption_key_id without verifying the underlying object is present; change the
logic in the branch that uses check_already_uploaded(sha256_bytes) so it
computes the canonical storage key via compute_storage_key(sha256_bytes) and
verifies the object exists in the storage backend (e.g., storage.exists(key) or
trying to open/read the object) before calling
get_existing_key_id(sha256_bytes); if the object is missing or unreadable, fall
through to the upload path that uses blob.raw_content and storage.save(...) so
we don't convert the row to OBJECT_STORAGE pointing at nothing. Ensure you
reference check_already_uploaded, compute_storage_key, get_existing_key_id,
storage.exists/open, and storage.save in the change.
- Around line 47-61: compute_storage_key currently only uses the SHA of the
uncompressed payload, so blobs with identical source bytes but different
compression metadata collide; change compute_storage_key to include the storage
representation (e.g., compression) in the key or in the input used to derive the
key (either by accepting a compression/representation parameter and returning
something like "blobs/{compression}/{sha[:3]}/{sha}" or by computing the SHA
over the final compressed bytes), and update every call site (notably
upload_blob and the call sites around lines 187-190) to pass the representation
so deduplication is based on both content and representation.
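A minimal sketch of the first option (folding the representation into the key path); the `blobs/{compression}/{shard}/{sha}` layout follows the suggestion above and is not the service's confirmed format:

```python
import hashlib

def compute_storage_key(sha256_bytes: bytes, compression: str) -> str:
    """Fold the storage representation into the object key so identical
    source bytes stored with different compression settings map to
    distinct objects instead of deduping onto one another."""
    sha = sha256_bytes.hex()
    return f"blobs/{compression}/{sha[:3]}/{sha}"

digest = hashlib.sha256(b"same source bytes").digest()
key_zstd = compute_storage_key(digest, "zstd")
key_none = compute_storage_key(digest, "none")
assert key_zstd != key_none  # no cross-representation dedup collision
```

Every call site would then have to pass the blob's compression value so dedup is keyed on both content and representation.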
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 966620c0-2a4d-4de7-be72-51978a4576ac
📒 Files selected for processing (6)
src/backend/core/management/commands/verify_tiered_storage.py
src/backend/core/services/tiered_storage.py
src/backend/core/signals.py
src/backend/core/tests/commands/test_verify_tiered_storage.py
src/backend/core/tests/services/test_tiered_storage.py
src/backend/core/tests/tasks/test_tiered_storage_tasks.py
0706ad8 to 672d2e6
Actionable comments posted: 8
♻️ Duplicate comments (1)
src/backend/core/services/tiered_storage.py (1)
74-88: ⚠️ Potential issue | 🔴 Critical — Include compression in the storage identity.
`sha256` is computed from the uncompressed payload, but the stored object is the compressed/encrypted representation. Two blobs with identical source bytes and different `compression` values will collide on the same key, dedup together in `upload_blob()`, and later one of them will read back bytes that do not match its own metadata. Key derivation/dedup needs to include the storage representation, or uploads need to canonicalize representation before writing.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/backend/core/services/tiered_storage.py` around lines 74 - 88, compute_storage_key currently derives the object key only from the source SHA256 (sha256_bytes) which ignores the chosen compression/encryption, causing different storage representations to collide; update key derivation to include the storage representation (e.g., incorporate the compression string/enum and any representation-specific metadata into the input used to derive the key) or alternatively ensure upload_blob canonicalizes the bytes (compress/encrypt) before computing the SHA used by compute_storage_key so keys reflect the actual stored bytes; refer to compute_storage_key and upload_blob when making the change so the key always encodes the compression parameter (or is computed from post-compression bytes).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/backend/core/management/commands/verify_tiered_storage.py`:
- Around line 332-356: The current loop limits raw queryset rows, but
OBJECT_STORAGE re-encrypts entire sha256 cohorts in
_re_encrypt_object_storage_blob, so change the logic in the command to build a
worklist of unique re-encryption work-units first (each work-unit represents
either a single Postgres row or a unique object-storage cohort identified by
sha256/object key), deduplicate those units, then apply the --limit to that
worklist before processing; update the code paths around Blob, queryset,
_re_encrypt_single_blob and _re_encrypt_object_storage_blob to consult the
prebuilt worklist (not the raw queryset) when counting, printing "Blobs to
re-encrypt", and iterating, and ensure current_key_id and OBJECT_STORAGE checks
are used to determine whether a row maps to a cohort work-unit.
In
`@src/backend/core/migrations/0026_blob_encryption_key_id_blob_storage_location_and_more.py`:
- Around line 13-17: The new migrations.AddField for model_name='blob' adding
name='encryption_key_id' should create an indexed column and also add a
composite index with storage_location to support key-rotation queries; update
the migration to set db_index=True on the SmallIntegerField for
encryption_key_id (the migrations.AddField entry) and add a migrations.AddIndex
entry that creates an index on ('encryption_key_id','storage_location') (or
equivalent Index/fields tuple) so queries filtering by encryption_key_id and
storage_location use an index.
In `@src/backend/core/models.py`:
- Around line 2189-2201: The new tiered-storage fields (storage_location,
created_at, size, sha256) need composite DB indexes to avoid broad scans; update
the Blob model by adding appropriate Meta.indexes entries (e.g., an index on
("storage_location","created_at","size") for offload scans and an index on
("sha256","storage_location") for dedup/orphan checks) and create a Django
migration that adds these same composite indexes to the database so the ORM and
DB stay in sync; reference the Blob class/Blob.Meta and the existing migration
pattern for adding indexes when implementing the change.
In `@src/backend/core/services/tiered_storage_tasks.py`:
- Around line 78-91: The second Blob lookup inside the transaction/lock (the
select_for_update().get(id=blob_id) call in tiered_storage_tasks.py) can raise
Blob.DoesNotExist if the row was deleted after the initial sha256 lookup; wrap
that select_for_update().get(...) in a try/except catching Blob.DoesNotExist and
return {"status": "not_found", "blob_id": blob_id} instead of letting it bubble
to the broad exception handler; apply the same fix to the other similar block
later in this file (the other select_for_update().get usage around the offload
completion code) so both places avoid logger.exception noise and return
not_found on concurrent delete.
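The race handling can be sketched with a plain dict standing in for the database; `DoesNotExist` and `fetch_locked` below are stand-ins for `Blob.DoesNotExist` and the locked ORM lookup, not the real API:

```python
class DoesNotExist(Exception):
    """Stand-in for Blob.DoesNotExist."""

def fetch_locked(rows: dict, blob_id: int) -> dict:
    """Translate a concurrent delete into a structured 'not_found'
    result instead of an exception that trips the broad error handler."""
    try:
        if blob_id not in rows:  # simulates .get() raising on a gone row
            raise DoesNotExist(blob_id)
        return {"status": "ok", "blob": rows[blob_id]}
    except DoesNotExist:
        return {"status": "not_found", "blob_id": blob_id}
```

Only the narrow exception is caught, so genuine failures still reach the task's generic error path.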
In `@src/backend/core/services/tiered_storage.py`:
- Around line 303-310: The current delete_if_orphaned implementation swallows
all exceptions from storage.delete and returns False, preventing
cleanup_orphaned_blob_task from retrying transient errors; change
delete_if_orphaned so that storage.delete failures are not converted into a
silent False: either re-raise the caught exception (allowing
cleanup_orphaned_blob_task to catch and retry) or return a distinct error result
that cleanup_orphaned_blob_task understands as retryable. Locate
compute_storage_key, delete_if_orphaned and the call to self.storage.delete in
tiered_storage.py and update the exception handling to
propagate/storage-delete-error semantics instead of treating every exception as
"still_referenced".
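One way to express the "propagate, don't swallow" option, assuming a hypothetical `StorageDeleteError` and a plain callable in place of `storage.delete`:

```python
class StorageDeleteError(Exception):
    """Raised instead of returning False so the cleanup task can retry."""

def delete_if_orphaned(delete_fn, key: str, still_referenced: bool) -> bool:
    """Only a genuine remaining reference yields False; backend failures
    propagate as a distinct, retryable error."""
    if still_referenced:
        return False  # real no-op: another row still points at the object
    try:
        delete_fn(key)
    except OSError as exc:
        raise StorageDeleteError(key) from exc  # retryable, not 'still_referenced'
    return True

def flaky_delete(key):
    raise OSError("transient backend failure")

try:
    delete_if_orphaned(flaky_delete, "blobs/abc", still_referenced=False)
    outcome = "swallowed"
except StorageDeleteError:
    outcome = "retryable"
```

The caller can now distinguish "still referenced" from "delete failed" and schedule a retry only for the latter.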
In `@src/backend/core/tests/conftest.py`:
- Around line 39-47: The current bucket bootstrap incorrectly catches
client.exceptions.NoSuchBucket (which doesn’t exist) and falls back to a broad
except that masks real failures; update the try/except around
client.head_bucket() to catch botocore.exceptions.ClientError (import
botocore.exceptions as needed), inspect the error (e.response['Error']['Code']
or HTTPStatus in e.response['ResponseMetadata']) and only call
client.create_bucket(Bucket=bucket_name) when the error indicates the bucket is
missing (404 / NoSuchBucket), otherwise re-raise or log the unexpected exception
so real setup failures aren’t swallowed; replace references to
client.exceptions.NoSuchBucket with this ClientError check and ensure
client.create_bucket remains the recovery path.
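The error classification can be isolated into a small pure function; `error_response` mirrors the shape of botocore's `ClientError.response`, and `bucket_is_missing` is a hypothetical helper name:

```python
def bucket_is_missing(error_response: dict) -> bool:
    """Decide whether a head_bucket failure means 'create the bucket'
    (404 / NoSuchBucket) or a real setup failure that must be re-raised."""
    code = error_response.get("Error", {}).get("Code", "")
    status = error_response.get("ResponseMetadata", {}).get("HTTPStatusCode")
    return code in ("404", "NoSuchBucket") or status == 404

assert bucket_is_missing({"Error": {"Code": "404"}, "ResponseMetadata": {}})
assert not bucket_is_missing(
    {"Error": {"Code": "403"}, "ResponseMetadata": {"HTTPStatusCode": 403}}
)
```

In the conftest, the `except botocore.exceptions.ClientError as e:` block would call this with `e.response`, create the bucket when it returns True, and re-raise otherwise.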
In `@src/backend/core/tests/tasks/test_task_send_message.py`:
- Line 3: Remove the module-level "# pylint:
disable=no-value-for-parameter,unused-argument" and instead apply scoped
disables at the exact offending sites: add "# pylint:
disable=no-value-for-parameter" inline on the direct task invocation lines (the
direct task call sites in this test file) and add "# pylint:
disable=unused-argument" either inline on the specific test function signatures
or on the unused fixture parameter names where they are declared; ensure only
those lines reference the disables so future
unused-argument/no-value-for-parameter issues elsewhere in the file are not
globally suppressed.
In `@src/backend/messages/settings.py`:
- Around line 295-320: The "message-blobs" storage entry is always present in
settings.STORAGES which makes TieredStorageService.enabled truthy even when no
object-store is configured; change settings.py so the "message-blobs" key is
only added when real object-storage config is provided (e.g. check relevant env
vars such as STORAGE_MESSAGE_BLOBS_BUCKET_NAME or
STORAGE_MESSAGE_BLOBS_ENDPOINT_URL/ACCESS_KEY/SECRET in os.environ or via
values.Value().environ_name) instead of unconditionally defining the dict;
locate the "message-blobs" dict in src/backend/messages/settings.py and wrap its
creation/assignment in that conditional so
settings.STORAGES.get("message-blobs") is falsy when no blob config is supplied,
preserving the documented "empty = disabled" behavior used by
TieredStorageService.enabled.
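A simplified sketch of the conditional registration, with a plain dict in place of `os.environ` and the storage options reduced to two fields; `build_storages` is a hypothetical helper, not the settings module's real structure:

```python
def build_storages(environ: dict) -> dict:
    """Only register the "message-blobs" alias when a real backend is
    configured, so STORAGES.get("message-blobs") stays falsy otherwise."""
    storages = {
        "default": {"BACKEND": "django.core.files.storage.FileSystemStorage"},
    }
    if environ.get("STORAGE_MESSAGE_BLOBS_BUCKET_NAME") or environ.get(
        "STORAGE_MESSAGE_BLOBS_ENDPOINT_URL"
    ):
        storages["message-blobs"] = {
            "BACKEND": "storages.backends.s3.S3Storage",
            "OPTIONS": {
                "endpoint_url": environ.get("STORAGE_MESSAGE_BLOBS_ENDPOINT_URL"),
                "bucket_name": environ.get(
                    "STORAGE_MESSAGE_BLOBS_BUCKET_NAME", "msg-blobs"
                ),
            },
        }
    return storages

assert build_storages({}).get("message-blobs") is None  # disabled by default
```

With this shape, `TieredStorageService.enabled` evaluates to False whenever the blob env vars are left unset, preserving the "empty = disabled" contract.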
---
Duplicate comments:
In `@src/backend/core/services/tiered_storage.py`:
- Around line 74-88: compute_storage_key currently derives the object key only
from the source SHA256 (sha256_bytes) which ignores the chosen
compression/encryption, causing different storage representations to collide;
update key derivation to include the storage representation (e.g., incorporate
the compression string/enum and any representation-specific metadata into the
input used to derive the key) or alternatively ensure upload_blob canonicalizes
the bytes (compress/encrypt) before computing the SHA used by
compute_storage_key so keys reflect the actual stored bytes; refer to
compute_storage_key and upload_blob when making the change so the key always
encodes the compression parameter (or is computed from post-compression bytes).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 59f4d311-6f3d-406c-bdf2-bf1f1057f13c
📒 Files selected for processing (23)
.github/workflows/messages.yml
Makefile
env.d/development/backend.defaults
src/backend/core/admin.py
src/backend/core/enums.py
src/backend/core/management/commands/verify_tiered_storage.py
src/backend/core/migrations/0026_blob_encryption_key_id_blob_storage_location_and_more.py
src/backend/core/models.py
src/backend/core/services/search/search.py
src/backend/core/services/tiered_storage.py
src/backend/core/services/tiered_storage_tasks.py
src/backend/core/signals.py
src/backend/core/tests/commands/__init__.py
src/backend/core/tests/commands/test_verify_tiered_storage.py
src/backend/core/tests/conftest.py
src/backend/core/tests/services/__init__.py
src/backend/core/tests/services/test_tiered_storage.py
src/backend/core/tests/tasks/__init__.py
src/backend/core/tests/tasks/test_task_send_message.py
src/backend/core/tests/tasks/test_tiered_storage_tasks.py
src/backend/core/utils.py
src/backend/messages/celery_app.py
src/backend/messages/settings.py
migrations.AddField(
    model_name='blob',
    name='encryption_key_id',
    field=models.SmallIntegerField(default=0, help_text='Encryption key ID (0=none, >=1=encrypted with keys[key_id-1])', verbose_name='encryption key ID'),
),
Missing index on encryption_key_id will hurt large-scale key-rotation queries.
At the PR’s target scale, filtering blobs by encryption key can become a full-table scan. Please index this field (and ideally add a composite index with storage_location if queries combine both).
🔧 Suggested migration adjustment
migrations.AddField(
model_name='blob',
name='encryption_key_id',
- field=models.SmallIntegerField(default=0, help_text='Encryption key ID (0=none, >=1=encrypted with keys[key_id-1])', verbose_name='encryption key ID'),
+ field=models.SmallIntegerField(
+ default=0,
+ db_index=True,
+ help_text='Encryption key ID (0=none, >=1=encrypted with keys[key_id-1])',
+ verbose_name='encryption key ID',
+ ),
),
+ migrations.AddIndex(
+ model_name='blob',
+ index=models.Index(
+ fields=['storage_location', 'encryption_key_id'],
+ name='core_blob_storage_enc_idx',
+ ),
+ ),
As per coding guidelines, src/backend/**/{models.py,migrations/**/*.py}: "Implement database indexing and query optimization (Model Meta indexes, constraints)".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`@src/backend/core/migrations/0026_blob_encryption_key_id_blob_storage_location_and_more.py`
around lines 13 - 17, The new migrations.AddField for model_name='blob' adding
name='encryption_key_id' should create an indexed column and also add a
composite index with storage_location to support key-rotation queries; update
the migration to set db_index=True on the SmallIntegerField for
encryption_key_id (the migrations.AddField entry) and add a migrations.AddIndex
entry that creates an index on ('encryption_key_id','storage_location') (or
equivalent Index/fields tuple) so queries filtering by encryption_key_id and
storage_location use an index.
# Tiered storage fields
storage_location = models.SmallIntegerField(
    "storage location",
    choices=BlobStorageLocationChoices.choices,
    default=BlobStorageLocationChoices.POSTGRES,
    help_text="Where the blob content is stored",
    db_index=True,
)
encryption_key_id = models.SmallIntegerField(
    "encryption key ID",
    default=0,
    help_text="Encryption key ID (0=none, >=1=encrypted with keys[key_id-1])",
)
Add composite indexes for the new tiered-storage lookups.
These new columns become the hot filter keys for the periodic offload scan (storage_location, created_at, size) and for dedup/orphan checks (sha256, storage_location). At the scale this PR targets, single-column indexes here will still drive very broad scans on messages_blob. Please add matching composite indexes in Blob.Meta and the migration.
Suggested index shape
class Meta:
db_table = "messages_blob"
verbose_name = "blob"
verbose_name_plural = "blobs"
ordering = ["-created_at"]
+ indexes = [
+ models.Index(
+ fields=["storage_location", "created_at", "size"],
+ name="msg_blob_offload_scan",
+ ),
+ models.Index(
+ fields=["sha256", "storage_location"],
+ name="msg_blob_sha_loc",
+ ),
+ ]
        constraints = [
As per coding guidelines, src/backend/**/{models.py,migrations/**/*.py}: Implement database indexing and query optimization (Model Meta indexes, constraints).
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/backend/core/models.py` around lines 2189 - 2201, The new tiered-storage
fields (storage_location, created_at, size, sha256) need composite DB indexes to
avoid broad scans; update the Blob model by adding appropriate Meta.indexes
entries (e.g., an index on ("storage_location","created_at","size") for offload
scans and an index on ("sha256","storage_location") for dedup/orphan checks) and
create a Django migration that adds these same composite indexes to the database
so the ORM and DB stay in sync; reference the Blob class/Blob.Meta and the
existing migration pattern for adding indexes when implementing the change.
"message-blobs": {
    "BACKEND": "storages.backends.s3.S3Storage",
    "OPTIONS": {
        "endpoint_url": values.Value(
            environ_name="STORAGE_MESSAGE_BLOBS_ENDPOINT_URL",
            environ_prefix=None,
        ),
        "bucket_name": values.Value(
            "msg-blobs",
            environ_name="STORAGE_MESSAGE_BLOBS_BUCKET_NAME",
            environ_prefix=None,
        ),
        "access_key": values.Value(
            environ_name="STORAGE_MESSAGE_BLOBS_ACCESS_KEY",
            environ_prefix=None,
        ),
        "secret_key": values.Value(
            environ_name="STORAGE_MESSAGE_BLOBS_SECRET_KEY",
            environ_prefix=None,
        ),
        "region_name": values.Value(
            environ_name="STORAGE_MESSAGE_BLOBS_REGION_NAME",
            environ_prefix=None,
        ),
    },
},
Gate message-blobs registration on real object-storage config.
TieredStorageService.enabled is just bool(settings.STORAGES.get("message-blobs")), so defining this alias unconditionally here makes blob offloading look enabled even when STORAGE_MESSAGE_BLOBS_* is intentionally left empty. That breaks the documented “empty = disabled” contract and can send the offload path into a partially configured S3 backend instead of keeping blobs in Postgres.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/backend/messages/settings.py` around lines 295 - 320, The "message-blobs"
storage entry is always present in settings.STORAGES which makes
TieredStorageService.enabled truthy even when no object-store is configured;
change settings.py so the "message-blobs" key is only added when real
object-storage config is provided (e.g. check relevant env vars such as
STORAGE_MESSAGE_BLOBS_BUCKET_NAME or
STORAGE_MESSAGE_BLOBS_ENDPOINT_URL/ACCESS_KEY/SECRET in os.environ or via
values.Value().environ_name) instead of unconditionally defining the dict;
locate the "message-blobs" dict in src/backend/messages/settings.py and wrap its
creation/assignment in that conditional so
settings.STORAGES.get("message-blobs") is falsy when no blob config is supplied,
preserving the documented "empty = disabled" behavior used by
TieredStorageService.enabled.
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/backend/core/models.py (1)
2077-2102: ⚠️ Potential issue | 🟠 Major — This configurable default is still bypassed by `Mailbox.create_blob()`.
`BlobManager.create_blob()` now uses `compression=None` to mean "read `MESSAGES_BLOB_COMPRESS`", but `src/backend/core/models.py:752-778` still defaults `Mailbox.create_blob(..., compression=CompressionTypeChoices.ZSTD)` and forwards that value unconditionally. Callers using the mailbox helper will never reach this new fallback.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/backend/core/models.py` around lines 2077 - 2102, Mailbox.create_blob currently supplies CompressionTypeChoices.ZSTD (hard default) and forwards it to BlobManager.create_blob which treats compression=None as "use settings.MESSAGES_BLOB_COMPRESS", so callers of Mailbox.create_blob never hit the new configurable default; change Mailbox.create_blob to accept compression: Optional[CompressionTypeChoices] = None (or if the signature must remain, ensure it passes None when the caller didn't explicitly request a compression) and forward that None to BlobManager.create_blob so BlobManager.parse_compression_spec can apply settings.MESSAGES_BLOB_COMPRESS; update references to the Mailbox.create_blob parameter handling to distinguish "unspecified" vs explicit CompressionTypeChoices.ZSTD and only pass ZSTD when explicitly requested.
♻️ Duplicate comments (6)
src/backend/core/services/tiered_storage.py (3)
291-298: ⚠️ Potential issue | 🟠 Major — Propagate delete failures so cleanup can retry them.
`cleanup_orphaned_blob_task()` only retries when `delete_if_orphaned()` raises. Converting storage delete failures into `False` makes the task report `"still_referenced"` on transient backend errors and leaks the orphaned object until manual repair.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/backend/core/services/tiered_storage.py` around lines 291 - 298, The current delete_if_orphaned implementation swallows storage.delete failures by logging and returning False, preventing cleanup_orphaned_blob_task from retrying; change the code so storage.delete failures propagate: in delete_if_orphaned (the block that calls self.compute_storage_key and self.storage.delete) remove or narrow the broad except that returns False and instead log the error and re-raise the exception (or allow it to bubble up) so cleanup_orphaned_blob_task sees the exception and will retry the orphaned object; reference functions: delete_if_orphaned, compute_storage_key, self.storage.delete, and cleanup_orphaned_blob_task.
185-192: ⚠️ Potential issue | 🟠 Major — Don't log full storage keys.
These log lines expose the object path, which embeds the blob SHA and key cohort. That is sensitive storage metadata and should not be written to normal application logs; log `blob.id` and the outcome instead.
Based on learnings, src/backend/**/*.py: Do not log sensitive information (tokens, passwords, financial/health data, PII).
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/backend/core/services/tiered_storage.py` around lines 185 - 192, The debug/info logs currently emit sensitive storage keys/paths (see logger.debug and logger.info around the dedupe/save flow); change these to avoid logging the full storage key or existing_path returned by compute_storage_key_for_blob/storage.save and instead log only non-sensitive identifiers such as blob.id and the outcome (e.g., "deduped" or "uploaded") and, if needed, existing_key_id (only if it is non-sensitive); update the logger.debug that mentions existing_path and the logger.info after storage.save to remove the path variable and replace with a brief outcome message referencing blob.id and the operation result.
100-115: ⚠️ Potential issue | 🔴 Critical — Dedup still ignores the stored representation.
The storage key and sibling lookup are keyed by `sha256`/`key_id`, but the uploaded bytes are `blob.raw_content` after compression. Two blobs with identical source bytes and different `compression` values will reuse the same object and one of them will later read back bytes that do not match its own compression metadata.
Also applies to: 154-193
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/backend/core/services/tiered_storage.py` around lines 100 - 115, compute_storage_key and the sibling lookup currently key on sha256 of the raw bytes and key_id only, but the uploaded object is the compressed/stored representation (blob.raw_content), so different compression values can collide; change the logic to compute the SHA over the actual bytes written (i.e., the compressed/stored bytes) and include the representation/compression identifier in the storage key (and sibling lookup) so keys are unique per stored format; update compute_storage_key to accept the stored-bytes or a representation tag (e.g., compression) along with key_id, and make the sibling lookup code use that same computed key/sha-of-stored-bytes so read/write use the identical namespace.
src/backend/core/services/tiered_storage_tasks.py (1)
91-91: ⚠️ Potential issue | 🟠 Major — Treat the locked re-fetch delete race as `not_found`.
The row can still disappear between the sha256 lookup and the `select_for_update().get(...)` at Line 91. Right now that benign race falls into the catch-all and returns `"error"`. Catch `Blob.DoesNotExist` around the locked lookup and return `{"status": "not_found", "blob_id": blob_id}` instead.
Based on learnings, `logger.exception(...)` in src/backend/**/*.py is automatically reported to Sentry as an event.
Also applies to: 121-123
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/backend/core/services/tiered_storage_tasks.py` at line 91, The selective row-lock fetch using Blob.objects.select_for_update().get(id=blob_id) can raise Blob.DoesNotExist if the row was deleted after the sha256 lookup; wrap that specific call in a try/except catching Blob.DoesNotExist and return {"status": "not_found", "blob_id": blob_id} instead of falling into the generic error path (do the same for the similar select_for_update().get(...) at the block around lines 121-123); ensure you only catch Blob.DoesNotExist (not broad Exception) and avoid using logger.exception for this benign race.
src/backend/messages/settings.py (1)
295-320: ⚠️ Potential issue | 🟠 Major — Only register `message-blobs` when blob storage is really configured.
This alias is always present, so `TieredStorageService.enabled` becomes truthy even when blob-storage env vars are intentionally unset. That pushes offload and verification down a half-configured backend instead of keeping tiered storage disabled.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/backend/messages/settings.py` around lines 295 - 320, The "message-blobs" storage alias is always registered which makes TieredStorageService.enabled truthy even when blob env vars are unset; change the settings population so the "message-blobs" entry is only added when required blob configuration exists (e.g., STORAGE_MESSAGE_BLOBS_BUCKET_NAME or STORAGE_MESSAGE_BLOBS_ENDPOINT_URL is present). In practice, update the code that builds the storage aliases (the dict containing "message-blobs") to conditionally insert that key only if the relevant values.Value keys (access_key/secret_key/bucket_name or region/endpoint) are set/Truthy, ensuring TieredStorageService.enabled accurately reflects a fully configured blob backend.
src/backend/core/models.py (1)
2198-2243: ⚠️ Potential issue | 🟠 Major — Add composite indexes for the new tiered-storage queries.
These fields are now on the hot paths for offload scans and dedup/orphan checks, but `Blob.Meta` still has no matching composite indexes. At the scale described in this PR, single-column indexes will still leave very broad scans on `messages_blob`.
As per coding guidelines, src/backend/**/{models.py,migrations/**/*.py}: Implement database indexing and query optimization (Model Meta indexes, constraints).
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/backend/core/models.py` around lines 2198 - 2243, Add composite DB indexes to the Blob model Meta to support tiered-storage queries: update Blob.Meta to include a list of models.Index entries for the hot-path combinations referencing the actual field names—at minimum add indexes on ("storage_location", "mailbox"), ("storage_location", "maildomain") and ("storage_location", "encryption_key_id") (give each index a descriptive name); this ensures queries in BlobManager and offload/dedup/orphan scans that filter by storage_location plus owner or key use the composite indexes instead of wide single-column scans.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/backend/core/services/tiered_storage_tasks.py`:
- Around line 87-89: The broad except in the task is catching
celery.exceptions.Retry raised by self.retry() (used after sha256_advisory_lock
contention and on transient errors), preventing Celery from requeueing; add an
explicit except celery.exceptions.Retry: raise (or re-raise) immediately before
the existing generic except Exception so Retry can propagate, leaving the rest
of the error handling unchanged — target the block that uses
sha256_advisory_lock and self.retry and the subsequent broad except to insert
this handler.
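The handler-ordering fix can be sketched without Celery; `Retry` below is a stand-in for `celery.exceptions.Retry`, and `run_task` stands in for the task body:

```python
class Retry(Exception):
    """Stand-in for celery.exceptions.Retry."""

def run_task(body):
    """Re-raise Retry before the broad handler so the worker can requeue."""
    try:
        return body()
    except Retry:
        raise  # must escape, or the retry becomes a swallowed 'error' result
    except Exception:  # mirrors the existing broad handler
        return {"status": "error"}

def contended():
    raise Retry()

requeued = False
try:
    run_task(contended)
except Retry:
    requeued = True
```

Ordering matters here: `except Retry: raise` has to appear before `except Exception`, otherwise the broad clause wins and Celery never sees the retry.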
In `@src/backend/core/services/tiered_storage.py`:
- Around line 230-257: rotate_blob() currently calls self.encrypt(...) which
always uses self.active_key_id, so when a caller passes a different
target_key_id the data is encrypted with the wrong key while paths/DB are
updated for the target; fix by either passing target_key_id into the encryption
routine (add an optional parameter to self.encrypt and call
self.encrypt(decrypted, key_id=target_key_id) in both the DB-blob and
OBJECT_STORAGE branches so encryption, compute_storage_key(sha256,
target_key_id) and DB updates use the same key) or, if you intend only to
support the active key, immediately assert target_key_id == self.active_key_id
at the start of rotate_blob() and remove the misleading parameter; update calls
around encrypt, compute_storage_key, and Blob.objects.filter(...).update(...)
accordingly to keep key IDs consistent.
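A key-selection sketch of the first option: an explicit `key_id` wins, otherwise the active key is used, so `rotate_blob(target_key_id=...)` encrypts with the same key its paths and DB updates reference. HMAC stands in for AES-256-GCM so the sketch runs without the `cryptography` package, and the key ring is hypothetical:

```python
import hashlib
import hmac

KEYS = [b"0" * 32, b"1" * 32]  # hypothetical key ring: ids 1 and 2

def encrypt(data: bytes, key_id=None, active_key_id=2):
    """Pin encryption to an explicit key id when the caller provides one;
    fall back to the active key otherwise. Returns (ciphertext, key_id)."""
    kid = key_id if key_id is not None else active_key_id
    tag = hmac.new(KEYS[kid - 1], data, hashlib.sha256).digest()
    return tag + data, kid

_, used = encrypt(b"payload", key_id=1)
assert used == 1  # pinned to the rotation target, not the active key
_, used = encrypt(b"payload")
assert used == 2  # default path still uses the active key
```

Both branches of `rotate_blob()` would then call `encrypt(decrypted, key_id=target_key_id)` so the ciphertext, `compute_storage_key(sha256, target_key_id)`, and the DB update all agree on the key id.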
In `@src/backend/core/tests/commands/test_verify_tiered_storage.py`:
- Around line 322-619: The tests that call the management command with
re_encrypt=True mutate the mocked TieredStorageService but fail to patch the
settings that re_encrypt_blobs() reads (MESSAGES_BLOB_ENCRYPTION_KEYS /
MESSAGES_BLOB_ENCRYPTION_ACTIVE_KEY_ID), so the command uses ambient Django
settings instead of the test scenario; fix each re_encrypt=True test by patching
the same settings module that re_encrypt_blobs() uses (e.g., patch
"core.services.tiered_storage.settings" or django.conf.settings) and set
MESSAGES_BLOB_ENCRYPTION_KEYS and MESSAGES_BLOB_ENCRYPTION_ACTIVE_KEY_ID to
match the service.encryption_keys and service.active_key_id before you patch
"core.management.commands.verify_tiered_storage.TieredStorageService", ensuring
the command and the mocked TieredStorageService see the same encryption config.
In `@src/backend/core/tests/tasks/test_tiered_storage_tasks.py`:
- Around line 61-159: The age-based offload tests fail when
TIERED_STORAGE_OFFLOAD_MIN_SIZE > 0 because offload_blobs_task filters by
size__gte; update the tests (test_queues_eligible_blobs_by_age and
test_immediate_offload_with_zero_days) to ensure created blobs meet the size
threshold (e.g., make content length >=
settings.TIERED_STORAGE_OFFLOAD_MIN_SIZE) or apply
override_settings(TIERED_STORAGE_OFFLOAD_MIN_SIZE=0) for those tests so the
age/immediacy logic is exercised; reference the tests by name and the
offload_blobs_task and TieredStorageService.compute_storage_key_for_blob symbols
when making the change.
---
Outside diff comments:
In `@src/backend/core/models.py`:
- Around line 2077-2102: Mailbox.create_blob currently supplies
CompressionTypeChoices.ZSTD (hard default) and forwards it to
BlobManager.create_blob which treats compression=None as "use
settings.MESSAGES_BLOB_COMPRESS", so callers of Mailbox.create_blob never hit
the new configurable default; change Mailbox.create_blob to accept compression:
Optional[CompressionTypeChoices] = None (or if the signature must remain, ensure
it passes None when the caller didn't explicitly request a compression) and
forward that None to BlobManager.create_blob so
BlobManager.parse_compression_spec can apply settings.MESSAGES_BLOB_COMPRESS;
update references to the Mailbox.create_blob parameter handling to distinguish
"unspecified" vs explicit CompressionTypeChoices.ZSTD and only pass ZSTD when
explicitly requested.
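The "unspecified vs explicit" distinction can be sketched as below. `MESSAGES_BLOB_COMPRESS` stands in for the Django setting and both function names are hypothetical, mirroring the `Mailbox.create_blob` → `BlobManager.create_blob` call chain:

```python
from typing import Optional

MESSAGES_BLOB_COMPRESS = "zstd"  # stand-in for the configured default


def manager_create_blob(content: bytes, compression: Optional[str] = None) -> dict:
    # None means "apply the configured default" (what parse_compression_spec
    # does in the real manager); an explicit value is honored as-is.
    effective = MESSAGES_BLOB_COMPRESS if compression is None else compression
    return {"content": content, "compression": effective}


def mailbox_create_blob(content: bytes, compression: Optional[str] = None) -> dict:
    # Forward None untouched instead of hard-coding ZSTD at this layer.
    return manager_create_blob(content, compression=compression)
```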
---
Duplicate comments:
In `@src/backend/core/models.py`:
- Around line 2198-2243: Add composite DB indexes to the Blob model Meta to
support tiered-storage queries: update Blob.Meta to include a list of
models.Index entries for the hot-path combinations referencing the actual field
names—at minimum add indexes on ("storage_location", "mailbox"),
("storage_location", "maildomain") and ("storage_location", "encryption_key_id")
(give each index a descriptive name); this ensures queries in BlobManager and
offload/dedup/orphan scans that filter by storage_location plus owner or key use
the composite indexes instead of wide single-column scans.
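One possible shape for that `Meta`, as a sketch only (index names are illustrative, and the field list must match the actual model):

```python
class Blob(models.Model):
    # ... existing fields ...

    class Meta:
        indexes = [
            models.Index(
                fields=["storage_location", "mailbox"],
                name="blob_loc_mailbox_idx",
            ),
            models.Index(
                fields=["storage_location", "maildomain"],
                name="blob_loc_maildomain_idx",
            ),
            models.Index(
                fields=["storage_location", "encryption_key_id"],
                name="blob_loc_keyid_idx",
            ),
        ]
```

Adding these requires a migration; `makemigrations` will generate the `AddIndex` operations.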
In `@src/backend/core/services/tiered_storage_tasks.py`:
- Line 91: The selective row-lock fetch using
Blob.objects.select_for_update().get(id=blob_id) can raise Blob.DoesNotExist if
the row was deleted after the sha256 lookup; wrap that specific call in a
try/except catching Blob.DoesNotExist and return {"status": "not_found",
"blob_id": blob_id} instead of falling into the generic error path (do the same
for the similar select_for_update().get(...) at the block around lines 121-123);
ensure you only catch Blob.DoesNotExist (not broad Exception) and avoid using
logger.exception for this benign race.
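The benign-race handling reduces to the pattern below; `DoesNotExist` stands in for `Blob.DoesNotExist` and `fetch` for the `select_for_update().get(id=blob_id)` call:

```python
class DoesNotExist(Exception):
    """Stand-in for Blob.DoesNotExist."""


def lock_blob(blob_id, fetch):
    try:
        blob = fetch(blob_id)
    except DoesNotExist:
        # Row deleted between the sha256 lookup and the lock: benign race,
        # so return a status instead of falling into the error path.
        return {"status": "not_found", "blob_id": blob_id}
    return {"status": "locked", "blob": blob}
```

Catching only the specific exception keeps real failures (connection errors, etc.) on the existing error path.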
In `@src/backend/core/services/tiered_storage.py`:
- Around line 291-298: The current delete_if_orphaned implementation swallows
storage.delete failures by logging and returning False, preventing
cleanup_orphaned_blob_task from retrying; change the code so storage.delete
failures propagate: in delete_if_orphaned (the block that calls
self.compute_storage_key and self.storage.delete) remove or narrow the broad
except that returns False and instead log the error and re-raise the exception
(or allow it to bubble up) so cleanup_orphaned_blob_task sees the exception and
will retry the orphaned object; reference functions: delete_if_orphaned,
compute_storage_key, self.storage.delete, and cleanup_orphaned_blob_task.
- Around line 185-192: The debug/info logs currently emit sensitive storage
keys/paths (see logger.debug and logger.info around the dedupe/save flow);
change these to avoid logging the full storage key or existing_path returned by
compute_storage_key_for_blob/storage.save and instead log only non-sensitive
identifiers such as blob.id and the outcome (e.g., "deduped" or "uploaded") and,
if needed, existing_key_id (only if it is non-sensitive); update the
logger.debug that mentions existing_path and the logger.info after storage.save
to remove the path variable and replace with a brief outcome message referencing
blob.id and the operation result.
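The safe logging shape might look like this (function name hypothetical; the point is that only the blob id and the outcome appear in the message, never the storage key):

```python
import logging

logger = logging.getLogger("tiered_storage")


def log_upload_outcome(blob_id, deduped: bool) -> str:
    # Log non-sensitive identifiers and the outcome only; never the
    # path returned by compute_storage_key_for_blob or storage.save.
    outcome = "deduped" if deduped else "uploaded"
    message = f"blob {blob_id} {outcome} to object storage"
    logger.info(message)
    return message
```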
- Around line 100-115: compute_storage_key and the sibling lookup currently key
on sha256 of the raw bytes and key_id only, but the uploaded object is the
compressed/stored representation (blob.raw_content), so different compression
values can collide; change the logic to compute the SHA over the actual bytes
written (i.e., the compressed/stored bytes) and include the
representation/compression identifier in the storage key (and sibling lookup) so
keys are unique per stored format; update compute_storage_key to accept the
stored-bytes or a representation tag (e.g., compression) along with key_id, and
make the sibling lookup code use that same computed key/sha-of-stored-bytes so
read/write use the identical namespace.
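One way to realize this, extending the existing `blobs/{key_id}/{shard}/{sha}` layout with a compression segment (the exact layout is an assumption, not the shipped scheme):

```python
import hashlib


def compute_storage_key(stored_bytes: bytes, key_id: int, compression: str) -> str:
    # Hash the bytes actually written (the compressed representation),
    # and namespace by key id *and* compression, so the same raw content
    # stored under different compressions can never collide.
    sha = hashlib.sha256(stored_bytes).hexdigest()
    return f"blobs/{key_id}/{compression}/{sha[:2]}/{sha}"
```

The sibling/dedupe lookup must then hash the same stored bytes and pass the same compression tag, so reads and writes address the identical namespace.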
In `@src/backend/messages/settings.py`:
- Around line 295-320: The "message-blobs" storage alias is always registered
which makes TieredStorageService.enabled truthy even when blob env vars are
unset; change the settings population so the "message-blobs" entry is only added
when required blob configuration exists (e.g., STORAGE_MESSAGE_BLOBS_BUCKET_NAME
or STORAGE_MESSAGE_BLOBS_ENDPOINT_URL is present). In practice, update the code
that builds the storage aliases (the dict containing "message-blobs") to
conditionally insert that key only if the relevant values.Value keys
(access_key/secret_key/bucket_name or region/endpoint) are set/Truthy, ensuring
TieredStorageService.enabled accurately reflects a fully configured blob
backend.
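A sketch of the conditional registration (argument names are illustrative; the `S3Storage` backend path assumes django-storages, and the real settings module reads these from `values.Value` entries):

```python
def build_storages(blob_bucket_name=None, blob_endpoint_url=None) -> dict:
    """Register "message-blobs" only when blob storage is configured."""
    storages = {
        "default": {"BACKEND": "django.core.files.storage.FileSystemStorage"},
    }
    if blob_bucket_name or blob_endpoint_url:
        storages["message-blobs"] = {
            "BACKEND": "storages.backends.s3.S3Storage",
            "OPTIONS": {
                "bucket_name": blob_bucket_name,
                "endpoint_url": blob_endpoint_url,
            },
        }
    return storages
```

With this shape, `TieredStorageService.enabled` (which keys off the alias's presence) is truthy only for a fully configured backend.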
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 9ce7b1d7-830c-4186-bee0-8fdf30b9a6db
📒 Files selected for processing (13)
- env.d/development/backend.defaults
- src/backend/core/apps.py
- src/backend/core/checks.py
- src/backend/core/enums.py
- src/backend/core/management/commands/verify_tiered_storage.py
- src/backend/core/models.py
- src/backend/core/services/tiered_storage.py
- src/backend/core/services/tiered_storage_tasks.py
- src/backend/core/signals.py
- src/backend/core/tests/commands/test_verify_tiered_storage.py
- src/backend/core/tests/services/test_tiered_storage.py
- src/backend/core/tests/tasks/test_tiered_storage_tasks.py
- src/backend/messages/settings.py
```python
def test_re_encrypt_no_keys_configured(self):
    """Test that re-encrypt fails when no keys are configured."""
    from unittest.mock import patch

    stdout = StringIO()
    stderr = StringIO()

    with patch("core.services.tiered_storage.settings") as mock_settings:
        mock_settings.STORAGES = {"message-blobs": {"OPTIONS": {}}}
        mock_settings.MESSAGES_BLOB_ENCRYPTION_KEYS = {}
        mock_settings.MESSAGES_BLOB_ENCRYPTION_ACTIVE_KEY_ID = 0

        call_command(
            "verify_tiered_storage",
            re_encrypt=True,
            stdout=stdout,
            stderr=stderr,
        )

    assert "No encryption keys configured" in stderr.getvalue()


def test_all_blobs_already_current_key(self):
    """Test that re-encrypt reports success when all blobs use current key."""
    key = secrets.token_hex(32)
    service = TieredStorageService()
    service.encryption_keys = {"1": key}
    service.active_key_id = 1

    mailbox = factories.MailboxFactory()
    blob = mailbox.create_blob(content=b"test", content_type="text/plain")

    # Manually encrypt with key 1
    compressed = bytes(blob.raw_content)
    encrypted, key_id = service.encrypt(compressed)
    blob.raw_content = encrypted
    blob.encryption_key_id = key_id
    blob.save()

    stdout = StringIO()
    stderr = StringIO()

    # Temporarily modify service in command
    from unittest.mock import patch

    with patch(
        "core.management.commands.verify_tiered_storage.TieredStorageService"
    ) as mock_svc_class:
        mock_svc_class.return_value = service

        call_command(
            "verify_tiered_storage",
            re_encrypt=True,
            stdout=stdout,
            stderr=stderr,
        )

    assert "All blobs already use the current encryption key" in stdout.getvalue()


def test_re_encrypt_postgres_blob(self):
    """Test re-encrypting a PostgreSQL blob with real encryption."""
    import pyzstd

    service = TieredStorageService()
    old_key = secrets.token_hex(32)
    new_key = secrets.token_hex(32)

    mailbox = factories.MailboxFactory()
    original_content = b"test content for re-encryption" * 20

    # Create blob and encrypt with old key (key_id=2)
    blob = mailbox.create_blob(content=original_content, content_type="text/plain")
    compressed = bytes(blob.raw_content)

    service.encryption_keys = {"2": old_key}
    service.active_key_id = 2
    encrypted, key_id = service.encrypt(compressed)
    blob.raw_content = encrypted
    blob.encryption_key_id = key_id
    blob.save()

    # Now configure service for key rotation (new key is "1", old is "2")
    service.encryption_keys = {"1": new_key, "2": old_key}
    service.active_key_id = 1

    stdout = StringIO()
    stderr = StringIO()

    from unittest.mock import patch

    with patch(
        "core.management.commands.verify_tiered_storage.TieredStorageService"
    ) as mock_svc_class:
        mock_svc_class.return_value = service

        call_command(
            "verify_tiered_storage",
            re_encrypt=True,
            stdout=stdout,
            stderr=stderr,
        )

    output = stdout.getvalue()
    assert "Re-encrypted" in output
    assert "Re-encrypted: 1" in output

    # Verify blob was updated
    blob.refresh_from_db()
    assert blob.encryption_key_id == 1

    # Verify content is still readable
    decrypted = service.decrypt(bytes(blob.raw_content), blob.encryption_key_id)
    assert pyzstd.decompress(decrypted) == original_content


@pytest.mark.django_db(transaction=True)
def test_re_encrypt_object_storage_blob(self):
    """Test re-encrypting an object storage blob with real encryption."""
    import pyzstd

    service = TieredStorageService()
    old_key = secrets.token_hex(32)
    new_key = secrets.token_hex(32)

    mailbox = factories.MailboxFactory()
    original_content = b"test content for object storage re-encryption" * 20

    # Create blob and encrypt with old key (key_id=2)
    blob = mailbox.create_blob(content=original_content, content_type="text/plain")
    compressed = bytes(blob.raw_content)

    service.encryption_keys = {"2": old_key}
    service.active_key_id = 2
    encrypted, key_id = service.encrypt(compressed)
    blob.raw_content = encrypted
    blob.encryption_key_id = key_id
    blob.save()

    old_path = TieredStorageService.compute_storage_key_for_blob(blob)
    new_path = TieredStorageService.compute_storage_key(bytes(blob.sha256), 1)

    try:
        # Upload to storage
        service.upload_blob(blob)
        blob.storage_location = BlobStorageLocationChoices.OBJECT_STORAGE
        blob.raw_content = None
        blob.save()

        # Configure for key rotation
        service.encryption_keys = {"1": new_key, "2": old_key}
        service.active_key_id = 1

        stdout = StringIO()
        stderr = StringIO()

        from unittest.mock import patch

        with patch(
            "core.management.commands.verify_tiered_storage.TieredStorageService"
        ) as mock_svc_class:
            mock_svc_class.return_value = service

            call_command(
                "verify_tiered_storage",
                re_encrypt=True,
                stdout=stdout,
                stderr=stderr,
            )

        output = stdout.getvalue()
        assert "Re-encrypted: 1" in output

        # Verify blob was updated
        blob.refresh_from_db()
        assert blob.encryption_key_id == 1

        # Verify content moved from old path to new path
        assert not service.storage.exists(old_path)
        assert service.storage.exists(new_path)
        downloaded = service.download_blob(blob)
        assert pyzstd.decompress(downloaded) == original_content
    finally:
        for k in (old_path, new_path):
            if service.storage.exists(k):
                service.storage.delete(k)


def test_dry_run(self):
    """Test that --dry-run shows what would be done without changes."""
    service = TieredStorageService()
    key = secrets.token_hex(32)
    service.encryption_keys = {"1": key}
    service.active_key_id = 1

    mailbox = factories.MailboxFactory()
    blob = mailbox.create_blob(content=b"test", content_type="text/plain")
    # key_id=0 means unencrypted, needs re-encryption
    blob.encryption_key_id = 0
    blob.save()

    stdout = StringIO()
    stderr = StringIO()

    from unittest.mock import patch

    with patch(
        "core.management.commands.verify_tiered_storage.TieredStorageService"
    ) as mock_svc_class:
        mock_svc_class.return_value = service

        call_command(
            "verify_tiered_storage",
            re_encrypt=True,
            dry_run=True,
            stdout=stdout,
            stderr=stderr,
        )

    output = stdout.getvalue()
    assert "DRY RUN" in output
    assert "Would re-encrypt" in output

    # Verify blob was NOT modified
    blob.refresh_from_db()
    assert blob.encryption_key_id == 0


def test_re_encrypt_with_limit(self):
    """Test that --limit restricts number of blobs re-encrypted."""
    service = TieredStorageService()
    key = secrets.token_hex(32)
    service.encryption_keys = {"1": key}
    service.active_key_id = 1

    mailbox = factories.MailboxFactory()

    # Create 3 blobs with key_id=0
    for i in range(3):
        blob = mailbox.create_blob(
            content=f"test content {i}".encode(),
            content_type="text/plain",
        )
        blob.encryption_key_id = 0
        blob.save()

    stdout = StringIO()
    stderr = StringIO()

    from unittest.mock import patch

    with patch(
        "core.management.commands.verify_tiered_storage.TieredStorageService"
    ) as mock_svc_class:
        mock_svc_class.return_value = service

        call_command(
            "verify_tiered_storage",
            re_encrypt=True,
            limit=2,
            stdout=stdout,
            stderr=stderr,
        )

    output = stdout.getvalue()
    assert "Blobs to re-encrypt: 2" in output


def test_re_encrypt_skips_blob_without_content(self):
    """Test that re-encrypt skips PostgreSQL blobs with no content."""
    service = TieredStorageService()
    key = secrets.token_hex(32)
    service.encryption_keys = {"1": key}
    service.active_key_id = 1

    mailbox = factories.MailboxFactory()
    blob = mailbox.create_blob(content=b"test", content_type="text/plain")
    blob.encryption_key_id = 0
    blob.raw_content = None  # Simulate missing content
    blob.save()

    stdout = StringIO()
    stderr = StringIO()

    from unittest.mock import patch

    with patch(
        "core.management.commands.verify_tiered_storage.TieredStorageService"
    ) as mock_svc_class:
        mock_svc_class.return_value = service

        call_command(
            "verify_tiered_storage",
            re_encrypt=True,
            stdout=stdout,
            stderr=stderr,
        )

    output = stdout.getvalue()
    assert "Skipped: 1" in output
    # Blob row left unchanged.
    blob.refresh_from_db()
    assert blob.encryption_key_id == 0
    assert blob.raw_content is None
```
The --re-encrypt tests are configuring the wrong source of truth.
re_encrypt_blobs() branches on django.conf.settings.MESSAGES_BLOB_ENCRYPTION_* before it uses the mocked TieredStorageService. In this block, the tests mostly mutate the returned service instance but never override those Django settings, so they depend on ambient test settings instead of the scenario each test is trying to cover.
🛠️ Suggested pattern

```diff
-        with patch(
-            "core.management.commands.verify_tiered_storage.TieredStorageService"
-        ) as mock_svc_class:
+        with override_settings(
+            MESSAGES_BLOB_ENCRYPTION_KEYS={"1": key},
+            MESSAGES_BLOB_ENCRYPTION_ACTIVE_KEY_ID=1,
+        ), patch(
+            "core.management.commands.verify_tiered_storage.TieredStorageService"
+        ) as mock_svc_class:
             mock_svc_class.return_value = service
```

Apply the same idea to the other `re_encrypt=True` cases so the command and the mocked service see the same encryption configuration.
```python
def test_queues_eligible_blobs_by_age(self):
    """Test that task queues blobs older than cutoff date."""
    mailbox = factories.MailboxFactory()

    # Create an old blob (should be queued)
    old_blob = mailbox.create_blob(
        content=b"old content", content_type="text/plain"
    )
    Blob.objects.filter(id=old_blob.id).update(
        created_at=now()
        - timedelta(days=settings.TIERED_STORAGE_OFFLOAD_AFTER_DAYS + 1)
    )

    # Create a new blob (should not be queued)
    new_blob = mailbox.create_blob(
        content=b"new content", content_type="text/plain"
    )

    # Mock the delay call to track what gets queued
    queued_ids = []
    with patch.object(
        offload_single_blob_task,
        "delay",
        side_effect=queued_ids.append,
    ):
        result = offload_blobs_task()

    assert result["status"] == "success"
    assert str(old_blob.id) in queued_ids
    assert str(new_blob.id) not in queued_ids


def test_queues_eligible_blobs_by_size(self):
    """Test that task respects minimum size threshold."""
    mailbox = factories.MailboxFactory()

    # Create a small blob (may or may not be queued depending on OFFLOAD_MIN_SIZE)
    small_blob = mailbox.create_blob(content=b"small", content_type="text/plain")
    Blob.objects.filter(id=small_blob.id).update(
        created_at=now()
        - timedelta(days=settings.TIERED_STORAGE_OFFLOAD_AFTER_DAYS + 1)
    )

    # Create a large blob (should be queued if old enough)
    large_content = b"x" * (settings.TIERED_STORAGE_OFFLOAD_MIN_SIZE + 1000)
    large_blob = mailbox.create_blob(
        content=large_content, content_type="text/plain"
    )
    Blob.objects.filter(id=large_blob.id).update(
        created_at=now()
        - timedelta(days=settings.TIERED_STORAGE_OFFLOAD_AFTER_DAYS + 1)
    )

    queued_ids = []
    with patch.object(
        offload_single_blob_task,
        "delay",
        side_effect=queued_ids.append,
    ):
        result = offload_blobs_task()

    assert result["status"] == "success"
    assert str(large_blob.id) in queued_ids

    # Small blob should only be queued if min_size is 0
    if settings.TIERED_STORAGE_OFFLOAD_MIN_SIZE > 0:
        assert str(small_blob.id) not in queued_ids


@override_settings(TIERED_STORAGE_OFFLOAD_AFTER_DAYS=0)
def test_immediate_offload_with_zero_days(self):
    """Test that OFFLOAD_AFTER_DAYS=0 offloads blobs immediately.

    With CELERY_TASK_ALWAYS_EAGER=True (test settings), calling .delay()
    executes the task synchronously, simulating a worker running alongside.
    """
    service = TieredStorageService()
    mailbox = factories.MailboxFactory()
    content = b"immediate offload test content" * 20
    blob = mailbox.create_blob(content=content, content_type="text/plain")
    storage_key = TieredStorageService.compute_storage_key_for_blob(blob)

    try:
        # No age manipulation - blob was just created
        result = offload_blobs_task()

        assert result["status"] == "success"
        assert result["queued"] == 1

        # With CELERY_TASK_ALWAYS_EAGER, the single blob task already ran
        blob.refresh_from_db()
        assert blob.storage_location == BlobStorageLocationChoices.OBJECT_STORAGE
        assert blob.raw_content is None

        # Verify content is still accessible from S3
        retrieved = blob.get_content()
        assert retrieved == content
    finally:
        if service.storage.exists(storage_key):
            service.storage.delete(storage_key)
```
These age-based offload tests still depend on the size filter.
offload_blobs_task() also requires size__gte=settings.TIERED_STORAGE_OFFLOAD_MIN_SIZE. Here the payloads are small, so the tests can fail without ever exercising the age/immediacy logic when the default min size is non-zero.
🛠️ Suggested fix

```diff
+    @override_settings(TIERED_STORAGE_OFFLOAD_MIN_SIZE=0)
     def test_queues_eligible_blobs_by_age(self):
         """Test that task queues blobs older than cutoff date."""

-    @override_settings(TIERED_STORAGE_OFFLOAD_AFTER_DAYS=0)
+    @override_settings(
+        TIERED_STORAGE_OFFLOAD_AFTER_DAYS=0,
+        TIERED_STORAGE_OFFLOAD_MIN_SIZE=0,
+    )
     def test_immediate_offload_with_zero_days(self):
```
def test_immediate_offload_with_zero_days(self):🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/backend/core/tests/tasks/test_tiered_storage_tasks.py` around lines 61 -
159, The age-based offload tests fail when TIERED_STORAGE_OFFLOAD_MIN_SIZE > 0
because offload_blobs_task filters by size__gte; update the tests
(test_queues_eligible_blobs_by_age and test_immediate_offload_with_zero_days) to
ensure created blobs meet the size threshold (e.g., make content length >=
settings.TIERED_STORAGE_OFFLOAD_MIN_SIZE) or apply
override_settings(TIERED_STORAGE_OFFLOAD_MIN_SIZE=0) for those tests so the
age/immediacy logic is exercised; reference the tests by name and the
offload_blobs_task and TieredStorageService.compute_storage_key_for_blob symbols
when making the change.
This allows using S3-compatible object storage to offload blobs, making Postgres much lighter. The design targets storing ~1B emails on a single instance.
…f sensitive information' Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
> Blobs (raw RFC822 email bodies and attachments) live in PostgreSQL by
> default. Once a blob is older than `TIERED_STORAGE_OFFLOAD_AFTER_DAYS`,
> a periodic celery task moves its bytes to S3 and clears the PG row's
> `raw_content`. Reads transparently fetch from whichever location the
> row points at — application code only ever calls `blob.get_content()`.
Have you considered moving the message body onto the object store while keeping the headers around for longer?
Loading an email list is generally performance critical (no object-store hit and no heavy mail parsing), whereas it is fine to keep headers in the DB much longer (months?).
> - **Deduplication**: blobs sharing the same SHA-256 share the same S3
>   object. The DB is the source of truth; the existence check on S3 is
>   a defensive guard against external deletions.
Nice, but how is this GCed if data is deleted?
This allows using S3-compatible object storage to offload blobs, making Postgres much lighter. The design targets storing ~1B emails on a single instance.
Fixes #185.
Summary by CodeRabbit

- New Features
- Admin
- Chores
- Tests