Skip to content

fix: TTL management task no longer crashes on missing file records#9804

Draft
jcaille wants to merge 4 commits intoonyx-dot-app:mainfrom
jcaille:fix/ttl-skip-missing-file-records
Draft

fix: TTL management task no longer crashes on missing file records#9804
jcaille wants to merge 4 commits intoonyx-dot-app:mainfrom
jcaille:fix/ttl-skip-missing-file-records

Conversation

@jcaille
Copy link
Copy Markdown

@jcaille jcaille commented Mar 31, 2026

Description

The TTL cleanup task would crash with a RuntimeError when trying to delete a file record that no longer exists, blocking cleanup of all remaining sessions.

2026-03-30 18:31:16.948infoTraceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/celery/app/trace.py", line 453, in trace_task
    R = retval = fun(*args, **kwargs)
                 ^^^^^^^^^^^^^^^^^^^^
  File "/app/onyx/background/celery/apps/app_base.py", line 87, in __call__
    return super().__call__(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/celery/app/trace.py", line 736, in __protected_call__
    return self.run(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/ee/onyx/background/celery/tasks/ttl_management/tasks.py", line 60, in perform_ttl_management_task
    delete_chat_session(
  File "/app/onyx/db/chat.py", line 322, in delete_chat_session
    delete_messages_and_files_from_chat_session(chat_session_id, db_session)
  File "/app/onyx/db/chat.py", line 169, in delete_messages_and_files_from_chat_session
    file_store.delete_file(file_id=file_info.get("id"))
  File "/app/onyx/file_store/file_store.py", line 459, in delete_file
    file_record = get_filerecord_by_file_id(
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/onyx/db/file_record.py", line 42, in get_filerecord_by_file_id
    raise RuntimeError(f"File by id {file_id} does not exist or was deleted")

Two fixes proposed:

  1. Wrap file deletion in delete_messages_and_files_from_chat_session with try/except — a missing file is already the desired state, so log a warning and continue.

  2. Add per-session error handling in the TTL task loop. The existing comment said "one session per delete so that we don't blow up" but the outer try/except still aborted on the first failure. Now each session deletion is individually wrapped so failures don't block cleanup of subsequent sessions.

Additional Options

  • [Optional] Please cherry-pick this PR to the latest release version.
  • [Optional] Override Linear Check

Summary by cubic

Prevents TTL cleanup from crashing on missing file records by skipping those deletes and isolating per-session errors. Cleanup continues for all sessions and the task reports failure only if any deletion fails.

  • Bug Fixes

    • In delete_messages_and_files_from_chat_session, wrap file deletes in try/except; log a warning and continue on missing files.
    • In perform_ttl_management_task, handle each session in try/except, count failures, and set success to failures == 0.
    • Added unit tests for chat deletion and TTL task resilience.
  • Refactors

    • Aligned test conventions (helper extraction, unused imports removed, consistent naming).

Written for commit 569fef2. Summary will update on new commits.

Jean Caillé and others added 4 commits March 30, 2026 19:25
The TTL cleanup task would crash with a RuntimeError when trying to
delete a file record that no longer exists, blocking cleanup of all
remaining sessions. Two fixes:

1. Wrap file deletion in delete_messages_and_files_from_chat_session
   with try/except — a missing file is already the desired state, so
   log a warning and continue.

2. Add per-session error handling in the TTL task loop. The existing
   comment said "one session per delete so that we don't blow up" but
   the outer try/except still aborted on the first failure. Now each
   session deletion is individually wrapped so failures don't block
   cleanup of subsequent sessions.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- test_chat_deletion: verifies delete_messages_and_files_from_chat_session
  continues when a file record is missing, and handles messages with no files
- test_ttl_task: verifies perform_ttl_management_task continues deleting
  remaining sessions after one fails, and reports correct success status

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Prefix unused mock params with _ and type as Any
- Extract shared db session mock setup into helper
- Remove unused pytest import
- Use module path constant to reduce repetition
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants