Skip to content

feat: filestore delete missing error#9878

Open
evan-onyx wants to merge 2 commits intomainfrom
feat/filestore-delete-missing-error
Open

feat: filestore delete missing error#9878
evan-onyx wants to merge 2 commits intomainfrom
feat/filestore-delete-missing-error

Conversation

@evan-onyx
Copy link
Copy Markdown
Contributor

@evan-onyx evan-onyx commented Apr 2, 2026

Description

No longer error on filestore delete when file is missing during ttl cleanup task (since we want the file to be gone)

How Has This Been Tested?

new tests

Additional Options

  • [Optional] Please cherry-pick this PR to the latest release version.
  • [Optional] Override Linear Check

Summary by cubic

Make file deletes tolerant when the file is already gone during TTL cleanup and chat purges. Adds an error_on_missing flag to FileStore.delete_file and makes the TTL task continue on per-session failures.

  • Bug Fixes

    • Added error_on_missing to FileStore.delete_file; TTL and chat cleanup call it with False so missing files don’t fail the job.
    • S3BackedFileStore and PostgresBackedFileStore honor the flag and skip raises when the file/content is missing; new unit tests cover this.
    • TTL management now wraps each session delete in a try/except and logs, then continues.
  • Migration

    • If you maintain a custom FileStore, update delete_file(self, file_id: str, error_on_missing: bool = True, db_session: Session | None = None) to match the new signature.

Written for commit 1a43a97. Summary will update on new commits.

@evan-onyx evan-onyx requested a review from a team as a code owner April 2, 2026 23:16
@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps bot commented Apr 2, 2026

Greptile Summary

This PR makes file deletion more resilient during TTL-driven chat session cleanup by introducing an error_on_missing parameter on the FileStore.delete_file interface and using it in delete_messages_and_files_from_chat_session. It also wraps each individual delete_chat_session call in its own try/except so one session failure doesn't abort the entire TTL sweep.

Key changes:

  • FileStore.delete_file (abstract + both concrete backends) gains error_on_missing: bool = True — when False, a missing file record silently returns instead of raising
  • delete_messages_and_files_from_chat_session in chat.py passes error_on_missing=False so already-deleted files don't surface as errors during cleanup
  • TTL management task wraps each delete_chat_session in its own try/except, logging and continuing on failure
  • New unit tests cover the raise-by-default and silent-return paths for both S3 and Postgres backends

Confidence Score: 5/5

Safe to merge; the one remaining finding is a misleading log message in an edge-case error path, not a correctness issue.

All changes are targeted, well-tested, and backward-compatible (error_on_missing defaults to True). The only finding is a P2 style issue: the outer except in tasks.py has a stale log message after the inner try/except was added. No data integrity, security, or runtime correctness issues were found.

backend/ee/onyx/background/celery/tasks/ttl_management/tasks.py — outer except log message should be updated to reflect what it actually catches now.

Important Files Changed

Filename Overview
backend/ee/onyx/background/celery/tasks/ttl_management/tasks.py Added per-session try/except so one failed deletion doesn't abort the whole TTL task; outer except log message is now misleading since it can't be reached by delete_chat_session errors.
backend/onyx/db/chat.py Passes error_on_missing=False to file_store.delete_file so already-deleted files during TTL cleanup don't surface as errors.
backend/onyx/file_store/file_store.py Adds error_on_missing parameter to the FileStore abstract API and S3BackedFileStore implementation; switches to the _optional variant of the record lookup and handles None explicitly.
backend/onyx/file_store/postgres_file_store.py Mirrors S3 changes in PostgresBackedFileStore: adds error_on_missing, uses get_file_content_by_file_id_optional, and returns silently when appropriate.
backend/tests/unit/onyx/file_store/test_delete_file.py New unit tests covering both raise-by-default and silent-return behaviours for S3 and Postgres backends; mocking is correct.

Sequence Diagram

sequenceDiagram
    participant T as TTL Task
    participant DB as DB (chat.py)
    participant FS as FileStore

    T->>DB: get_chat_sessions_older_than()
    DB-->>T: [(user_id, session_id), ...]

    loop Each session (individual try/except)
        T->>DB: delete_chat_session(hard_delete=True)
        DB->>FS: delete_file(error_on_missing=False)
        alt File exists
            FS-->>DB: deleted OK
        else File already gone
            FS-->>DB: silent return (no error)
        end
        DB-->>T: session deleted
        note over T: Exception → log & continue
    end
Loading

Comments Outside Diff (1)

  1. backend/ee/onyx/background/celery/tasks/ttl_management/tasks.py, line 58-62 (link)

    P2 Misleading outer-except log message after refactoring

    After the inner try/except was added to catch every delete_chat_session failure, the outer except Exception block can no longer be reached by a delete_chat_session error. It will only fire if get_chat_sessions_older_than itself raises (or the for-loop machinery fails). In that case user_id and session_id are still None (their initial values), so the log line will always read user_id=None session_id=None and the message still says "delete_chat_session exceptioned", which is inaccurate.

    Consider updating the message to reflect what can actually go wrong here:

    Prompt To Fix With AI
    This is a comment left during a code review.
    Path: backend/ee/onyx/background/celery/tasks/ttl_management/tasks.py
    Line: 58-62
    
    Comment:
    **Misleading outer-except log message after refactoring**
    
    After the inner `try/except` was added to catch every `delete_chat_session` failure, the outer `except Exception` block can no longer be reached by a `delete_chat_session` error. It will only fire if `get_chat_sessions_older_than` itself raises (or the for-loop machinery fails). In that case `user_id` and `session_id` are still `None` (their initial values), so the log line will always read `user_id=None session_id=None` and the message still says "delete_chat_session exceptioned", which is inaccurate.
    
    Consider updating the message to reflect what can actually go wrong here:
    
    
    
    How can I resolve this? If you propose a fix, please make it concise.
Prompt To Fix All With AI
This is a comment left during a code review.
Path: backend/ee/onyx/background/celery/tasks/ttl_management/tasks.py
Line: 58-62

Comment:
**Misleading outer-except log message after refactoring**

After the inner `try/except` was added to catch every `delete_chat_session` failure, the outer `except Exception` block can no longer be reached by a `delete_chat_session` error. It will only fire if `get_chat_sessions_older_than` itself raises (or the for-loop machinery fails). In that case `user_id` and `session_id` are still `None` (their initial values), so the log line will always read `user_id=None session_id=None` and the message still says "delete_chat_session exceptioned", which is inaccurate.

Consider updating the message to reflect what can actually go wrong here:

```suggestion
    except Exception:
        logger.exception(
            "perform_ttl_management_task failed to fetch or iterate over old chat sessions"
        )
        raise
```

How can I resolve this? If you propose a fix, please make it concise.

Reviews (1): Last reviewed commit: "test" | Re-trigger Greptile

Copy link
Copy Markdown
Contributor

@cubic-dev-ai cubic-dev-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1 issue found across 5 files

Confidence score: 3/5

  • There is a concrete reliability risk in backend/ee/onyx/background/celery/tasks/ttl_management/tasks.py: catching broad Exception can silently mask real deletion failures instead of surfacing them.
  • Because this issue is high severity (7/10) with high confidence (9/10), the merge risk is moderate rather than minimal, even though it appears localized to one task path.
  • Pay close attention to backend/ee/onyx/background/celery/tasks/ttl_management/tasks.py - narrow exception handling to the known missing-file case and re-raise unexpected errors.
Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="backend/ee/onyx/background/celery/tasks/ttl_management/tasks.py">

<violation number="1" location="backend/ee/onyx/background/celery/tasks/ttl_management/tasks.py:51">
P1: Catching `Exception` here is too broad and hides real deletion failures; only the known missing-file case should be tolerated, and unexpected errors should be re-raised.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

include_deleted=True,
hard_delete=True,
)
except Exception:
Copy link
Copy Markdown
Contributor

@cubic-dev-ai cubic-dev-ai bot Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1: Catching Exception here is too broad and hides real deletion failures; only the known missing-file case should be tolerated, and unexpected errors should be re-raised.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At backend/ee/onyx/background/celery/tasks/ttl_management/tasks.py, line 51:

<comment>Catching `Exception` here is too broad and hides real deletion failures; only the known missing-file case should be tolerated, and unexpected errors should be re-raised.</comment>

<file context>
@@ -39,14 +39,20 @@ def perform_ttl_management_task(
+                        include_deleted=True,
+                        hard_delete=True,
+                    )
+            except Exception:
+                logger.exception(
+                    "Failed to delete chat session "
</file context>
Fix with Cubic

@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Apr 2, 2026

🖼️ Visual Regression Report

Project Changed Added Removed Unchanged Report
admin 8 0 0 157 View Report
exclusive 0 0 0 8 ✅ No changes

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants