
add_files_to_workspace silently accepts non-existent file IDs #278

@EnjoyBacon7

Summary

POST /partition/{partition}/workspaces/{workspace_id}/files accepts arbitrary file_ids and returns 200 OK {"status": "added"} even when the referenced files do not exist in the files table. Ghost entries are inserted into workspace_files, polluting the workspace membership and potentially causing confusing downstream behaviour.

Root cause

add_files_to_workspace() in openrag/components/indexer/vectordb/utils.py:653 performs an upsert into workspace_files without first checking whether each file_id exists in the files table:

def add_files_to_workspace(self, workspace_id: str, file_ids: list[str]):
    with self.Session() as session:
        for fid in file_ids:
            stmt = pg_insert(WorkspaceFile).values(workspace_id=workspace_id, file_id=fid)
            stmt = stmt.on_conflict_do_nothing(constraint="uix_workspace_file")
            session.execute(stmt)
        session.commit()

There is no foreign-key constraint from workspace_files.file_id to files.file_id, so invalid IDs are stored without error.
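The mechanism can be reproduced outside the service. The sketch below uses Python's built-in sqlite3 as a stand-in for PostgreSQL and a simplified, hypothetical schema that contains only the columns needed here. The ON CONFLICT ... DO NOTHING clause plays the role of on_conflict_do_nothing(constraint="uix_workspace_file"). Nothing ties workspace_files.file_id to files, so the ghost ID is stored even though files is empty.

```python
import sqlite3

# Hypothetical, simplified schema (stand-in for the real PostgreSQL tables).
# As described in the issue, workspace_files.file_id has NO foreign key.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE files (
        file_id   TEXT PRIMARY KEY,
        partition TEXT NOT NULL
    );
    CREATE TABLE workspace_files (
        workspace_id TEXT NOT NULL,
        file_id      TEXT NOT NULL,
        CONSTRAINT uix_workspace_file UNIQUE (workspace_id, file_id)
    );
    """
)

ghost_id = "00000000-0000-0000-0000-000000000000"

# Same pattern as add_files_to_workspace(): upsert each ID with no existence check.
for fid in [ghost_id]:
    conn.execute(
        "INSERT INTO workspace_files (workspace_id, file_id) VALUES (?, ?) "
        "ON CONFLICT (workspace_id, file_id) DO NOTHING",
        ("my-ws", fid),
    )
conn.commit()

# The ghost ID is now a workspace member although `files` has no rows at all.
rows = conn.execute(
    "SELECT file_id FROM workspace_files WHERE workspace_id = ?", ("my-ws",)
).fetchall()
print(rows)
```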

Steps to reproduce

curl -X POST http://<host>/partition/default/workspaces/my-ws/files \
  -H "Content-Type: application/json" \
  -d '{"file_ids": ["00000000-0000-0000-0000-000000000000"]}'
# Returns 200 {"status": "added"}

curl http://<host>/partition/default/workspaces/my-ws/files
# Returns ["00000000-0000-0000-0000-000000000000"] — ghost entry

Expected behaviour

The endpoint should return 404 Not Found (or 422 Unprocessable Entity) for any file_id that does not exist in the files table for that partition.
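To make the expected contract concrete, here is a framework-agnostic sketch of the check a handler could run. It deduplicates the requested IDs, compares them with the IDs known for the partition, and fails with 404 listing every unknown ID rather than only the first, so the client can fix the request in one round trip. ErrorResponse and check_file_ids are hypothetical names; a real router would translate the result into its web framework's own error type, and 422 would work equally well as the status code.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class ErrorResponse:
    """Hypothetical stand-in for an HTTP error (status code + JSON body)."""

    status_code: int
    body: dict


def check_file_ids(requested: list[str], known_in_partition: set[str]) -> Optional[ErrorResponse]:
    """Return a 404 response naming every unknown ID, or None if all exist.

    `known_in_partition` is assumed to hold the file_ids present in the
    files table for the target partition.
    """
    # dict.fromkeys() drops duplicates while keeping the caller's order.
    missing = [fid for fid in dict.fromkeys(requested) if fid not in known_in_partition]
    if missing:
        return ErrorResponse(404, {"detail": "Unknown file_ids", "missing": missing})
    return None


ghost = "00000000-0000-0000-0000-000000000000"
response = check_file_ids(["f1", ghost, ghost], {"f1"})
print(response)
```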

Suggested fix

Two complementary approaches:

  1. Application-level validation in the router or add_files_to_workspace: query the files table for each ID and reject unknown ones before inserting.
  2. Database-level constraint: add a foreign key from workspace_files.file_id to files.file_id (with ON DELETE CASCADE) so the DB enforces referential integrity.
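Approach 1 could look roughly like the following sketch. It again uses sqlite3 as a stand-in for PostgreSQL and a simplified, assumed schema (in particular, it assumes files carries a partition column). FileNotFoundInPartition and this function signature are hypothetical. The sketch looks up all requested IDs with a single IN query instead of one query per ID, and it writes nothing unless every ID is known.

```python
from __future__ import annotations

import sqlite3


class FileNotFoundInPartition(Exception):
    """Hypothetical error type; a router could map it to 404 (or 422)."""

    def __init__(self, missing: list[str]):
        super().__init__(f"Unknown file_ids: {missing}")
        self.missing = missing


def add_files_to_workspace(conn: sqlite3.Connection, partition: str,
                           workspace_id: str, file_ids: list[str]) -> None:
    """Add file_ids to a workspace only if all of them exist in the partition."""
    unique_ids = list(dict.fromkeys(file_ids))  # drop duplicates, keep order
    if not unique_ids:
        return
    # Only "?" placeholders are interpolated; the IDs themselves are bound parameters.
    placeholders = ", ".join("?" for _ in unique_ids)
    found = {
        row[0]
        for row in conn.execute(
            f"SELECT file_id FROM files WHERE partition = ? AND file_id IN ({placeholders})",
            [partition, *unique_ids],
        )
    }
    missing = [fid for fid in unique_ids if fid not in found]
    if missing:
        raise FileNotFoundInPartition(missing)  # nothing has been written yet
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT INTO workspace_files (workspace_id, file_id) VALUES (?, ?) "
            "ON CONFLICT (workspace_id, file_id) DO NOTHING",
            [(workspace_id, fid) for fid in unique_ids],
        )


# Demo setup with the simplified schema (no foreign key, as in the issue).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE files (file_id TEXT PRIMARY KEY, partition TEXT NOT NULL);
    CREATE TABLE workspace_files (
        workspace_id TEXT NOT NULL,
        file_id      TEXT NOT NULL,
        CONSTRAINT uix_workspace_file UNIQUE (workspace_id, file_id)
    );
    INSERT INTO files VALUES ('f1', 'default');
    """
)

add_files_to_workspace(conn, "default", "my-ws", ["f1", "f1"])

ghost = "00000000-0000-0000-0000-000000000000"
rejected = None
try:
    add_files_to_workspace(conn, "default", "my-ws", ["f1", ghost])
except FileNotFoundInPartition as exc:
    rejected = exc.missing
print(rejected)
```

A check-then-insert sequence like this is still open to a race. A file deleted between the SELECT and the INSERT can leave a dangling row, which is one reason the database constraint is the safer long-term option.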

The DB constraint is the safer long-term option; the application check provides a clear error message to the client.
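Approach 2 is sketched below, again with sqlite3 standing in for PostgreSQL and the same simplified, assumed schema. Declaring workspace_files.file_id as REFERENCES files (file_id) ON DELETE CASCADE makes the database reject ghost IDs and drop memberships when a file is deleted. In the SQLAlchemy WorkspaceFile model this would presumably correspond to ForeignKey("files.file_id", ondelete="CASCADE") on the file_id column. That only works if files.file_id is unique on its own, because a foreign key must target a primary key or unique column. If files is keyed by partition and file_id together, the constraint would need to be composite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is on; PostgreSQL always does.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    CREATE TABLE files (
        file_id   TEXT PRIMARY KEY,
        partition TEXT NOT NULL
    );
    CREATE TABLE workspace_files (
        workspace_id TEXT NOT NULL,
        file_id      TEXT NOT NULL
            REFERENCES files (file_id) ON DELETE CASCADE,
        CONSTRAINT uix_workspace_file UNIQUE (workspace_id, file_id)
    );
    INSERT INTO files VALUES ('f1', 'default');
    INSERT INTO workspace_files VALUES ('my-ws', 'f1');
    """
)

# 1. The database itself now rejects a ghost ID.
try:
    with conn:
        conn.execute(
            "INSERT INTO workspace_files VALUES (?, ?)",
            ("my-ws", "00000000-0000-0000-0000-000000000000"),
        )
    ghost_rejected = False
except sqlite3.IntegrityError:
    ghost_rejected = True

# 2. Deleting a file removes its workspace memberships automatically.
with conn:
    conn.execute("DELETE FROM files WHERE file_id = ?", ("f1",))
remaining = conn.execute("SELECT * FROM workspace_files").fetchall()

print(ghost_rejected, remaining)  # True []
```

PostgreSQL checks existing rows when a foreign key is added. Any ghost rows already in workspace_files therefore have to be deleted first, or the constraint added as NOT VALID and validated later.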

Affected files

  • openrag/routers/workspaces.py - router handler for add_files_to_workspace
  • openrag/components/indexer/vectordb/utils.py:653 - add_files_to_workspace()
  • openrag/components/indexer/vectordb/models.py - WorkspaceFile model (missing FK constraint)

Metadata

Assignees

No one assigned

    Labels

    bug (Something isn't working)

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
