add_files_to_workspace silently accepts non-existent file IDs #278
Description
Summary
POST /partition/{partition}/workspaces/{workspace_id}/files accepts arbitrary file_ids and returns 200 OK {"status": "added"} even when the referenced files do not exist in the files table. Ghost entries are inserted into workspace_files, polluting the workspace membership and potentially causing confusing downstream behaviour.
Root cause
add_files_to_workspace() in openrag/components/indexer/vectordb/utils.py:653 inserts into workspace_files (skipping rows that already exist) without first checking whether each file_id exists in the files table:
    def add_files_to_workspace(self, workspace_id: str, file_ids: list[str]):
        with self.Session() as session:
            for fid in file_ids:
                stmt = pg_insert(WorkspaceFile).values(workspace_id=workspace_id, file_id=fid)
                stmt = stmt.on_conflict_do_nothing(constraint="uix_workspace_file")
                session.execute(stmt)
            session.commit()

There is no foreign-key constraint from workspace_files.file_id to files.file_id, so invalid IDs are stored without error.
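The effect can be reproduced outside the service. The following is a minimal sketch, not the project's code: it uses an in-memory SQLite database from the Python standard library as a stand-in for the PostgreSQL setup, the two-table schema is simplified and assumed, and INSERT OR IGNORE plays the role of on_conflict_do_nothing.

```python
import sqlite3

# Simplified, assumed schema; the real models live in the project's models.py.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE files (file_id TEXT PRIMARY KEY);
    CREATE TABLE workspace_files (
        workspace_id TEXT NOT NULL,
        file_id TEXT NOT NULL,          -- note: no REFERENCES files(file_id)
        UNIQUE (workspace_id, file_id)
    );
    """
)


def add_files_to_workspace(workspace_id, file_ids):
    # Mirrors the insert-or-ignore loop: nothing consults the files table.
    for fid in file_ids:
        conn.execute(
            "INSERT OR IGNORE INTO workspace_files (workspace_id, file_id) "
            "VALUES (?, ?)",
            (workspace_id, fid),
        )
    conn.commit()


ghost_id = "00000000-0000-0000-0000-000000000000"
add_files_to_workspace("my-ws", [ghost_id])

# The files table is empty, yet the membership row was stored.
rows = conn.execute("SELECT file_id FROM workspace_files").fetchall()
file_count = conn.execute("SELECT COUNT(*) FROM files").fetchone()[0]
print(rows, file_count)
```

The membership table ends up with one row while the files table has none, which is the ghost entry described above.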
Steps to reproduce
    curl -X POST http://<host>/partition/default/workspaces/my-ws/files \
      -H "Content-Type: application/json" \
      -d '{"file_ids": ["00000000-0000-0000-0000-000000000000"]}'
    # Returns 200 {"status": "added"}
    curl http://<host>/partition/default/workspaces/my-ws/files
    # Returns ["00000000-0000-0000-0000-000000000000"] (ghost entry)

Expected behaviour
The endpoint should return 404 Not Found (or 422 Unprocessable Entity) for any file_id that does not exist in the files table for that partition.
Suggested fix
Two complementary approaches:
- Application-level validation in the router or add_files_to_workspace: query files for each ID and reject unknown ones before inserting.
- Database-level constraint: add a foreign key from workspace_files.file_id to files.file_id (with ON DELETE CASCADE) so the DB enforces referential integrity.
The DB constraint is the safer long-term option; the application check provides a clear error message to the client.
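As a rough illustration of how the two approaches could combine, here is a sketch under stated assumptions rather than a proposed patch. It again uses in-memory SQLite from the standard library in place of PostgreSQL; the partition_name column, the UnknownFileIds exception, and the extra partition argument are hypothetical names introduced only for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript(
    """
    CREATE TABLE files (
        file_id TEXT PRIMARY KEY,
        partition_name TEXT NOT NULL      -- assumed column name
    );
    CREATE TABLE workspace_files (
        workspace_id TEXT NOT NULL,
        file_id TEXT NOT NULL REFERENCES files(file_id) ON DELETE CASCADE,
        UNIQUE (workspace_id, file_id)
    );
    INSERT INTO files VALUES ('file-a', 'default');
    """
)


class UnknownFileIds(Exception):
    """Hypothetical error that a router could translate into 404 or 422."""

    def __init__(self, missing):
        super().__init__(f"unknown file_ids: {missing}")
        self.missing = missing


def add_files_to_workspace(workspace_id, partition, file_ids):
    if not file_ids:
        return
    # Application-level check: which requested IDs exist in this partition?
    placeholders = ",".join("?" for _ in file_ids)
    known = {
        row[0]
        for row in conn.execute(
            "SELECT file_id FROM files "
            f"WHERE partition_name = ? AND file_id IN ({placeholders})",
            (partition, *file_ids),
        )
    }
    missing = [fid for fid in file_ids if fid not in known]
    if missing:
        raise UnknownFileIds(missing)  # reject before touching workspace_files
    conn.executemany(
        "INSERT OR IGNORE INTO workspace_files (workspace_id, file_id) "
        "VALUES (?, ?)",
        [(workspace_id, fid) for fid in file_ids],
    )
    conn.commit()


add_files_to_workspace("my-ws", "default", ["file-a"])

try:
    add_files_to_workspace("my-ws", "default", ["file-a", "ghost"])
    rejected = None
except UnknownFileIds as exc:
    rejected = exc.missing

# Database-level backstop: an insert that bypasses the check still fails.
try:
    conn.execute("INSERT INTO workspace_files VALUES ('my-ws', 'ghost')")
    fk_enforced = False
except sqlite3.IntegrityError:
    fk_enforced = True
conn.rollback()

members = [r[0] for r in conn.execute("SELECT file_id FROM workspace_files")]
print(rejected, fk_enforced, members)
```

In this sketch the whole request is rejected when any ID is unknown, so a client never gets a partial add; whether the real endpoint should behave that way or add the valid subset is a design choice the issue leaves open.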
Affected files
- openrag/routers/workspaces.py: router handler for add_files_to_workspace
- openrag/components/indexer/vectordb/utils.py:653: add_files_to_workspace()
- openrag/components/indexer/vectordb/models.py: WorkspaceFile model (missing FK constraint)