
🐛 Handle IntegrityError in aiida_computer fixture #7349

Merged

GeigerJ2 merged 6 commits into aiidateam:main from agoscinski:fix/xdist-database on May 4, 2026

Conversation

@agoscinski agoscinski (Collaborator) commented Apr 29, 2026

Tried to fix issue #7347 here, but this is just a temporary helper fix. It seems to address the category 1 problem, but it's more of a band-aid than a real fix.


When aiida_profile_clean resets the database mid-session, subsequent calls to the aiida_computer fixture can hit a TOCTOU race: get() finds no matching computer, but store() fails with an IntegrityError because another fixture call already recreated it. Catch the error and fall back to get() by label.
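
A minimal sketch of this pattern follows. The helper name is made up and the `Computer.collection.get` lookup is an assumption about the fixture's internals; the real change lives inside the `aiida_computer` factory in aiida-core.

```python
from aiida import orm
from aiida.common.exceptions import IntegrityError, NotExistent


def get_or_create_test_computer(label, hostname, scheduler_type, transport_type):
    """Illustrative helper: tolerate the get()/store() race described above."""
    filters = dict(label=label, hostname=hostname,
                   scheduler_type=scheduler_type, transport_type=transport_type)
    try:
        return orm.Computer.collection.get(**filters)
    except NotExistent:
        computer = orm.Computer(**filters)
        try:
            return computer.store()
        except IntegrityError:
            # Another fixture call stored an identical computer between our
            # get() and store(); fall back to fetching it by label.
            return orm.Computer.collection.get(label=label)
```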

codecov bot commented Apr 29, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 80.27%. Comparing base (ad4a70a) to head (850d0a3).
⚠️ Report is 2 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #7349      +/-   ##
==========================================
+ Coverage   80.25%   80.27%   +0.02%     
==========================================
  Files         577      577              
  Lines       45498    45507       +9     
==========================================
+ Hits        36512    36525      +13     
+ Misses       8986     8982       -4     

☔ View full report in Codecov by Sentry.

@agoscinski agoscinski force-pushed the fix/xdist-database branch 3 times, most recently from 55a300e to 7c29c05, on April 29, 2026 20:08
@agoscinski agoscinski force-pushed the fix/xdist-database branch 2 times, most recently from 4055a54 to 6ada13b, on April 30, 2026 05:32
@agoscinski agoscinski closed this Apr 30, 2026
@agoscinski agoscinski deleted the fix/xdist-database branch April 30, 2026 05:44
@agoscinski agoscinski restored the fix/xdist-database branch April 30, 2026 15:06
@agoscinski agoscinski reopened this Apr 30, 2026
@agoscinski agoscinski (Collaborator, Author) commented:

This fix still seems to be needed to resolve some of the flaky behavior I observed in #7323; see CI run https://github.com/aiidateam/aiida-core/actions/runs/25172162526/job/73794412601?pr=7323 (the flakiness disappeared when this commit was added and reappeared when it was removed).

@agoscinski agoscinski marked this pull request as ready for review April 30, 2026 15:07
@GeigerJ2 GeigerJ2 self-requested a review May 1, 2026 09:47
agoscinski and others added 2 commits May 1, 2026 11:59
When `aiida_profile_clean` resets the database mid-session, subsequent
calls to the `aiida_computer` fixture can hit a TOCTOU race: `get()`
finds no matching computer, but `store()` fails with an IntegrityError
because another fixture call already recreated it. Catch the error and
fall back to `get()` by label.

Drop the redundant `SqlaIntegrityError` catch. The storage layer
(`ModelWrapper.save()`) already converts SQLAlchemy's `IntegrityError`
to `aiida.common.exceptions.IntegrityError` before it reaches caller
code, so catching the raw SQLAlchemy exception adds a needless
dependency on storage internals.

Also match the fallback `get()` on all four fields (label, hostname,
scheduler_type, transport_type) instead of label alone, consistent with
the original `get()` call.
@GeigerJ2 GeigerJ2 force-pushed the fix/xdist-database branch from fddf7b6 to 9e1d96f on May 1, 2026 09:59
GeigerJ2 added 4 commits May 1, 2026 12:18
Add a test that exercises the except-IntegrityError branch in the
`aiida_computer` fixture. Mocks `get()` to miss and `store()` to raise
IntegrityError, verifying that the fallback `get()` returns the correct
computer.

The previous IntegrityError fallback had two latent issues:

1. ``set_minimum_job_poll_interval`` and
   ``set_default_mpiprocs_per_machine`` lived in an ``else:`` branch and
   were skipped on the fallback path, silently dropping caller-provided
   values when a race fired.

2. The fallback ``get()`` could itself raise ``NotExistent`` if the
   racing worker's row was already gone, and that propagated as a
   confusing exception rather than triggering a fresh ``store()``.

Move the setters out of ``else:`` so they always run, and add a
build-and-store fallback for the rare inner ``NotExistent``. Factor the
``Computer(...)`` construction into a small ``_build()`` closure to
avoid duplication.
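
Taken together, a hedged sketch of the resulting factory shape under these assumptions (the ``get_or_create_computer`` name, the ``_build()`` closure, and the ``Computer.collection.get`` lookup are illustrative stand-ins, not the fixture's actual code):

```python
from aiida import orm
from aiida.common.exceptions import IntegrityError, NotExistent


def get_or_create_computer(label, hostname, scheduler_type, transport_type,
                           minimum_job_poll_interval=0.0,
                           default_mpiprocs_per_machine=None):
    """Illustrative stand-in for the ``aiida_computer`` factory after this change."""
    filters = dict(label=label, hostname=hostname,
                   scheduler_type=scheduler_type, transport_type=transport_type)

    def _build():
        # Factored-out construction so the first attempt and the rebuild
        # fallback create the computer the same way.
        return orm.Computer(**filters)

    try:
        computer = orm.Computer.collection.get(**filters)
    except NotExistent:
        try:
            computer = _build().store()
        except IntegrityError:
            # Lost the race: another fixture call stored the same computer
            # between our get() and store(). Prefer its row ...
            try:
                computer = orm.Computer.collection.get(**filters)
            except NotExistent:
                # ... but if that row vanished again, rebuild and store fresh.
                computer = _build().store()

    # The setters sit outside the ``else`` branch so caller-provided values
    # are applied on every path, including the recovery ones.
    computer.set_minimum_job_poll_interval(minimum_job_poll_interval)
    if default_mpiprocs_per_machine is not None:
        computer.set_default_mpiprocs_per_machine(default_mpiprocs_per_machine)
    return computer
```
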
Collapse the recovery test into a single test parametrized over
both branches of the inner ``except IntegrityError``:

- ``fallback_get_succeeds``: the fallback ``get()`` finds the
  racing worker's row and returns it (common case).
- ``fallback_get_misses_rebuilds``: the fallback ``get()`` also
  raises ``NotExistent`` because the racing worker's row vanished;
  the fixture rebuilds and stores fresh.

The second branch was previously uncovered. Adding it closes the
patch-coverage gap from the recent ``orm.py`` change and exercises
the inner ``except NotExistent`` block that drives the rebuild
path.
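
A sketch of how such a parametrized test could look, written against the illustrative ``get_or_create_computer`` helper from the sketch above rather than the actual fixture; the import path and every mock target are assumptions, and the PR's real test patches the fixture's own internals instead:

```python
import pytest
from unittest import mock

from aiida import orm
from aiida.common.exceptions import IntegrityError, NotExistent

# Hypothetical import: the helper sketched above, assumed to live in a local module.
from computer_helpers import get_or_create_computer


@pytest.mark.parametrize('fallback_get_succeeds', [True, False])
def test_integrity_error_recovery(monkeypatch, fallback_get_succeeds):
    racing_row = mock.MagicMock(name='racing_workers_row')
    fresh = mock.MagicMock(name='rebuilt_computer')
    # The first store() loses the race; a second store() (rebuild path) succeeds.
    fresh.store.side_effect = [IntegrityError('duplicate computer'), fresh]

    # The initial get() misses; the fallback get() either finds the racing
    # worker's row or misses again, depending on the parametrization.
    fallback = racing_row if fallback_get_succeeds else NotExistent('row vanished')

    computer_cls = mock.MagicMock(name='Computer')
    computer_cls.collection.get.side_effect = [NotExistent('no match'), fallback]
    computer_cls.return_value = fresh
    monkeypatch.setattr(orm, 'Computer', computer_cls)

    result = get_or_create_computer('test', 'localhost', 'core.direct', 'core.local')

    expected = racing_row if fallback_get_succeeds else fresh
    assert result is expected
    # Caller-provided settings must be applied on every recovery path.
    expected.set_minimum_job_poll_interval.assert_called_once_with(0.0)
```
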
@GeigerJ2 GeigerJ2 (Collaborator) left a comment

LGTM! Will squash-merge now.

I added some commits on top such that the IntegrityError fallback now handles both the common race (fallback get() finds the racing worker's row) and the rarer double-miss case (rebuild and store fresh). In addition, the caller-provided minimum_job_poll_interval / default_mpiprocs_per_machine now propagate through all recovery paths so they're never silently dropped. Tests cover both branches.

@GeigerJ2 GeigerJ2 merged commit f87e241 into aiidateam:main May 4, 2026
25 of 26 checks passed
@agoscinski agoscinski deleted the fix/xdist-database branch May 5, 2026 10:27