🐛 Handle IntegrityError in aiida_computer fixture #7349
Conversation
Codecov Report

✅ All modified and coverable lines are covered by tests.

Additional details and impacted files:

```diff
@@            Coverage Diff             @@
##             main    #7349      +/-   ##
==========================================
+ Coverage   80.25%   80.27%    +0.02%
==========================================
  Files         577      577
  Lines       45498    45507        +9
==========================================
+ Hits        36512    36525       +13
+ Misses       8986     8982        -4
```

☔ View full report in Codecov by Sentry.
Force-pushed from 55a300e to 7c29c05 (Compare)
Force-pushed from 4055a54 to 6ada13b (Compare)
This fix still seems to be needed to solve some of the flaky behavior I observed in #7323; see CI run https://github.com/aiidateam/aiida-core/actions/runs/25172162526/job/73794412601?pr=7323 (the flakiness was fixed when adding this commit and reappeared when removing it).
Force-pushed from 6ada13b to e0a6e48 (Compare)
Force-pushed from e0a6e48 to eb24f4b (Compare)
When `aiida_profile_clean` resets the database mid-session, subsequent calls to the `aiida_computer` fixture can hit a TOCTOU race: `get()` finds no matching computer, but `store()` fails with an IntegrityError because another fixture call already recreated it. Catch the error and fall back to `get()` by label.
Drop the redundant `SqlaIntegrityError` catch. The storage layer (`ModelWrapper.save()`) already converts SQLAlchemy's `IntegrityError` to `aiida.common.exceptions.IntegrityError` before it reaches caller code, so catching the raw SQLAlchemy exception adds a needless dependency on storage internals. Also match the fallback `get()` on all four fields (label, hostname, scheduler_type, transport_type) instead of label alone, consistent with the original `get()` call.
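A minimal sketch of the resulting get-or-create flow, assuming the fixture resolves lookups through `Computer.collection.get` and that the four identifying fields are passed in as shown (names and the surrounding fixture plumbing are simplified, not the actual fixture code):

```python
from aiida import orm
from aiida.common import exceptions


def _get_or_create_computer(label, hostname, scheduler_type, transport_type, workdir='/tmp'):
    """Return a matching ``Computer``, tolerating the TOCTOU race described above."""
    filters = {
        'label': label,
        'hostname': hostname,
        'scheduler_type': scheduler_type,
        'transport_type': transport_type,
    }
    try:
        return orm.Computer.collection.get(**filters)
    except exceptions.NotExistent:
        try:
            return orm.Computer(workdir=workdir, **filters).store()
        except exceptions.IntegrityError:
            # Another fixture call stored an identical computer between our
            # ``get()`` and ``store()``; reuse its row, matching on all four fields.
            return orm.Computer.collection.get(**filters)
```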
Force-pushed from fddf7b6 to 9e1d96f (Compare)
Add a test that exercises the except-IntegrityError branch in the `aiida_computer` fixture. It mocks `get()` to miss and `store()` to raise `IntegrityError`, verifying that the fallback `get()` returns the correct computer.
The previous IntegrityError fallback had two latent issues:

1. ``set_minimum_job_poll_interval`` and ``set_default_mpiprocs_per_machine`` lived in an ``else:`` branch and were skipped on the fallback path, silently dropping caller-provided values when a race fired.
2. The fallback ``get()`` could itself raise ``NotExistent`` if the racing worker's row was already gone, and that propagated as a confusing exception rather than triggering a fresh ``store()``.

Move the setters out of ``else:`` so they always run, and add a build-and-store fallback for the rare inner ``NotExistent``. Factor the ``Computer(...)`` construction into a small ``_build()`` closure to avoid duplication.
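Roughly, the refactored control flow could look like the following sketch (variable names and the fixture plumbing are simplified; only the branch structure mirrors the description above):

```python
from aiida import orm
from aiida.common import exceptions


def _get_or_create_computer(label, hostname, scheduler_type, transport_type, workdir,
                            minimum_job_poll_interval, default_mpiprocs_per_machine):
    filters = {
        'label': label,
        'hostname': hostname,
        'scheduler_type': scheduler_type,
        'transport_type': transport_type,
    }

    def _build():
        # Single place that constructs the node, shared by both store paths.
        return orm.Computer(workdir=workdir, **filters)

    try:
        computer = orm.Computer.collection.get(**filters)
    except exceptions.NotExistent:
        try:
            computer = _build().store()
        except exceptions.IntegrityError:
            try:
                # Common race: a concurrent fixture call stored the row first; reuse it.
                computer = orm.Computer.collection.get(**filters)
            except exceptions.NotExistent:
                # Rare double miss: the racing row is gone again; rebuild and store fresh.
                computer = _build().store()

    # The setters run on every path, so caller-provided values are never dropped.
    computer.set_minimum_job_poll_interval(minimum_job_poll_interval)
    computer.set_default_mpiprocs_per_machine(default_mpiprocs_per_machine)

    return computer
```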
Collapse the recovery test into a single test parametrized over both branches of the inner ``except IntegrityError``:

- ``fallback_get_succeeds``: the fallback ``get()`` finds the racing worker's row and returns it (the common case).
- ``fallback_get_misses_rebuilds``: the fallback ``get()`` also raises ``NotExistent`` because the racing worker's row vanished; the fixture rebuilds and stores fresh.

The second branch was previously uncovered. Adding it closes the patch-coverage gap from the recent ``orm.py`` change and exercises the inner ``except NotExistent`` block that drives the rebuild path.
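A sketch of how such a parametrized test could be structured, assuming the fixture resolves lookups through `Computer.collection.get` and stores through `Computer.store` (the exact patch targets and the `aiida_computer(label=...)` factory signature depend on the fixture internals, so treat this as an illustration rather than the actual test):

```python
import pytest
from aiida import orm
from aiida.common import exceptions


@pytest.mark.parametrize(
    'fallback_get_succeeds', (True, False), ids=('fallback_get_succeeds', 'fallback_get_misses_rebuilds')
)
def test_aiida_computer_recovers_from_integrity_error(aiida_computer, monkeypatch, fallback_get_succeeds):
    """Exercise both branches of the inner ``except IntegrityError`` in the fixture."""
    label = 'computer-race-test'

    if fallback_get_succeeds:
        # Pre-create the row the "racing worker" would have stored, so the fallback ``get()`` can find it.
        aiida_computer(label=label)

    original_get = orm.Computer.collection.get
    original_store = orm.Computer.store
    get_calls = {'count': 0}

    def fake_get(*args, **kwargs):
        get_calls['count'] += 1
        if get_calls['count'] == 1:
            raise exceptions.NotExistent('simulated initial miss')
        if get_calls['count'] == 2 and not fallback_get_succeeds:
            raise exceptions.NotExistent('simulated fallback miss')
        return original_get(*args, **kwargs)

    def fake_store(self, *args, **kwargs):
        if get_calls['count'] == 1:
            # First ``store()`` after the initial miss: pretend a concurrent worker won the race.
            raise exceptions.IntegrityError('simulated duplicate computer')
        return original_store(self, *args, **kwargs)

    monkeypatch.setattr(orm.Computer.collection, 'get', fake_get)
    monkeypatch.setattr(orm.Computer, 'store', fake_store)

    computer = aiida_computer(label=label)
    assert computer.label == label
```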
GeigerJ2
left a comment
LGTM! Will squash-merge now.
I added some commits on top so that the `IntegrityError` fallback now handles both the common race (the fallback `get()` finds the racing worker's row) and the rarer double-miss case (rebuild and store fresh). In addition, the caller-provided `minimum_job_poll_interval` / `default_mpiprocs_per_machine` now propagate through all recovery paths, so they are never silently dropped. Tests cover both branches.
Tried to fix issue #7347 here, but this is just a temporary helper fix. It seems to address the category 1 problem, but it's more of a band-aid than a real fix.