Skip to content

🐛 verdi process repair: fix ZMQ broker support#7350

Merged
GeigerJ2 merged 1 commit into
aiidateam:mainfrom
agoscinski:fix/zmq-process-repair
May 4, 2026
Merged

🐛 verdi process repair: fix ZMQ broker support#7350
GeigerJ2 merged 1 commit into
aiidateam:mainfrom
agoscinski:fix/zmq-process-repair

Conversation

@agoscinski
Copy link
Copy Markdown
Collaborator

@agoscinski agoscinski commented Apr 29, 2026

The first design started and stopped the broker during process repair. A simpler solution that has less side effects is to write the revived tasks directly into the persistent storage of the zmq broker. Since it is local only broker, it is an okay solution.


For ZMQ, the broker only runs inside the daemon, which must be stopped for repair. Write revival tasks directly to the PersistentQueue on disk so the broker picks them up when the daemon is restarted. This avoids starting a temporary broker process and the timing issues that come with it. Also replace RabbitMQ-specific wording in user-facing messages and test assertions with broker-agnostic terms.

@agoscinski agoscinski changed the title 🐛 verdi process repair: start temporary ZMQ broker for revival (#7284) 🐛 verdi process repair: start temporary ZMQ broker for revival Apr 29, 2026
@codecov
Copy link
Copy Markdown

codecov Bot commented Apr 29, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 80.26%. Comparing base (06968d4) to head (23fe21a).
⚠️ Report is 2 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #7350      +/-   ##
==========================================
- Coverage   80.26%   80.26%   -0.00%     
==========================================
  Files         577      577              
  Lines       45497    45510      +13     
==========================================
+ Hits        36514    36523       +9     
- Misses       8983     8987       +4     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

agoscinski added a commit to agoscinski/aiida-core that referenced this pull request Apr 29, 2026
…team#7350

The ZMQ broker only runs inside the daemon, but `verdi process repair`
requires the daemon to be stopped.  When zombie processes need to be
revived, a temporary broker subprocess is now started so the
communicator can submit `continue_process` tasks, and stopped in a
`finally` block afterwards.  Also replace RabbitMQ-specific wording
in user-facing messages with the broker-agnostic term.
@agoscinski agoscinski force-pushed the fix/zmq-process-repair branch from e28f95f to 0e4428f Compare April 29, 2026 20:22
agoscinski added a commit to agoscinski/aiida-core that referenced this pull request Apr 29, 2026
…team#7350

The ZMQ broker only runs inside the daemon, but `verdi process repair`
requires the daemon to be stopped.  When zombie processes need to be
revived, a temporary broker subprocess is now started so the
communicator can submit `continue_process` tasks, and stopped in a
`finally` block afterwards.  Also replace RabbitMQ-specific wording
in user-facing messages with the broker-agnostic term.
@agoscinski agoscinski force-pushed the fix/zmq-process-repair branch from 0e4428f to 4d98b1e Compare April 29, 2026 20:22
agoscinski added a commit to agoscinski/aiida-core that referenced this pull request Apr 30, 2026
…team#7350

The ZMQ broker only runs inside the daemon, but `verdi process repair`
requires the daemon to be stopped.  When zombie processes need to be
revived, a temporary broker subprocess is now started so the
communicator can submit `continue_process` tasks, and stopped in a
`finally` block afterwards.  Also replace RabbitMQ-specific wording
in user-facing messages with the broker-agnostic term.
@agoscinski agoscinski force-pushed the fix/zmq-process-repair branch from 4d98b1e to 425770b Compare April 30, 2026 13:49
agoscinski added a commit to agoscinski/aiida-core that referenced this pull request Apr 30, 2026
For ZMQ, the broker only runs inside the daemon, which must be stopped
for repair.  Write revival tasks directly to the PersistentQueue on
disk so the broker picks them up when the daemon is restarted.  This
avoids starting a temporary broker process and the timing issues that
come with it.  Also replace RabbitMQ-specific wording in user-facing
messages and test assertions with broker-agnostic terms.
@agoscinski agoscinski force-pushed the fix/zmq-process-repair branch from 425770b to 033c187 Compare April 30, 2026 15:16
agoscinski added a commit to agoscinski/aiida-core that referenced this pull request Apr 30, 2026
For ZMQ, the broker only runs inside the daemon, which must be stopped
for repair.  Write revival tasks directly to the PersistentQueue on
disk so the broker picks them up when the daemon is restarted.  This
avoids starting a temporary broker process and the timing issues that
come with it.  Also replace RabbitMQ-specific wording in user-facing
messages and test assertions with broker-agnostic terms.
@agoscinski agoscinski force-pushed the fix/zmq-process-repair branch from 033c187 to e50ae14 Compare April 30, 2026 15:19
@agoscinski agoscinski requested a review from GeigerJ2 April 30, 2026 15:49
@agoscinski agoscinski marked this pull request as ready for review April 30, 2026 15:49
queue.push(task_id, {'body': body, 'no_reply': True})
echo.echo_report(f'Revived process `{pid}`')
else:
process_controller = manager.get_process_controller()
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why doesn't

For ZMQ, the broker only runs inside the daemon, which must be stopped
for repair.  Write revival tasks directly to the PersistentQueue on
disk so the broker picks them up when the daemon is restarted.  This
avoids starting a temporary broker process and the timing issues that
come with it.  Also replace RabbitMQ-specific wording in user-facing
messages and test assertions with broker-agnostic terms.
@agoscinski agoscinski force-pushed the fix/zmq-process-repair branch from e50ae14 to 23fe21a Compare April 30, 2026 20:31
@GeigerJ2 GeigerJ2 changed the title 🐛 verdi process repair: start temporary ZMQ broker for revival 🐛 verdi process repair: fix ZMQ broker support May 4, 2026
Copy link
Copy Markdown
Collaborator

@GeigerJ2 GeigerJ2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Will open a follow-up PR to clean up the code a bit — mostly lifting the isinstance(broker, ZmqBroker) branching onto the broker interface itself, so the CLI doesn't need to special-case ZMQ vs RabbitMQ. But OK to get this one merged before.

@GeigerJ2 GeigerJ2 merged commit db1a28b into aiidateam:main May 4, 2026
18 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants