🐛 verdi process repair: fix ZMQ broker support #7350
Merged
verdi process repair: start temporary ZMQ broker for revival (#7284)
Codecov Report
✅ All modified and coverable lines are covered by tests.
@@ Coverage Diff @@
## main #7350 +/- ##
==========================================
- Coverage 80.26% 80.26% -0.00%
==========================================
Files 577 577
Lines 45497 45510 +13
==========================================
+ Hits 36514 36523 +9
- Misses 8983 8987 +4
agoscinski added a commit to agoscinski/aiida-core that referenced this pull request on Apr 29, 2026
…team#7350 The ZMQ broker only runs inside the daemon, but `verdi process repair` requires the daemon to be stopped. When zombie processes need to be revived, a temporary broker subprocess is now started so the communicator can submit `continue_process` tasks, and stopped in a `finally` block afterwards. Also replace RabbitMQ-specific wording in user-facing messages with the broker-agnostic term.
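The start-then-stop-in-`finally` pattern described in this commit message can be sketched roughly as follows. This is only an illustration under stated assumptions: `temporary_broker`, `revive_with_temporary_broker`, the `command` argument and the `submit` callback are hypothetical names, not aiida-core API, and the real broker start-up logic is not shown in this conversation.

```python
import subprocess
import sys
from contextlib import contextmanager


@contextmanager
def temporary_broker(command):
    """Run a stand-in broker command as a subprocess and always stop it again.

    `command` is a hypothetical argument list; the actual broker command is not
    part of this sketch.
    """
    proc = subprocess.Popen(command)
    try:
        yield proc
    finally:
        # Stop the subprocess even if submitting a task raised an exception.
        proc.terminate()
        try:
            proc.wait(timeout=5)
        except subprocess.TimeoutExpired:
            proc.kill()
            proc.wait()


def revive_with_temporary_broker(command, pids, submit):
    """Call the hypothetical `submit(pid)` for each process while the broker runs."""
    revived = []
    with temporary_broker(command) as broker:
        for pid in pids:
            submit(pid)
            revived.append(pid)
    return revived, broker
```

For a quick local try-out, `[sys.executable, '-c', 'import time; time.sleep(60)']` can serve as a placeholder for the broker command. The sketch does not wait for the broker to become ready before submitting, which is the kind of timing issue a real implementation would have to handle.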
e28f95f to 0e4428f
agoscinski added a commit to agoscinski/aiida-core that referenced this pull request on Apr 29, 2026
0e4428f to 4d98b1e
agoscinski added a commit to agoscinski/aiida-core that referenced this pull request on Apr 30, 2026
4d98b1e to 425770b
agoscinski added a commit to agoscinski/aiida-core that referenced this pull request on Apr 30, 2026
For ZMQ, the broker only runs inside the daemon, which must be stopped for repair. Write revival tasks directly to the PersistentQueue on disk so the broker picks them up when the daemon is restarted. This avoids starting a temporary broker process and the timing issues that come with it. Also replace RabbitMQ-specific wording in user-facing messages and test assertions with broker-agnostic terms.
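As a rough illustration of writing revival tasks to disk so that they survive until the daemon is restarted, here is a minimal sketch. It assumes a simple one-file-per-task layout; `DiskQueue` is a hypothetical stand-in and not the `PersistentQueue` referred to above, and the task body layout is an assumption. Only the `push(task_id, {'body': body, 'no_reply': True})` call shape is taken from the reviewed snippet in this PR.

```python
import json
import os
import tempfile
import uuid
from pathlib import Path


class DiskQueue:
    """Hypothetical stand-in for an on-disk task queue: one JSON file per task."""

    def __init__(self, directory):
        self.directory = Path(directory)
        self.directory.mkdir(parents=True, exist_ok=True)

    def push(self, task_id, task):
        # Write to a temporary file first, then rename it into place, so a
        # reader never sees a half-written task file.
        fd, tmp_path = tempfile.mkstemp(dir=self.directory, suffix='.tmp')
        with os.fdopen(fd, 'w') as handle:
            json.dump(task, handle)
        os.replace(tmp_path, self.directory / f'{task_id}.json')

    def pending(self):
        # What a freshly started consumer would find on disk.
        return {
            path.stem: json.loads(path.read_text())
            for path in sorted(self.directory.glob('*.json'))
        }


def queue_revival_tasks(queue, pids):
    """Push one task per process; the body layout here is an assumption."""
    task_ids = []
    for pid in pids:
        task_id = uuid.uuid4().hex
        body = {'task': 'continue', 'args': {'pid': pid}}
        queue.push(task_id, {'body': body, 'no_reply': True})
        task_ids.append(task_id)
    return task_ids
```

Because nothing has to be running when the tasks are written, this approach sidesteps the start-up and shutdown timing of a temporary broker. A new `DiskQueue` instance pointed at the same directory sees the queued tasks through `pending()`.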
425770b to 033c187
agoscinski added a commit to agoscinski/aiida-core that referenced this pull request on Apr 30, 2026
033c187 to e50ae14
mbercx reviewed on Apr 30, 2026
        queue.push(task_id, {'body': body, 'no_reply': True})
        echo.echo_report(f'Revived process `{pid}`')
    else:
        process_controller = manager.get_process_controller()
e50ae14 to 23fe21a
Title changed from "verdi process repair: start temporary ZMQ broker for revival" to "verdi process repair: fix ZMQ broker support"
GeigerJ2 approved these changes on May 4, 2026
GeigerJ2 (Collaborator) left a comment
LGTM. Will open a follow-up PR to clean up the code a bit, mostly lifting the `isinstance(broker, ZmqBroker)` branching onto the broker interface itself, so the CLI doesn't need to special-case ZMQ vs RabbitMQ. But OK to get this one merged first.
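One possible shape for the follow-up mentioned here, as a sketch only: each broker type implements the same method, so the calling code no longer needs an `isinstance` check. All class and method names below (`Broker`, `OfflineQueueBroker`, `LiveBroker`, `revive_process`, `repair`) are hypothetical and not taken from aiida-core.

```python
from abc import ABC, abstractmethod


class Broker(ABC):
    """Hypothetical broker interface with a single revival entry point."""

    @abstractmethod
    def revive_process(self, pid):
        """Arrange for the process `pid` to be continued; return a report line."""


class OfflineQueueBroker(Broker):
    """Stands in for a broker whose tasks are queued while the daemon is stopped."""

    def __init__(self):
        self.queued = []

    def revive_process(self, pid):
        self.queued.append(pid)
        return f'Queued process `{pid}` for the next daemon start'


class LiveBroker(Broker):
    """Stands in for a broker that is reachable while the daemon is stopped."""

    def __init__(self, send):
        # `send` is a hypothetical callable that submits a continue task.
        self._send = send

    def revive_process(self, pid):
        self._send(pid)
        return f'Revived process `{pid}`'


def repair(broker, pids):
    # The caller has no broker-specific branching; each broker decides how to revive.
    return [broker.revive_process(pid) for pid in pids]
```

With this layout, `repair(OfflineQueueBroker(), [1, 2])` and `repair(LiveBroker(send), [1, 2])` go through the same code path in the caller, and adding another broker type would not require touching `repair`.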
The first design started and stopped the broker during process repair. A simpler solution with fewer side effects is to write the revived tasks directly into the persistent storage of the ZMQ broker. Since it is a local-only broker, this is an acceptable solution.