I am opening this issue to document a peculiarity of this fix. I don't think we will have the resources to solve the underlying problem in the near term.
Context
PR #7344 introduced create_runner(communicator=None) in run* launchers and FunctionProcess.run_get_node to avoid eagerly connecting to the broker when running processes locally. These runners are intentionally not stored on the Manager (i.e. they don't become the global runner).
Why not use the global runner?
The global runner (Manager.get_runner()) is cached and shared. If we created it without a communicator for a run* call, a subsequent submit would get the same runner, still without a communicator, and fail. We can't set the communicator after construction because the Runner wires it up in __init__. Resetting the runner in submit is not a safe alternative either, since it can cause issues with the global event loop.
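The caching problem can be illustrated with a minimal sketch. SketchRunner and SketchManager are hypothetical stand-ins written for this example. They are not the real AiiDA or plumpy classes and only mimic the two properties described above: the communicator is fixed at construction, and the manager hands out one cached instance.

```python
from typing import Optional


class SketchRunner:
    """Hypothetical stand-in for a runner; not the real AiiDA/plumpy class."""

    def __init__(self, communicator: Optional[object] = None) -> None:
        # The communicator is fixed at construction time.
        self.communicator = communicator

    def submit(self, name: str) -> str:
        if self.communicator is None:
            raise RuntimeError("cannot submit: runner has no communicator")
        return f"submitted {name}"


class SketchManager:
    """Hypothetical stand-in for a manager that caches one shared runner."""

    def __init__(self) -> None:
        self._runner: Optional[SketchRunner] = None

    def get_runner(self, communicator: Optional[object] = None) -> SketchRunner:
        # The first call decides how the runner is built; later calls reuse it.
        if self._runner is None:
            self._runner = SketchRunner(communicator=communicator)
        return self._runner


manager = SketchManager()
local_runner = manager.get_runner(communicator=None)       # a run* style call
shared_runner = manager.get_runner(communicator=object())  # a later submit

same_instance = shared_runner is local_runner
try:
    shared_runner.submit("process")
    submit_error = None
except RuntimeError as exc:
    submit_error = str(exc)
```

In this sketch the second get_runner call ignores its communicator argument because the cached instance already exists, so the later submit fails.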
Why not close the local runner?
Runner.close() closes the underlying asyncio event loop. The runner doesn't create its own loop — it calls plumpy.get_or_create_event_loop(), which returns the global event loop if one exists. Closing it would break anything else sharing that loop (the global runner, the daemon, etc.).
We could check whether an event loop existed before creating the runner and only close if we "own" it, but this leaks event loop implementation details into the launcher layer and is fragile.
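The effect of closing a shared loop can be reproduced with plain asyncio. LoopUser is a hypothetical class used only for illustration. It assumes, as described above, that closing the object also closes the loop it borrowed.

```python
import asyncio


class LoopUser:
    """Hypothetical object that borrows an event loop it did not create."""

    def __init__(self, loop: asyncio.AbstractEventLoop) -> None:
        self.loop = loop

    def close(self) -> None:
        # Closing the user also closes the borrowed loop.
        self.loop.close()


shared_loop = asyncio.new_event_loop()  # stands in for the global event loop
global_user = LoopUser(shared_loop)     # e.g. the global runner
local_user = LoopUser(shared_loop)      # e.g. a local, unmanaged runner

local_user.close()

# The other user now holds a closed loop and can no longer schedule work.
loop_closed_for_global = global_user.loop.is_closed()
try:
    global_user.loop.call_soon(lambda: None)
    schedule_error = None
except RuntimeError as exc:
    schedule_error = str(exc)
```

An ownership check would have to record, before the runner is constructed, whether the loop already existed. That is the bookkeeping the launcher layer would need to take on.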
Why is the leak acceptable?
plumpy.get_or_create_event_loop() maintains a single global event loop. All unmanaged runners reuse it, so repeated run* calls don't accumulate loops. The only leaked object is the Runner itself (with its transport queue and job manager), which is lightweight.
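A small sketch of why loops do not accumulate, assuming a get-or-create helper that behaves as described above. The helper get_or_create_loop and the class UnmanagedRunner are hypothetical and do not reproduce plumpy's implementation.

```python
import asyncio
from typing import Optional

_GLOBAL_LOOP: Optional[asyncio.AbstractEventLoop] = None


def get_or_create_loop() -> asyncio.AbstractEventLoop:
    """Hypothetical helper: return the single shared loop, creating it once."""
    global _GLOBAL_LOOP
    if _GLOBAL_LOOP is None or _GLOBAL_LOOP.is_closed():
        _GLOBAL_LOOP = asyncio.new_event_loop()
    return _GLOBAL_LOOP


class UnmanagedRunner:
    """Hypothetical runner that borrows the shared loop and is never closed."""

    def __init__(self) -> None:
        self.loop = get_or_create_loop()


# Five run* style calls, each creating a throwaway runner.
runners = [UnmanagedRunner() for _ in range(5)]
distinct_loops = {id(runner.loop) for runner in runners}

# Clean up the one loop this sketch created.
get_or_create_loop().close()
```

However many runners are created here, they all reference one loop, so the per-call cost is the runner object alone.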
Proper fix
Decouple the communicator from Runner construction so it can be attached lazily. This would allow the global runner to start without a communicator and connect one on demand when submit needs it. This requires changes in plumpy (Runner/event loop ownership) and should be addressed there.
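One possible shape for lazy attachment is sketched below, purely as an illustration. LazyRunner, communicator_factory and connect_to_broker are hypothetical names. The real change would have to be designed in plumpy and may look quite different.

```python
from typing import Callable, List, Optional


class LazyRunner:
    """Hypothetical runner that connects its communicator only on first use."""

    def __init__(self, communicator_factory: Callable[[], object]) -> None:
        self._communicator_factory = communicator_factory
        self._communicator: Optional[object] = None

    @property
    def communicator(self) -> object:
        # Connect on first access and reuse the connection afterwards.
        if self._communicator is None:
            self._communicator = self._communicator_factory()
        return self._communicator

    def run(self, name: str) -> str:
        # Running locally never touches the communicator.
        return f"ran {name} locally"

    def submit(self, name: str) -> str:
        _ = self.communicator  # triggers the connection if needed
        return f"submitted {name}"


connections: List[str] = []


def connect_to_broker() -> object:
    """Hypothetical factory standing in for opening a broker connection."""
    connections.append("connected")
    return object()


runner = LazyRunner(connect_to_broker)
runner.run("a")
connections_after_run = len(connections)

runner.submit("b")
runner.submit("c")
connections_after_submit = len(connections)
```

With this shape a single shared runner could serve both run* and submit, and the broker connection would be opened at most once, on the first submit.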