
restart: allow restarting of stopped workflows #752

@tiborsimko


Current behaviour

When a user stops a long-running workflow for one reason or another and would like to restart it, REANA currently replies:

$ reana-client restart -w myanalysis
==> ERROR: Cannot start workflow myanalysis:
Only finished or failed workflows can be restarted.

This restriction is a useful precaution for some workflow engines (such as Serial), but it is not necessary for others (such as Snakemake), where a stopped workflow should be perfectly restart-friendly thanks to Snakemake's internal job-caching capabilities.
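Why a stopped Snakemake workflow can be restarted safely can be illustrated with a much simplified, Make-style model: a rule is re-run only if one of its outputs is missing, if some output is older than some input, or if an upstream rule is regenerating one of its inputs. The sketch below is a toy written for illustration only; the `Rule` class, the `jobs_to_run()` helper and the file names are hypothetical, and Snakemake's real scheduler applies further criteria (for example, for incomplete outputs left behind by interrupted jobs).

```python
import os
import tempfile
from dataclasses import dataclass, field


@dataclass
class Rule:
    """Toy Make-style rule (hypothetical, not Snakemake's API)."""
    name: str
    inputs: list[str] = field(default_factory=list)
    outputs: list[str] = field(default_factory=list)


def outputs_outdated(rule: Rule) -> bool:
    """True if an output is missing or older than the newest input.

    May raise FileNotFoundError if an input is missing while all outputs exist.
    """
    if not rule.outputs:
        return True  # nothing to check: always run
    if any(not os.path.exists(out) for out in rule.outputs):
        return True
    if not rule.inputs:
        return False
    newest_input = max(os.path.getmtime(inp) for inp in rule.inputs)
    oldest_output = min(os.path.getmtime(out) for out in rule.outputs)
    return oldest_output < newest_input


def jobs_to_run(rules: list[Rule]) -> list[str]:
    """Names of the rules a restarted run would execute.

    `rules` must be given in dependency (topological) order.
    """
    scheduled: list[str] = []
    regenerated: set[str] = set()
    for rule in rules:
        # Rerun if an upstream rule regenerates one of our inputs,
        # or if our own outputs are missing or stale.
        if any(inp in regenerated for inp in rule.inputs) or outputs_outdated(rule):
            scheduled.append(rule.name)
            regenerated.update(rule.outputs)
    return scheduled


def simulate_stopped_run() -> list[str]:
    """Two-step workflow stopped during step 2 (file names are illustrative)."""
    with tempfile.TemporaryDirectory() as workdir:
        data = os.path.join(workdir, "data.root")
        plot = os.path.join(workdir, "plot.png")
        open(data, "w").close()  # step 1 completed before the stop
        # plot.png was never written because step 2 was interrupted.
        rules = [Rule("gendata", [], [data]), Rule("fitdata", [data], [plot])]
        return jobs_to_run(rules)


print(simulate_stopped_run())  # ['fitdata']
```

Only the interrupted step is scheduled again, which is the behaviour the reproduction below observes (progress 1/1 after the restart).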

Expected behaviour

It would be useful to allow restarting workflows also from the stopped state. We have two options:

  1. Let the user choose via a CLI option: by default, do not allow restarting stopped workflows, but if the user passes the restart --force CLI option, trigger the restart for any supported workflow engine.
  2. Allow the restart globally, without any new --force option, if the given workflow engine is safe to restart (such as Snakemake), and forbid it for engines that are not fully safe to restart at an arbitrary point in time (such as Serial; we would have to study each supported workflow engine to confirm or refute its safety).

I think the second option would probably be more user-friendly.
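Option 2 could boil down to a per-engine allow-list consulted before restarting. The following is a minimal sketch under stated assumptions: the `can_restart()` function, the status strings and the `STOP_SAFE_ENGINES` set are hypothetical and not taken from the REANA code base, and only Snakemake is listed as stop-safe pending the per-engine study mentioned above. The `force` argument only shows where option 1's --force flag would plug in, should the two options ever be combined; the issue presents them as alternatives.

```python
# Hypothetical sketch, not REANA's implementation.
RESTARTABLE_STATUSES = {"finished", "failed"}  # current rule, per the error message

# Option 2: engines assumed to be safe to restart from "stopped".
# Only Snakemake is listed here; other engines would need to be studied first.
STOP_SAFE_ENGINES = {"snakemake"}


def can_restart(status: str, engine: str, force: bool = False) -> bool:
    """Return True if a workflow in `status`, run by `engine`, may be restarted."""
    if status in RESTARTABLE_STATUSES:
        return True  # unchanged current behaviour
    if status == "stopped":
        # Option 2: allowed out of the box for stop-safe engines.
        # Option 1 (if combined): force lifts the restriction for any engine.
        return engine.lower() in STOP_SAFE_ENGINES or force
    return False  # e.g. "running": never restartable


for status, engine in [("failed", "serial"), ("stopped", "snakemake"), ("stopped", "serial")]:
    print(status, engine, can_restart(status, engine))
# failed serial True
# stopped snakemake True
# stopped serial False
```

Keeping the allow-list explicit means an engine becomes stop-restartable only after someone has checked it, which matches the cautious default the current error message enforces.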

How to reproduce

Take the RooFit example and add some sleep time to the second fitting step:

$ cd reana-demo-root6-roofit
$ vim workflow/snakemake/Snakefile
$ git diff -- workflow/snakemake/Snakefile
diff --git a/workflow/snakemake/Snakefile b/workflow/snakemake/Snakefile
index 06cc008..c342319 100644
--- a/workflow/snakemake/Snakefile
+++ b/workflow/snakemake/Snakefile
@@ -41,4 +41,4 @@ rule fitdata:
     resources:
         kubernetes_memory_limit="256Mi"
     shell:
-        "root -b -q '{input.fitdata_tool}(\"{input.data}\",\"{output}\")'"
+        "sleep 60 && root -b -q '{input.fitdata_tool}(\"{input.data}\",\"{output}\")'"

Run the workflow and stop it while it is running the second step:

$ reana-client run -w myanalysis -f reana-snakemake.yaml
$ sleep 20
$ reana-client stop -w myanalysis --force
$ reana-client status -w myanalysis | awk '{print $6}'
STATUS
stopped

Try to restart it and observe the error:

$ reana-client restart -w myanalysis
==> ERROR: Cannot start workflow myanalysis:
Only finished or failed workflows can be restarted.

Manually update its status from 'stopped' to 'failed':

$ kubectl exec -i -t deployment/reana-db -- /bin/bash
reana=# update __reana.workflow set status='failed' where status='stopped' and id_='36264f19-48a5-419f-a919-4584d48450e7';
UPDATE 1

Now restart should work:

$ reana-client restart -w myanalysis

Observe that the workflow finishes successfully and that it ran only the second fitting step (progress 1/1):

$ reana-client status -w myanalysis
NAME         RUN_NUMBER   CREATED               STARTED               ENDED                 STATUS     PROGRESS
myanalysis   2.1          2025-11-26T10:04:09   2025-11-26T10:04:20   2025-11-26T10:05:40   finished   1/1
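To check this outcome from a script rather than by eye, the status table can be parsed by column name instead of by position (as the awk call above does). A minimal sketch, assuming the whitespace-separated layout shown above with no empty or space-containing cells; the helper names `parse_status_table()` and `steps_run()` are hypothetical:

```python
def parse_status_table(text: str) -> list[dict[str, str]]:
    """Parse `reana-client status` output into one dict per row, keyed by header.

    Assumes every cell is non-empty and contains no spaces.
    """
    lines = [line for line in text.strip().splitlines() if line.strip()]
    header = lines[0].split()
    return [dict(zip(header, line.split())) for line in lines[1:]]


def steps_run(row: dict[str, str]) -> tuple[int, int]:
    """Split the PROGRESS column ("done/total") into integers."""
    done, total = row["PROGRESS"].split("/")
    return int(done), int(total)


SAMPLE = """\
NAME         RUN_NUMBER   CREATED               STARTED               ENDED                 STATUS     PROGRESS
myanalysis   2.1          2025-11-26T10:04:09   2025-11-26T10:04:20   2025-11-26T10:05:40   finished   1/1
"""

row = parse_status_table(SAMPLE)[0]
print(row["STATUS"], steps_run(row))  # finished (1, 1)
```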

The goal of this issue is to avoid the manual stopped->failed workflow status update step, so that the reana-client restart command works out of the box for Snakemake workflows even from the "stopped" state.

Metadata


Assignees

No one assigned

    Type

    No type

    Projects

    Status

    Ready for work

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
