Current behaviour
When a user stops a long-running workflow for one reason or another and would like to restart it, REANA currently replies:
$ reana-client restart -w myanalysis
==> ERROR: Cannot start workflow myanalysis:
Only finished or failed workflows can be restarted.
This precaution is useful for some workflow engines (such as Serial), but it is not necessary for others (such as Snakemake), where a stopped workflow should be perfectly safe to restart thanks to Snakemake's internal job-caching capabilities.
Expected behaviour
It would be useful to allow restarting workflows also from the stopped state. We have two options:
- Let the user choose via a CLI option: by default, stopped workflows cannot be restarted, but using the restart --force CLI option would trigger a restart for any supported workflow engine.
- Allow restarts globally, without any new --force option, if the given workflow engine is safe to restart (such as Snakemake), and forbid them for engines that are not fully safe to restart at an arbitrary point in time (such as Serial; we would have to study each supported workflow engine to confirm or refute its safety).
I think the second option would probably be more user-friendly.
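The second option boils down to an engine-aware check on the workflow status. A minimal sketch in shell, where the function name restart_allowed and the list of restart-safe engines are hypothetical: only Snakemake is treated as safe, following this issue, and engine names are assumed to be lowercase as in the workflow type field of reana.yaml.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of option 2: an engine-aware restart policy.
# Engines considered safe to restart from "stopped". Assumption: only
# Snakemake for now; other engines would be added after being studied.
RESTART_SAFE_ENGINES="snakemake"

# restart_allowed STATUS ENGINE
# Returns 0 if a restart should be allowed, 1 otherwise.
restart_allowed() {
    local status="$1" engine="$2"
    case "$status" in
        finished|failed)
            # Current behaviour: always restartable, whatever the engine.
            return 0
            ;;
        stopped)
            # Proposed behaviour: only for engines known to be restart-safe.
            local safe
            for safe in $RESTART_SAFE_ENGINES; do
                if [ "$engine" = "$safe" ]; then
                    return 0
                fi
            done
            return 1
            ;;
        *)
            # Any other status (e.g. running) stays non-restartable.
            return 1
            ;;
    esac
}
```

For example, restart_allowed stopped snakemake succeeds while restart_allowed stopped serial fails, and finished or failed workflows remain restartable for every engine, as today.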
How to reproduce
Take the RooFit example and add some sleep time to its second step, which fits the data:
$ cd reana-demo-root6-roofit
$ vim workflow/snakemake/Snakefile
$ git diff -- workflow/snakemake/Snakefile
diff --git a/workflow/snakemake/Snakefile b/workflow/snakemake/Snakefile
index 06cc008..c342319 100644
--- a/workflow/snakemake/Snakefile
+++ b/workflow/snakemake/Snakefile
@@ -41,4 +41,4 @@ rule fitdata:
resources:
kubernetes_memory_limit="256Mi"
shell:
- "root -b -q '{input.fitdata_tool}(\"{input.data}\",\"{output}\")'"
+ "sleep 60 && root -b -q '{input.fitdata_tool}(\"{input.data}\",\"{output}\")'"
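Instead of editing the Snakefile by hand in vim, the same change could be applied non-interactively. A minimal sketch, assuming GNU sed (BSD/macOS sed needs -i '') and that the fitdata shell command matches the line shown in the diff; the helper name add_sleep_to_fitdata is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: prepend "sleep 60 && " to the fitdata shell command only.
# Assumes GNU sed; BSD/macOS sed would need `sed -i ''`.
add_sleep_to_fitdata() {
    local snakefile="$1"
    # The address restricts the edit to lines mentioning input.fitdata_tool;
    # the substitution then only fires where the command starts with "root -b -q".
    # "\&\&" produces a literal "&&" (a bare "&" means "the matched text" in sed).
    sed -i '/input\.fitdata_tool/s/"root -b -q/"sleep 60 \&\& root -b -q/' "$snakefile"
}
```

From the top of the reana-demo-root6-roofit checkout, add_sleep_to_fitdata workflow/snakemake/Snakefile followed by git diff should show the same one-line change as above.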
Run the workflow and stop it while it is executing the second step:
$ reana-client run -w myanalysis -f reana-snakemake.yaml
$ sleep 20
$ reana-client stop -w myanalysis --force
$ reana-client status -w myanalysis | awk '{print $6}'
STATUS
stopped
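The fixed sleep 20 and the one-off status check above depend on timing. A hedged sketch of a polling helper that reuses the same reana-client status | awk pipeline, assuming STATUS stays the sixth whitespace-separated field as in the output above; the names workflow_status, wait_for_status and POLL_INTERVAL are hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helpers: poll `reana-client status` until the workflow reaches
# the wanted status, or give up after a number of attempts.

workflow_status() {
    # NR == 2 skips the header line and keeps the workflow row.
    # Assumes every column before STATUS is non-empty, so STATUS is field 6.
    reana-client status -w "$1" | awk 'NR == 2 {print $6}'
}

# wait_for_status WORKFLOW WANTED [MAX_TRIES]
wait_for_status() {
    local workflow="$1" wanted="$2" max_tries="${3:-30}"
    local interval="${POLL_INTERVAL:-2}" try current=""
    for ((try = 1; try <= max_tries; try++)); do
        current="$(workflow_status "$workflow")"
        if [ "$current" = "$wanted" ]; then
            return 0
        fi
        sleep "$interval"
    done
    echo "workflow $workflow still '$current' after $max_tries checks" >&2
    return 1
}
```

For example, after reana-client stop -w myanalysis --force, running wait_for_status myanalysis stopped would return once the workflow is reported as stopped, or fail after 30 checks.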
Try to restart it and observe the error:
$ reana-client restart -w myanalysis
==> ERROR: Cannot start workflow myanalysis:
Only finished or failed workflows can be restarted.
Manually update its status from 'stopped' to 'failed':
$ kubectl exec -i -t deployment/reana-db -- /bin/bash
reana=# update __reana.workflow set status='failed' where status='stopped' and id_='36264f19-48a5-419f-a919-4584d48450e7';
UPDATE 1
Now the restart should work:
$ reana-client restart -w myanalysis
Observe that the workflow finishes successfully and that it ran only the second step, the data fitting (1/1):
$ reana-client status -w myanalysis
NAME        RUN_NUMBER  CREATED              STARTED              ENDED                STATUS    PROGRESS
myanalysis  2.1         2025-11-26T10:04:09  2025-11-26T10:04:20  2025-11-26T10:05:40  finished  1/1
The goal of this issue is to avoid the manual stopped->failed status update step, so that the reana-client restart command works out of the box for Snakemake workflows, even from the "stopped" state.