The issue
When using Shifter, some processes hang indefinitely for some containers.
Description
I'm using Nextflow to run a pipeline on a server with SLURM and Shifter. I'm aware that Shifter support is still experimental, but the experience has been mostly smooth. However, I noticed that some processes that depend on containers hung indefinitely. After some digging, I found that the problem is in the shifter_pull function generated in the sbatch script, more precisely here:
function shifter_pull() {
    local image=$1
    local STATUS=$(shifter_img lookup $image)
    if [[ $STATUS != READY && $STATUS != '' ]]; then
        STATUS=$(shifter_img pull $image)
        while [[ $STATUS != READY && $STATUS != FAILURE && $STATUS != '' ]]; do
            sleep 5
            STATUS=$(shifter_img pull $image)
        done
    fi
    [[ $STATUS == FAILURE || $STATUS == '' ]] && echo "Shifter failed to pull image \`$image\`" >&2 && exit 1
}
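To make the failure mode concrete, here is a minimal sketch that isolates the while-loop condition from shifter_pull so it can run without Shifter. The keeps_polling helper is hypothetical and only reproduces the test in the loop; the two-line value is likewise a hypothetical example of a STATUS that contains more than one line.

```bash
#!/usr/bin/env bash
# Hypothetical helper mirroring the while-loop condition in shifter_pull:
# polling continues for any STATUS that is not exactly READY, FAILURE, or ''.
keeps_polling() {
  local STATUS=$1
  [[ $STATUS != READY && $STATUS != FAILURE && $STATUS != '' ]]
}

# A single-line READY stops the loop...
keeps_polling "READY" && echo "READY: keeps polling" || echo "READY: stops"

# ...but a two-line value (hypothetical) does not, because the comparison
# is made against the whole string, newline included.
keeps_polling $'READY\nImageready' && echo "multi-line: keeps polling" || echo "multi-line: stops"
```

Any output from the status helper that is not exactly one of the three expected strings therefore keeps the job sleeping and re-polling forever.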
While the function that fetches STATUS mostly works (yielding only "READY", so the pull proceeds), for some images on our server it continuously yields a value that is neither READY, FAILURE, nor empty.
This happens because the output of the shifterimg -v $cmd $image command for those images is:
Message: {
  "ENTRY": null,
  "ENV": [
    "PATH=/NGStools/gff3toembl/gff3toembl/scripts:/NGStools/ncbi-blast-2.6.0+/bin:/NGStools/cd-hit-v4.6.8-2017-0621:/NGStools/cd-hit-v4.6.8-2017-0621/cd-hit-auxtools:/NGStools/exonerate-2.2.0-x86_64/bin:/NGStools/prokka/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
    "LD_LIBRARY_PATH=/usr/local/lib:",
    "PYTHONPATH=/NGStools/gff3toembl/gff3toembl:"
  ],
  "WORKDIR": "MISSING",
  "groupACL": [],
  "id": "d67d39f3088e8c5c36909dfd20a0d59bf7491279419bb002c8248c39788018fe",
  "itype": "docker",
  "last_pull": 1509118253.997599,
  "status": "READY",
  "status_message": "Image ready",
  "system": "lobo",
  "tag": [
    "ummidock/prokka:1.12"
  ],
  "userACL": []
}
The command shifterimg -v $cmd $image | awk -F: '$0~/status":/{gsub("[\\", ]","",$2);print $2}\' fetches both the status and the status_message lines, so STATUS ends up holding two lines instead of a single READY. In these cases the sbatch script never exits the while loop.
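One way to make the extraction more robust is to match the key name exactly instead of using a regex that can also hit other lines, and to stop after the first match. This is a minimal sketch, assuming shifterimg -v keeps printing one JSON key per line as in the output above; extract_status is a hypothetical helper, not part of Nextflow, and the sample is trimmed from that output.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print only the value of the exact "status" key from
# shifterimg -v output read on stdin. Splitting on double quotes puts the key
# name in $2 and its value in $4, so "status_message" no longer matches, and
# `exit` guarantees at most one line of output.
extract_status() {
  awk -F'"' '$2 == "status" { print $4; exit }'
}

# Trimmed copy of the shifterimg -v output shown above.
sample='Message: {
  "last_pull": 1509118253.997599,
  "status": "READY",
  "status_message": "Image ready",
  "system": "lobo"
}'

printf '%s\n' "$sample" | extract_status   # prints READY
```

In the generated script this would be used as shifterimg -v $cmd $image | extract_status. A real JSON parser such as jq would be sturdier, but plain awk avoids adding a dependency on the compute nodes.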
I'll propose a possible fix soon.
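Separately from the parsing problem, the polling loop itself could be capped so that an unexpected STATUS produces an error instead of an indefinite hang. This is a minimal sketch: poll_until_ready, its attempt budget, and the POLL_INTERVAL variable are hypothetical names chosen for illustration, and the status command is passed in so the loop can be exercised with a stub instead of shifter_img.

```bash
#!/usr/bin/env bash
# Hypothetical bounded variant of the polling loop in shifter_pull.
# Usage: poll_until_ready MAX_ATTEMPTS COMMAND [ARGS...]
# COMMAND is expected to print a status string (e.g. shifter_img pull "$image").
# Returns 0 on READY, 1 on FAILURE/empty output or after MAX_ATTEMPTS polls.
poll_until_ready() {
  local max_attempts=$1
  shift
  local attempt STATUS=''
  for (( attempt = 1; attempt <= max_attempts; attempt++ )); do
    STATUS=$("$@")
    case $STATUS in
      READY) return 0 ;;
      FAILURE|'') return 1 ;;
    esac
    # Any other value (including multi-line output) is retried, but only
    # until the attempt budget runs out. Defaults to 5 s, like the original.
    if (( attempt < max_attempts )); then
      sleep "${POLL_INTERVAL:-5}"
    fi
  done
  echo "Gave up after $max_attempts polls; last status: $STATUS" >&2
  return 1
}

# Stubbed demo: a status that is neither READY, FAILURE nor empty
# now ends in an error instead of an endless loop.
POLL_INTERVAL=0 poll_until_ready 3 printf 'READY\nImageready' || echo "pull gave up"
```

The trade-off is choosing a budget large enough for slow pulls of big images; a failed job with a clear message is easier to diagnose than one that sits in the queue indefinitely.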