
Pulling shifter image hangs #500

@ODiogoSilva


The issue

When using Shifter, processes that run in certain containers hang indefinitely.

Description

I'm using Nextflow to run a pipeline on a server with SLURM and Shifter. I'm aware that Shifter support is still experimental, but the experience has been mostly smooth. However, I noticed that some processes that depend on containers hung indefinitely. After some digging, I found that the problem is in the shifter_pull function generated in the sbatch script, more precisely here:

function shifter_pull() {
  local image=$1
  local STATUS=$(shifter_img lookup $image)
  if [[ $STATUS != READY && $STATUS != '' ]]; then
    STATUS=$(shifter_img pull $image)
    while [[ $STATUS != READY && $STATUS != FAILURE && $STATUS != '' ]]; do
      sleep 5
      STATUS=$(shifter_img pull $image)
    done
  fi

  [[ $STATUS == FAILURE || $STATUS == '' ]] && echo "Shifter failed to pull image \`$image\`" >&2  && exit 1
}
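The hang follows from the string comparisons alone. Below is a minimal bash sketch, assuming STATUS ends up holding two lines, READY and Imageready, as happens for the images discussed next. It reuses the exact tests from shifter_pull, but replaces `sleep 5` and the repeated pull with a hypothetical cap of 3 iterations so that it terminates.

```shell
#!/usr/bin/env bash
# Hypothetical reproduction: STATUS holds the two-line value
# ("READY" then "Imageready") reported for the affected images.
STATUS=$'READY\nImageready'

# Same test as the `if` in shifter_pull. Inside [[ ]] there is no word
# splitting, so the whole two-line string is compared with each literal.
if [[ $STATUS != READY && $STATUS != '' ]]; then
  echo "enters the pull branch"
fi

# Same condition as the while loop in shifter_pull. The original sleeps
# and re-pulls forever; this sketch stops after a hypothetical cap of 3
# iterations only so that it terminates.
iterations=0
while [[ $STATUS != READY && $STATUS != FAILURE && $STATUS != '' ]]; do
  iterations=$((iterations + 1))
  if (( iterations >= 3 )); then
    echo "still looping after $iterations iterations"
    break
  fi
done
```

Running it prints "enters the pull branch" followed by "still looping after 3 iterations": the two-line value is never equal to READY, FAILURE, or the empty string, so none of the loop's exit conditions can ever become true.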

The helper that fetches STATUS usually works, yielding only "READY" so the script proceeds, but some images on our server consistently yield:

READY
Imageready

This happens because, for those images, the output of the shifterimg -v $cmd $image command is:

Message: {
  "ENTRY": null, 
  "ENV": [
    "PATH=/NGStools/gff3toembl/gff3toembl/scripts:/NGStools/ncbi-blast-2.6.0+/bin:/NGStools/cd-hit-v4.6.8-2017-0621:/NGStools/cd-hit-v4.6.8-2017-0621/cd-hit-auxtools:/NGStools/exonerate-2.2.0-x86_64/bin:/NGStools/prokka/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin", 
    "LD_LIBRARY_PATH=/usr/local/lib:", 
    "PYTHONPATH=/NGStools/gff3toembl/gff3toembl:"
  ], 
  "WORKDIR": "MISSING", 
  "groupACL": [], 
  "id": "d67d39f3088e8c5c36909dfd20a0d59bf7491279419bb002c8248c39788018fe", 
  "itype": "docker", 
  "last_pull": 1509118253.997599, 
  "status": "READY", 
  "status_message": "Image ready", 
  "system": "lobo", 
  "tag": [
    "ummidock/prokka:1.12"
  ], 
  "userACL": []
}

The command shifterimg -v $cmd $image | awk -F: '$0~/status":/{gsub("[\\", ]","",$2);print $2}\' picks up both the status and the status_message lines, so STATUS becomes the two-line string READY / Imageready. Because that value never equals READY, FAILURE, or the empty string, the sbatch script never exits the while loop.
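To illustrate how a status match can pick up both keys, here is a small bash/awk sketch run on the two relevant lines of the shifterimg output above. It is a simplified stand-in for the pipeline in the generated script: the loose pattern `/status/` is an assumption chosen to show the two-line result, and `gsub` uses a regex literal instead of the escaped string form. Anchoring the pattern on the quoted key `"status":` keeps only the status line, because `"status_message":` does not contain that substring.

```shell
#!/usr/bin/env bash
# Two lines copied from the shifterimg -v output quoted above.
sample=$(printf '%s\n' '  "status": "READY",' '  "status_message": "Image ready",')

# Loose match (assumption): every line containing "status" matches, so
# both the status and status_message values are printed.
loose=$(printf '%s\n' "$sample" | awk -F: '/status/ {gsub(/[", ]/, "", $2); print $2}')

# Anchored match on the quoted key: only the "status": line matches.
strict=$(printf '%s\n' "$sample" | awk -F: '/"status":/ {gsub(/[", ]/, "", $2); print $2}')

printf 'loose:\n%s\n' "$loose"
printf 'strict:\n%s\n' "$strict"
```

The loose variant yields READY and Imageready on separate lines, which is exactly the value that keeps shifter_pull looping; the anchored variant yields READY alone.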

I will propose a possible fix soon.
