Release management for VGCN images #78

@domgz

Edited by @mira-miracoli

Tasks:

  • create a Python script using argparse (usegalaxy-eu/vgcn#80)
  • automatic installation, checking, and removal of requirements with conda
  • publishing images to OpenStack and static/vgcn, controlled by flags
  • filenames are created the following way (see last comments): vgcn~<provisioning>~<os>~<date>~<seconds>~<branch>~<hash>~<comment>
  • tested on my own laptop and on Jenkins with <comment> = "test"
  • images tested to work with HTCondor
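
Because the fields in the filename scheme above are separated by `~`, a name can be split back into its parts even though several fields contain `-`. The following is a minimal sketch, assuming all eight parts are present and no field contains `~`. The `ImageName` type, the `parse_image_name` helper, and the sample values are hypothetical and are not taken from the actual script.

```python
from typing import NamedTuple


class ImageName(NamedTuple):
    """Fields of vgcn~<provisioning>~<os>~<date>~<seconds>~<branch>~<hash>~<comment>."""
    provisioning: str
    os: str
    date: str
    seconds: str
    branch: str
    hash: str
    comment: str


def parse_image_name(name: str) -> ImageName:
    """Split a tilde-separated image name into its fields (hypothetical helper)."""
    prefix, *fields = name.split("~")
    if prefix != "vgcn" or len(fields) != len(ImageName._fields):
        raise ValueError(f"not a valid image name: {name!r}")
    return ImageName(*fields)


# Illustrative values only; they do not refer to a published image.
parsed = parse_image_name(
    "vgcn~workers+internal~rockylinux-9-latest-x86_64~2023-10-19~58706~main~687c70f~test"
)
print(parsed.os, parsed.branch, parsed.comment)
```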

Original post from @kysrpex
At the moment, our publicly released images are named like this: "vggp-v60-j324-67edcc87f400-main" (see https://usegalaxy.eu/static/vgcn/ for more examples). Images uploaded to OpenStack have similar names.

The names are constructed from a build tag:

# Construct build tag
# First 12 characters of the current commit hash
GIT_COMMIT_SHORT=$(git log --format="%H" -n 1 | sed -e 's/^\(.\{12\}\).*/\1/g')
echo "GIT_COMMIT_SHORT=$GIT_COMMIT_SHORT"
# Version number stored in the Ansible group variables
VG_BUILD=$(grep '^vg_build:' ansible-roles/group_vars/all.yml | sed 's/vg_build: //g')
echo "VG_BUILD=$VG_BUILD"
# Branch name without the remote prefix
NICE_BRANCH=$(git name-rev --name-only HEAD | sed 's|remotes/origin/||g')
echo "NICE_BRANCH=$NICE_BRANCH"
# continue with the same build01 numbering
BN=$((BUILD_NUMBER + 205))
echo "BN=$BN"
BUILD_TAG="v$VG_BUILD-j$BN-$GIT_COMMIT_SHORT-$NICE_BRANCH"
echo "BUILD_TAG=$BUILD_TAG"
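
To make the composition of the build tag easier to follow, here is a rough Python rendering of the shell snippet as a pure function. It is only a sketch: the function name `build_tag` and its parameters are hypothetical, and the inputs in the example (Jenkins build number 119 and a zero-padded hash) are assumptions chosen so that the result matches the tag inside "vggp-v60-j324-67edcc87f400-main".

```python
def build_tag(vg_build: str, build_number: int, commit: str, branch: str) -> str:
    """Compose v<vg_build>-j<build_number + 205>-<12-char hash>-<branch>."""
    commit_short = commit[:12]  # same effect as the sed truncation to 12 characters
    nice_branch = branch.replace("remotes/origin/", "")  # strip the remote prefix
    bn = build_number + 205  # continue with the same build01 numbering
    return f"v{vg_build}-j{bn}-{commit_short}-{nice_branch}"


# Assumed inputs: only the first 12 hash characters come from the example name.
example_commit = "67edcc87f400" + "0" * 28
print(build_tag("60", 119, example_commit, "remotes/origin/main"))
```

Note that the offset of 205 is baked into the tag, which is one reason the build number alone says little about how recent an image is.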

The build tag is then used to generate the release tag (the final name):

function release_tag {
    local flavour=$1
    local tag
    if [[ "$flavour" == "vgcn-bwcloud-gpu"* ]]; then
        tag="vggp-gpu-$BUILD_TAG"
    elif [[ "$flavour" == "vgcn-bwcloud-"* ]]; then
        tag="vggp-$BUILD_TAG"
    else
        tag="$flavour-$BUILD_TAG"
    fi
    echo "$tag"
}

The release_tag function is called with $flavour as its argument:

release_tag "$flavour"
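
The prefix matching in release_tag can be sketched in Python as follows. The flavour names in the example calls are hypothetical, and the build tag is passed as a parameter here instead of being read from a global variable, which is a deviation from the shell version.

```python
def release_tag(flavour: str, build_tag: str) -> str:
    """Map a flavour and a build tag to the final image name."""
    # The GPU check must come first: GPU flavours also start with "vgcn-bwcloud-".
    if flavour.startswith("vgcn-bwcloud-gpu"):
        return f"vggp-gpu-{build_tag}"
    if flavour.startswith("vgcn-bwcloud-"):
        return f"vggp-{build_tag}"
    return f"{flavour}-{build_tag}"


tag = "v60-j324-67edcc87f400-main"
for flavour in ("vgcn-bwcloud-gpu-example", "vgcn-bwcloud-example", "other-example"):
    print(flavour, "->", release_tag(flavour, tag))
```

Every flavour under the "vgcn-bwcloud-" prefix collapses to the same "vggp" name, so the flavour itself cannot be recovered from the released name.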

The good choices I see in our current approach are:

  • We include the commit hash in the image name.
  • We include the branch name in the image name.

The bad choices are:

  • We have defined some meaning for the version number (e.g. v60 includes features A and B), but it is not clear under which conditions this number should change or how it should change.
  • We have made the mistake of resetting the build number at some point, meaning that images with a higher build number are not necessarily more recent.
  • We do not include any hint of the operating system on which the images are based (e.g. Rocky Linux 9.2, AlmaLinux 8.8).

Neutral choices:

  • What does "vggp" mean?

The bad choices make it difficult to compare two image versions (that is, to determine which one is newer, when the comparison makes sense) or to quickly infer what is in them (fortunately, the branch name and commit hash can still be used for this purpose).

We need a naming scheme (and possibly a simple branching model) that addresses those problems. Let's think of what information is needed or would be useful to put on a version tag for an image.

  • Commit date.
  • Commit time.
  • Git branch, assuming it conveys behavior differences from the main branch (e.g. internal images that connect to the secondary HTCondor cluster, or the patched GPU images that run a specific kernel version).
  • Provisioning (a.k.a. flavors).
  • Commit hash.
  • Operating system.
  • Extra information that conveys the image may still be different from what one would get building it from the git repository (e.g. local changes).

I think we could leave out:

  • Attempting to include a number (e.g. v60) that conveys the features of the image. We are already failing to maintain it, so we get nothing valuable out of the extra effort.
  • Build numbers. The "same image" (we are not aiming here at real reproducible builds) is supposed to be produced from the same commit hash.

A possible naming scheme could be vgcn-<date>-<seconds>-<branch>~<provisioning>~<os>~<hash>-<comment>, where:

  • Using vgcn could immediately help distinguish the old naming scheme from the new one (if there is not a good reason for the name vggp).
  • <date> is the commit date in YYYY-MM-DD format. For example: 2023-10-19.
  • <seconds> is the commit time measured in the number of seconds elapsed since the start of the commit date (i.e. since YYYY-MM-DDT00:00:00). For example: 58706.
  • <branch> is the name of the Git branch. Git branches should differ from main only in small details and be temporary, just for patchwork.
  • <provisioning> is the set of playbooks that have been used to provision the image. It could be a +-separated list of playbook names, for example generic+workers+internal+gpu. Optionally, we could allow the use of at most one alias (the string would then start with +) to have shorter names and avoid confusing people. For example, +external could be an alias for generic+workers+external, and the alias could be combined with more playbooks, for example +external+gpu.
  • <os> is the name of the Packer build without the source (e.g. centos-8.5.2111-x86_64).
  • <hash> is the Git commit hash.
  • -<comment> is any extra string to identify deviations from the Git commit that was checked out at the moment of building the image. Jenkins would omit this part. If you build an image locally with any deviation from the checked out commit and share it with others, it would be nice if you include something here.
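
As a worked example of the <date> and <seconds> fields, the sketch below derives both from a commit timestamp. The helper name `date_and_seconds` is hypothetical, and the timestamp is assumed to be a naive datetime in whatever time zone the commit time is read in (the scheme above does not specify one).

```python
from datetime import datetime
from typing import Tuple


def date_and_seconds(commit_time: datetime) -> Tuple[str, int]:
    """Return the commit date (YYYY-MM-DD) and the seconds elapsed since 00:00:00."""
    seconds = commit_time.hour * 3600 + commit_time.minute * 60 + commit_time.second
    return commit_time.strftime("%Y-%m-%d"), seconds


# A commit made at 16:18:26 on 2023-10-19 gives 16*3600 + 18*60 + 26 = 58706.
print(date_and_seconds(datetime(2023, 10, 19, 16, 18, 26)))  # → ('2023-10-19', 58706)
```

Since both fields have a fixed order of magnitude within a day, padding <seconds> to five digits would make names sort chronologically as plain strings; whether to pad is left open here.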

A sample name using this scheme could be vgcn-2023-10-19-58706-main~+internal~centos-8.5.2111-x86_64~687c70f-local_build_different_motd.
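
The alias expansion and the assembly of a name in the proposed field order can be sketched as below. This is an illustration under stated assumptions: `ALIASES`, `expand_provisioning`, and `image_name` are hypothetical names, the "+external" alias follows the example given for <provisioning>, and the "+internal" alias is assumed by analogy.

```python
from typing import List

# Hypothetical alias table; "+internal" is an assumption made by analogy.
ALIASES = {
    "external": ["generic", "workers", "external"],
    "internal": ["generic", "workers", "internal"],
}


def expand_provisioning(provisioning: str) -> List[str]:
    """Expand a <provisioning> string into the list of playbooks it stands for."""
    if not provisioning.startswith("+"):
        return provisioning.split("+")  # plain list, no alias
    alias, *extra = provisioning[1:].split("+")  # at most one alias, always first
    return ALIASES[alias] + extra


def image_name(date: str, seconds: int, branch: str, provisioning: str,
               os_name: str, commit: str, comment: str = "") -> str:
    """Assemble vgcn-<date>-<seconds>-<branch>~<provisioning>~<os>~<hash>[-<comment>]."""
    name = f"vgcn-{date}-{seconds}-{branch}~{provisioning}~{os_name}~{commit}"
    return f"{name}-{comment}" if comment else name  # Jenkins would omit the comment


print(expand_provisioning("+external+gpu"))
print(image_name("2023-10-19", 58706, "main", "+internal",
                 "centos-8.5.2111-x86_64", "687c70f", "local_build_different_motd"))
```

One thing this makes visible is that `-` is used both as a separator and inside fields such as <date>, <os>, and <branch>, so only the `~` separators can be split on reliably.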
