Release management for VGCN images

Edited by @mira-miracoli

Tasks:

- [x] create a Python script using argparse ([usegalaxy-eu/vgcn#80](https://github.com/usegalaxy-eu/vgcn/pull/80))
- [x] automatic installation, checking, and removal of requirements with conda
- [x] publishing images to OpenStack and to static/vgcn, controlled by flags
- [x] create filenames as follows (see the last comments): `vgcn~<provisioning>~<os>~<date>~<seconds>~<branch>~<hash>~<comment>`
- [x] tested on my own laptop and on Jenkins with `<comment> = "test"`
- [ ] test that the images work with HTCondor
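Because every field in the final scheme is separated by `~`, a name can be split back into its parts with plain shell tools. The sketch below is hypothetical and not taken from the script in vgcn#80: `image_field` and `image_sort_key` are made-up helper names, and it assumes that no field contains `~` and that `<seconds>` is written without leading zeros. The sort key pads `<seconds>` so that sorting names as text orders them by commit time.

```shell
# Hypothetical helpers for names of the form
#   vgcn~<provisioning>~<os>~<date>~<seconds>~<branch>~<hash>~<comment>
# Assumption: no field contains "~" itself.

# Print one named field of an image name: image_field NAME FIELD
image_field() {
    local prefix provisioning os date seconds branch hash comment
    # Split on "~"; any extra "~" would end up in the last field (comment).
    IFS='~' read -r prefix provisioning os date seconds branch hash comment <<EOF
$1
EOF
    case $2 in
        prefix) printf '%s\n' "$prefix" ;;
        provisioning) printf '%s\n' "$provisioning" ;;
        os) printf '%s\n' "$os" ;;
        date) printf '%s\n' "$date" ;;
        seconds) printf '%s\n' "$seconds" ;;
        branch) printf '%s\n' "$branch" ;;
        hash) printf '%s\n' "$hash" ;;
        comment) printf '%s\n' "$comment" ;;
        *) echo "unknown field: $2" >&2; return 1 ;;
    esac
}

# Print a key that sorts chronologically as plain text:
# YYYY-MM-DD followed by <seconds> zero-padded to 5 digits (max 86399).
# Assumes <seconds> has no leading zeros (printf would read them as octal).
image_sort_key() {
    printf '%s.%05d\n' "$(image_field "$1" date)" "$(image_field "$1" seconds)"
}
```

Printing `image_sort_key "$n"` next to each name `n` and piping the result through `sort` would list images from the oldest to the newest commit.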

---

Original post from @kysrpex 
At the moment, our publicly released images are named like this: "vggp-v60-j324-67edcc87f400-main" (see https://usegalaxy.eu/static/vgcn/ for more examples). Images uploaded to OpenStack have similar names.

The names are constructed from a [build tag](https://github.com/usegalaxy-eu/jenkins-scripts/blob/0cef63dc72eec2171868719341b8cce9c9c247fa/vgcn-build.sh#L23-L34)
```bash
# Construct build tag
GIT_COMMIT_SHORT=`git log --format="%H" -n 1 | sed -e 's/^\(.\{12\}\).*/\1/g'`
echo "GIT_COMMIT_SHORT=$GIT_COMMIT_SHORT"
VG_BUILD=`cat ansible-roles/group_vars/all.yml | grep '^vg_build:' | sed 's/vg_build: //g'`
echo "VG_BUILD=$VG_BUILD"
NICE_BRANCH=`git name-rev --name-only HEAD | sed 's|remotes/origin/||g'`
echo "NICE_BRANCH=$NICE_BRANCH"
# continue with the same build01 numbering
BN=`expr $BUILD_NUMBER + 205`
echo "BN=$BN"
BUILD_TAG="v$VG_BUILD-j$BN-$GIT_COMMIT_SHORT-$NICE_BRANCH"
echo "BUILD_TAG=$BUILD_TAG"
```
that is used to generate the [release tag](https://github.com/usegalaxy-eu/jenkins-scripts/blob/0cef63dc72eec2171868719341b8cce9c9c247fa/vgcn-build.sh#L68-L79) (the final name)
```bash
function release_tag {
    local flavour=$1
    local tag
    if [[ "$flavour" == "vgcn-bwcloud-gpu"* ]]; then
        tag="vggp-gpu-$BUILD_TAG"
    elif [[ "$flavour" == "vgcn-bwcloud-"* ]]; then
        tag="vggp-$BUILD_TAG"
    else
        tag="$flavour-$BUILD_TAG"
    fi
    echo $tag    
}
```
by calling the `release_tag` function using `$flavour` as an input.
```bash
release_tag $flavour
```

The good choices I see in our current approach are:
- We include the commit hash in the image name.
- We include the branch name in the image name.

The bad choices are:
- We have defined some meaning for the version number (e.g. v60 includes features A and B), but it is not clear under which conditions this number should change and how it should change.
- We have made the mistake of resetting the build number at some point, meaning that images with a higher build number are not necessarily more recent.
- We do not include any hint of the operating system on which the images are based (e.g. Rocky Linux 9.2, AlmaLinux 8.8).

Neutral choices:
- What does "vggp" mean?

The bad choices make it difficult to "compare" two image versions (determining which one is newer, when that makes sense) or to quickly infer what is in them (fortunately, the branch name and commit hash can still be used for this purpose).

We need a naming scheme (and possibly a simple branching model) that addresses these problems. Let's think about what information is needed, or would be useful, in an image's version tag.

- Commit date.
- Commit time.
- Git branch, assuming it conveys behavior differences from the main branch (e.g. internal images that connect to the [secondary HTCondor cluster](https://github.com/usegalaxy-eu/issues/issues/348), or the patched GPU images that run a specific kernel version).
- Provisioning (a.k.a. flavors).
- Commit hash.
- Operating system.
- Extra information conveying that the image may differ from what one would get by building it from the Git repository (e.g. local changes).

I think we could leave out:
- Attempting to include a number (e.g. v60) that conveys the features of the image. We are already failing to keep up with this extra effort, so we are not getting anything valuable out of it.
- Build numbers. The "same image" (we are not aiming here at real reproducible builds) is supposed to be produced from the same commit hash.

A possible naming scheme could be `vgcn-<date>-<seconds>-<branch>~<provisioning>~<os>~<hash>-<comment>`, where:
- Using `vgcn` would immediately distinguish the new naming scheme from the old one (unless there is a good reason to keep the name `vggp`).
- `<date>` is the commit date in `YYYY-MM-DD` format. For example: `2023-10-19`.
- `<seconds>` is the commit time measured in the number of seconds elapsed since the start of the commit date (i.e. since `YYYY-MM-DDT00:00:00`). For example: `58706`.
- `<branch>` is the name of the Git branch. Git branches should differ from _main_ only in small details and be temporary, just for patchwork.
- `<provisioning>` are the playbooks that have been used to provision the image. It could be a `+`-separated list of playbook names, for example `generic+workers+internal+gpu`. Optionally, we could allow at most one alias (the string would then start with `+`) to keep names short and avoid confusing people. For example, `+external` could be an alias for `generic+workers+external`, and the alias could be combined with more playbooks, for example `+external+gpu`.
- `<os>` is the name of the Packer build without the source (e.g. `centos-8.5.2111-x86_64`).
- `<hash>` is the Git commit hash.
- `-<comment>` is any extra string that identifies deviations from the Git commit that was checked out when the image was built. Jenkins would omit this part. If you build an image locally with any deviation from the checked-out commit and share it with others, it would be nice to include something here.
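Both `<date>` and `<seconds>` can be derived from a single Unix timestamp. A minimal sketch, assuming the commit time is interpreted in UTC (the proposal does not fix a time zone) and that GNU `date` is available; `commit_date`, `commit_seconds`, and `image_name` are hypothetical helper names.

```shell
# Hypothetical helpers that assemble a name following the proposed scheme
#   vgcn-<date>-<seconds>-<branch>~<provisioning>~<os>~<hash>-<comment>
# Assumptions: times are taken in UTC; GNU date is installed.

# Commit date (YYYY-MM-DD, UTC) from a Unix timestamp.
commit_date() {
    date -u -d "@$1" +%F
}

# Seconds elapsed since 00:00:00 UTC of that date.
# Unix time ignores leap seconds, so every day is exactly 86400 s long.
commit_seconds() {
    echo $(( $1 % 86400 ))
}

# image_name EPOCH BRANCH PROVISIONING OS HASH [COMMENT]
image_name() {
    local name
    name="vgcn-$(commit_date "$1")-$(commit_seconds "$1")-$2~$3~$4~$5"
    # Jenkins builds would omit the comment; local builds may append one.
    if [ -n "${6:-}" ]; then
        name="$name-$6"
    fi
    printf '%s\n' "$name"
}
```

In a checkout, `git log -1 --format=%ct` prints the committer timestamp of `HEAD` and `git rev-parse --short HEAD` prints an abbreviated hash, so a call could look like `image_name "$(git log -1 --format=%ct)" main +internal centos-8.5.2111-x86_64 "$(git rev-parse --short HEAD)"`. Whether the committer date or the author date (`%at`) should define `<date>` is left open here.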

A sample name using this scheme could be `vgcn-2023-10-19-58706-main~+internal~centos-8.5.2111-x86_64~687c70f-local_build_different_motd`.
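The `+internal` in the sample name is a provisioning alias. Expanding such an alias back into the full `+`-separated playbook list could look like the sketch below. The alias table is a hypothetical assumption: `+external` maps to `generic+workers+external` as suggested in the proposal, and `+internal` is assumed, by analogy, to map to `generic+workers+internal`.

```shell
# Hypothetical alias expansion for the <provisioning> field.
# expand_provisioning "+external+gpu"  ->  generic+workers+external+gpu
expand_provisioning() {
    local rest short extra
    case $1 in
        +*) ;;                                # starts with an alias
        *) printf '%s\n' "$1"; return 0 ;;    # plain playbook list, unchanged
    esac
    rest=${1#+}          # drop the leading "+"
    short=${rest%%+*}    # the alias is the first component
    extra=""
    case $rest in
        *+*) extra="+${rest#*+}" ;;           # remaining playbooks, if any
    esac
    # Assumed alias table; only +external comes from the proposal.
    case $short in
        external) printf '%s\n' "generic+workers+external$extra" ;;
        internal) printf '%s\n' "generic+workers+internal$extra" ;;
        *) echo "unknown provisioning alias: +$short" >&2; return 1 ;;
    esac
}
```

Keeping aliases in a single `case` table means that adding or renaming an alias touches one place, and an unknown alias fails loudly instead of producing a misleading playbook list.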


Release management for VGCN images #78

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Release management for VGCN images #78

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions