ansible validation fixes by bgruening · Pull Request #1999 · usegalaxy-eu/infrastructure-playbook

bgruening · 2026-04-11T23:42:22Z

This needs #1998

It implements basic YAML validation as well as ansible-lint validation.
To make ansible-lint work on public CI without vaults, I moved the loading of vault files into a pre_task. I hope this is OK, given the nice syntax checks that we get. I then tried to fix all remaining lint hits, and I think the playbook is in better shape with this. I recommend reviewing this PR after #1998 and then commit-by-commit.

Closes https://github.com/usegalaxy-eu/issues/issues/362.


gsaudade99

This was just a syntax review; I still need to think about the new way of loading secrets within the pre_tasks.

gsaudade99 · 2026-04-14T13:44:30Z

        autoremove: yes

    - name: Upgrade packages
-      apt:


Do we need these one-off roles?

I don't know :) I decided to leave it in ... because I have no idea if this is still useful.

That can be cleaned up separately I guess.

gsaudade99 · 2026-04-14T14:54:16Z

    - hxr.autofs
    # BEGIN custom
-    - usegalaxy-eu.gxadmin
+    - galaxyproject.gxadmin


Should we move it out of the "BEGIN custom" block?

I'll try to rephrase it: "Should we drop our fork?" (involves checking what are the differences with upstream).

Oh, yes. We should not rely on a fork. Everything as upstream as possible.


domgz

Fundamentally, I strongly disagree with loading variables in this way,

---
- name: Denbi stratum0
  hosts: denbistratum0
  become: true
  vars:
    cvmfs_role: 'stratum0'
    usegalaxy_eu_autofs_mounts:
      - vdb
    playbook_secret_vars_files:
      - secret_group_vars/all.yml
  pre_tasks:
    - name: Load secret variables
      ansible.builtin.include_vars:
        file: "{{ item }}"
      loop: "{{ playbook_secret_vars_files }}"
      no_log: true
    - name: Create /srv symlink to mounted data
      ansible.builtin.file:
        src: /data/vol/
        dest: /srv
        owner: root
...

because it "messes up" the variable loading mechanism in Ansible. We lose the flexibility to define variables in the various allowed ways (e.g. I guess this doesn't work with import_tasks and import_playbook, which I plan to use heavily to blend playbooks with KVM images), and it forces us to distinguish between secret and non-secret variables. Moreover, we need to have that extra piece of code all over the place.

I would rather strongly advocate for:

- Moving playbook-dependent workflows to Jenkins and using the GitHub integration (we already do this for the infrastructure repository). I have also proposed this for the TPV linting workflow: https://github.com/usegalaxy-eu/issues/issues/944.
- Simply removing all encrypted vars files in GitHub workflows (they're easy to detect because they have a header). If the files don't exist, Ansible can't try to load them. Variables are lazily loaded by Ansible: if a variable is defined in terms of an encrypted variable located within one of the removed vault files, Ansible should complain only when actually trying to access that encrypted variable. This is precisely the workaround used for the TPV linting workflow; we could simply generalize it to wipe all encrypted files. Arguably this is the closest to a do-nothing strategy: one snippet of code takes care of everything, rather than many snippets that you have to remember to include, and the flexibility to put variables wherever it is convenient remains (which also reduces verbosity, because you do not have to explicitly point to the vars files in the playbooks; Ansible knows what to load, for example from the inventory groups that the host belongs to).

      - name: Workaround Ansible vault password not being available to GitHub.
        working-directory: "infrastructure-playbook"
        run: |
          rm -f group_vars/htcondor/vault.yml
          rm -f group_vars/htcondor-secondary/vault.yml
          rm -f group_vars/all/ssh-keys_vault.yml
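The idea of generalizing this workaround to wipe every encrypted file could look roughly like the sketch below. It is only an illustration, not part of the PR: the function names `is_vault_file` and `remove_vault_files` are hypothetical, and it assumes that fully encrypted vault files start with the `$ANSIBLE_VAULT;` header. Values encrypted inline inside an otherwise plain YAML file would not be detected this way.

```python
from __future__ import annotations

from pathlib import Path

# Assumption: files encrypted as a whole with ansible-vault begin with this header.
VAULT_HEADER = b"$ANSIBLE_VAULT;"


def is_vault_file(path: Path) -> bool:
    """Return True if the file starts with the Ansible Vault header."""
    try:
        with path.open("rb") as handle:
            return handle.read(len(VAULT_HEADER)) == VAULT_HEADER
    except OSError:
        # Unreadable files are treated as non-vault files and left alone.
        return False


def remove_vault_files(root: Path, dry_run: bool = False) -> list[Path]:
    """Delete every vault-encrypted file below root and return the matches.

    With dry_run=True the files are only reported, not deleted.
    """
    matches = []
    # sorted() materializes the listing before anything is deleted.
    for path in sorted(root.rglob("*")):
        if ".git" in path.parts:
            continue  # never touch Git internals
        if path.is_file() and is_vault_file(path):
            if not dry_run:
                path.unlink()
            matches.append(path)
    return matches
```

A CI step could call something like `remove_vault_files(Path("infrastructure-playbook"))` on a fresh checkout before linting. Running it with `dry_run=True` first shows which files would disappear.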

In addition, I suggest adding a .git-blame-ignore-revs file in an extra commit that references most commits from this PR (this should be done only right before merging; for now, changes may still occur).

I see many cases in which shell was replaced by command. Beware that command is very different from shell. command runs the executable directly, while shell uses a shell. What works in a shell is not guaranteed to work when using command (I have experienced this myself). We should just fix the aesthetics, i.e. shell -> ansible.builtin.shell without changing the module.
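The difference can be reproduced outside Ansible with plain process execution. The sketch below is only an analogy in Python, not Ansible's implementation: it assumes a POSIX system where `sh` and an `echo` executable are available, and the helper names are made up for illustration.

```python
import subprocess

CHAINED = "echo first && echo second"


def run_without_shell(command_line: str) -> str:
    """Run the first word as the executable and pass the rest as plain arguments."""
    result = subprocess.run(
        command_line.split(), capture_output=True, text=True, check=True
    )
    return result.stdout


def run_with_shell(command_line: str) -> str:
    """Hand the whole line to sh, which interprets operators such as &&."""
    result = subprocess.run(
        ["sh", "-c", command_line], capture_output=True, text=True, check=True
    )
    return result.stdout


print(repr(run_without_shell(CHAINED)))  # 'first && echo second\n'
print(repr(run_with_shell(CHAINED)))     # 'first\nsecond\n'
```

Without a shell, `&&` is just another argument to `echo`, so the second command never runs. Shell builtins such as `cd` and `source` are in the same situation, because they are provided by the shell itself.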

What happened to the mounts submodule exactly?

The rest looks good :)

EDIT: but of course, under no circumstances merge if linting is not passing and/or concerns are still unresolved! I'd also suggest doing a test run with --diff --check of most playbooks before merging; really, our whole infrastructure can "break" from this PR.

domgz · 2026-04-15T08:30:44Z

        autoremove: yes

    - name: Upgrade packages
-      apt:


That can be cleaned up separately I guess.

domgz · 2026-04-15T08:44:51Z

    - name: Restart Galaxy
-      shell: |
-        cd /opt/galaxy/ && source /opt/galaxy/.bashrc  && /usr/bin/gxadmin gunicorn handler-restart && sudo -u galaxy /usr/bin/galaxy-sync-to-nfs
+      ansible.builtin.command: cd /opt/galaxy/ && source /opt/galaxy/.bashrc && /usr/bin/gxadmin gunicorn handler-restart && sudo -u galaxy /usr/bin/galaxy-sync-to-nfs


This is a very good example of a case in which shell cannot be replaced with command (see main review comment). The ability to chain commands is provided by the shell.

Suggested change

-      ansible.builtin.command: cd /opt/galaxy/ && source /opt/galaxy/.bashrc && /usr/bin/gxadmin gunicorn handler-restart && sudo -u galaxy /usr/bin/galaxy-sync-to-nfs
+      ansible.builtin.shell: cd /opt/galaxy/ && source /opt/galaxy/.bashrc && /usr/bin/gxadmin gunicorn handler-restart && sudo -u galaxy /usr/bin/galaxy-sync-to-nfs

domgz · 2026-04-15T08:45:40Z

    - hxr.autofs
    # BEGIN custom
-    - usegalaxy-eu.gxadmin
+    - galaxyproject.gxadmin


I'll try to rephrase it: "Should we drop our fork?" (involves checking what are the differences with upstream).


bgruening · 2026-04-15T10:02:57Z

I don't know Ansible well enough to discuss the variable loading or to judge the implications. My thinking was: linting should be fast (a few seconds), cached, and on GitHub Actions. But I was also not expecting that Ansible has no built-in way to ignore vaults. The approach in this PR was the only way I found. I don't think I have tried deleting the encrypted files. I can try this.
- I will not spend much more time on this PR, and I also don't think it's a priority. Maybe we leave all the GitHub stuff out and you run it manually via the Makefile for the time being.
- command -> shell: good catch, I will try to work on that.
- mount submodule: Upps, I thought it was just on my local checkout. Do we need it as submodule? If so why? I will revert that.

domgz · 2026-04-15T10:14:48Z

* mount submodule: Upps, I thought it was just on my local checkout. Do we need it as submodule? If so why? I will revert that.

Having some sort of in-repository integration is very important. Variables from the mounts repository are used in this repository. Not having this integration means the variables are not available, with all that entails. Of course, if you remember (and document) that you have to download another repository to use this one, you don't need the integration, but the submodule makes it clear to everybody (agents, be they human or artificial, and systems) that there is a dependency.

Nowadays I am also aware of Git subtree, and while it looks nice I am not sure it is better than submodule for this use case.

bgruening · 2026-04-15T14:02:08Z

It's green again.

bgruening · 2026-04-15T14:05:58Z

import_tasks and import_playbook should work imho, but only if they do not need the secrets at parse time. Which I think we don't need.

domgz · 2026-04-15T14:22:01Z

Thanks 👍. This needs some testing from our side, there are still some shell -> command changes and stuff like that, and a script for removing all vault files still needs to be part of the PR. Since none of us really has time to work deeper on it now, it should stay open for a couple of days.

bgruening added 30 commits April 12, 2026 00:09

fix the YAML lint warning about missing document starts

726a398

use ansible.builtin where we can

4e14bd9

add names to all tasks

c573858

[needs proper review] add explicit mode to tasks

b13e89f

convert services to buildin.service

fb0f268

improve whitespaces

42ca5aa

improve tasks names

a538fca

the "collections:" shortcut is not needed anymore, according to the linting

f4d6790

convert more modules to proper ansible.* modules

9d5839c

replace shell with ansible.buildin.command

24e9eeb

replace even more modules with builtin

ded41de

fix a few more task names

530bcda

one more name change

fc9b746

use string for octal modes

8b62113

exclude external roles

1c54eb6

narf

7f5cbde

extent lint config

955cac7

add roles/collections explicitly

c1ed7a8

Load secreats as pre-task, this way the secrets are not loading during lint time and makes it work on CI

c356721

use "name" instead of "package" in APT module

1ab7e1f

use builtin module for "mount"

553a001

quote octal value

82a54db

use "role" instead of "name" when we list roles

67bb4de

[needs review] add changed_when

48d8541

add traefik template, it has some loading issues

32d4c1f

tighten YAML lint

7907dc0

when we list roles that are part of a collection, use the full syntax x.collection.role

54fec69

swap "alternative against community.general.alternatives"

a2db792

use nested syntax for autofs vars

26f7bf4

create fake ansible file, to make linting pass

a71e1a7

bgruening added 3 commits April 12, 2026 10:10

use DNF instead of YUM module

abc1155

add caching to CI

871bca3

fix ansible version for testing

8169b95

bgruening force-pushed the validate_fixes branch from 130d350 to 8169b95 on April 12, 2026 09:17

domgz added the enhancement label Apr 13, 2026

domgz added a commit that referenced this pull request Apr 13, 2026

Postpone make validate

e9069c3

Postpone the use of `make validate` for PR #1999. Co-authored-by: Björn Grüning <bjoern@gruenings.eu>

domgz mentioned this pull request Apr 13, 2026

Spring cleanup #1998

Merged

gsaudade99 reviewed Apr 14, 2026


bgruening commented Apr 14, 2026


Comment thread requirements.yaml

bgruening added 2 commits April 14, 2026 18:58

Apply suggestions from code review

ffa8f20

Co-authored-by: Björn Grüning <bjoern@gruenings.eu>

Merge branch 'master' into validate_fixes

94c972a

domgz reviewed Apr 15, 2026


bgruening and others added 2 commits April 15, 2026 11:50

Apply suggestions from code review

1abecb4

Co-authored-by: José Manuel Domínguez <43052541+kysrpex@users.noreply.github.com>

Update .github/requirements-python-lint.txt

58ffe08

bgruening added 3 commits April 15, 2026 13:30

use builtin shell not command

bdd8276

add submodule back

b275dac

fix CI

3082ffc

