Skip to content

ansible validation fixes#1999

Open
bgruening wants to merge 40 commits intomasterfrom
validate_fixes
Open

ansible validation fixes#1999
bgruening wants to merge 40 commits intomasterfrom
validate_fixes

Conversation

@bgruening
Copy link
Copy Markdown
Member

@bgruening bgruening commented Apr 11, 2026

This needs #1998

It implements basic YAML validation but also ansible-lint validation.
To make ansible-lint work on public CI without vaults, I move the loading of vault-files into a pre_task. I hope this is ok, given the nice syntax checks that we get. I tried then to fix all remaining lints and hits and I think the playbook is with this in a better shape. I recommend reviewing this PR after #1998 and then commit-by-commit.

Closes https://github.com/usegalaxy-eu/issues/issues/362.

domgz added a commit that referenced this pull request Apr 13, 2026
Postpone the use of `make validate` for PR #1999.

Co-authored-by: Björn Grüning <bjoern@gruenings.eu>
@domgz domgz mentioned this pull request Apr 13, 2026
Copy link
Copy Markdown
Contributor

@gsaudade99 gsaudade99 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was just sintax review, I need to think on the new way of loading secrets within the pre_tasks

autoremove: yes

- name: Upgrade packages
apt:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need this one-off roles?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know :) I decided to leave it in ... because I have no idea if this is still useful.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That can be cleaned up separately I guess.

Comment thread roles/hostname/tasks/main.yml Outdated
Comment thread roles/hxr.docker-ssl-client/tasks/main.yml Outdated
Comment thread requirements.yaml
Comment thread telescope.yml
- hxr.autofs
# BEGIN custom
- usegalaxy-eu.gxadmin
- galaxyproject.gxadmin
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we out it out of Begin Custom?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pardon? :)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll try to rephrase it: "Should we drop our fork?" (involves checking what are the differences with upstream).

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, yes. We should not rely on a fork. Everything as upstream as possible.

Comment thread requirements.yaml
Copy link
Copy Markdown
Contributor

@domgz domgz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fundamentally, I strongly disagree with loading variables in this way,

---
- name: Denbi stratum0
  hosts: denbistratum0
  become: true
  vars:
    cvmfs_role: 'stratum0'
    usegalaxy_eu_autofs_mounts:
      - vdb
    playbook_secret_vars_files:
      - secret_group_vars/all.yml
  pre_tasks:
    - name: Load secret variables
      ansible.builtin.include_vars:
        file: "{{ item }}"
      loop: "{{ playbook_secret_vars_files }}"
      no_log: true
    - name: Create /srv symlink to mounted data
      ansible.builtin.file:
        src: /data/vol/
        dest: /srv
        owner: root
...

because it "messes up" with the variable loading mechanism in Ansible. We lose flexibility to define variables in the various different allowed ways (e.g. I guess this doesn't work with import_task and import_playbook, which I plan to use heavily to blend playbooks with KVM images), and it forces us to distinguish between secret and non-secret variables. Moreover we need to have that extra piece of code all over the place.

I would rather strongly advocate for:

  • Moving playbook-dependent workflows to Jenkins and use the GitHub integration (we already do this for the infrastructure repository). I have also proposed this for the TPV Linting workflow https://github.com/usegalaxy-eu/issues/issues/944.
  • Simply remove all encrypted vars files for GitHub workflows (they're easy to detect because they have a header). If the files don't exist, Ansible cant't try to load them. Variables are lazily loaded by Ansible: if a variable is defined in terms of an encrypted variable located within one of the removed vault files, Ansible should complain only when actually trying to access that encrypted variable. This is precisely the workaround used for the TPV linting workflow, we could simply generalize it to wipe all encrypted files. Arguably this is the closest to a do-nothing strategy: one snippet of code takes care of all rather than loads of them that you have to remember to include and flexibility to put variables wherever it is convenient remains (which also reduces verbosity because you do not have to explicitly point to the vars files in the playbooks, Ansible knows what to load for example from the inventory groups that the host belongs to).
      - name: Workaround Ansible vault password not being available to GitHub.
        working-directory: "infrastructure-playbook"
        run: |
          rm -f group_vars/htcondor/vault.yml
          rm -f group_vars/htcondor-secondary/vault.yml
          rm -f group_vars/all/ssh-keys_vault.yml

In addition, suggest adding a .git-blame-ignore-revs file in an extra commit that references most commits from this PR (should be done only right before merging, for now changes may still occur).


I see many cases in which shell was replaced by command. Beware that command is very different from shell. command runs the executable directly, while shell uses a shell. What works in a shell is not guaranteed to work when using command (I have experienced this myself). We should just fix the aesthetics, i.e. shell -> ansible.builtin.shell without changing the module.


What happened to the mounts submodule exactly?


The rest looks good :)

EDIT: but ofc under no circumstances merge if linting is not passing and/or concerns are still unresolved! I'd suggest also doing a test run --diff --check of most playbooks before merging, really our whole infrastructure can "break" from this PR.

Comment thread .github/workflows/ci.yml Outdated
Comment thread .github/requirements-python-lint.txt Outdated
Comment thread .github/workflows/ci.yml Outdated
Comment thread .github/workflows/ci.yml
Comment thread .github/workflows/ci.yml
autoremove: yes

- name: Upgrade packages
apt:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That can be cleaned up separately I guess.

Comment thread roles/hxr.monitor-cvmfs/tasks/main.yml Outdated
Comment thread subdomains.yml Outdated
- name: Restart Galaxy
shell: |
cd /opt/galaxy/ && source /opt/galaxy/.bashrc && /usr/bin/gxadmin gunicorn handler-restart && sudo -u galaxy /usr/bin/galaxy-sync-to-nfs
ansible.builtin.command: cd /opt/galaxy/ && source /opt/galaxy/.bashrc && /usr/bin/gxadmin gunicorn handler-restart && sudo -u galaxy /usr/bin/galaxy-sync-to-nfs
Copy link
Copy Markdown
Contributor

@domgz domgz Apr 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a very good example of a case in which shell cannot be replaced with command (see main review comment). The ability to chain commands is provided by the shell.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
ansible.builtin.command: cd /opt/galaxy/ && source /opt/galaxy/.bashrc && /usr/bin/gxadmin gunicorn handler-restart && sudo -u galaxy /usr/bin/galaxy-sync-to-nfs
ansible.builtin.shell: cd /opt/galaxy/ && source /opt/galaxy/.bashrc && /usr/bin/gxadmin gunicorn handler-restart && sudo -u galaxy /usr/bin/galaxy-sync-to-nfs

Comment thread telescope.yml
- hxr.autofs
# BEGIN custom
- usegalaxy-eu.gxadmin
- galaxyproject.gxadmin
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll try to rephrase it: "Should we drop our fork?" (involves checking what are the differences with upstream).

bgruening and others added 2 commits April 15, 2026 11:50
Co-authored-by: José Manuel Domínguez <43052541+kysrpex@users.noreply.github.com>
@bgruening
Copy link
Copy Markdown
Member Author

  • I don't know ansible enough to discuss the variable loading, and know about the implications. My thinking was: linting should be fast, a few seconds, cached, and on github actions. But I was also not expecting that ansible has no build-in way to ignore vaults. The way here in the PR was the only way I found. I have not tried deleting the encrypted files I think. I can try this.

    • I will not spent much more time on this PR, and I also don't think its a priority, maybe we leave all the github stuff out and you need to run it manually via Makefile for the time being
  • command -> shell: good catch, I will try to work on that

  • mount submodule: Upps, I thought it was just on my local checkout. Do we need it as submodule? If so why? I will revert that.

@domgz
Copy link
Copy Markdown
Contributor

domgz commented Apr 15, 2026

* mount submodule: Upps, I thought it was just on my local checkout. Do we need it as submodule? If so why? I will revert that.

Having some sort of in-repository integration is very important. Variables from the mounts repository are used in this repository. Not having this integration means variables are not available, with all it entails. Of course if you remember to (and document) that you have to download another repository to use this one you don't need integration, but the submodule makes it clear to everybody (agents be it human or artificial, and systems) that there is a dependency.

Nowadays I am also aware of Git subtree, and while it looks nice I am not sure it is better than submodule for this use case.

@bgruening
Copy link
Copy Markdown
Member Author

Its green again.

@bgruening
Copy link
Copy Markdown
Member Author

import_tasks and import_playbook should work imho, but only if they do not need the secrets at parse time. Which I think we don't need.

@domgz
Copy link
Copy Markdown
Contributor

domgz commented Apr 15, 2026

Thanks 👍. This needs some testing from our side, there are still some shell -> command changes and stuff like that, and a script for removing all vault files still needs to be part of the PR. Since none of us really has time to work deeper on it now, it should stay open for a couple of days.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants