Issue
$ eb --fetch PETSc-3.20.3-foss-2023a.eb --force-download
Couldn't find file petsc-3.20.3.tar.gz anywhere, and downloading it didn't work either... Paths attempted (in order): ...
/home/casparl/.local/easybuild/sources/petsc-3.20.3.tar.gz, https://web.cels.anl.gov/projects/petsc/download/release-snapshots/petsc-3.20.3.tar.gz, https://sources.easybuild.io/p/PETSc/petsc-3.20.3.tar.gz (took 0 secs)
In the logs:
== 2025-06-17 11:24:31,727 filetools.py:903 INFO Attempt 1 of downloading https://web.cels.anl.gov/projects/petsc/download/release-snapshots/petsc-3.20.3.tar.gz to /home/casparl/.local/easybuild/sources/p/PETSc/petsc-3.20.3.tar.gz failed, trying again...
== 2025-06-17 11:24:31,727 filetools.py:908 INFO Downloading using requests package instead of urllib2
== 2025-06-17 11:24:32,037 filetools.py:886 WARNING URL https://web.cels.anl.gov/projects/petsc/download/release-snapshots/petsc-3.20.3.tar.gz was not found (HTTP response code 403), not trying again
== 2025-06-17 11:24:32,037 filetools.py:923 WARNING Download of https://web.cels.anl.gov/projects/petsc/download/release-snapshots/petsc-3.20.3.tar.gz to /home/casparl/.local/easybuild/sources/p/PETSc/petsc-3.20.3.tar.gz failed, done trying
== 2025-06-17 11:24:32,105 filetools.py:886 WARNING URL https://sources.easybuild.io/p/PETSc/petsc-3.20.3.tar.gz was not found (HTTP response code 404), not trying again
== 2025-06-17 11:24:32,106 filetools.py:923 WARNING Download of https://sources.easybuild.io/p/PETSc/petsc-3.20.3.tar.gz to /home/casparl/.local/easybuild/sources/p/PETSc/petsc-3.20.3.tar.gz failed, done trying
Cause
In EasyBuild, the download code sets a custom User-Agent header:
# use custom HTTP header
headers = {'User-Agent': 'EasyBuild', 'Accept': '*/*'}
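Trying this interactively with the first URL from the logs, I see:
>>> import requests
>>> url = "https://web.cels.anl.gov/projects/petsc/download/release-snapshots/petsc-3.20.3.tar.gz"
>>> headers = {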
... "User-Agent": "EasyBuild",
... "Accept": "*/*",
... }
>>> response = requests.get(url, headers=headers)
>>> response.status_code
403
However, when we pretend to be Wget:
>>> headers = {
... "User-Agent": "Wget/1.21.1",
... "Accept": "*/*",
... }
>>> response = requests.get(url, headers=headers)
>>> response.status_code
200
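According to my AI friend, some sites only allow specific values for the User-Agent header and block everything else (e.g. to prevent scraping). That matches what we see here: HTTP 403 means "Forbidden", so the server is refusing the request rather than missing the file, even though the log message says "was not found". We could change our header so that we pretend to be Wget. We could also do that only as a third option, if all else fails (currently option one is using urllib, option two is using requests).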
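To make the "third option" idea concrete, here is a rough sketch of a fallback loop over User-Agent values. It is not EasyBuild's actual code: the names `USER_AGENTS`, `fetch_status` and `download_with_fallback` are made up for illustration, and it uses the standard library's `urllib.request` rather than the existing urllib/requests logic in `filetools.py`. It only checks the status code, not the download itself.

```python
import urllib.error
import urllib.request

# Hypothetical order: EasyBuild's current value first, a Wget-like value as last resort
USER_AGENTS = ["EasyBuild", "Wget/1.21.1"]


def fetch_status(url, user_agent, timeout=10):
    """Return the HTTP status code the server gives for this User-Agent."""
    headers = {"User-Agent": user_agent, "Accept": "*/*"}
    req = urllib.request.Request(url, headers=headers)
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses are raised as HTTPError; report their code instead
        return err.code


def download_with_fallback(url, user_agents=USER_AGENTS, fetch=fetch_status):
    """Try each User-Agent in order.

    Returns (user_agent, 200) for the first value that is accepted,
    or (None, last_status) if none of them works.
    """
    last_status = None
    for user_agent in user_agents:
        last_status = fetch(url, user_agent)
        if last_status == 200:
            return user_agent, last_status
    return None, last_status
```

Calling `download_with_fallback(url)` with the PETSc URL from the logs would first try `EasyBuild` and only then the Wget-like value. The `fetch` parameter is there so the loop can be exercised without network access.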