
PETSc downloads broken because the site only seems to accept certain User-Agent values in the header #4925

@casparvl

Issue

$ eb --fetch PETSc-3.20.3-foss-2023a.eb --force-download
Couldn't find file petsc-3.20.3.tar.gz anywhere, and downloading it didn't work either... Paths attempted (in order): ...
/home/casparl/.local/easybuild/sources/petsc-3.20.3.tar.gz, https://web.cels.anl.gov/projects/petsc/download/release-snapshots/petsc-3.20.3.tar.gz, https://sources.easybuild.io/p/PETSc/petsc-3.20.3.tar.gz  (took 0 secs)

In the logs:

== 2025-06-17 11:24:31,727 filetools.py:903 INFO Attempt 1 of downloading https://web.cels.anl.gov/projects/petsc/download/release-snapshots/petsc-3.20.3.tar.gz to /home/casparl/.local/easybuild/sources/p/PETSc/petsc-3.20.3.tar.gz failed, trying again...
== 2025-06-17 11:24:31,727 filetools.py:908 INFO Downloading using requests package instead of urllib2
== 2025-06-17 11:24:32,037 filetools.py:886 WARNING URL https://web.cels.anl.gov/projects/petsc/download/release-snapshots/petsc-3.20.3.tar.gz was not found (HTTP response code 403), not trying again
== 2025-06-17 11:24:32,037 filetools.py:923 WARNING Download of https://web.cels.anl.gov/projects/petsc/download/release-snapshots/petsc-3.20.3.tar.gz to /home/casparl/.local/easybuild/sources/p/PETSc/petsc-3.20.3.tar.gz failed, done trying
== 2025-06-17 11:24:32,105 filetools.py:886 WARNING URL https://sources.easybuild.io/p/PETSc/petsc-3.20.3.tar.gz was not found (HTTP response code 404), not trying again
== 2025-06-17 11:24:32,106 filetools.py:923 WARNING Download of https://sources.easybuild.io/p/PETSc/petsc-3.20.3.tar.gz to /home/casparl/.local/easybuild/sources/p/PETSc/petsc-3.20.3.tar.gz failed, done trying

Cause

In EasyBuild's filetools.py, the download uses a custom HTTP header:

    # use custom HTTP header
    headers = {'User-Agent': 'EasyBuild', 'Accept': '*/*'}

Trying this interactively, I see:

>>> import requests
>>> url = "https://web.cels.anl.gov/projects/petsc/download/release-snapshots/petsc-3.20.3.tar.gz"
>>> headers = {
...     "User-Agent": "EasyBuild",
...     "Accept": "*/*",
... }
>>> response = requests.get(url, headers=headers)
>>> response.status_code
403

However, when we pretend to be Wget:

>>> headers = {
...     "User-Agent": "Wget/1.21.1",
...     "Accept": "*/*",
... }
>>> response = requests.get(url, headers=headers)
>>> response.status_code
200

Some sites only accept specific User-Agent values and block everything else (e.g. to discourage scraping). We could set our User-Agent header so that we pretend to be Wget. We could even do that as a third fallback option, used only if all else fails (currently the first attempt uses urllib, the second uses requests).
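A minimal sketch of what such a fallback could look like, using only the standard library. The function name, the `opener` parameter, and the list of User-Agent values are hypothetical illustrations, not EasyBuild's actual API:

```python
import urllib.error
import urllib.request

# Hypothetical sketch (not EasyBuild's actual code): try the default
# 'EasyBuild' User-Agent first, then retry with a Wget-like User-Agent
# if the server rejects the request (e.g. with HTTP 403).
USER_AGENTS = ['EasyBuild', 'Wget/1.21.1']

def fetch_with_ua_fallback(url, opener=urllib.request.urlopen, timeout=10):
    """Try each User-Agent in turn; return the response body on the first success."""
    last_error = None
    for user_agent in USER_AGENTS:
        request = urllib.request.Request(
            url, headers={'User-Agent': user_agent, 'Accept': '*/*'}
        )
        try:
            with opener(request, timeout=timeout) as response:
                return response.read()
        except urllib.error.HTTPError as err:
            # e.g. 403 Forbidden: remember the error and try the next User-Agent
            last_error = err
    raise RuntimeError('Download of %s failed: %s' % (url, last_error))
```

The `opener` parameter is only there to make the sketch testable without network access; in EasyBuild itself this logic would presumably slot into the existing retry loop in filetools.py.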
