[editorial] Use canonical link into semconv #4554

Merged
reyang merged 1 commit into open-telemetry:main from
chalin:chalin-im-use-canonical-urls-2025-06-12
on Jun 12, 2025

Conversation

@chalin chalin commented Jun 12, 2025


chalin commented Jun 12, 2025

Can someone rerun the checks, now that NPM is back up?

@reyang reyang enabled auto-merge June 12, 2025 22:26
@reyang reyang added this pull request to the merge queue Jun 12, 2025
Merged via the queue into open-telemetry:main with commit 8f01d12 Jun 12, 2025
8 of 11 checks passed
@chalin chalin deleted the chalin-im-use-canonical-urls-2025-06-12 branch June 12, 2025 22:49
github-merge-queue Bot pushed a commit that referenced this pull request Jun 16, 2025
Related to @chalin's
#4554

We could enforce no redirects via lychee's `max_redirects = 0`
configuration, but we'd need to add a few exclusions for that to work,
and it would only make our link-check failures more common as
third-party sites move things around. A better option to address
@chalin's specific ask would probably be a separate lychee run with
`max_redirects = 0` that only checks https://opentelemetry.io links.
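As a rough sketch, that separate run could use its own config file. `max_redirects` is a lychee config option; the filename and the include pattern below are illustrative, not taken from this repo:

```toml
# lychee-strict.toml (hypothetical) — fail on any redirect,
# but only for links into our own site.
max_redirects = 0
include = ["https://opentelemetry\\.io.*"]
```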

Most of this was done using:

<details>
<summary>python script</summary>

```python
import os
import re
import requests
import concurrent.futures

def update_links_in_file(filepath):
    with open(filepath, 'r', encoding='utf-8') as f:
        content = f.read()

    def replacer(match):
        url = match.group(2)
        new_url = get_redirect_url(url)
        if new_url and new_url != url:
            return f'[{match.group(1)}]({new_url})'
        return match.group(0)

    def replacer_ref(match):
        url = match.group(2)
        new_url = get_redirect_url(url)
        if new_url and new_url != url:
            return f'[{match.group(1)}]: {new_url}'
        return match.group(0)

    def replacer_html(match):
        url = match.group(1)
        new_url = get_redirect_url(url)
        if new_url and new_url != url:
            return f'href="{new_url}"'
        return match.group(0)

    # Markdown link: [text](https://...)
    pattern = re.compile(
        r'\['               # opening square bracket for the text
        r'([^]]+)'          # group 1: text
        r']'                # closing square bracket for the text
        r'\('               # opening parenthesis for the URL
        r'(https://[^)]+)'  # group 2: URL
        r'\)'               # closing parenthesis for the URL
    )
    new_content = pattern.sub(replacer, content)

    # Markdown link: reference-style [label]: https://...
    pattern = re.compile(
        r'\['               # opening square bracket for the ref
        r'([^]]+)'          # group 1: ref
        r']: '              # closing square bracket for the ref
        r'(https://.*)'  # group 2: URL
    )
    new_content = pattern.sub(replacer_ref, new_content)

    # Markdown link: html
    pattern = re.compile(
        r'href="'
        r'(https://[^"]+)'
        r'"'
    )
    new_content = pattern.sub(replacer_html, new_content)

    if new_content != content:
        with open(filepath, 'w', encoding='utf-8') as f:
            f.write(new_content)
        print(f'Updated: {filepath}')

def get_redirect_url(url):
    if url.startswith('https://cloud-native.slack.com/archives/'):
        # keep these short links as they are
        return None
    try:
        resp = requests.head(url, allow_redirects=True, timeout=5)
        if resp.history and resp.status_code == 200:
            for r in resp.history:
                if r.status_code == 301 or r.status_code == 302:
                    if resp.url.startswith('https://en.wikipedia.org'):
                        return resp.url.replace('https://en.wikipedia.org', 'https://wikipedia.org')
                    if resp.url.startswith('https://github.com/login?return_to') or resp.url.startswith('https://accounts.google.com/v3/signin/'):
                        # this link requires authentication, so we can't do anything with it
                        return None
                    if resp.url.startswith('http://arxiv.org'):
                        return resp.url.replace('http://arxiv.org', 'https://arxiv.org')
                    if resp.url.startswith('https://pkg.go.dev/'):
                        # no need for this query parameter
                        return re.sub(r'\?utm_source=godoc(?=#|$)', '', resp.url)
                    return resp.url
    except Exception:
        pass
    return None


filepaths = []

for dirpath, _, filenames in os.walk('.'):
    if 'node_modules' in dirpath.split(os.sep):
        continue
    for filename in filenames:
        if filename == 'CHANGELOG.md':
            continue
        if filename.endswith('.md'):
            filepaths.append(os.path.join(dirpath, filename))

with concurrent.futures.ThreadPoolExecutor() as executor:
    executor.map(update_links_in_file, filepaths)
```
</details>
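The inline-link substitution above can be exercised without any network calls by stubbing the redirect lookup; the URLs and the `REDIRECTS` map below are hypothetical stand-ins for live HTTP results:

```python
import re

# Hypothetical redirect map standing in for requests.head() lookups.
REDIRECTS = {
    "https://opentelemetry.io/docs/languages/": "https://opentelemetry.io/docs/platforms/",
}

# Same inline-link pattern as in the script: [text](https://...)
pattern = re.compile(r'\[([^]]+)]\((https://[^)]+)\)')

def replacer(match):
    url = match.group(2)
    new_url = REDIRECTS.get(url)
    if new_url and new_url != url:
        return f'[{match.group(1)}]({new_url})'
    return match.group(0)  # leave non-redirecting links untouched

text = "See the [language guides](https://opentelemetry.io/docs/languages/)."
print(pattern.sub(replacer, text))
# -> See the [language guides](https://opentelemetry.io/docs/platforms/).
```

Links that don't redirect (no entry in the map) are returned unchanged, mirroring the `return None` path in `get_redirect_url`.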

---------

Co-authored-by: Carlos Alberto Cortez <calberto.cortez@gmail.com>
github-merge-queue Bot pushed a commit that referenced this pull request Jun 20, 2025
Resolves @chalin's #4554 "how to avoid this in the future"

---------

Co-authored-by: Patrice Chalin <chalin@users.noreply.github.com>
