Fix IPv6 Zone ID decoding to correctly handle RFC 6874 %25 separator #1653
Open
rodrigobnogueira wants to merge 1 commit into aio-libs:master from
Per RFC 6874, an IPv6 Zone ID in a URI is encoded as:

    IPv6addrz = IPv6address "%25" ZoneID

So in 'http://[fe80::1%251]/', the zone ID is '1', not '251'.

Previously, _encode_host() split the host on bare '%', treating '251' as the zone ID. The host property also returned the raw (encoded) value unchanged for IP addresses, so %25 was never decoded.

Fix _encode_host() to partition on '%25' (RFC 6874 separator) when present, preserving it verbatim in raw_host / str(url), and update the host property to decode '%25' -> '%' so callers receive the human-readable zone identifier (e.g. 'fe80::1%1' / 'fe80::1%eth0').

Tests added for:
- Numeric zone ID: http://[fe80::1%251]/ -> host='fe80::1%1'
- String zone ID: http://[fe80::1%25eth0]/ -> host='fe80::1%eth0'
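The splitting rule described in the commit message can be sketched as follows. This is a minimal illustration, not the actual yarl code: split_zone is a hypothetical helper, and it assumes the host string has already had its surrounding brackets removed.

```python
from __future__ import annotations


def split_zone(host: str) -> tuple[str, str]:
    """Split a bracket-free IPv6 host into (address, zone ID).

    Hypothetical helper for illustration only; not yarl's implementation.
    """
    if "%25" in host:
        # RFC 6874 form: the separator is the percent-encoded '%'.
        addr, _, zone = host.partition("%25")
    else:
        # Fallback for a bare '%' separator (no zone yields an empty string).
        addr, _, zone = host.partition("%")
    return addr, zone


print(split_zone("fe80::1%251"))     # address 'fe80::1', zone '1'
print(split_zone("fe80::1%25eth0"))  # address 'fe80::1', zone 'eth0'
```

Checking for '%25' before falling back to a bare '%' is what keeps '251' from being read as the zone ID in the first example.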
a9bd0c5 to 97ac79d
Codecov Report
✅ All modified and coverable lines are covered by tests.
❌ Your project check has failed because the head coverage (97.63%) is below the target coverage (100.00%). You can increase the head coverage or adjust the target coverage.

@@           Coverage Diff           @@
##           master    #1653   +/-   ##
=======================================
  Coverage   99.47%   99.47%
=======================================
  Files          30       30
  Lines        5942     5952   +10
  Branches      283      285    +2
=======================================
+ Hits         5911     5921   +10
  Misses         22       22
  Partials        9        9
The codecov/project/typing check is failing at 97.63% against a 100% target, but this is a pre-existing issue unrelated to this PR.
What do these changes do?
Fixes incorrect decoding of IPv6 Zone IDs in URLs containing the RFC 6874 %25-encoded zone separator.

Background
RFC 6874 defines the format for IPv6 Zone IDs in URIs:

    IPv6addrz = IPv6address "%25" ZoneID

So in http://[fe80::1%251]/, the zone ID is 1 (not 251), because %25 is the percent-encoding of %.

The bug
Two issues in yarl/_url.py:
- _encode_host() split the host on bare %, treating the raw-encoded zone string 251 as the zone ID instead of recognising %25 as the delimiter.
- .host property returned the raw (percent-encoded) value unchanged for IP addresses, so %25 was never decoded to % for the caller.

The fix
- _encode_host(): when %25 is present in the host string, partition on %25 (the RFC 6874 separator) instead of bare %. The raw form (raw_host / str(url)) is preserved unchanged.
- .host property: for IP addresses that contain %25, replace %25 with % before returning, so callers receive the human-readable zone identifier.

Before / After
| URL | .raw_host | .host (before) | .host (after) |
|---|---|---|---|
| http://[fe80::1%251]/ | fe80::1%251 | fe80::1%251 ❌ | fe80::1%1 ✅ |
| http://[fe80::1%25eth0]/ | fe80::1%25eth0 | fe80::1%25eth0 ❌ | fe80::1%eth0 ✅ |

Related
This was identified as part of a security report about URL parsing inconsistencies. While this specific bug is not a practical SSRF vector (zone IDs are local-scope only), the incorrect decoding is a standards-compliance issue that could cause yarl to disagree with other RFC 6874-aware parsers.
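To illustrate the decoding side, the sketch below maps a raw host to the form a caller of .host would see after this change. display_host is a hypothetical name, not part of yarl, and validating the address part with the standard-library ipaddress module is an assumption of this sketch rather than something the PR states.

```python
import ipaddress


def display_host(raw_host: str) -> str:
    """Return the human-readable form of a raw (percent-encoded) host.

    Hypothetical helper for illustration only; not yarl's implementation.
    """
    addr, sep, zone = raw_host.partition("%25")
    if not sep:
        # No RFC 6874 separator present: nothing to decode.
        return raw_host
    # Assumption of this sketch: only decode when the part before the
    # separator is a valid IPv6 address (raises ValueError otherwise).
    ipaddress.IPv6Address(addr)
    return f"{addr}%{zone}"


print(display_host("fe80::1%251"))     # fe80::1%1
print(display_host("fe80::1%25eth0"))  # fe80::1%eth0
```

The raw form is left untouched; only the value handed back to the caller has %25 replaced by %.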