๐ Describe the bug
As described in RFC 3986 ยง 5.2.4. Remove Dot Segments, the remove_dot_segments algorithm removes any extra /../ parts of the URL, ignoring errors when the stack is empty.
However, yarl.URL() behavior at the moment is to raise an exception, ValueError, when that happens.
๐ก To Reproduce
Instantiate URL class with such a URL string:
yarl.URL('https://example.com/alice/../../eve')
๐ก Expected behavior
Follow the RFC, and ignore the error, to get it working like this:
In [16]: yarl.URL('https://example.com/alice/../eve')
Out[16]: URL('https://example.com/eve')
In [17]: yarl.URL('https://example.com/alice/../../eve')
Out[17]: URL('https://example.com/eve')
In [18]: yarl.URL('https://example.com/alice/../../../eve')
Out[18]: URL('https://example.com/eve')
We probably want to also test (and fix, if needed), any other parts of the API that would involve path resolution (including remove_dot_segments) steps.
๐ Logs/tracebacks
In [15]: yarl.URL('https://example.com/alice/../../eve')
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-15-e72c86c9e132> in <module>()
----> 1 yarl.URL('https://example.com/alice/../../eve')
~/ans/venv/lib/python3.6/site-packages/yarl/_url.py in __new__(cls, val, encoded, strict)
181 path = cls._normalize_path(path)
182
--> 183 cls._validate_authority_uri_abs_path(host=host, path=path)
184 query = cls._QUERY_REQUOTER(val[3])
185 fragment = cls._FRAGMENT_REQUOTER(val[4])
~/ans/venv/lib/python3.6/site-packages/yarl/_url.py in _validate_authority_uri_abs_path(host, path)
675 if len(host) > 0 and len(path) > 0 and not path.startswith("/"):
676 raise ValueError(
--> 677 "Path in a URL with authority should start with a slash ('/') if set"
678 )
679
ValueError: Path in a URL with authority should start with a slash ('/') if set
๐ Your version of the Python
$ python --version
Python 3.6.10
๐ Your version of the aiohttp/yarl/multidict distributions
$ python -m pip show aiohttp
Name: aiohttp
Version: 3.6.2
$ python -m pip show multidict
Name: multidict
Version: 4.7.6
$ python -m pip show yarl
Name: yarl
Version: 1.6.2
๐ Additional context
The proposed behavior would put yarl.URL on par with DOM's URL object, which already follows the RFC on this:
> String(new URL("https://example.com/alice/../eve"))
< "https://example.com/eve"
> String(new URL("https://example.com/alice/../../eve"))
< "https://example.com/eve"
> String(new URL("https://example.com/alice/../../../eve"))
< "https://example.com/eve"
As well as rust-url library (from the unit tests):
{
"input": "http://example.com/foo/bar/../ton/../../a",
// ...
"pathname": "/a",
// ...
},
{
"input": "http://example.com/foo/../../..",
// ...
"pathname": "/",
// ...
},
{
"input": "http://example.com/foo/../../../ton",
// ...
"pathname": "/ton",
// ...
},
which has tests based on (warning: broken links):
"# Based on http://trac.webkit.org/browser/trunk/LayoutTests/fast/url/script-tests/segments.js",
"# AS OF https://github.com/jsdom/whatwg-url/commit/35f04dfd3048cf6362f4398745bb13375c5020c2",
๐ Describe the bug
As described in RFC 3986 ยง 5.2.4. Remove Dot Segments, the
remove_dot_segmentsalgorithm removes any extra/../parts of the URL, ignoring errors when the stack is empty.However,
yarl.URL()behavior at the moment is to raise an exception,ValueError, when that happens.๐ก To Reproduce
Instantiate URL class with such a URL string:
๐ก Expected behavior
Follow the RFC, and ignore the error, to get it working like this:
We probably want to also test (and fix, if needed), any other parts of the API that would involve path resolution (including
remove_dot_segments) steps.๐ Logs/tracebacks
In [15]: yarl.URL('https://example.com/alice/../../eve') --------------------------------------------------------------------------- ValueError Traceback (most recent call last) <ipython-input-15-e72c86c9e132> in <module>() ----> 1 yarl.URL('https://example.com/alice/../../eve') ~/ans/venv/lib/python3.6/site-packages/yarl/_url.py in __new__(cls, val, encoded, strict) 181 path = cls._normalize_path(path) 182 --> 183 cls._validate_authority_uri_abs_path(host=host, path=path) 184 query = cls._QUERY_REQUOTER(val[3]) 185 fragment = cls._FRAGMENT_REQUOTER(val[4]) ~/ans/venv/lib/python3.6/site-packages/yarl/_url.py in _validate_authority_uri_abs_path(host, path) 675 if len(host) > 0 and len(path) > 0 and not path.startswith("/"): 676 raise ValueError( --> 677 "Path in a URL with authority should start with a slash ('/') if set" 678 ) 679 ValueError: Path in a URL with authority should start with a slash ('/') if set๐ Your version of the Python
๐ Your version of the aiohttp/yarl/multidict distributions
๐ Additional context
The proposed behavior would put
yarl.URLon par with DOM'sURLobject, which already follows the RFC on this:As well as
rust-urllibrary (from the unit tests):which has tests based on (warning: broken links):