
ValueError on path remove_dot_segments when there are extra dot-dot (/../) segments #536

@besfahbod


๐Ÿž Describe the bug
As described in RFC 3986 § 5.2.4 (Remove Dot Segments), the remove_dot_segments algorithm discards any surplus /../ segments in a URL path: when there is no preceding segment left to remove, the .. is simply dropped instead of causing an error.

However, yarl.URL() currently raises a ValueError when that happens.
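For reference, here is a minimal Python transcription of the RFC 3986 § 5.2.4 loop. It is a sketch, not yarl's code; the rule letters in the comments match the RFC's steps, and storing the output buffer as a list of segments is a choice of this sketch. When rule C finds the output buffer empty, there is nothing to remove, which is exactly the case yarl fails on here.

```python
def remove_dot_segments(path: str) -> str:
    """Transcription of RFC 3986, section 5.2.4 (rules A to E)."""
    inp = path  # the RFC's input buffer
    out = []    # the output buffer, one entry per segment incl. its leading "/"
    while inp:
        if inp.startswith("../"):                     # rule A
            inp = inp[3:]
        elif inp.startswith("./"):                    # rule A
            inp = inp[2:]
        elif inp.startswith("/./"):                   # rule B
            inp = "/" + inp[3:]
        elif inp == "/.":                             # rule B, "." at the end
            inp = "/"
        elif inp.startswith("/../") or inp == "/..":  # rule C
            inp = "/" + inp[4:]  # "/.."[4:] is "", leaving "/"
            if out:
                out.pop()  # drop the last segment together with its "/"
            # else: nothing left to remove, so the surplus ".." just vanishes
        elif inp in (".", ".."):                      # rule D
            inp = ""
        else:                                         # rule E
            # move the first segment (with its leading "/", if any) to the output
            end = inp.find("/", 1 if inp.startswith("/") else 0)
            if end == -1:
                end = len(inp)
            out.append(inp[:end])
            inp = inp[end:]
    return "".join(out)


for path in ("/alice/../eve", "/alice/../../eve", "/alice/../../../eve"):
    print(path, "->", remove_dot_segments(path))  # each one resolves to /eve
```

The same function also reproduces the RFC's own worked examples ("/a/b/c/./../../g" becomes "/a/g", "mid/content=5/../6" becomes "mid/6").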

💡 To Reproduce
Instantiate the URL class with such a URL string:

yarl.URL('https://example.com/alice/../../eve')

💡 Expected behavior
Follow the RFC and ignore the surplus segments, so that it works like this:

In [16]: yarl.URL('https://example.com/alice/../eve')
Out[16]: URL('https://example.com/eve')

In [17]: yarl.URL('https://example.com/alice/../../eve')
Out[17]: URL('https://example.com/eve')

In [18]: yarl.URL('https://example.com/alice/../../../eve')
Out[18]: URL('https://example.com/eve')

We probably also want to test (and fix, if needed) any other parts of the API that involve path-resolution steps, such as remove_dot_segments.

📋 Logs/tracebacks

In [15]: yarl.URL('https://example.com/alice/../../eve')
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-15-e72c86c9e132> in <module>()
----> 1 yarl.URL('https://example.com/alice/../../eve')

~/ans/venv/lib/python3.6/site-packages/yarl/_url.py in __new__(cls, val, encoded, strict)
    181                 path = cls._normalize_path(path)
    182
--> 183             cls._validate_authority_uri_abs_path(host=host, path=path)
    184             query = cls._QUERY_REQUOTER(val[3])
    185             fragment = cls._FRAGMENT_REQUOTER(val[4])

~/ans/venv/lib/python3.6/site-packages/yarl/_url.py in _validate_authority_uri_abs_path(host, path)
    675         if len(host) > 0 and len(path) > 0 and not path.startswith("/"):
    676             raise ValueError(
--> 677                 "Path in a URL with authority should start with a slash ('/') if set"
    678             )
    679

ValueError: Path in a URL with authority should start with a slash ('/') if set
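The traceback suggests that path normalization returned a path without its leading slash ("eve" rather than "/eve"), which the authority check then rejected. Below is a sketch of one way a stack-based normalizer can end up there. It is an assumption for illustration only: naive_normalize and guarded_normalize are hypothetical helpers, not yarl's _normalize_path. "/alice/../../eve".split("/") starts with an empty string that stands for the root, and the second .. pops it.

```python
def naive_normalize(path: str) -> str:
    # Hypothetical: the leading "" from split() (the root) sits on the stack
    # like any other segment, so a surplus ".." can pop it.
    resolved = []
    for seg in path.split("/"):
        if seg == "..":
            if resolved:
                resolved.pop()
        elif seg != ".":
            resolved.append(seg)
    return "/".join(resolved)


def guarded_normalize(path: str) -> str:
    # Hypothetical fix for absolute paths: keep the root off the stack, so a
    # surplus ".." finds nothing to pop and is silently dropped.
    if not path.startswith("/"):
        raise ValueError("this sketch only handles absolute paths")
    segments = path.split("/")[1:]
    resolved = []
    for seg in segments:
        if seg == "..":
            if resolved:
                resolved.pop()
        elif seg != ".":
            resolved.append(seg)
    if segments[-1] in (".", ".."):
        resolved.append("")  # "/a/b/.." becomes "/a/", keeping the trailing slash
    return "/" + "/".join(resolved)


print(naive_normalize("/alice/../../eve"))    # eve  (root popped, "/" lost)
print(guarded_normalize("/alice/../../eve"))  # /eve
```

Keeping the root outside the stack is what makes the surplus .. harmless; the trailing-slash branch mirrors the RFC, where a final .. still leaves a directory path.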

📋 Your version of Python

$ python --version
Python 3.6.10

📋 Your version of the aiohttp/yarl/multidict distributions

$ python -m pip show aiohttp
Name: aiohttp
Version: 3.6.2
$ python -m pip show multidict
Name: multidict
Version: 4.7.6
$ python -m pip show yarl
Name: yarl
Version: 1.6.2

📋 Additional context
The proposed behavior would put yarl.URL on par with the browser (DOM) URL object, which already follows the RFC on this point:

> String(new URL("https://example.com/alice/../eve"))
< "https://example.com/eve"
> String(new URL("https://example.com/alice/../../eve"))
< "https://example.com/eve"
> String(new URL("https://example.com/alice/../../../eve"))
< "https://example.com/eve"

The same holds for the rust-url library (from its unit tests):

  {
    "input": "http://example.com/foo/bar/../ton/../../a",
    // ...
    "pathname": "/a",
    // ...
  },
  {
    "input": "http://example.com/foo/../../..",
    // ...
    "pathname": "/",
    // ...
  },
  {
    "input": "http://example.com/foo/../../../ton",
    // ...
    "pathname": "/ton",
    // ...
  },

These test cases are based on (warning: broken links):

  "# Based on http://trac.webkit.org/browser/trunk/LayoutTests/fast/url/script-tests/segments.js",
  "# AS OF https://github.com/jsdom/whatwg-url/commit/35f04dfd3048cf6362f4398745bb13375c5020c2",

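For comparison, Python's own urllib.parse.urljoin, which resolves references following RFC 3986 in recent Python 3 releases, also discards the surplus segments. A quick standard-library check, with a base URL and references chosen to mirror the examples in this issue:

```python
from urllib.parse import urljoin

base = "https://example.com/alice/"
for ref in ("../eve", "../../eve", "../../../eve"):
    # each reference resolves to https://example.com/eve
    print(urljoin(base, ref))
```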