Author maggyero
Recipients Jeremy.Hylton, maggyero, orsenthil
Date 2019-08-28.14:54:51
The Python library documentation of the `urllib.parse.urlunparse <>`_ and `urllib.parse.urlunsplit <>`_ functions states:

    This may result in a slightly different, but equivalent URL, if the URL that was parsed originally had unnecessary delimiters (for example, a ? with an empty query; the RFC states that these are equivalent).

So with the <> URI::

    >>> import urllib.parse
    >>> urllib.parse.urlunparse(urllib.parse.urlparse(""))
    >>> urllib.parse.urlunsplit(urllib.parse.urlsplit(""))
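
As a minimal, runnable illustration of the round trip described above, here is a sketch with an illustrative URI of my own choosing (``http://example.com/?`` is an assumption, picked because it has a ``?`` delimiter followed by an empty query). On the CPython versions I am aware of, both round trips drop the trailing ``?``:

```python
from urllib.parse import urlparse, urlsplit, urlunparse, urlunsplit

# Illustrative URI (an assumption, not taken from the report): a "?"
# delimiter followed by an empty query component.
uri = "http://example.com/?"

# Parse, then immediately reassemble, with each pair of functions.
roundtrip_parse = urlunparse(urlparse(uri))
roundtrip_split = urlunsplit(urlsplit(uri))

print(roundtrip_parse)         # http://example.com/
print(roundtrip_split)         # http://example.com/
print(roundtrip_parse == uri)  # False: the "?" delimiter was dropped
```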

But `RFC 3986 <>`_ states the exact opposite:

    Normalization should not remove delimiters when their associated component is empty unless licensed to do so by the scheme specification.  For example, the URI "" cannot be assumed to be equivalent to any of the examples above.  Likewise, the presence or absence of delimiters within a userinfo subcomponent is usually significant to its interpretation.  The fragment component is not subject to any scheme-based normalization; thus, two URIs that differ only by the suffix "#" are considered different regardless of the scheme.

So maybe `urllib.parse.urlunparse` ∘ `urllib.parse.urlparse` and `urllib.parse.urlunsplit` ∘ `urllib.parse.urlsplit` are not supposed to be used for `syntax-based normalization <>`_ of URIs. But even so, both `urllib.parse.urlparse` and `urllib.parse.urlsplit` lose the "delimiter + empty component" information of the URI string, so they report URIs that are not equivalent as equal::

    >>> import urllib.parse
    >>> urllib.parse.urlparse("") == urllib.parse.urlparse("")
    >>> urllib.parse.urlsplit("") == urllib.parse.urlsplit("")
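
One possible workaround, sketched under my own assumptions, is to record the presence of the ``?`` and ``#`` delimiters from the raw string and compare that alongside the parsed result. The helper names ``empty_delimiters`` and ``strictly_equal`` are hypothetical and not part of ``urllib.parse``, and the example URIs are illustrative:

```python
from urllib.parse import urlsplit


def empty_delimiters(uri: str) -> dict:
    """Report which of "?" and "#" appear in uri with an empty component.

    Hypothetical helper: the fragment starts at the first "#", and the
    query starts at the first "?" that precedes it.
    """
    before_fragment, hash_sep, fragment = uri.partition("#")
    _, query_sep, query = before_fragment.partition("?")
    return {
        "query": bool(query_sep) and query == "",
        "fragment": bool(hash_sep) and fragment == "",
    }


def strictly_equal(a: str, b: str) -> bool:
    """Compare parsed components and the delimiter information lost by urlsplit."""
    return urlsplit(a) == urlsplit(b) and empty_delimiters(a) == empty_delimiters(b)


# Illustrative URIs (assumptions): they differ only by the "?" delimiter.
with_delim = "http://example.com/?"
without_delim = "http://example.com/"

print(urlsplit(with_delim) == urlsplit(without_delim))  # True
print(strictly_equal(with_delim, without_delim))        # False
```

This only covers the query and fragment delimiters; other cases, such as an empty authority or delimiters inside the userinfo subcomponent, would need their own checks.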

P.S. Is there a function for syntax-based normalization of URIs in the Python standard library?
Date                 User      Action  Args
2019-08-28 14:54:51  maggyero  set     recipients: + maggyero, orsenthil, Jeremy.Hylton
2019-08-28 14:54:51  maggyero  set     messageid: <>
2019-08-28 14:54:51  maggyero  link    issue37969 messages
2019-08-28 14:54:51  maggyero  create