This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: [Security] urllib and anti-slash (\) in the hostname
Type: security Stage: resolved
Components: Library (Lib) Versions: Python 3.9
process
Status: closed Resolution: not a bug
Dependencies: Superseder:
Assigned To: Nosy List: rschiron, shihai1991, vstinner
Priority: normal Keywords:

Created on 2020-04-20 14:43 by vstinner, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Messages (5)
msg366832 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-04-20 14:43
David Schütz reported the following urllib vulnerability to the PSRT at 2020-03-29.

He wrote an article about a similar vulnerability in Closure (Javascript):
https://bugs.xdavidhu.me/google/2020/03/08/the-unexpected-google-wide-domain-check-bypass/

David was able to bypass a wildcard domain check in Closure by using the "\" character in the URL like this:

  https://xdavidhu.me\test.corp.google.com

Example in Python:

>>> from urllib.parse import urlparse
>>> urlparse("https://xdavidhu.me\\test.corp.google.com")
ParseResult(scheme='https', netloc='xdavidhu.me\\test.corp.google.com', path='', params='', query='', fragment='')

urlparse() currently accepts "\" in the netloc.

This could present issues if server-side checks are used by applications to validate a URLs authority.

The problem emerges from the fact that the RFC and the WHATWG specifications differ, and the RFC does not mention the "\":

* RFC: https://tools.ietf.org/html/rfc3986#appendix-B
* WHATWG: https://url.spec.whatwg.org/#relative-state

This specification difference might cause issues, since David do understand that the parser is implemented by the RFC, but the WHATWG spec is what the browsers are using, who will mainly be the ones opening the URL.
msg366833 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-04-20 14:46
(The first message is basically David's email rephrased. Here is my reply ;-))

> This could present issues if server-side checks are used by applications to validate a URLs authority.

Which kind of application would be affected by this vulnerability?

It's unclear to me if urllib should be modified to explicitly reject \ in netloc, or if only third-party code should pay attention to this corner case (potential vulnerability).


The urllib module has _parse_proxy() and HTTPPasswordMgr.reduce_uri() code which use an "authority" variable.

Example:
---
from urllib.parse import urlsplit, _splitport, _splittype, _splituser,
_splitpasswd

def _parse_proxy(proxy):
    """Return (scheme, user, password, host/port) given a URL or an authority.

    If a URL is supplied, it must have an authority (host:port) component.
    According to RFC 3986, having an authority component means the URL must
    have two slashes after the scheme.
    """
    scheme, r_scheme = _splittype(proxy)
    if not r_scheme.startswith("/"):
        # authority
        scheme = None
        authority = proxy
    else:
        # URL
        if not r_scheme.startswith("//"):
            raise ValueError("proxy URL with no authority: %r" % proxy)
        # We have an authority, so for RFC 3986-compliant URLs (by ss 3.
        # and 3.3.), path is empty or starts with '/'
        end = r_scheme.find("/", 2)
        if end == -1:
            end = None
        authority = r_scheme[2:end]
    userinfo, hostport = _splituser(authority)
    if userinfo is not None:
        user, password = _splitpasswd(userinfo)
    else:
        user = password = None
    return scheme, user, password, hostport


def reduce_uri(uri, default_port=True):
    """Accept authority or URI and extract only the authority and path."""
    # note HTTP URLs do not have a userinfo component
    parts = urlsplit(uri)
    if parts[1]:
        # URI
        scheme = parts[0]
        authority = parts[1]
        path = parts[2] or '/'
    else:
        # host or host:port
        scheme = None
        authority = uri
        path = '/'
    host, port = _splitport(authority)
    if default_port and port is None and scheme is not None:
        dport = {"http": 80,
                 "https": 443,
                 }.get(scheme)
        if dport is not None:
            authority = "%s:%d" % (host, dport)
    return authority, path

def test(uri):
    print(f"{uri} => reduce_uri: {reduce_uri(uri)}")
    print(f"{uri} => _parse_proxy: {_parse_proxy(uri)}")

test(r"https://www.example.com")
test(r"https://user@www.example.com")
test(r"https://xdavidhu.me\test.corp.google.com")
test(r"https://user:password@xdavidhu.me\test.corp.google.com")
---

Output on Python 3.9:
---
https://www.example.com => reduce_uri: ('www.example.com:443', '/')
https://www.example.com => _parse_proxy: ('https', None, None,
'www.example.com')
https://user@www.example.com => reduce_uri: ('user@www.example.com:443', '/')
https://user@www.example.com => _parse_proxy: ('https', 'user', None,
'www.example.com')
https://xdavidhu.me\test.corp.google.com => reduce_uri:
('xdavidhu.me\\test.corp.google.com:443', '/')
https://xdavidhu.me\test.corp.google.com => _parse_proxy: ('https',
None, None, 'xdavidhu.me\\test.corp.google.com')
https://user:password@xdavidhu.me\test.corp.google.com => reduce_uri:
('user:password@xdavidhu.me\\test.corp.google.com:443', '/')
https://user:password@xdavidhu.me\test.corp.google.com =>
_parse_proxy: ('https', 'user', 'password',
'xdavidhu.me\\test.corp.google.com')
---

It seems to behave as expected, no?
msg367008 - (view) Author: Hai Shi (shihai1991) * (Python triager) Date: 2020-04-22 15:06
>It seems to behave as expected
+1. This is an interesting test;)
msg367409 - (view) Author: Riccardo Schirone (rschiron) Date: 2020-04-27 10:48
I agree I don't see a clear vulnerability here.
msg367417 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-04-27 13:23
We consider that the stdlib is not vulnerable, so I close the issue.

Feel free to report vulnerabilities to third party projects which are vulnerable.

Thanks for the report anyway David Schütz!
History
Date User Action Args
2022-04-11 14:59:29adminsetgithub: 84518
2020-04-27 13:23:13vstinnersetstatus: open -> closed
resolution: not a bug
messages: + msg367417

stage: resolved
2020-04-27 10:48:59rschironsetnosy: + rschiron
messages: + msg367409
2020-04-22 15:06:12shihai1991setmessages: + msg367008
2020-04-20 15:31:02shihai1991setnosy: + shihai1991
2020-04-20 14:46:13vstinnersetmessages: + msg366833
2020-04-20 14:43:38vstinnercreate