Issue 40338: [Security] urllib and anti-slash (\) in the hostname

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/84518

classification

Title:	[Security] urllib and anti-slash (\) in the hostname
Type:	security	Stage:	resolved
Components:	Library (Lib)	Versions:	Python 3.9

process

Status:	closed	Resolution:	not a bug
Dependencies:		Superseder:
Assigned To:		Nosy List:	rschiron, shihai1991, vstinner
Priority:	normal	Keywords:

Created on 2020-04-20 14:43 by vstinner, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Messages (5)
msg366832 - (view)	Author: STINNER Victor (vstinner) *	Date: 2020-04-20 14:43
David Schütz reported the following urllib vulnerability to the PSRT at 2020-03-29. He wrote an article about a similar vulnerability in Closure (Javascript): https://bugs.xdavidhu.me/google/2020/03/08/the-unexpected-google-wide-domain-check-bypass/ David was able to bypass a wildcard domain check in Closure by using the "\" character in the URL like this: https://xdavidhu.me\test.corp.google.com Example in Python: >>> from urllib.parse import urlparse >>> urlparse("https://xdavidhu.me\\test.corp.google.com") ParseResult(scheme='https', netloc='xdavidhu.me\\test.corp.google.com', path='', params='', query='', fragment='') urlparse() currently accepts "\" in the netloc. This could present issues if server-side checks are used by applications to validate a URLs authority. The problem emerges from the fact that the RFC and the WHATWG specifications differ, and the RFC does not mention the "\": * RFC: https://tools.ietf.org/html/rfc3986#appendix-B * WHATWG: https://url.spec.whatwg.org/#relative-state This specification difference might cause issues, since David do understand that the parser is implemented by the RFC, but the WHATWG spec is what the browsers are using, who will mainly be the ones opening the URL.
msg366833 - (view)	Author: STINNER Victor (vstinner) *	Date: 2020-04-20 14:46
(The first message is basically David's email rephrased. Here is my reply ;-)) > This could present issues if server-side checks are used by applications to validate a URLs authority. Which kind of application would be affected by this vulnerability? It's unclear to me if urllib should be modified to explicitly reject \ in netloc, or if only third-party code should pay attention to this corner case (potential vulnerability). The urllib module has _parse_proxy() and HTTPPasswordMgr.reduce_uri() code which use an "authority" variable. Example: --- from urllib.parse import urlsplit, _splitport, _splittype, _splituser, _splitpasswd def _parse_proxy(proxy): """Return (scheme, user, password, host/port) given a URL or an authority. If a URL is supplied, it must have an authority (host:port) component. According to RFC 3986, having an authority component means the URL must have two slashes after the scheme. """ scheme, r_scheme = _splittype(proxy) if not r_scheme.startswith("/"): # authority scheme = None authority = proxy else: # URL if not r_scheme.startswith("//"): raise ValueError("proxy URL with no authority: %r" % proxy) # We have an authority, so for RFC 3986-compliant URLs (by ss 3. # and 3.3.), path is empty or starts with '/' end = r_scheme.find("/", 2) if end == -1: end = None authority = r_scheme[2:end] userinfo, hostport = _splituser(authority) if userinfo is not None: user, password = _splitpasswd(userinfo) else: user = password = None return scheme, user, password, hostport def reduce_uri(uri, default_port=True): """Accept authority or URI and extract only the authority and path.""" # note HTTP URLs do not have a userinfo component parts = urlsplit(uri) if parts[1]: # URI scheme = parts[0] authority = parts[1] path = parts[2] or '/' else: # host or host:port scheme = None authority = uri path = '/' host, port = _splitport(authority) if default_port and port is None and scheme is not None: dport = {"http": 80, "https": 443, }.get(scheme) if dport is not None: authority = "%s:%d" % (host, dport) return authority, path def test(uri): print(f"{uri} => reduce_uri: {reduce_uri(uri)}") print(f"{uri} => _parse_proxy: {_parse_proxy(uri)}") test(r"https://www.example.com") test(r"https://user@www.example.com") test(r"https://xdavidhu.me\test.corp.google.com") test(r"https://user:password@xdavidhu.me\test.corp.google.com") --- Output on Python 3.9: --- https://www.example.com => reduce_uri: ('www.example.com:443', '/') https://www.example.com => _parse_proxy: ('https', None, None, 'www.example.com') https://user@www.example.com => reduce_uri: ('user@www.example.com:443', '/') https://user@www.example.com => _parse_proxy: ('https', 'user', None, 'www.example.com') https://xdavidhu.me\test.corp.google.com => reduce_uri: ('xdavidhu.me\\test.corp.google.com:443', '/') https://xdavidhu.me\test.corp.google.com => _parse_proxy: ('https', None, None, 'xdavidhu.me\\test.corp.google.com') https://user:password@xdavidhu.me\test.corp.google.com => reduce_uri: ('user:password@xdavidhu.me\\test.corp.google.com:443', '/') https://user:password@xdavidhu.me\test.corp.google.com => _parse_proxy: ('https', 'user', 'password', 'xdavidhu.me\\test.corp.google.com') --- It seems to behave as expected, no?
msg367008 - (view)	Author: Hai Shi (shihai1991) *	Date: 2020-04-22 15:06
>It seems to behave as expected +1. This is an interesting test;)
msg367409 - (view)	Author: Riccardo Schirone (rschiron)	Date: 2020-04-27 10:48
I agree I don't see a clear vulnerability here.
msg367417 - (view)	Author: STINNER Victor (vstinner) *	Date: 2020-04-27 13:23
We consider that the stdlib is not vulnerable, so I close the issue. Feel free to report vulnerabilities to third party projects which are vulnerable. Thanks for the report anyway David Schütz!

History
Date	User	Action	Args
2022-04-11 14:59:29	admin	set	github: 84518
2020-04-27 13:23:13	vstinner	set	status: open -> closed resolution: not a bug messages: + msg367417 stage: resolved
2020-04-27 10:48:59	rschiron	set	nosy: + rschiron messages: + msg367409
2020-04-22 15:06:12	shihai1991	set	messages: + msg367008
2020-04-20 15:31:02	shihai1991	set	nosy: + shihai1991
2020-04-20 14:46:13	vstinner	set	messages: + msg366833
2020-04-20 14:43:38	vstinner	create