Message 366832 - Python tracker

Author	vstinner
Recipients	vstinner
Date	2020-04-20.14:43:38
Message-id	<1587393818.69.0.0669874545699.issue40338@roundup.psfhosted.org>

Content
David Schütz reported the following urllib vulnerability to the PSRT on 2020-03-29.

He wrote an article about a similar vulnerability in Closure (JavaScript):
https://bugs.xdavidhu.me/google/2020/03/08/the-unexpected-google-wide-domain-check-bypass/

David was able to bypass a wildcard domain check in Closure by using the "\" character in the URL like this:

  https://xdavidhu.me\test.corp.google.com

Example in Python:

>>> from urllib.parse import urlparse
>>> urlparse("https://xdavidhu.me\\test.corp.google.com")
ParseResult(scheme='https', netloc='xdavidhu.me\\test.corp.google.com', path='', params='', query='', fragment='')

urlparse() currently accepts "\" in the netloc.

This could present issues if applications use server-side checks to validate a URL's authority.
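
To make the risk concrete, here is a minimal sketch of such a server-side check. The function name naive_is_trusted and the TRUSTED_SUFFIX constant are hypothetical, chosen only to mirror the example URL above; this is not code from Closure or from any real application.

```python
from urllib.parse import urlparse

# Hypothetical allow-list suffix, taken from the example URL above.
TRUSTED_SUFFIX = ".corp.google.com"


def naive_is_trusted(url: str) -> bool:
    # Trusts whatever urlparse() reports as the hostname and only
    # checks that it ends with the allowed suffix.
    host = urlparse(url).hostname or ""
    return host.endswith(TRUSTED_SUFFIX)


# The backslash is part of the netloc, so the suffix test sees
# "xdavidhu.me\test.corp.google.com" as the hostname.
print(naive_is_trusted("https://xdavidhu.me\\test.corp.google.com"))
```

With urlparse() behaving as shown above, the backslash stays inside the hostname, so the suffix test passes even though a browser following the WHATWG rules would connect to xdavidhu.me.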

The problem arises because RFC 3986 and the WHATWG URL specification differ, and the RFC does not mention the "\" character:

* RFC: https://tools.ietf.org/html/rfc3986#appendix-B
* WHATWG: https://url.spec.whatwg.org/#relative-state

This specification difference might cause issues: David understands that the parser is implemented according to the RFC, but the WHATWG specification is what browsers use, and browsers are mainly the ones that will open the URL.
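
One possible application-side mitigation can be sketched as follows, assuming that URLs containing "\" can simply be refused. Both function names are hypothetical, and browser_like_hostname is only a rough approximation of the WHATWG behaviour for schemes such as http and https, not a full WHATWG parser.

```python
from urllib.parse import urlparse


def browser_like_hostname(url: str) -> str:
    # Rough approximation: browsers treat a backslash like a forward
    # slash for http/https URLs, so normalise it before parsing.
    return urlparse(url.replace("\\", "/")).hostname or ""


def strict_is_trusted(url: str, suffix: str = ".corp.google.com") -> bool:
    # Hypothetical stricter check: refuse any URL containing a backslash
    # instead of trying to guess how a browser will interpret it.
    if "\\" in url:
        return False
    host = urlparse(url).hostname or ""
    return host.endswith(suffix)


url = "https://xdavidhu.me\\test.corp.google.com"
print(browser_like_hostname(url))
print(strict_is_trusted(url))
```

Rejecting the input outright is the more conservative choice; the normalising helper is shown only to illustrate which host a browser would likely contact.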
