
Author vstinner
Recipients vstinner
Date 2020-04-20.14:43:38
Message-id <1587393818.69.0.0669874545699.issue40338@roundup.psfhosted.org>
In-reply-to
Content
David Schütz reported the following urllib vulnerability to the PSRT on 2020-03-29.

He wrote an article about a similar vulnerability in Closure (JavaScript):
https://bugs.xdavidhu.me/google/2020/03/08/the-unexpected-google-wide-domain-check-bypass/

David was able to bypass a wildcard domain check in Closure by using the "\" character in the URL like this:

  https://xdavidhu.me\test.corp.google.com

Example in Python:

>>> from urllib.parse import urlparse
>>> urlparse("https://xdavidhu.me\\test.corp.google.com")
ParseResult(scheme='https', netloc='xdavidhu.me\\test.corp.google.com', path='', params='', query='', fragment='')

urlparse() currently accepts "\" in the netloc.

This could present issues if applications rely on server-side checks to validate a URL's authority.
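
To illustrate the kind of server-side check that could be affected, here is a minimal sketch. The function naive_is_trusted() and its suffix argument are hypothetical and not taken from any real application; the sketch only assumes that the check matches the parsed hostname against a wildcard suffix.

```python
from urllib.parse import urlparse

def naive_is_trusted(url, suffix=".corp.google.com"):
    """Hypothetical wildcard check: trust any host that ends with `suffix`."""
    host = urlparse(url).hostname or ""
    return host.endswith(suffix)

url = "https://xdavidhu.me\\test.corp.google.com"

# urlparse() keeps the backslash inside the netloc, so the hostname
# still ends with the trusted suffix.
print(urlparse(url).hostname)   # xdavidhu.me\test.corp.google.com
print(naive_is_trusted(url))    # True
```

A browser that follows the WHATWG spec treats "\" like "/" for https URLs, so it would connect to xdavidhu.me even though the check above passed.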

The problem arises because RFC 3986 and the WHATWG URL specification differ: the RFC does not mention "\", while the WHATWG spec treats it like "/" in URLs with special schemes such as https:

* RFC: https://tools.ietf.org/html/rfc3986#appendix-B
* WHATWG: https://url.spec.whatwg.org/#relative-state

This specification difference might cause issues: David understands that the parser is implemented according to the RFC, but browsers, which will mainly be the ones opening the URL, follow the WHATWG spec.
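
The difference between the two specifications can be sketched in Python as follows. Both helpers are hypothetical: browser_like_host() only approximates WHATWG behaviour by assuming a special scheme (http or https) and is not a full WHATWG parser, and is_trusted() shows one possible defensive check, not a fix adopted by urllib.

```python
from urllib.parse import urlsplit

def browser_like_host(url):
    """Approximate the host a WHATWG-style parser would see (hypothetical helper)."""
    # Assumption: the URL uses a special scheme such as http or https,
    # where the WHATWG spec treats "\" like "/".
    return urlsplit(url.replace("\\", "/")).hostname or ""

def is_trusted(url, suffix=".corp.google.com"):
    """Hypothetical check that rejects any URL containing a backslash."""
    if "\\" in url:
        return False
    return (urlsplit(url).hostname or "").endswith(suffix)

url = "https://xdavidhu.me\\test.corp.google.com"
print(urlsplit(url).hostname)    # xdavidhu.me\test.corp.google.com (RFC-style parse)
print(browser_like_host(url))    # xdavidhu.me (roughly what a browser would open)
print(is_trusted(url))           # False
```

Rejecting the backslash outright is the more conservative of the two options, since it does not depend on reproducing browser parsing rules exactly.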
History
Date                 User      Action  Args
2020-04-20 14:43:38  vstinner  set     recipients: + vstinner
2020-04-20 14:43:38  vstinner  set     messageid: <1587393818.69.0.0669874545699.issue40338@roundup.psfhosted.org>
2020-04-20 14:43:38  vstinner  link    issue40338 messages
2020-04-20 14:43:38  vstinner  create