Message 337336 - Python tracker

Author	steve.dower
Recipients	benjamin.peterson, ezio.melotti, larry, ned.deily, steve.dower, vstinner
Date	2019-03-06.17:37:20
Message-id	<1551893840.49.0.433864450493.issue36216@roundup.psfhosted.org>

Content

URLs encoded with Punycode/IDNA use NFKC normalization to decompose characters [1]. This can result in some characters introducing new segments into a URL.

For example, \uFF03 is not equal to '#' under direct comparison, but normalizes to '#' which changes the fragment part of the URL. Similarly \u2100 normalizes to 'a/c' which introduces a path segment.
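The two normalizations mentioned above can be checked directly with the standard unicodedata module. This is only an illustrative sketch; the variable names are mine, not part of the report.

```python
import unicodedata

# Illustrative names; only the code points come from the report.
fullwidth_hash = "\uFF03"  # FULLWIDTH NUMBER SIGN
account_of = "\u2100"      # ACCOUNT OF

# Direct comparison fails, but the NFKC form is a plain '#'.
hash_equal_raw = fullwidth_hash == "#"
hash_nfkc = unicodedata.normalize("NFKC", fullwidth_hash)

# The single ACCOUNT OF character expands to three ASCII characters,
# one of which is a path separator.
account_nfkc = unicodedata.normalize("NFKC", account_of)

print(hash_equal_raw)  # False
print(hash_nfkc)       # #
print(account_nfkc)    # a/c
```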

Currently, urlsplit() does not normalize, which may result in it returning a different netloc from what a browser would use:

>>> from urllib.parse import urlsplit
>>> u = "https://example.com\uFF03@bing.com"
>>> urlsplit(u).netloc.rpartition("@")[2]
'bing.com'

>>> # Simulate the IDNA encoding a browser would apply
>>> u = "https://example.com\uFF03@bing.com".encode("idna").decode("ascii")
>>> urlsplit(u).netloc.rpartition("@")[2]
'example.com'

(Note that .netloc includes user/pass and .rpartition("@") is often used to remove it.)
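For readers unfamiliar with the idiom, here is a minimal sketch of credential stripping on an ordinary ASCII URL. The URL and the credentials in it are made up for illustration.

```python
from urllib.parse import urlsplit

# Hypothetical URL with embedded credentials and a port.
parts = urlsplit("https://user:secret@example.com:8443/path")

# netloc keeps everything between '//' and the first '/', '?' or '#'.
netloc = parts.netloc

# Taking the text after the last '@' drops the user/pass portion.
host_port = netloc.rpartition("@")[2]

# The hostname attribute performs a similar split internally.
hostname = parts.hostname

print(netloc)     # user:secret@example.com:8443
print(host_port)  # example.com:8443
print(hostname)   # example.com
```

Because the split keys on '@', any character that later turns into '#' or '@' can change which part of the string is treated as the host.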

This may be used to steal cookies or authentication data from applications that use the netloc to cache or retrieve this information.

The preferred fix for the urllib module is to detect and raise ValueError if NFKC normalization of the netloc introduces any of '/?#@:'. Applications that want to avoid this error should perform their own decomposition using unicodedata or transcode to ASCII via IDNA.
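One way such a check could look is sketched below. This is not the actual urllib patch: check_netloc and NETLOC_DELIMITERS are hypothetical names, and comparing per-character counts before and after normalization is my own assumption about how "introduces" might be tested.

```python
import unicodedata

# Hypothetical constant: the delimiters named in the proposal.
NETLOC_DELIMITERS = "/?#@:"


def check_netloc(netloc):
    """Raise ValueError if NFKC normalization adds a URL delimiter.

    Hypothetical helper for illustration; not the urllib implementation.
    """
    normalized = unicodedata.normalize("NFKC", netloc)
    if normalized == netloc:
        return  # Normalization changed nothing, so nothing was introduced.
    for ch in NETLOC_DELIMITERS:
        # A higher count after normalization means some other character
        # turned into this delimiter.
        if normalized.count(ch) > netloc.count(ch):
            raise ValueError(
                "netloc %r contains invalid characters "
                "under NFKC normalization" % netloc
            )


# Delimiters that are already present in ASCII form are accepted.
check_netloc("user:secret@example.com:8443")

# A fullwidth '#' hidden in the netloc is rejected.
try:
    check_netloc("example.com\uFF03@bing.com")
    rejected = False
except ValueError as exc:
    rejected = True
    print(exc)
```

Comparing counts rather than simple membership keeps legitimate netlocs that already contain ':' or '@' from being rejected.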

>>> # New behavior
>>> u = "https://example.com\uFF03@bing.com"
>>> urlsplit(u)
...
ValueError: netloc 'example.com#@bing.com' contains invalid characters under NFKC normalization

>>> # Workaround 1
>>> import unicodedata
>>> u2 = unicodedata.normalize("NFKC", u)
>>> urlsplit(u2)
SplitResult(scheme='https', netloc='example.com', path='', query='', fragment='@bing.com')

>>> # Workaround 2
>>> u3 = u.encode("idna").decode("ascii")
>>> urlsplit(u3)
SplitResult(scheme='https', netloc='example.com', path='', query='', fragment='@bing.com')

Note that we do not address other characters, such as those that normalize to a period. The error is only raised for changes that affect how urlsplit() locates the netloc and the very common next step of removing credentials from the netloc.
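As an illustration of a character outside the checked set, the sketch below uses U+FF0E (FULLWIDTH FULL STOP). That code point is my choice of example and does not come from the report.

```python
import unicodedata

# Hypothetical host name containing a fullwidth full stop (U+FF0E).
host = "example\uFF0Ecom"
normalized = unicodedata.normalize("NFKC", host)

# The period appears only after normalization...
print(normalized)  # example.com

# ...but none of the delimiters '/?#@:' are introduced, so a check
# limited to those characters would not raise for this input.
introduced = [c for c in "/?#@:" if c in normalized and c not in host]
print(introduced)  # []
```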

This vulnerability was reported by Jonathan Birch of Microsoft Corporation and Panayiotis Panayiotou (p.panayiotou2@gmail.com) via the Python Security Response Team. A CVE number has been requested.

[1]: https://unicode.org/reports/tr46/
