New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CVE-2019-9636: urlsplit does not handle NFKC normalization #80397
Comments
URLs encoded with Punycode/IDNA use NFKC normalization to decompose characters 1. This can result in some characters introducing new segments into a URL. For example, \uFF03 is not equal to '#' under direct comparison, but normalizes to '#' which changes the fragment part of the URL. Similarly \u2100 normalizes to 'a/c' which introduces a path segment. Currently, urlsplit() does not normalize, which may result in it returning a different netloc from what a browser would >>> u = "https://example.com\uFF03@bing.com"
>>> urlsplit(u).netloc.rpartition("@")[2]
bing.com
>>> # Simulate
>>> u = "https://example.com\uFF03@bing.com".encode("idna").decode("ascii")
>>> urlsplit(u).netloc.rpartition("@")[2]
example.com (Note that .netloc includes user/pass and .rpartition("@") is often used to remove it.) This may be used to steal cookies or authentication data from applications that use the netloc to cache or retrieve this information. The preferred fix for the urllib module is to detect and raise ValueError if NFKC-normalization of the netloc introduce any of '/?#@:'. Applications that want to avoid this error should perform their own decomposition using unicodedata or transcode to ASCII via IDNA. >>> # New behavior
>>> u = "https://example.com\uFF03@bing.com"
>>> urlsplit(u)
...
ValueError: netloc 'example.com#@bing.com' contains invalid characters under NFKC normalization
>>> # Workaround 1
>>> u2 = unicodedata.normalize("NFKC", u)
>>> urlsplit(u2)
SplitResult(scheme='https', netloc='example.com', path='', query='', fragment='@bing.com')
>>> # Workaround 2
>>> u3 = u.encode("idna").decode("ascii")
>>> urlsplit(u3)
SplitResult(scheme='https', netloc='example.com', path='', query='', fragment='@bing.com') Note that we do not address other characters, such as those that convert into period. The error is only raised for changes that affect how urlsplit() locates the netloc and the very common next step of removing credentials from the netloc. This vulnerability was reported by Jonathan Birch of Microsoft Corporation and Panayiotis Panayiotou (p.panayiotou2@gmail.com) via the Python Security Response Team. A CVE number has been requested. |
FYI I added this vulnerability to: |
This issue is now assigned CVE-2019-9636 https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2019-9636 |
A missed print statement in the 2.7 patch is causing the tests to fail. Line 647 of Lib/test/test_urlparse.py |
How does that cause tests to fail? Is it going to stderr? Or just causing an error. |
It is causing an "unexpected output error". When the test is re-run at the |
Sample buildbot log of print statement in testcase causing rerun of test : https://buildbot.python.org/all/#/builders/101/builds/364/steps/4/logs/stdio |
Thanks. Just posted a PR that puts the print behind a verbose flag (on Python 3 it uses subtest so that we see which character caused the failure). If that doesn't work for whatever reason, I'll just leave it out and hope that we never have to debug it on Python 2 :) |
Commit e37ef41 introduced a regression on AMD64 Windows8.1 Refleaks 2.7: test_urlparse leaked [114, 114, 114] references, sum=342 https://buildbot.python.org/all/#/builders/33/builds/532 It is fixed by PR 12291. You can test: python -m test -R 3:3 test_urlparse |
If it's fixed by not printing to the console, then it must be a refleak in print. Or perhaps in the test failure/repeat. That PR is going to be merged as soon as AppVeyor finishes, so nothing to worry about here. |
Hah, I typo'd the commit message and marked it as 3.7 instead of 2.7. Oh well. |
I am trying to investigate the impact of this bug on Python 2.6 (yes, it is for SLE), and I have hard to replicate the steps in the description even on 2.7: ~$ ipython2 IPython 5.8.0 -- An enhanced Interactive Python. In [1]: from urlparse import urlsplit In [2]: u = "https://example.com\\uFF03@bing.com".encode("idna").decode("ascii") In [3]: u In [4]: urlsplit(u).netloc.rpartition('@')[2] In [5]: u = "https://example.com\\uFF03@bing.com" In [6]: urlsplit(u).netloc.rpartition('@')[2] In [7]: u = u.encode("idna").decode("ascii") In [8]: urlsplit(u).netloc.rpartition('@')[2] In [9]: import unicodedata In [10]: u2 = unicodedata.normalize('NFKC', u) In [11]: u2 In [12]: urlsplit(u2) In [13]: Yes, the results are weird, and most likely they would break any software relying on them, but I am not sure that it is a security issue. vstinner ? steve.dower ? What do you think? |
You need a "u" prefix on some of your strings or they're probably being immediately decomposed. The result of urlsplit should be unicode on Python 2 for a Unicode input, and yours are not. |
You are right. Thank you. |
Fix meta (not incl 2.7 which is no longer available to select). |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: