[security] CVE-2022-0391: urllib.parse should sanitize urls containing ASCII newline and tabs. #88048

orsenthil · 2021-04-18T19:37:00Z

BPO	43882
Nosy	@gpshead, @orsenthil, @vstinner, @ned-deily, @OddBloke, @ambv, @mgorny, @apollo13, @mlissner, @pablogsal, @miss-islington, @tirkarthi, @felixxm, @sethmlarson
PRs	bpo-43882 - urllib.parse should sanitize urls containing ASCII newline and tabs. #25595
	[3.9] bpo-43882 - urllib.parse should sanitize urls containing ASCII newline and tabs. (GH-25595) #25725
	[3.8] bpo-43882 - urllib.parse should sanitize urls containing ASCII newline and tabs. (GH-25595) #25726
	[3.7] bpo-43882 - urllib.parse should sanitize urls containing ASCII newline and tabs. (GH-25595) #25727
	[3.6] bpo-43882 - urllib.parse should sanitize urls containing ASCII newline and tabs. (GH-25595) #25728
	[3.9] bpo-43882 Remove the newline, and tab early. From query and fragments. #25853
	bpo-43882 Remove the newline, and tab early. From query and fragments. #25921
	[3.7] bpo-43882 - urllib.parse should sanitize urls containing ASCII newline and tabs. #25923
	[3.6] bpo-43882 - urllib.parse should sanitize urls containing ASCII newline and tabs #25924
	[3.10] bpo-43882 Remove the newline, and tab early. From query and fragments. (GH-25921) #25936
	[3.7] bpo-43882 - Mention urllib.parse changes in Whats New section for 3.7.11 #26267
	[3.6] bpo-43882 - Mention urllib.parse changes in Whats New section for 3.6.14 #26268
	[3.10] bpo-43882 - Mention urllib.parse changes in Whats new section. #26275
	[3.9] bpo-43882 - Mention urllib.parse changes in Whats new section. #26276
	[3.8] bpo-43882 - Mention urllib.parse changes in Whats new section. #26277

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


GitHub fields:

assignee = 'https://github.com/orsenthil'
closed_at = <Date 2021-06-02.01:26:09.260>
created_at = <Date 2021-04-18.19:37:00.259>
labels = ['type-security', '3.8', '3.9', '3.10', '3.11', '3.7', 'library']
title = '[security] CVE-2022-0391: urllib.parse should sanitize urls containing ASCII newline and tabs.'
updated_at = <Date 2022-02-09.11:40:17.243>
user = 'https://github.com/orsenthil'

bugs.python.org fields:

activity = <Date 2022-02-09.11:40:17.243>
actor = 'felixxm'
assignee = 'orsenthil'
closed = True
closed_date = <Date 2021-06-02.01:26:09.260>
closer = 'gregory.p.smith'
components = ['Library (Lib)']
creation = <Date 2021-04-18.19:37:00.259>
creator = 'orsenthil'
dependencies = []
files = []
hgrepos = []
issue_num = 43882
keywords = ['patch']
message_count = 47.0
messages = ['391343', '391352', '391426', '391859', '392334', '392338', '392611', '392781', '392808', '392835', '392873', '392926', '392944', '392971', '392995', '393009', '393025', '393030', '393033', '393034', '393039', '393049', '393107', '393108', '393136', '393139', '393142', '393144', '393146', '393149', '393150', '393198', '393203', '393205', '393207', '393211', '393997', '394056', '394057', '394058', '394062', '394112', '394113', '396628', '412688', '412705', '412821']
nosy_count = 14.0
nosy_names = ['gregory.p.smith', 'orsenthil', 'vstinner', 'ned.deily', 'odd_bloke', 'lukasz.langa', 'mgorny', 'apollo13', 'Mike.Lissner', 'pablogsal', 'miss-islington', 'xtreak', 'felixxm', 'sethmlarson']
pr_nums = ['25595', '25725', '25726', '25727', '25728', '25853', '25921', '25923', '25924', '25936', '26267', '26268', '26275', '26276', '26277']
priority = 'high'
resolution = 'fixed'
stage = 'commit review'
status = 'closed'
superseder = None
type = 'security'
url = 'https://bugs.python.org/issue43882'
versions = ['Python 3.6', 'Python 3.7', 'Python 3.8', 'Python 3.9', 'Python 3.10', 'Python 3.11']

orsenthil · 2021-04-18T19:36:58Z

A security issue was reported by Mike Lissner wherein an attacker was able to use \r\n in the URL path: urlparse didn't sanitize the input and allowed those characters to be present in the request.

In [9]: from urllib.parse import urlsplit
In [10]: urlsplit("java\nscript:alert('bad')")
Out[10]: SplitResult(scheme='', netloc='', path="java\nscript:alert('bad')", query='', fragment='')

Firefox and other browsers ignore newlines in the scheme. From
the browser console:

> new URL("java\nscript:alert(bad)")
<< URL { href: "javascript:alert(bad)", origin: "null", protocol:
"javascript:", username: "", password: "", host: "", hostname: "", port: "", pathname: "alert(bad)", search: ""

Mozilla developers informed us that the controlling specification for URLs is in fact the "URL Spec"
from WHATWG, which updates RFC 3986 and specifies that tabs and newlines
should be stripped from the scheme.

See: https://url.spec.whatwg.org/#concept-basic-url-parser

That link defines an automaton for URL parsing. From that link, steps 2 and 3 of scheme parsing read:

2. If input contains any ASCII tab or newline, validation error.
3. Remove all ASCII tab or newline from input.

The urlparse module's behavior should be updated so that ASCII tabs and newlines are removed from the URL (sanitized) before it is sent in a request, as per the WHATWG spec.
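For readers following along, the stripping step quoted from the WHATWG spec can be sketched in a few lines. This is only an illustration and not the patch that was merged: the helper name `strip_tab_and_newline` is hypothetical, and the sketch assumes that "ASCII tab or newline" means exactly the three characters `\t`, `\r`, and `\n`.

```python
from urllib.parse import urlsplit

# Assumption: "ASCII tab or newline" covers tab, carriage return, line feed.
_UNSAFE_TABLE = str.maketrans("", "", "\t\r\n")


def strip_tab_and_newline(url: str) -> str:
    """Return url with every ASCII tab and newline character removed.

    Hypothetical helper for illustration; not part of urllib.parse.
    """
    return url.translate(_UNSAFE_TABLE)


# The URL from the report, cleaned before it is handed to urlsplit().
cleaned = strip_tab_and_newline("java\nscript:alert('bad')")
parts = urlsplit(cleaned)
print(parts.scheme, parts.path)
```

Once the newline is gone, the scheme is recognized as `javascript`, which matches what the browser console output above shows.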

tirkarthi · 2021-04-19T03:24:08Z

See also a related issue to sanitise newline on other helper functions https://bugs.python.org/issue30713

See also discussion and compatibility on disallowing control characters : https://bugs.python.org/issue30458

vstinner · 2021-04-20T10:41:07Z

See also bpo-43883.

orsenthil · 2021-04-25T14:53:45Z

I have added a PR to remove ascii newlines and tabs from URL input. It is as per the WHATWG spec.

However, I would still like to research more and find out whether this introduces behavior that will break existing systems. It should also be aligned with the decisions we have made for previous related bug reports.

Please review.

orsenthil · 2021-04-29T17:16:55Z

New changeset 76cd81d by Senthil Kumaran in branch 'master':
bpo-43882 - urllib.parse should sanitize urls containing ASCII newline and tabs. (GH-25595)
76cd81d

orsenthil · 2021-04-29T17:57:46Z

New changeset 491fde0 by Miss Islington (bot) in branch '3.9':
[3.9] bpo-43882 - urllib.parse should sanitize urls containing ASCII newline and tabs. (GH-25595) (GH-25725)
491fde0

gpshead · 2021-05-01T17:26:20Z

I think there's still a flaw in the fixes implemented in 3.10 and 3.9 so far. We're closer, but probably not quite good enough yet.

Why? We aren't stripping the newlines+tab early enough.

I think we need to do the stripping *right after* the _coerce_args(url, ...) call at the start of the function.

Otherwise we
(1) are storing url variants with the bad characters in _parse_cache [a mere slowdown in the worst case as it'd just overflow the cache sooner]
(2) are splitting the scheme off the URL prior to stripping. In 3.9+ there is a check for valid scheme characters, which falls back to the default scheme when an invalid character is found. The WHATWG basic url parsing has these characters stripped before any parts are split off though, so 'ht\rtps' - for example - would wind up as 'https' rather than our behavior so far of deferring to the default scheme.

I noticed this when reviewing the pending 3.8 PR as it made it more obvious due to the structure of the code and would've allowed characters through into query and fragment in some cases. #25726 (review)
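The ordering problem in point (2) can be illustrated with a toy parser. The sketch below is not the urllib.parse implementation: `split_scheme`, `late_strip`, and `early_strip` are hypothetical names, and the scheme check is a simplified stand-in for the valid-scheme-character check mentioned above.

```python
import string

UNSAFE = str.maketrans("", "", "\t\r\n")
SCHEME_CHARS = set(string.ascii_letters + string.digits + "+-.")


def split_scheme(url, default=""):
    """Toy scheme splitter: fall back to `default` on any invalid character."""
    head, sep, rest = url.partition(":")
    if sep and head and head[0] in string.ascii_letters and set(head) <= SCHEME_CHARS:
        return head.lower(), rest
    return default, url


def late_strip(url):
    """Split first, strip afterwards: the scheme check sees the raw input."""
    scheme, rest = split_scheme(url)
    return scheme, rest.translate(UNSAFE)


def early_strip(url):
    """Strip first, then split: the order the WHATWG parser uses."""
    return split_scheme(url.translate(UNSAFE))


print(late_strip("ht\rtps://example.com"))
print(early_strip("ht\rtps://example.com"))
```

With the late variant the `\r` makes the scheme check fail, so the default (empty) scheme is returned and the scheme text stays in the remainder. With the early variant the scheme comes out as `https`.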

ambv · 2021-05-03T09:10:34Z

Good catch, Greg. Since it's not merged yet, this change will miss 3.8.10 but as a security fix will be included in 3.8.11 later in the year.

The partial fix already landed in 3.9 will be released in 3.9.5 later today unless it's amended or reverted in a few hours.

orsenthil · 2021-05-03T14:12:08Z

Based on Greg's review comment, I have pushed the fix for 3.9 and 3.8:

[3.9] bpo-43882 Remove the newline, and tab early. From query and fragments. #25853
[3.8] bpo-43882 - urllib.parse should sanitize urls containing ASCII newline and tabs. (GH-25595) #25726

There is no need to hold off releases for these alone. If we get it merged before the release cut today, fine, otherwise, they will be in the next security fix.

orsenthil · 2021-05-03T19:09:07Z

New changeset 8a59574 by Senthil Kumaran in branch '3.9':
[3.9] bpo-43882 Remove the newline, and tab early. From query and fragments. (GH-25853)
8a59574

mgorny · 2021-05-04T10:57:40Z

I hate to be the bearer of bad news but I've already found this change to be breaking tests of botocore and django. In both cases, the test failure is apparently because upstream used to reject URLs after finding newlines in the split components, and now they're silently stripped away.

Filed bugs:
boto/botocore#2377
https://code.djangoproject.com/ticket/32713

Note that I'm not saying the change should be reverted.

sethmlarson · 2021-05-04T17:26:45Z

Leaving a thought here: I'm highlighting that we're now implementing two different standards, RFC 3986 with hints of WHATWG-URL. There are pitfalls to doing so, as a strict URL parser for RFC 3986 (like the one used by urllib3/requests) will now give different results compared to Python, which opens the door to SSRF vulnerabilities [1].

mlissner · 2021-05-04T20:16:12Z

I haven't watched that Blackhat presentation yet, but from the slides, it seems like the fix is to get all languages parsing URLs the same as the browsers. That's what @orsenthil has been doing here and plans to do in https://bugs.python.org/issue43883.

Should we get a bug filed with requests/urllib3 too? Seems like a good idea if it suffers from the same problems.

gpshead · 2021-05-05T02:19:35Z

Both Django and Botocore issues appear to be in the category of: "depending on invalid data being passed through our urlsplit API so that they could look for it later" Not much sympathy. We never guaranteed we'd pass invalid data through. They're depending on an implementation detail (Hyrum's law). Invalid data causes other people who don't check for it problems. There is no valid solution on our end within the stdlib that won't frustrate somebody.

We chose to move towards safer (undoubtedly not perfect) by default.

Instead of the patches as you see them, we could've raised an exception. I'm sure that would also have tripped up existing code depending on the undesirable behavior.

If one wants to reject invalid data as an application/library/framework, they need a validator. The Python stdlib does not provide a URL validation API. I'm not convinced we would even want to (though that could be something bpo-43883 winds up providing) given how perilous that is to get right: Whose version of right? Which set of standards? When and why? Conclusion: The web... such a mess.
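As a rough illustration of the "bring your own validator" point, an application that wants to reject such input rather than have it silently stripped could check the raw string before calling urlsplit. This is a minimal sketch only: `split_or_reject` is a hypothetical helper, and it checks just the three characters discussed in this issue, which is far from complete URL validation.

```python
from urllib.parse import urlsplit

# Assumption: only the characters discussed in this issue are checked.
_REJECTED = "\t\r\n"


def split_or_reject(url: str):
    """Raise ValueError if url contains an ASCII tab or newline, else split it.

    Hypothetical helper; the check runs on the raw input, so it does not
    depend on whether urlsplit() itself strips these characters.
    """
    if any(ch in url for ch in _REJECTED):
        raise ValueError(f"URL contains tab or newline characters: {url!r}")
    return urlsplit(url)


print(split_or_reject("https://example.com/path?q=1").netloc)
```

Because the check happens before parsing, the result is the same on patched and unpatched Python versions.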

ned-deily · 2021-05-07T19:12:23Z

My reading of the previous message was, even if we raised an exception
or gave it as a parameter, it won't be any better for certain downstream
users, as we would leave the security problem open, and have it only as an opt-in fix.

Senthil, I am not sure which previous message you are referring to but, with regard to my comment about reverting the recent fixes for 3.7 and 3.6 until the reported problems are resolved, I should add that, given the recent input from downstream users about the side effects, the only way we *should* proceed with the current changes is by including more information in a What's New entry and the NEWS blurb about what the implications of these changes are for users.

gpshead · 2021-05-07T19:15:24Z

There is no less intrusive fix as far as I can see. I believe we're down to either stick with what we've done, or do nothing. It doesn't have to be the same choice in all release branches, being more conservative with changes the older the stable branch is okay. (ie: removing this from 3.6 and 3.7 seems fine even if more recent ones do otherwise)

Based on my testing, raising an exception is more intrusive to existing tests (which we can only ever hope is representative of code) than stripping. At least as exposed by running the changes through many tens of thousands of unittest suites at work.

ie: if we raise an exception, pandas.read_json() starts failing because that winds up using urlsplit in hopes of extracting the scheme and comparing that to known values as their method of deciding if something should be treated as a URL to data rather than data. Pandas would need to be fixed.

That urlsplit() API use pattern is repeated in various other pieces of code: urlsplit is not expected to raise an exception. The caller then has a conditional or two testing some parts of the urlsplit result to make a guess as to if something should be considered a URL or not. Doing code inspection, pandas included, this code pretty much always then goes on to pass the original url value off to some other library, be it urllib, or requests, or ...).

Consequences of that code inspection finding? With our existing character stripping change, new data is then allowed to pass through these urlsplit uses and be considered a URL. Which leads to some code sending the url with embedded \r\n\t chars on to other APIs - a concern expressed a couple of times above.

Even though urlsplit isn't itself a validation API, it gets used as an early step in people's custom identification and validation attempts. So *any* change we make to it at all in any way breaks someone's expectations, even if they probably shouldn't have had those expectations and aren't doing wise validation.

Treat this analysis as a sign that we should provide an explicit url validator because almost everyone is doing it some form of wrong. (bpo-43883)

I did wonder if Mike's suggestion of removing the characters during processing, but leaving them in the final result in https://bugs.python.org/issue43882#msg393033 is feasible as remediation for this? My gut feeling is that it isn't. It doesn't solve the problem of preventing the bad data from going where it shouldn't. Even if we happen to parse that example differently, the unwanted characters are still retained in other places they don't belong. Fundamentally: We require people to make a different series of API calls and choices in the end user code to **explicitly not use unvalidated inputs**. Our stdlib API surface can't satisfy that today and use of unvalidated data in wrong places is a broad software security antipattern theme.
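The "inspect with urlsplit, then pass the original string on" pattern described above can be avoided by handing downstream code the same value that was inspected. The following is a sketch under assumptions, not code from pandas or any other library: `as_known_url` and `KNOWN_SCHEMES` are hypothetical, and the scheme list is arbitrary.

```python
from urllib.parse import urlsplit

# Hypothetical allow-list; a real application would choose its own.
KNOWN_SCHEMES = {"http", "https", "ftp", "file"}
_STRIP = str.maketrans("", "", "\t\r\n")


def as_known_url(value: str):
    """Return the cleaned URL if value looks like a known-scheme URL, else None.

    The caller is expected to pass on the *returned* string, so the value
    that reaches other APIs is the one the scheme check was run against.
    """
    cleaned = value.translate(_STRIP)
    if urlsplit(cleaned).scheme in KNOWN_SCHEMES:
        return cleaned
    return None


print(as_known_url("ht\ntp://example.com/data.json"))
print(as_known_url('{"a": 1}'))
```

The design choice here is simply that the check and the downstream use share one string. It does not make the check a full validator.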

orsenthil · 2021-05-07T19:37:59Z

Ned wrote:

Senthil, I am not sure which previous message you are referring to but.

I meant the messages from other developers who reported that the change broke certain test cases.

Ned, I got a little concerned that we were planning to revert the change.

the only way we *should* proceed with the current changes is by including more information in a What's New entry and the NEWS blurb about what the implications of these changes are for users.

I agree completely. I will include an additional blurb for this change for security fix versions.

Greg wrote:

There is no less intrusive fix as far as I can see. I believe we're down to either stick with what we've done, or do nothing.

Exactly my feeling too.

It doesn't have to be the same choice in all release branches, being more conservative with changes the older the stable branch is okay. (ie: removing this from 3.6 and 3.7 seems fine even if more recent ones do otherwise)

I hadn't considered that. But in my opinion it won't save much. The users will have to upgrade to supported versions anyway and it will break then. The problem is only pushed back a little.

So, keeping it consistent seems alright to me. It is a little additional work for everyone, but we seem to be doing it.

ned-deily · 2021-05-20T02:11:52Z

I will include an additional blurb for this change for security fix versions.

Ping. This issue is still blocking 3.7 and 3.6 security releases.

ned-deily · 2021-05-20T20:15:09Z

New changeset c723d51 by Senthil Kumaran in branch '3.7':
[3.7] bpo-43882 - Mention urllib.parse changes in Whats New section for 3.7.11 (GH-26267)
c723d51

ned-deily · 2021-05-20T20:16:19Z

New changeset 6f743e7 by Senthil Kumaran in branch '3.6':
[3.6] bpo-43882 - Mention urllib.parse changes in Whats New section for 3.6.14 (GH-26268)
6f743e7

ned-deily · 2021-05-20T20:18:58Z

Thanks, Senthil and Greg! The updates for 3.7 and 3.6 are now merged. Is there anything else that needs to be done for this issue or can it now be closed?

gpshead · 2021-05-20T20:33:55Z

Let's get equivalent whatsnew text into the 3.8, 3.9, and 3.10 branches before closing it.

orsenthil · 2021-05-21T12:29:36Z

New changeset f14015a by Senthil Kumaran in branch '3.10':
[3.10] bpo-43882 - Mention urllib.parse changes in Whats new section. (GH-26275)
f14015a

orsenthil · 2021-05-21T12:30:08Z

New changeset 0593ae8 by Senthil Kumaran in branch '3.9':
[3.9] bpo-43882 - Mention urllib.parse changes in Whats new section. (GH-26276)
0593ae8

ambv · 2021-06-28T10:05:35Z

New changeset 634da2d by Senthil Kumaran in branch '3.8':
[3.8] bpo-43882 - Mention urllib.parse changes in Whats new section. (GH-26277)
634da2d

vstinner · 2022-02-06T23:39:40Z

CVE-2022-0391 has been assigned to this vulnerability.

mlissner · 2022-02-07T03:06:06Z

Looks like that CVE isn't public yet.

https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2022-0391

Any chance I can get access? (I originally reported this vuln.) My email is mike@free.law, if it's possible and my email is needed.

Thanks!

vstinner · 2022-02-08T08:44:47Z

Looks like that CVE isn't public yet.
https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2022-0391
Any chance I can get access (I originally reported this vuln.).

Message from Gaurav Kamathe who requested the CVE:

"We've sent a request to MITRE to get this published and it'll be available on MITRE shortly."

timmc · 2022-08-25T19:05:36Z

@gpshead urlsplit already raises an exception for some malformed inputs: https://github.com/python/cpython/blob/v3.9.13/Lib/urllib/parse.py#L484 -- you can try it out with urlsplit('http://]/').

This kind of eclectic approach -- taking part of one spec, part of another, ignoring invalid values, turning invalid outputs into answers without a warning -- is just going to cause more vulnerabilities. A URL with newlines in it was never valid, for browsers or anything else. Now there's just another parser that will give answers that don't match other parsers (whether it's browsers, or curl, or anything else). Better to bite the bullet and say "hey, that's not parseable".
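The existing exception path mentioned above is easy to reproduce. This sketch only wraps the `urlsplit('http://]/')` call from the comment in a try/except; `try_split` is a hypothetical helper name.

```python
from urllib.parse import urlsplit


def try_split(url: str):
    """Return the SplitResult, or None if urlsplit() raises ValueError."""
    try:
        return urlsplit(url)
    except ValueError as exc:
        # A lone "]" in the netloc is treated as a malformed IPv6 literal.
        print(f"rejected {url!r}: {exc}")
        return None


print(try_split("http://]/"))
print(try_split("http://example.com/"))
```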

gpshead · 2022-08-25T19:10:34Z

If you have a concrete proposal to do something different or have found other bugs, please open a new Issue. Comments added to an old merged PR are likely to be ignored and unseen.

timmc · 2022-08-25T19:23:55Z

Understood. I mostly wanted to correct the record on urlsplit's existing behavior, for people looking back at the git blame to figure out why urlsplit behaves the way it does.

(And of course I couldn't resist tacking on a warning about parser mismatch, which I don't think got enough air time in the discussion.)

parth-gr · 2022-09-05T08:15:06Z

@orsenthil Why isn't this backported to python2?

Should I open a manual backport of this? It would be helpful for python2 users as well.

gpshead · 2022-09-05T16:30:21Z

python2 is EOL and receives zero support here.

orsenthil added 3.10 only security fixes 3.7 (EOL) end of life 3.8 only security fixes 3.9 only security fixes labels Apr 18, 2021

orsenthil self-assigned this Apr 18, 2021

orsenthil added 3.10 only security fixes type-security A security issue 3.7 (EOL) end of life 3.8 only security fixes 3.9 only security fixes labels Apr 18, 2021


orsenthil added the type-security A security issue label Apr 18, 2021

vstinner changed the title ~~urllib.parse should sanitize urls containing ASCII newline and tabs.~~ [security] urllib.parse should sanitize urls containing ASCII newline and tabs. Apr 20, 2021

vstinner added stdlib Python modules in the Lib dir labels Apr 20, 2021

ned-deily added release-blocker labels May 20, 2021

ned-deily removed release-blocker labels May 20, 2021

gpshead closed this as completed Jun 2, 2021

vstinner changed the title ~~[security] urllib.parse should sanitize urls containing ASCII newline and tabs.~~ [security] CVE-2022-0391: urllib.parse should sanitize urls containing ASCII newline and tabs. Feb 6, 2022

ezio-melotti transferred this issue from another repository Apr 10, 2022

gpshead mentioned this issue Apr 1, 2023

urllib.parse space handling CVE-2023-24329 appears unfixed #102153

Closed

ERosendo mentioned this issue Jun 22, 2023

Upgrade to Django 4.2 freelawproject/bigcases2#275

Closed
