classification
Title: ReDoS in urllib.request
Type: security Stage: patch review
Components: Library (Lib) Versions: Python 3.10, Python 3.9, Python 3.8, Python 3.7, Python 3.6
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: miss-islington, orsenthil, serhiy.storchaka, vstinner, yetingli
Priority: normal Keywords: easy, newcomer friendly, patch

Created on 2021-01-30 08:11 by yetingli, last changed 2021-04-07 15:58 by vstinner.

Files
File name Uploaded Description Edit
redos_python.py yetingli, 2021-01-30 08:11
redos_python2.py vstinner, 2021-04-07 11:25
Pull Requests
URL Status Linked Edit
PR 24391 merged yetingli, 2021-01-31 05:12
PR 25247 merged miss-islington, 2021-04-07 11:28
PR 25248 merged miss-islington, 2021-04-07 11:28
PR 25249 open miss-islington, 2021-04-07 11:29
PR 25250 open miss-islington, 2021-04-07 11:29
Messages (12)
msg385974 - (view) Author: yeting li (yetingli) * Date: 2021-01-30 08:11
Hi,

I find this regex '(?:^|,)[ \t]*([^ \t]+)[ \t]+' may be stucked by input.

The vulnerable regex is located in https://github.com/python/cpython/blob/5c5a938573ce665f00e362c7766912d9b3f3b44e/Lib/urllib/request.py#L946

The ReDOS vulnerability of the regex is mainly due to the sub-pattern ',([^ \t]+)' and can be exploited with the following string
attack_str = "," * 10000

You can execute redos_python.py to reproduce the ReDos vulnerability.


I am willing to suggest that you replace '(?:^|,)[ \t]*([^ \t]+)[ \t]+' with '(?:^|,)[ \t]*([^ \t,]+)[ \t]+'

Looking forward for your response‚Äč!

Best,
Yeting Li
msg385987 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2021-01-30 16:32
I agree. There is no catastrophic backtracking here (it was fixed in issue39503), but the complexity of matching the regular expression is linear. Searching the pattern in a sequence of commas has quadratic complexity, because every step has linear complexity and we advance only one character at every attempt.

The proposed solution looks correct to me and fixes the issue. Yeting Li, do you mind to create a pull request for it? I can do it myself, but since you have found the problem and the solution, it would be better if the commit be attributed to you.
msg385993 - (view) Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2021-01-30 18:59
+1. The suggested fix looks good to me.
msg386009 - (view) Author: yeting li (yetingli) * Date: 2021-01-31 05:44
Thank you for your quick reply!

I agree. Catastrophic backtracking is typically regarded as a regex with exponential worst-case matching time. Besides regexes with exponential worst-case time complexity, ReDoS also includes ones with  other super-linear (e.g., quadratic) worst-case time complexity.


Thanks again for your reply, I'm trying to create a pull request for it.
msg388358 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-03-09 11:51
I see that you attached a redos_python.py benchmark (which looks like a benchmark that I wrote recently ;-)) but you didn't give results. Can you please show that your fix is effective to avoid catastrophic performances?

Is this issue related to the CVE-2020-8492? Is it the same issue or is it different?
https://python-security.readthedocs.io/vuln/urllib-basic-auth-regex.html
msg388665 - (view) Author: yeting li (yetingli) * Date: 2021-03-14 08:50
Sorry for the delay. I analyzed the performance of the current version '(?:^|,)[ \t]*([^ \t]+)[ \t]+' and the fixed version '(?:^|,)[ \t]*([^ \t,]+)[ \t]+'. I ran the following HTTP header ten times:

header = '' + ',' * (10 ** 5)

The current version takes about 139.178s-140.946s, while the repaired version takes about 0.006s.

You can analyze them with the code below.

    from time import perf_counter
    for _ in range(0, 10):
        BEGIN = perf_counter()
        header = repeat_10_5_simple
        headers = Headers(header)
        handler.http_error_auth_reqed("WWW-Authenticate", host, req, Headers(header))
        DURATION = perf_counter() - BEGIN
        print(f"took {DURATION} seconds!") 

For CVE-2020-8492, it is the backtracking performance caused by some ambiguity during the matching, and this issue is caused by the regex engine constantly moves the matching regex across the malicious string that does not have a match for the regex.

Because the locations of the vulnerabilities are the same, so I refer to your code. Thanks for the code ;-)!
msg390415 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-04-07 10:59
> header = '' + ',' * (10 ** 5)

I guess that a more generic protection against future attacks would be to limit the maximum length of a HTTP header. 100,000 characters for a HTTP Basic authentification does not sound reasonable.

But for now, let's fix the regex.
msg390417 - (view) Author: yeting li (yetingli) * Date: 2021-04-07 11:14
For a regex has polynomial worst-case complexity, limiting the maximum input length is indeed a very effective method.

As shown below, as the input length becomes smaller, the matching time becomes significantly smaller.

header = '' + ',' * (10 ** 4)    1.617s
header = '' + ',' * (10 ** 3)    0.014s
header = '' + ',' * (10 ** 2)    0.00017s
msg390419 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-04-07 11:25
redos_python2.py: Updated benchmark.

I confirm that PR 24391 fix a worst case performance, starting with 100 characters.

Since the complexity is quadratic, strings longer 10^4 characters are likely to hang a client for several minutes.

== Reference (vulnerable) ==

simple: Mean +- std dev: 2.10 us +- 0.05 us
repeat 10: Mean +- std dev: 3.85 us +- 0.13 us
repeat 10^2: Mean +- std dev: 133 us +- 3 us
repeat 10^4: Mean +- std dev: 1.23 sec +- 0.05 sec

== With the PR 24391 fix ==

simple: Mean +- std dev: 2.15 us +- 0.15 us
repeat 10: Mean +- std dev: 2.44 us +- 0.04 us
repeat 10^2: Mean +- std dev: 7.45 us +- 0.17 us
repeat 10^4: Mean +- std dev: 574 us +- 28 us

== Comparison ==

simple: Mean +- std dev: [ref] 2.10 us +- 0.05 us -> [fix] 2.15 us +- 0.15 us: 1.02x slower
repeat 10: Mean +- std dev: [ref] 3.85 us +- 0.13 us -> [fix] 2.44 us +- 0.04 us: 1.58x faster
repeat 10^2: Mean +- std dev: [ref] 133 us +- 3 us -> [fix] 7.45 us +- 0.17 us: 17.80x faster
repeat 10^4: Mean +- std dev: [ref] 1.23 sec +- 0.05 sec -> [fix] 574 us +- 28 us: 2152.36x faster

Geometric mean: 15.59x faster
msg390420 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-04-07 11:27
New changeset 7215d1ae25525c92b026166f9d5cac85fb1defe1 by Yeting Li in branch 'master':
bpo-43075: Fix ReDoS in urllib AbstractBasicAuthHandler (GH-24391)
https://github.com/python/cpython/commit/7215d1ae25525c92b026166f9d5cac85fb1defe1
msg390425 - (view) Author: miss-islington (miss-islington) Date: 2021-04-07 11:45
New changeset e7654b6046090914a8323931ed759a94a5f85d60 by Miss Islington (bot) in branch '3.8':
bpo-43075: Fix ReDoS in urllib AbstractBasicAuthHandler (GH-24391)
https://github.com/python/cpython/commit/e7654b6046090914a8323931ed759a94a5f85d60
msg390441 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-04-07 15:58
New changeset a21d4fbd549ec9685068a113660553d7f80d9b09 by Miss Islington (bot) in branch '3.9':
bpo-43075: Fix ReDoS in urllib AbstractBasicAuthHandler (GH-24391) (GH-25247)
https://github.com/python/cpython/commit/a21d4fbd549ec9685068a113660553d7f80d9b09
History
Date User Action Args
2021-04-07 15:58:07vstinnersetmessages: + msg390441
2021-04-07 11:45:13miss-islingtonsetmessages: + msg390425
2021-04-07 11:29:19miss-islingtonsetpull_requests: + pull_request23989
2021-04-07 11:29:11miss-islingtonsetpull_requests: + pull_request23988
2021-04-07 11:28:20miss-islingtonsetpull_requests: + pull_request23987
2021-04-07 11:28:09miss-islingtonsetnosy: + miss-islington
pull_requests: + pull_request23986
2021-04-07 11:27:50vstinnersetmessages: + msg390420
2021-04-07 11:25:55vstinnersetfiles: + redos_python2.py

messages: + msg390419
2021-04-07 11:14:28yetinglisetmessages: + msg390417
2021-04-07 10:59:29vstinnersetmessages: + msg390415
2021-03-14 08:50:54yetinglisetmessages: + msg388665
2021-03-09 11:51:39vstinnersetnosy: + vstinner
messages: + msg388358
2021-03-03 13:42:33taleinatsetkeywords: + newcomer friendly
2021-03-03 03:02:02zach.waresetkeywords: + patch
2021-03-03 03:00:23zach.waresetkeywords: + easy, - patch, easy (C)
2021-03-03 02:45:12eric.araujosettitle: ReDoS in request -> ReDoS in urllib.request
2021-03-03 00:50:17orsenthilsetkeywords: + easy (C)
2021-01-31 05:44:12yetinglisetmessages: + msg386009
2021-01-31 05:12:49yetinglisetkeywords: + patch
stage: needs patch -> patch review
pull_requests: + pull_request23205
2021-01-30 18:59:10orsenthilsetmessages: + msg385993
2021-01-30 16:32:26serhiy.storchakasetstage: needs patch
versions: + Python 3.10
2021-01-30 16:32:06serhiy.storchakasetmessages: + msg385987
2021-01-30 09:21:50xtreaksetnosy: + orsenthil, serhiy.storchaka
type: security
2021-01-30 08:11:46yetinglicreate