Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CVE-2021-3733: ReDoS in urllib.request #87241

Closed
yetingli mannequin opened this issue Jan 30, 2021 · 20 comments
Closed

CVE-2021-3733: ReDoS in urllib.request #87241

yetingli mannequin opened this issue Jan 30, 2021 · 20 comments
Labels
3.7 (EOL) end of life 3.8 only security fixes 3.9 only security fixes 3.10 only security fixes easy stdlib Python modules in the Lib dir type-security A security issue

Comments

@yetingli
Copy link
Mannequin

yetingli mannequin commented Jan 30, 2021

BPO 43075
Nosy @orsenthil, @vstinner, @ned-deily, @ambv, @serhiy-storchaka, @miss-islington, @yetingli, @StayPirate
PRs
  • bpo-43075: Fix ReDoS in request #24391
  • [3.9] bpo-43075: Fix ReDoS in urllib AbstractBasicAuthHandler (GH-24391) #25247
  • [3.8] bpo-43075: Fix ReDoS in urllib AbstractBasicAuthHandler (GH-24391) #25248
  • [3.7] bpo-43075: Fix ReDoS in urllib AbstractBasicAuthHandler (GH-24391) #25249
  • [3.6] bpo-43075: Fix ReDoS in urllib AbstractBasicAuthHandler (GH-24391) #25250
  • Files
  • redos_python.py
  • redos_python2.py
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2021-05-06.17:00:47.929>
    created_at = <Date 2021-01-30.08:11:46.452>
    labels = ['type-security', 'easy', '3.8', '3.9', '3.10', '3.7', 'library']
    title = 'CVE-2021-3733: ReDoS in urllib.request'
    updated_at = <Date 2021-09-07.20:12:23.266>
    user = 'https://github.com/yetingli'

    bugs.python.org fields:

    activity = <Date 2021-09-07.20:12:23.266>
    actor = 'vstinner'
    assignee = 'none'
    closed = True
    closed_date = <Date 2021-05-06.17:00:47.929>
    closer = 'ned.deily'
    components = ['Library (Lib)']
    creation = <Date 2021-01-30.08:11:46.452>
    creator = 'yetingli'
    dependencies = []
    files = ['49778', '49938']
    hgrepos = []
    issue_num = 43075
    keywords = ['patch', 'easy', 'newcomer friendly']
    message_count = 16.0
    messages = ['385974', '385987', '385993', '386009', '388358', '388665', '390415', '390417', '390419', '390420', '390425', '390441', '392885', '393109', '400130', '401341']
    nosy_count = 8.0
    nosy_names = ['orsenthil', 'vstinner', 'ned.deily', 'lukasz.langa', 'serhiy.storchaka', 'miss-islington', 'yetingli', 'crazybyte']
    pr_nums = ['24391', '25247', '25248', '25249', '25250']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'security'
    url = 'https://bugs.python.org/issue43075'
    versions = ['Python 3.6', 'Python 3.7', 'Python 3.8', 'Python 3.9', 'Python 3.10']

    @yetingli
    Copy link
    Mannequin Author

    yetingli mannequin commented Jan 30, 2021

    Hi,

    I find this regex '(?:^|,)[ \t]*([^ \\t]+)[ \t]+' may be stucked by input.

    The vulnerable regex is located in

    rx = re.compile('(?:^|,)' # start of the string or ','

    The ReDOS vulnerability of the regex is mainly due to the sub-pattern ',([^ \\t]+)' and can be exploited with the following string
    attack_str = "," * 10000

    You can execute redos_python.py to reproduce the ReDos vulnerability.

    I am willing to suggest that you replace '(?:^|,)[ \t]([^ \\t]+)[ \t]+' with '(?:^|,)[ \t]([^ \\t,]+)[ \t]+'

    Looking forward for your response​!

    Best,
    Yeting Li

    @yetingli yetingli mannequin added 3.7 (EOL) end of life 3.8 only security fixes 3.9 only security fixes stdlib Python modules in the Lib dir labels Jan 30, 2021
    @tirkarthi tirkarthi added type-security A security issue labels Jan 30, 2021
    @serhiy-storchaka
    Copy link
    Member

    I agree. There is no catastrophic backtracking here (it was fixed in bpo-39503), but the complexity of matching the regular expression is linear. Searching the pattern in a sequence of commas has quadratic complexity, because every step has linear complexity and we advance only one character at every attempt.

    The proposed solution looks correct to me and fixes the issue. Yeting Li, do you mind to create a pull request for it? I can do it myself, but since you have found the problem and the solution, it would be better if the commit be attributed to you.

    @serhiy-storchaka serhiy-storchaka added 3.10 only security fixes labels Jan 30, 2021
    @orsenthil
    Copy link
    Member

    +1. The suggested fix looks good to me.

    @yetingli
    Copy link
    Mannequin Author

    yetingli mannequin commented Jan 31, 2021

    Thank you for your quick reply!

    I agree. Catastrophic backtracking is typically regarded as a regex with exponential worst-case matching time. Besides regexes with exponential worst-case time complexity, ReDoS also includes ones with other super-linear (e.g., quadratic) worst-case time complexity.

    Thanks again for your reply, I'm trying to create a pull request for it.

    @merwok merwok changed the title ReDoS in request ReDoS in urllib.request Mar 3, 2021
    @merwok merwok changed the title ReDoS in request ReDoS in urllib.request Mar 3, 2021
    @zware zware added easy and removed easy labels Mar 3, 2021
    @vstinner
    Copy link
    Member

    vstinner commented Mar 9, 2021

    I see that you attached a redos_python.py benchmark (which looks like a benchmark that I wrote recently ;-)) but you didn't give results. Can you please show that your fix is effective to avoid catastrophic performances?

    Is this issue related to the CVE-2020-8492? Is it the same issue or is it different?
    https://python-security.readthedocs.io/vuln/urllib-basic-auth-regex.html

    @yetingli
    Copy link
    Mannequin Author

    yetingli mannequin commented Mar 14, 2021

    Sorry for the delay. I analyzed the performance of the current version '(?:^|,)[ \t]([^ \\t]+)[ \t]+' and the fixed version '(?:^|,)[ \t]([^ \\t,]+)[ \t]+'. I ran the following HTTP header ten times:

    header = '' + ',' * (10 ** 5)

    The current version takes about 139.178s-140.946s, while the repaired version takes about 0.006s.

    You can analyze them with the code below.

        from time import perf_counter
        for _ in range(0, 10):
            BEGIN = perf_counter()
            header = repeat_10_5_simple
            headers = Headers(header)
            handler.http_error_auth_reqed("WWW-Authenticate", host, req, Headers(header))
            DURATION = perf_counter() - BEGIN
            print(f"took {DURATION} seconds!") 

    For CVE-2020-8492, it is the backtracking performance caused by some ambiguity during the matching, and this issue is caused by the regex engine constantly moves the matching regex across the malicious string that does not have a match for the regex.

    Because the locations of the vulnerabilities are the same, so I refer to your code. Thanks for the code ;-)!

    @vstinner
    Copy link
    Member

    vstinner commented Apr 7, 2021

    header = '' + ',' * (10 ** 5)

    I guess that a more generic protection against future attacks would be to limit the maximum length of a HTTP header. 100,000 characters for a HTTP Basic authentification does not sound reasonable.

    But for now, let's fix the regex.

    @yetingli
    Copy link
    Mannequin Author

    yetingli mannequin commented Apr 7, 2021

    For a regex has polynomial worst-case complexity, limiting the maximum input length is indeed a very effective method.

    As shown below, as the input length becomes smaller, the matching time becomes significantly smaller.

    header = '' + ',' * (10 ** 4) 1.617s
    header = '' + ',' * (10 ** 3) 0.014s
    header = '' + ',' * (10 ** 2) 0.00017s

    @vstinner
    Copy link
    Member

    vstinner commented Apr 7, 2021

    redos_python2.py: Updated benchmark.

    I confirm that PR 24391 fix a worst case performance, starting with 100 characters.

    Since the complexity is quadratic, strings longer 10^4 characters are likely to hang a client for several minutes.

    == Reference (vulnerable) ==

    simple: Mean +- std dev: 2.10 us +- 0.05 us
    repeat 10: Mean +- std dev: 3.85 us +- 0.13 us
    repeat 10^2: Mean +- std dev: 133 us +- 3 us
    repeat 10^4: Mean +- std dev: 1.23 sec +- 0.05 sec

    == With the PR 24391 fix ==

    simple: Mean +- std dev: 2.15 us +- 0.15 us
    repeat 10: Mean +- std dev: 2.44 us +- 0.04 us
    repeat 10^2: Mean +- std dev: 7.45 us +- 0.17 us
    repeat 10^4: Mean +- std dev: 574 us +- 28 us

    == Comparison ==

    simple: Mean +- std dev: [ref] 2.10 us +- 0.05 us -> [fix] 2.15 us +- 0.15 us: 1.02x slower
    repeat 10: Mean +- std dev: [ref] 3.85 us +- 0.13 us -> [fix] 2.44 us +- 0.04 us: 1.58x faster
    repeat 10^2: Mean +- std dev: [ref] 133 us +- 3 us -> [fix] 7.45 us +- 0.17 us: 17.80x faster
    repeat 10^4: Mean +- std dev: [ref] 1.23 sec +- 0.05 sec -> [fix] 574 us +- 28 us: 2152.36x faster

    Geometric mean: 15.59x faster

    @vstinner
    Copy link
    Member

    vstinner commented Apr 7, 2021

    New changeset 7215d1a by Yeting Li in branch 'master':
    bpo-43075: Fix ReDoS in urllib AbstractBasicAuthHandler (GH-24391)
    7215d1a

    @miss-islington
    Copy link
    Contributor

    New changeset e7654b6 by Miss Islington (bot) in branch '3.8':
    bpo-43075: Fix ReDoS in urllib AbstractBasicAuthHandler (GH-24391)
    e7654b6

    @vstinner
    Copy link
    Member

    vstinner commented Apr 7, 2021

    New changeset a21d4fb by Miss Islington (bot) in branch '3.9':
    bpo-43075: Fix ReDoS in urllib AbstractBasicAuthHandler (GH-24391) (GH-25247)
    a21d4fb

    @ambv
    Copy link
    Contributor

    ambv commented May 4, 2021

    New changeset ada1499 by Miss Islington (bot) in branch '3.7':
    bpo-43075: Fix ReDoS in urllib AbstractBasicAuthHandler (GH-24391) (bpo-25249)
    ada1499

    @ned-deily
    Copy link
    Member

    New changeset 3fbe961 by Miss Islington (bot) in branch '3.6':
    bpo-43075: Fix ReDoS in urllib AbstractBasicAuthHandler (GH-24391) (GH-25250)
    3fbe961

    @staypirate
    Copy link
    Mannequin

    staypirate mannequin commented Aug 23, 2021

    RedHat has now assigned CVE-2021-3733 to this security bug.

    @vstinner vstinner changed the title ReDoS in urllib.request CVE-2021-3733: ReDoS in urllib.request Sep 7, 2021
    @vstinner vstinner changed the title ReDoS in urllib.request CVE-2021-3733: ReDoS in urllib.request Sep 7, 2021
    @vstinner
    Copy link
    Member

    vstinner commented Sep 7, 2021

    I created https://python-security.readthedocs.io/vuln/urllib-basic-auth-regex2.html to track this vulnerability.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    @OmkarPatil11
    Copy link

    is there any affect of CVE-2021-3733 this security bug on python 2.7.18?

    @arhadthedev
    Copy link
    Member

    @OmkarPatil11 Unfortunately yes. Here how a fix for 3.11 looks like:

     class AbstractBasicAuthHandler:
         # XXX this allows for multiple auth-schemes, but will stupidly pick
         # the last one with a realm specified.
         # allow for double- and single-quoted realm values
         # (single quotes are a violation of the RFC, but appear in the wild)
         rx = re.compile('(?:^|,)'   # start of the string or ','
                         '[ \t]*'    # optional whitespaces
    -                    '([^ \t]+)' # scheme like "Basic"
    +                    '([^ \t,]+)' # scheme like "Basic"
                         '[ \t]+'    # mandatory whitespaces
                         # realm=xxx
                         # realm='xxx'
                         # realm="xxx"
                         'realm=(["\']?)([^"\']*)\\2',
                         re.I)
         # XXX could pre-emptively send auth info already accepted (RFC 2617,
         # end of section 2, and section 1.2 immediately after "credentials"
         # production).

    And here is the same fragment from v2.7.18, unfixed:

    cpython/Lib/urllib2.py

    Lines 852 to 864 in 8d21aa2

    class AbstractBasicAuthHandler:
    # XXX this allows for multiple auth-schemes, but will stupidly pick
    # the last one with a realm specified.
    # allow for double- and single-quoted realm values
    # (single quotes are a violation of the RFC, but appear in the wild)
    rx = re.compile('(?:.*,)*[ \t]*([^ \t]+)[ \t]+'
    'realm=(["\']?)([^"\']*)\\2', re.I)
    # XXX could pre-emptively send auth info already accepted (RFC 2617,
    # end of section 2, and section 1.2 immediately after "credentials"
    # production).

    @orsenthil
    Copy link
    Member

    @OmkarPatil11 , Python 2.7.x is not supported. It doesn't receive security fixes too. Time to upgrade applications to Python 3.8 +

    @vstinner
    Copy link
    Member

    Contact your Linux distribution to get a fix on Python 2.7. On Fedora, there are:

    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.7 (EOL) end of life 3.8 only security fixes 3.9 only security fixes 3.10 only security fixes easy stdlib Python modules in the Lib dir type-security A security issue
    Projects
    None yet
    Development

    No branches or pull requests