[security] Infinite loop on folding email (_fold_as_ew()) if an header has no spaces #77710

Closed
rad164 mannequin opened this issue May 16, 2018 · 11 comments
Labels
3.7 (EOL) end of life 3.8 only security fixes topic-email type-security A security issue

Comments

@rad164
Mannequin

rad164 mannequin commented May 16, 2018

BPO 33529
Nosy @warsaw, @vstinner, @ned-deily, @bitdancer, @maxking, @tirkarthi
PRs
  • bpo-33529: Fix Infinite loop on folding email if headers has non_ascii #7763
  • bpo-33529: Fix infinite loop in email header encoding #12020
  • [3.7] bpo-33529, email: Fix infinite loop in email header encoding (GH-12020) #13321
  • [3.6] bpo-33529, email: Fix infinite loop in email header encoding (GH-12020) #14162
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    GitHub fields:

    assignee = None
    closed_at = <Date 2019-06-18.08:31:25.562>
    created_at = <Date 2018-05-16.00:12:28.457>
    labels = ['type-security', '3.7', '3.8', 'expert-email']
    title = '[security] Infinite loop on folding email (_fold_as_ew()) if an header has no spaces'
    updated_at = <Date 2019-06-18.08:31:25.561>
    user = 'https://bugs.python.org/rad164'

    bugs.python.org fields:

    activity = <Date 2019-06-18.08:31:25.561>
    actor = 'vstinner'
    assignee = 'none'
    closed = True
    closed_date = <Date 2019-06-18.08:31:25.562>
    closer = 'vstinner'
    components = ['email']
    creation = <Date 2018-05-16.00:12:28.457>
    creator = 'rad164'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 33529
    keywords = ['patch']
    message_count = 11.0
    messages = ['316747', '319807', '330925', '342487', '342512', '344561', '345874', '345878', '345879', '345938', '345960']
    nosy_count = 7.0
    nosy_names = ['barry', 'vstinner', 'ned.deily', 'r.david.murray', 'maxking', 'rad164', 'xtreak']
    pr_nums = ['7763', '12020', '13321', '14162']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'security'
    url = 'https://bugs.python.org/issue33529'
    versions = ['Python 3.6', 'Python 3.7', 'Python 3.8']

    @rad164
    Mannequin Author

    rad164 mannequin commented May 16, 2018

    I just reported a bug about email folding at bpo-33524, but this issue is more serious for languages such as Chinese or Japanese, which do not put spaces between words.
    Python 3.6.5 has this issue, while 3.6.4 does not.

    Create an email with a header longer than the max_line_length set by its policy, where the header contains non-ASCII characters but no whitespace.
    When I try to fold it, Python gets stuck and the system eventually hangs. There is no output until I stop it with Ctrl-C.

    ^CTraceback (most recent call last):
      File "emailtest.py", line 7, in <module>
        policy.fold("Subject", msg["Subject"])
      File "/usr/lib/python3.6/email/policy.py", line 183, in fold
        return self._fold(name, value, refold_binary=True)
      File "/usr/lib/python3.6/email/policy.py", line 205, in _fold
        return value.fold(policy=self)
      File "/usr/lib/python3.6/email/headerregistry.py", line 258, in fold
        return header.fold(policy=policy)
      File "/usr/lib/python3.6/email/_header_value_parser.py", line 144, in fold
        return _refold_parse_tree(self, policy=policy)
      File "/usr/lib/python3.6/email/_header_value_parser.py", line 2651, in _refold_parse_tree
        part.ew_combine_allowed, charset)
      File "/usr/lib/python3.6/email/_header_value_parser.py", line 2735, in _fold_as_ew
        ew = _ew.encode(first_part)
      File "/usr/lib/python3.6/email/_encoded_words.py", line 215, in encode
        blen = _cte_encode_length['b'](bstring)
      File "/usr/lib/python3.6/email/_encoded_words.py", line 130, in len_b
        groups_of_3, leftover = divmod(len(bstring), 3)
    KeyboardInterrupt

    Code to reproduce:

    from email.message import EmailMessage
    from email.policy import default
    
    policy = default # max_line_length = 78
    msg = EmailMessage()
    msg["Subject"] = "á"*100
    policy.fold("Subject", msg["Subject"])

    There is no problem in the following cases:

    1. The header is shorter than max_line_length.
    2. The header can be split on spaces and every chunk is shorter than max_line_length.
    3. The header consists entirely of ASCII characters. In this case there is no problem even if it is very long and contains no spaces.
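The three cases above can be exercised with a short script. This is a minimal sketch, not part of the original report: the helper `fold_subject` and the sample values are hypothetical, and the exact folded output depends on the Python version, so only the number of lines is printed.

```python
from email.message import EmailMessage
from email.policy import default


def fold_subject(text: str) -> str:
    """Fold a Subject header with the default policy (max_line_length = 78)."""
    msg = EmailMessage()
    msg["Subject"] = text
    return default.fold("Subject", msg["Subject"])


# Hypothetical sample values, one per case in the list above.
samples = {
    "short non-ASCII": "á" * 10,
    "non-ASCII with spaces": " ".join(["á" * 10] * 10),
    "long ASCII, no spaces": "a" * 200,
}

for label, text in samples.items():
    folded = fold_subject(text)
    print(f"{label}: {len(folded.splitlines())} line(s)")
```

None of these inputs combines non-ASCII characters with a single over-long word, which is the combination that triggers the hang in the reproducer.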

    @rad164 rad164 mannequin added type-bug An unexpected behavior, bug, or error topic-email labels May 16, 2018
    @tirkarthi
    Member

    I tried the test case on the master branch. I ran it on a Linux DigitalOcean droplet with 1 GB of RAM so that the runaway script would get killed. The results are below:

    # Python build

    ➜ cpython git:(master) ✗ ./python
    Python 3.8.0a0 (heads/bpo33095-add-reference:9d49f85, Jun 17 2018, 07:22:33)
    [GCC 5.4.0 20160609] on linux
    Type "help", "copyright", "credits" or "license" for more information.

    >>>

    # Test case

    ➜ cpython git:(master) ✗ cat foo.py

    from email.message import EmailMessage
    from email.policy import default
    
    policy = default # max_line_length = 78
    msg = EmailMessage()
    msg["Subject"] = "á"*100
    policy.fold("Subject", msg["Subject"])

    # Test case execution

    ➜ cpython git:(master) ✗ time ./python foo.py
    [2] 13637 killed ./python foo.py
    ./python foo.py 387.36s user 3.85s system 90% cpu 7:11.94 total

    # I pressed Ctrl+C after about 2 minutes to stop it; the stack trace is below:

    ➜  cpython git:(master) ✗ time ./python foo.py
    ^CTraceback (most recent call last):
      File "foo.py", line 7, in <module>
        policy.fold("Subject", msg["Subject"])
      File "/root/cpython/Lib/email/policy.py", line 183, in fold
        return self._fold(name, value, refold_binary=True)
      File "/root/cpython/Lib/email/policy.py", line 205, in _fold
        return value.fold(policy=self)
      File "/root/cpython/Lib/email/headerregistry.py", line 258, in fold
        return header.fold(policy=policy)
      File "/root/cpython/Lib/email/_header_value_parser.py", line 144, in fold
        return _refold_parse_tree(self, policy=policy)
      File "/root/cpython/Lib/email/_header_value_parser.py", line 2650, in _refold_parse_tree
        part.ew_combine_allowed, charset)
      File "/root/cpython/Lib/email/_header_value_parser.py", line 2728, in _fold_as_ew
        ew = _ew.encode(first_part, charset=encode_as)
      File "/root/cpython/Lib/email/_encoded_words.py", line 226, in encode
        qlen = _cte_encode_length['q'](bstring)
      File "/root/cpython/Lib/email/_encoded_words.py", line 93, in len_q
        return sum(len(_q_byte_map[x]) for x in bstring)
      File "/root/cpython/Lib/email/_encoded_words.py", line 93, in <genexpr>
        return sum(len(_q_byte_map[x]) for x in bstring)
    KeyboardInterrupt
    ./python foo.py  131.41s user 0.43s system 98% cpu 2:13.89 total

    Thanks
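Because an affected interpreter keeps running until it is interrupted or killed, it can be more convenient to run the reproducer in a child process with a timeout. The following is a minimal sketch under that assumption; the function name `fold_hangs` and the 10-second default are hypothetical choices, not something used in this thread.

```python
import subprocess
import sys

# The reproducer from this thread, run in a separate interpreter process.
# "\\u00e1" reaches the child as the escape "\u00e1" (the character á).
REPRODUCER = """
from email.message import EmailMessage
from email.policy import default

msg = EmailMessage()
msg["Subject"] = "\\u00e1" * 100
default.fold("Subject", msg["Subject"])
"""


def fold_hangs(timeout_seconds: float = 10.0) -> bool:
    """Return True if the reproducer does not finish within the timeout."""
    try:
        subprocess.run(
            [sys.executable, "-c", REPRODUCER],
            timeout=timeout_seconds,
            check=True,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run() kills the child before raising TimeoutExpired.
        return True
    return False


if __name__ == "__main__":
    print("hangs" if fold_hangs() else "completes")
```

Running the check in a child process keeps the calling interpreter responsive and bounds how long the runaway process can consume CPU and memory.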

    @vstinner
    Member

    vstinner commented Dec 3, 2018

    Since it's a denial of service that can be triggered by a user, I am marking this issue as a security issue.

    I may be wrong, but it seems that Python 2.7 isn't affected: Lib/email/_header_value_parser.py was added by bpo-12586 (commit 0b6f6c8). Python 2.7 has neither this file nor policies.

    @vstinner vstinner added 3.7 (EOL) end of life 3.8 only security fixes labels Dec 3, 2018
    @vstinner vstinner changed the title Infinite loop on folding email if headers has no spaces [security] Infinite loop on folding email (_fold_as_ew()) if an header has no spaces Dec 3, 2018
    @vstinner vstinner added type-security A security issue and removed type-bug An unexpected behavior, bug, or error labels Dec 3, 2018
    @vstinner
    Member

    New changeset c1f5667 by Victor Stinner (Krzysztof Wojcik) in branch 'master':
    bpo-33529, email: Fix infinite loop in email header encoding (GH-12020)
    c1f5667
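The actual fix is in the changeset linked above. As a rough illustration of the general idea only, and not the CPython code, a folding loop is guaranteed to terminate if every pass consumes at least one character of the text still to be encoded. The sketch below is a toy model: `encoded_len` and `split_for_folding` are hypothetical names, and the length estimate assumes UTF-8 "q" encoding with three output characters per byte.

```python
CHROME = len("=?utf-8?q?") + len("?=")  # 12 characters of RFC 2047 framing


def encoded_len(chunk: str) -> int:
    """Worst-case length of chunk as a UTF-8 'q' encoded word (3 chars per byte)."""
    return CHROME + 3 * len(chunk.encode("utf-8"))


def split_for_folding(text: str, maxlen: int) -> list:
    """Split text into chunks whose encoded form fits in maxlen columns.

    Each pass keeps at least one character, so the loop always makes
    progress, even when a single encoded character is wider than maxlen.
    """
    chunks = []
    while text:
        # Start from at most maxlen characters (at least one), then shrink.
        chunk = text[:max(maxlen, 1)]
        while len(chunk) > 1 and encoded_len(chunk) > maxlen:
            chunk = chunk[:-1]
        chunks.append(chunk)
        text = text[len(chunk):]
    return chunks


chunks = split_for_folding("á" * 100, maxlen=78)
print(len(chunks), [len(c) for c in chunks])
```

The design choice worth noting is the `len(chunk) > 1` guard: trimming a chunk down to an empty string would leave the remaining text unchanged, and the outer loop would then never finish.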

    @vstinner
    Member

    New changeset 2fef5b0 by Victor Stinner (Miss Islington (bot)) in branch '3.7':
    bpo-33529, email: Fix infinite loop in email header encoding (GH-12020) (GH-13321)
    2fef5b0

    @vstinner
    Member

    vstinner commented Jun 4, 2019

    Python 3.6, 3.5 and 2.7 are still vulnerable. Is anyone interested in backporting the fix?

    @vstinner
    Member

    It's unclear to me if Python 3.5 is affected or not.

    The fix changes the function _fold_as_ew(). Python 3.5 doesn't have this function, *but* there is a call to a _fold_as_ew() method!?

    Lib/email/_header_value_parser.py:427: in _fold() method

            ...
            if is_ew or last_ew:
                # It's too big to fit on the line, but since we've
                # got encoded words we can use encoded word folding.
                part._fold_as_ew(folded)
                continue
            ...
    

    If I backport the 2 tests, they fail *but* they don't hang forever (they complete in less than 1 second).

    ======================================================================
    FAIL: test_fold_overlong_words_using_RFC2047 (test.test_email.test_headerregistry.TestFolding)
    ----------------------------------------------------------------------

    Traceback (most recent call last):
      File "/home/vstinner/prog/python/3.5/Lib/test/test_email/test_headerregistry.py", line 1601, in test_fold_overlong_words_using_RFC2047
        'X-Report-Abuse: =?utf-8?q?=3Chttps=3A//www=2Emailitapp=2E'
    AssertionError: 'X-Report-Abuse: <https://www.mailitapp.com/report_abuse.p[50 chars]x>\n' != 'X-Report-Abuse: =?utf-8?q?=3Chttps=3A//www=2Emailitapp=2E[114 chars]?=\n'
    - X-Report-Abuse: <https://www.mailitapp.com/report_abuse.php?mid=xxx-xxx-xxxxxxxxxxxxxxxxxxxxxxxx==-xxx-xx-xx>
    + X-Report-Abuse: =?utf-8?q?=3Chttps=3A//www=2Emailitapp=2Ecom/report=5Fabuse?=
    +  =?utf-8?q?=2Ephp=3Fmid=3Dxxx-xxx-xxxxxxxxxxxxxxxxxxxxxxxx=3D=3D-xxx-xx-xx?=
    +  =?utf-8?q?=3E?=

    ======================================================================
    FAIL: test_non_ascii_chars_do_not_cause_inf_loop (test.test_email.test_policy.PolicyAPITests)
    ----------------------------------------------------------------------

    Traceback (most recent call last):
      File "/home/vstinner/prog/python/3.5/Lib/test/test_email/test_policy.py", line 241, in test_non_ascii_chars_do_not_cause_inf_loop
        12 * ' =?utf-8?q?=C4=85?=\n')
    AssertionError: 'Subject: =?utf-8?b?xIXEhcSFxIXEhcSFxIXEhcSFxIXEhcSF?=\n' != 'Subject: \n =?utf-8?q?=C4=85?=\n =?utf-8?q?=C4=85?[209 chars]?=\n'
    - Subject: =?utf-8?b?xIXEhcSFxIXEhcSFxIXEhcSFxIXEhcSF?=
    + Subject: 
    +  =?utf-8?q?=C4=85?=
    +  =?utf-8?q?=C4=85?=
    +  =?utf-8?q?=C4=85?=
    +  =?utf-8?q?=C4=85?=
    +  =?utf-8?q?=C4=85?=
    +  =?utf-8?q?=C4=85?=
    +  =?utf-8?q?=C4=85?=
    +  =?utf-8?q?=C4=85?=
    +  =?utf-8?q?=C4=85?=
    +  =?utf-8?q?=C4=85?=
    +  =?utf-8?q?=C4=85?=
    +  =?utf-8?q?=C4=85?=

    @vstinner
    Member

    Python 3.5 is not vulnerable, it doesn't hang on the following code:

    import email.policy
    policy = email.policy.default.clone(max_line_length=20)
    actual = policy.fold('Subject', '\u0105' * 12)
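The same call can also be used to inspect the folded result on an interpreter where it returns. The sketch below only adds a per-line length report to the snippet above; the `MAX_LEN` constant and the "ok"/"too long" labels are illustrative, and the exact folded lines may differ between Python versions.

```python
import email.policy

MAX_LEN = 20
policy = email.policy.default.clone(max_line_length=MAX_LEN)
folded = policy.fold("Subject", "\u0105" * 12)

# Report the width of every folded line against the configured limit.
for line in folded.splitlines():
    marker = "ok" if len(line) <= MAX_LEN else "too long"
    print(f"{len(line):3d} {marker} {line!r}")
```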

    @vstinner
    Member

    Python 2.7 doesn't have email.policy module.

    For Python 2.7, I wrote this code:
    ---

    import email.header
    import email.message
    
    msg = email.message.Message()
    msg.set_charset("UTF-8")
    msg['Subject'] = email.header.Header(u'\u0105' * 12, maxlinelen=20, charset="UTF-8")
    print(msg.as_string())

    I get this output:
    ---
    MIME-Version: 1.0
    Content-Type: text/plain; charset="utf-8"
    Content-Transfer-Encoding: base64
    Subject: =?utf-8?b?xIU=?=
    =?utf-8?b?xIU=?=
    =?utf-8?b?xIU=?=
    =?utf-8?b?xIU=?=
    =?utf-8?b?xIU=?=
    =?utf-8?b?xIU=?=
    =?utf-8?b?xIU=?=
    =?utf-8?b?xIU=?=
    =?utf-8?b?xIU=?=
    =?utf-8?b?xIU=?=
    =?utf-8?b?xIU=?=
    =?utf-8?b?xIU=?=

    ---

    I have no idea if this example says that Python 2.7 is vulnerable or not. I get a different output on the master branch:
    ---
    MIME-Version: 1.0
    Content-Type: text/plain; charset="utf-8"
    Content-Transfer-Encoding: base64
    Subject: =?utf-8?b?xIXEhcSFxIXEhcSFxIXEhcSFxIXEhcSF?=

    ---

    But I don't know whether I am using the email API properly. "Subject: =?utf-8?b?xIXEhcSFxIXEhcSFxIXEhcSFxIXEhcSF?=" is longer than 20 characters.
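One way to look at the legacy folding on its own is to call `Header.encode()` directly instead of going through `Message.as_string()`. This is a minimal sketch for Python 3. A plausible reason for the single long line, stated here as an assumption and not confirmed in this thread, is that `as_string()` folds with the message policy's line length (78 by default) and not with the `maxlinelen` given to the `Header`.

```python
import email.header

header = email.header.Header("\u0105" * 12, charset="utf-8", maxlinelen=20)

# encode() without arguments uses the maxlinelen given to the constructor.
encoded = header.encode()
print(encoded)

# Passing maxlinelen explicitly overrides the constructor value for this call.
print(header.encode(maxlinelen=78))
```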

    @ned-deily
    Member

    New changeset 516a6a2 by Ned Deily (Victor Stinner) in branch '3.6':
    bpo-33529, email: Fix infinite loop in email header encoding (GH-12020) (GH-14162)
    516a6a2

    @vstinner
    Member

    Using git bisect, I found which commit introduced the regression, bpo-27240:

    commit a87ba60
    Author: Miss Islington (bot) <31488909+miss-islington@users.noreply.github.com>
    Date: Sun Dec 3 16:46:23 2017 -0800

    bpo-27240 Rewrite the email header folding algorithm. (GH-3488) (bpo-4693)
    
    The original algorithm tried to delegate the folding to the tokens so
    that those tokens whose folding rules differed could specify the
    differences.  However, this resulted in a lot of duplicated code because
    most of the rules were the same.
    
    The new algorithm moves all folding logic into a set of functions
    external to the token classes, but puts the information about which
    tokens can be folded in which ways on the tokens...with the exception of
    mime-parameters, which are a special case (which was not even
    implemented in the old folder).
    
    This algorithm can still probably be improved and hopefully simplified
    somewhat.
    
    Note that some of the test expectations are changed.  I believe the
    changes are toward more desirable and consistent behavior: in general
    when (re) folding a line the canonical version of the tokens is
    generated, rather than preserving errors or extra whitespace.
    (cherry picked from commit 85d5c18c9d83a1d54eecc4c2ad4dce63194107c6)
    

    The first vulnerable release is Python 3.6.4: Python 3.6.3 and older are not affected by this vulnerability. So yes, I confirm that Python 2.7 and 3.5 are not vulnerable. By the way, a backport to 3.5 was requested but rejected :-)
    https://bugs.python.org/issue27240#msg330030

    I am closing the issue. Thanks Rad164 for the report and thanks Krzysztof Wojcik for the fix!

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022