classification
Title: [security] Infinite loop on folding email (_fold_as_ew()) if an header has no spaces
Type: security Stage: resolved
Components: email Versions: Python 3.8, Python 3.7, Python 3.6
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: barry, maxking, ned.deily, r.david.murray, rad164, vstinner, xtreak
Priority: normal Keywords: patch

Created on 2018-05-16 00:12 by rad164, last changed 2019-06-18 08:31 by vstinner. This issue is now closed.

Pull Requests
URL Status Linked Edit
PR 7763 closed corona10, 2018-11-07 01:33
PR 12020 merged python-dev, 2019-02-24 17:49
PR 13321 merged miss-islington, 2019-05-14 16:55
PR 14162 merged vstinner, 2019-06-17 16:15
Messages (11)
msg316747 - (view) Author: Rad164 (rad164) Date: 2018-05-16 00:12
I just reported a bug about email folding in issue 33524, but this one is more severe for languages such as Chinese or Japanese, which do not put spaces between words.
Python 3.6.5 has this issue, while 3.6.4 does not.

Create an email whose header is longer than the max_line_length set by its policy and contains non-ASCII characters but no whitespace.
When Python tries to fold it, it gets stuck and the system eventually hangs. There is no output until I stop it with Ctrl-C.

^CTraceback (most recent call last):
  File "emailtest.py", line 7, in <module>
    policy.fold("Subject", msg["Subject"])
  File "/usr/lib/python3.6/email/policy.py", line 183, in fold
    return self._fold(name, value, refold_binary=True)
  File "/usr/lib/python3.6/email/policy.py", line 205, in _fold
    return value.fold(policy=self)
  File "/usr/lib/python3.6/email/headerregistry.py", line 258, in fold
    return header.fold(policy=policy)
  File "/usr/lib/python3.6/email/_header_value_parser.py", line 144, in fold
    return _refold_parse_tree(self, policy=policy)
  File "/usr/lib/python3.6/email/_header_value_parser.py", line 2651, in _refold_parse_tree
    part.ew_combine_allowed, charset)
  File "/usr/lib/python3.6/email/_header_value_parser.py", line 2735, in _fold_as_ew
    ew = _ew.encode(first_part)
  File "/usr/lib/python3.6/email/_encoded_words.py", line 215, in encode
    blen = _cte_encode_length['b'](bstring)
  File "/usr/lib/python3.6/email/_encoded_words.py", line 130, in len_b
    groups_of_3, leftover = divmod(len(bstring), 3)
KeyboardInterrupt


Code to reproduce:

from email.message import EmailMessage
from email.policy import default

policy = default # max_line_length = 78
msg = EmailMessage()
msg["Subject"] = "á"*100
policy.fold("Subject", msg["Subject"])


There is no problem in the following cases:

1. The header is shorter than max_line_length.
2. The header can be split at spaces and every chunk is shorter than max_line_length.
3. The header consists entirely of ASCII characters. In this case there is no problem even if it is very long and has no spaces.
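
The cases above can be probed without locking up a terminal by running each fold in a child process with a timeout. This is only a sketch: the helper name fold_hangs, the 10-second timeout and the sample subjects are assumptions made for illustration, not part of the original report.

```python
import subprocess
import sys

# Script run in the child interpreter: rebuild the subject from its
# ASCII-safe repr and fold it with the default policy (max_line_length = 78).
CHILD = """
import ast, sys
from email.message import EmailMessage
from email.policy import default
msg = EmailMessage()
msg["Subject"] = ast.literal_eval(sys.argv[1])
default.fold("Subject", msg["Subject"])
"""


def fold_hangs(subject, timeout=10.0):
    """Return True if folding `subject` does not finish within `timeout` seconds."""
    try:
        subprocess.run(
            [sys.executable, "-c", CHILD, ascii(subject)],
            check=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run() kills the child before raising this exception.
        return True
    return False


# Hypothetical sample subjects, one per case described in the report.
cases = {
    "short non-ASCII": "\u00e1" * 10,
    "long non-ASCII with spaces": " ".join(["\u00e1" * 10] * 10),
    "long ASCII without spaces": "a" * 100,
    "long non-ASCII without spaces (reported case)": "\u00e1" * 100,
}
for label, subject in cases.items():
    print(label, "->", "hangs" if fold_hangs(subject) else "ok")
```

On an interpreter that has the bug, only the last case should report "hangs"; on an interpreter that does not, all four should report "ok".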
msg319807 - (view) Author: Karthikeyan Singaravelan (xtreak) * (Python triager) Date: 2018-06-17 07:47
I tried the test case on the master branch. I ran it on a Linux-based DigitalOcean droplet with 1 GB of RAM, where the script ended up being killed. The results are below:

# Python build

➜  cpython git:(master) ✗ ./python
Python 3.8.0a0 (heads/bpo33095-add-reference:9d49f85, Jun 17 2018, 07:22:33)
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>>

# Test case 

➜  cpython git:(master) ✗ cat foo.py

from email.message import EmailMessage
from email.policy import default

policy = default # max_line_length = 78
msg = EmailMessage()
msg["Subject"] = "á"*100
policy.fold("Subject", msg["Subject"])

# Test case execution

➜  cpython git:(master) ✗ time ./python foo.py
[2]    13637 killed     ./python foo.py
./python foo.py  387.36s user 3.85s system 90% cpu 7:11.94 total

# I pressed Ctrl+C after about 2 minutes to stop it; the stack trace is below:

➜  cpython git:(master) ✗ time ./python foo.py
^CTraceback (most recent call last):
  File "foo.py", line 7, in <module>
    policy.fold("Subject", msg["Subject"])
  File "/root/cpython/Lib/email/policy.py", line 183, in fold
    return self._fold(name, value, refold_binary=True)
  File "/root/cpython/Lib/email/policy.py", line 205, in _fold
    return value.fold(policy=self)
  File "/root/cpython/Lib/email/headerregistry.py", line 258, in fold
    return header.fold(policy=policy)
  File "/root/cpython/Lib/email/_header_value_parser.py", line 144, in fold
    return _refold_parse_tree(self, policy=policy)
  File "/root/cpython/Lib/email/_header_value_parser.py", line 2650, in _refold_parse_tree
    part.ew_combine_allowed, charset)
  File "/root/cpython/Lib/email/_header_value_parser.py", line 2728, in _fold_as_ew
    ew = _ew.encode(first_part, charset=encode_as)
  File "/root/cpython/Lib/email/_encoded_words.py", line 226, in encode
    qlen = _cte_encode_length['q'](bstring)
  File "/root/cpython/Lib/email/_encoded_words.py", line 93, in len_q
    return sum(len(_q_byte_map[x]) for x in bstring)
  File "/root/cpython/Lib/email/_encoded_words.py", line 93, in <genexpr>
    return sum(len(_q_byte_map[x]) for x in bstring)
KeyboardInterrupt
./python foo.py  131.41s user 0.43s system 98% cpu 2:13.89 total

Thanks
msg330925 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-12-03 10:17
Since it's a denial of service which can be triggered by a user, I'm marking this issue as a security issue.

I may be wrong, but it seems like Python 2.7 isn't affected: Lib/email/_header_value_parser.py was added by bpo-12586 (commit 0b6f6c82b51b7071d88f48abb3192bf3dc2a2d24). Python 2.7 has neither this file nor policies.
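
A quick way to see which of these pieces a given interpreter ships is to try the imports. This is a minimal sketch; the function name email_folding_features and the dictionary keys are made up for illustration. Note that it only reports what exists: the presence of a module-level _fold_as_ew (a private name that may change) says nothing about whether the fix has been applied.

```python
def email_folding_features():
    """Report which parts of the policy-based email API can be imported."""
    features = {
        "email.policy": False,
        "_header_value_parser": False,
        "_fold_as_ew": False,
    }
    try:
        import email.policy  # absent on Python 2.7
        features["email.policy"] = True
    except ImportError:
        pass
    try:
        from email import _header_value_parser as parser
    except ImportError:
        return features
    features["_header_value_parser"] = True
    # Private helper; only checks for a module-level function of that name.
    features["_fold_as_ew"] = hasattr(parser, "_fold_as_ew")
    return features


print(email_folding_features())
```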
msg342487 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-05-14 16:55
New changeset c1f5667be1e3ec5871560c677402c1252c6018a6 by Victor Stinner (Krzysztof Wojcik) in branch 'master':
bpo-33529, email: Fix infinite loop in email header encoding (GH-12020)
https://github.com/python/cpython/commit/c1f5667be1e3ec5871560c677402c1252c6018a6
msg342512 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-05-14 20:12
New changeset 2fef5b01e36a17e36fd7e65c4b51f5ede8880dda by Victor Stinner (Miss Islington (bot)) in branch '3.7':
bpo-33529, email: Fix infinite loop in email header encoding (GH-12020) (GH-13321)
https://github.com/python/cpython/commit/2fef5b01e36a17e36fd7e65c4b51f5ede8880dda
msg344561 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-06-04 12:50
Python 3.6, 3.5 and 2.7 are still vulnerable. Is anyone interested in backporting the fix?
msg345874 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-06-17 16:22
It's unclear to me whether Python 3.5 is affected or not.

The fix changes the function _fold_as_ew(). Python 3.5 doesn't have this function *but* there is a call to a _fold_as_ew() method!?

Lib/email/_header_value_parser.py:427: in _fold() method

            ...
            if is_ew or last_ew:
                # It's too big to fit on the line, but since we've
                # got encoded words we can use encoded word folding.
                part._fold_as_ew(folded)
                continue
            ...

If I backport the 2 tests, they fail *but* they don't hang forever (they complete in less than 1 second).

======================================================================
FAIL: test_fold_overlong_words_using_RFC2047 (test.test_email.test_headerregistry.TestFolding)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/vstinner/prog/python/3.5/Lib/test/test_email/test_headerregistry.py", line 1601, in test_fold_overlong_words_using_RFC2047
    'X-Report-Abuse: =?utf-8?q?=3Chttps=3A//www=2Emailitapp=2E'
AssertionError: 'X-Report-Abuse: <https://www.mailitapp.com/report_abuse.p[50 chars]x>\n' != 'X-Report-Abuse: =?utf-8?q?=3Chttps=3A//www=2Emailitapp=2E[114 chars]?=\n'
- X-Report-Abuse: <https://www.mailitapp.com/report_abuse.php?mid=xxx-xxx-xxxxxxxxxxxxxxxxxxxxxxxx==-xxx-xx-xx>
+ X-Report-Abuse: =?utf-8?q?=3Chttps=3A//www=2Emailitapp=2Ecom/report=5Fabuse?=
+  =?utf-8?q?=2Ephp=3Fmid=3Dxxx-xxx-xxxxxxxxxxxxxxxxxxxxxxxx=3D=3D-xxx-xx-xx?=
+  =?utf-8?q?=3E?=


======================================================================
FAIL: test_non_ascii_chars_do_not_cause_inf_loop (test.test_email.test_policy.PolicyAPITests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/vstinner/prog/python/3.5/Lib/test/test_email/test_policy.py", line 241, in test_non_ascii_chars_do_not_cause_inf_loop
    12 * ' =?utf-8?q?=C4=85?=\n')
AssertionError: 'Subject: =?utf-8?b?xIXEhcSFxIXEhcSFxIXEhcSFxIXEhcSF?=\n' != 'Subject: \n =?utf-8?q?=C4=85?=\n =?utf-8?q?=C4=85?[209 chars]?=\n'
- Subject: =?utf-8?b?xIXEhcSFxIXEhcSFxIXEhcSFxIXEhcSF?=
+ Subject: 
+  =?utf-8?q?=C4=85?=
+  =?utf-8?q?=C4=85?=
+  =?utf-8?q?=C4=85?=
+  =?utf-8?q?=C4=85?=
+  =?utf-8?q?=C4=85?=
+  =?utf-8?q?=C4=85?=
+  =?utf-8?q?=C4=85?=
+  =?utf-8?q?=C4=85?=
+  =?utf-8?q?=C4=85?=
+  =?utf-8?q?=C4=85?=
+  =?utf-8?q?=C4=85?=
+  =?utf-8?q?=C4=85?=
msg345878 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-06-17 16:46
Python 3.5 is not vulnerable: it doesn't hang on the following code:

import email.policy
policy = email.policy.default.clone(max_line_length=20)
actual = policy.fold('Subject', '\u0105' * 12)
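
To see what the fold actually produces, the same call can be followed by a check of each output line against the 20-character limit. This is a sketch only: the variable names are illustrative, and the exact folded output may differ between Python versions, so none is shown here.

```python
import email.policy

# Same setup as above: a 20-character limit and 12 non-ASCII characters.
policy = email.policy.default.clone(max_line_length=20)
folded = policy.fold("Subject", "\u0105" * 12)

# Print each folded line with its length, then list any that exceed the limit.
lines = folded.splitlines()
for line in lines:
    print(len(line), repr(line))

too_long = [line for line in lines if len(line) > policy.max_line_length]
print("lines over the limit:", too_long)
```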
msg345879 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-06-17 16:50
Python 2.7 doesn't have email.policy module.

For Python 2.7, I wrote this code:
---
import email.header
import email.message

msg = email.message.Message()
msg.set_charset("UTF-8")
msg['Subject'] = email.header.Header(u'\u0105' * 12, maxlinelen=20, charset="UTF-8")
print(msg.as_string())
---

I get this output:
---
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: base64
Subject: =?utf-8?b?xIU=?=
 =?utf-8?b?xIU=?=
 =?utf-8?b?xIU=?=
 =?utf-8?b?xIU=?=
 =?utf-8?b?xIU=?=
 =?utf-8?b?xIU=?=
 =?utf-8?b?xIU=?=
 =?utf-8?b?xIU=?=
 =?utf-8?b?xIU=?=
 =?utf-8?b?xIU=?=
 =?utf-8?b?xIU=?=
 =?utf-8?b?xIU=?=


---

I have no idea whether this example shows that Python 2.7 is vulnerable or not. I get a different output on the master branch:
---
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: base64
Subject: =?utf-8?b?xIXEhcSFxIXEhcSFxIXEhcSFxIXEhcSF?=


---

But I don't know whether I'm using the email API properly. "Subject: =?utf-8?b?xIXEhcSFxIXEhcSFxIXEhcSFxIXEhcSF?=" is longer than 20 characters.
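
The lengths involved can be checked by hand. The sketch below rebuilds both encoded words with the base64 module rather than with the email package, so it only illustrates the arithmetic (variable names are illustrative): 12 characters of two UTF-8 bytes each give 24 bytes, which base64 turns into 32 characters.

```python
import base64

text = "\u0105" * 12  # 12 characters, 2 UTF-8 bytes each = 24 bytes

# One encoded word for the whole subject, as in the master branch output.
payload = base64.b64encode(text.encode("utf-8")).decode("ascii")
line = "Subject: " + "=?utf-8?b?" + payload + "?="
print(line)
print(len(line))  # 9 + 10 + 32 + 2 = 53, well over maxlinelen=20

# One encoded word per character, as in the Python 2.7 output.
single = "=?utf-8?b?" + base64.b64encode("\u0105".encode("utf-8")).decode("ascii") + "?="
print(single)
print(len(" " + single))  # 17 for each continuation line
```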
msg345938 - (view) Author: Ned Deily (ned.deily) * (Python committer) Date: 2019-06-18 00:14
New changeset 516a6a254814d2bc6a90290dfc44d77fdfb4050b by Ned Deily (Victor Stinner) in branch '3.6':
bpo-33529, email: Fix infinite loop in email header encoding (GH-12020) (GH-14162)
https://github.com/python/cpython/commit/516a6a254814d2bc6a90290dfc44d77fdfb4050b
msg345960 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-06-18 08:31
Using git bisect, I found which commit introduced the regression, bpo-27240:

commit a87ba60fe56ae2ebe80ab9ada6d280a6a1f3d552
Author: Miss Islington (bot) <31488909+miss-islington@users.noreply.github.com>
Date:   Sun Dec 3 16:46:23 2017 -0800

    bpo-27240 Rewrite the email header folding algorithm. (GH-3488) (#4693)
    
    The original algorithm tried to delegate the folding to the tokens so
    that those tokens whose folding rules differed could specify the
    differences.  However, this resulted in a lot of duplicated code because
    most of the rules were the same.
    
    The new algorithm moves all folding logic into a set of functions
    external to the token classes, but puts the information about which
    tokens can be folded in which ways on the tokens...with the exception of
    mime-parameters, which are a special case (which was not even
    implemented in the old folder).
    
    This algorithm can still probably be improved and hopefully simplified
    somewhat.
    
    Note that some of the test expectations are changed.  I believe the
    changes are toward more desirable and consistent behavior: in general
    when (re) folding a line the canonical version of the tokens is
    generated, rather than preserving errors or extra whitespace.
    (cherry picked from commit 85d5c18c9d83a1d54eecc4c2ad4dce63194107c6)

The first vulnerable release is Python 3.6.4: Python 3.6.3 and older are not affected by this vulnerability. So yes, I confirm that Python 2.7 and 3.5 are not vulnerable. By the way, a backport to 3.5 was requested but rejected :-)
https://bugs.python.org/issue27240#msg330030

I'm closing the issue. Thanks Rad164 for the report and thanks Krzysztof Wojcik for the fix!
History
Date User Action Args
2019-06-18 08:31:25 vstinner set status: open -> closed
resolution: fixed
messages: + msg345960

stage: patch review -> resolved
2019-06-18 00:14:04 ned.deily set nosy: + ned.deily
messages: + msg345938
2019-06-17 16:56:10 xtreak set nosy: + maxking
2019-06-17 16:50:46 vstinner set messages: + msg345879
2019-06-17 16:46:31 vstinner set messages: + msg345878
2019-06-17 16:22:02 vstinner set messages: + msg345874
2019-06-17 16:15:07 vstinner set pull_requests: + pull_request14004
2019-06-05 16:17:07 cheryl.sabella link issue34222 superseder
2019-06-04 12:50:39 vstinner set messages: + msg344561
2019-05-14 20:12:49 vstinner set messages: + msg342512
2019-05-14 16:55:43 miss-islington set pull_requests: + pull_request13232
2019-05-14 16:55:27 vstinner set messages: + msg342487
2019-02-24 17:49:12 python-dev set pull_requests: + pull_request12052
2018-12-03 10:17:52 vstinner set nosy: + vstinner
title: Infinite loop on folding email if headers has no spaces -> [security] Infinite loop on folding email (_fold_as_ew()) if an header has no spaces
messages: + msg330925

versions: + Python 3.7, Python 3.8
type: behavior -> security
2018-11-07 01:33:13 corona10 set pull_requests: + pull_request9673
2018-06-17 17:10:26 corona10 set pull_requests: - pull_request7371
2018-06-17 11:35:30 corona10 set keywords: + patch
stage: patch review
pull_requests: + pull_request7371
2018-06-17 07:47:24 xtreak set nosy: + xtreak
messages: + msg319807
2018-05-16 00:12:28 rad164 create