classification
Title: Infinite loop on folding email if headers have no spaces
Type: behavior Stage: patch review
Components: email Versions: Python 3.6
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: barry, r.david.murray, rad164, xtreak
Priority: normal Keywords: patch

Created on 2018-05-16 00:12 by rad164, last changed 2018-06-17 17:10 by corona10.

Messages (2)
msg316747 - Author: Rad164 (rad164) Date: 2018-05-16 00:12
I just reported a bug in email folding as issue 33524, but this one is more serious for languages such as Chinese or Japanese, which do not put spaces between words.
Python 3.6.5 has this issue, while 3.6.4 does not.

Create an email with a header longer than the max_line_length set by its policy, where the header contains non-ASCII characters but no whitespace.
When I try to fold it, Python gets stuck and eventually the system hangs. There is no output unless I stop it with Ctrl-C.

^CTraceback (most recent call last):
  File "emailtest.py", line 7, in <module>
    policy.fold("Subject", msg["Subject"])
  File "/usr/lib/python3.6/email/policy.py", line 183, in fold
    return self._fold(name, value, refold_binary=True)
  File "/usr/lib/python3.6/email/policy.py", line 205, in _fold
    return value.fold(policy=self)
  File "/usr/lib/python3.6/email/headerregistry.py", line 258, in fold
    return header.fold(policy=policy)
  File "/usr/lib/python3.6/email/_header_value_parser.py", line 144, in fold
    return _refold_parse_tree(self, policy=policy)
  File "/usr/lib/python3.6/email/_header_value_parser.py", line 2651, in _refold_parse_tree
    part.ew_combine_allowed, charset)
  File "/usr/lib/python3.6/email/_header_value_parser.py", line 2735, in _fold_as_ew
    ew = _ew.encode(first_part)
  File "/usr/lib/python3.6/email/_encoded_words.py", line 215, in encode
    blen = _cte_encode_length['b'](bstring)
  File "/usr/lib/python3.6/email/_encoded_words.py", line 130, in len_b
    groups_of_3, leftover = divmod(len(bstring), 3)
KeyboardInterrupt
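
The bottom frame of the traceback, len_b, shows the base64 length arithmetic that was running when the fold was interrupted. The following is a minimal sketch of that arithmetic, assuming the usual base64 rule of 4 output characters per 3 input bytes. The helper name b64_length and the wrapper_len variable are illustrative and are not taken from the email package. It only shows that the subject from this report is far too long to fit on one 78-character line as a single encoded word, so the folding code has to split it; it does not claim to identify the cause of the loop.

```python
import base64

def b64_length(data: bytes) -> int:
    """Length of the base64 encoding of data (hypothetical helper)."""
    groups_of_3, leftover = divmod(len(data), 3)
    # 4 output characters for every 3 input bytes, padded up for a remainder.
    return groups_of_3 * 4 + (4 if leftover else 0)

# The subject from the report: 100 copies of U+00E1, 2 bytes each in UTF-8.
subject_bytes = ("\u00e1" * 100).encode("utf-8")
payload_len = b64_length(subject_bytes)

# An RFC 2047 encoded word wraps the payload as "=?utf-8?b?" + payload + "?=".
wrapper_len = len("=?utf-8?b?") + len("?=")
total = payload_len + wrapper_len

print(len(subject_bytes), payload_len, total)  # prints: 200 268 280
```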


Code to reproduce:

from email.message import EmailMessage
from email.policy import default

policy = default # max_line_length = 78
msg = EmailMessage()
msg["Subject"] = "á"*100
policy.fold("Subject", msg["Subject"])
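
Because the reproduction above can hang the interpreter and eventually the machine, one way to try it more safely is to run it in a child process with a timeout. This is only a sketch: the function name fold_terminates and the 10 second limit are arbitrary choices of mine, not part of the report.

```python
import subprocess
import sys

# Same reproduction as above; the non-ASCII character is written as an
# escape so the script text passed to the child stays pure ASCII.
REPRO = '''
from email.message import EmailMessage
from email.policy import default

msg = EmailMessage()
msg["Subject"] = "\\u00e1" * 100
default.fold("Subject", msg["Subject"])
'''

def fold_terminates(timeout_seconds=10.0):
    """Return True if the fold finishes in time, False if it times out."""
    try:
        subprocess.run(
            [sys.executable, "-c", REPRO],
            timeout=timeout_seconds,  # the child is killed when this expires
            check=True,               # a crash in the child raises instead
        )
    except subprocess.TimeoutExpired:
        return False
    return True

if __name__ == "__main__":
    print("fold terminated" if fold_terminates() else "fold timed out")
```

On an affected interpreter this should report a timeout instead of hanging the session; on an unaffected one it should return quickly.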


No problems occur in the following cases:

1. The header is shorter than max_line_length.
2. The header can be split at spaces and every chunk is shorter than max_line_length.
3. The header consists entirely of ASCII characters. In this case there is no problem even if it is very long and has no spaces.
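
The three cases listed above can be exercised with a sketch like the following. The helper fold_subject and the specific input strings are my own choices for illustration; the report does not give exact inputs for these cases.

```python
from email.message import EmailMessage
from email.policy import default

def fold_subject(text):
    """Build a message with the given subject and fold that header."""
    msg = EmailMessage()
    msg["Subject"] = text
    return default.fold("Subject", msg["Subject"])

# Case 1: non-ASCII, but shorter than max_line_length (78 by default).
short_non_ascii = fold_subject("\u00e1" * 10)

# Case 2: non-ASCII, long overall, but made of short chunks separated by spaces.
spaced_non_ascii = fold_subject(" ".join(["\u00e1" * 5] * 20))

# Case 3: pure ASCII, very long, no spaces.
long_ascii = fold_subject("a" * 200)

for folded in (short_non_ascii, spaced_non_ascii, long_ascii):
    print(repr(folded))
```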
msg319807 - Author: Karthikeyan Singaravelan (xtreak) * Date: 2018-06-17 07:47
I tried the test case on the master branch. I ran it on a Linux-based DigitalOcean droplet with 1 GB of RAM, where the script was eventually killed. The results are below:

# Python build

➜  cpython git:(master) ✗ ./python
Python 3.8.0a0 (heads/bpo33095-add-reference:9d49f85, Jun 17 2018, 07:22:33)
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>>

# Test case 

➜  cpython git:(master) ✗ cat foo.py

from email.message import EmailMessage
from email.policy import default

policy = default # max_line_length = 78
msg = EmailMessage()
msg["Subject"] = "á"*100
policy.fold("Subject", msg["Subject"])

# Test case execution

➜  cpython git:(master) ✗ time ./python foo.py
[2]    13637 killed     ./python foo.py
./python foo.py  387.36s user 3.85s system 90% cpu 7:11.94 total

# I pressed Ctrl+C after about 2 minutes to stop it; the stack trace is below:

➜  cpython git:(master) ✗ time ./python foo.py
^CTraceback (most recent call last):
  File "foo.py", line 7, in <module>
    policy.fold("Subject", msg["Subject"])
  File "/root/cpython/Lib/email/policy.py", line 183, in fold
    return self._fold(name, value, refold_binary=True)
  File "/root/cpython/Lib/email/policy.py", line 205, in _fold
    return value.fold(policy=self)
  File "/root/cpython/Lib/email/headerregistry.py", line 258, in fold
    return header.fold(policy=policy)
  File "/root/cpython/Lib/email/_header_value_parser.py", line 144, in fold
    return _refold_parse_tree(self, policy=policy)
  File "/root/cpython/Lib/email/_header_value_parser.py", line 2650, in _refold_parse_tree
    part.ew_combine_allowed, charset)
  File "/root/cpython/Lib/email/_header_value_parser.py", line 2728, in _fold_as_ew
    ew = _ew.encode(first_part, charset=encode_as)
  File "/root/cpython/Lib/email/_encoded_words.py", line 226, in encode
    qlen = _cte_encode_length['q'](bstring)
  File "/root/cpython/Lib/email/_encoded_words.py", line 93, in len_q
    return sum(len(_q_byte_map[x]) for x in bstring)
  File "/root/cpython/Lib/email/_encoded_words.py", line 93, in <genexpr>
    return sum(len(_q_byte_map[x]) for x in bstring)
KeyboardInterrupt
./python foo.py  131.41s user 0.43s system 98% cpu 2:13.89 total

Thanks
History
Date                 User      Action  Args
2018-06-17 17:10:26  corona10  set     pull_requests: - pull_request7371
2018-06-17 11:35:30  corona10  set     keywords: + patch
                                       stage: patch review
                                       pull_requests: + pull_request7371
2018-06-17 07:47:24  xtreak    set     nosy: + xtreak
                                       messages: + msg319807
2018-05-16 00:12:28  rad164    create