classification
Title: Serialization of email message without header line length limit and a non-ASCII subject fails with TypeError
Type: behavior Stage: resolved
Components: email Versions: Python 3.8, Python 3.7, Python 3.6
process
Status: closed Resolution: duplicate
Dependencies: Superseder: non-ascii characters in headers causes TypeError on email.policy.Policy.fold (issue 33524)
Assigned To: Nosy List: altvod, barry, r.david.murray, xtreak
Priority: normal Keywords:

Created on 2018-07-25 12:19 by altvod, last changed 2018-07-25 14:51 by r.david.murray. This issue is now closed.

Messages (3)
msg322348 - Author: Grigory Statsenko (altvod) Date: 2018-07-25 12:19
The following code creates a simple email message with (a) a pure-ASCII subject and (b) a non-ASCII subject
(I made it into a unittest):


import email.generator
import email.policy
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText
from unittest import TestCase


def create_message(subject, sender, recipients, body):
    msg = MIMEMultipart()
    msg.set_charset('utf-8')
    msg.policy = email.policy.SMTP
    msg.attach(MIMEText(body, 'html'))
    msg['Subject'] = subject
    msg['From'] = sender
    msg['To'] = ';'.join(recipients)
    return msg

class TestEmailMessage(TestCase):
    def _make_message(self, subject):
        return create_message(
            subject=subject, sender='me@site.com',
            recipients=['me@site.com'], body='Some text',
        )

    def test_ascii_message_no_len_limit(self):
        # very long subject consisting of a single word
        subject = 'Q' * 100
        msg = self._make_message(subject)
        self.assertTrue(str(msg))

    def test_non_ascii_message_no_len_limit(self):
        # very long subject consisting of a single word
        subject = 'Ц' * 100
        msg = self._make_message(subject)
        self.assertTrue(str(msg))


The ASCII one passes, while the non-ASCII version fails with the following exception:

Traceback (most recent call last):
  File "/home/grigory/.pyenv/versions/3.6.4/lib/python3.6/unittest/case.py", line 59, in testPartExecutor
    yield
  File "/home/grigory/.pyenv/versions/3.6.4/lib/python3.6/unittest/case.py", line 605, in run
    testMethod()
  File "/home/grigory/PycharmProjects/smtptest/test_message.py", line 36, in test_non_ascii_message_no_len_limit
    self.assertTrue(str(msg))
  File "/home/grigory/.pyenv/versions/3.6.4/lib/python3.6/email/message.py", line 135, in __str__
    return self.as_string()
  File "/home/grigory/.pyenv/versions/3.6.4/lib/python3.6/email/message.py", line 158, in as_string
    g.flatten(self, unixfrom=unixfrom)
  File "/home/grigory/.pyenv/versions/3.6.4/lib/python3.6/email/generator.py", line 116, in flatten
    self._write(msg)
  File "/home/grigory/.pyenv/versions/3.6.4/lib/python3.6/email/generator.py", line 195, in _write
    self._write_headers(msg)
  File "/home/grigory/.pyenv/versions/3.6.4/lib/python3.6/email/generator.py", line 222, in _write_headers
    self.write(self.policy.fold(h, v))
  File "/home/grigory/.pyenv/versions/3.6.4/lib/python3.6/email/policy.py", line 183, in fold
    return self._fold(name, value, refold_binary=True)
  File "/home/grigory/.pyenv/versions/3.6.4/lib/python3.6/email/policy.py", line 205, in _fold
    return value.fold(policy=self)
  File "/home/grigory/.pyenv/versions/3.6.4/lib/python3.6/email/headerregistry.py", line 258, in fold
    return header.fold(policy=policy)
  File "/home/grigory/.pyenv/versions/3.6.4/lib/python3.6/email/_header_value_parser.py", line 144, in fold
    return _refold_parse_tree(self, policy=policy)
  File "/home/grigory/.pyenv/versions/3.6.4/lib/python3.6/email/_header_value_parser.py", line 2645, in _refold_parse_tree
    part.ew_combine_allowed, charset)
  File "/home/grigory/.pyenv/versions/3.6.4/lib/python3.6/email/_header_value_parser.py", line 2722, in _fold_as_ew
    first_part = to_encode[:text_space]
TypeError: slice indices must be integers or None or have an __index__ method


The problem is that _fold_as_ew treats maxlen as an integer, but it can also have inf and None as valid values. In my case it's inf, but None can also get there if the HTTP email policy is used and its max_line_length value is not overridden when serializing.
I suppose the correct behavior in both of these cases is no wrapping at all. Alternatively (or additionally), one of these values (inf or None) could be converted to the other at some point, so that only one special case has to be handled in the low-level code.
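
To illustrate the suggestion above, here is a minimal sketch of normalizing the limit before slicing. The helper name `normalize_maxlen` is hypothetical and is not part of the email package; it only shows how 0, None, and inf could be collapsed into a single "no limit" case.

```python
import email.policy
import math


def normalize_maxlen(max_line_length):
    """Hypothetical helper (not in the stdlib): return an int limit,
    or None when no wrapping should happen."""
    # 0 and None both mean "unlimited"; so does float('inf').
    if not max_line_length or math.isinf(max_line_length):
        return None
    return int(max_line_length)


subject = 'Ц' * 100

# Slicing with a float bound reproduces the TypeError from the traceback.
try:
    subject[:float('inf')]
except TypeError as exc:
    error = str(exc)

# A None slice bound means "take everything", so the unlimited case
# needs no further special handling once the value is normalized.
unlimited = subject[:normalize_maxlen(float('inf'))]
http_limit = normalize_maxlen(email.policy.HTTP.max_line_length)
smtp_limit = normalize_maxlen(email.policy.SMTP.max_line_length)
```

Mapping every "unlimited" value to None is only one possible choice; it is convenient here because None is already a valid slice bound.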

This behavior is new in Python 3.6. It works in 3.5.
It also fails in 3.7 and 3.8.
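
As a possible workaround on affected versions, one could serialize with an explicit finite header length so that the folding code receives an integer. This is an assumption based on the analysis above, not a confirmed fix; `maxheaderlen` is the existing parameter of `Message.as_string`, and 78 is simply the default `max_line_length` of the SMTP policy.

```python
import email.policy
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

# Same shape of message as in the report, reduced to the essentials.
msg = MIMEMultipart()
msg.policy = email.policy.SMTP
msg.attach(MIMEText('Some text', 'html'))
msg['Subject'] = 'Ц' * 100
msg['From'] = 'me@site.com'
msg['To'] = 'me@site.com'

# str(msg) calls as_string() with maxheaderlen=0, which means "no limit".
# Passing a finite integer keeps the limit an int during header folding.
serialized = msg.as_string(maxheaderlen=78)
```

The trade-off is that long headers are then folded, which may not be wanted when the caller explicitly asked for unwrapped output.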
msg322356 - Author: Karthikeyan Singaravelan (xtreak) * (Python triager) Date: 2018-07-25 14:22
I listed all the commits made to Lib/email between the 3.5 branch and the tip of the 3.6 branch with `git log --oneline --format="%h" upstream/3.5..upstream/3.6 Lib/email > commits.txt`

The test fails with a87ba60 and passes with d94ef8f. The regression is probably related to a87ba60fe56ae2ebe80ab9ada6d280a6a1f3d552, which rewrote the email header folding algorithm, as I can see from the issue https://bugs.python.org/issue27240

cpython git:(master) ✗ ./python
Python 3.8.0a0 (heads/bpo34193-dirty:bfdde5a, Jul 25 2018, 07:51:50)
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>>

# commit a87ba60

cpython git:(master) $ git checkout a87ba60 Lib/email && ./python -m unittest bpo34220.py && git reset --quiet HEAD . && git checkout .
.E
======================================================================
ERROR: test_non_ascii_message_no_len_limit (bpo34220.TestEmailMessage)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/cpython/bpo34220.py", line 35, in test_non_ascii_message_no_len_limit
    self.assertTrue(str(msg))
  File "/home/cpython/Lib/email/message.py", line 135, in __str__
    return self.as_string()
  File "/home/cpython/Lib/email/message.py", line 158, in as_string
    g.flatten(self, unixfrom=unixfrom)
  File "/home/cpython/Lib/email/generator.py", line 116, in flatten
    self._write(msg)
  File "/home/cpython/Lib/email/generator.py", line 195, in _write
    self._write_headers(msg)
  File "/home/cpython/Lib/email/generator.py", line 222, in _write_headers
    self.write(self.policy.fold(h, v))
  File "/home/cpython/Lib/email/policy.py", line 183, in fold
    return self._fold(name, value, refold_binary=True)
  File "/home/cpython/Lib/email/policy.py", line 205, in _fold
    return value.fold(policy=self)
  File "/home/cpython/Lib/email/headerregistry.py", line 258, in fold
    return header.fold(policy=policy)
  File "/home/cpython/Lib/email/_header_value_parser.py", line 144, in fold
    return _refold_parse_tree(self, policy=policy)
  File "/home/cpython/Lib/email/_header_value_parser.py", line 2645, in _refold_parse_tree
    part.ew_combine_allowed, charset)
  File "/home/cpython/Lib/email/_header_value_parser.py", line 2722, in _fold_as_ew
    first_part = to_encode[:text_space]
TypeError: slice indices must be integers or None or have an __index__ method

----------------------------------------------------------------------
Ran 2 tests in 0.022s

FAILED (errors=1)

# commit d94ef8f
  
cpython git:(master) $ git checkout d94ef8f Lib/email && ./python -m unittest bpo34220.py && git reset --quiet HEAD . && git checkout .
..
----------------------------------------------------------------------
Ran 2 tests in 0.017s

OK


I hope the above approach is correct and that there are no C code changes involved that would require recompiling Python.

Thanks
msg322360 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2018-07-25 14:51
Thanks for the report.  This is a duplicate of #33524.
History
Date                 User            Action  Args
2018-07-25 14:51:04  r.david.murray  set     status: open -> closed; superseder: non-ascii characters in headers causes TypeError on email.policy.Policy.fold; messages: + msg322360; resolution: duplicate; stage: resolved
2018-07-25 14:22:17  xtreak          set     messages: + msg322356
2018-07-25 12:56:19  xtreak          set     nosy: + xtreak
2018-07-25 12:19:20  altvod          create