classification
Title: Email message serialization enters an infinite loop when folding non-ASCII headers with long words
Type: Stage: patch review
Components: email Versions: Python 3.8, Python 3.7, Python 3.6
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: altvod, barry, ikrivosheev, r.david.murray
Priority: normal Keywords: patch

Created on 2018-07-25 12:56 by altvod, last changed 2019-01-30 15:47 by ikrivosheev.

Pull Requests
PR 8990 (status: open), linked by python-dev on 2018-08-29 11:06
Messages (2)
msg322351 - Author: Grigory Statsenko (altvod) Date: 2018-07-25 12:56
(Discovered together with https://bugs.python.org/msg322348)

Email message serialization (in function _fold_as_ew) enters an infinite loop when folding non-ASCII headers whose words (after encoding) are longer than the given maxlen.

Besides being stuck in an infinite loop, it keeps appending to the `lines` list, so its memory usage also grows without bound.
The code keeps appending encoded empty strings to the list, like this:

lines: [
    'Subject: =?utf-8?q??=',
    ' =?utf-8?q??=',
    ' =?utf-8?q??=',
    ' =?utf-8?q??=',
    ' =?utf-8?q??=',
    ' =?utf-8?q??=',
    ' '
]
(and it keeps on growing)

Here is my code that can reproduce this issue (as a unittest):


import email.generator
import email.policy
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText
from unittest import TestCase


def create_message(subject, sender, recipients, body):
    msg = MIMEMultipart()
    msg.set_charset('utf-8')
    msg.policy = email.policy.SMTP
    msg.attach(MIMEText(body, 'html'))
    msg['Subject'] = subject
    msg['From'] = sender
    msg['To'] = ';'.join(recipients)
    return msg


class TestEmailMessage(TestCase):
    def _make_message(self, subject):
        return create_message(
            subject=subject, sender='me@site.com',
            recipients=['me@site.com'], body='Some text',
        )

    def test_ascii_message_with_len_limit(self):
        # very long subject consisting of a single word
        subject = 'Q' * 100
        msg = self._make_message(subject)
        self.assertTrue(msg.as_string(maxheaderlen=76))

    def test_non_ascii_message_with_len_limit(self):
        # very long subject consisting of a single word
        subject = 'Ц' * 100
        msg = self._make_message(subject)
        self.assertTrue(msg.as_string(maxheaderlen=76))


The ASCII test passes, but the non-ASCII one never finishes.
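Because the failing test never returns and its memory use keeps growing, one way to stop such a reproduction from hanging a whole test run is to execute it in a separate interpreter with a deadline. This is a minimal sketch that assumes spawning a child process is acceptable. `finishes_within` and `REPRO` are hypothetical names, and `REPRO` is a trimmed-down version of the non-ASCII test above, with the subject written as the ASCII escape `'\u0426' * 100` (that is, 'Ц' * 100).

```python
import subprocess
import sys
import textwrap


def finishes_within(code: str, seconds: float) -> bool:
    """Run code in a fresh interpreter; return False if it misses the deadline.

    subprocess.run() kills the child when the timeout expires, so an
    ever-growing `lines` list cannot take the parent process down with it.
    check=True turns a crash in the child into CalledProcessError instead of
    silently counting it as a clean finish.
    """
    try:
        subprocess.run([sys.executable, '-c', code], timeout=seconds,
                       check=True, capture_output=True)
    except subprocess.TimeoutExpired:
        return False
    return True


# Hypothetical, trimmed-down version of the non-ASCII test case above.
# The subject is written as an escape so the child's source stays ASCII-only.
REPRO = textwrap.dedent('''
    import email.policy
    from email.mime.multipart import MIMEMultipart
    from email.mime.text import MIMEText

    msg = MIMEMultipart()
    msg.set_charset('utf-8')
    msg.policy = email.policy.SMTP
    msg.attach(MIMEText('Some text', 'html'))
    msg['Subject'] = '\\u0426' * 100
    msg.as_string(maxheaderlen=76)
''')
```

On an interpreter affected by this bug, `finishes_within(REPRO, 10)` would be expected to return False after about ten seconds instead of hanging forever.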

From what I can tell, the problem is in line 2728 of email/_header_value_parser.py:

            first_part = first_part[:-excess]

where `excess` is calculated from the length of the encoded string
(which is several times longer than the original one),
but is then used to truncate the original (non-encoded) string.
The problem arises when `excess` is at least as large as the length of `first_part`:
the slice then yields an empty string, so nothing is consumed from the header.
As a result, every iteration tries to encode exactly the same part of the header, fails in the same way,
and appends an empty string instead, encoded as ' =?utf-8?q??='.
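The arithmetic can be checked outside the email package. This is a minimal sketch built on a simplified model of the encoded-word length. The model approximates the 'q' cost as 1 character for ASCII letters, digits and space and 3 characters for every other byte, and it prefers 'q' unless 'b' is at least 5 characters shorter. `ew_len`, `buggy_first_part` and `shrinking_first_part` are hypothetical names. The shrinking variant re-measures the encoded form after every cut instead of subtracting an encoded-length difference from a character count. It is one possible approach, not necessarily what PR 8990 does.

```python
# Simplified model (an assumption, not the email package's real code) of an
# RFC 2047 encoded word: '=?utf-8?X?' + payload + '?='.
CHROME = len('=?utf-8?b??=')  # 12 characters of framing


def b_len(data: bytes) -> int:
    """Length of the base64 ('b') payload for data."""
    return 4 * ((len(data) + 2) // 3)


def q_len(data: bytes) -> int:
    """Approximate 'q' payload: 1 char for ASCII alnum or space, else '=XX'."""
    return sum(1 if b == 0x20 or (b < 128 and chr(b).isalnum()) else 3
               for b in data)


def ew_len(text: str) -> int:
    """Encoded-word length, using 'q' unless 'b' is 5+ characters shorter."""
    data = text.encode('utf-8')
    q, b = q_len(data), b_len(data)
    return CHROME + (q if q - b < 5 else b)


def buggy_first_part(to_encode: str, remaining: int) -> str:
    """Mirror of the quoted truncation: excess counts encoded characters
    but is subtracted from a count of unencoded characters."""
    text_space = remaining - len('utf-8') - 7
    first_part = to_encode[:text_space]
    excess = ew_len(first_part) - remaining
    if excess > 0:
        first_part = first_part[:-excess]  # '' once excess >= len(first_part)
    return first_part


def shrinking_first_part(to_encode: str, remaining: int) -> str:
    """Drop one character at a time until the *encoded* form fits."""
    first_part = to_encode[:max(remaining - len('utf-8') - 7, 0)]
    while first_part and ew_len(first_part) > remaining:
        first_part = first_part[:-1]
    if not first_part:
        # Not even one character fits: the caller should start a new line
        # (or give up) rather than loop without consuming anything.
        raise ValueError('no room for an encoded word on this line')
    return first_part


subject = 'Ц' * 100
remaining = 76 - len('Subject: ')  # 67 characters left on the first line
print(repr(buggy_first_part(subject, remaining)))     # ''
print(len(shrinking_first_part(subject, remaining)))  # 19
```

With this model, 55 copies of 'Ц' become 110 UTF-8 bytes and a 160-character encoded word. That gives `excess` = 160 - 67 = 93, which exceeds the 55 characters being sliced, so the buggy slice is empty on every pass.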

In practice, this means it is now virtually impossible to send emails with non-ASCII subjects without either disregarding the RFC recommendations and requirements for line length or risking hangs and memory leaks.

Just like in https://bugs.python.org/msg322348, this behavior is new in Python 3.6; it is also present in 3.7 and 3.8.
msg334564 - Author: Ivan Krivosheev (ikrivosheev) Date: 2019-01-30 15:47
Hello Grigory. I am using your patch in my project, and I have run into some problems with your fix.


Source text:
Subject: test Венесуэла собирается пересмотреть стоимость заключенных с Россией контрактов на поставку вооружений, а также отношения с Москвой в целом. Об этом заявил назначенный оппозицией специальный представитель Венесуэлы при Организации американских государств (ОАГ) Густаво Тарре Брисеньо на выступлении в вашингтонском Центре стратегических и международных исследований, передает

Text as encoded by Thunderbird:
Subject: =?UTF-8?B?dGVzdCDQktC10L3QtdGB0YPRjdC70LAg0YHQvtCx0LjRgNCw0LXRgtGB?=
 =?UTF-8?B?0Y8g0L/QtdGA0LXRgdC80L7RgtGA0LXRgtGMINGB0YLQvtC40LzQvtGB0YLRjCA=?=
 =?UTF-8?B?0LfQsNC60LvRjtGH0LXQvdC90YvRhSDRgSDQoNC+0YHRgdC40LXQuSDQutC+0L0=?=
 =?UTF-8?B?0YLRgNCw0LrRgtC+0LIg0L3QsCDQv9C+0YHRgtCw0LLQutGDINCy0L7QvtGA0YM=?=
 =?UTF-8?B?0LbQtdC90LjQuSwg0LAg0YLQsNC60LbQtSDQvtGC0L3QvtGI0LXQvdC40Y8g0YEg?=
 =?UTF-8?B?0JzQvtGB0LrQstC+0Lkg0LIg0YbQtdC70L7QvC4g0J7QsSDRjdGC0L7QvCDQt9Cw?=
 =?UTF-8?B?0Y/QstC40Lsg0L3QsNC30L3QsNGH0LXQvdC90YvQuSDQvtC/0L/QvtC30LjRhtC4?=
 =?UTF-8?B?0LXQuSDRgdC/0LXRhtC40LDQu9GM0L3Ri9C5INC/0YDQtdC00YHRgtCw0LLQuNGC?=
 =?UTF-8?B?0LXQu9GMINCS0LXQvdC10YHRg9GN0LvRiyDQv9GA0Lgg0J7RgNCz0LDQvdC40Lc=?=
 =?UTF-8?B?0LDRhtC40Lgg0LDQvNC10YDQuNC60LDQvdGB0LrQuNGFINCz0L7RgdGD0LTQsNGA?=
 =?UTF-8?B?0YHRgtCyICjQntCQ0JMpINCT0YPRgdGC0LDQstC+INCi0LDRgNGA0LUg0JHRgNC4?=
 =?UTF-8?B?0YHQtdC90YzQviDQvdCwINCy0YvRgdGC0YPQv9C70LXQvdC40Lgg0LIg0LLQsNGI?=
 =?UTF-8?B?0LjQvdCz0YLQvtC90YHQutC+0Lwg0KbQtdC90YLRgNC1INGB0YLRgNCw0YLQtdCz?=
 =?UTF-8?B?0LjRh9C10YHQutC40YUg0Lgg0LzQtdC20LTRg9C90LDRgNC+0LTQvdGL0YUg0Lg=?=
 =?UTF-8?B?0YHRgdC70LXQtNC+0LLQsNC90LjQuSwg0L/QtdGA0LXQtNCw0LXRgg==?=

The same text after decoding and re-encoding it in Python with your patch:
Subject: test =?utf-8?b?0JLQtdC90LXRgdGD0Y3Qu9CwINGB0L7QsdC40YDQsNC10YLRgdGP?=
 =?utf-8?b?0L/=?utf-8?q?QtdGA0LXRgdC80L7RgtGA0LXRgtGM=3F=3D_=D1=81=D1=82?=
 =?utf-8?b?0L7QuNC80L7RgdGC0Ywg0LfQsNC60LvRjtGH0LXQvdC90YvRhSDRgSDQoNC+0YE=?=
 =?utf-8?b?0YHQuNC10Lkg0LrQvtC90YLRgNCw0LrRgtC+0LIg0L3QsCDQv9C+0YHRgtCw0LI=?=
 =?utf-8?b?0LrRgyDQstC+0L7RgNGD0LbQtdC90LjQuSwg0LAg0YLQsNC60LbQtSDQvtGC0L0=?=
 =?utf-8?b?0L7RiNC10L3QuNGPINGBINCc0L7RgdC60LLQvtC5INCyINGG0LXQu9C+0LwuINCe?=
 =?utf-8?b?0LEg0Y3RgtC+0Lwg0LfQsNGP0LLQuNC7INC90LDQt9C90LDRh9C10L3QvdGL0Lkg?=
 =?utf-8?b?0L7Qv9C/0L7Qt9C40YbQuNC10Lkg0YHQv9C10YbQuNCw0LvRjNC90YvQuSDQv9GA?=
 =?utf-8?b?0LXQtNGB0YLQsNCy0LjRgtC10LvRjCDQktC10L3QtdGB0YPRjdC70Ysg0L/RgNC4?=
  =?utf-8?b?0J7RgNCz0LDQvdC40LfQsNGG0LjQuCDQsNC80LXRgNC40LrQsNC90YHQutC40YU=?=
 =?utf-8?b?0LPQvtGB0YPQtNCw0YDRgdGC0LIgKNCe0JDQkykg0JPRg9GB0YLQsNCy0L4g0KI=?=
 =?utf-8?b?0LDRgNGA0LUg0JHRgNC40YHQtdC90YzQviDQvdCwINCy0YvRgdGC0YPQv9C70LU=?=
 =?utf-8?b?0L3QuNC4INCyINCy0LDRiNC40L3Qs9GC0L7QvdGB0LrQvtC8INCm0LXQvdGC0YA=?=
 =?utf-8?b?0LUg0YHRgtGA0LDRgtC10LPQuNGH0LXRgdC60LjRhSDQuCDQvNC10LbQtNGD0L0=?=
 =?utf-8?b?0LDRgNC+0LTQvdGL0YUg0LjRgdGB0LvQtdC00L7QstCw0L3QuNC5LCDQv9C10YA=?=
 =?utf-8?b?0LXQtNCw0LXRgg==?=

Result text:
Subject: test Венесуэла собирается
 =?utf-8?b?0L/QtdGA0LXRgdC80L7RgtGA0LXRgtGM?= стоимость заключенных с Россией контрактов на поставку вооружений, а также отношения с Москвой в целом. Об этом заявил назначенный оппозицией специальный представитель Венесуэлы приОрганизации американскихгосударств (ОАГ) Густаво Тарре Брисеньо на выступлении в вашингтонском Центре стратегических и международных исследований, передает
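Both symptoms can be checked mechanically. First, the second line of the patched output (the one beginning ` =?utf-8?b?0L/=?utf-8?q?`) contains one encoded word nested inside another, so it is not a well-formed RFC 2047 encoded word. Second, the missing spaces in "приОрганизации" and "американскихгосударств" are consistent with RFC 2047, section 6.2. That section says whitespace separating two adjacent encoded words is ignored when displaying, so a space between two words has to be encoded inside one of the encoded words. Below is a minimal, deliberately strict decoder for such round-trip checks. `decode_ew`, `decode_unstructured` and `ew` are hypothetical names. A token that starts with `=?` but is malformed raises ValueError here, whereas mail clients typically show it verbatim.

```python
import base64
import binascii
import re

# One RFC 2047 encoded word: =?charset?encoding?encoded-text?=
ENCODED_WORD = re.compile(r'=\?([^?\s]+)\?([bBqQ])\?([^?\s]*)\?=')


def decode_ew(token: str) -> str:
    """Decode one encoded word; raise ValueError if it is malformed."""
    match = ENCODED_WORD.fullmatch(token)
    if match is None:
        raise ValueError(f'malformed encoded word: {token!r}')
    charset, encoding, text = match.groups()
    if encoding in 'bB':
        raw = base64.b64decode(text, validate=True)
    else:
        raw = binascii.a2b_qp(text, header=True)  # '_' decodes to a space
    return raw.decode(charset)


def decode_unstructured(value: str) -> str:
    """Unfold and decode an unstructured header value.

    Whitespace between two adjacent encoded words is dropped (RFC 2047,
    section 6.2); all other whitespace is kept as-is.
    """
    value = re.sub(r'\r?\n(?=[ \t])', '', value)  # unfold: drop CRLF before WSP
    out = []
    pending_ws = ''
    prev_was_ew = False
    for part in re.split(r'([ \t]+)', value):
        if not part:
            continue
        if part.isspace():
            pending_ws = part
            continue
        is_ew = part.startswith('=?')
        if not (prev_was_ew and is_ew):
            out.append(pending_ws)
        out.append(decode_ew(part) if is_ew else part)
        pending_ws = ''
        prev_was_ew = is_ew
    out.append(pending_ws)
    return ''.join(out)


def ew(text: str) -> str:
    """Hypothetical helper: wrap text in a single 'b' encoded word."""
    return '=?utf-8?b?' + base64.b64encode(text.encode('utf-8')).decode('ascii') + '?='


# The space survives only when it is encoded inside an encoded word.
print(decode_unstructured(ew('при ') + '\r\n ' + ew('Организации')))  # при Организации
print(decode_unstructured(ew('при') + '\r\n ' + ew('Организации')))   # приОрганизации
```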

If needed, I can write a simple script to reproduce the bug.
History
Date User Action Args
2019-01-30 15:47:56  ikrivosheev  set     nosy: + ikrivosheev; messages: + msg334564
2018-08-29 11:06:52  python-dev   set     keywords: + patch; stage: patch review; pull_requests: + pull_request8463
2018-07-25 12:56:44  altvod       create