Title: Message from BytesParser cannot be flattened immediately
Type: behavior Stage:
Components: email Versions: Python 3.9
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: barry, r.david.murray, vitas1
Priority: normal Keywords:

Created on 2021-07-21 11:06 by vitas1, last changed 2021-07-24 12:09 by vitas1.

File name Uploaded Description
0.msg vitas1, 2021-07-24 12:09
Messages (2)
msg397937 - Author: Vitas Ivanoff (vitas1) Date: 2021-07-21 11:06
Hello. Here is my code:
# Parse message from file and immediately flatten it
import email.policy
from email.generator import BytesGenerator
from email.parser import BytesParser

cur_policy = email.policy.SMTPUTF8
with open("/tmp/0.tmp", "rb") as orig_message_file:
    message_bytes = orig_message_file.read()
message_parser = BytesParser(policy=cur_policy)
msg = message_parser.parsebytes(message_bytes)
with open("/tmp/1.tmp", "wb") as new_message_file:
    message_gen = BytesGenerator(new_message_file, policy=cur_policy)
    message_gen.flatten(msg)

For some messages, the script raises the following error:

Traceback (most recent call last):
  File "/misc/parsemail/./", line 34, in <module>
  File "/usr/lib/python3.9/email/", line 116, in flatten
  File "/usr/lib/python3.9/email/", line 199, in _write
  File "/usr/lib/python3.9/email/", line 422, in _write_headers
    self._fp.write(self.policy.fold_binary(h, v))
  File "/usr/lib/python3.9/email/", line 200, in fold_binary
    folded = self._fold(name, value, refold_binary=self.cte_type=='7bit')
  File "/usr/lib/python3.9/email/", line 214, in _fold
    return self.header_factory(name, ''.join(lines)).fold(policy=self)
  File "/usr/lib/python3.9/email/", line 257, in fold
    return header.fold(policy=policy)
  File "/usr/lib/python3.9/email/", line 156, in fold
    return _refold_parse_tree(self, policy=policy)
  File "/usr/lib/python3.9/email/", line 2825, in _refold_parse_tree
    last_ew = _fold_as_ew(tstr, lines, maxlen, last_ew,
  File "/usr/lib/python3.9/email/", line 2913, in _fold_as_ew
    encoded_word = _ew.encode(to_encode_word, charset=encode_as)
  File "/usr/lib/python3.9/email/", line 222, in encode
    bstring = string.encode('ascii', 'surrogateescape')
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-7: ordinal not in range(128)
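
A minimal sketch of what the last traceback frame appears to run into, assuming the header text handed to the encoder already contains real non-ASCII characters rather than escaped bytes (an assumption; the attached message is not reproduced here). The 'surrogateescape' handler only maps lone surrogates U+DC80..U+DCFF back to bytes, so any other non-ASCII character still raises. The sample bytes are made up for illustration.

```python
# Undecodable bytes become lone surrogates under 'surrogateescape' and
# round-trip back to the original bytes.
raw = b"caf\xc3\xa9"  # made-up sample, not taken from the attached message
escaped = raw.decode("ascii", "surrogateescape")
roundtripped = escaped.encode("ascii", "surrogateescape")

# A properly decoded string holds a real non-ASCII character, which the
# handler cannot map back to a single byte, so encoding to ASCII fails.
decoded = raw.decode("utf-8")
try:
    decoded.encode("ascii", "surrogateescape")
    failed = False
except UnicodeEncodeError:
    failed = True
```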

Policies 'default' and 'SMTP' are also affected. 

Workaround:

#For broken messages
message_gen = BytesGenerator(new_message_file, policy=cur_policy, maxheaderlen=0)

But parsing and flattening the same *unmodified* message should work without any additional parameters, shouldn't it?
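
One way the workaround could be limited to the messages that actually fail is sketched below. The helper names `_flatten` and `flatten_with_fallback` are hypothetical, and the sketch assumes that retrying with maxheaderlen=0 is acceptable whenever the default flatten raises UnicodeEncodeError; it hides the problem rather than fixing it.

```python
import io
from email import policy
from email.generator import BytesGenerator
from email.parser import BytesParser


def _flatten(msg, cur_policy, maxheaderlen):
    """Flatten msg into bytes using the given policy and header length."""
    out = io.BytesIO()
    gen = BytesGenerator(out, policy=cur_policy, maxheaderlen=maxheaderlen)
    gen.flatten(msg)
    return out.getvalue()


def flatten_with_fallback(message_bytes, cur_policy=policy.SMTPUTF8):
    """Hypothetical helper: return (flattened_bytes, used_fallback)."""
    msg = BytesParser(policy=cur_policy).parsebytes(message_bytes)
    try:
        # Normal path: long headers may be refolded.
        return _flatten(msg, cur_policy, None), False
    except UnicodeEncodeError:
        # Fallback: maxheaderlen=0 re-emits header lines without refolding.
        return _flatten(msg, cur_policy, 0), True


# Made-up sample message; it flattens fine, so the fallback is not used.
flattened, used_fallback = flatten_with_fallback(b"Subject: hello\r\n\r\nbody\r\n")
```

The second element of the returned tuple lets the caller log which messages needed the fallback, which may help when collecting inputs that reproduce the error.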
msg398109 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2021-07-24 00:59
I suspect maxheaderlen=0 works because it causes the original lines to be re-emitted without any folding or other processing.  Without that, lines longer than the default max_line_length get refolded.
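
The difference described above can be observed with a plain ASCII header, as in the rough sketch below. The helper `header_lines` and the sample message are made up for illustration, and this input does not trigger the reported error; it only shows refolding versus pass-through.

```python
import io
from email import policy
from email.generator import BytesGenerator
from email.parser import BytesParser


def header_lines(message_bytes, maxheaderlen=None):
    """Hypothetical helper: round-trip a message, return its header lines."""
    msg = BytesParser(policy=policy.SMTP).parsebytes(message_bytes)
    out = io.BytesIO()
    gen = BytesGenerator(out, policy=policy.SMTP, maxheaderlen=maxheaderlen)
    gen.flatten(msg)
    # Everything before the first blank line is the header section.
    headers, _, _ = out.getvalue().partition(b"\r\n\r\n")
    return headers.split(b"\r\n")


# One unfolded Subject line, well past the default max_line_length of 78.
long_subject = b"Subject: " + b" ".join([b"word"] * 40)
raw = long_subject + b"\r\n\r\nbody\r\n"

refolded = header_lines(raw)                   # default: long line is refolded
untouched = header_lines(raw, maxheaderlen=0)  # original line re-emitted
```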

Can you provide an example of an input message that triggers this problem?
Date User Action Args
2021-07-24 12:09:28 vitas1 set files: + 0.msg
2021-07-24 00:59:42 r.david.murray set messages: + msg398109
2021-07-21 11:06:27 vitas1 create