Message323216
David, I tried to find the mentioned '\r\r…\n' issue but I could not find it here. However, from an initial investigation into the BytesGenerator, here is what’s happening.
Flattening the body and attachments of the EmailMessage object works. Eventually _write_headers() is called to flatten the headers, which happens entry by entry (https://github.com/python/cpython/blob/master/Lib/email/generator.py#L417-L418). Flattening a header entry is a recursive process over the parse tree of that entry: it descends into the tree, encodes the individual “parts” (tokens of the header entry), and concatenates them into the final flattened string.
The parse tree for a header entry like "Martín Córdoba <foo@bar.com>" eventually results in the correct flattened string:
'=?utf-8?q?Mart=C3=ADn_C=C3=B3rdoba?= <foo@bar.com>\r\n'
at the bottom of the recursion for this “Mailbox” part. The recursive call stack at that point is (innermost frame first):
_refold_parse_tree _header_value_parser.py:2687
fold [Mailbox] _header_value_parser.py:144
_refold_parse_tree _header_value_parser.py:2630
fold [Address] _header_value_parser.py:144
_refold_parse_tree _header_value_parser.py:2630
fold [AddressList] _header_value_parser.py:144
_refold_parse_tree _header_value_parser.py:2630
fold [Header] _header_value_parser.py:144
fold [_UniqueAddressHeader] headerregistry.py:258
_fold [EmailPolicy] policy.py:205
fold_binary [EmailPolicy] policy.py:199
_write_headers [BytesGenerator] generator.py:418
_write [BytesGenerator] generator.py:195
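For anyone who wants to step through this call stack, here is a minimal sketch that flattens a message with the same kind of From header through BytesGenerator. The Subject, the body text, and the choice of email.policy.SMTP (which has linesep="\r\n") are assumptions made for illustration; they are not taken from the original report. Whether a stray '\r' shows up depends on the Python version in use, so the sketch only reports what it finds.

```python
import io
from email.generator import BytesGenerator
from email.message import EmailMessage
from email.policy import SMTP

# Build a message whose From header has a non-ASCII display name.
msg = EmailMessage()
msg["From"] = "Martín Córdoba <foo@bar.com>"
msg["Subject"] = "Test"      # hypothetical subject, for illustration only
msg.set_content("Hello")     # hypothetical body, for illustration only

# Flatten to bytes; SMTP policy uses "\r\n" as its line separator.
buf = io.BytesIO()
BytesGenerator(buf, policy=SMTP).flatten(msg)
raw = buf.getvalue()

# Report whether any line ends with a doubled carriage return.
has_doubled_cr = b"\r\r\n" in raw
print(raw)
print("doubled CR present:", has_doubled_cr)
```

Setting a breakpoint in _refold_parse_tree while this runs should show the frames listed above.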
The problem now arises from the interplay of
# https://github.com/python/cpython/blob/master/Lib/email/_header_value_parser.py#L2629
encoded_part = part.fold(policy=policy)[:-1] # strip nl
which strips the '\n' from the returned string, and
# https://github.com/python/cpython/blob/master/Lib/email/_header_value_parser.py#L2686
return policy.linesep.join(lines) + policy.linesep
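which appends the policy’s line separator, linesep="\r\n", to the end of the flattened string when the recursion unwinds. Stripping a single character from a string that ends in '\r\n' leaves a trailing '\r' behind, so after the separator is appended again the line ends in '\r\r\n'.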
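The interplay of the two lines can be reproduced with plain strings, without the email package. This is a simplified sketch: the variable names folded_child and lines are stand-ins, and the real _refold_parse_tree does considerably more bookkeeping around these two steps.

```python
linesep = "\r\n"  # stands in for policy.linesep

# What the innermost fold() returns for the Mailbox part (string from above).
folded_child = "=?utf-8?q?Mart=C3=ADn_C=C3=B3rdoba?= <foo@bar.com>" + linesep

# Step 1: the caller strips exactly one character, intending to drop the newline.
encoded_part = folded_child[:-1]

# Step 2: on return, the lines are joined and the separator is appended again.
lines = [encoded_part]
result = linesep.join(lines) + linesep

print(repr(encoded_part[-1]))  # the leftover '\r'
print(repr(result[-3:]))       # '\r\r\n'
```

With linesep="\n" the same two steps round-trip cleanly, which would explain why the problem only appears with a two-character separator.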
I am not sure about a proper fix here, but since policy.linesep can be a string of any length (in this case len("\r\n") == 2), a fixed truncation of one character with [:-1] seems wrong. Instead, using:
encoded_part = part.fold(policy=policy)[:-len(policy.linesep)] # strip nl
seems to work for entries with and without Unicode characters in their display names.
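To compare the two slicing strategies side by side, here is a small sketch using plain strings. The helper names strip_one_char and strip_linesep are hypothetical and exist only for this comparison; they are not part of the email package.

```python
def strip_one_char(folded):
    """Current behaviour: always drop exactly one trailing character."""
    return folded[:-1]


def strip_linesep(folded, linesep):
    """Proposed behaviour: drop as many characters as the separator has."""
    return folded[:-len(linesep)]


value = "=?utf-8?q?Mart=C3=ADn_C=C3=B3rdoba?= <foo@bar.com>"

# Map each separator to (result of [:-1], result of [:-len(linesep)]).
results = {}
for linesep in ("\n", "\r\n"):
    folded = value + linesep
    results[linesep] = (strip_one_char(folded), strip_linesep(folded, linesep))

for linesep, (old, new) in results.items():
    print(repr(linesep), repr(old[-2:]), repr(new[-2:]))
```

One caveat with the length-based slice: it assumes a non-empty separator, because s[:-0] evaluates to the empty string in Python. A guard for that case may or may not be needed, depending on which linesep values a policy is allowed to have.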
David, please advise on how to proceed from here.
Date | User | Action | Args
2018-08-06 18:51:49 | _savage | set | recipients: + _savage, barry, r.david.murray, python-dev, maciej.szulik, matrixise
2018-08-06 18:51:49 | _savage | set | messageid: <1533581509.38.0.56676864532.issue24218@psf.upfronthosting.co.za>
2018-08-06 18:51:49 | _savage | link | issue24218 messages
2018-08-06 18:51:49 | _savage | create |