classification
Title: Also support SMTPUTF8 in smtplib's send_message method.
Type: enhancement Stage: resolved
Components: email Versions: Python 3.5
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: _savage, barry, maciej.szulik, matrixise, python-dev, r.david.murray
Priority: normal Keywords: patch

Created on 2015-05-17 16:21 by r.david.murray, last changed 2018-08-17 23:03 by _savage. This issue is now closed.

Files
File name Uploaded Description Edit
send_message_smtputf8.patch r.david.murray, 2015-05-17 16:21 review
Messages (17)
msg243413 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2015-05-17 16:21
Now that I've committed issue 24211, we can also add SMTPUTF8 support to smptlib's send_message command.  See attached patch.
msg243425 - (view) Author: Maciej Szulik (maciej.szulik) * (Python triager) Date: 2015-05-17 19:03
David one small comment regarding typo in smtplib.py, but most importantly I'd suggest adding additional test case to cover the path (the newly added one) where you get a UnicodeEncodeError upon encoding from or to with UTF8 and then failing to find SMTPUTF8 on the server side. I see we already have test case to cover SMTPNotSupportedError but this covers just the case where the server does not have SMTPUTF8.
msg243426 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2015-05-17 19:05
Oh, right, that's what I get for doing this at the end of a long chain of patch reviews :).  I added that code after I'd written the test and forgot to go back and test it.
msg243433 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2015-05-17 23:38
New changeset 30795a477f85 by R David Murray in branch 'default':
#24218: Add SMTPUTF8 support to send_message.
https://hg.python.org/cpython/rev/30795a477f85
msg243435 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2015-05-17 23:45
Thanks Maciej.
msg322761 - (view) Author: Jens Troeger (_savage) * Date: 2018-07-31 12:21
I was about to open an issue when I found this one.

Consider an email message with the following:

message = EmailMessage()
message["From"] = Address(addr_spec="bar@foo.com", display_name="Jens Troeger")
message["To"] = Address(addr_spec="foo@bar.com", display_name="Martín Córdoba")

It’s important here that the email itself is `ascii` encodable, but the names are not. With that combination, send_message() falsely assumes plain text addresses (see https://github.com/python/cpython/blob/master/Lib/smtplib.py#L949 where it checks only email addresses, not display names!) and therefore the `international` flag stays False.

As a result of that, flattening the email object (https://github.com/python/cpython/blob/master/Lib/smtplib.py#L964) incorrectly inserts multiple linefeeds, thus breaking the email header, thus mangling the entire email:

flatmsg: b'From: Jens Troeger <jens@talaera.com>\r\nTo: Fernando =?utf-8?q?Mart=C3=ADn_C=C3=B3rdoba?= <foo@bar.com>\r\r\r\r\r\nSubject:\r\n Confirmation: …\r\n…'

I think a proper fix would be in line 949, where email addresses and display names should be checked for encoding.

The comment to that function should also be adjusted to mention display names?

Note also that the attached patch does not test the above scenario, and should probably be extended as well.
msg322767 - (view) Author: Jens Troeger (_savage) * Date: 2018-07-31 12:52
(continuing the previous message msg322761)

…unless the addresses should be checked separately from the display names, in which case the BytesGenerator’s flatten() function should be fixed.

Without reading the RFC, please let me know how to continue from here.
msg322772 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2018-07-31 13:18
Well, posting on a closed issue is generally not the best way :)

The current behavior with regards to the SMTPUTF8 flag is correct (it only matters for *addresses*, display names can already be transmitted if they contain non-ascii using non SMTPUTF8 methods).

The multiple carriage returns is a bug, and there is an open issue for it, though I'm not finding it at the moment.
msg322773 - (view) Author: Stéphane Wirtel (matrixise) * (Python committer) Date: 2018-07-31 13:20
Hi David,

What is the related issue with the new lines?

> On 31 Jul 2018, at 15:18, R. David Murray <report@bugs.python.org> wrote:
> 
> 
> R. David Murray <rdmurray@bitdance.com> added the comment:
> 
> Well, posting on a closed issue is generally not the best way :)
> 
> The current behavior with regards to the SMTPUTF8 flag is correct (it only matters for *addresses*, display names can already be transmitted if they contain non-ascii using non SMTPUTF8 methods).
> 
> The multiple carriage returns is a bug, and there is an open issue for it, though I'm not finding it at the moment.
> 
> ----------
> 
> _______________________________________
> Python tracker <report@bugs.python.org>
> <https://bugs.python.org/issue24218>
> _______________________________________
> _______________________________________________
> Python-bugs-list mailing list
> Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/stephane%40wirtel.be
>
msg322789 - (view) Author: Jens Troeger (_savage) * Date: 2018-07-31 15:36
>  Well, posting on a closed issue is generally not the best way :)

Fair enough ;)

>  The multiple carriage returns is a bug, and there is an open issue for it, though I'm not finding it at the moment.

Oh good, yes that should be fixed!

My current workaround is setting `international = True` _always_, which prevents the multiple CRs. Not pretty but it works…
msg323044 - (view) Author: Jens Troeger (_savage) * Date: 2018-08-03 10:54
So that’s interesting. I thought that setting `international = True` (see line https://github.com/python/cpython/blob/master/Lib/smtplib.py#L947) would be a neat workaround, but the opposite.

When delivering those emails to Gmail I started seeing 

    Failed to send email: (555, b'5.5.2 Syntax error, goodbye. s53-v6sm1864855qts.5 - gsmtp', 'foo@bar.com')

and it turns out (according to the IETF message linter, https://tools.ietf.org/tools/msglint/) that:

-----------
UNKNOWN: unknown header 'User-Agent' at line 4
ERROR: missing mandatory header 'date' lines 1-7
ERROR: missing mandatory header 'return-path' lines 1-7
OK: found part text/plain line 9
WARNING: line 13 too long (109 chars); text/plain shouldn't need folding (RFC 
         2046-4.1.1)
WARNING: line 15 too long (124 chars); text/plain shouldn't need folding (RFC 
         2046-4.1.1)
WARNING: Character set mislabelled as 'utf-8' when 'us-ascii' suffices, body 
         part ending line 22
-----------

It seems that now “Date” and “Return-Path” header entries are missing when the email is generated.

I reverted the initial change. Any updates on the multiple CR problem when flattening?
msg323216 - (view) Author: Jens Troeger (_savage) * Date: 2018-08-06 18:51
David, I tried to find the mentioned '\r\r…\n' issue but I could not find it here. However, from an initial investigation into the BytesGenerator, here is what’s happening.

Flattening the body and attachments of the EmailMessage object works, and eventually _write_headers() is called to flatten the headers which happens entry by entry (https://github.com/python/cpython/blob/master/Lib/email/generator.py#L417-L418). Flattening a header entry is a recursive process over the parse tree of the entry, which builds the flattened and encoded final string by descending into the parse tree and encoding & concatenating the individual “parts” (tokens of the header entry).

Given the parse tree for a header entry like "Martín Córdoba <foo@bar.com>" eventually results in the correct flattened string:

    '=?utf-8?q?Mart=C3=ADn_C=C3=B3rdoba?= <foo@bar.com>\r\n'

at the bottom of the recursion for this “Mailbox” part. The recursive callstack is then:

    _refold_parse_tree _header_value_parser.py:2687
    fold [Mailbox] _header_value_parser.py:144
    _refold_parse_tree _header_value_parser.py:2630
    fold [Address] _header_value_parser.py:144
    _refold_parse_tree _header_value_parser.py:2630
    fold [AddressList] _header_value_parser.py:144
    _refold_parse_tree _header_value_parser.py:2630
    fold [Header] _header_value_parser.py:144
    fold [_UniqueAddressHeader] headerregistry.py:258
    _fold [EmailPolicy] policy.py:205
    fold_binary [EmailPolicy] policy.py:199
    _write_headers [BytesGenerator] generator.py:418
    _write [BytesGenerator] generator.py:195

The problem now arises from the interplay of 

    # https://github.com/python/cpython/blob/master/Lib/email/_header_value_parser.py#L2629
    encoded_part = part.fold(policy=policy)[:-1] # strip nl

which strips the '\n' from the returned string, and

    # https://github.com/python/cpython/blob/master/Lib/email/_header_value_parser.py#L2686
    return policy.linesep.join(lines) + policy.linesep

which adds the policy’s line separation string linesep="\r\n" to the end of the flattened string upon unrolling the recursion.

I am not sure about a proper fix here, but considering that the linesep policy can be any string length (in this case len("\r\n") == 2) a fixed truncation of one character [:-1] seems wrong. Instead, using:

    encoded_part = part.fold(policy=policy)[:-len(policy.linesep)] # strip nl

seems to work for entries with and without Unicode characters in their display names.

David, please advise on how to proceed from here.
msg323559 - (view) Author: Jens Troeger (_savage) * Date: 2018-08-15 09:05
@David, any thoughts on this?
msg323565 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2018-08-15 13:19
Sorry, I haven't had time to look at it yet :(  Not sure when I will, things are more than a bit busy for me right now.  Ping me again in two weeks if I haven't responded, please.  The proposed solution sounds reasonable, though, so you could also propose a PR with tests and that would take me less time to review and I might get to it sooner.
msg323634 - (view) Author: Jens Troeger (_savage) * Date: 2018-08-17 04:19
Thanks David: PR on Github (which is R/O) or where should I submit to?
msg323662 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2018-08-17 17:04
check out https://devguide.python.org.  (Basically, banch and generate a PR on github).  And please open a new issue for this.
msg323687 - (view) Author: Jens Troeger (_savage) * Date: 2018-08-17 23:03
New issue: https://bugs.python.org/issue34424
History
Date User Action Args
2018-08-17 23:03:08_savagesetmessages: + msg323687
2018-08-17 17:04:57r.david.murraysetmessages: + msg323662
2018-08-17 04:19:47_savagesetmessages: + msg323634
2018-08-15 13:19:30r.david.murraysetmessages: + msg323565
2018-08-15 09:05:56_savagesetmessages: + msg323559
2018-08-06 18:51:49_savagesetmessages: + msg323216
2018-08-03 10:54:08_savagesetmessages: + msg323044
2018-07-31 15:36:02_savagesetmessages: + msg322789
2018-07-31 13:20:20matrixisesetnosy: + matrixise
messages: + msg322773
2018-07-31 13:18:19r.david.murraysetmessages: + msg322772
2018-07-31 12:52:31_savagesetmessages: + msg322767
2018-07-31 12:21:42_savagesetnosy: + _savage
messages: + msg322761
2015-05-17 23:45:06r.david.murraysetstatus: open -> closed
resolution: fixed
messages: + msg243435

stage: patch review -> resolved
2015-05-17 23:38:15python-devsetnosy: + python-dev
messages: + msg243433
2015-05-17 19:05:16r.david.murraysetmessages: + msg243426
2015-05-17 19:03:33maciej.szuliksetmessages: + msg243425
2015-05-17 18:03:30maciej.szuliksetnosy: + maciej.szulik
2015-05-17 16:23:19r.david.murraysetnosy: + barry
components: + email
2015-05-17 16:21:59r.david.murraycreate