Issue 24218: Also support SMTPUTF8 in smtplib's send_message method.

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/68406

classification

Title:	Also support SMTPUTF8 in smtplib's send_message method.
Type:	enhancement	Stage:	resolved
Components:	email	Versions:	Python 3.5

process

Status:	closed	Resolution:	fixed
Dependencies:		Superseder:
Assigned To:		Nosy List:	_savage, barry, maciej.szulik, matrixise, python-dev, r.david.murray
Priority:	normal	Keywords:	patch

Created on 2015-05-17 16:21 by r.david.murray, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Files
File name	Uploaded	Description	Edit
send_message_smtputf8.patch	r.david.murray, 2015-05-17 16:21		review

Messages (17)
msg243413 - (view)	Author: R. David Murray (r.david.murray) *	Date: 2015-05-17 16:21
Now that I've committed issue 24211, we can also add SMTPUTF8 support to smptlib's send_message command. See attached patch.
msg243425 - (view)	Author: Maciej Szulik (maciej.szulik) *	Date: 2015-05-17 19:03
David one small comment regarding typo in smtplib.py, but most importantly I'd suggest adding additional test case to cover the path (the newly added one) where you get a UnicodeEncodeError upon encoding from or to with UTF8 and then failing to find SMTPUTF8 on the server side. I see we already have test case to cover SMTPNotSupportedError but this covers just the case where the server does not have SMTPUTF8.
msg243426 - (view)	Author: R. David Murray (r.david.murray) *	Date: 2015-05-17 19:05
Oh, right, that's what I get for doing this at the end of a long chain of patch reviews :). I added that code after I'd written the test and forgot to go back and test it.
msg243433 - (view)	Author: Roundup Robot (python-dev)	Date: 2015-05-17 23:38
New changeset 30795a477f85 by R David Murray in branch 'default': #24218: Add SMTPUTF8 support to send_message. https://hg.python.org/cpython/rev/30795a477f85
msg243435 - (view)	Author: R. David Murray (r.david.murray) *	Date: 2015-05-17 23:45
Thanks Maciej.
msg322761 - (view)	Author: Jens Troeger (_savage) *	Date: 2018-07-31 12:21
I was about to open an issue when I found this one. Consider an email message with the following: message = EmailMessage() message["From"] = Address(addr_spec="bar@foo.com", display_name="Jens Troeger") message["To"] = Address(addr_spec="foo@bar.com", display_name="Martín Córdoba") It’s important here that the email itself is `ascii` encodable, but the names are not. With that combination, send_message() falsely assumes plain text addresses (see https://github.com/python/cpython/blob/master/Lib/smtplib.py#L949 where it checks only email addresses, not display names!) and therefore the `international` flag stays False. As a result of that, flattening the email object (https://github.com/python/cpython/blob/master/Lib/smtplib.py#L964) incorrectly inserts multiple linefeeds, thus breaking the email header, thus mangling the entire email: flatmsg: b'From: Jens Troeger <jens@talaera.com>\r\nTo: Fernando =?utf-8?q?Mart=C3=ADn_C=C3=B3rdoba?= <foo@bar.com>\r\r\r\r\r\nSubject:\r\n Confirmation: …\r\n…' I think a proper fix would be in line 949, where email addresses and display names should be checked for encoding. The comment to that function should also be adjusted to mention display names? Note also that the attached patch does not test the above scenario, and should probably be extended as well.
msg322767 - (view)	Author: Jens Troeger (_savage) *	Date: 2018-07-31 12:52
(continuing the previous message msg322761) …unless the addresses should be checked separately from the display names, in which case the BytesGenerator’s flatten() function should be fixed. Without reading the RFC, please let me know how to continue from here.
msg322772 - (view)	Author: R. David Murray (r.david.murray) *	Date: 2018-07-31 13:18
Well, posting on a closed issue is generally not the best way :) The current behavior with regards to the SMTPUTF8 flag is correct (it only matters for addresses, display names can already be transmitted if they contain non-ascii using non SMTPUTF8 methods). The multiple carriage returns is a bug, and there is an open issue for it, though I'm not finding it at the moment.
msg322773 - (view)	Author: Stéphane Wirtel (matrixise) *	Date: 2018-07-31 13:20
Hi David, What is the related issue with the new lines? > On 31 Jul 2018, at 15:18, R. David Murray <report@bugs.python.org> wrote: > > > R. David Murray <rdmurray@bitdance.com> added the comment: > > Well, posting on a closed issue is generally not the best way :) > > The current behavior with regards to the SMTPUTF8 flag is correct (it only matters for addresses, display names can already be transmitted if they contain non-ascii using non SMTPUTF8 methods). > > The multiple carriage returns is a bug, and there is an open issue for it, though I'm not finding it at the moment. > > ---------- > > _______________________________________ > Python tracker <report@bugs.python.org> > <https://bugs.python.org/issue24218> > _______________________________________ > _______________________________________________ > Python-bugs-list mailing list > Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/stephane%40wirtel.be >
msg322789 - (view)	Author: Jens Troeger (_savage) *	Date: 2018-07-31 15:36
> Well, posting on a closed issue is generally not the best way :) Fair enough ;) > The multiple carriage returns is a bug, and there is an open issue for it, though I'm not finding it at the moment. Oh good, yes that should be fixed! My current workaround is setting `international = True` _always_, which prevents the multiple CRs. Not pretty but it works…
msg323044 - (view)	Author: Jens Troeger (_savage) *	Date: 2018-08-03 10:54
So that’s interesting. I thought that setting `international = True` (see line https://github.com/python/cpython/blob/master/Lib/smtplib.py#L947) would be a neat workaround, but the opposite. When delivering those emails to Gmail I started seeing Failed to send email: (555, b'5.5.2 Syntax error, goodbye. s53-v6sm1864855qts.5 - gsmtp', 'foo@bar.com') and it turns out (according to the IETF message linter, https://tools.ietf.org/tools/msglint/) that: ----------- UNKNOWN: unknown header 'User-Agent' at line 4 ERROR: missing mandatory header 'date' lines 1-7 ERROR: missing mandatory header 'return-path' lines 1-7 OK: found part text/plain line 9 WARNING: line 13 too long (109 chars); text/plain shouldn't need folding (RFC 2046-4.1.1) WARNING: line 15 too long (124 chars); text/plain shouldn't need folding (RFC 2046-4.1.1) WARNING: Character set mislabelled as 'utf-8' when 'us-ascii' suffices, body part ending line 22 ----------- It seems that now “Date” and “Return-Path” header entries are missing when the email is generated. I reverted the initial change. Any updates on the multiple CR problem when flattening?
msg323216 - (view)	Author: Jens Troeger (_savage) *	Date: 2018-08-06 18:51
David, I tried to find the mentioned '\r\r…\n' issue but I could not find it here. However, from an initial investigation into the BytesGenerator, here is what’s happening. Flattening the body and attachments of the EmailMessage object works, and eventually _write_headers() is called to flatten the headers which happens entry by entry (https://github.com/python/cpython/blob/master/Lib/email/generator.py#L417-L418). Flattening a header entry is a recursive process over the parse tree of the entry, which builds the flattened and encoded final string by descending into the parse tree and encoding & concatenating the individual “parts” (tokens of the header entry). Given the parse tree for a header entry like "Martín Córdoba <foo@bar.com>" eventually results in the correct flattened string: '=?utf-8?q?Mart=C3=ADn_C=C3=B3rdoba?= <foo@bar.com>\r\n' at the bottom of the recursion for this “Mailbox” part. The recursive callstack is then: _refold_parse_tree _header_value_parser.py:2687 fold [Mailbox] _header_value_parser.py:144 _refold_parse_tree _header_value_parser.py:2630 fold [Address] _header_value_parser.py:144 _refold_parse_tree _header_value_parser.py:2630 fold [AddressList] _header_value_parser.py:144 _refold_parse_tree _header_value_parser.py:2630 fold [Header] _header_value_parser.py:144 fold [_UniqueAddressHeader] headerregistry.py:258 _fold [EmailPolicy] policy.py:205 fold_binary [EmailPolicy] policy.py:199 _write_headers [BytesGenerator] generator.py:418 _write [BytesGenerator] generator.py:195 The problem now arises from the interplay of # https://github.com/python/cpython/blob/master/Lib/email/_header_value_parser.py#L2629 encoded_part = part.fold(policy=policy)[:-1] # strip nl which strips the '\n' from the returned string, and # https://github.com/python/cpython/blob/master/Lib/email/_header_value_parser.py#L2686 return policy.linesep.join(lines) + policy.linesep which adds the policy’s line separation string linesep="\r\n" to the end of the flattened string upon unrolling the recursion. I am not sure about a proper fix here, but considering that the linesep policy can be any string length (in this case len("\r\n") == 2) a fixed truncation of one character [:-1] seems wrong. Instead, using: encoded_part = part.fold(policy=policy)[:-len(policy.linesep)] # strip nl seems to work for entries with and without Unicode characters in their display names. David, please advise on how to proceed from here.
msg323559 - (view)	Author: Jens Troeger (_savage) *	Date: 2018-08-15 09:05
@David, any thoughts on this?
msg323565 - (view)	Author: R. David Murray (r.david.murray) *	Date: 2018-08-15 13:19
Sorry, I haven't had time to look at it yet :( Not sure when I will, things are more than a bit busy for me right now. Ping me again in two weeks if I haven't responded, please. The proposed solution sounds reasonable, though, so you could also propose a PR with tests and that would take me less time to review and I might get to it sooner.
msg323634 - (view)	Author: Jens Troeger (_savage) *	Date: 2018-08-17 04:19
Thanks David: PR on Github (which is R/O) or where should I submit to?
msg323662 - (view)	Author: R. David Murray (r.david.murray) *	Date: 2018-08-17 17:04
check out https://devguide.python.org. (Basically, banch and generate a PR on github). And please open a new issue for this.
msg323687 - (view)	Author: Jens Troeger (_savage) *	Date: 2018-08-17 23:03
New issue: https://bugs.python.org/issue34424

History
Date	User	Action	Args
2022-04-11 14:58:17	admin	set	github: 68406
2018-08-17 23:03:08	_savage	set	messages: + msg323687
2018-08-17 17:04:57	r.david.murray	set	messages: + msg323662
2018-08-17 04:19:47	_savage	set	messages: + msg323634
2018-08-15 13:19:30	r.david.murray	set	messages: + msg323565
2018-08-15 09:05:56	_savage	set	messages: + msg323559
2018-08-06 18:51:49	_savage	set	messages: + msg323216
2018-08-03 10:54:08	_savage	set	messages: + msg323044
2018-07-31 15:36:02	_savage	set	messages: + msg322789
2018-07-31 13:20:20	matrixise	set	nosy: + matrixise messages: + msg322773
2018-07-31 13:18:19	r.david.murray	set	messages: + msg322772
2018-07-31 12:52:31	_savage	set	messages: + msg322767
2018-07-31 12:21:42	_savage	set	nosy: + _savage messages: + msg322761
2015-05-17 23:45:06	r.david.murray	set	status: open -> closed resolution: fixed messages: + msg243435 stage: patch review -> resolved
2015-05-17 23:38:15	python-dev	set	nosy: + python-dev messages: + msg243433
2015-05-17 19:05:16	r.david.murray	set	messages: + msg243426
2015-05-17 19:03:33	maciej.szulik	set	messages: + msg243425
2015-05-17 18:03:30	maciej.szulik	set	nosy: + maciej.szulik
2015-05-17 16:23:19	r.david.murray	set	nosy: + barry components: + email
2015-05-17 16:21:59	r.david.murray	create