Also support SMTPUTF8 in smtplib's send_message method. #68406
Comments
Now that I've committed bpo-24211, we can also add SMTPUTF8 support to smtplib's send_message method. See attached patch.
David, one small comment regarding a typo in smtplib.py, but most importantly I'd suggest adding an additional test case to cover the newly added path: encoding from or to raises a UnicodeEncodeError because they contain UTF-8 characters, and SMTPUTF8 is then not found on the server side. I see we already have a test case covering SMTPNotSupportedError, but it covers only the case where the server does not have SMTPUTF8.
Oh, right, that's what I get for doing this at the end of a long chain of patch reviews :). I added that code after I'd written the test and forgot to go back and test it.
New changeset 30795a477f85 by R David Murray in branch 'default':
Thanks Maciej.
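The code path discussed in the review above can be sketched roughly as follows. This is a minimal illustration, not the committed patch: `needs_smtputf8` and its `server_extensions` parameter are hypothetical names, and the check simply assumes that an address which cannot be encoded as ASCII requires the server to advertise SMTPUTF8.

```python
from smtplib import SMTPNotSupportedError


def needs_smtputf8(from_addr, to_addrs, server_extensions):
    """Return True if the envelope addresses require SMTPUTF8.

    Hypothetical helper for illustration; it is not part of smtplib.
    server_extensions is assumed to be a set of lowercase extension names.
    """
    try:
        # Pure-ASCII addresses can be sent without any extension.
        "".join([from_addr, *to_addrs]).encode("ascii")
    except UnicodeEncodeError:
        # Non-ASCII addresses are only deliverable if the server
        # advertises SMTPUTF8; otherwise fail early.
        if "smtputf8" not in server_extensions:
            raise SMTPNotSupportedError(
                "non-ASCII address, but the server does not support SMTPUTF8"
            )
        return True
    return False
```

A test for the path mentioned in the review would pass a non-ASCII address together with an extension set that lacks `smtputf8` and expect `SMTPNotSupportedError`.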
I was about to open an issue when I found this one. Consider an email message with the following:

message = EmailMessage()
message["From"] = Address(addr_spec="bar@foo.com", display_name="Jens Troeger")
message["To"] = Address(addr_spec="foo@bar.com", display_name="Martín Córdoba")

It’s important here that the email itself is

As a result of that, flattening the email object (https://github.com/python/cpython/blob/master/Lib/smtplib.py#L964) incorrectly inserts multiple linefeeds, thus breaking the email header, thus mangling the entire email:

flatmsg: b'From: Jens Troeger <jens@talaera.com>\r\nTo: Fernando =?utf-8?q?Mart=C3=ADn_C=C3=B3rdoba?= <foo@bar.com>\r\r\r\r\r\nSubject:\r\n Confirmation: …\r\n…'

I think a proper fix would be in line 949, where email addresses and display names should be checked for encoding. The comment to that function should also be adjusted to mention display names?

Note also that the attached patch does not test the above scenario, and should probably be extended as well.
(continuing the previous message msg322761) …unless the addresses should be checked separately from the display names, in which case the BytesGenerator’s flatten() function should be fixed. Without reading the RFC, please let me know how to continue from here.
Well, posting on a closed issue is generally not the best way :) The current behavior with regard to the SMTPUTF8 flag is correct: it only matters for *addresses*, because display names that contain non-ASCII characters can already be transmitted using non-SMTPUTF8 methods. The multiple carriage returns are a bug, and there is an open issue for it, though I'm not finding it at the moment.
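To illustrate the point about display names, here is a small sketch that reuses the addresses from the report; the subject and body are placeholders added for the example. With the standard `email.policy.SMTP` policy (whose `utf8` setting is off by default), the non-ASCII display name is written as an RFC 2047 encoded word, so the flattened message stays pure ASCII and no SMTPUTF8 is needed. The exact folding of the header may differ between Python versions.

```python
from email.headerregistry import Address
from email.message import EmailMessage
from email.policy import SMTP

message = EmailMessage()
message["From"] = Address(display_name="Jens Troeger", addr_spec="bar@foo.com")
message["To"] = Address(display_name="Martín Córdoba", addr_spec="foo@bar.com")
message["Subject"] = "Confirmation"  # placeholder subject for the example
message.set_content("Hello")         # placeholder body for the example

# Flatten with the SMTP policy; the display name is encoded, the
# addr_spec is left untouched because it is already ASCII.
flat = message.as_bytes(policy=SMTP)

# Show the To header as it would be put on the wire.
to_line = next(line for line in flat.split(b"\n") if line.startswith(b"To:"))
print(to_line)
```

Only a non-ASCII addr_spec, such as a non-ASCII local part, would make the envelope itself depend on SMTPUTF8.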
Hi David, What is the related issue with the new lines?
Fair enough ;)
Oh good, yes that should be fixed! My current workaround is setting
So that’s interesting. I thought that setting When delivering those emails to Gmail I started seeing
and it turns out (according to the IETF message linter, https://tools.ietf.org/tools/msglint/) that:

It seems that now “Date” and “Return-Path” header entries are missing when the email is generated. I reverted the initial change. Any updates on the multiple CR problem when flattening?
David, I tried to find the mentioned '\r\r…\n' issue but I could not find it here. However, from an initial investigation into the BytesGenerator, here is what’s happening.

Flattening the body and attachments of the EmailMessage object works, and eventually _write_headers() is called to flatten the headers, which happens entry by entry (https://github.com/python/cpython/blob/master/Lib/email/generator.py#L417-L418). Flattening a header entry is a recursive process over the parse tree of the entry: it builds the flattened and encoded final string by descending into the parse tree, then encoding and concatenating the individual “parts” (tokens of the header entry).

The parse tree for a header entry like "Martín Córdoba <foo@bar.com>" eventually results in the correct flattened string:
at the bottom of the recursion for this “Mailbox” part. The recursive callstack is then:
The problem now arises from the interplay of

# https://github.com/python/cpython/blob/master/Lib/email/_header_value_parser.py#L2629
encoded_part = part.fold(policy=policy)[:-1] # strip nl

which strips the '\n' from the returned string, and

which adds the policy’s line separation string linesep="\r\n" to the end of the flattened string upon unrolling the recursion.

I am not sure about a proper fix here, but considering that the linesep policy can be any string length (in this case len("\r\n") == 2), a fixed truncation of one character [:-1] seems wrong. Instead, using:

encoded_part = part.fold(policy=policy)[:-len(policy.linesep)] # strip nl

seems to work for entries with and without Unicode characters in their display names. David, please advise on how to proceed from here.
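The difference between the two truncations can be seen in isolation with plain strings. This is only a sketch of the slicing behaviour, not the actual code in `_header_value_parser.py`: `strip_trailing_linesep` is a hypothetical helper, and the `folded` value is a hand-written stand-in for what `part.fold(policy=policy)` might return under a policy with `linesep="\r\n"`.

```python
def strip_trailing_linesep(folded, linesep):
    """Remove one trailing line separator of any length.

    Hypothetical helper for illustration only.
    """
    if linesep and folded.endswith(linesep):
        return folded[:-len(linesep)]
    return folded


# Hand-written stand-in for a folded header part ending in "\r\n".
folded = "=?utf-8?q?Mart=C3=ADn_C=C3=B3rdoba?= <foo@bar.com>\r\n"

# Fixed one-character truncation: the "\n" goes, a stray "\r" stays.
one_char = folded[:-1]
print(repr(one_char))

# Length-aware truncation: the whole separator is removed.
full = strip_trailing_linesep(folded, "\r\n")
print(repr(full))
```

Each level of the recursion that re-appends the separator after a one-character strip leaves another stray "\r" behind, which is consistent with the '\r\r\r\r\r\n' run in the report.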
@david, any thoughts on this?
Sorry, I haven't had time to look at it yet :( Not sure when I will, things are more than a bit busy for me right now. Ping me again in two weeks if I haven't responded, please. The proposed solution sounds reasonable, though, so you could also propose a PR with tests; that would take me less time to review and I might get to it sooner.
Thanks David: PR on GitHub (which is R/O) or where should I submit to?
Check out https://devguide.python.org. (Basically, branch and generate a PR on GitHub.) And please open a new issue for this.
New issue: https://bugs.python.org/issue34424