
classification
Title: smtplib does not handle Unicode characters
Type: enhancement Stage: resolved
Components: email Versions: Python 3.8
process
Status: closed Resolution: works for me
Dependencies: Superseder:
Assigned To: Nosy List: barry, jpatel, r.david.murray
Priority: normal Keywords:

Created on 2020-06-18 09:45 by jpatel, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Files
File name Uploaded Description
send_rawemail_demo.py jpatel, 2020-06-18 09:45
providing_Unicode_characters_in_email_body.png jpatel, 2020-06-18 09:50
providing_mail_options_in_sendmail.png jpatel, 2020-06-18 09:50
Messages (3)
msg371801 - Author: Jay Patel (jpatel) Date: 2020-06-18 09:45
According to the user requirements, I need to send an email that is provided as a raw email, i.e., the contents of the email are provided in the form of headers. To accomplish this I am using the methods in the "send_rawemail_demo.py" file (attached below).
The smtplib library works fine when the 'raw_email' variable contains only 'ascii' characters. But when I provide any Unicode characters in either the Subject or the Body of the email, the sendmail method of the smtplib library fails with the following message:
UnicodeEncodeError 'ascii' codec can't encode characters in position 123-124: ordinal not in range(128)
I tried passing mail_options=["SMTPUTF-8"] to the sendmail method (line 72 in the send_rawemail_demo.py file), but then it fails (even for 'ascii' characters) with an SMTPSenderRefused exception.
I have faced the same issue on Python 3.6.
The sendmail method of the SMTP class encodes the message using 'ascii' as:
    if isinstance(msg, str):
        msg = _fix_eols(msg).encode('ascii')
The code works properly for Python 2 as the smtplib library for Python 2 does not have the above line and hence it allows Unicode characters in the Body and the Subject.
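The failure described above can be reproduced without any server, since it comes from the `encode('ascii')` step quoted from sendmail. A minimal sketch; the `raw_email` value here is an illustrative stand-in, not the content of the attached file:

```python
# Illustrative raw message (an assumption; the real one is in send_rawemail_demo.py).
raw_email = "Subject: Grüße\r\n\r\nBody with non-ASCII text: café\r\n"

try:
    # The same encoding step that sendmail applies to str input.
    raw_email.encode("ascii")
    error = None
except UnicodeEncodeError as exc:
    error = exc

# Show which codec rejected the string and why.
print(type(error).__name__, error.encoding, error.reason)
```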
msg371803 - Author: Jay Patel (jpatel) Date: 2020-06-18 09:50
Screenshot for the case where the 'raw_email' variable contains only 'ascii' characters.
msg371808 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2020-06-18 12:51
If you use the 'sendmail' function for sending, then it is entirely your responsibility to turn the email into "wire format".  Unicode is not wire format, but if you give sendmail a string that only has ascii in it, it nicely converts it to binary for you.  But given that the email RFCs specify specific ways to indicate how non-ascii is encoded in the message, there is no way for the smtp library to know how to do that correctly when passed an arbitrary unicode string, so it doesn't try.  sendmail requires *you* to do the encoding to binary, indicating you at least think that you got the RFC parts right :)  In python2, strings are binary by default, so in that case you are handing sendmail binary format data (with the same assumption that you got the RFC parts right)...if you passed the python2 function a unicode string it would probably complain as well, although not in the same way.

If your raw email is RFC compliant, then you can do: sendmail(from, to, mymsg.encode()).
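As a rough sketch of that call: the helper name `send_raw`, the addresses, and the sample message below are all made up for illustration, and `from` is a reserved word in Python, so the sketch uses `sender` instead. It assumes the raw text already carries correct MIME headers and that the receiving server accepts 8-bit bodies.

```python
# Illustrative RFC-style raw message: the Subject is RFC 2047 encoded and
# the MIME headers declare how the body bytes are to be interpreted.
RAW_EMAIL = (
    "From: sender@example.com\r\n"
    "To: rcpt@example.com\r\n"
    "Subject: =?utf-8?q?Gr=C3=BC=C3=9Fe?=\r\n"
    "MIME-Version: 1.0\r\n"
    'Content-Type: text/plain; charset="utf-8"\r\n'
    "Content-Transfer-Encoding: 8bit\r\n"
    "\r\n"
    "Body with non-ASCII text: café\r\n"
)


def send_raw(smtp, sender, recipients, raw_email):
    """Encode an already-compliant raw message and hand the bytes to sendmail.

    `smtp` is expected to be a connected smtplib.SMTP instance (or anything
    with a compatible sendmail method).
    """
    wire = raw_email.encode("utf-8")  # must match the charset declared in the headers
    return smtp.sendmail(sender, recipients, wire)
```

With a real connection this would be called as `send_raw(smtp, "sender@example.com", ["rcpt@example.com"], RAW_EMAIL)`.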

I see from your example that you are trying to use the email package to construct the email, which is good.  But, emails are *binary*, they are not unicode, so passing "message_from_string" a unicode string containing non-ascii isn't going to do what you are expecting, any more than passing unicode to the 'sendmail' function did.  message_from_string is really only useful for doing certain sorts of debug and ought to be deprecated.  Or produce a warning when handed a string containing non-ascii.  (There are historical reasons why it doesn't :(
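For illustration, parsing from bytes instead of from a unicode string could look roughly like this. The sample message is invented, and it assumes the raw bytes already contain proper MIME headers and an RFC 2047 encoded Subject:

```python
import email
from email import policy

# Illustrative wire-format message (bytes, not str).
raw_bytes = (
    "From: sender@example.com\r\n"
    "To: rcpt@example.com\r\n"
    "Subject: =?utf-8?q?Gr=C3=BC=C3=9Fe?=\r\n"
    "MIME-Version: 1.0\r\n"
    'Content-Type: text/plain; charset="utf-8"\r\n'
    "Content-Transfer-Encoding: 8bit\r\n"
    "\r\n"
    "Body with non-ASCII text: café\r\n"
).encode("utf-8")

# policy.default selects the newer API, so the result is an EmailMessage.
msg = email.message_from_bytes(raw_bytes, policy=policy.default)

subject = str(msg["Subject"])  # encoded-word is decoded to unicode
body = msg.get_content()       # body bytes are decoded using the declared charset
```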

And then you should use smtplib's 'send_message' function, which understands email package messages and will Do the Right Thing with them (including the extraction of the to and from addresses your code is currently doing).

However, even if you encoded your raw message to binary and then passed it to message_from_bytes, your example message is *not* RFC compliant: without MIME headers, an email with non-ascii characters in the body is technically in violation of the RFC.  Most email programs will handle that particular message despite that, but not all.  You are better off using the email package to construct a properly RFC formatted email, using the new API (ex: msg = EmailMessage() (not Message), and then doing msg['from'] = address, etc, and msg.set_content(your unicode string body)). I can't really give you much advice here (nor should I, this being a bug tracker :) because I don't know exactly how the data is coming in to your program in your real use case.
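A minimal sketch of that construction, assuming the sender, recipient, subject, and body are already available as separate unicode strings (the helper name `build_message` and the sample values are made up here):

```python
from email.message import EmailMessage


def build_message(sender, recipient, subject, body):
    """Build a MIME-compliant message from unicode strings."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject  # non-ASCII is encoded when the message is serialized
    msg.set_content(body)     # adds the Content-Type and transfer-encoding headers
    return msg


msg = build_message(
    "sender@example.com",
    "rcpt@example.com",
    "Grüße",
    "Body with non-ASCII text: café",
)
wire = msg.as_bytes()  # binary form of the message
```

Here the email package, rather than the caller, decides how the non-ASCII subject and body are represented.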

Once you have a properly constructed EmailMessage object, you should use smtplib's 'send_message' function, which understands email package messages and will Do the Right Thing with them (including the extraction of the to and from addresses your code is currently doing, as well as properly handling BCC, which means deleting BCC headers from the message before sending it, which your code does not do and which 'sendmail' would not do.)
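To see what send_message would hand to the server without actually connecting to one, a dry run along these lines may help. `RecordingSMTP` is a hypothetical test double written for this sketch, not part of smtplib; it overrides the greeting and delivery steps so nothing touches the network.

```python
import smtplib
from email.message import EmailMessage


class RecordingSMTP(smtplib.SMTP):
    """Hypothetical stand-in that records what send_message would transmit."""

    def __init__(self):
        # No host is given, so no connection is attempted.
        super().__init__(local_hostname="localhost")
        self.sent = []

    def ehlo_or_helo_if_needed(self):
        pass  # there is no server to greet in this dry run

    def sendmail(self, from_addr, to_addrs, msg, mail_options=(), rcpt_options=()):
        self.sent.append((from_addr, list(to_addrs), msg))
        return {}


msg = EmailMessage()
msg["From"] = "Sender <sender@example.com>"
msg["To"] = "rcpt@example.com"
msg["Bcc"] = "hidden@example.com"
msg["Subject"] = "Grüße"
msg.set_content("Body with non-ASCII text: café")

smtp = RecordingSMTP()
smtp.send_message(msg)  # addresses are taken from the message headers
from_addr, to_addrs, wire = smtp.sent[0]
```

In this run the Bcc address ends up in the recipient list but not in the serialized message. With a real server, the same `send_message(msg)` call would be made on a connected `smtplib.SMTP` instance instead.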

SMTPUTF8 is about non-ascii in the email *headers*, and most SMTP servers these days do not yet support it[*]. Some of the big ones do, though (I believe gmail does).

[*] although that doesn't explain why what you got was SMTPSenderRefused.  You should have gotten SMTPNotSupportedError.
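One way to avoid asking for an extension the server lacks is to check what it advertised before adding the option. A sketch only: the helper name `utf8_mail_options` is made up, the option string smtplib recognizes is `SMTPUTF8` (no hyphen), and the check is only meaningful after the EHLO exchange has happened on a real connection.

```python
import smtplib


def utf8_mail_options(smtp):
    """Return mail_options that request SMTPUTF8 only if the server advertised it."""
    return ["SMTPUTF8"] if smtp.has_extn("smtputf8") else []


# Unconnected instance, used here only to show the check; a real program
# would pass a connected smtplib.SMTP object after ehlo().
smtp = smtplib.SMTP(local_hostname="localhost")
options = utf8_mail_options(smtp)  # nothing advertised, so no option is requested
```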
History
Date User Action Args
2022-04-11 14:59:32 admin set github: 85195
2020-06-29 10:24:36 jpatel set files: - providing_only_ascii_characters.png
2020-06-18 12:51:03 r.david.murray set status: open -> closed; resolution: works for me; messages: + msg371808; stage: resolved
2020-06-18 09:50:58 jpatel set files: + providing_mail_options_in_sendmail.png
2020-06-18 09:50:41 jpatel set files: + providing_Unicode_characters_in_email_body.png
2020-06-18 09:50:08 jpatel set files: + providing_only_ascii_characters.png; messages: + msg371803
2020-06-18 09:45:34 jpatel create