classification
Title: email.headerregistry.Address blocks Unicode local part addr_spec accepted elsewhere
Type:
Stage:
Components: email
Versions: Python 3.6
process
Status: open
Resolution:
Dependencies:
Superseder:
Assigned To:
Nosy List: barry, dracos, r.david.murray
Priority: normal
Keywords:

Created on 2019-05-12 11:06 by dracos, last changed 2022-04-11 14:59 by admin.

Messages (2)
msg342254 - Author: Matthew (dracos) Date: 2019-05-12 11:06
The parser used when passing an addr_spec to email.headerregistry.Address does not allow non-ASCII local parts, but the rest of the email package handles them fine, either passing them through unchanged (with explicit references to RFC 6532 and SMTPUTF8) or encoding them as expected. Apologies if I've misunderstood something.

>>> import email.headerregistry
>>> from email.message import EmailMessage
>>> msg = EmailMessage()
>>> msg['To'] = 'Matthéw <aé@example.com>'
>>> msg.as_string()
'To: =?utf-8?q?Matth=C3=A9w?= <=?utf-8?q?a=C3=A9?=@example.com>\n\n'
>>> msg['To'].addresses[0]
Address(display_name='Matthéw', username='aé', domain='example.com')
>>> msg['To'].addresses[0].addr_spec
'aé@example.com'
>>> email.headerregistry.Address(addr_spec=msg['To'].addresses[0].addr_spec)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/email/headerregistry.py", line 48, in __init__
    raise a_s.all_defects[0]
email.errors.NonASCIILocalPartDefect: local-part contains non-ASCII characters)
>>>
msg342256 - Author: R. David Murray (r.david.murray) (Python committer) Date: 2019-05-12 13:10
In order to legitimately have a non-ASCII local part, you *must* be using RFC 6532 and RFC 6531.  In the email package you do this by using policy=email.policy.SMTPUTF8, or by setting utf8=True in your custom Policy.  In smtplib you do this by specifying 'SMTPUTF8' in the mail_options list passed to sendmail, or by passing a message whose policy has utf8=True to send_message.
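
A minimal sketch of the email-package side of the paragraph above, assuming a reasonably recent Python 3. It builds the message from the original report and serializes it with the SMTPUTF8 policy, so the local part is expected to be written as raw UTF-8 rather than as an RFC 2047 encoded word. The variable name `rendered` is only illustrative.

```python
from email.message import EmailMessage
from email.policy import SMTPUTF8

# Same header as in the original report.
msg = EmailMessage()
msg['To'] = 'Matthéw <aé@example.com>'

# Serialize with a policy that has utf8=True (RFC 6532 style headers).
rendered = msg.as_string(policy=SMTPUTF8)
print(repr(rendered))
```

Creating the message as `EmailMessage(policy=SMTPUTF8)` should have the same effect on serialization, since `as_string()` then uses the message's own policy by default.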

I notice in answering this report that this is not really documented clearly.  The information is there, but only if you already know how the RFCs work.  Some variation of the text above should be added to the smtplib documentation, and an example of using SMTPUTF8 should be added to the email examples chapter.

However, you are correct, there are a couple of bugs here.

The rendering done by as_string (and as_bytes) is the best that we can do without raising an error...but we should probably be raising an error if the rendering policy does not have utf8=True and we don't have an "original source line" from parsing a message (which is the case here), rather than using the incorrect RFC2047 encoding.
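
Until something like that exists in the library, an application can do a similar check itself before serializing. The following is only a sketch of the idea, not the email package's behaviour: `require_utf8_for_local_parts` and `ADDRESS_HEADERS` are hypothetical names, the header list is an assumption, and `str.isascii()` needs Python 3.7 or later.

```python
from email.message import EmailMessage
from email.policy import SMTPUTF8

# Hypothetical list of headers to inspect; adjust to the application's needs.
ADDRESS_HEADERS = ('From', 'Sender', 'Reply-To', 'To', 'Cc', 'Bcc')

def require_utf8_for_local_parts(msg, policy=None):
    """Raise ValueError if a non-ASCII local part would be serialized
    under a policy that does not have utf8=True (hypothetical helper)."""
    policy = msg.policy if policy is None else policy
    if getattr(policy, 'utf8', False):
        return  # RFC 6532 headers are allowed under this policy.
    for name in ADDRESS_HEADERS:
        for header in msg.get_all(name, []):
            # Address headers expose parsed Address objects via .addresses.
            for addr in getattr(header, 'addresses', ()):
                if not addr.username.isascii():
                    raise ValueError(
                        'non-ASCII local part %r in %s header requires a '
                        'policy with utf8=True' % (addr.username, name))

msg = EmailMessage()
msg['To'] = 'Matthéw <aé@example.com>'
require_utf8_for_local_parts(msg, policy=SMTPUTF8)  # passes silently
```

Calling the helper without the `policy` argument checks against the message's own policy, which for a plain `EmailMessage()` has utf8=False and so is expected to raise for this header.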

The second bug, the one you are reporting, is that we apparently missed the constructor of Address when we were adding RFC 6532 support.  If you look at the comment above that code, it is purposefully trying to raise an error if the addr_spec is invalid and it was provided by the *application* (as opposed to email.parser).  But with RFC 6532 support, it should be valid to have a local part that has non-ASCII characters in an Address, and the error, as I noted above, should be raised only at serialization time and when we don't have an original source string.  So that raise should be modified to explicitly ignore the NonASCIILocalPartDefect.  (Really, Address should take a policy argument.  That's a bigger change, but it would be the "right way" to fix this.)
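
As an application-level workaround in the meantime, the Address can be built from username and domain separately, which is the form the parser itself produced in the original report. The sketch below is not a proposed patch: `address_from_addr_spec` is a hypothetical helper, and splitting on the last '@' is a simplifying assumption that ignores quoting and other addr-spec subtleties.

```python
from email.errors import NonASCIILocalPartDefect
from email.headerregistry import Address

def address_from_addr_spec(addr_spec, display_name=''):
    """Build an Address from an addr_spec, tolerating a non-ASCII
    local part (hypothetical helper, naive '@' splitting)."""
    try:
        return Address(display_name=display_name, addr_spec=addr_spec)
    except NonASCIILocalPartDefect:
        # Fall back to passing the two halves separately; the
        # username/domain form does not go through the addr_spec parser.
        username, sep, domain = addr_spec.rpartition('@')
        if not sep:
            raise
        return Address(display_name=display_name,
                       username=username, domain=domain)

addr = address_from_addr_spec('aé@example.com', 'Matthéw')
print(addr.addr_spec)
```

The helper works whether or not the constructor accepts the non-ASCII addr_spec directly, so it should keep working if the raise is later relaxed.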

Raising the error on serialization could cause some breakage if existing programs are "getting away" with specifying non-ascii local parts but not doing it via addr_spec.  It is breakage that should happen, I think, but we may want to only do it in a feature release.
History
Date                 User            Action  Args
2022-04-11 14:59:15  admin           set     github: 81074
2019-05-12 13:10:20  r.david.murray  set     messages: + msg342256
2019-05-12 11:06:18  dracos          create