classification
Title: Email address display name fails with both encoded words and special chars
Type: behavior Stage: patch review
Components: email Versions: Python 3.9, Python 3.8, Python 3.7
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: barry, bsiem, maxking, r.david.murray
Priority: normal Keywords: patch

Created on 2019-07-02 11:51 by bsiem, last changed 2019-07-14 16:35 by bsiem.

Files
File name Uploaded Description
email_header_test.py bsiem, 2019-07-02 11:51
Pull Requests
URL Status Linked
PR 14561 open python-dev, 2019-07-02 18:51
Messages (5)
msg347136 - Author: B Siemerink (bsiem) Date: 2019-07-02 11:51
Special characters in email headers are normally put within double quotes. However, encoded words (=?charset?x?...?=) are not allowed within double quotes. When the header contains a word with special characters and another word that must be encoded, the first one must also be encoded.

In the following example, the display name in the From header is quoted, so the comma is allowed; in the To header, the comma is neither quoted nor encoded, which is invalid, so the header is rejected.

From: "Foo Bar, France" <foo@example.com>
To: Foo Bar, =?utf-8?q?Espa=C3=B1a?= <foo@example.com>
msg347628 - Author: R. David Murray (r.david.murray) (Python committer) Date: 2019-07-10 17:57
FYI, it would have been most helpful if you had posted your example in the issue text instead of as an attached file, as it explains the problem better than your text does :)

Here is a minimal reproducer:

>>> from email.message import EmailMessage
>>> from email.policy import strict
>>> m = EmailMessage(policy=strict)
>>> m['From'] = '"Foo Bar, España" <foo@example.com>'
>>> bytes(m)
b'From: Foo Bar, =?utf-8?q?Espa=C3=B1a?= <foo@example.com>\n\n'

This serialization of the header is, as you say, invalid.  Either the comma should be encoded, or the "Foo Bar," should be in quotes.
msg347634 - Author: B Siemerink (bsiem) Date: 2019-07-10 18:43
Hello David, thank you for the suggestion.

Regarding your comment:
> Either the comma should be encoded, or the "Foo Bar," should be in quotes.

According to RFC 5322, the display name cannot contain both a quoted part and an encoded word, so the only option is to encode the comma.

Please let me know if I can do anything else.
msg347637 - Author: R. David Murray (r.david.murray) (Python committer) Date: 2019-07-10 20:11
The display name is a phrase, and a phrase is a sequence of words, and a word is either a quoted string or an atom.  So it is legal to mix quoted strings and encoded words in a display name.  I'd vote to do whichever one is easier to implement :)  (I haven't looked at your PR yet and unfortunately my time is limited :(
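
To illustrate the grammar point above, here is a rough sketch that parses a display name mixing a quoted string with an encoded word, using the default policy. The header value is made up for this example (it is not taken from the attached test file or the PR), and the exact whitespace in the decoded name may differ between Python versions.

```python
from email import message_from_string
from email.policy import default

# Hypothetical header: a quoted-string word ("Foo Bar,") followed by an
# encoded-word atom, which together form a single phrase.
raw = 'From: "Foo Bar," =?utf-8?q?Espa=C3=B1a?= <foo@example.com>\n\n'

msg = message_from_string(raw, policy=default)
address = msg['From'].addresses[0]

# The parser should treat both words as part of one display name.
print(address.display_name)
print(address.addr_spec)
```

If the parser accepts the mix, both words end up in one display name and only a single address is reported.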
msg347925 - Author: B Siemerink (bsiem) Date: 2019-07-14 16:35
Yes, you are right! The fix is to encode the special characters.
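
For comparison, a small sketch of what "encode the special characters" can look like, using the legacy email.utils.formataddr helper. This is not the code from the PR; it only shows that when the whole display name goes into an encoded word, the comma is encoded along with the non-ASCII text. The variable names are just for illustration.

```python
from email.header import decode_header, make_header
from email.utils import formataddr

display_name = 'Foo Bar, España'

# formataddr falls back to an RFC 2047 encoded word when the name is not
# pure ASCII, so the comma is encoded together with the rest of the name.
header_value = formataddr((display_name, 'foo@example.com'))
print(header_value)

# Split off the encoded display name and decode it again as a round-trip check.
encoded_name = header_value.rsplit(' <', 1)[0]
decoded_name = str(make_header(decode_header(encoded_name)))
print(decoded_name)
```

The encoded form contains no bare comma, so it cannot be mistaken for an address separator, and decoding it gives back the original name.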
History
Date User Action Args
2019-07-14 16:35:42 bsiem set messages: + msg347925
2019-07-10 20:11:31 r.david.murray set messages: + msg347637
2019-07-10 18:43:38 bsiem set messages: + msg347634
2019-07-10 17:57:37 r.david.murray set messages: + msg347628
2019-07-02 18:55:38 bsiem set title: Email header fails with both encoded words and special chars -> Email address display name fails with both encoded words and special chars
2019-07-02 18:51:56 python-dev set keywords: + patch; stage: patch review; pull_requests: + pull_request14378
2019-07-02 15:22:56 xtreak set nosy: + maxking
2019-07-02 11:53:55 SilentGhost set versions: - Python 3.5, Python 3.6
2019-07-02 11:51:15 bsiem create