Title: Email address display name fails with both encoded words and special chars
Type: behavior Stage: resolved
Components: email Versions: Python 3.9, Python 3.8, Python 3.7
Status: closed Resolution: fixed
Assigned To: Nosy List: barry, bsiem, maxking, miss-islington, ned.deily, r.david.murray
Priority: normal Keywords: patch

Created on 2019-07-02 11:51 by bsiem, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Messages (10)
msg347136 - (view) Author: B Siemerink (bsiem) * Date: 2019-07-02 11:51
Special characters in email headers are normally put within double quotes. However, encoded words (=?charset?x?...?=) are not allowed within double quotes. When the header contains a word with special characters and another word that must be encoded, the first one must also be encoded.

In the following example, the From header is quoted and therefore the comma is allowed; in the To header, the comma is neither within quotes nor encoded, which is not allowed, so the header is rejected.

From: "Foo Bar, France" <>
To: Foo Bar, =?utf-8?q?Espa=C3=B1a?= <>
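
For comparison, here is a small sketch of the two valid forms using email.utils.formataddr, which is a different code path from the header folding this issue is about. The address foo@example.com is a placeholder assumed for illustration, since the addresses in the example above are empty. An ASCII-only name with a comma comes out quoted; a name that needs encoding comes out as an encoded word with the comma encoded along with the rest.

```python
from email.header import decode_header, make_header
from email.utils import formataddr

# ASCII-only display name with a comma: formataddr wraps it in double quotes.
quoted = formataddr(("Foo Bar, France", "foo@example.com"))

# Non-ASCII display name: formataddr encodes the whole name as an
# RFC 2047 encoded word, so the comma is encoded together with the rest.
encoded = formataddr(("Foo Bar, España", "foo@example.com"))

# Split off the display-name part and decode it back to text.
encoded_name = encoded.rsplit(" <", 1)[0]
decoded_name = str(make_header(decode_header(encoded_name)))

print(quoted)
print(encoded)
print(decoded_name)
```

In the encoded form no literal comma is left in the display name, so it cannot be mistaken for an address separator.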
msg347628 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2019-07-10 17:57
FYI, it would have been most helpful if you had posted your example in the issue text instead of as an attached file, as it explains the problem better than your text does :)

Here is a minimal reproducer:

>>> from email.message import EmailMessage
>>> from email.policy import strict
>>> m = EmailMessage(policy=strict)
>>> m['From'] = '"Foo Bar, España" <>'
>>> bytes(m)
b'From: Foo Bar, =?utf-8?q?Espa=C3=B1a?= <>\n\n'

This serialization of the header is, as you say, invalid.  Either the comma should be encoded, or the "Foo Bar," should be in quotes.
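
A rough way to check a serialized header for this problem is to look for a comma outside any double-quoted string. The sketch below is only a heuristic, not a full RFC 5322 parser; has_unquoted_comma is a hypothetical helper, and foo@example.com is a placeholder address assumed for illustration. The last part prints whatever the running interpreter produces, which should show no unquoted comma on a version that includes the fix.

```python
import re
from email.message import EmailMessage


def has_unquoted_comma(header_line):
    """Return True if a comma appears outside any double-quoted string."""
    # Drop quoted strings (allowing backslash escapes), then look for ','.
    without_quoted = re.sub(r'"(?:[^"\\]|\\.)*"', "", header_line)
    return "," in without_quoted


# The two serializations discussed in this issue.
buggy = "From: Foo Bar, =?utf-8?q?Espa=C3=B1a?= <>"
valid = 'From: "Foo Bar, France" <>'
print(has_unquoted_comma(buggy))
print(has_unquoted_comma(valid))

# What the running interpreter serializes (default policy, placeholder address).
m = EmailMessage()
m["From"] = '"Foo Bar, España" <foo@example.com>'
serialized = bytes(m).decode("ascii")
print(serialized.rstrip("\n"))
print(has_unquoted_comma(serialized))
```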
msg347634 - (view) Author: B Siemerink (bsiem) * Date: 2019-07-10 18:43
Hello David, thank you for the suggestion.

Regarding your comment:
> Either the comma should be encoded, or the "Foo Bar," should be in quotes.

According to RFC 5322 the display name cannot contain both a quoted part and an encoded word, so the only option is to encode the comma.

Please let me know if I can do anything else.
msg347637 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2019-07-10 20:11
The display name is a phrase, and a phrase is a sequence of words, and a word is either a quoted string or an atom.  So it is legal to mix quoted strings and encoded words in a display name.  I'd vote to do whichever one is easier to implement :)  (I haven't looked at your PR yet and unfortunately my time is limited :(
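
To make the grammar point concrete, here is a simplified sketch that splits a display name into words and labels each one. It assumes no comments, no folding and no obsolete syntax, and the names classify and is_phrase are hypothetical. Syntactically an encoded word is just an atom, so a quoted string followed by an encoded word is a valid phrase, while a bare comma is not.

```python
import re

# Simplified patterns; not a full RFC 5322 implementation.
QUOTED = r'"(?:[^"\\]|\\.)*"'
ATOM = r"[A-Za-z0-9!#$%&'*+\-/=?^_`{|}~]+"
TOKEN = re.compile(
    "(?P<quoted>" + QUOTED + ")"
    "|(?P<atom>" + ATOM + ")"
    r"|(?P<space>\s+)"
    "|(?P<special>.)"
)


def classify(display_name):
    """Return (kind, text) pairs for each non-whitespace token."""
    tokens = []
    for match in TOKEN.finditer(display_name):
        kind = match.lastgroup
        if kind != "space":
            tokens.append((kind, match.group()))
    return tokens


def is_phrase(display_name):
    """True if every token is a quoted string or an atom."""
    tokens = classify(display_name)
    return bool(tokens) and all(kind in ("quoted", "atom") for kind, _ in tokens)


mixed = '"Foo Bar," =?utf-8?q?Espa=C3=B1a?='
bare_comma = "Foo Bar, =?utf-8?q?Espa=C3=B1a?="
print(classify(mixed))
print(is_phrase(mixed))
print(is_phrase(bare_comma))
```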
msg347925 - (view) Author: B Siemerink (bsiem) * Date: 2019-07-14 16:35
Yes, you are right! The fix is to encode the special characters.
msg350128 - (view) Author: miss-islington (miss-islington) Date: 2019-08-21 23:00
New changeset df0c21ff46c5c37b6913828ef8c7651f523432f8 by Miss Islington (bot) (bsiem) in branch 'master':
bpo-37482: Fix email address name with encoded words and special chars (GH-14561)
msg350130 - (view) Author: miss-islington (miss-islington) Date: 2019-08-21 23:21
New changeset c5bba853d5e7836f6d4340e18721d3fb3a6ee0f7 by Miss Islington (bot) in branch '3.7':
bpo-37482: Fix email address name with encoded words and special chars (GH-14561)
msg350707 - (view) Author: Ned Deily (ned.deily) * (Python committer) Date: 2019-08-29 04:47
New changeset bd127b1b7dd50c76c4419d9c87c12901527d19da by Ned Deily (bsiem) in branch '3.8':
[3.8] bpo-37482: Fix email address name with encoded words and special chars (GH-14561) (GH-15380)
msg350711 - (view) Author: Ned Deily (ned.deily) * (Python committer) Date: 2019-08-29 04:58
I manually merged the stalled 3.8 backport to make 3.8.0b4.  Can this issue now be closed?
msg350736 - (view) Author: B Siemerink (bsiem) * Date: 2019-08-29 06:56
Thank you all!
