This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

classification
Title: EmailMessage bad encoding for international domain
Type: behavior Stage: needs patch
Components: email Versions: Python 3.11, Python 3.10, Python 3.9
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: barry, drlazor8, iritkatriel, r.david.murray
Priority: normal Keywords: patch

Created on 2020-02-26 09:36 by drlazor8, last changed 2022-04-11 14:59 by admin.

Pull Requests
URL Status Linked Edit
PR 18667 closed drlazor8, 2020-02-26 10:30
Messages (5)
msg362687 - (view) Author: Julien Castiaux (drlazor8) * Date: 2020-02-26 09:36
Affected Python versions: 3.5 and above (I tested all of them except 3.9)

Steps to reproduce:

  from email.message import EmailMessage
  from email.policy import SMTP

  msg = EmailMessage(policy=SMTP)
  msg['To'] = 'Joe <joe@examplé.com>'  # notice the é in the domain
  print(msg.as_string())

It prints

    To: "Joe <joe@=?utf-8?q?exampl=C3=A9?=.com>"

But it should be

    To: "Joe <joe@xn--exampl-gva.com>"

While b64/qp can be used to encode most non-ASCII headers, the domain part of an email address is an exception. According to IDNA2008 (RFC 5890, RFC 5891), a non-ASCII domain should be encoded using the Punycode algorithm and the ACE prefix ("xn--").
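
For illustration, the expected domain above can be reproduced with the standard library's "idna" codec. This is only a sketch: the helper name `to_ace` is made up for this example, and the stdlib codec implements the older IDNA 2003 rules (RFC 3490) rather than IDNA2008, although both give the same result for this particular domain.

```python
def to_ace(domain: str) -> str:
    """Return the ASCII-compatible (ACE) form of a domain name.

    Hypothetical helper for illustration only. It relies on the stdlib
    "idna" codec, which follows IDNA 2003, not IDNA2008.
    """
    # Each dot-separated label is Punycode-encoded and prefixed with "xn--"
    # when it contains non-ASCII characters; ASCII labels pass through.
    return domain.encode("idna").decode("ascii")


print(to_ace("examplé.com"))  # xn--exampl-gva.com
```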
msg362801 - (view) Author: Julien Castiaux (drlazor8) * Date: 2020-02-27 14:15
Duplicate of https://bugs.python.org/issue39757
msg362802 - (view) Author: Julien Castiaux (drlazor8) * Date: 2020-02-27 14:17
Whoops, wrong copy/paste. Here is the correct link: https://bugs.python.org/issue11783
msg362906 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2020-02-28 19:07
This is not actually a duplicate of 11783.  Rereading (parts of) that issue, we decided we currently have no good way to do automatic conversion between unicode and internationalized domains, so the user of the library has to do it themselves.  This means that the bug *here* is that the new email API is *wrongly* encoding the non-ascii in the domain by using an encoded word.  I'm surprised at that; I thought I'd guarded against it.
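
A possible way for a library user to do that conversion themselves is sketched below. It assumes the stdlib "idna" codec (IDNA 2003) is acceptable for the domain in question; `ascii_address` is a hypothetical helper name, not part of the email package.

```python
from email.headerregistry import Address
from email.message import EmailMessage
from email.policy import SMTP


def ascii_address(display_name: str, username: str, domain: str) -> Address:
    """Build an Address whose domain is already in ACE form.

    Hypothetical helper: the caller converts the domain up front, so the
    email package never sees non-ASCII characters in the domain part.
    """
    ace_domain = domain.encode("idna").decode("ascii")
    return Address(display_name=display_name, username=username, domain=ace_domain)


msg = EmailMessage(policy=SMTP)
msg["To"] = ascii_address("Joe", "joe", "examplé.com")
print(msg["To"])
```

Passing an `Address` object instead of a preformatted string keeps the display name, local part, and domain separate, so only the domain is converted.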

What should be happening here is that an error should be raised when that header is set (or possibly when it is accessed/serialized, but when set would be better I think) saying that there is non-ascii in the domain part.
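
To make the proposed behavior concrete, here is a rough sketch of such a check written as a standalone function. It is not the actual patch and does not reflect how the header parser is structured internally; `check_ascii_domain` is a name invented for this example.

```python
from email.utils import parseaddr


def check_ascii_domain(address: str) -> None:
    """Raise ValueError if the domain part of `address` contains non-ASCII.

    Illustrative only: a real fix would live inside the email package's
    header handling rather than in a separate helper like this one.
    """
    _, addr_spec = parseaddr(address)           # e.g. 'joe@examplé.com'
    _, _, domain = addr_spec.rpartition("@")    # text after the last '@'
    if not domain.isascii():
        raise ValueError(f"non-ASCII domain in address: {domain!r}")


check_ascii_domain("Joe <joe@xn--exampl-gva.com>")  # passes silently
try:
    check_ascii_domain("Joe <joe@examplé.com>")
except ValueError as exc:
    print("rejected:", exc)
```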
msg408472 - (view) Author: Irit Katriel (iritkatriel) * (Python committer) Date: 2021-12-13 17:50
Reproduced on 3.11.
History
Date User Action Args
2022-04-11 14:59:27 admin set github: 83938
2021-12-13 17:50:29 iritkatriel set nosy: + iritkatriel
    messages: + msg408472
    versions: + Python 3.9, Python 3.10, Python 3.11, - Python 3.5
2020-02-28 19:07:10 r.david.murray set status: closed -> open
    title: EmailMessage wrong encoding for international domain -> EmailMessage bad encoding for international domain
    superseder: email parseaddr and formataddr should be IDNA aware ->
    messages: + msg362906
    resolution: duplicate ->
    stage: resolved -> needs patch
2020-02-27 15:55:53 SilentGhost set superseder: email parseaddr and formataddr should be IDNA aware
2020-02-27 14:17:13 drlazor8 set messages: + msg362802
2020-02-27 14:15:37 drlazor8 set status: open -> closed
    resolution: duplicate
    messages: + msg362801
    stage: patch review -> resolved
2020-02-26 10:30:28 drlazor8 set keywords: + patch
    stage: patch review
    pull_requests: + pull_request18023
2020-02-26 09:36:15 drlazor8 create