classification
Title: EmailMessage bad encoding for international domain
Type: behavior Stage: needs patch
Components: email Versions: Python 3.5
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: Julien Castiaux, barry, r.david.murray
Priority: normal Keywords: patch

Created on 2020-02-26 09:36 by Julien Castiaux, last changed 2020-02-28 19:07 by r.david.murray.

Pull Requests
URL Status Linked Edit
PR 18667 closed Julien Castiaux, 2020-02-26 10:30
Messages (4)
msg362687 - (view) Author: Julien Castiaux (Julien Castiaux) * Date: 2020-02-26 09:36
Affected Python versions: 3.5 and above (I tested all of them except 3.9)

Steps to reproduce:

  from email.message import EmailMessage
  from email.policy import SMTP

  msg = EmailMessage(policy=SMTP)
  msg['To'] = 'Joe <joe@examplé.com>'  # notice the é in the domain
  print(msg.as_string())

It prints

    To: "Joe <joe@=?utf-8?q?exampl=C3=A9?=.com>"

But it should be

    To: "Joe <joe@xn--exampl-gva.com>"

While b64/qp can be used to encode most non-ASCII headers, the domain part of an email address is an exception. According to IDNA2008 (RFC 5890, RFC 5891), a non-ASCII domain should be encoded using the Punycode algorithm and the ACE prefix.
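For illustration, a minimal sketch of the expected domain encoding. It assumes Python's built-in "idna" codec, which implements the older IDNA2003 rules (RFC 3490) rather than IDNA2008; for this particular domain it produces the ACE form shown above.

```python
# Convert a non-ASCII domain to its ASCII-compatible encoding (ACE).
# Assumption: the built-in "idna" codec (IDNA2003) is good enough here;
# strict IDNA2008 behaviour would need a third-party implementation.
domain = "examplé.com"
ace_domain = domain.encode("idna").decode("ascii")
print(ace_domain)  # xn--exampl-gva.com
```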
msg362801 - (view) Author: Julien Castiaux (Julien Castiaux) * Date: 2020-02-27 14:15
Duplicate of https://bugs.python.org/issue39757
msg362802 - (view) Author: Julien Castiaux (Julien Castiaux) * Date: 2020-02-27 14:17
Whoops, wrong copy/paste. Here is the correct link: https://bugs.python.org/issue11783
msg362906 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2020-02-28 19:07
This is not actually a duplicate of 11783. Rereading (parts of) that issue, I see that we decided we currently have no good way to do automatic conversion between Unicode and internationalized domains, so the user of the library has to do it themselves. This means that the bug *here* is that the new email API is *wrongly* encoding the non-ASCII in the domain by using an encoded word. I'm surprised at that; I thought I'd guarded against it.

What should be happening here is that an error should be raised when that header is set (or possibly when it is accessed/serialized, but when set would be better I think) saying that there is non-ASCII in the domain part.
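
Until that is fixed, a user-side workaround might look like the sketch below. The helpers `ace_address` and `check_ascii_domain` are hypothetical names, not part of the email package, and the conversion relies on the built-in "idna" codec (IDNA2003), which is an assumption rather than something the library does for you.

```python
from email.headerregistry import Address
from email.message import EmailMessage
from email.policy import SMTP


def check_ascii_domain(domain):
    """Hypothetical guard: reject a domain that contains non-ASCII."""
    try:
        domain.encode("ascii")
    except UnicodeEncodeError:
        raise ValueError("non-ASCII in domain part: %r" % domain)


def ace_address(display_name, username, domain):
    """Hypothetical helper: build an Address with an ASCII-only domain."""
    # The built-in "idna" codec implements IDNA2003, not IDNA2008.
    ascii_domain = domain.encode("idna").decode("ascii")
    check_ascii_domain(ascii_domain)
    return Address(display_name, username, ascii_domain)


msg = EmailMessage(policy=SMTP)
msg["To"] = ace_address("Joe", "joe", "examplé.com")
print(msg["To"])  # Joe <joe@xn--exampl-gva.com>
```

Doing the conversion before the header is set keeps the domain out of the encoded-word path entirely, since the header value is pure ASCII by then.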
History
Date User Action Args
2020-02-28 19:07:10  r.david.murray  set  status: closed -> open
    title: EmailMessage wrong encoding for international domain -> EmailMessage bad encoding for international domain
    superseder: email parseaddr and formataddr should be IDNA aware ->
    messages: + msg362906
    resolution: duplicate ->
    stage: resolved -> needs patch
2020-02-27 15:55:53  SilentGhost  set  superseder: email parseaddr and formataddr should be IDNA aware
2020-02-27 14:17:13  Julien Castiaux  set  messages: + msg362802
2020-02-27 14:15:37  Julien Castiaux  set  status: open -> closed
    resolution: duplicate
    messages: + msg362801
    stage: patch review -> resolved
2020-02-26 10:30:28  Julien Castiaux  set  keywords: + patch
    stage: patch review
    pull_requests: + pull_request18023
2020-02-26 09:36:15  Julien Castiaux  create