This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

classification
Title: EmailMessage may need to support RFC-non-compliant MIME parameter encoding (encoded words in quotes) for output.
Type: enhancement Stage: needs patch
Components: email Versions: Python 3.8, Python 3.7
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: barry, dorosch, hwgdb Smith, r.david.murray
Priority: normal Keywords:

Created on 2020-02-27 07:12 by hwgdb Smith, last changed 2022-04-11 14:59 by admin.

Files
File name: uploaded by, date
email bug.rar: hwgdb Smith, 2020-02-27 14:43
foxmail_screenshot.jpeg: hwgdb Smith, 2020-02-28 01:21
outlook_screenshot.jpeg: hwgdb Smith, 2020-02-28 01:23
Messages (17)
msg362780 - Author: hwgdb Smith (hwgdb Smith) Date: 2020-02-27 07:12
Here is the relevant part of my code:
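    import mimetypes
    from email.message import EmailMessage

    msg = EmailMessage()
    file_name = "超e保3000P.csv"
    # Guess the MIME type from the extension; fall back to a generic type.
    ctype, encoding = mimetypes.guess_type(file_name)
    if ctype is None or encoding is not None:
        ctype = "application/octet-stream"
    maintype, subtype = ctype.split("/", 1)

    with open(file_name, "rb") as f:
        # Three-tuple form (charset, language, value), asking for GBK.
        msg.add_attachment(f.read(), maintype=maintype, subtype=subtype,
                           filename=("GBK", "", file_name))

    print(msg.as_string())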


The file name contains non-ASCII characters, so I used the three-tuple form of filename with the GBK charset, but the output of msg.as_string() doesn't change.
When I print(msg.as_string()), I find the filename is filename*=utf-8''%E8%B6 ......, so the encoding is not the one I asked for. And of course, after sending the message, the attachment's file name is displayed incorrectly in my mail client and in webmail.
But when I use the legacy API and generate the filename with the Header class, it works.
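For comparison, here is a minimal sketch of the legacy-API approach described above. It assumes the attachment is built with MIMEApplication and the name is pre-encoded with email.header.Header; the reporter's actual legacy code is not shown in the thread, and placeholder bytes stand in for the real file.

```python
from email.header import Header
from email.mime.application import MIMEApplication
from email.mime.multipart import MIMEMultipart

FILE_NAME = "超e保3000P.csv"
DATA = b"col1,col2\n1,2\n"  # placeholder bytes instead of the real CSV file

outer = MIMEMultipart()
part = MIMEApplication(DATA)  # legacy API: compat32 policy, base64 body

# Header(...).encode() produces an RFC 2047 encoded word; the legacy
# add_header() then wraps it in double quotes as a plain parameter value.
encoded_name = Header(FILE_NAME, "utf-8").encode()
part.add_header("Content-Disposition", "attachment", filename=encoded_name)
outer.attach(part)

print(outer.as_string())
```

The serialized header carries the quoted encoded-word form, filename="=?utf-8?b?...?=", which is the "legacy api" value the reporter quotes in msg362808.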
msg362781 - Author: hwgdb Smith (hwgdb Smith) Date: 2020-02-27 07:24
By "but msg.as_string() doesn't change", I mean that whether I use

  filename=file_name  
or
  filename=("GBK", "", f"{file_name}")
or
  filename=("utf-8", "", f"{file_name}")

the output of msg.as_string() is the same.
msg362792 - Author: Andrei Daraschenka (dorosch) * Date: 2020-02-27 11:17
Hello, could you please attach a minimal working example that reproduces it?
msg362804 - Author: hwgdb Smith (hwgdb Smith) Date: 2020-02-27 14:43
I have just uploaded it. Thank you.
msg362805 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2020-02-27 14:48
I think you are saying that you want the charset in the encoded filename to be GBK rather than utf-8?  utf-8 should certainly display correctly in your email client, though, so if it is not there is something else going wrong.  

As far as the 3 tuple not working to set the charset...I believe what is happening there is that a header created by the application gets "refolded" on serialization, and refolding doesn't keep the existing charset, it converts everything to utf-8.  This is an intentional part of the design: the library handles the gory details of MIME and uses utf-8 as the charset for application created content.  It is actually an accident of the implementation that the tuple form of the filename is even accepted; you will note that it is *not* documented in the contentmanager docs.

It wouldn't be crazy to ask for this as a feature, and it could even be treated as a bug that it doesn't work if we want to, but it may not be easy to "fix", because it goes against the design philosophy of the new API.
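The refolding described above can be observed with a short sketch. It assumes a Python version with the behavior reported here (3.7/3.8 per the issue metadata) and the default policy, and uses placeholder bytes and application/octet-stream so the result does not depend on the local mimetypes database. The same name is attached once as a plain string and once as the undocumented ("GBK", "", name) three-tuple, and the serialized Content-Disposition headers are compared.

```python
from email.message import EmailMessage

FILE_NAME = "超e保3000P.csv"
DATA = b"col1,col2\n1,2\n"  # placeholder bytes instead of the real CSV file


def serialized_disposition(filename):
    """Attach DATA under `filename` and return the Content-Disposition
    header as the generator would write it."""
    msg = EmailMessage()
    msg.add_attachment(DATA, maintype="application", subtype="octet-stream",
                       filename=filename)
    part = msg.get_payload()[0]  # the single attachment part
    # Policy.fold() is what the generator calls to render each header.
    return part.policy.fold("Content-Disposition", part["Content-Disposition"])


plain = serialized_disposition(FILE_NAME)
gbk = serialized_disposition(("GBK", "", FILE_NAME))  # charset request in tuple
print(plain)
print(gbk)
```

If the explanation above is right, both calls print the same utf-8 header, so the GBK charset in the tuple has no visible effect on the output.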
msg362806 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2020-02-27 14:57
Actually, given that the contentmanager does accept a charset parameter for text content, it does seem reasonable to treat this as a bug.  But as I said, fixing it may not be trivial.
msg362808 - Author: hwgdb Smith (hwgdb Smith) Date: 2020-02-27 15:31
With utf-8 the file name doesn't display correctly in my mail client.
So I thought GBK might work, and I tried to change the Content-Disposition filename to GBK.
Just now I printed MIMEMultipart.as_string() from the legacy API and found that it uses utf-8 too. The difference is:
legacy API:   filename="=?utf-8?b?6LaFZeS/nTMwMDBQLmNzdg==?="
EmailMessage: filename*=utf-8''.%2F%E8%B6%85e%E4%BF%9D3000P.csv

So the charset is not what causes the display error. But I still don't know why. Is it the Base64?
msg362814 - Author: hwgdb Smith (hwgdb Smith) Date: 2020-02-27 16:09
Why are there two different representations of the same file name? The name displays incorrectly when I use the EmailMessage API representation.
msg362836 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2020-02-27 20:14
The legacy API appears to be using an RFC-incorrect (but common) encoded-word encoding, while the new API is using the RFC-compliant MIME-parameter encoding (% encoding).  Which email client are you using?
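To make the two styles concrete, here is a sketch using only standard-library helpers: email.header.Header and decode_header for the RFC 2047 encoded word, and email.utils.encode_rfc2231 with urllib.parse.unquote for the RFC 2231 percent encoding. It shows that both values quoted in msg362808 carry the same UTF-8 bytes. Decoding the RFC 2231 value also reveals a "./" prefix, which suggests a path rather than the bare name was passed in that run; that is an inference from the data, not something stated in the thread.

```python
import urllib.parse
from email.header import Header, decode_header
from email.utils import encode_rfc2231

FILE_NAME = "超e保3000P.csv"

# Style 1 (legacy API): an RFC 2047 encoded word, later placed inside a
# quoted parameter value. Common in practice, but not what the RFCs specify.
encoded_word = Header(FILE_NAME, "utf-8").encode()

# Style 2 (EmailMessage): RFC 2231 charset'language'percent-encoded bytes.
rfc2231_value = encode_rfc2231(FILE_NAME, "utf-8", "")

print(f'filename="{encoded_word}"')
print(f"filename*={rfc2231_value}")

# Decode the two values quoted in msg362808 back to text.
legacy_value = "=?utf-8?b?6LaFZeS/nTMwMDBQLmNzdg==?="
new_api_value = "utf-8''.%2F%E8%B6%85e%E4%BF%9D3000P.csv"

[(raw_bytes, word_charset)] = decode_header(legacy_value)
legacy_name = raw_bytes.decode(word_charset)

param_charset, _language, quoted = new_api_value.split("'", 2)
new_api_name = urllib.parse.unquote(quoted, encoding=param_charset)

print(legacy_name)   # 超e保3000P.csv
print(new_api_name)  # ./超e保3000P.csv
```

So Base64 versus percent encoding is only the transport form; the display problem comes from which of the two parameter styles a given client understands.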
msg362857 - Author: hwgdb Smith (hwgdb Smith) Date: 2020-02-28 01:21
Microsoft outlook 20116 MSO(16.0.4266.10001) x64
Foxmail 7.2 (build 7.026)

I use these two email clients, and both display the file name incorrectly. I have uploaded screenshots.
msg362858 - Author: hwgdb Smith (hwgdb Smith) Date: 2020-02-28 01:23
Microsoft outlook 2016 MSO(16.0.4266.10001) x64
msg362903 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2020-02-28 18:47
Since Outlook is one of the mailers that generates the non-RFC-compliant headers, it doesn't surprise me all that much that it can't interpret the RFC compliant headers correctly.

I'm not sure there is anything we can do here.

I suppose someone could do a survey of mail clients and document which ones can handle which style of parameter encoding.  If it turns out more handle the "wrong" way than handle the "right" way, we could consider adopting the de facto standard, although I won't like it much :)

(There is also a possibility there is a bug in our RFC compliance, but this is the first problem report I've seen.)
msg362921 - Author: hwgdb Smith (hwgdb Smith) Date: 2020-02-28 22:04
I think a program's goal is to solve problems, not to satisfy the "standard".

OK, if you insist that the "standard" has top priority, could you please tell me a way to change the default behavior of the new API to use the "=?utf-8?b?" parameter style? Is there a function or parameter I can use to achieve this?

If not, I think the best solution is to add a "param style" parameter that lets me choose which style to use.

And if not, I am sad about this; I will have to use the legacy API.
msg362922 - Author: hwgdb Smith (hwgdb Smith) Date: 2020-02-28 22:20
https://litmus.com/blog/infographic-the-2019-email-client-market-share

And here is a survey of email client market share. As you can see, Outlook is in the top 3.
msg362924 - Author: hwgdb Smith (hwgdb Smith) Date: 2020-02-28 22:24
And I just sent a mail to my Gmail account. Viewed in the web client, it displays incorrectly!
msg362927 - Author: hwgdb Smith (hwgdb Smith) Date: 2020-02-28 22:27
Sorry, the Gmail web client displays it correctly.
msg362991 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2020-02-29 16:54
I actually agree: if most (by market share) MUAs handle the RFC-incorrect parameter encoding style, and a significant portion does not handle the RFC correct style, then we should support the de-facto standard rather than the official standard as the default.  I just wish Microsoft would write better software :)  If on the other hand it is only Microsoft out of the big market share players that is broken, I'm not sure I'd want it to be the default.  But we could still support it optionally.

So yeah, we could have a policy control that governs which one is actually used.

So this is a feature request, and ideally should be supported by an investigation of what MUAs support what, by market share.  And there's another question: does this only affect the filename parameter, or is it all MIME parameters?  I would expect it to be the latter, but someone should check at least a few examples of that to be sure.
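One way to start on the last question is to check what the library itself emits for a parameter other than filename. A minimal sketch, assuming the default policy: a non-ASCII name parameter (an arbitrary illustrative choice) is set on Content-Type, and the folded header shows whether it is also written in RFC 2231 form. This only covers Python's output side; which MUAs accept which style still needs the survey described above.

```python
from email.message import EmailMessage

FILE_NAME = "超e保3000P.csv"

msg = EmailMessage()
msg.set_content(b"col1,col2\n1,2\n", maintype="application",
                subtype="octet-stream")  # no filename, so no Content-Disposition
# Put the non-ASCII value on a Content-Type parameter instead of filename.
msg.set_param("name", FILE_NAME)

# Render the header the way the generator would.
content_type = msg.policy.fold("Content-Type", msg["Content-Type"])
print(content_type)
```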
History
Date  User  Action  Args
2022-04-11 14:59:27  admin  set  github: 83952
2020-02-29 16:54:25  r.david.murray  set  type: behavior -> enhancement
    title: EmailMessage.add_header doesn't work -> EmailMessage may need to support RFC-non-compliant MIME parameter encoding (encoded words in quotes) for output.
    messages: + msg362991
    stage: needs patch
2020-02-28 22:27:47  hwgdb Smith  set  messages: + msg362927
2020-02-28 22:24:46  hwgdb Smith  set  messages: + msg362924
2020-02-28 22:20:42  hwgdb Smith  set  messages: + msg362922
2020-02-28 22:04:15  hwgdb Smith  set  messages: + msg362921
2020-02-28 18:47:55  r.david.murray  set  messages: + msg362903
2020-02-28 01:23:09  hwgdb Smith  set  files: + outlook_screenshot.jpeg
    messages: + msg362858
2020-02-28 01:21:18  hwgdb Smith  set  files: + foxmail_screenshot.jpeg
    messages: + msg362857
2020-02-27 20:14:33  r.david.murray  set  messages: + msg362836
2020-02-27 16:09:44  hwgdb Smith  set  messages: + msg362814
2020-02-27 15:31:58  hwgdb Smith  set  messages: + msg362808
2020-02-27 14:57:04  r.david.murray  set  messages: + msg362806
2020-02-27 14:48:42  r.david.murray  set  messages: + msg362805
2020-02-27 14:43:56  hwgdb Smith  set  files: + email bug.rar
    messages: + msg362804
2020-02-27 11:17:32  dorosch  set  nosy: + dorosch
    messages: + msg362792
2020-02-27 07:24:22  hwgdb Smith  set  messages: + msg362781
2020-02-27 07:12:32  hwgdb Smith  create