msg362780 - (view) |
Author: hwgdb Smith (hwgdb Smith) |
Date: 2020-02-27 07:12 |
Here is the partial code:
import mimetypes
from email.message import EmailMessage

msg = EmailMessage()
file_name = "超e保3000P.csv"
# Guess the MIME type from the file name; fall back to a generic binary type.
ctype, encoding = mimetypes.guess_type(file_name)
if ctype is None or encoding is not None:
    ctype = "application/octet-stream"
maintype, subtype = ctype.split("/", 1)
with open(file_name, "rb") as f:
    msg.add_attachment(f.read(), maintype=maintype, subtype=subtype, filename=("GBK", "", f"{file_name}"))
The file name contains non-ASCII characters, so I pass the three-tuple form of filename with the GBK charset, but the output of msg.as_string() doesn't change.
When I print(msg.as_string()), I find the filename is 'filename*=utf-8\'\'%E8%B6 ......'. The charset is not the one I asked for. And of course, after sending the message, the attached file's name is displayed incorrectly in my mail client and in webmail.
But when I use the legacy API and generate the filename with the Header class, it works.
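For reference, a minimal sketch of the legacy-API approach mentioned above. The payload bytes and the names `outer`, `part`, and `encoded_name` are placeholders for illustration and are not taken from the uploaded file, so the actual script may differ.

```python
from email.header import Header
from email.mime.application import MIMEApplication
from email.mime.multipart import MIMEMultipart

file_name = "超e保3000P.csv"
payload = b"a,b\n"  # placeholder content instead of reading the real file

outer = MIMEMultipart()
part = MIMEApplication(payload, _subtype="octet-stream")

# Header.encode() produces an RFC 2047 encoded word ("=?utf-8?b?...?=").
encoded_name = Header(file_name, "utf-8").encode()

# Under the legacy (compat32) policy the ASCII encoded word is stored as a
# quoted parameter value and is not re-encoded on serialization.
part.add_header("Content-Disposition", "attachment", filename=encoded_name)
outer.attach(part)

serialized = part.as_string()
print(serialized)
```

The printed Content-Disposition header carries the file name as an encoded word inside a quoted string, which is the form the mail clients in this report appear to accept.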
|
msg362781 - (view) |
Author: hwgdb Smith (hwgdb Smith) |
Date: 2020-02-27 07:24 |
"but msg.as_string() doesn't change. " , I mean using
filename=file_name
or
filename=("GBK", "", f"{file_name}")
or
filename=("utf-8", "", f"{file_name}")
msg.as_string() doesn't change.
|
msg362792 - (view) |
Author: Andrei Daraschenka (dorosch) * |
Date: 2020-02-27 11:17 |
Hello, could you please attach a minimal working file to reproduce it?
|
msg362804 - (view) |
Author: hwgdb Smith (hwgdb Smith) |
Date: 2020-02-27 14:43 |
I have uploaded just now. Thank you.
|
msg362805 - (view) |
Author: R. David Murray (r.david.murray) * |
Date: 2020-02-27 14:48 |
I think you are saying that you want the charset in the encoded filename to be GBK rather than utf-8? utf-8 should certainly display correctly in your email client, though, so if it is not there is something else going wrong.
As far as the 3 tuple not working to set the charset...I believe what is happening there is that a header created by the application gets "refolded" on serialization, and refolding doesn't keep the existing charset, it converts everything to utf-8. This is an intentional part of the design: the library handles the gory details of MIME and uses utf-8 as the charset for application created content. It is actually an accident of the implementation that the tuple form of the filename is even accepted; you will note that it is *not* documented in the contentmanager docs.
It wouldn't be crazy to ask for this as a feature, and it could even be treated as a bug that it doesn't work if we want to, but it may not be easy to "fix", because it goes against the design philosophy of the new API.
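A minimal sketch of the refolding behavior described above, assuming the default EmailMessage policy. The helper name `serialized_attachment` and the placeholder payload are hypothetical and only serve the illustration.

```python
from email.message import EmailMessage

file_name = "超e保3000P.csv"


def serialized_attachment(filename):
    """Attach placeholder bytes under `filename` and return the serialized part."""
    msg = EmailMessage()
    msg.add_attachment(
        b"a,b\n",  # placeholder content
        maintype="application",
        subtype="octet-stream",
        filename=filename,
    )
    # Serialize only the attachment part, which avoids the random
    # multipart boundary of the outer message.
    part = next(msg.iter_attachments())
    return part.as_string()


plain = serialized_attachment(file_name)
as_gbk = serialized_attachment(("GBK", "", file_name))

# The requested GBK charset does not survive serialization.
print(plain == as_gbk)
print(as_gbk)
```

If the refolding works as described, both calls serialize to the same text and the filename parameter is emitted with the utf-8 charset in either case.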
|
msg362806 - (view) |
Author: R. David Murray (r.david.murray) * |
Date: 2020-02-27 14:57 |
Actually, given that the contentmanager does accept a charset parameter for text content, it does seem reasonable to treat this as a bug. But as I said fixing it may not be trivial.
|
msg362808 - (view) |
Author: hwgdb Smith (hwgdb Smith) |
Date: 2020-02-27 15:31 |
Using utf-8 doesn't display correctly in my mail client.
So I thought it might work with GBK, and I tried to change the Content-Disposition filename to use GBK.
And just now I printed MIMEMultipart.as_string() from the legacy API and found that it uses utf-8 too. The difference is
legacy api: filename="=?utf-8?b?6LaFZeS/nTMwMDBQLmNzdg==?="
EmailMessage: filename*=utf-8''.%2F%E8%B6%85e%E4%BF%9D3000P.csv
So the charset is not the cause of the display error. But I still don't know what is. Is it the Base64?
|
msg362814 - (view) |
Author: hwgdb Smith (hwgdb Smith) |
Date: 2020-02-27 16:09 |
Why are there two different representations of the same file name? It displays incorrectly when I use the EmailMessage API's filename representation.
|
msg362836 - (view) |
Author: R. David Murray (r.david.murray) * |
Date: 2020-02-27 20:14 |
The legacy API appears to be using an RFC-incorrect (but common) encoded-word encoding, while the new API is using the RFC-compliant MIME-parameter encoding (% encoding). Which email client are you using?
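To make the two representations concrete, here is a small sketch that decodes both forms back to the same file name. The two input strings are based on the values quoted earlier in this thread (without the leading "./" path component); the helper names `decode_encoded_word` and `decode_rfc2231_value` are hypothetical.

```python
from email.header import decode_header
from urllib.parse import unquote

# Encoded-word form (RFC 2047), as emitted by the legacy API inside quotes.
legacy_value = "=?utf-8?b?6LaFZeS/nTMwMDBQLmNzdg==?="

# Extended-parameter form (RFC 2231), as emitted by the new API after "filename*=".
rfc2231_value = "utf-8''%E8%B6%85e%E4%BF%9D3000P.csv"


def decode_encoded_word(value):
    """Decode a single RFC 2047 encoded word to text."""
    raw, charset = decode_header(value)[0]
    return raw.decode(charset)


def decode_rfc2231_value(value):
    """Decode a charset'language'percent-encoded value to text."""
    charset, _language, encoded = value.split("'", 2)
    return unquote(encoded, encoding=charset)


print(decode_encoded_word(legacy_value))
print(decode_rfc2231_value(rfc2231_value))
```

Both forms carry the same utf-8 bytes; only the wrapping differs, which is why a client that understands just one of them shows a garbled name for the other.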
|
msg362857 - (view) |
Author: hwgdb Smith (hwgdb Smith) |
Date: 2020-02-28 01:21 |
Microsoft outlook 20116 MSO(16.0.4266.10001) x64
Foxmail 7.2 (build 7.026)
I use these two email clients. Both display the file name incorrectly. I have uploaded the screenshots.
|
msg362858 - (view) |
Author: hwgdb Smith (hwgdb Smith) |
Date: 2020-02-28 01:23 |
Microsoft outlook 2016 MSO(16.0.4266.10001) x64
|
msg362903 - (view) |
Author: R. David Murray (r.david.murray) * |
Date: 2020-02-28 18:47 |
Since Outlook is one of the mailers that generates the non-RFC-compliant headers, it doesn't surprise me all that much that it can't interpret the RFC compliant headers correctly.
I'm not sure there is anything we can do here.
I suppose someone could do a survey of mail clients and document which ones can handle which style of parameter encoding. If it turns out more handle the "wrong" way than handle the "right" way, we could consider adopting the de-facto standard, although I won't like it much :)
(There is also a possibility there is a bug in our RFC compliance, but this is the first problem report I've seen.)
|
msg362921 - (view) |
Author: hwgdb Smith (hwgdb Smith) |
Date: 2020-02-28 22:04 |
I think a program's goal is to solve problems, not to satisfy the "standard".
OK, if you insist that the "standard" has top priority, could you please tell me a way to change the default behavior of the new API so that it uses the "=?utf-8?b?" parameter style? Is there a function or parameter I can use to achieve this?
If not, I think the best solution is to add a "param style" parameter so that I can choose which style to use.
And if that is not possible either, then sadly I will have to use the legacy API.
|
msg362922 - (view) |
Author: hwgdb Smith (hwgdb Smith) |
Date: 2020-02-28 22:20 |
https://litmus.com/blog/infographic-the-2019-email-client-market-share
Here is a survey of email client market share. You can see that Outlook is in the top 3.
|
msg362924 - (view) |
Author: hwgdb Smith (hwgdb Smith) |
Date: 2020-02-28 22:24 |
I also just sent a mail to my Gmail account. Viewed in the web interface, the file name displays incorrectly!
|
msg362927 - (view) |
Author: hwgdb Smith (hwgdb Smith) |
Date: 2020-02-28 22:27 |
Sorry, the Gmail web interface displays it correctly.
|
msg362991 - (view) |
Author: R. David Murray (r.david.murray) * |
Date: 2020-02-29 16:54 |
I actually agree: if most (by market share) MUAs handle the RFC-incorrect parameter encoding style, and a significant portion does not handle the RFC-correct style, then we should support the de-facto standard rather than the official standard as the default. I just wish Microsoft would write better software :) If on the other hand it is only Microsoft out of the big market share players that is broken, I'm not sure I'd want it to be the default. But we could still support it optionally.
So yeah, we could have a policy control that governs which one is actually used.
So this is a feature request, and ideally should be supported by an investigation of what MUAs support what, by market share. And there's another question: does this only affect the filename parameter, or is it all MIME parameters? I would expect it to be the latter, but someone should check at least a few examples of that to be sure.
|
|
Date | User | Action | Args |
2022-04-11 14:59:27 | admin | set | github: 83952 |
2020-02-29 16:54:25 | r.david.murray | set | type: behavior -> enhancement; title: EmailMessage.add_header doesn't work -> EmailMessage may need to support RFC-non-compliant MIME parameter encoding (encoded words in quotes) for output.; messages: + msg362991; stage: needs patch |
2020-02-28 22:27:47 | hwgdb Smith | set | messages: + msg362927 |
2020-02-28 22:24:46 | hwgdb Smith | set | messages: + msg362924 |
2020-02-28 22:20:42 | hwgdb Smith | set | messages: + msg362922 |
2020-02-28 22:04:15 | hwgdb Smith | set | messages: + msg362921 |
2020-02-28 18:47:55 | r.david.murray | set | messages: + msg362903 |
2020-02-28 01:23:09 | hwgdb Smith | set | files: + outlook_screenshot.jpeg; messages: + msg362858 |
2020-02-28 01:21:18 | hwgdb Smith | set | files: + foxmail_screenshot.jpeg; messages: + msg362857 |
2020-02-27 20:14:33 | r.david.murray | set | messages: + msg362836 |
2020-02-27 16:09:44 | hwgdb Smith | set | messages: + msg362814 |
2020-02-27 15:31:58 | hwgdb Smith | set | messages: + msg362808 |
2020-02-27 14:57:04 | r.david.murray | set | messages: + msg362806 |
2020-02-27 14:48:42 | r.david.murray | set | messages: + msg362805 |
2020-02-27 14:43:56 | hwgdb Smith | set | files: + email bug.rar; messages: + msg362804 |
2020-02-27 11:17:32 | dorosch | set | nosy: + dorosch; messages: + msg362792 |
2020-02-27 07:24:22 | hwgdb Smith | set | messages: + msg362781 |
2020-02-27 07:12:32 | hwgdb Smith | create | |