
classification
Title: Many MUA don't recognize charset "eucgb2312_cn" in email header
Type: behavior
Stage:
Components: email
Versions: Python 3.9

process
Status: open
Resolution:
Dependencies:
Superseder:
Assigned To:
Nosy List: barry, corona10, r.david.murray, terry.reedy, tommylikehu
Priority: normal
Keywords:

Created on 2021-07-04 08:32 by tommylikehu, last changed 2022-04-11 14:59 by admin.

Messages (3)
msg396939 - (view) Author: TommyLike Hu (tommylikehu) Date: 2021-07-04 08:32
The email module is used to decode and encode email messages. If the header content is gb2312-encoded, for example "中文", then by design we end up with an RFC 2047 encoded header like this:
```
=?eucgb2312_cn?b?1tDOxA==?=
```
The test script is as follows:
```
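from email import header, charset

# Build a header from gb2312-encoded bytes, declaring the charset as "gb2312".
h = header.make_header([("中文".encode("gb2312"),
                         charset.Charset("gb2312"))])
# Print the RFC 2047 encoded form of the header.
print(h.encode())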
```

My question is: why don't we use "gb2312" as the charset in the RFC 2047 encoded string, given that "eucgb2312_cn" is a name that only Python recognizes?
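To make the question concrete, here is a minimal sketch of where the two names come from. It assumes the `Charset` attributes `output_charset` and `output_codec` behave as in the standard library's `email/charset.py`; the variable names are only illustrative. It also shows that Python itself can decode the header, which is why the problem only surfaces in other mail clients.

```python
import codecs
from email import charset, header

cs = charset.Charset("gb2312")

# The MIME-facing charset name and the Python codec name are kept separately.
mime_name = cs.output_charset   # name intended for MIME headers
codec_name = cs.output_codec    # Python-internal codec name
print(mime_name, codec_name)

# Python treats the internal name as an alias of its gb2312 codec.
canonical = codecs.lookup(codec_name).name
print(canonical)

# Python's own decoder accepts the header because it knows the alias.
(raw, declared), = header.decode_header("=?eucgb2312_cn?b?1tDOxA==?=")
text = raw.decode(declared)
print(declared, text)
```

A mail client that only knows registered charset names has no such alias table, so it cannot map "eucgb2312_cn" back to GB2312.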

Thanks
msg397045 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2021-07-06 16:47
I can't tell for sure from a quick glance at the code whether this behavior is intentional or not (though, like you, I wouldn't think it would be).

That's part of the legacy API at this point.  The new API will just use utf-8:

```
from email.message import EmailMessage

m = EmailMessage()
m['Subject'] = '中文'

print(bytes(m))
```

results in

```
b'Subject: =?utf-8?b?5Lit5paH?=\n\n'
```

The fix, assuming it is correct, would be to add the line:

    'eucgb2312_cn': 'gb2312',

to the CODEC_MAP in email/charset.py, and then specify the internal codec name in your Charset call.  I'm not sure that's right, though... once upon a time I think I understood the logic behind the charset module, but I no longer remember the details.
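As a rough sketch of what that suggestion amounts to, the mapping can be tried at runtime with `email.charset.add_codec` instead of editing the standard library file. This is an experiment under the same uncertainty expressed above, not a confirmed fix: it assumes the legacy header encoder takes the charset label from `Charset.output_codec`, and it changes module-level state for the whole process.

```python
from email import charset, header

# Runtime stand-in for adding  'eucgb2312_cn': 'gb2312'  to CODEC_MAP.
# Caution: this mutates global state in email.charset.
charset.add_codec("eucgb2312_cn", "gb2312")

# As suggested, name the internal codec in the Charset call.
cs = charset.Charset("eucgb2312_cn")
print(cs.output_codec)

h = header.make_header([("中文".encode("gb2312"), cs)])
encoded = h.encode()
print(encoded)
```

If the assumption holds, the printed header is labelled with "gb2312" rather than the internal alias.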

I'd recommend just using the new API and not the legacy API.
msg397210 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2021-07-09 19:33
Anything before 3.9 only gets security patches.
History
Date  User  Action  Args
2022-04-11 14:59:47  admin  set  github: 88726
2021-07-09 19:33:45  terry.reedy  set  nosy: + terry.reedy
    title: Unrecognized charset "eucgb2312_cn" in email header for many MUA -> Many MUA don't recognize charset "eucgb2312_cn" in email header
    messages: + msg397210
    versions: - Python 3.6, Python 3.7, Python 3.8
2021-07-06 16:47:15  r.david.murray  set  messages: + msg397045
2021-07-06 14:13:25  msapiro  set  versions: + Python 3.7, Python 3.8, Python 3.9
2021-07-04 11:53:16  corona10  set  nosy: + corona10
2021-07-04 08:32:11  tommylikehu  create