This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer Guide.

classification
Title: Implement Mac East Asian encodings properly
Type:
Stage:
Components: Library (Lib)
Versions:
process
Status: open
Resolution:
Dependencies:
Superseder:
Assigned To:
Nosy List: Behdad.Esfahbod, hyeshik.chang, lemburg
Priority: normal
Keywords:

Created on 2015-04-23 18:52 by Behdad.Esfahbod, last changed 2022-04-11 14:58 by admin.

Messages (7)
msg241876 - (view) Author: Behdad Esfahbod (Behdad.Esfahbod) Date: 2015-04-23 18:52
encodings.aliases has this at its tail, even on master today [0]:

    # temporary mac CJK aliases, will be replaced by proper codecs in 3.1
    'x_mac_japanese'      : 'shift_jis',
    'x_mac_korean'        : 'euc_kr',
    'x_mac_simp_chinese'  : 'gb2312',
    'x_mac_trad_chinese'  : 'big5',

A full implementation would be appreciated.

[0] https://github.com/python/cpython/blob/master/Lib/encodings/aliases.py#L539
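
To see what these placeholder aliases do on a given interpreter, a minimal sketch follows. It only reads the alias table quoted above through `encodings.aliases.aliases` and `codecs.lookup`; the helper name `resolve_mac_cjk_aliases` is made up for illustration, and the sketch assumes nothing about whether a particular Python version still ships the aliases (a missing one is reported as `None`).

```python
import codecs
from encodings.aliases import aliases

# The four alias keys quoted from Lib/encodings/aliases.py above.
MAC_CJK_ALIASES = (
    "x_mac_japanese",
    "x_mac_korean",
    "x_mac_simp_chinese",
    "x_mac_trad_chinese",
)


def resolve_mac_cjk_aliases():
    """Map each alias to the codec it resolves to (None if the alias is absent)."""
    resolved = {}
    for name in MAC_CJK_ALIASES:
        target = aliases.get(name)
        resolved[name] = codecs.lookup(target).name if target else None
    return resolved


if __name__ == "__main__":
    for alias, codec_name in resolve_mac_cjk_aliases().items():
        print(f"{alias:20} -> {codec_name}")
```

If an alias resolves to a plain codec such as shift_jis, bytes that only exist in the Mac variant are handled by the base codec's rules, not the Mac ones.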
msg241877 - (view) Author: Behdad Esfahbod (Behdad.Esfahbod) Date: 2015-04-23 18:54
Also, I'm not sure about the 'x_' prefix.  It's not kept for the other mac encodings.  There's a useful table here:

https://github.com/behdad/fonttools/issues/236
msg241924 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2015-04-24 08:18
The "x_" prefix was added as a reminder and as a way to document the desire to look into this at some point:

https://github.com/python/cpython/commit/c696b47b10db1fa22b77ecfe1af392b3d62aab61

Before adding more codecs, we always ask whether these are in actual use. Can you provide some evidence of this?

We will also need official references to the definitions of the Mac encodings.

Thanks.
msg241926 - (view) Author: Behdad Esfahbod (Behdad.Esfahbod) Date: 2015-04-24 08:34
Thanks Marc-Andre.  If the x_ was indeed added for that reason, it's quite a coincidence, because the MIME name of these encodings also starts with x-mac-..., so I assumed that's where the x_ comes from.
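
The coincidence has a practical side: Python normalizes punctuation in codec names to underscores before consulting the alias table, so the MIME-style spelling lands on the same key. A small sketch, assuming only the standard `encodings.normalize_encoding` helper (the wrapper name `alias_key` is hypothetical):

```python
import encodings


def alias_key(mime_name):
    """Return the key an encoding name is looked up under in the alias table."""
    # normalize_encoding turns runs of punctuation such as '-' into '_';
    # lower-casing mirrors what codec lookup does with the name.
    return encodings.normalize_encoding(mime_name).lower()


if __name__ == "__main__":
    for name in ("x-mac-japanese", "X-MAC-KOREAN"):
        print(name, "->", alias_key(name))
```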

The mappings are available at the Unicode website:
http://unicode.org/Public/MAPPINGS/VENDORS/APPLE/JAPANESE.TXT
http://unicode.org/Public/MAPPINGS/VENDORS/APPLE/CHINTRAD.TXT
http://unicode.org/Public/MAPPINGS/VENDORS/APPLE/KOREAN.TXT
http://unicode.org/Public/MAPPINGS/VENDORS/APPLE/CHINSIMP.TXT
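
For anyone wanting to inspect those files, here is a rough sketch of a parser. It assumes the usual layout of the vendor mapping tables: one entry per line, a hex Mac code, whitespace, a hex Unicode value or a `+`-joined sequence of them, then a `#` comment. The function name and the sample text are made up for illustration; the sample is hand-written, not copied from the Apple files, so check real entries against the files themselves.

```python
def parse_apple_mapping(text):
    """Parse mapping-table text into {mac_code: unicode_string}."""
    table = {}
    for raw_line in text.splitlines():
        # Drop trailing comments and skip blank or comment-only lines.
        line = raw_line.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        if len(fields) < 2:
            continue
        try:
            code = int(fields[0], 16)
            # The Unicode column may be a sequence such as 0x2026+0xF87F.
            chars = "".join(chr(int(cp, 16)) for cp in fields[1].split("+"))
        except ValueError:
            # Skip anything this sketch does not understand.
            continue
        table[code] = chars
    return table


# Hand-written sample in the assumed format (illustrative values only).
SAMPLE = """\
# comment line
0x20\t0x0020\t# SPACE
0x80\t0x005C\t# REVERSE SOLIDUS
0xFF\t0x2026+0xF87F\t# HORIZONTAL ELLIPSIS, variant
"""

if __name__ == "__main__":
    for code, chars in sorted(parse_apple_mapping(SAMPLE).items()):
        print(hex(code), [hex(ord(c)) for c in chars])
```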

As for actual use, they are part of the OpenType standard.  So by user request, I had to implement them last week in the FontTools Python library.  This is useful for people when dealing with old and legacy fonts, especially in the process of converting them to Unicode-compatible fonts.
msg241928 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2015-04-24 09:07
On 24.04.2015 10:34, Behdad Esfahbod wrote:
> 
> Thanks Marc-Andre.  If the x_ was indeed added for that reason, it's quite a coincidence, because the MIME name of these encodings also starts with x-mac-..., so I assumed that's where the x_ comes from.

Oh, I didn't know that :-)

Hmm, I can't find the names listed as IANA charsets, so the "x-" prefix
then probably means non-standard.

http://www.iana.org/assignments/character-sets/character-sets.xhtml

> The mappings are available at the Unicode website:
> http://unicode.org/Public/MAPPINGS/VENDORS/APPLE/JAPANESE.TXT
> http://unicode.org/Public/MAPPINGS/VENDORS/APPLE/CHINTRAD.TXT
> http://unicode.org/Public/MAPPINGS/VENDORS/APPLE/KOREAN.TXT
> http://unicode.org/Public/MAPPINGS/VENDORS/APPLE/CHINSIMP.TXT
> 
> As for actual use, they are part of the OpenType standard.  So by user request, I had to implement them last week in the FontTools Python library.  This is useful for people when dealing with old and legacy fonts, especially in the process of converting them to Unicode-compatible fonts.

This may be an indication that it's better to put those
codecs into a PyPI package, rather than Python itself. The above
tables are huge (as most Asian codec tables).
msg241969 - (view) Author: Behdad Esfahbod (Behdad.Esfahbod) Date: 2015-04-24 18:34
They are a rather minor change on top of the existing Asian encodings.  So implementing them in Python might be easier.  I have a half-done version of those.  I can try finishing it and post it back here.
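
To illustrate the "minor change on top of an existing encoding" idea, here is a minimal decode-only sketch. It is not the half-done version mentioned above and not FontTools code. It layers a small override table over the stock shift_jis codec; the function names are hypothetical, and the override values are illustrative entries that should be verified against JAPANESE.TXT before being relied on.

```python
# Illustrative single-byte overrides (assumption: verify against JAPANESE.TXT).
SINGLE_BYTE_OVERRIDES = {
    0x80: "\\",      # backslash
    0xA0: "\u00a0",  # no-break space
    0xFD: "\u00a9",  # copyright sign
    0xFE: "\u2122",  # trade mark sign
}


def _is_lead_byte(b):
    """True if b starts a two-byte Shift JIS sequence."""
    return 0x81 <= b <= 0x9F or 0xE0 <= b <= 0xFC


def decode_overlay(data, base="shift_jis", overrides=SINGLE_BYTE_OVERRIDES):
    """Decode bytes, consulting the overrides first and the base codec otherwise."""
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        if b in overrides:
            out.append(overrides[b])
            i += 1
            continue
        width = 2 if _is_lead_byte(b) else 1
        # Anything not overridden is left to the existing codec, which also
        # raises UnicodeDecodeError for truncated or invalid sequences.
        out.append(data[i:i + width].decode(base))
        i += width
    return "".join(out)


if __name__ == "__main__":
    print(decode_overlay(b"a\x80b\x82\xa0\xfe"))
```

A complete codec would also need the two-byte differences, an encoder, and incremental/stream support, which this sketch leaves out.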
msg241970 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2015-04-24 18:35
On 24.04.2015 20:34, Behdad Esfahbod wrote:
> 
> They are a rather minor change on top of the existing Asian encodings.  So implementing them in Python might be easier.  I have a half-done version of those.  I can try finishing it and post it back here.

If it's only a small patch, that would work fine, I guess.
History
Date                 User             Action  Args
2022-04-11 14:58:16  admin            set     github: 68229
2015-04-24 18:35:15  lemburg          set     messages: + msg241970
2015-04-24 18:34:01  Behdad.Esfahbod  set     messages: + msg241969
2015-04-24 09:07:52  lemburg          set     messages: + msg241928
2015-04-24 08:34:56  Behdad.Esfahbod  set     messages: + msg241926
2015-04-24 08:18:13  lemburg          set     messages: + msg241924
2015-04-23 20:17:41  ned.deily        set     nosy: + lemburg, hyeshik.chang
2015-04-23 18:54:01  Behdad.Esfahbod  set     messages: + msg241877
2015-04-23 18:52:10  Behdad.Esfahbod  create