Message232638
This patch adds the following Japanese legacy encodings:
https://bitbucket.org/t2y/cpython/branches/compare/japanese-legacy-encoding..default
* eucjp_ms (euc-jp compatible with cp932)
* iso2022_jp_ms (another iso-2022-jp variant compatible with cp932, similar to cp50220)
* cp50220 (http://www.iana.org/assignments/charset-reg/CP50220)
* cp50221 (a variant of cp50220)
* cp50222 (a variant of cp50220)
* cp51932 (http://www.iana.org/assignments/charset-reg/CP51932)
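A minimal sketch of why "compatible with cp932" matters, assuming only the codecs that ship with CPython (the proposed names are looked up, not assumed to exist). The same JIS X 0208 slot 0x2141 is stored as `0xA1 0xC1` in EUC-JP and as `0x81 0x60` in Shift_JIS/cp932, yet CPython's `euc_jp` and `cp932` codecs decode it to different Unicode characters. `is_registered`, `PROPOSED`, `EUC_BYTES` and `SJIS_BYTES` are illustrative names, not part of the patch.

```python
import codecs

# Encoding names proposed by the patch; they may or may not be registered.
PROPOSED = ["eucjp_ms", "iso2022_jp_ms", "cp50220", "cp50221", "cp50222", "cp51932"]


def is_registered(name: str) -> bool:
    """Return True if `name` resolves to a codec via codecs.lookup()."""
    try:
        codecs.lookup(name)
        return True
    except LookupError:
        return False


# JIS X 0208 code 0x2141 in two existing encodings:
# EUC-JP sets the high bit of each JIS byte, Shift_JIS/cp932 uses 0x81 0x60.
EUC_BYTES = b"\xa1\xc1"
SJIS_BYTES = b"\x81\x60"

if __name__ == "__main__":
    for name in PROPOSED:
        print(f"{name:15} {'registered' if is_registered(name) else 'not registered'}")
    from_euc = EUC_BYTES.decode("euc_jp")    # U+301C WAVE DASH
    from_cp932 = SJIS_BYTES.decode("cp932")  # U+FF5E FULLWIDTH TILDE
    print(f"euc_jp: U+{ord(from_euc):04X}  cp932: U+{ord(from_cp932):04X}")
```

Text that round-trips through cp932 on Windows therefore changes characters when it is later handled as plain euc_jp; eucjp_ms and cp51932 are meant to keep EUC-JP data consistent with cp932's mapping.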
Originally, this character-encoding patch was created in 2005 by Masayuki Moriyama as a result of an IPA project. The results were contributed to several communities: libiconv, glibc, perl, PHP, Ruby, PostgreSQL, MySQL, and nkf. He also made a patch for Python 2.4.3 at that time, but for some reason no one worked on integrating it. That's a crying shame.
These character encodings are legacy, but they are still in use. Many end users don't care about character encodings. Unfortunately, for historical reasons, e-mails sent from Japanese Windows platforms are encoded with these legacy encodings. In fact, one of my customers recently reported mojibake because their e-mail data was apparently encoded with cp50220 (iso-2022-jp-ms).
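To illustrate the mojibake report, here is a simplified, hypothetical decoder (`decode_cp50220_sketch` and `jis_to_sjis` are illustrative names, not the patch's code or a stdlib API). Roughly speaking, cp50220 is ISO-2022-JP whose two-byte segments may also contain cp932 extension characters, such as the NEC special characters in row 13 (for example "①" at JIS 0x2D21), which Python's `iso2022_jp` codec rejects. The sketch converts each two-byte JIS code to Shift_JIS arithmetically and decodes it with `cp932`. It assumes only `ESC ( B` and `ESC $ B` appear; half-width katakana (`ESC ( I`) and the other cp50221/cp50222 features are deliberately left out.

```python
def jis_to_sjis(j1: int, j2: int) -> bytes:
    """Convert one JIS X 0208 byte pair (each 0x21-0x7E) to Shift_JIS bytes."""
    if not (0x21 <= j1 <= 0x7E and 0x21 <= j2 <= 0x7E):
        raise ValueError(f"not a two-byte JIS code: {j1:#x} {j2:#x}")
    s1 = ((j1 + 1) >> 1) + (0x70 if j1 <= 0x5E else 0xB0)
    if j1 & 1:  # odd rows map to trail bytes 0x40-0x9E, skipping 0x7F
        s2 = j2 + (0x1F if j2 <= 0x5F else 0x20)
    else:       # even rows map to trail bytes 0x9F-0xFC
        s2 = j2 + 0x7E
    return bytes([s1, s2])


def decode_cp50220_sketch(data: bytes) -> str:
    """Hypothetical cp50220-style decoder: ASCII plus JIS X 0208 with cp932 extensions."""
    out = []
    i = 0
    two_byte = False
    while i < len(data):
        if data[i] == 0x1B:  # ESC: switch the active character set
            seq = data[i:i + 3]
            if seq == b"\x1b$B":
                two_byte = True
            elif seq == b"\x1b(B":
                two_byte = False
            else:
                raise ValueError(f"unsupported escape sequence {seq!r}")
            i += 3
        elif two_byte and data[i] > 0x20:  # printable bytes come in pairs
            if i + 1 >= len(data):
                raise ValueError("truncated two-byte sequence")
            out.append(jis_to_sjis(data[i], data[i + 1]).decode("cp932"))
            i += 2
        else:  # ASCII text, or control characters such as CR/LF
            if data[i] > 0x7F:
                raise ValueError("8-bit byte in a 7-bit stream")
            out.append(chr(data[i]))
            i += 1
    return "".join(out)


if __name__ == "__main__":
    mail_body = b"No.\x1b$B-!\x1b(B"  # "No." then JIS 0x2D21 (NEC row 13)
    try:
        mail_body.decode("iso2022_jp")
    except UnicodeDecodeError as exc:
        print("iso2022_jp failed:", exc.reason)
    print(decode_cp50220_sketch(mail_body))  # No.①
```

A real codec would also need an encoder, incremental decoding for streams, and the half-width katakana handling that distinguishes cp50220 from cp50221 and cp50222.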
References:
* About IPA: http://www.ipa.go.jp/english/about/summary.html
* Mojibake: http://en.wikipedia.org/wiki/Mojibake
* Java encoding names: http://docs.oracle.com/javase/8/docs/technotes/guides/intl/encoding.doc.html
References in Japanese:
* Japanese Legacy Encoding Project: http://legacy-encoding.sourceforge.jp/wiki/
* Project details: http://www.ipa.go.jp/about/jigyoseika/05fy-pro/open/2005-1467d.pdf

History:
Date | User | Action | Args
2014-12-14 14:34:51 | t2y | set | recipients: + t2y, ishimoto, methane
2014-12-14 14:34:49 | t2y | set | messageid: <1418567689.35.0.572357268777.issue23050@psf.upfronthosting.co.za>
2014-12-14 14:34:49 | t2y | link | issue23050 messages
2014-12-14 14:34:47 | t2y | create |