
Author hyeshik.chang
Recipients cdqzzy, ezio.melotti, hyeshik.chang, lemburg, terry.reedy, vstinner
Date 2011-05-12.00:09:59
Message-id <1305159000.69.0.677379244969.issue12057@psf.upfronthosting.co.za>
Content
Hello, everyone!

I chose to encode the test strings in Python source code because I wanted them to be treated as text files, trackable in CVS or Subversion, while keeping the Python sources free of non-ASCII characters. I no longer feel the need for "text file" status, so STINNER's suggestion works for me.

In fact, none of the "stateful" encodings supported by cjkcodecs has adequate tests. (Besides hz, Python has seven more stateful ISO-2022 encodings.) "cjkencoding_tests.py" is used for random chunk coding tests, and most stateful encodings are not compatible with random chunk coding, so I didn't include test strings for them there. They clearly still need appropriate simple string coding and stream coding tests, though.
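As a rough illustration of the two kinds of test mentioned above, here is a minimal sketch using only the standard `codecs` and `io` modules. The sample strings and the helper names (`check_string_roundtrip`, `check_incremental_decode`, `check_stream_decode`) are hypothetical; this is not the test code that was eventually committed.

```python
import codecs
import io

# Hypothetical sample texts: each uses only characters the codec can encode.
SAMPLES = {
    "hz": "Hello, 你好!",            # ASCII plus GB2312 ideographs
    "iso2022_jp": "Hello, 日本語!",  # ASCII plus JIS X 0208 characters
}

def check_string_roundtrip(encoding, text):
    """Simple string coding: encode the whole text, decode it back."""
    data = text.encode(encoding)
    assert data.decode(encoding) == text
    return data

def check_incremental_decode(encoding, data, text):
    """Feed the decoder one byte at a time; it must carry its shift state
    and any incomplete escape or multibyte sequence across calls."""
    decoder = codecs.getincrementaldecoder(encoding)()
    out = "".join(decoder.decode(data[i:i + 1]) for i in range(len(data)))
    out += decoder.decode(b"", final=True)
    assert out == text

def check_stream_decode(encoding, data, text):
    """Stream coding: read everything through a StreamReader."""
    reader = codecs.getreader(encoding)(io.BytesIO(data))
    assert reader.read() == text

for enc, sample in SAMPLES.items():
    encoded = check_string_roundtrip(enc, sample)
    check_incremental_decode(enc, encoded, sample)
    check_stream_decode(enc, encoded, sample)
```

Feeding single bytes is the harshest split for a stateful decoder, since every escape sequence (`~{` in hz, `ESC $ B` in ISO-2022-JP) gets cut in the middle.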

STINNER Victor wrote:
> I don't understand why different texts are used. Why not just use the
> same original text for all testcases? One reason could be that some
> encodings (e.g. ISO 2022) use escape sequences to change the current
> encoding. Or maybe it's because the characters differ (Chinese vs.
> Japanese characters?).

Almost every encoding in cjkcodecs has a different character set. They support different languages (Chinese, Japanese, Korean), different scripts (Hanja, Kanji, Traditional and Simplified Chinese), different standards (Johab and KS X 1001 for Korean), and different versions or variants (JIS X 0201 and JIS X 0213 for Japanese). A single shared text would have to fit within the smallest of these repertoires, while one of them, gb18030, is in fact a "superset" of Unicode so far.
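A small sketch of why one shared text does not fit every codec: it simply asks each codec whether it can encode a given character. The helper name `encodable` and the choice of a Hangul syllable as the probe are illustrative assumptions.

```python
def encodable(text, encoding):
    """Return True if every character of text is in the codec's repertoire."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True

HANGUL = "한"  # a precomposed Hangul syllable, covered by KS X 1001

for enc in ("euc_kr", "shift_jis", "gb2312", "gb18030"):
    print(enc, encodable(HANGUL, enc))
```

The syllable encodes in euc_kr and gb18030 but raises `UnicodeEncodeError` in shift_jis and gb2312, which is the kind of gap a single common test text would run into.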


Terry J. Reedy wrote:
> Perhaps there should be a separate test like the above to be sure that hz really uses GB2312-80, as specified.

You're right.
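One possible shape for such a separate hz test is sketched below. It assumes the HZ framing described in RFC 1843 (GB2312 bytes with the high bit cleared, wrapped in `~{` ... `~}`), and the helper names are hypothetical. It checks both directions: GB2312 text must match the plain gb2312 codec, and a character that is in GBK but outside GB2312-80 must be rejected.

```python
def hz_matches_gb2312(text):
    """For text made only of non-ASCII GB2312 characters, hz output should be
    '~{' + the gb2312 bytes with the high bit cleared + '~}'."""
    gb = text.encode("gb2312")
    expected = b"~{" + bytes(b & 0x7F for b in gb) + b"~}"
    return text.encode("hz") == expected

def rejected_by_hz(ch):
    """A character outside GB2312-80 must not be encodable in hz."""
    try:
        ch.encode("hz")
    except UnicodeEncodeError:
        return True
    return False

assert hz_matches_gb2312("你好")
# 镕 is in GBK but not in GB2312-80, so gbk accepts it and hz must not.
assert "镕".encode("gbk")
assert rejected_by_hz("镕")
```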


By the way, my previous e-mail address <perky@FreeBSD.org> is no longer reachable; please write to <hyeshik@gmail.com> instead.
History
Date                 User           Action  Args
2011-05-12 00:10:00  hyeshik.chang  set     recipients: + hyeshik.chang, lemburg, terry.reedy, vstinner, ezio.melotti, cdqzzy
2011-05-12 00:10:00  hyeshik.chang  set     messageid: <1305159000.69.0.677379244969.issue12057@psf.upfronthosting.co.za>
2011-05-12 00:09:59  hyeshik.chang  link    issue12057 messages
2011-05-12 00:09:59  hyeshik.chang  create