Message84115
The attached patch is a partial fix: it supports UTF-16-LE, UTF-16-BE
and UTF-32-LE. Some remarks about my patch:
* UTF-32-BE is not supported because I'm too lazy tonight
to finish the patch and because such a file begins with 0x00 0x00
whereas the parser doesn't like nul bytes
* I disabled the cookie check if the file starts with a BOM (the
cookie is ignored) because the charset name is not normalized,
so if the cookie is not exactly the same as the hardcoded
charset name (e.g. "UTF-16LE"), the test will fail.
E.g. "utf-16le" != "UTF-16LE" :-(
* compile() would require much more effort to support UTF-16-*
and UTF-32-* because compile() simply rejects any string with
a nul byte. It's because it uses functions like strlen() :-/ That's
why I use subprocess([sys.executable, ...]) in the unit test and
not simply compile()
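
To illustrate the first two remarks, here is a minimal sketch in Python (not the code from the patch, which works inside the tokenizer). The helper names detect_bom() and same_charset() are hypothetical. It assumes the BOM constants from the codecs module and uses codecs.lookup() to normalize charset names, so that "utf-16le" and "UTF-16LE" compare equal. UTF-32 BOMs are tested first because the UTF-32-LE BOM (FF FE 00 00) begins with the UTF-16-LE BOM (FF FE).

```python
import codecs

# Longest BOMs first: the UTF-32-LE BOM (FF FE 00 00) starts with the
# UTF-16-LE BOM (FF FE), so the order of this table matters.
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def detect_bom(data):
    """Return (encoding, bom_length) for a leading BOM, or (None, 0)."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0


def same_charset(cookie, expected):
    """Compare two charset names after normalizing both with codecs.lookup()."""
    try:
        return codecs.lookup(cookie).name == codecs.lookup(expected).name
    except LookupError:
        # Unknown charset name: treat it as a mismatch.
        return False
```

With such a comparison the cookie check would not need to be disabled for the spelling difference alone. Note that a UTF-16-LE file whose first character after the BOM is U+0000 would be misdetected as UTF-32-LE by this sketch.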
Supporting UTF-{16,32}-{LE,BE} would be nice, but it requires hacking
the parser (especially the compile() builtin function) to support nul bytes...
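
The nul byte problem can be reproduced with a small sketch. The helper name compile_rejects_nul() is hypothetical, and the exact exception type is an assumption that depends on the Python version (TypeError, ValueError or SyntaxError), so all three are caught.

```python
def compile_rejects_nul(source):
    """Return True if compile() refuses the given source (str or bytes)."""
    try:
        compile(source, "<nul-test>", "exec")
    except (TypeError, ValueError, SyntaxError):
        return True
    return False


# ASCII source encoded as UTF-16-LE has a nul byte after every character.
utf16_source = "x = 1\n".encode("utf-16-le")
print(b"\0" in utf16_source)
print(compile_rejects_nul(utf16_source))
```

This is why the unit test runs the encoded file in a child interpreter instead of calling compile() on its content.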

Date | User | Action | Args
2009-03-24 22:00:17 | vstinner | set | recipients: + vstinner, loewis, georg.brandl, tungwaiyip, christian.heimes
2009-03-24 22:00:17 | vstinner | set | messageid: <1237932017.68.0.639132292951.issue1503789@psf.upfronthosting.co.za>
2009-03-24 22:00:15 | vstinner | link | issue1503789 messages
2009-03-24 22:00:13 | vstinner | create |