Message141949
Tom Christiansen wrote:
>
> Tom Christiansen <tchrist@perl.com> added the comment:
>
> Please do not call this "utf-8-java". It is called "cesu-8" per UTR#26 at:
>
> http://unicode.org/reports/tr26/
>
> CESU-8 is *not* a valid Unicode Transform Format and should not be called UTF-8. It is a real pain in the butt, caused by people who misunderstand Unicode mis-encoding UCS-2 into UTF-8, screwing it up. I understand the need to be able to read it, but call it what it is, please.
>
> Despite the talk about Lucene, I note that the Perl port of Lucene uses real UTF-8, not CESU-8.
CESU-8 is a different encoding from the one we are talking about.
The only difference between UTF-8 and the modified one is the
encoding of the U+0000 code point, which is changed so that the output
does not contain any NUL bytes.
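
To illustrate the U+0000 difference described above, here is a minimal sketch in Python. It assumes that the modified encoding writes U+0000 as the two-byte sequence C0 80, and it covers only that one difference. The function names are hypothetical and are not part of any existing codec.

```python
def encode_nul_free_utf8(text: str) -> bytes:
    """Encode text as UTF-8, but write U+0000 as C0 80 (hypothetical helper)."""
    # Standard UTF-8 encodes U+0000 as the single byte 00, and that byte
    # occurs nowhere else in UTF-8 output, so a byte-level replace is safe.
    return text.encode("utf-8").replace(b"\x00", b"\xc0\x80")


def decode_nul_free_utf8(data: bytes) -> str:
    """Reverse encode_nul_free_utf8 (hypothetical helper)."""
    # The byte C0 never occurs in valid UTF-8, so the pair C0 80 can only
    # be the substituted form of U+0000.
    return data.replace(b"\xc0\x80", b"\x00").decode("utf-8")


encoded = encode_nul_free_utf8("a\x00b")
print(encoded)                              # no 00 byte in the output
print(decode_nul_free_utf8(encoded) == "a\x00b")
```

Text without U+0000 produces the same bytes as standard UTF-8, so the two encodings only diverge when a NUL character is present.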
Date | User | Action | Args
2011-08-12 10:26:33 | lemburg | set | recipients: + lemburg, georg.brandl, phr, belopolsky, moese, vstinner, ezio.melotti, tchrist
2011-08-12 10:26:32 | lemburg | link | issue2857 messages
2011-08-12 10:26:32 | lemburg | create |