Author lemburg
Recipients belopolsky, ezio.melotti, georg.brandl, lemburg, moese, phr, vstinner
Date 2011-05-11.13:31:59
Thanks for the patch, Victor.

Some comments on the patch:

 * the codec will have to be able to work with lone surrogates
   (see the Wikipedia page explaining this detail), which the
   UTF-8 codec in Python 3.x no longer accepts, so another special
   case is needed for this difference

 * we should not make the standard UTF-8 codec slower just to
   support a variant of UTF-8 which will only get marginal use;
   for the decoder, the changes are minimal, so that's fine,
   but for the encoder you are changing the most often used
   code branch to check for NUL bytes - we need a better solution
   for this, even if it means having to use a separate
   encode_utf8java function
Since the ticket was opened in 2008, the common name of the
codec appears to have changed from "UTF-8 Java" to "Modified UTF-8"
or "MUTF-8" as short alias:

    (change in
    (scroll down to "Modified UTF-8")
    (this is for Android)

So I guess we should adapt the name to the now common one,
call it "ModifiedUTF8" in the C API, and add these aliases:
"utf-8-modified", "mutf-8" and "modified-utf-8".
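
As a rough illustration of how the three aliases could resolve to one codec, here is a sketch that uses `codecs.register` from Python code. This is not how a built-in codec would be wired up in the C API; `_ALIASES`, `_encode`, `_decode` and `_search` are hypothetical names. The codec itself is a lenient stand-in built on the existing "utf-8" and "utf-16-le" codecs with the "surrogatepass" error handler, and it ignores the `errors` argument.

```python
import codecs

# Alias names proposed above, in normalized form (hyphens as underscores).
_ALIASES = {"utf_8_modified", "mutf_8", "modified_utf_8"}


def _encode(text, errors="strict"):
    # Assumption: `errors` is ignored in this sketch.
    # Split the text into UTF-16 code units so that supplementary
    # characters become surrogate pairs.
    raw = text.encode("utf-16-le", "surrogatepass")
    units = "".join(chr(int.from_bytes(raw[i:i + 2], "little"))
                    for i in range(0, len(raw), 2))
    # In UTF-8 the byte 0x00 only ever encodes U+0000, so a plain
    # byte replacement is enough to produce the C0 80 form.
    data = units.encode("utf-8", "surrogatepass").replace(b"\x00", b"\xc0\x80")
    return data, len(text)


def _decode(data, errors="strict"):
    # Lenient: plain NUL bytes and 4-byte sequences are not rejected.
    data = bytes(data)
    text = data.replace(b"\xc0\x80", b"\x00").decode("utf-8", "surrogatepass")
    # Round-trip through UTF-16 to rejoin surrogate pairs.
    text = text.encode("utf-16-le", "surrogatepass").decode(
        "utf-16-le", "surrogatepass")
    return text, len(data)


def _search(name):
    # Normalize here as well, since older Python versions may pass
    # hyphens through to search functions unchanged.
    if name.lower().replace("-", "_").replace(" ", "_") in _ALIASES:
        return codecs.CodecInfo(_encode, _decode, name="mutf-8")
    return None


codecs.register(_search)
```

With the search function registered, `"a\x00".encode("mutf-8")` and `data.decode("modified-utf-8")` would both go through the same codec.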

Date                 User     Action  Args
2011-05-11 13:32:00  lemburg  set     recipients: + lemburg, georg.brandl, phr, belopolsky, moese, vstinner, ezio.melotti
2011-05-11 13:31:59  lemburg  link    issue2857 messages
2011-05-11 13:31:59  lemburg  create