Author ezio.melotti
Recipients akuchling, belopolsky, eric.araujo, ezio.melotti, georg.brandl, rhettinger, terry.reedy
Date 2011-09-01.08:04:08
Message-id <1314864252.85.0.599694103576.issue4153@psf.upfronthosting.co.za>
In-reply-to
Content
After the recent discussions on python-dev I went through the Unicode howto and fixed a few things. I then found this issue, so I'm attaching the patch here.
The patch mostly addresses markup issues, but it also removes the term 'byte string'.
A few more things that should be done:
  * clarify some more terms (e.g. code points, code units, characters, possibly scalar values, etc.);
  * mention the differences between narrow and wide builds, including:
    - a discussion about the UCS-2/UTF-16 implementation of narrow builds;
    - something about surrogates and surrogate pairs;
    - effects of slicing and indexing on narrow builds;
    - functions/methods that (don't) accept non-BMP chars on narrow builds;
  * something about Unicode support in the re module (this can probably wait until after the 'regex' inclusion).
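
To make the surrogate-pair item concrete, here is a minimal sketch of how a non-BMP character maps to two UTF-16 code units. It assumes Python 3; the helper names utf16_code_units and to_surrogate_pair are made up for illustration and are not part of the patch. On a narrow build, len() of such a character would be 2 and indexing would return the individual surrogates; builds that store full code points report 1.

```python
import struct

def utf16_code_units(s):
    """Return the UTF-16 code units of s as a list of ints (hypothetical helper)."""
    data = s.encode("utf-16-le")  # little-endian, no BOM
    return list(struct.unpack("<%dH" % (len(data) // 2), data))

def to_surrogate_pair(cp):
    """Split a non-BMP code point into (high, low) surrogates (hypothetical helper)."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("not a non-BMP code point: %#x" % cp)
    offset = cp - 0x10000
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)

ch = "\U0001D11E"  # MUSICAL SYMBOL G CLEF, outside the BMP
units = utf16_code_units(ch)
pair = to_surrogate_pair(ord(ch[0]) if len(ch) == 1 else 0x1D11E)

print([hex(u) for u in units])  # the two code units a narrow build would store
print(len(ch))                  # 2 on a narrow build, 1 otherwise
```

The arithmetic in to_surrogate_pair is the standard UTF-16 encoding of a supplementary code point; the howto could use something similar when explaining slicing and indexing on narrow builds.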

Also the codecs doc has a section about Unicode and encodings that might be moved to the howto.