Author ezio.melotti
Recipients dangra, ezio.melotti, lemburg, sjmachin
Date 2010-04-02.22:27:14
Message-id <1270247238.98.0.791005996157.issue8271@psf.upfronthosting.co.za>
In-reply-to
Content
Here's a new patch. It should be complete, but I want to test it some more before committing.
I decided to follow RFC 3629, putting 0 instead of 5/6 for bytes in range F5-FD (we can always put them back in the unlikely case that the Unicode Consortium changes its mind) and also for other invalid ranges (e.g. C0-C1). This led to some simplification in the code.
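
To illustrate the idea, here is a minimal Python sketch of a lead-byte length table built along the lines of RFC 3629. It is not the code from the patch: the names `utf8_sequence_length` and `UTF8_LENGTH_TABLE` are hypothetical, and the exact convention used by the real table in the C source may differ. The point is only that every byte that cannot start a valid sequence maps to 0.

```python
def utf8_sequence_length(lead_byte: int) -> int:
    """Return the total length of the UTF-8 sequence started by lead_byte.

    Returns 0 when the byte cannot start a valid sequence under RFC 3629.
    (Hypothetical helper for illustration; not taken from the patch.)
    """
    if not 0 <= lead_byte <= 0xFF:
        raise ValueError("lead_byte must be in range(256)")
    if lead_byte <= 0x7F:
        return 1  # ASCII
    if 0xC2 <= lead_byte <= 0xDF:
        return 2
    if 0xE0 <= lead_byte <= 0xEF:
        return 3
    if 0xF0 <= lead_byte <= 0xF4:
        return 4
    # 80-BF are continuation bytes, C0-C1 can only start overlong
    # encodings, and F5-FF would encode values above U+10FFFF or the
    # old 5/6-byte forms, so all of them are invalid as lead bytes.
    return 0


# Precomputed 256-entry table, indexed by the lead byte.
UTF8_LENGTH_TABLE = [utf8_sequence_length(b) for b in range(256)]
```

With a table like this, a decoder can treat a 0 entry as an immediate error instead of special-casing the 5- and 6-byte forms later on.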

I also found out that, according to RFC 3629, surrogates are considered invalid and can't be encoded or decoded, but the UTF-8 codec actually accepts them. I included tests and a fix, but I left them commented out because this is outside the scope of this patch and probably needs a discussion on python-dev.
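
As a rough illustration of the stricter behaviour, the sketch below checks how a Python 3 interpreter handles surrogates in UTF-8. It assumes Python 3, where the strict codec rejects surrogates and the `surrogatepass` error handler allows them explicitly; it does not show what the codec did when this message was written. The helper name `utf8_rejects_code_point` is hypothetical.

```python
def utf8_rejects_code_point(code_point: int) -> bool:
    """Return True if the strict UTF-8 codec refuses to encode code_point.

    (Hypothetical helper for illustration; assumes Python 3.)
    """
    try:
        chr(code_point).encode("utf-8")
    except UnicodeEncodeError:
        return True
    return False


def utf8_rejects_bytes(data: bytes) -> bool:
    """Return True if the strict UTF-8 codec refuses to decode data."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return True
    return False


# ED A0 80 is what U+D800 would look like if surrogates were encoded
# with the usual 3-byte pattern.
ENCODED_SURROGATE = b"\xed\xa0\x80"

# Opting in explicitly still works through the surrogatepass handler.
round_trip = "\ud800".encode("utf-8", "surrogatepass")
```

Under these assumptions, `utf8_rejects_code_point(0xD800)` and `utf8_rejects_bytes(ENCODED_SURROGATE)` both report a rejection, while `round_trip` holds the three bytes above.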
History
Date User Action Args
2010-04-02 22:27:19 ezio.melotti set recipients: + ezio.melotti, lemburg, sjmachin, dangra
2010-04-02 22:27:18 ezio.melotti set messageid: <1270247238.98.0.791005996157.issue8271@psf.upfronthosting.co.za>
2010-04-02 22:27:17 ezio.melotti link issue8271 messages
2010-04-02 22:27:17 ezio.melotti create