Author ezio.melotti
Recipients Ringding, belopolsky, dangra, ezio.melotti, lemburg, pitrou, serhiy.storchaka, sjmachin, spatz123, vstinner
Date 2012-05-17.18:55:04
Message-id <1337280905.66.0.138053551918.issue8271@psf.upfronthosting.co.za>
In-reply-to
Content
> \xe0\x80 is not maximal subpart. Therefore, there must be two U+FFFD.

OK, now I get what you mean. The valid range for continuation bytes that can follow E0 is A0-BF, not 80-BF as usual, so \x80 is not a valid continuation byte here. While working on the patch I stumbled across this corner case and contacted the Unicode Consortium to ask about it, as explained in msg129495.
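
To make the corner case concrete, here is a minimal sketch of the second-byte check for well-formed UTF-8 sequences. The helper names `second_byte_range` and `is_valid_second_byte` are made up for this illustration and are not taken from the patch or from CPython.

```python
def second_byte_range(lead):
    """Return (low, high) for the byte allowed right after a UTF-8 lead byte.

    Hypothetical helper for illustration only. Most lead bytes accept
    80-BF, but four of them have a narrower range.
    """
    if lead == 0xE0:
        return (0xA0, 0xBF)  # 80-9F would give an overlong 3-byte form
    if lead == 0xED:
        return (0x80, 0x9F)  # A0-BF would encode surrogates
    if lead == 0xF0:
        return (0x90, 0xBF)  # 80-8F would give an overlong 4-byte form
    if lead == 0xF4:
        return (0x80, 0x8F)  # 90-BF would go past U+10FFFF
    if 0xC2 <= lead <= 0xF3:
        return (0x80, 0xBF)  # the usual continuation range
    raise ValueError("not a valid UTF-8 lead byte: %#04x" % lead)


def is_valid_second_byte(lead, byte):
    """True if `byte` may directly follow `lead` in well-formed UTF-8."""
    low, high = second_byte_range(lead)
    return low <= byte <= high


print(is_valid_second_byte(0xE0, 0x80))  # \x80 after E0
print(is_valid_second_byte(0xE0, 0xA0))  # \xa0 after E0
print(is_valid_second_byte(0xE1, 0x80))  # \x80 after E1

# With the "replace" error handler, a decoder that follows the
# maximal-subpart practice should replace E0 and 80 separately.
print(ascii(b"\xe0\x80".decode("utf-8", "replace")))
```

Since \x80 fails the check after E0, the E0 byte is a maximal subpart by itself and \x80 is then a stray continuation byte, which is why two U+FFFD are expected. The last line assumes a Python version whose UTF-8 decoder already follows that practice.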

I don't remember all the details right now, but if that test was passing with my patch there must be something wrong somewhere (either in the patch, in the test, or in our understanding of the standard).
History
Date                 User          Action  Args
2012-05-17 18:55:05  ezio.melotti  set     recipients: + ezio.melotti, lemburg, sjmachin, belopolsky, pitrou, vstinner, Ringding, dangra, spatz123, serhiy.storchaka
2012-05-17 18:55:05  ezio.melotti  set     messageid: <1337280905.66.0.138053551918.issue8271@psf.upfronthosting.co.za>
2012-05-17 18:55:05  ezio.melotti  link    issue8271 messages
2012-05-17 18:55:04  ezio.melotti  create