Author serhiy.storchaka
Recipients Ringding, belopolsky, dangra, ezio.melotti, lemburg, pitrou, serhiy.storchaka, sjmachin, spatz123, vstinner
Date 2012-05-17.18:46:04
Message-id <1337280524.2462.107.camel@raxxla>
In-reply-to <1337276169.02.0.132543275688.issue8271@psf.upfronthosting.co.za>
Content
> This might be just because it first checks if there are two more bytes before checking if they are valid, but 'invalid continuation byte' works too.

Yes, this is an implementation detail: checking the length first is much
easier and faster. Is it necessary to change it?
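To illustrate the implementation detail in question, here is a hypothetical, simplified model of the two possible check orders for a three-byte sequence (lead byte 0xE0..0xEF). It is not CPython's actual decoder; the function names are invented for illustration, and the byte ranges follow the table of well-formed UTF-8 byte sequences in chapter 3 of the Unicode Standard.

```python
# Hypothetical model of two check orders for a 3-byte UTF-8 sequence.
# Assumes data[i] is a 3-byte lead byte (0xE0..0xEF). Not CPython's code.

def _valid_second(lead: int, b: int) -> bool:
    """Allowed range for the byte right after a 3-byte lead byte."""
    if lead == 0xE0:
        return 0xA0 <= b <= 0xBF  # excludes overlong encodings
    if lead == 0xED:
        return 0x80 <= b <= 0x9F  # excludes surrogate code points
    return 0x80 <= b <= 0xBF


def reason_length_first(data: bytes, i: int) -> str:
    """Report a short sequence before looking at the continuation bytes."""
    lead = data[i]
    if len(data) - i < 3:
        return "unexpected end of data"
    if not _valid_second(lead, data[i + 1]) or not 0x80 <= data[i + 2] <= 0xBF:
        return "invalid continuation byte"
    return "ok"


def reason_validity_first(data: bytes, i: int) -> str:
    """Validate whatever continuation bytes are present, then check length."""
    lead = data[i]
    rest = data[i + 1:i + 3]
    if len(rest) >= 1 and not _valid_second(lead, rest[0]):
        return "invalid continuation byte"
    if len(rest) >= 2 and not 0x80 <= rest[1] <= 0xBF:
        return "invalid continuation byte"
    if len(rest) < 2:
        return "unexpected end of data"
    return "ok"
```

With input b'\xe0\x80', the length-first order reports "unexpected end of data" while the validity-first order reports "invalid continuation byte"; for a truncated but valid prefix such as b'\xe2\x82', both report "unexpected end of data".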

> Why not?

Maybe I'm wrong. I looked in "The Unicode Standard, Version
6.0" (http://www.unicode.org/versions/Unicode6.0.0/ch03.pdf), pp. 95-97.
The standard is not categorical on this point, but it recommends that
only the maximal subpart be replaced by U+FFFD. \xe0\x80 is not a maximal
subpart: after the lead byte \xe0 the next byte must be in the range
\xa0-\xbf, so the maximal subpart is just \xe0, and \x80 is a separate
ill-formed byte. Therefore, there must be two U+FFFD. In this case,
neither the previous nor the current implementation conforms to the
standard.
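The "maximal subpart" practice can be sketched as a small pure-Python decoder. This is a hypothetical illustration (decode_maximal_subparts and its helpers are invented names, not a stdlib API), built on the byte ranges of the well-formed UTF-8 table in chapter 3: it emits one U+FFFD for each maximal subpart of an ill-formed sequence, rather than one for the whole ill-formed run.

```python
# Hypothetical sketch: replace each maximal subpart of an ill-formed
# UTF-8 sequence with one U+FFFD (Unicode Standard, chapter 3).

_LEAD_MASK = {2: 0x1F, 3: 0x0F, 4: 0x07}  # payload bits kept from the lead byte


def _seq_length(lead: int) -> int:
    """Sequence length implied by a lead byte, or 0 if it can never start one."""
    if lead <= 0x7F:
        return 1
    if 0xC2 <= lead <= 0xDF:
        return 2
    if 0xE0 <= lead <= 0xEF:
        return 3
    if 0xF0 <= lead <= 0xF4:
        return 4
    return 0  # 0x80..0xC1 and 0xF5..0xFF


def _second_byte_range(lead: int) -> tuple[int, int]:
    """Allowed range for the byte right after a multi-byte lead byte."""
    special = {0xE0: (0xA0, 0xBF), 0xED: (0x80, 0x9F),
               0xF0: (0x90, 0xBF), 0xF4: (0x80, 0x8F)}
    return special.get(lead, (0x80, 0xBF))


def decode_maximal_subparts(data: bytes) -> str:
    out = []
    i, n = 0, len(data)
    while i < n:
        lead = data[i]
        length = _seq_length(lead)
        if length == 0:
            out.append("\ufffd")  # invalid start byte: subpart of length 1
            i += 1
            continue
        if length == 1:
            out.append(chr(lead))  # ASCII
            i += 1
            continue
        # Extend the prefix while each byte keeps it well-formed.
        j = i + 1
        lo, hi = _second_byte_range(lead)
        while j < i + length and j < n and lo <= data[j] <= hi:
            j += 1
            lo, hi = 0x80, 0xBF  # later bytes use the plain continuation range
        if j == i + length:
            cp = lead & _LEAD_MASK[length]
            for b in data[i + 1:j]:
                cp = (cp << 6) | (b & 0x3F)
            out.append(chr(cp))  # complete, well-formed sequence
        else:
            out.append("\ufffd")  # one U+FFFD for the maximal subpart data[i:j]
        i = j
    return "".join(out)
```

Under this reading, b'\xe0\x80' yields two U+FFFD, while a truncated but otherwise valid prefix such as b'\xe2\x82' yields only one, because \xe2\x82 is itself a maximal subpart.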
History
Date                 User              Action  Args
2012-05-17 18:46:05  serhiy.storchaka  set     recipients: + serhiy.storchaka, lemburg, sjmachin, belopolsky, pitrou, vstinner, ezio.melotti, Ringding, dangra, spatz123
2012-05-17 18:46:04  serhiy.storchaka  link    issue8271 messages
2012-05-17 18:46:04  serhiy.storchaka  create