Message 161000 - Python tracker

Author	serhiy.storchaka
Recipients	Ringding, belopolsky, dangra, ezio.melotti, lemburg, pitrou, serhiy.storchaka, sjmachin, spatz123, vstinner
Date	2012-05-17.18:46:04
Message-id	<1337280524.2462.107.camel@raxxla>
In-reply-to	<1337276169.02.0.132543275688.issue8271@psf.upfronthosting.co.za>

Content

> This might be just because it first checks if there are two more bytes before checking if they are valid, but 'invalid continuation byte' works too.

Yes, this is an implementation detail. It is much easier and faster. Is
it necessary to change it?

> Why not?

Maybe I'm wrong. I looked in "The Unicode Standard, Version
6.0" (http://www.unicode.org/versions/Unicode6.0.0/ch03.pdf), pp. 95-97.
The standard is not categorical on this, but it recommends that only a
maximal subpart be replaced by U+FFFD. \xe0\x80 is not a maximal
subpart: \x80 is not a valid second byte after \xe0, so the maximal
subpart is \xe0 alone and \x80 is a separate ill-formed byte. Therefore,
there must be two U+FFFD. In this case, neither the previous nor the
current implementation conforms to the standard.
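
To make the counting concrete, here is a minimal sketch of the "maximal subpart" rule as I read it from the well-formed byte ranges in chapter 3 of the standard. The names `_second_byte_range` and `count_replacements` are hypothetical helpers written for this illustration; this is not the decoder's actual code, and it only counts replacement characters instead of decoding.

```python
def _second_byte_range(lead):
    """Return (sequence_length, low, high) for the byte after `lead`,
    or None if `lead` cannot start a multi-byte UTF-8 sequence.
    Ranges follow the well-formed UTF-8 byte sequence table."""
    if 0xC2 <= lead <= 0xDF:
        return 2, 0x80, 0xBF
    if lead == 0xE0:
        return 3, 0xA0, 0xBF   # excludes overlong forms such as E0 80
    if lead == 0xED:
        return 3, 0x80, 0x9F   # excludes surrogates
    if 0xE1 <= lead <= 0xEF:
        return 3, 0x80, 0xBF
    if lead == 0xF0:
        return 4, 0x90, 0xBF   # excludes overlong forms
    if lead == 0xF4:
        return 4, 0x80, 0x8F   # excludes code points above U+10FFFF
    if 0xF1 <= lead <= 0xF3:
        return 4, 0x80, 0xBF
    return None


def count_replacements(data):
    """Count the U+FFFD characters produced when each maximal subpart
    of an ill-formed subsequence is replaced by a single U+FFFD."""
    i, count, n = 0, 0, len(data)
    while i < n:
        lead = data[i]
        if lead < 0x80:            # ASCII, always well-formed
            i += 1
            continue
        info = _second_byte_range(lead)
        if info is None:           # stray continuation or invalid lead byte
            count += 1
            i += 1
            continue
        length, low, high = info
        j, matched = i + 1, 1
        while j < n and matched < length:
            # Only the second byte has a lead-dependent range.
            lo, hi = (low, high) if matched == 1 else (0x80, 0xBF)
            if not lo <= data[j] <= hi:
                break
            j += 1
            matched += 1
        if matched < length:       # incomplete: one U+FFFD for the subpart
            count += 1
        i = j                      # resume at the first byte not consumed
    return count


if __name__ == "__main__":
    sample = b"\xe0\x80"
    print(count_replacements(sample))
    # For comparison with whatever the running interpreter does:
    print(sample.decode("utf-8", "replace").count("\ufffd"))
```

Under this reading, \xe0\x80 counts as two replacements, while a truncated but otherwise valid prefix such as \xe0\xa0 counts as one.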

History
Date	User	Action	Args
2012-05-17 18:46:05	serhiy.storchaka	set	recipients: + serhiy.storchaka, lemburg, sjmachin, belopolsky, pitrou, vstinner, ezio.melotti, Ringding, dangra, spatz123
2012-05-17 18:46:04	serhiy.storchaka	link	issue8271 messages
2012-05-17 18:46:04	serhiy.storchaka	create