Message 160989 - Python tracker


Author	serhiy.storchaka
Recipients	Ringding, belopolsky, dangra, ezio.melotti, lemburg, pitrou, serhiy.storchaka, sjmachin, spatz123, vstinner
Date	2012-05-17.17:31:03
Message-id	<1337276021.2462.19.camel@raxxla>
In-reply-to	<1337273740.36.0.967710988682.issue8271@psf.upfronthosting.co.za>

Content
> The only issue left was about the number of U+FFFD generated with invalid sequences in some cases.
> My last patch has extensive tests for this, so you could try to apply it (or copy the tests) and see if they all pass.

The tests fail, but I'm not sure that the tests are correct.

b'\xe0\x00' raises 'unexpected end of data' and not 'invalid
continuation byte'. This is a terminological issue.

b'\xe0\x80'.decode('utf-8', 'replace') returns one U+FFFD and not two. I
don't think that is right.
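
The two cases above can be checked on any given build with a small script. This is only a sketch: the helper name probe is made up for illustration, and the reason string and the U+FFFD count it reports depend on the interpreter version and on which patch is applied, so no expected output is shown.

```python
def probe(data: bytes, encoding: str = "utf-8"):
    """Return (strict-mode error reason or None, number of U+FFFD from 'replace').

    `probe` is a hypothetical helper, not part of any patch on this issue.
    """
    try:
        data.decode(encoding)  # strict error handling is the default
        reason = None
    except UnicodeDecodeError as exc:
        # exc.reason holds the message, e.g. 'invalid continuation byte'
        reason = exc.reason
    replaced = data.decode(encoding, "replace")
    return reason, replaced.count("\ufffd")


for sample in (b"\xe0\x00", b"\xe0\x80"):
    reason, n_replacements = probe(sample)
    print(sample, "->", reason, "| U+FFFD count:", n_replacements)
```

Running it before and after applying a patch shows whether the error message or the number of replacement characters changed for these inputs.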

History
Date	User	Action	Args
2012-05-17 17:31:04	serhiy.storchaka	set	recipients: + serhiy.storchaka, lemburg, sjmachin, belopolsky, pitrou, vstinner, ezio.melotti, Ringding, dangra, spatz123
2012-05-17 17:31:03	serhiy.storchaka	link	issue8271 messages
2012-05-17 17:31:03	serhiy.storchaka	create