Message 160991 - Python tracker


Author	spatz123
Recipients	Ringding, belopolsky, dangra, ezio.melotti, lemburg, pitrou, serhiy.storchaka, sjmachin, spatz123, vstinner
Date	2012-05-17.17:36:22
SpamBayes Score	-1.0
Marked as misclassified	Yes
Message-id	<CABOLOwQ3eUeYGc6uP9xykxH3SxABSpOPzGJB4vMxMzo5PqMm9w@mail.gmail.com>
In-reply-to	<1337276021.2462.19.camel@raxxla>

Content

> b'\xe0\x80'.decode('utf-8', 'replace') returns one U+FFFD and not two. I
> don't think that is right.

I think that one U+FFFD is correct. The only error is a premature end of
data.
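To make the comparison concrete, here is a minimal sketch that counts the U+FFFD characters produced for a few short inputs. The helper name count_replacements and the extra sample inputs are mine, for illustration only. The count for b'\xe0\x80' is exactly the point under discussion and depends on the interpreter version and on which patch is applied, so I have not written an expected value for it.

```python
REPLACEMENT = "\ufffd"  # U+FFFD REPLACEMENT CHARACTER


def count_replacements(data: bytes) -> int:
    """Decode data as UTF-8 with errors='replace' and count the U+FFFD characters."""
    return data.decode("utf-8", "replace").count(REPLACEMENT)


# b"\xe0" alone is a truncated sequence; b"\xe0\xa0" is a valid prefix that
# is cut short; b"\xe0\x80" is the disputed case from this thread.
for sample in (b"\xe0", b"\xe0\xa0", b"\xe0\x80"):
    print(sample, count_replacements(sample))
```

Running this on the build being tested shows directly whether the disputed input yields one replacement character or two.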
On Thu, May 17, 2012 at 12:31 PM, Serhiy Storchaka
<report@bugs.python.org> wrote:

>
> Serhiy Storchaka <storchaka@gmail.com> added the comment:
>
> > The only issue left was about the number of U+FFFD generated with
> > invalid sequences in some cases.
> > My last patch has extensive tests for this, so you could try to apply it
> > (or copy the tests) and see if they all pass.
>
> Tests fail, but I'm not sure that the tests are correct.
>
> b'\xe0\x00' raises 'unexpected end of data' and not 'invalid
> continuation byte'. This is a terminological issue.
>
> b'\xe0\x80'.decode('utf-8', 'replace') returns one U+FFFD and not two. I
> don't think that is right.
>
> ----------
> title: str.decode('utf8',       'replace') -- conformance with Unicode
> 5.2.0 -> str.decode('utf8', 'replace') -- conformance with Unicode 5.2.0
>
> _______________________________________
> Python tracker <report@bugs.python.org>
> <http://bugs.python.org/issue8271>
> _______________________________________
>
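The terminological question in the quoted comment (which reason string is reported for b'\xe0\x00') can be checked the same way. This is only a sketch: the helper name decode_error_reason is hypothetical, and the reason reported for that input may differ between interpreter versions, so no expected value is given for it.

```python
def decode_error_reason(data: bytes):
    """Return (reason, start, end) from a strict UTF-8 decode, or None on success."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        # reason is the human-readable message; start and end delimit the
        # offending bytes within the input.
        return exc.reason, exc.start, exc.end
    return None


for sample in (b"\xe0", b"\xe0\x00", b"\xe0\x80"):
    print(sample, decode_error_reason(sample))
```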

History
Date	User	Action	Args
2012-05-17 17:36:23	spatz123	set	recipients: + spatz123, lemburg, sjmachin, belopolsky, pitrou, vstinner, ezio.melotti, Ringding, dangra, serhiy.storchaka
2012-05-17 17:36:22	spatz123	link	issue8271 messages
2012-05-17 17:36:22	spatz123	create