Message 327357 - Python tracker


Author	doerwalter
Recipients	doerwalter, ezio.melotti, vstinner
Date	2018-10-08.15:38:20
Message-id	<1539013100.29.0.545547206417.issue34935@psf.upfronthosting.co.za>

Content

The following code issues a misleading exception message:

>>> b'\xed\xa0\xbd\xed\xb3\x9e'.decode("utf-8")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xed in position 0: invalid continuation byte

The cause of the exception is *not* an invalid continuation byte, but UTF-8 encoded surrogates. In fact, decoding with the 'surrogatepass' error handler raises no exception:

>>> b'\xed\xa0\xbd\xed\xb3\x9e'.decode("utf-8", "surrogatepass")
'\ud83d\udcde'

I would have expected an exception message like:

UnicodeDecodeError: 'utf-8' codec can't decode bytes in position 0-2: surrogates not allowed
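
To illustrate the kind of check that would produce such a message, here is a minimal sketch in pure Python. The helper name decode_utf8_strict is hypothetical and not part of any library; it only wraps bytes.decode() and re-raises with a different reason when the failing position starts a three-byte sequence of the form ED A0..BF 80..BF, which is how code points U+D800..U+DFFF come out when pushed through the UTF-8 bit layout. It is not how the codec itself is implemented.

```python
def decode_utf8_strict(data: bytes) -> str:
    """Decode UTF-8, reporting encoded surrogates as such (hypothetical helper)."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError as exc:
        seq = data[exc.start:exc.start + 3]
        # ED A0..BF 80..BF is the three-byte pattern for U+D800..U+DFFF.
        if (len(seq) == 3 and seq[0] == 0xED
                and 0xA0 <= seq[1] <= 0xBF and 0x80 <= seq[2] <= 0xBF):
            raise UnicodeDecodeError(
                "utf-8", data, exc.start, exc.start + 3, "surrogates not allowed"
            ) from exc
        # Any other decoding problem is re-raised unchanged.
        raise
```

Calling decode_utf8_strict(b'\xed\xa0\xbd\xed\xb3\x9e') should then raise a UnicodeDecodeError whose reason is "surrogates not allowed" and whose range covers bytes 0 to 2, while other malformed input keeps its original reason.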

(Note that the input bytes are an improperly UTF-8 encoded version of U+1F4DE (TELEPHONE RECEIVER): the two UTF-16 surrogates U+D83D and U+DCDE were each encoded as a separate three-byte sequence.)
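
For anyone who wants to check the relationship between these bytes and U+1F4DE, the following sketch shows one possible way to recover the intended character. It assumes Python 3.4 or later, where the 'surrogatepass' handler is also accepted by the UTF-16 codec; the variable names are only illustrative.

```python
data = b'\xed\xa0\xbd\xed\xb3\x9e'

# Step 1: let the two encoded surrogates through as lone surrogate code points.
pair = data.decode("utf-8", "surrogatepass")

# Step 2: round-trip through UTF-16 so the pair is combined into one code point.
text = pair.encode("utf-16", "surrogatepass").decode("utf-16")

assert len(pair) == 2
assert text == "\U0001F4DE"
# The well-formed UTF-8 encoding of the same character is four bytes long.
assert text.encode("utf-8") == b'\xf0\x9f\x93\x9e'
```

The round trip is only a workaround for data that is already known to contain surrogate pairs encoded this way; it does not change what the strict decoder reports.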
