Message 259034 - Python tracker

Author	Quentin.Pradet
Recipients	Quentin.Pradet, docs@python
Date	2016-01-27.17:15:08
Message-id	<1453914908.97.0.691892593639.issue26220@psf.upfronthosting.co.za>
In-reply-to

Content

From https://docs.python.org/3.6/howto/unicode.html#the-string-type:

> The following examples show the differences::
>
>     >>> b'\x80abc'.decode("utf-8", "strict")  #doctest: +NORMALIZE_WHITESPACE
>     Traceback (most recent call last):
>         ...
>     UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 0:
>       invalid start byte
>     >>> b'\x80abc'.decode("utf-8", "replace")
>     '\ufffdabc'
>     >>> b'\x80abc'.decode("utf-8", "backslashreplace")
>     '\\x80abc'
>     >>> b'\x80abc'.decode("utf-8", "ignore")
>     'abc'
>
> (In this code example, the Unicode replacement character has been replaced by
> a question mark because it may not be displayed on some systems.)

I think the parenthetical sentence after the snippet can be removed: the snippet already shows exactly what Python 3.2+ outputs, with the replacement character written as '\ufffd' rather than as a question mark. The commit that added this sentence appears to date from Python 3.1: https://github.com/python/cpython/commit/34d4c82af56ebc1b65514a118f0ec7feeb8e172f. Another commit, around Python 3.3, removed it: https://github.com/python/cpython/commit/63172c46706ae9b2a3bc80d639504a57fff4e716.
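
For anyone who wants to check the quoted output locally, here is a minimal sketch that runs the same four decode calls. It assumes Python 3.5 or later, since "backslashreplace" is only accepted for decoding from 3.5 on; the variable names are just illustrative and do not come from the docs.

```python
# Invalid UTF-8: 0x80 is a continuation byte and cannot start a sequence.
data = b'\x80abc'

# "replace" substitutes U+FFFD (REPLACEMENT CHARACTER) for the bad byte.
replaced = data.decode("utf-8", "replace")

# "backslashreplace" inserts a literal backslash escape for the bad byte.
escaped = data.decode("utf-8", "backslashreplace")

# "ignore" silently drops the bad byte.
ignored = data.decode("utf-8", "ignore")

# "strict" (the default) raises UnicodeDecodeError instead of returning.
try:
    data.decode("utf-8", "strict")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# ascii() shows the escape form even if the terminal cannot render U+FFFD.
print(ascii(replaced))  # '\ufffdabc'
print(ascii(escaped))   # '\\x80abc'
print(ascii(ignored))   # 'abc'
print(strict_failed)    # True
```

The "replace" result contains U+FFFD itself, not a question mark, which is the reason the parenthetical note no longer matches the snippet.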

History
Date	User	Action	Args
2016-01-27 17:15:09	Quentin.Pradet	set	recipients: + Quentin.Pradet, docs@python
2016-01-27 17:15:08	Quentin.Pradet	set	messageid: <1453914908.97.0.691892593639.issue26220@psf.upfronthosting.co.za>
2016-01-27 17:15:08	Quentin.Pradet	link	issue26220 messages
2016-01-27 17:15:08	Quentin.Pradet	create