Message 191709 - Python tracker

Author	belopolsky
Recipients	belopolsky, cvrebert, eric.araujo, eric.smith, ezio.melotti, lemburg, mark.dickinson, ncoghlan, skrah, vstinner
Date	2013-06-23.16:59:57
Message-id	<1372006798.06.0.82188600287.issue10581@psf.upfronthosting.co.za>

Content
Martin v. Löwis wrote at #18236 (msg191687):
> int conversion ultimately uses Py_ISSPACE, which conceptually could
> deviate from the Unicode properties (as it is byte-based). This is not
> really an issue, since they indeed match.

Py_ISSPACE matches the Unicode White_Space property in the ASCII range (the first 128 code points), but it differs for byte (code point) values from 128 through 255.  This leads to the following discrepancy:

>>> int('123\xa0')
123

but

>>> int(b'123\xa0')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa0 in position 3: invalid start byte
>>> int('123\xa0'.encode())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: invalid literal for int() with base 10: '123\xa0'
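
The mismatch can be checked with a short script. This is only a sketch: ASCII_SPACE_BYTES is my assumption about the set of bytes Py_ISSPACE accepts (tab, LF, VT, FF, CR, space), and latin1_space_mismatches and try_int are hypothetical helper names, not part of CPython. The exact exception type raised for bytes input may vary between Python versions, so try_int only reports the exception's class name.

```python
# Assumed set of bytes accepted by Py_ISSPACE: tab, LF, VT, FF, CR, space.
ASCII_SPACE_BYTES = frozenset(b"\t\n\x0b\x0c\r ")


def latin1_space_mismatches():
    """Code points 128..255 that str.isspace() accepts but the assumed
    byte-based check does not."""
    return [
        cp
        for cp in range(128, 256)
        if chr(cp).isspace() and cp not in ASCII_SPACE_BYTES
    ]


def try_int(value):
    """Return int(value), or the exception class name if conversion fails."""
    try:
        return int(value)
    except ValueError as exc:  # UnicodeDecodeError is a subclass of ValueError
        return type(exc).__name__


if __name__ == "__main__":
    print([hex(cp) for cp in latin1_space_mismatches()])
    print(try_int("123\xa0"))           # str input: trailing NBSP is stripped
    print(try_int(b"123\xa0"))          # bytes input: conversion fails
    print(try_int("123\xa0".encode()))  # UTF-8 encoded NBSP: conversion fails
```

The str case succeeds because the NO-BREAK SPACE counts as whitespace at the Unicode level, while both bytes cases fail because 0xa0 is not whitespace for a byte-based check.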

History
Date	User	Action	Args
2013-06-23 16:59:58	belopolsky	set	recipients: + belopolsky, lemburg, mark.dickinson, ncoghlan, vstinner, eric.smith, ezio.melotti, eric.araujo, cvrebert, skrah
2013-06-23 16:59:58	belopolsky	set	messageid: <1372006798.06.0.82188600287.issue10581@psf.upfronthosting.co.za>
2013-06-23 16:59:58	belopolsky	link	issue10581 messages
2013-06-23 16:59:57	belopolsky	create