
Author loewis
Recipients belopolsky, ezio.melotti, loewis, terry.reedy
Date 2013-06-23.07:59:47
Message-id <1371974388.46.0.618338051298.issue18236@psf.upfronthosting.co.za>
In-reply-to
Content
I stand by that comment: IsWhiteSpace should use the Unicode White_Space property. Since FS/GS/RS/US are not in the White_Space property, it's correct that the int conversion fails. It's incorrect that .isspace() gives true.

There are really several bugs here:
- .isspace() doesn't use the White_Space property
- int conversion ultimately uses Py_ISSPACE, which conceptually could deviate from the Unicode properties (as it is byte-based). This is not really an issue, since they indeed match.
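
To make the two points above concrete, here is a small sketch that can be run in an interpreter. It makes two assumptions of its own: the ASCII part of White_Space is hard-coded as U+0009..U+000D plus U+0020, and string.whitespace is used as a stand-in for the C-level Py_ISSPACE table, which is not reachable from Python code. The result of .isspace() is printed rather than asserted, since it depends on the interpreter being tested.

```python
import string
import unicodedata

# ASCII code points with the White_Space property (hard-coded here as an
# assumption of this sketch): U+0009..U+000D and U+0020.
ASCII_WHITE_SPACE = set(range(0x09, 0x0E)) | {0x20}

# Stand-in for the byte-based Py_ISSPACE table (assumption: the real
# table lives in C and cannot be inspected from Python).
byte_based = {ord(c) for c in string.whitespace}

# The four separator control characters discussed in this issue.
SEPARATORS = {0x1C: "FS", 0x1D: "GS", 0x1E: "RS", 0x1F: "US"}


def report():
    """Return (label, in White_Space, bidi class, str.isspace()) per separator."""
    rows = []
    for cp, label in SEPARATORS.items():
        ch = chr(cp)
        rows.append((label,
                     cp in ASCII_WHITE_SPACE,
                     unicodedata.bidirectional(ch),
                     ch.isspace()))
    return rows


for label, in_ws, bidi, isspace in report():
    print(f"{label}: White_Space={in_ws} bidi={bidi} isspace()={isspace}")

print("byte-based set matches ASCII White_Space:",
      byte_based == ASCII_WHITE_SPACE)
```

The bidirectional class column is included because it shows why these characters can be picked up by a whitespace test that is not based on White_Space.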

I propose to fix this by parsing PropList.txt, and generating _PyUnicode_IsWhitespace based on the White_Space property. For efficiency, it should also generate a fast-lookup array for the ASCII case, or just use _Py_ctype_table (with a comment that this table needs to match PropList White_Space). _Py_ascii_whitespace should go.
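
A rough sketch of what the generation step could look like, in Python. This is not an actual patch: the function names (parse_property, ascii_table, is_white_space) are hypothetical, and the input is a short excerpt in PropList.txt format rather than the full file, so the resulting ranges are incomplete. A real generator would read the complete PropList.txt of the targeted Unicode version and emit C source.

```python
# Abbreviated excerpt in PropList.txt format, for illustration only.
SAMPLE_PROPLIST = """\
0009..000D    ; White_Space # Cc   [5] <control-0009>..<control-000D>
0020          ; White_Space # Zs       SPACE
0085          ; White_Space # Cc       <control-0085>
00A0          ; White_Space # Zs       NO-BREAK SPACE

002D          ; Dash # Pd       HYPHEN-MINUS
"""


def parse_property(lines, prop="White_Space"):
    """Collect inclusive (lo, hi) code point ranges that carry `prop`."""
    ranges = []
    for line in lines:
        data = line.split("#", 1)[0].strip()   # drop trailing comment
        if not data:
            continue                           # blank or comment-only line
        fields = [f.strip() for f in data.split(";")]
        if len(fields) < 2 or fields[1] != prop:
            continue
        first, _, last = fields[0].partition("..")
        lo = int(first, 16)
        hi = int(last, 16) if last else lo     # single code point entry
        ranges.append((lo, hi))
    return ranges


def ascii_table(ranges):
    """Build a 128-entry 0/1 table for the fast ASCII lookup."""
    table = [0] * 128
    for lo, hi in ranges:
        for cp in range(lo, min(hi, 127) + 1):
            table[cp] = 1
    return table


def is_white_space(cp, ranges, table):
    """Fast path for ASCII, range scan otherwise."""
    if cp < 128:
        return bool(table[cp])
    return any(lo <= cp <= hi for lo, hi in ranges)


ranges = parse_property(SAMPLE_PROPLIST.splitlines())
table = ascii_table(ranges)
print(ranges)
print([hex(cp) for cp in range(128) if table[cp]])
```

With a table like this, FS/GS/RS/US (0x1C..0x1F) come out as non-whitespace, and the same generated data could back both .isspace() and the int conversion.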

Contributions are welcome.
History
Date                 User    Action  Args
2013-06-23 07:59:48  loewis  set     recipients: + loewis, terry.reedy, belopolsky, ezio.melotti
2013-06-23 07:59:48  loewis  set     messageid: <1371974388.46.0.618338051298.issue18236@psf.upfronthosting.co.za>
2013-06-23 07:59:48  loewis  link    issue18236 messages
2013-06-23 07:59:47  loewis  create