Message191687
I stand by that comment: IsWhiteSpace should use the Unicode White_Space property. Since FS/GS/RS/US (U+001C..U+001F) are not in the White_Space property, it's correct that the int conversion fails. It's incorrect that .isspace() returns True for them.
There are really several bugs here:
- str.isspace() doesn't use the White_Space property.
- int conversion ultimately uses Py_ISSPACE, which could in principle deviate from the Unicode properties, since it is byte-based. In practice this is not an issue, since the two do match for ASCII.
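Both points can be checked quickly from Python. This is a sketch: the ASCII subset of White_Space is hard-coded from PropList.txt rather than read from a Unicode database, and it assumes (as is the case in CPython, as far as I know) that bytes.isspace() is built on the byte-based Py_ISSPACE.

```python
# ASCII part of the Unicode White_Space property (PropList.txt):
# U+0009..U+000D and U+0020.
WHITE_SPACE_ASCII = set(range(0x09, 0x0E)) | {0x20}


def isspace_mismatches():
    """ASCII code points where str.isspace() disagrees with White_Space."""
    return sorted(cp for cp in range(128)
                  if chr(cp).isspace() != (cp in WHITE_SPACE_ASCII))


def bytes_isspace_mismatches():
    """Same check for bytes.isspace(), which uses the byte-based Py_ISSPACE."""
    return sorted(cp for cp in range(128)
                  if bytes([cp]).isspace() != (cp in WHITE_SPACE_ASCII))


if __name__ == "__main__":
    print([hex(cp) for cp in isspace_mismatches()])  # ['0x1c', '0x1d', '0x1e', '0x1f']
    print(bytes_isspace_mismatches())                # []
```

Only FS/GS/RS/US differ for str.isspace(), while the byte-based check agrees with White_Space on all of ASCII.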
I propose to fix this by parsing PropList.txt and generating _PyUnicode_IsWhitespace from the White_Space property. For efficiency, the generator should also emit a fast-lookup array for the ASCII case, or we could simply use _Py_ctype_table (with a comment stating that this table must match the White_Space property in PropList.txt). _Py_ascii_whitespace should be removed.
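A rough sketch of what the generator step could look like, in the spirit of Tools/unicode/makeunicodedata.py but not taken from it. It parses White_Space entries in PropList.txt format (an embedded excerpt stands in for the downloaded file) and emits C source for a switch-based _PyUnicode_IsWhitespace plus a 128-entry ASCII table. The names parse_property, generate_c and ascii_white_space are hypothetical.

```python
# White_Space entries from PropList.txt (Unicode 6.3 and later;
# older versions also list U+180E).
PROPLIST_EXCERPT = """\
0009..000D    ; White_Space # Cc   [5] <control-0009>..<control-000D>
0020          ; White_Space # Zs       SPACE
0085          ; White_Space # Cc       <control-0085>
00A0          ; White_Space # Zs       NO-BREAK SPACE
1680          ; White_Space # Zs       OGHAM SPACE MARK
2000..200A    ; White_Space # Zs  [11] EN QUAD..HAIR SPACE
2028          ; White_Space # Zl       LINE SEPARATOR
2029          ; White_Space # Zp       PARAGRAPH SEPARATOR
202F          ; White_Space # Zs       NARROW NO-BREAK SPACE
205F          ; White_Space # Zs       MEDIUM MATHEMATICAL SPACE
3000          ; White_Space # Zs       IDEOGRAPHIC SPACE
"""


def parse_property(lines, prop):
    """Return the sorted code points listed under `prop` (PropList.txt format)."""
    points = set()
    for line in lines:
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        if not line:
            continue
        field, name = (part.strip() for part in line.split(";"))
        if name != prop:
            continue
        if ".." in field:
            first, last = (int(x, 16) for x in field.split(".."))
        else:
            first = last = int(field, 16)
        points.update(range(first, last + 1))
    return sorted(points)


def generate_c(points):
    """Emit C source: a switch over all code points plus an ASCII lookup table."""
    wanted = set(points)
    out = [
        "/* Generated from PropList.txt (White_Space); do not edit. */",
        "int _PyUnicode_IsWhitespace(const Py_UCS4 ch)",
        "{",
        "    switch (ch) {",
    ]
    out += ["    case 0x%04X:" % cp for cp in points]
    out += ["        return 1;", "    }", "    return 0;", "}", ""]
    # Hypothetical fast-lookup array for the ASCII case (1 = White_Space).
    out.append("static const unsigned char ascii_white_space[128] = {")
    for row in range(0, 128, 16):
        flags = ", ".join("1" if cp in wanted else "0"
                          for cp in range(row, row + 16))
        out.append("    " + flags + ",")
    out.append("};")
    return "\n".join(out)


if __name__ == "__main__":
    ws = parse_property(PROPLIST_EXCERPT.splitlines(), "White_Space")
    print(generate_c(ws))
```

Because both the switch and the ASCII array come from the same parsed set, they cannot drift apart the way a hand-maintained table like _Py_ascii_whitespace can.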
Contributions are welcome.

Date | User | Action | Args
2013-06-23 07:59:48 | loewis | set | recipients: + loewis, terry.reedy, belopolsky, ezio.melotti
2013-06-23 07:59:48 | loewis | set | messageid: <1371974388.46.0.618338051298.issue18236@psf.upfronthosting.co.za>
2013-06-23 07:59:48 | loewis | link | issue18236 messages
2013-06-23 07:59:47 | loewis | create |