Message147547
str.strip uses Py_UNICODE_ISSPACE that in turn uses _PyUnicode_IsWhitespace (see Objects/unicodetype_db.h#l3347), and according to the comment there it "Returns 1 for Unicode characters having the bidirectional type 'WS', 'B' or 'S' or the category 'Zs', 0 otherwise."
The category of U+200B is 'Cf' and its bidirectional type is 'BN', so 0 is returned and the character is not stripped.
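As a rough illustration, the rule quoted from the comment can be re-expressed in Python with the unicodedata module. This is only a sketch: the helper name is_whitespace_by_rule is made up for this example, it is not the C implementation, and the result depends on the Unicode database bundled with the interpreter.

```python
import unicodedata

def is_whitespace_by_rule(ch):
    # Hypothetical helper mirroring the quoted comment: bidirectional
    # type 'WS', 'B' or 'S', or general category 'Zs'.
    return (unicodedata.bidirectional(ch) in ("WS", "B", "S")
            or unicodedata.category(ch) == "Zs")

zwsp = "\u200b"  # ZERO WIDTH SPACE
print(unicodedata.category(zwsp), unicodedata.bidirectional(zwsp))
print(is_whitespace_by_rule(zwsp), zwsp.isspace())
print(repr(("abc" + zwsp).strip()))  # the trailing U+200B is kept
```

An ordinary space satisfies the rule (category 'Zs', bidirectional type 'WS'), while U+200B satisfies neither condition, which matches what str.strip() does.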
OTOH, Unicode defines the White_Space property and assigns it to 26 chars, whereas _PyUnicode_IsWhitespace includes 4 more chars (U+001C, U+001D, U+001E, U+001F) that should probably be removed.
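A quick way to see why those 4 chars are picked up is to print their category and bidirectional type. This is a sketch only: the names EXTRA, describe and rows are made up for the example, and unicodedata does not expose the White_Space property itself, so that side of the comparison cannot be checked this way.

```python
import unicodedata

# The four control characters mentioned above (U+001C..U+001F).
EXTRA = ["\x1c", "\x1d", "\x1e", "\x1f"]

def describe(ch):
    # (code point, general category, bidirectional type, str.isspace result)
    return ("U+%04X" % ord(ch),
            unicodedata.category(ch),
            unicodedata.bidirectional(ch),
            ch.isspace())

rows = [describe(ch) for ch in EXTRA]
for row in rows:
    print(row)
```

All four are category 'Cc', but their bidirectional types are 'B' or 'S', so they match the rule in the comment and str.isspace() reports True for them.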
I'll close this issue because str.strip() is correct regarding U+200B.
@Martin
Do you think those 4 chars should be removed?
If so, I'll open another issue.

Date | User | Action | Args
2011-11-13 03:44:46 | ezio.melotti | set | recipients: + ezio.melotti, loewis, mankyd
2011-11-13 03:44:46 | ezio.melotti | set | messageid: <1321155886.61.0.331754899674.issue13391@psf.upfronthosting.co.za>
2011-11-13 03:44:46 | ezio.melotti | link | issue13391 messages
2011-11-13 03:44:45 | ezio.melotti | create |