
Author ezio.melotti
Recipients ezio.melotti, loewis, mankyd
Date 2011-11-13.03:44:45
Message-id <1321155886.61.0.331754899674.issue13391@psf.upfronthosting.co.za>
Content
str.strip uses Py_UNICODE_ISSPACE, which in turn uses _PyUnicode_IsWhitespace (see Objects/unicodetype_db.h#l3347). According to the comment there, it "Returns 1 for Unicode characters having the bidirectional type 'WS', 'B' or 'S' or the category 'Zs', 0 otherwise."
The category of U+200B is 'Cf' and its bidirectional type is 'BN', so 0 is returned and the character is not stripped.
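
The check described above can be reproduced from Python with the unicodedata module. This is only a sketch: `matches_comment_rule` is a hypothetical helper that restates the rule quoted from the comment, it is not the C implementation of _PyUnicode_IsWhitespace.

```python
import unicodedata

ZWSP = "\u200b"  # U+200B ZERO WIDTH SPACE


def describe(ch):
    """Return (general category, bidirectional type) for one character."""
    return unicodedata.category(ch), unicodedata.bidirectional(ch)


def matches_comment_rule(ch):
    """Hypothetical helper: the rule quoted from the comment, restated in Python."""
    category, bidi = describe(ch)
    return bidi in ("WS", "B", "S") or category == "Zs"


sample = "text" + ZWSP
stripped = sample.strip()

print(describe(ZWSP))              # category and bidirectional type of U+200B
print(matches_comment_rule(ZWSP))  # whether the quoted rule treats it as whitespace
print(stripped == sample)          # whether strip() left the string unchanged
```

If the quoted rule is what decides stripping, the last line should report that the trailing U+200B survives str.strip().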

OTOH, Unicode defines the White_Space property and assigns it to 26 characters, whereas _PyUnicode_IsWhitespace includes 4 more characters (U+001C, U+001D, U+001E, U+001F) that should probably be removed.
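
To see why those four code points are caught, one can look up their category and bidirectional type. Again this is a rough sketch: `rule_says_whitespace` is a hypothetical restatement of the quoted comment, and the values come from whatever Unicode database version the running interpreter ships.

```python
import unicodedata

# The four code points named in the message.
EXTRA = [0x1C, 0x1D, 0x1E, 0x1F]


def rule_says_whitespace(cp):
    """Hypothetical helper: bidirectional type WS/B/S or category Zs."""
    ch = chr(cp)
    return (unicodedata.bidirectional(ch) in ("WS", "B", "S")
            or unicodedata.category(ch) == "Zs")


# One row per code point: label, category, bidirectional type, rule result.
report = [
    (f"U+{cp:04X}",
     unicodedata.category(chr(cp)),
     unicodedata.bidirectional(chr(cp)),
     rule_says_whitespace(cp))
    for cp in EXTRA
]

for row in report:
    print(row)
```

All four are control characters, so they match the rule only through their bidirectional type, not through the 'Zs' category.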

I'll close this issue because str.strip() is correct regarding U+200B.

@Martin
Do you think those 4 chars should be removed?
If so I'll open another issue.