Message 106774 - Python tracker



Author	ezio.melotti
Recipients	PeterL, ezio.melotti
Date	2010-05-30.19:12:33
Message-id	<1275246756.15.0.893458000522.issue8859@psf.upfronthosting.co.za>
In-reply-to

Content
Both on Linux and Windows I get:
>>> '\xa0'.isspace()
False
>>> u'\xa0'.isspace()
True

The Unicode character u'\xa0' is U+00A0 NO-BREAK SPACE, so unicode.split correctly treats it as whitespace.
However, the byte string '\xa0' is not whitespace, so str.split does not split on it.
The correct solution is to convert your string to Unicode and then split.
I'd close this as invalid, but before doing so I'd like you to confirm that the example I posted and 'split' return the same results on both Linux and Windows (the fact that it works on Linux is probably caused by something else, e.g. the label is already Unicode).
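
To illustrate the "convert to Unicode, then split" suggestion, here is a minimal sketch written for Python 3, where bytes plays the role of the Python 2 str discussed above and str plays the role of unicode. The label value, the split_label helper, and the latin-1 encoding are assumptions made for illustration only; the real data may use a different encoding.

```python
# Sketch only: in Python 3, bytes ~ Python 2 str, and str ~ Python 2 unicode.
# The label content and the latin-1 encoding are hypothetical.
raw_label = b"10\xa0kg"


def split_label(raw, encoding="latin-1"):
    """Decode a byte string to text if needed, then split on Unicode whitespace."""
    text = raw.decode(encoding) if isinstance(raw, bytes) else raw
    return text.split()


# Byte 0xA0 is not ASCII whitespace, so the byte string is not split.
bytes_parts = raw_label.split()

# After decoding, U+00A0 NO-BREAK SPACE counts as whitespace and separates the parts.
text_parts = split_label(raw_label)

print(bytes_parts)
print(text_parts)
```

In Python 2 the same idea would be roughly label.decode(encoding).split(), again assuming the encoding of the byte string is known.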

History
Date	User	Action	Args
2010-05-30 19:12:36	ezio.melotti	set	recipients: + ezio.melotti, PeterL
2010-05-30 19:12:36	ezio.melotti	set	messageid: <1275246756.15.0.893458000522.issue8859@psf.upfronthosting.co.za>
2010-05-30 19:12:33	ezio.melotti	link	issue8859 messages
2010-05-30 19:12:33	ezio.melotti	create