Message 97500 - Python tracker


Author	exarkun
Recipients	exarkun
Date	2010-01-10.05:27:23
Message-id	<1263101250.07.0.175926881279.issue7663@psf.upfronthosting.co.za>

Content

This issue may extend beyond unicode.upper() and unicode.lower(), but it is clearest with these two methods.

For example, consider DESERET SMALL LETTER EW, a code point outside the BMP.  On a UTF-16 build, calling upper() on a string containing it doesn't change it to the capital form (DESERET CAPITAL LETTER EW):

>>> u'\N{DESERET SMALL LETTER EW}'.upper() == u'\N{DESERET SMALL LETTER EW}'
True

The character isn't even recognized as lower case:

>>> u'\N{DESERET SMALL LETTER EW}'.islower()
False

With a UTF-32 build, however, the expected behavior (i.e., the behavior one would get for a code point in the BMP with small and capital variants) is provided.
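
One plausible explanation for the difference is that a UTF-16 build stores this character as a surrogate pair and applies case operations to each 16-bit code unit separately, and a lone surrogate has no case mapping. The following is a minimal sketch of that idea, assuming Python 3.3 or later (where str is indexed by code point regardless of build). The helpers utf16_units and upper_per_unit are hypothetical names introduced here for illustration; they model the suspected behavior and are not the interpreter's actual implementation.

```python
import struct

small = "\N{DESERET SMALL LETTER EW}"
capital = "\N{DESERET CAPITAL LETTER EW}"


def utf16_units(text):
    """Return the UTF-16 code units of text as a list of ints."""
    data = text.encode("utf-16-le")
    return list(struct.unpack("<%dH" % (len(data) // 2), data))


def upper_per_unit(text):
    """Hypothetical model of a UTF-16 build: upper-case one code unit at a time."""
    result = []
    for unit in utf16_units(text):
        mapped = chr(unit).upper()
        # Keep the unit unchanged unless it maps to a single 16-bit value.
        if len(mapped) == 1 and ord(mapped) <= 0xFFFF:
            result.append(ord(mapped))
        else:
            result.append(unit)
    data = struct.pack("<%dH" % len(result), *result)
    return data.decode("utf-16-le")


# The character lies outside the BMP, so UTF-16 needs two code units for it.
print(ord(small) > 0xFFFF, len(utf16_units(small)))

# Per-unit mapping leaves the surrogate pair untouched, as in the report.
print(upper_per_unit(small) == small)

# Mapping the whole code point gives the capital letter, as on a UTF-32 build.
print(small.upper() == capital, small.islower())
```

In this model, BMP text such as "abc" still upper-cases correctly, which matches the observation that only code points needing surrogate pairs are affected.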
