Author exarkun
Recipients exarkun
Date 2010-01-10.05:27:23
Message-id <1263101250.07.0.175926881279.issue7663@psf.upfronthosting.co.za>
In-reply-to
Content
This issue may extend beyond unicode.upper() and unicode.lower(), but these two methods show it most clearly.

For example, consider DESERET SMALL LETTER EW, a character outside the BMP. On a UTF-16 build, calling upper() on a string containing it does not convert it to the capital variant (DESERET CAPITAL LETTER EW):

>>> u'\N{DESERET SMALL LETTER EW}'.upper() == u'\N{DESERET SMALL LETTER EW}'
True

The character isn't even recognized as lowercase:

>>> u'\N{DESERET SMALL LETTER EW}'.islower()
False

With a UTF-32 build, however, the expected behavior (i.e., the behavior one would get for a code point in the BMP with small and capital variants) is provided.
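
The difference between the two builds can be imitated on an interpreter that handles non-BMP characters correctly. The sketch below is written for Python 3 and rests on an assumption not stated in this report: that a UTF-16 build applies case mapping to each 16-bit code unit separately, so the two surrogates that encode the letter are never looked up as one code point. The helper names (utf16_units, upper_per_code_unit, islower_per_code_unit) are hypothetical and exist only for illustration.

```python
# Simulation only: imitates per-code-unit case mapping (an assumed model of
# the UTF-16 build's behavior), run on Python 3 where str is per code point.

small = '\N{DESERET SMALL LETTER EW}'
capital = '\N{DESERET CAPITAL LETTER EW}'


def utf16_units(s):
    """Return the 16-bit code units of s as a list of ints."""
    data = s.encode('utf-16-le')
    return [int.from_bytes(data[i:i + 2], 'little')
            for i in range(0, len(data), 2)]


def upper_per_code_unit(s):
    """Upper-case each UTF-16 code unit on its own, then reassemble."""
    mapped = ''.join(chr(u).upper() for u in utf16_units(s))
    # 'surrogatepass' lets the lone surrogates be written back out so that
    # decoding can join them into a single code point again.
    return mapped.encode('utf-16-le', 'surrogatepass').decode('utf-16-le')


def islower_per_code_unit(s):
    """Report lowercase only if some individual code unit is lowercase."""
    return any(chr(u).islower() for u in utf16_units(s))


print(len(utf16_units(small)))               # number of code units used
print(upper_per_code_unit(small) == small)   # per-unit mapping
print(islower_per_code_unit(small))          # per-unit classification
print(small.upper() == capital)              # per-code-point mapping
```

Under that assumption, the per-code-unit helpers leave the letter unchanged and do not classify it as lowercase, matching the UTF-16 results above, while the per-code-point call produces the capital letter as on a UTF-32 build.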