Message94011
> Feel free to upload it here. I'm fairly skeptical that it is
> possible to implement casing "correctly" in a locale-independent
> way.
Ok. I will try to find time to complete it enough to be readable.
Unicode (see sec. 3.13) specifies the casing of Unicode strings pretty
completely -- i.e. it gives "Default Casing" rules to be used when no
locale-specific "tailoring" is available. The only dependencies on
locale in the special casing rules are for Turkish, Azeri, and
Lithuanian. And you only need to know which language it is, no
other details. So I'm sure that a complete implementation is possible
without resorting to a lot of locale munging -- at least for .lower(),
.upper() and .title().
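
To illustrate how little locale information such tailoring would need, here is a minimal sketch, assuming Python 3 str semantics (not the 2.x unicode type discussed in this issue). The names lower_tailored and upper_tailored and the lang argument are hypothetical, not an existing API. It covers only the dotted/dotless i rules for Turkish and Azeri; it ignores the combining-dot-above contexts and the Lithuanian rules.

```python
# Hypothetical helpers: only the tr/az dotted/dotless-i tailoring is handled.
_TR_AZ_LOWER = {ord("I"): "\u0131", ord("\u0130"): "i"}  # I -> dotless i, dotted I -> i
_TR_AZ_UPPER = {ord("i"): "\u0130"}                      # i -> dotted capital I


def lower_tailored(text, lang=None):
    """Lowercase with default casing, plus tr/az tailoring when requested."""
    if lang in ("tr", "az"):
        text = text.translate(_TR_AZ_LOWER)
    return text.lower()


def upper_tailored(text, lang=None):
    """Uppercase with default casing, plus tr/az tailoring when requested."""
    if lang in ("tr", "az"):
        text = text.translate(_TR_AZ_UPPER)
    return text.upper()


print(ascii(lower_tailored("ISTANBUL")))        # default casing
print(ascii(lower_tailored("ISTANBUL", "tr")))  # Turkish tailoring
print(ascii(upper_tailored("istanbul", "tr")))
```

The only input beyond the string is a language tag; everything else falls through to the default casing rules.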
.swapcase() is just ...err... dumb^h^h^h^h questionably useful.
However .capitalize() is a bit weird; and I'm not sure it isn't
incorrectly implemented now:
It UPPERCASES the first character, rather than TITLECASING, which is
probably wrong in the very few cases where it makes a difference:
e.g. (using Croatian ligatures)
>>> u'\u01c5amonjna'.title()
u'\u01c5amonjna'
>>> u'\u01c5amonjna'.capitalize()
u'\u01c4amonjna'
"Capitalization" is not precisely defined (by the Unicode standard) --
the current Python implementation doesn't even do what the docs say:
"makes the first character have upper case" (it also lower-cases all
other characters!). However, I might argue that a more useful
implementation "makes the first character have titlecase..."
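
A minimal sketch of that alternative, assuming Python 3 str semantics; the name capitalize_titlecase is hypothetical. It relies on str.title() applying the titlecase mapping to a single cased character, and it lower-cases the rest the way the existing method does.

```python
def capitalize_titlecase(text):
    """Titlecase the first character and lowercase the remainder."""
    # str.title() on a one-character string applies its titlecase mapping.
    return text[:1].title() + text[1:].lower()


word = "\u01c6amonjna"  # starts with U+01C6, the lowercase dz-with-caron digraph

# Uppercasing the first character yields U+01C4 (both letters capital).
print(ascii(word[:1].upper() + word[1:].lower()))  # '\u01c4amonjna'

# Titlecasing the first character yields U+01C5 (only the D capital).
print(ascii(capitalize_titlecase(word)))           # '\u01c5amonjna'
```

The two results differ only for the few characters whose titlecase and uppercase mappings are distinct, such as the digraphs above.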

Date | User | Action | Args
2009-10-14 19:00:16 | senn | set | recipients: + senn, lemburg, loewis, ezio.melotti, alexs
2009-10-14 19:00:15 | senn | set | messageid: <1255546815.92.0.62328909151.issue4610@psf.upfronthosting.co.za>
2009-10-14 19:00:09 | senn | link | issue4610 messages
2009-10-14 19:00:09 | senn | create |