Author lemburg
Recipients alexs, ezio.melotti, lemburg, loewis
Date 2008-12-20.19:52:49
On 2008-12-20 17:19, Alex Stapleton wrote:
> Alex Stapleton added the comment:
> I am trying to get a PEP together for this. Does anyone have any thoughts 
> on how to handle comparison between unicode strings in a locale aware 
> situation?

Some thoughts:

 * the Unicode implementation *must* stay locale independent

 * we should implement the Unicode collation algorithm

 * which collation to use should be a parameter of a function
   or object initializer and it should be possible to use
   multiple collations in the same application (without switching
   the locale)

 * the terms "locale" and "collation" should not be mixed;
   a (default) collation is a property of a locale and there can
   also be more than one collation per locale

The Unicode collation algorithm defines collation in terms of a
key function for each collation, so that already fits nicely with
the key function parameter of list.sort().
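
To illustrate the key-function fit, here is a minimal sketch. Note that
toy_collation_key is a hypothetical stand-in invented for this example; it
only strips combining marks and folds case, and is not an implementation of
the Unicode collation algorithm.

```python
import unicodedata


def toy_collation_key(text):
    """Hypothetical toy key function; NOT the Unicode collation algorithm."""
    # Decompose so that accents become separate combining characters.
    decomposed = unicodedata.normalize("NFD", text)
    # Drop the combining marks to get the base letters.
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Compare base letters case-insensitively first; break ties with the
    # original string so the ordering is deterministic.
    return (base.casefold(), text)


# "\u00c4pfel" is "Apfel" with A-diaeresis, "Z\u00fcrich" has u-diaeresis.
words = ["zebra", "\u00c4pfel", "apple", "Z\u00fcrich", "Apfel"]

# The key function plugs straight into list.sort(); no locale is involved.
words.sort(key=toy_collation_key)
```

The strings themselves stay plain Unicode objects; all ordering knowledge
lives in the key function.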

> Should __lt__ and __gt__ be specified as ignoring locale? In which case do 
> we need to add a new method for doing locale aware comparisons?

Unicode strings should not get any locale or collation specific
methods. Instead this feature should be implemented elsewhere
and the strings in question passed to this new function or object.

> Should locale be a property of the string, an argument passed to 
> upper/lower/isupper/islower/swapcase/capitalize/sort or global state 
> (locale module...)?

No. See above.

> Should doing a locale aware comparison of two strings with different 
> locales throw an exception?

No, assigning locales to strings is not going to work and
we should not go down that road.

It's better to have locale aware functions for certain operations,
so that you can pass your Unicode strings to these functions
instead of binding additional context information to the Unicode
strings themselves.

> Should locales be represented as objects or just a string like "en_GB"?

I think the easiest way to get the collation algorithm implemented
is by using a scheme similar to the one for codecs: you pass a collation
name to a central function and get back a collation object that
implements the collation in the form of a key method and a compare
method.
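
A rough sketch of what such a codec-style lookup could look like. All names
here (Collation, register_collation, lookup_collation, and the two registered
collation names) are assumptions made up for illustration, not an existing
API, and the "toy-accent-insensitive" key is not the Unicode collation
algorithm.

```python
import functools
import unicodedata


class Collation:
    """Hypothetical collation object with a key and a compare method."""

    def __init__(self, name, key_func):
        self.name = name
        self._key_func = key_func

    def key(self, text):
        # Sort key for use with list.sort(key=...) or sorted(key=...).
        return self._key_func(text)

    def compare(self, a, b):
        # Returns -1, 0 or 1, derived from the sort keys.
        ka, kb = self.key(a), self.key(b)
        return (ka > kb) - (ka < kb)


_registry = {}


def register_collation(name, key_func):
    """Register a collation under a case-insensitive name (hypothetical)."""
    _registry[name.lower()] = Collation(name, key_func)


def lookup_collation(name):
    """Central lookup function, loosely modelled on codecs.lookup()."""
    try:
        return _registry[name.lower()]
    except KeyError:
        raise LookupError("unknown collation: %s" % name) from None


def _strip_accents(text):
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# Two illustrative collations living side by side in one application.
register_collation("codepoint", lambda s: s)
register_collation("toy-accent-insensitive",
                   lambda s: (_strip_accents(s).casefold(), s))

names = ["Zoe", "\u00e9mile", "Eve", "adam"]  # "\u00e9" is e-acute
toy = lookup_collation("toy-accent-insensitive")

by_codepoint = sorted(names, key=lookup_collation("codepoint").key)
by_toy = sorted(names, key=toy.key)
# The compare method can be adapted for sorting via functools.cmp_to_key.
by_cmp = sorted(names, key=functools.cmp_to_key(toy.compare))
```

Both collations are used in the same program without touching the process
locale, and the strings carry no locale or collation information.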

Date                 User     Action  Args
2008-12-20 19:52:51  lemburg  set     recipients: + lemburg, loewis, ezio.melotti, alexs
2008-12-20 19:52:50  lemburg  link    issue4610 messages
2008-12-20 19:52:49  lemburg  create