Message 78122 - Python tracker



Author	lemburg
Recipients	alexs, ezio.melotti, lemburg, loewis
Date	2008-12-20.19:52:49
SpamBayes Score	0.0
Marked as misclassified	No
Message-id	<494D4D10.8040806@egenix.com>
In-reply-to	<1229789954.16.0.56126897745.issue4610@psf.upfronthosting.co.za>

Content

On 2008-12-20 17:19, Alex Stapleton wrote:
> Alex Stapleton <alexs@prol.etari.at> added the comment:
> 
> I am trying to get a PEP together for this. Does anyone have any thoughts 
> on how to handle comparison between unicode strings in a locale aware 
> situation?

Some thoughts:

 * the Unicode implementation *must* stay locale independent

 * we should implement the Unicode collation algorithm
   (TR#10, http://unicode.org/reports/tr10/)

 * which collation to use should be a parameter of a function
   or object initializer and it should be possible to use
   multiple collations in the same application (without switching
   the locale)

 * the terms "locale" and "collation" should not be mixed;
   a (default) collation is a property of a locale and there can
   also be more than one collation per locale

The Unicode collation algorithm defines collation in terms of a
key function for each collation, so that already fits nicely with
the key function parameter of list.sort().
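
To illustrate how a per-collation key function plugs into sorting,
here is a minimal sketch for current Python 3. The function
toy_collation_key is a hypothetical stand-in and is NOT an
implementation of TR#10; it only strips combining marks and folds
case so that the effect of a key function is visible.

```python
import unicodedata


def toy_collation_key(text):
    """Hypothetical stand-in for a collation sort key (not TR#10)."""
    # Decompose so that accents become separate combining characters.
    decomposed = unicodedata.normalize("NFD", text)
    # Drop the combining marks, keeping only the base characters.
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Compare case-folded base letters first; use the original string
    # as a tie-breaker so the ordering stays deterministic.
    return (base.casefold(), text)


words = ["zebra", "\u00c4pfel", "apple", "Zoo"]

# Plain code point ordering, which is what str comparison gives you.
by_code_point = sorted(words)

# The same list sorted through the key function; the strings themselves
# carry no locale or collation information.
by_toy_key = sorted(words, key=toy_collation_key)
```

The strings are never modified or annotated; the ordering lives
entirely in the key function handed to sorted() or list.sort().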

> Should __lt__ and __gt__ be specified as ignoring locale? In which case do 
> we need to add a new method for doing locale aware comparisons?

Unicode strings should not get any locale or collation specific
methods. Instead this feature should be implemented elsewhere
and the strings in question passed to this new function or
object.

> Should locale be a property of the string, an argument passed to 
> upper/lower/isupper/islower/swapcase/capitalize/sort or global state 
> (locale module...)?

No. See above.

> Should doing a locale aware comparison of two strings with different 
> locales throw an exception?

No, assigning locales to strings is not going to work and
we should not go down that road.

It's better to have locale-aware functions for certain operations,
so that you can pass your Unicode strings to these functions
instead of binding additional context information to the Unicode
strings themselves.

> Should locales be represented as objects or just a string like "en_GB"?

I think the easiest way to get the collation algorithm implemented
is to use a scheme similar to the one used for codecs: you pass a
collation name to a central function and get back a collation
object that implements the collation in the form of a key method
and a compare method.
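
A rough sketch of what such a codecs-style scheme could look like
follows. Every name in it (Collation, register_collation,
lookup_collation and the two "toy-" collation names) is hypothetical
and exists in no Python module; the key functions are simple
placeholders rather than real TR#10 collations. Raising LookupError
for an unknown name mirrors what codecs.lookup() does.

```python
import unicodedata


class Collation:
    """Hypothetical collation object with a key and a compare method."""

    def __init__(self, name, key_func):
        self.name = name
        self._key_func = key_func

    def key(self, text):
        # Sort key, usable directly as list.sort(key=collation.key).
        return self._key_func(text)

    def compare(self, a, b):
        # Three-way comparison derived from the keys: -1, 0 or 1.
        ka, kb = self.key(a), self.key(b)
        return (ka > kb) - (ka < kb)


_registry = {}  # collation name -> Collation (hypothetical registry)


def register_collation(name, key_func):
    _registry[name.lower()] = Collation(name, key_func)


def lookup_collation(name):
    # Central lookup function, analogous in spirit to codecs.lookup().
    try:
        return _registry[name.lower()]
    except KeyError:
        raise LookupError("unknown collation: %s" % name) from None


def _strip_marks(text):
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# Placeholder collations, for illustration only.
register_collation("toy-codepoint", lambda s: s)
register_collation("toy-accent-insensitive",
                   lambda s: (_strip_marks(s).casefold(), s))

words = ["\u00e9clair", "zebra", "Eagle"]

# Two collations used side by side, without touching the process locale.
codepoint = lookup_collation("toy-codepoint")
accent_insensitive = lookup_collation("toy-accent-insensitive")

sorted_codepoint = sorted(words, key=codepoint.key)
sorted_accent_insensitive = sorted(words, key=accent_insensitive.key)
```

Because the collation is selected by name at the call site, several
collations can coexist in one application, and the strings stay
plain Unicode strings throughout.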

History
Date	User	Action	Args
2008-12-20 19:52:51	lemburg	set	recipients: + lemburg, loewis, ezio.melotti, alexs
2008-12-20 19:52:50	lemburg	link	issue4610 messages
2008-12-20 19:52:49	lemburg	create