
Author vstinner
Recipients ezio.melotti, ned.deily, pnugues, vstinner
Date 2015-01-08.21:27:26
Message-id <1420752447.09.0.767155121802.issue23195@psf.upfronthosting.co.za>
In-reply-to
Content
locale.strxfrm() has a different implementation in Python 2 and in Python 3:
- Python 2 uses strxfrm(), so it works on byte strings
- Python 3 uses wcsxfrm(), so it works on wide-character strings ("unicode" strings)
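
For reference, here is a minimal Python 3 sketch of sorting with locale.strxfrm() as the key. The helper name sorted_by_locale() is hypothetical and not part of the locale module. It assumes the requested locale may be missing on the machine, and falls back to a plain sorted() in that case.

```python
import locale


def sorted_by_locale(words, locale_name):
    """Sort words with the collation rules of locale_name.

    Hypothetical helper. Returns (sorted_words, applied); applied is
    False when the locale is not available and plain sorted() was used.
    """
    # Calling setlocale() without a locale argument only queries the setting.
    previous = locale.setlocale(locale.LC_COLLATE)
    try:
        locale.setlocale(locale.LC_COLLATE, locale_name)
    except locale.Error:
        return sorted(words), False
    try:
        # In Python 3, locale.strxfrm() takes a str (it uses wcsxfrm()).
        return sorted(words, key=locale.strxfrm), True
    finally:
        # The locale is process-wide state, so restore the previous value.
        locale.setlocale(locale.LC_COLLATE, previous)


words = ["A", "E", "Z", "\xc9", "a", "e", "\xe9", "z"]
result, applied = sorted_by_locale(words, "fr_FR.UTF-8")
print("locale applied:", applied)
print(ascii(result))
```

The resulting order depends on the platform's collation tables, which is exactly what differs between Mac OS X and Linux in the runs below.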

It looks like Python 2 and 3 have the same behaviour on Mac OS X: the list is not sorted as expected, because sorting with key=locale.strxfrm gives the same order as a plain sorted(). Tested on Mac OS X 10.9.2.

Imac-Photo:~ haypo$ cat collate2.py 
#coding:utf8
import locale, random
locale.setlocale(locale.LC_ALL, "fr_FR.UTF-8")
print("LC_COLLATE = %s" % locale.setlocale(locale.LC_COLLATE, None))
a = ["A", "E", "Z", "\xc9", "a", "e", "\xe9", "z"]
random.shuffle(a)
print(sorted(a))
print(sorted(a, key=locale.strxfrm))

Imac-Photo:~ haypo$ cat collate3.py 
#coding:utf8
import locale, random
locale.setlocale(locale.LC_ALL, "fr_FR.UTF-8")
print("LC_COLLATE = %s" % locale.setlocale(locale.LC_COLLATE, None))
a = ["A", "E", "Z", "\xc9", "a", "e", "\xe9", "z"]
random.shuffle(a)
print(ascii(sorted(a)))
print(ascii(sorted(a, key=locale.strxfrm)))

Imac-Photo:~ haypo$ LC_ALL=fr_FR.utf8 python collate2.py 
LC_COLLATE = fr_FR.UTF-8
['A', 'E', 'Z', 'a', 'e', 'z', '\xc9', '\xe9']
['A', 'E', 'Z', 'a', 'e', 'z', '\xc9', '\xe9']

Imac-Photo:~ haypo$ LC_ALL=fr_FR.utf8 ~/prog/python/default/python.exe ~/collate3.py 
LC_COLLATE = fr_FR.UTF-8
['A', 'E', 'Z', 'a', 'e', 'z', '\xc9', '\xe9']
['A', 'E', 'Z', 'a', 'e', 'z', '\xc9', '\xe9']

On Linux, I get the expected order with Python 3:

LC_COLLATE = fr_FR.UTF-8
['A', 'E', 'Z', 'a', 'e', 'z', '\xc9', '\xe9']
['a', 'A', 'e', 'E', '\xe9', '\xc9', 'z', 'Z']

On Linux, Python 2 gives me a strange order. It may be an issue in my program:

haypo@selma$ python x.py 
LC_COLLATE = fr_FR.UTF-8
['A', 'E', 'Z', 'a', 'e', 'z', '\xc9', '\xe9']
['\xe9', '\xc9', 'a', 'A', 'e', 'E', 'z', 'Z']
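
One possible explanation for the Python 2 order, which is an assumption and is not confirmed here: in a Python 2 str literal, "\xc9" is the single byte 0xC9. That is Latin-1 for É but not valid UTF-8, so strxfrm() under fr_FR.UTF-8 would receive malformed input. The Python 3 sketch below only shows the byte-level difference; is_valid_utf8() is a hypothetical helper.

```python
def is_valid_utf8(data):
    """Return True if the bytes object decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


for ch in ("\xc9", "\xe9"):
    latin1 = ch.encode("latin-1")  # what a Python 2 "\xc9" str literal holds
    utf8 = ch.encode("utf-8")      # what a UTF-8 locale expects
    print(ascii(ch), latin1, is_valid_utf8(latin1), utf8, is_valid_utf8(utf8))
```

If this is the cause, encoding the strings as UTF-8 before calling strxfrm() in Python 2 should be worth trying.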
History
Date                 User      Action  Args
2015-01-08 21:27:27  vstinner  set     recipients: + vstinner, ned.deily, ezio.melotti, pnugues
2015-01-08 21:27:27  vstinner  set     messageid: <1420752447.09.0.767155121802.issue23195@psf.upfronthosting.co.za>
2015-01-08 21:27:27  vstinner  link    issue23195 messages
2015-01-08 21:27:26  vstinner  create