Title: locale.strxfrm does not work with Unicode strings
Type: behavior Stage: resolved
Components: Unicode Versions: Python 2.7
Status: closed Resolution: out of date
Dependencies: Superseder:
Assigned To: loewis Nosy List: benjamin.peterson, cito, loewis, saurik
Priority: normal Keywords: patch

Created on 2008-03-25 14:33 by cito, last changed 2020-05-31 12:28 by serhiy.storchaka. This issue is now closed.

File name Uploaded Description
wcsxfrm.diff saurik, 2012-01-02 02:25 Python 2.7.2: Unicode locale.strxfrm()
Messages (7)
msg64484 - (view) Author: Christoph Zwerschke (cito) * Date: 2008-03-25 14:33
While locale.strcoll seems to work with Unicode strings, locale.strxfrm
gives a UnicodeError. Example:


import locale

try:
    locale.setlocale(locale.LC_ALL, 'de')
except locale.Error: # Windoof
    locale.setlocale(locale.LC_ALL, 'german')

s = ['Ägypten', 'Zypern']

print sorted(s, cmp=locale.strcoll) # works
print sorted(s, key=locale.strxfrm) # works

s = [u'Ägypten', u'Zypern']

print sorted(s, cmp=locale.strcoll) # works
print sorted(s, key=locale.strxfrm) # UnicodeError


Therefore, it is not possible to sort lists of Unicode strings
effectively. If possible, this should be fixed. If not possible, this
problem should at least be mentioned in the documentation. Currently,
the docs do not indicate that strcoll and strxfrm behave differently
concerning Unicode.
msg64516 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2008-03-25 21:26
FWIW, this is fixed in Python 3.0.
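For illustration, here is a minimal sketch of how the report's example might look on Python 3, where locale.strxfrm accepts ordinary (Unicode) str objects. The helper name locale_sorted and the 'de_DE.UTF-8' locale name are assumptions made for this sketch, not part of the original report; 'de' and 'german' come from the example above. If none of the locales is installed, the sketch simply sorts under whatever collation locale is already active.

```python
import locale


def locale_sorted(strings, locale_names=("de_DE.UTF-8", "de", "german")):
    """Sort str objects using the first locale in locale_names that is available.

    locale_sorted is a hypothetical helper; 'de_DE.UTF-8' is an assumed locale name.
    """
    # Calling setlocale() with only a category queries the current setting.
    saved = locale.setlocale(locale.LC_COLLATE)
    try:
        for name in locale_names:
            try:
                locale.setlocale(locale.LC_COLLATE, name)
                break
            except locale.Error:
                continue  # locale not installed; try the next candidate
        # On Python 3, strxfrm takes str directly, so no encoding step is needed.
        return sorted(strings, key=locale.strxfrm)
    finally:
        # Restore the previous collation locale so callers are unaffected.
        locale.setlocale(locale.LC_COLLATE, saved)
```

Calling locale_sorted(['Zypern', 'Ägypten']) would be the Python 3 counterpart of the failing call in the report; the resulting order depends on which locale the system actually provides.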
msg64518 - (view) Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2008-03-25 21:29
Can it be backported?
msg64524 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2008-03-25 21:43
Sure, although it probably shouldn't be backported to 2.5.
msg150438 - (view) Author: Jay Freeman (saurik) (saurik) Date: 2012-01-01 17:57
Given that Python 3.x is still not ready for general use (and when this is discussed people make it quite clear that this is to be expected, and that a many year timeline was originally proposed for the Python 3.0 transition), it seems like this bug fix should have been backported to 2.x at some point in the last four years it has been open. :(
msg150448 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2012-01-02 00:00
saurik: can you propose a patch?
msg150450 - (view) Author: Jay Freeman (saurik) (saurik) Date: 2012-01-02 02:25
I have attached a tested patch against Python-2.7.2.tgz (as I do not know how to use hg currently). It should be noted that I also am not 100% certain how the Python build environment works, but the way I added the wcsxfrm test was to add it to, then run autoheader and autoconf.

It should also be noted that the original code called strxfrm and did not check for an error result; neither does my new code (which is mostly based on formulaic modifications of the existing code, plus educated guesses with regard to coding and formatting standards: feel free to change, obviously).

Finally, I noticed while working on this that --enable-unicode=no does not work (there is a check that enforces that it must be either ucs2 or ucs4): seems like an easy fix. That said, I ran into numerous other issues trying to make a non-Unicode build, and in the end gave up. My code looks like it should work, however, were someone to figure out how to build a non-Unicode Python 2.7.
Date User Action Args
2020-05-31 12:28:24 serhiy.storchaka set status: open -> closed; resolution: out of date; stage: resolved
2012-01-02 02:25:58 saurik set files: + wcsxfrm.diff; keywords: + patch; messages: + msg150450
2012-01-02 00:00:49 loewis set messages: + msg150448
2012-01-01 17:57:27 saurik set nosy: + saurik; messages: + msg150438
2010-08-21 22:59:37 georg.brandl set versions: + Python 2.7, - Python 2.6
2008-03-25 21:43:09 loewis set messages: + msg64524; versions: + Python 2.6, - Python 2.5
2008-03-25 21:29:18 benjamin.peterson set nosy: + benjamin.peterson; messages: + msg64518
2008-03-25 21:26:19 loewis set messages: + msg64516
2008-03-25 14:59:05 georg.brandl set assignee: loewis; nosy: + loewis
2008-03-25 14:33:55 cito create