classification
Title: string.lowercase/uppercase/letters not affected by locale changes on linux
Type: behavior
Stage:
Components: Extension Modules
Versions: Python 2.7, Python 2.6
process
Status: closed
Resolution: wont fix
Dependencies:
Superseder:
Assigned To:
Nosy List: PeterL, ezio.melotti, georg.brandl, r.david.murray
Priority: normal
Keywords:

Created on 2009-07-20 16:43 by PeterL, last changed 2009-07-25 10:35 by georg.brandl. This issue is now closed.

Messages (13)
msg90733 - Author: Peter Landgren (PeterL) Date: 2009-07-20 16:43
string.lowercase changes after locale.setlocale(locale.LC_ALL,'') on
Windows XP but not on Linux.
This small test script, run on both Windows XP and Linux, demonstrates the problem:

import locale
import string
print string.lowercase
print locale.setlocale(locale.LC_ALL,'C')
print string.lowercase
print locale.setlocale(locale.LC_ALL,'')
print string.lowercase

Result on Win XP with Python 2.5.1:
abcdefghijklmnopqrstuvwxyz
C
abcdefghijklmnopqrstuvwxyz
Swedish_Sweden.1252
abcdefghijklmnopqrstuvwxyzâܣ׬Á║▀ÓßÔÒõÕµþÞÚÛÙýݯ´­±‗¾¶§÷°¨·¹³²■

Result on Linux with Python 2.5.2:
abcdefghijklmnopqrstuvwxyz
C
abcdefghijklmnopqrstuvwxyz
sv_SE.UTF-8
abcdefghijklmnopqrstuvwxyz
msg90734 - Author: Peter Landgren (PeterL) Date: 2009-07-20 17:35
True, but later in the application, code like this
a = u"qaz" + string.lowercase[26]

causes
   a = u"qaz" + string.lowercase[26]
UnicodeDecodeError: 'ascii' codec can't decode byte 0x83 in position 0:
ordinal not in range(128)

0x83 corresponds to â.
msg90735 - Author: Peter Landgren (PeterL) Date: 2009-07-20 17:35
True, but later in the application, code like this
a = u"qaz" + string.lowercase[26]

causes
   a = u"qaz" + string.lowercase[26]
UnicodeDecodeError: 'ascii' codec can't decode byte 0x83 in position 0:
ordinal not in range(128)

0x83 corresponds to â.
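
The traceback above comes from Python 2 implicitly decoding the byte string as ASCII when it is concatenated with a Unicode string. A minimal sketch that makes that hidden step explicit, so it behaves the same on Python 2.6+ and Python 3 (the byte 0x83 is taken from the traceback; the variable names are illustrative only):

```python
# Byte reported in the traceback for string.lowercase[26] on Windows.
extra = b"\x83"

error = None
try:
    # This is the decode that u"qaz" + extra performs implicitly in Python 2.
    extra.decode("ascii")
except UnicodeDecodeError as exc:
    error = exc

print(error)
```

Any byte at or above 0x80 fails in the same way, which is why only the locale-added characters trigger the error.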
msg90736 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2009-07-20 17:59
That's still not a crash...a crash is when the python interpreter fails.

2.5 isn't getting bug fixes any more.  Do you see the same problem in
2.6?  In 3.x this is a non-issue, since 3.x uses unicode internally.
msg90737 - Author: Peter Landgren (PeterL) Date: 2009-07-20 18:30
OK about 2.5.
I downloaded and installed Python 2.6.2 on my Win XP box and get the same
error as with Python 2.5.1.

OK about Python 3. It will be nice when we have upgraded our
application, Gramps, to that version and can get rid of all kinds of
encoding issues.
msg90740 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2009-07-20 19:09
Hmm.  I thought I remembered looking at this before.  See (closed) issue
1633600.  It looks like the linux issue is fixed in 2.7, but I'm not
sure when or how, nor can I reproduce my test or yours at the moment
since I seem to have a configuration problem on my linux system.
msg90749 - Author: Peter Landgren (PeterL) Date: 2009-07-21 06:24
Just some more tests. I compared the results of string.letters,
string.uppercase and string.lowercase in 2.5 and 2.6:

Python25:
Letters=
ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyzâèîÄܣ׃¬Á║└┴┬├─┼ãÃ
╚╔╩╦╠═╬¤ðÐÊËÈıÍÏ┘┌█▄¦Ì▀ÓßÔÒõÕµþÞÚÛÙýݯ´­±‗¾¶§÷°¨·¹³²■ 
Upper= ABCDEFGHIJKLMNOPQRSTUVWXYZèîă└┴┬├─┼ãÃ╚╔╩╦╠═╬¤ðÐÊËÈıÍÏ┘┌█▄¦Ì
Lower= abcdefghijklmnopqrstuvwxyzâܣ׬Á║▀ÓßÔÒõÕµþÞÚÛÙýݯ´­±‗¾¶§÷°¨·¹³²■ 

Python26:
Letters=
ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyzƒSOZsozYªµºÀÁÂÃÄÅÆÇ
ÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõöøùúûüýþÿ
Upper= ABCDEFGHIJKLMNOPQRSTUVWXYZSOZYÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝÞ
Lower= abcdefghijklmnopqrstuvwxyzƒsozªµºßàáâãäåæçèéêëìíîïðñòóôõöøùúûüýþÿ

They return different contents, but the lengths are the same!
msg90809 - Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2009-07-22 11:45
This behavior is not a bug - when setting the locale, string.lowercase
and friends are augmented by whatever the locale considers uppercase and
lowercase letters, as byte strings.  This will lead to decoding errors
when these strings are combined with Unicode strings.

Either you use string.ascii_lowercase and friends, or you make sure you
know what encoding the strings will be in, and decode accordingly.
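
Both options can be sketched as follows. This is only an illustration: the byte 0x83 and the cp1252 encoding are taken from the Windows reports in this thread, and on another system the right encoding has to be determined separately.

```python
import string

# One of the locale-added bytes seen in string.lowercase on Windows (cp1252).
extra = b"\x83"

# Option 1: stay with the ASCII-only constant, which ignores the locale.
safe = u"qaz" + string.ascii_lowercase[0]

# Option 2: decode explicitly with the encoding the bytes are known to be in.
combined = u"qaz" + extra.decode("cp1252")

print(repr(safe))
print(repr(combined))
```

Option 1 is what the reporter ended up using. Option 2 only works if the encoding really matches the locale that produced the bytes.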
msg90811 - Author: Peter Landgren (PeterL) Date: 2009-07-22 12:01
OK,
Agreed for 2.6.

But for 2.5 many of the characters returned by string.lowercase:
âܣ׬Á║▀ÓßÔÒõÕµþÞÚÛÙýݯ´­±‗¾¶§÷°¨·¹³²■ 
are not lowercase letters at all, but that is history now, as 2.5 is history.
We solved it by using ascii_lowercase.
Thanks,
Peter Landgren
msg90836 - Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2009-07-23 00:08
I did some tests as well and here is what I got:
Python2.4 WinXP:
>>> import locale
>>> import string
>>> locale.setlocale(locale.LC_ALL, '')
'Italian_Italy.1252'
>>> string.lowercase
'abcdefghijklmnopqrstuvwxyz\x83\x9a\x9c\x9e\xaa\xb5\xba\xdf\xe0\xe1\xe2\xe3\xe4\xe5\xe6\xe7\xe8\xe9\xea\xeb\xec\xed\xee\xef\xf0\xf1\xf2\xf3\xf4\xf5\xf6\xf8\xf9\xfa\xfb\xfc\xfd\xfe\xff'
>>> print string.lowercase
abcdefghijklmnopqrstuvwxyzâܣ׬Á║▀ÓßÔÒõÕµþÞÚÛÙýݯ´­±‗¾¶§÷°¨·¹³²■ 
>>> import unicodedata
>>> set(map(unicodedata.category, string.lowercase.decode('windows-1252')))
set(['Ll'])

Python2.6 WinXP:
>>> import locale
>>> import string
>>> locale.setlocale(locale.LC_ALL, '')
'Italian_Italy.1252'
>>> string.lowercase
'abcdefghijklmnopqrstuvwxyz\x83\x9a\x9c\x9e\xaa\xb5\xba\xdf\xe0\xe1\xe2\xe3\xe4\xe5\xe6\xe7\xe8\xe9\xea\xeb\xec\xed\xee\xef\xf0\xf1\xf2\xf3\xf4\xf5\xf6\xf8\xf9\xfa\xfb\xfc\xfd\xfe\xff'
>>> print string.lowercase
abcdefghijklmnopqrstuvwxyzƒsozªµºßàáâãäåæçèéêëìíîïðñòóôõöøùúûüýþÿ
>>> import unicodedata
>>> set(map(unicodedata.category, string.lowercase.decode('windows-1252')))
set(['Ll'])

As you can see both the strings are equivalent and all the chars
correctly belong to the Ll (letter, lowercase) Unicode category. For
some reason they look different only when they are printed.
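
One possible explanation for the different printed output, which is an assumption and not something confirmed in this thread, is that the same bytes were rendered through two different code pages: the 2.5 output matches a cp850 reading of the bytes, the 2.6 output a cp1252 reading. A small sketch with four of the bytes from the repr above:

```python
import unicodedata

# Four of the extra bytes from the repr of string.lowercase shown above.
raw = b"\x83\xe0\xe1\xe2"

# The same bytes read through two different code pages.
as_cp850 = raw.decode("cp850")
as_cp1252 = raw.decode("cp1252")

# Unicode categories of the resulting characters under each reading.
cats_cp850 = [unicodedata.category(ch) for ch in as_cp850]
cats_cp1252 = [unicodedata.category(ch) for ch in as_cp1252]

print(cats_cp850)
print(cats_cp1252)
```

Under the cp1252 reading every character is a lowercase letter (Ll), which agrees with the category check above. Under the cp850 reading some of them are uppercase letters, which would explain why the 2.5 output did not look like lowercase letters.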

If these chars are not added to string.lowercase on Linux when you
change the locale, then it's a bug.
Can you reproduce it with recent versions of Python?
msg90841 - Author: Peter Landgren (PeterL) Date: 2009-07-23 07:16
Evidently 2.5 and 2.6 render string.lowercase differently when print is used, and the 2.6
output seems to be the correct one.

Yes. I get exactly the same result in both
Python 2.5.2 (r252:60911, Jan  8 2009, 12:17:37)
and
Python 2.6.2 (r262:71600, Jul 23 2009, 09:01:02)
showing that string.lowercase does NOT change with locale.

'sv_SE.UTF-8'
>>> a = string.lowercase
>>> len(a)
26
>>> a
'abcdefghijklmnopqrstuvwxyz'
>>> print a
abcdefghijklmnopqrstuvwxyz
>>> string.ascii_lowercase == string.lowercase
True
>>>
msg90913 - Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2009-07-25 09:41
I reproduced the issue on my Linux machine. Regardless of the locale I
use, string.lowercase/uppercase/letters is always equal to its
string.ascii_* counterpart.
On Windows, by contrast, other letters are added.
msg90914 - Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2009-07-25 10:35
This seems to be normal when using a UTF-8 locale. For (e.g.) 'de_DE'
string.lowercase is changed here, for 'de_DE.utf-8' it isn't.
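
A likely reason, assuming string.lowercase is built by classifying each single byte value from 0 to 255 under the current locale (the thread does not spell this out), is that a non-ASCII letter occupies one byte in a single-byte encoding but several bytes in UTF-8, so no individual byte at or above 0x80 is a letter there. A minimal sketch using the letter ä as an example:

```python
# U+00E4 (a with diaeresis), a lowercase letter in a German locale.
letter = u"\u00e4"

# In a single-byte encoding such as Latin-1 the letter is exactly one byte,
# so a per-byte scan can recognise it as a lowercase letter.
latin1_bytes = letter.encode("latin-1")

# In UTF-8 the same letter is two bytes, neither of which is a letter alone.
utf8_bytes = letter.encode("utf-8")

print(len(latin1_bytes), len(utf8_bytes))
```

This is consistent with the sv_SE.UTF-8 results reported earlier, where string.lowercase stayed at 26 characters.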
History
Date | User | Action | Args
2009-07-25 10:35:16 | georg.brandl | set | status: open -> closed; resolution: wont fix; messages: + msg90914
2009-07-25 09:41:49 | ezio.melotti | set | status: closed -> open; resolution: wont fix -> (no value); messages: + msg90913; title: Problem with string.lowercase in Windows XP -> string.lowercase/uppercase/letters not affected by locale changes on linux
2009-07-23 07:16:16 | PeterL | set | messages: + msg90841
2009-07-23 00:08:45 | ezio.melotti | set | messages: + msg90836
2009-07-22 12:01:07 | PeterL | set | messages: + msg90811
2009-07-22 11:45:08 | georg.brandl | set | status: open -> closed; nosy: + georg.brandl; messages: + msg90809; resolution: wont fix
2009-07-21 06:41:40 | ezio.melotti | set | nosy: + ezio.melotti
2009-07-21 06:24:25 | PeterL | set | messages: + msg90749
2009-07-20 19:09:15 | r.david.murray | set | messages: + msg90740; versions: + Python 2.6, Python 2.7, - Python 2.5
2009-07-20 18:30:02 | PeterL | set | messages: + msg90737
2009-07-20 17:59:49 | r.david.murray | set | nosy: + r.david.murray; messages: + msg90736
2009-07-20 17:35:37 | PeterL | set | messages: + msg90735
2009-07-20 17:35:18 | PeterL | set | messages: + msg90734
2009-07-20 17:18:57 | r.david.murray | set | priority: normal; type: crash -> behavior
2009-07-20 16:43:40 | PeterL | create |