msg90733 - (view) |
Author: Peter Landgren (PeterL) |
Date: 2009-07-20 16:43 |
string.lowercase is changed after locale.setlocale(locale.LC_ALL,'') in
Windows XP but not in Linux.
This little test script on Windows XP and Linux explains the problem:
import locale
import string
print string.lowercase
print locale.setlocale(locale.LC_ALL,'C')
print string.lowercase
print locale.setlocale(locale.LC_ALL,'')
print string.lowercase
Result on Win XP with Python 2.5.1:
abcdefghijklmnopqrstuvwxyz
C
abcdefghijklmnopqrstuvwxyz
Swedish_Sweden.1252
abcdefghijklmnopqrstuvwxyzâܣ׬Á║▀ÓßÔÒõÕµþÞÚÛÙýݯ´±‗¾¶§÷°¨·¹³²■
Result on Linux with Python 2.5.2:
abcdefghijklmnopqrstuvwxyz
C
abcdefghijklmnopqrstuvwxyz
sv_SE.UTF-8
abcdefghijklmnopqrstuvwxyz
|
msg90734 - (view) |
Author: Peter Landgren (PeterL) |
Date: 2009-07-20 17:35 |
True, but later in the application code like this
a = u"qaz" + string.lowercase[26]
causes
a = u"qaz" + string.lowercase[26]
UnicodeDecodeError: 'ascii' codec can't decode byte 0x83 in position 0:
ordinal not in range(128)
0x83 corresponds to â.
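
The failure can be reproduced without any locale setup. The lines below are a minimal Python 3 sketch, not the application code: they assume that the implicit conversion in Python 2 used the ASCII codec, as the traceback suggests, and they spell that conversion out. The byte value is the one reported above.

```python
# Python 3 sketch: do explicitly what Python 2 did implicitly when a
# unicode string was concatenated with a byte string.
extra = b"\x83"  # first locale-specific byte, taken from the traceback above

try:
    extra.decode("ascii")
    error_message = None
except UnicodeDecodeError as exc:
    error_message = str(exc)

print(error_message)
```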
|
msg90736 - (view) |
Author: R. David Murray (r.david.murray) *  |
Date: 2009-07-20 17:59 |
That's still not a crash...a crash is when the python interpreter fails.
2.5 isn't getting bug fixes any more. Do you see the same problem in
2.6? In 3.x this is a non-issue, since 3.x uses unicode internally.
|
msg90737 - (view) |
Author: Peter Landgren (PeterL) |
Date: 2009-07-20 18:30 |
OK about 2.5
Downloaded and installed Python 2.6.2 on my Win XP box and get the same
error as with Python 2.5.1.
OK about Python 3; it will be nice when we have upgraded our
application, Gramps, to that version and got rid of all kinds of encoding
issues.
|
msg90740 - (view) |
Author: R. David Murray (r.david.murray) *  |
Date: 2009-07-20 19:09 |
Hmm. I thought I remembered looking at this before. See (closed) issue
1633600. It looks like the linux issue is fixed in 2.7, but I'm not
sure when or how, nor can I reproduce my test or yours at the moment
since I seem to have a configuration problem on my linux system.
|
msg90749 - (view) |
Author: Peter Landgren (PeterL) |
Date: 2009-07-21 06:24 |
Just some more tests. I compared the results of string.letters,
string.uppercase and string.lowercase in 2.5 and 2.6:
Python25:
Letters=
ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyzâèîÄܣ׃¬Á║└┴┬├─┼ãÃ
╚╔╩╦╠═╬¤ðÐÊËÈıÍÏ┘┌█▄¦Ì▀ÓßÔÒõÕµþÞÚÛÙýݯ´±‗¾¶§÷°¨·¹³²■
Upper= ABCDEFGHIJKLMNOPQRSTUVWXYZèîă└┴┬├─┼ãÃ╚╔╩╦╠═╬¤ðÐÊËÈıÍÏ┘┌█▄¦Ì
Lower= abcdefghijklmnopqrstuvwxyzâܣ׬Á║▀ÓßÔÒõÕµþÞÚÛÙýݯ´±‗¾¶§÷°¨·¹³²■
Python26:
Letters=
ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyzƒSOZsozYªµºÀÁÂÃÄÅÆÇ
ÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõöøùúûüýþÿ
Upper= ABCDEFGHIJKLMNOPQRSTUVWXYZSOZYÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝÞ
Lower= abcdefghijklmnopqrstuvwxyzƒsozªµºßàáâãäåæçèéêëìíîïðñòóôõöøùúûüýþÿ
They return different contents, but the lengths are the same!
|
msg90809 - (view) |
Author: Georg Brandl (georg.brandl) *  |
Date: 2009-07-22 11:45 |
This behavior is not a bug - when setting the locale, string.lowercase
and friends are augmented by whatever the locale considers uppercase and
lowercase letters, as byte strings. This will lead to decoding errors
when these strings are combined with Unicode strings.
Either you use string.ascii_lowercase and friends, or you make sure you
know what encoding the strings will be in, and decode accordingly.
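
A minimal sketch of the two options, written for Python 3 so it runs on a current interpreter. The sample bytes and the choice of cp1252 are assumptions for illustration only; in real code the encoding has to be whatever the locale actually uses.

```python
import string

# Option 1: the ASCII-only constant never depends on the locale.
safe = "qaz" + string.ascii_lowercase[25]

# Option 2: decode locale-dependent bytes with an explicitly known encoding
# before combining them with text. Bytes and encoding are illustrative.
locale_bytes = b"\xe5\xe4\xf6"
decoded = "qaz" + locale_bytes.decode("cp1252")

print(safe)
print(ascii(decoded))
```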
|
msg90811 - (view) |
Author: Peter Landgren (PeterL) |
Date: 2009-07-22 12:01 |
OK,
Agreed for 2.6.
But for 2.5 many of the characters returned by string.lowercase:
âܣ׬Á║▀ÓßÔÒõÕµþÞÚÛÙýݯ´±‗¾¶§÷°¨·¹³²■
are not lowercase letters at all, but that is history now, as 2.5 is history.
We solved it by using ascii_lowercase.
Thanks,
Peter Landgren
|
msg90836 - (view) |
Author: Ezio Melotti (ezio.melotti) *  |
Date: 2009-07-23 00:08 |
I did some tests as well and here is what I got:
Python2.4 WinXP:
>>> import locale
>>> import string
>>> locale.setlocale(locale.LC_ALL, '')
'Italian_Italy.1252'
>>> string.lowercase
'abcdefghijklmnopqrstuvwxyz\x83\x9a\x9c\x9e\xaa\xb5\xba\xdf\xe0\xe1\xe2\xe3\xe4\xe5\xe6\xe7\xe8\xe9\xea\xeb\xec\xed\xee\xef\xf0\xf1\xf2\xf3\xf4\xf5\xf6\xf8\xf9\xfa\xfb\xfc\xfd\xfe\xff'
>>> print string.lowercase
abcdefghijklmnopqrstuvwxyzâܣ׬Á║▀ÓßÔÒõÕµþÞÚÛÙýݯ´±‗¾¶§÷°¨·¹³²■
>>> import unicodedata
>>> set(map(unicodedata.category, string.lowercase.decode('windows-1252')))
set(['Ll'])
Python2.6 WinXP:
>>> import locale
>>> import string
>>> locale.setlocale(locale.LC_ALL, '')
'Italian_Italy.1252'
>>> string.lowercase
'abcdefghijklmnopqrstuvwxyz\x83\x9a\x9c\x9e\xaa\xb5\xba\xdf\xe0\xe1\xe2\xe3\xe4\xe5\xe6\xe7\xe8\xe9\xea\xeb\xec\xed\xee\xef\xf0\xf1\xf2\xf3\xf4\xf5\xf6\xf8\xf9\xfa\xfb\xfc\xfd\xfe\xff'
>>> print string.lowercase
abcdefghijklmnopqrstuvwxyzƒsozªµºßàáâãäåæçèéêëìíîïðñòóôõöøùúûüýþÿ
>>> import unicodedata
>>> set(map(unicodedata.category, string.lowercase.decode('windows-1252')))
set(['Ll'])
As you can see, the two strings are identical and all the chars
correctly belong to the Ll (letter, lowercase) Unicode category. For
some reason they look different only when they are printed.
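
One plausible reading of the two printouts, offered as an assumption rather than something confirmed in this thread: the bytes are the same, but the older output shows them the way a console using code page 850 would render them, while the 2.6 output is close to their cp1252 meaning. A short Python 3 sketch using the first eight extra bytes from the repr above:

```python
# First eight non-ASCII bytes from the repr of string.lowercase above.
extra = b"\x83\x9a\x9c\x9e\xaa\xb5\xba\xdf"

as_cp1252 = extra.decode("cp1252")  # the encoding named by the locale
as_cp850 = extra.decode("cp850")    # assumed console code page

# ascii() keeps the output printable on any terminal.
print(ascii(as_cp1252))
print(ascii(as_cp850))
```

Decoded as cp850, the bytes give the characters that open the first printout; decoded as cp1252, they give lowercase letters and ordinal signs.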
If these chars are not added to string.lowercase on Linux when you
change the locale, then it's a bug.
Can you reproduce it with recent versions of Python?
|
msg90841 - (view) |
Author: Peter Landgren (PeterL) |
Date: 2009-07-23 07:16 |
Obviously, 2.5 and 2.6 render string.lowercase differently when print is used, and 2.6 seems to
be the correct one.
Yes. I get exactly the same result in both
Python 2.5.2 (r252:60911, Jan 8 2009, 12:17:37)
and
Python 2.6.2 (r262:71600, Jul 23 2009, 09:01:02)
showing that string.lowercase does NOT change with locale.
'sv_SE.UTF-8'
>>> a = string.lowercase
>>> len(a)
26
>>> a
'abcdefghijklmnopqrstuvwxyz'
>>> print a
abcdefghijklmnopqrstuvwxyz
>>> string.ascii_lowercase == string.lowercase
True
>>>
|
msg90913 - (view) |
Author: Ezio Melotti (ezio.melotti) *  |
Date: 2009-07-25 09:41 |
I reproduced the issue on my Linux machine. Regardless of the locale I
use, string.lowercase/uppercase/letters is always equal to
string.ascii_lowercase.
On Windows, by contrast, other letters are added.
|
msg90914 - (view) |
Author: Georg Brandl (georg.brandl) *  |
Date: 2009-07-25 10:35 |
This seems to be normal when using a UTF-8 locale. For (e.g.) 'de_DE'
string.lowercase is changed here, for 'de_DE.utf-8' it isn't.
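
A rough way to see why the encoding matters, assuming the constants are rebuilt by testing each of the 256 single-byte values on its own: in a single-byte encoding some high bytes are complete lowercase letters, whereas in UTF-8 no byte above 0x7F is a complete character. The helper below is a hypothetical Python 3 emulation using codecs, not the interpreter's actual code.

```python
import string


def lowercase_bytes(encoding):
    """Collect every single byte that decodes, on its own, to a lowercase letter."""
    found = bytearray()
    for value in range(256):
        try:
            char = bytes([value]).decode(encoding)
        except UnicodeDecodeError:
            # In UTF-8, a lone byte above 0x7F is never a complete character.
            continue
        if char.islower():
            found.append(value)
    return bytes(found)


utf8_lower = lowercase_bytes("utf-8")
latin1_lower = lowercase_bytes("latin-1")

print(utf8_lower == string.ascii_lowercase.encode("ascii"))
print(len(latin1_lower) > 26)
```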
|
Date | User | Action | Args
2022-04-11 14:56:51 | admin | set | github: 50774
2009-07-25 10:35:16 | georg.brandl | set | status: open -> closed; resolution: wont fix; messages: + msg90914
2009-07-25 09:41:49 | ezio.melotti | set | status: closed -> open; resolution: wont fix -> (no value); messages: + msg90913; title: Problem with string.lowercase in Windows XP -> string.lowercase/uppercase/letters not affected by locale changes on linux
2009-07-23 07:16:16 | PeterL | set | messages: + msg90841
2009-07-23 00:08:45 | ezio.melotti | set | messages: + msg90836
2009-07-22 12:01:07 | PeterL | set | messages: + msg90811
2009-07-22 11:45:08 | georg.brandl | set | status: open -> closed; nosy: + georg.brandl; messages: + msg90809; resolution: wont fix
2009-07-21 06:41:40 | ezio.melotti | set | nosy: + ezio.melotti
2009-07-21 06:24:25 | PeterL | set | messages: + msg90749
2009-07-20 19:09:15 | r.david.murray | set | messages: + msg90740; versions: + Python 2.6, Python 2.7, - Python 2.5
2009-07-20 18:30:02 | PeterL | set | messages: + msg90737
2009-07-20 17:59:49 | r.david.murray | set | nosy: + r.david.murray; messages: + msg90736
2009-07-20 17:35:37 | PeterL | set | messages: + msg90735
2009-07-20 17:35:18 | PeterL | set | messages: + msg90734
2009-07-20 17:18:57 | r.david.murray | set | priority: normal; type: crash -> behavior
2009-07-20 16:43:40 | PeterL | create |