Message 206786
When Python 2.6 (or 2.7) is compiled with _XOPEN_SOURCE=600 on illumos, string.lowercase and string.uppercase contain garbage when a UTF-8 locale is in use
(OpenIndiana bug report: https://www.illumos.org/issues/4411).
The reason is that in a UTF-8 locale, islower(), isupper() and similar functions are not expected to work on non-ASCII bytes.
So, code like
    n = 0;
    for (c = 0; c < 256; c++) {
        if (islower(c))
            buf[n++] = c;
    }
is expected to fail, because it calls islower() on bytes 128-255, which are not valid UTF-8 characters on their own. It should be converted to something like
    n = 0;
    for (c = 0; c < 256; c++) {
        if (isascii(c) && islower(c))
            buf[n++] = c;
    }
or to
    n = 0;
    for (c = 0; c < 128; c++) {
        if (islower(c))
            buf[n++] = c;
    }
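The second variant can be written as a small self-contained helper. This is a minimal sketch, not CPython's actual code; the name collect_ascii() is hypothetical. Scanning only 0-127 also avoids isascii(), which comes from X/Open rather than ISO C.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper (not CPython's code): store every character in
   0-127 for which pred() returns nonzero. Bytes 128-255 are never passed
   to the classifier, so the result does not depend on how the locale
   treats them. buf must have room for at least 129 bytes.
   Returns the number of characters stored. */
static size_t collect_ascii(int (*pred)(int), char *buf)
{
    size_t n = 0;
    for (int c = 0; c < 128; c++) {
        if (pred(c))
            buf[n++] = (char)c;
    }
    buf[n] = '\0';  /* NUL-terminate so buf can be used as a C string */
    return n;
}
```

Usage: after setlocale(LC_CTYPE, ""), calling collect_ascii(islower, buf) and collect_ascii(isupper, buf) should give only the 26 ASCII letters, in the same order as the original loop.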
Before doing this, one should check whether the locale is UTF-8. However, almost all non-C locales on illumos are UTF-8.
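A possible form of that check, sketched under assumptions: it uses the POSIX nl_langinfo(CODESET) call, and the codeset spellings compared against ("UTF-8", "utf8") are assumptions that may differ between platforms. The helper names locale_is_utf8() and collect_lower() are hypothetical. In single-byte locales such as ISO-8859-1, bytes 128-255 can be real letters, so the full range is still scanned there.

```c
#include <ctype.h>
#include <langinfo.h>
#include <stddef.h>
#include <string.h>

/* Returns nonzero if the current LC_CTYPE locale reports a UTF-8 codeset.
   The accepted spellings are an assumption; adjust them per platform. */
static int locale_is_utf8(void)
{
    const char *cs = nl_langinfo(CODESET);
    return cs != NULL && (strcmp(cs, "UTF-8") == 0 || strcmp(cs, "utf8") == 0);
}

/* Hypothetical helper: scan all 256 byte values in a single-byte locale,
   but only 0-127 in a UTF-8 locale, where bytes 128-255 are not
   characters on their own. buf must have room for at least 257 bytes. */
static size_t collect_lower(char *buf)
{
    int limit = locale_is_utf8() ? 128 : 256;
    size_t n = 0;
    for (int c = 0; c < limit; c++) {
        if (islower(c))
            buf[n++] = (char)c;
    }
    buf[n] = '\0';
    return n;
}
```

Given the remark about illumos locales, simply always scanning 0-127 would behave the same in practice there; the check only matters for single-byte non-C locales.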
Example of incorrect behavior:
Python 2.6.9 (unknown, Nov 12 2013, 13:54:48)
[GCC 4.7.3] on sunos5
Type "help", "copyright", "credits" or "license" for more information.
>>> import string
>>> string.lowercase
'abcdefghijklmnopqrstuvwxyz\xaa\xb5\xba\xdf\xe0\xe1\xe2\xe3\xe4\xe5\xe6\xe7\xe8\xe9\xea\xeb\xec\xed\xee\xef\xf0\xf1\xf2\xf3\xf4\xf5\xf6\xf8\xf9\xfa\xfb\xfc\xfd\xfe\xff'
>>> string.uppercase
'ABCDEFGHIJKLMNOPQRSTUVWXYZ\xc0\xc1\xc2\xc3\xc4\xc5\xc6\xc7\xc8\xc9\xca\xcb\xcc\xcd\xce\xcf\xd0\xd1\xd2\xd3\xd4\xd5\xd6\xd8\xd9\xda\xdb\xdc\xdd\xde'
>>>
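The same symptom can be checked at the C level, independently of Python, by asking the classifiers directly about bytes 128-255. This is a diagnostic sketch; the function name dump_high_bytes() is hypothetical.

```c
#include <ctype.h>
#include <stdio.h>

/* Hypothetical diagnostic: print every byte value in 128-255 for which
   pred() returns nonzero in the current locale, and return how many
   such bytes there are. A nonzero count means a 0-255 scan would pick
   up non-ASCII bytes, as in the session above. */
static int dump_high_bytes(const char *label, int (*pred)(int))
{
    int count = 0;
    printf("%s:", label);
    for (int c = 128; c < 256; c++) {
        if (pred(c)) {
            printf(" \\x%02x", c);
            count++;
        }
    }
    printf("\n");
    return count;
}
```

To test the environment's locale rather than the default "C" locale, call setlocale(LC_CTYPE, "") from <locale.h> first, then dump_high_bytes("islower", islower) and dump_high_bytes("isupper", isupper).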

Date                | User              | Action | Args
2013-12-21 21:38:37 | Alexander.Pyhalov | set    | recipients: + Alexander.Pyhalov, vstinner, ezio.melotti
2013-12-21 21:38:37 | Alexander.Pyhalov | set    | messageid: <1387661917.11.0.929433274966.issue20049@psf.upfronthosting.co.za>
2013-12-21 21:38:37 | Alexander.Pyhalov | link   | issue20049 messages
2013-12-21 21:38:36 | Alexander.Pyhalov | create |