Issue1813
Created on 2008-01-12 15:00 by arnimar, last changed 2022-04-11 14:56 by admin. This issue is now closed.
Files | | |
---|---|---|
File name | Uploaded | Description |
verify_locale.py | arnimar, 2008-01-12 15:00 | Program to verify bug/fix |
turklocale.patch | pitrou, 2008-02-16 20:04 | |
Messages (31) | |||
---|---|---|---|
msg59821 - (view) | Author: Árni Már Jónsson (arnimar) | Date: 2008-01-12 15:00 | |
When switching to a Turkish locale, the codecs registry fails on a codec lookup that worked before the locale change. This happens when the codec name contains an uppercase 'I'. Just before the cache lookup, the string is normalized, which includes a call to <ctype.h>'s tolower. tolower is locale dependent, and the Turkish locale handles 'I' differently from other locales. Thus the lookup fails, since the normalization behaves differently than it did before. Replacing the tolower() call with this made the lookup work: int my_tolower(char c) { if ('A' <= c && c <= 'Z') c += 32; return c; } PS: If the Turkish locale is not supported, this will enable it on an Ubuntu system: a) sudo cp /usr/share/i18n/SUPPORTED /var/lib/locales/supported.d/local (or just copy the lines with "tr" in them) b) sudo dpkg-reconfigure locales |
|||
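The failure mode reported above can be sketched in Python. This is only a simulation: the real normalization is done in C by the codec registry, the Turkish mapping is hard-coded here rather than taken from the C library, and `ascii_only_lower`, `simulated_turkish_lower`, and `cache` are hypothetical names used for illustration.

```python
# Simulation of a cache lookup whose key depends on the lowercasing rules.

def ascii_only_lower(name):
    """Lowercase only A-Z, mirroring the my_tolower() replacement above."""
    return "".join(
        chr(ord(ch) + 32) if "A" <= ch <= "Z" else ch
        for ch in name
    )

def simulated_turkish_lower(name):
    """Stand-in for a Turkish-locale tolower(): 'I' becomes dotless U+0131."""
    return ascii_only_lower(name.replace("I", "\u0131"))

# Hypothetical cache, filled before the locale change.
cache = {ascii_only_lower("ISO8859-9"): "<codec object>"}

hit_before = ascii_only_lower("ISO8859-9") in cache        # key "iso8859-9"
hit_after = simulated_turkish_lower("ISO8859-9") in cache  # key starts with U+0131
print(hit_before, hit_after)
```

The second lookup misses because the normalized key no longer starts with an ASCII 'i', which is the behaviour the ASCII-only replacement avoids.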
msg62386 - (view) | Author: Antoine Pitrou (pitrou) * | Date: 2008-02-14 10:52 | |
I can confirm this on SVN trunk on a Mandriva system. |
|||
msg62433 - (view) | Author: Árni Már Jónsson (arnimar) | Date: 2008-02-15 16:36 | |
There is more to this bug than appears. I'm guessing that the name mangling code in locale (i.e. the normalizing code) is locale dependent. See this example: #!/usr/bin/python2.5 import locale print 'TR', locale.normalize('tr') print locale.setlocale(locale.LC_ALL, ('tr_TR', 'ISO8859-9')) # first issue, not quite the same coming out, as came in print locale.getlocale() # and this fails print locale.setlocale(locale.LC_ALL, ('tr_TR', 'ISO8859-9')) First, the value returned from getlocale() is ('tr_TR', 'so8859-9'), not ('tr_TR', 'ISO8859-9'); second, the repeated setlocale() call fails. |
|||
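A Python 3 sketch of the same round-trip check follows. It is not the reporter's script: `roundtrip` is a hypothetical helper, it uses LC_CTYPE rather than LC_ALL, and the locale name is an assumption that may not be installed on a given system.

```python
import locale

def roundtrip(name="tr_TR.ISO8859-9", category=locale.LC_CTYPE):
    """Set a locale, read it back, then try to set the value read back.

    Returns "unsupported" if the locale cannot be set at all, "ok" if the
    round trip works, and "mangled" if the value read back is rejected.
    """
    saved = locale.setlocale(category)      # query the current setting
    try:
        try:
            locale.setlocale(category, name)
        except locale.Error:
            return "unsupported"
        try:
            locale.setlocale(category, locale.getlocale(category))
        except (locale.Error, ValueError, TypeError):
            return "mangled"
        return "ok"
    finally:
        locale.setlocale(category, saved)   # always restore the old setting

print(roundtrip())
```

A result of "mangled" would correspond to the second symptom described in the message above.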
msg62463 - (view) | Author: Antoine Pitrou (pitrou) * | Date: 2008-02-16 19:34 | |
The C library's tolower() and toupper() are used in a handful of source files. It might make sense to replace some of those calls with ascii-only versions of the corresponding functions. Modules/_sre.c: return ((ch) < 256 ? (unsigned int)tolower((ch)) : ch); Modules/_sqlite/cursor.c: *dst++ = tolower(*src++); Modules/stropmodule.c: *s_new = tolower(c); Modules/stropmodule.c: *s_new = toupper(c); Modules/stropmodule.c: *s_new = toupper(c); Modules/stropmodule.c: *s_new = tolower(c); Modules/stropmodule.c: *s_new = toupper(c); Modules/stropmodule.c: *s_new = tolower(c); Modules/unicodedata.c: h = (h * scale) + (unsigned char) toupper(Py_CHARMASK(s[i])); Modules/unicodedata.c: if (toupper(Py_CHARMASK(name[i])) != buffer[i]) Modules/_tkinter.c: argv0[0] = tolower(Py_CHARMASK(argv0[0])); Modules/binascii.c: c = tolower(c); Objects/stringobject.c: s[i] = _tolower(c); Objects/stringobject.c: s[i] = _toupper(c); Objects/stringobject.c: c = toupper(c); Objects/stringobject.c: c = tolower(c); Objects/stringobject.c: *s_new = toupper(c); Objects/stringobject.c: *s_new = tolower(c); Objects/stringobject.c: *s_new = toupper(c); Objects/stringobject.c: *s_new = tolower(c); Parser/tokenizer.c: else buf[i] = tolower(c); Python/codecs.c: ch = tolower(Py_CHARMASK(ch)); Python/dynload_win.c: first = tolower(*string1); Python/dynload_win.c: second = tolower(*string2); Python/pystrcmp.c: while ((--size > 0) && (tolower(*s1) == tolower(*s2))) { Python/pystrcmp.c: return tolower(*s1) - tolower(*s2); Python/pystrcmp.c: while (*s1 && (tolower(*s1++) == tolower(*s2++))) { Python/pystrcmp.c: return (tolower(*s1) - tolower(*s2)); |
|||
msg62464 - (view) | Author: Antoine Pitrou (pitrou) * | Date: 2008-02-16 19:58 | |
As for the .upper() and .lower() methods, they are used in quite a bunch of standard library modules :-/... Lib/base64.py Lib/BaseHTTPServer.py Lib/bsddb/test/test_compare.py Lib/bsddb/test/test_dbobj.py Lib/CGIHTTPServer.py Lib/cgi.py Lib/compiler/ast.py Lib/ConfigParser.py Lib/cookielib.py Lib/Cookie.py Lib/csv.py Lib/ctypes/test/test_byteswap.py Lib/ctypes/util.py Lib/decimal.py Lib/distutils/command/bdist_rpm.py Lib/distutils/command/bdist_wininst.py Lib/distutils/command/register.py Lib/distutils/msvc9compiler.py Lib/distutils/msvccompiler.py Lib/distutils/sysconfig.py Lib/distutils/tests/test_dist.py Lib/distutils/util.py Lib/email/charset.py Lib/email/encoders.py Lib/email/header.py Lib/email/__init__.py Lib/email/message.py Lib/email/_parseaddr.py Lib/email/test/test_email.py Lib/email/test/test_email_renamed.py Lib/encodings/idna.py Lib/encodings/punycode.py Lib/formatter.py Lib/ftplib.py Lib/gettext.py Lib/htmllib.py Lib/HTMLParser.py Lib/httplib.py Lib/idlelib/configDialog.py Lib/idlelib/EditorWindow.py Lib/idlelib/IOBinding.py Lib/idlelib/keybindingDialog.py Lib/idlelib/PyShell.py Lib/idlelib/SearchDialogBase.py Lib/idlelib/tabbedpages.py Lib/idlelib/TreeWidget.py Lib/imaplib.py Lib/inspect.py Lib/lib-tk/turtle.py Lib/locale.py Lib/logging/handlers.py Lib/logging/__init__.py Lib/_LWPCookieJar.py Lib/macpath.py Lib/mailcap.py Lib/markupbase.py Lib/mhlib.py Lib/mimetools.py Lib/mimetypes.py Lib/mimify.py Lib/msilib/__init__.py Lib/nntplib.py Lib/ntpath.py Lib/nturl2path.py Lib/optparse.py Lib/os2emxpath.py Lib/os.py Lib/pdb.py Lib/plat-irix5/flp.py Lib/plat-irix6/flp.py Lib/plat-mac/buildtools.py Lib/plat-mac/gensuitemodule.py Lib/plat-riscos/riscospath.py Lib/pyclbr.py Lib/rfc822.py Lib/robotparser.py Lib/sgmllib.py Lib/SimpleHTTPServer.py Lib/smtpd.py Lib/smtplib.py Lib/socket.py Lib/sqlite3/test/hooks.py Lib/sre_constants.py Lib/stringold.py Lib/stringprep.py Lib/string.py Lib/_strptime.py Lib/subprocess.py Lib/test/regrtest.py Lib/test/test_bigmem.py 
Lib/test/test_codeccallbacks.py Lib/test/test_codecs.py Lib/test/test_cookielib.py Lib/test/test_datetime.py Lib/test/test_decimal.py Lib/test/test_deque.py Lib/test/test_descr.py Lib/test/test_fileinput.py Lib/test/test_grp.py Lib/test/test_hmac.py Lib/test/test_httplib.py Lib/test/test_os.py Lib/test/test_smtplib.py Lib/test/test_sort.py Lib/test/test_ssl.py Lib/test/test_strop.py Lib/test/test_strptime.py Lib/test/test_support.py Lib/test/test_ucn.py Lib/test/test_unicodedata.py Lib/test/test_urllib2.py Lib/test/test_urllib.py Lib/test/test_wsgiref.py Lib/test/test_xmlrpc.py Lib/urllib2.py Lib/urllib.py Lib/urlparse.py Lib/UserString.py Lib/uuid.py Lib/warnings.py Lib/webbrowser.py Lib/wsgiref/handlers.py Lib/wsgiref/headers.py Lib/wsgiref/simple_server.py Lib/wsgiref/util.py Lib/wsgiref/validate.py Lib/xml/dom/minidom.py Lib/xml/dom/xmlbuilder.py Lib/xmllib.py |
|||
msg62466 - (view) | Author: Antoine Pitrou (pitrou) * | Date: 2008-02-16 20:04 | |
Even if we don't fix all uses of (to)?(lower|upper) in the source tree, I think it's important that codec and locale lookup work properly when the current locale defines non-Latin case folding for Latin characters. Here is a patch. Perhaps the str type should also grow ascii_lower() and ascii_upper() methods, since many uses of lower() and upper() actually assume ASCII semantics (e.g. for parsing HTTP or SMTP headers). |
|||
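The ascii_lower() / ascii_upper() idea can be sketched as plain functions. These are hypothetical helpers, not methods that exist on str, and the translation-table approach is just one way to get ASCII-only semantics; `header_matches` is an invented example of the header-parsing use case mentioned above.

```python
import string

# Translation tables that touch only the 26 ASCII letters.
_ASCII_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)
_ASCII_UPPER = str.maketrans(string.ascii_lowercase, string.ascii_uppercase)

def ascii_lower(text):
    """Lowercase A-Z only; every other character is left untouched."""
    return text.translate(_ASCII_LOWER)

def ascii_upper(text):
    """Uppercase a-z only; every other character is left untouched."""
    return text.translate(_ASCII_UPPER)

def header_matches(line, wanted):
    """Case-insensitive match of an HTTP/SMTP-style header name."""
    name, _, _ = line.partition(":")
    return ascii_lower(name.strip()) == ascii_lower(wanted)

print(header_matches("CONTENT-TYPE: text/plain", "Content-Type"))
print(ascii_lower("ISO8859-9"))
```

Because the tables are fixed, the result does not depend on the process locale.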
msg62472 - (view) | Author: Marc-Andre Lemburg (lemburg) * | Date: 2008-02-16 22:20 | |
I agree that it's a bit unfortunate that the 8-bit string APIs in Python use the locale-aware C functions by default (this should really be reversed: there should be locale-aware .upper() and .lower() methods, and the standard ones should work just like the Unicode ones, without dependency on the locale, using ASCII mappings), but for historical reasons this cannot easily be changed. .lower() and .upper() for 8-bit strings were always locale dependent, and before the addition of Unicode, setting the locale was the most common way to make an application understand different character sets. In Python 3k the problem will probably go away, since .lower() and .upper() will then no longer depend on the locale. Perhaps we should just convert a few of the cases you found to use Unicode strings instead of 8-bit strings in 2.6? That would both make the code more portable and provide a clear statement of "this is a text string", making porting to Py3k easier. |
|||
msg64109 - (view) | Author: Sean Reifschneider (jafo) * | Date: 2008-03-19 21:44 | |
Marc-Andre: How should we proceed with this bug? Discuss on python-dev or c.l.python? |
|||
msg64162 - (view) | Author: Marc-Andre Lemburg (lemburg) * | Date: 2008-03-20 10:20 | |
Sean: I'd suggest discussing this on python-dev. Note that even if we do use Unicode for the cases in question, the Turkish locale will still pose a problem - see #1528802 for a discussion. |
|||
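One way to see why Unicode strings do not make the Turkish case entirely painless is to look at the default Unicode case mappings. This sketch assumes Python 3.3 or later; it shows only the locale-independent mappings used by str and is not taken from the issue referenced above.

```python
# Python 3 str methods use the default (untailored) Unicode case mappings.
dotted_capital = "\u0130"   # LATIN CAPITAL LETTER I WITH DOT ABOVE
dotless_small = "\u0131"    # LATIN SMALL LETTER DOTLESS I

plain_ok = "I".lower() == "i"                        # no Turkish tailoring
upper_is_plain = dotless_small.upper() == "I"        # dotless i -> plain I
roundtrip_ok = dotless_small.upper().lower() == dotless_small
lowered_len = len(dotted_capital.lower())            # 'i' plus U+0307

print(plain_ok, upper_is_plain, roundtrip_ok, lowered_len)
```

The mappings are stable across locales, which fixes the lookup problem, but text that really is Turkish does not survive an upper/lower round trip without language-specific handling.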
msg111605 - (view) | Author: Mark Lawrence (BreamoreBoy) * | Date: 2010-07-26 12:24 | |
Does anyone know if this was discussed on python-dev? I've tried searching the archives and didn't find anything, but that's not to say it isn't there. |
|||
msg111765 - (view) | Author: STINNER Victor (vstinner) * | Date: 2010-07-28 02:16 | |
There is also a locale normalization function in unicodeobject.c: normalize_encoding(). This function uses "if (ISUPPER(*e)) *l++ = TOLOWER(*e++);" which uses the Python, *locale-independent*, implementation of ctype. We should maybe use the ISUPPER / TOLOWER in codecs.c. Anyway, a function should be fixed, but I don't know which one :-) |
|||
msg119686 - (view) | Author: Dirkjan Ochtman (djc) * | Date: 2010-10-27 10:30 | |
We've included this patch in Gentoo for about two years now. Can we get some discussion going on doing something like this? |
|||
msg119692 - (view) | Author: Marc-Andre Lemburg (lemburg) * | Date: 2010-10-27 11:27 | |
Looking at this again, I think we should change the codec registry C code to use Py_TOLOWER() and the encoding search function code to use the .translate() approach that Antoine suggested. |
|||
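A quick way to check that differently cased spellings reach the same codec is to compare what codecs.lookup() returns. `same_codec` is a hypothetical helper; the check is only meaningful for this bug when run with a Turkish LC_CTYPE active, which this sketch does not set up.

```python
import codecs

def same_codec(*spellings):
    """Return True if every spelling resolves to the same codec name."""
    names = {codecs.lookup(spelling).name for spelling in spellings}
    return len(names) == 1

print(same_codec("ISO8859-9", "iso8859-9", "Iso8859-9"))
```

An interpreter with the locale-dependent normalization would be expected to raise LookupError for the uppercase spelling instead of returning a match.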
msg140399 - (view) | Author: STINNER Victor (vstinner) * | Date: 2011-07-15 09:14 | |
The decimal module has been fixed in Python 2.7, 3.2 and 3.3 for the Turkish locale: issue #11830. |
|||
msg141028 - (view) | Author: Roundup Robot (python-dev) | Date: 2011-07-24 00:43 | |
New changeset 92d02de91cc9 by Antoine Pitrou in branch '3.2': Issue #1813: Fix codec lookup under Turkish locales. http://hg.python.org/cpython/rev/92d02de91cc9 New changeset a77a4df54b95 by Antoine Pitrou in branch '3.2': Add a test for issue #1813: getlocale() failing under a Turkish locale http://hg.python.org/cpython/rev/a77a4df54b95 New changeset fe0caf8c48d2 by Antoine Pitrou in branch 'default': Add a test for issue #1813: getlocale() failing under a Turkish locale http://hg.python.org/cpython/rev/fe0caf8c48d2 |
|||
msg141029 - (view) | Author: Roundup Robot (python-dev) | Date: 2011-07-24 00:52 | |
New changeset 739958134fe5 by Antoine Pitrou in branch '2.7': Issue #1813: Fix codec lookup and setting/getting locales under Turkish locales. http://hg.python.org/cpython/rev/739958134fe5 |
|||
msg141030 - (view) | Author: Antoine Pitrou (pitrou) * | Date: 2011-07-24 00:53 | |
Finally fixed in 2.7, 3.2, 3.3! |
|||
msg141190 - (view) | Author: Stefan Krah (skrah) * | Date: 2011-07-26 22:50 | |
The Fedora bot fails because here ... locale.setlocale(locale.LC_CTYPE, loc) loc = ('tr_TR', 'ISO8859-9'), and apparently setlocale can only handle "tr_TR", but not "tr_TR.ISO8859-9": 144 if (locale) { 145 /* set locale */ 146 result = setlocale(category, locale); 147 if (!result) { 148 /* operation failed, no setting was changed */ 149 PyErr_SetString(Error, "unsupported locale setting"); 150 return NULL; (gdb) p result = setlocale(category, "tr_TR.ISO8859-9") $8 = 0x0 (gdb) p result = setlocale(category, "tr_TR") $9 = 0x96d770 "tr_TR" (gdb) p locale $10 = 0x7ffff0f6a5b0 "tr_TR.ISO8859-9" (gdb) |
|||
msg141191 - (view) | Author: Stefan Krah (skrah) * | Date: 2011-07-26 23:01 | |
Stefan Krah <report@bugs.python.org> wrote: > (gdb) p result = setlocale(category, "tr_TR.ISO8859-9") > $8 = 0x0 > (gdb) p result = setlocale(category, "tr_TR") > $9 = 0x96d770 "tr_TR" > (gdb) p locale > $10 = 0x7ffff0f6a5b0 "tr_TR.ISO8859-9" > (gdb) Perhaps this is a bug in Fedora's setlocale that can't handle the turkish 'I' in 'ISO' when CTYPE is turkish. |
|||
msg141193 - (view) | Author: Antoine Pitrou (pitrou) * | Date: 2011-07-26 23:02 | |
> Stefan Krah <report@bugs.python.org> wrote: > > (gdb) p result = setlocale(category, "tr_TR.ISO8859-9") > > $8 = 0x0 > > (gdb) p result = setlocale(category, "tr_TR") > > $9 = 0x96d770 "tr_TR" > > (gdb) p locale > > $10 = 0x7ffff0f6a5b0 "tr_TR.ISO8859-9" > > (gdb) > > Perhaps this is a bug in Fedora's setlocale that can't handle the turkish 'I' > in 'ISO' when CTYPE is turkish. Perhaps indeed. Maybe you should try to report it. It does look like an OS bug in any case. (fortunately that buildbot is in the "unstable" bunch :-)) |
|||
msg141196 - (view) | Author: Stefan Krah (skrah) * | Date: 2011-07-26 23:34 | |
Yes, it's a bug. This works: #include <stdio.h> #include <locale.h> int main(void) { char *s; printf("%s\n", setlocale(LC_CTYPE, "tr_TR.ISO8859-9")); printf("%s\n", setlocale(LC_CTYPE, NULL)); s = setlocale(LC_CTYPE, "tr_TR.ISO8859-9"); printf("%s\n", s ? s : "null"); return 0; } But when I change the first setlocale call to "tr_TR", the result of the last call is NULL. |
|||
msg141262 - (view) | Author: R. David Murray (r.david.murray) * | Date: 2011-07-27 18:42 | |
I'm seeing this test failure in Gentoo, as well. |
|||
msg141322 - (view) | Author: Stefan Krah (skrah) * | Date: 2011-07-28 23:10 | |
Fedora bug report: https://bugzilla.redhat.com/show_bug.cgi?id=726536 |
|||
msg141550 - (view) | Author: Stefan Krah (skrah) * | Date: 2011-08-02 09:41 | |
Unrelated to the Fedora issue: The test is currently skipped on the FreeBSD bot, but completes successfully with: diff -r 0b52b6f1bfab Lib/test/test_locale.py --- a/Lib/test/test_locale.py Tue Aug 02 10:16:45 2011 +0200 +++ b/Lib/test/test_locale.py Tue Aug 02 11:37:39 2011 +0200 @@ -399,7 +399,7 @@ oldlocale = locale.setlocale(locale.LC_CTYPE) self.addCleanup(locale.setlocale, locale.LC_CTYPE, oldlocale) try: - locale.setlocale(locale.LC_CTYPE, 'tr_TR') + locale.setlocale(locale.LC_CTYPE, 'tr_TR.UTF-8') except locale.Error: # Unsupported locale on this system self.skipTest('test needs Turkish locale') |
|||
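The skip pattern in the diff above can be generalized to try several locale names before giving up. This is a sketch, not the test in Lib/test/test_locale.py: the class name, the candidate list, and the codec assertion are assumptions chosen for illustration.

```python
import codecs
import locale
import unittest

# Candidate names are assumptions; installed locale names vary by platform.
CANDIDATES = ("tr_TR", "tr_TR.UTF-8", "tr_TR.ISO8859-9")

class TurkishLocaleTest(unittest.TestCase):
    def test_codec_lookup(self):
        oldlocale = locale.setlocale(locale.LC_CTYPE)
        self.addCleanup(locale.setlocale, locale.LC_CTYPE, oldlocale)
        for name in CANDIDATES:
            try:
                locale.setlocale(locale.LC_CTYPE, name)
                break
            except locale.Error:
                continue
        else:
            # None of the candidates could be set on this system.
            self.skipTest("test needs a Turkish locale")
        # An uppercase 'I' in the codec name must still resolve.
        self.assertEqual(codecs.lookup("ISO8859-9").name,
                         codecs.lookup("iso8859-9").name)

def run():
    loader = unittest.defaultTestLoader
    suite = loader.loadTestsFromTestCase(TurkishLocaleTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)

result = run()
```

A skipped test still counts as a successful run, so the result stays green on machines without any Turkish locale.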
msg141551 - (view) | Author: Stefan Krah (skrah) * | Date: 2011-08-02 10:21 | |
As I wrote on python-dev, this test also fails on Debian lenny, which has the same setlocale() bug as Fedora. So, indeed the test should be skipped on a multitude of platforms. |
|||
msg141559 - (view) | Author: R. David Murray (r.david.murray) * | Date: 2011-08-02 11:34 | |
On Tue, 02 Aug 2011 12:12:37 +0200, Stefan Krah <stefan@bytereef.org> wrote: > I suspect many buildbots are green because they don't have tr_TR and > tr_TR.iso8859-9 installed. This is true for my Gentoo buildbots. Once we've figured out the best way to handle this, I'll fix that (install the other locales) for my two. When I run the C test program I get null as the final output of that regardless of whether I use 'tr_TR' or 'tr_TR.utf8'. This is with glibc-2.13-r2 (the r2 is Gentoo's mod number). As someone pointed out on python-dev, if this isn't fixable then it should be an expected failure, not a skip. One question is, is there any platform on which the turkish locale is installed where this test actually works? |
|||
msg141561 - (view) | Author: Stefan Krah (skrah) * | Date: 2011-08-02 12:01 | |
[Re-opening to fix the skips] Yes, the test works on: Ubuntu Lucid (libc-2.11.1), OpenSUSE (libc-2.11.1), FreeBSD-8.2 Failure: Fedora 14 (libc-2.13), Debian lenny (libc-2.7), Gentoo (libc-2.13-r2) So perhaps this test should be marked as expected failure on Linux altogether (unless we test for the libc version). |
|||
msg141562 - (view) | Author: Antoine Pitrou (pitrou) * | Date: 2011-08-02 12:06 | |
> As someone pointed out on python-dev, if this isn't fixable then it > should be an expected failure, not a skip. The Python bug is fixed, the problem is apparently some libcs have the same bug as we did... > One question is, is there any platform on which the turkish locale is > installed where this test actually works? Well, it works here (Mageia). |
|||
msg143954 - (view) | Author: Stefan Krah (skrah) * | Date: 2011-09-13 11:39 | |
https://bugzilla.redhat.com/show_bug.cgi?id=726536 claims that the glibc issue (which is relevant for skipping the test case) is fixed in glibc-2.14.90-8. I suspect the only way of running the test case reliably is whitelisting a couple of known good glibc versions. |
|||
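The whitelisting idea could look roughly like the following. The version sets are assumptions assembled from the reports in this thread (2.11.1 reported working, 2.14.90 reported fixed) and are not authoritative; `parse_version` and `glibc_looks_safe` are hypothetical helpers, and distribution release suffixes such as "-r2" or "-8" are ignored.

```python
import platform
import re

# Assumed values based only on the reports in this thread.
KNOWN_GOOD = {(2, 11, 1)}
FIXED_SINCE = (2, 14, 90)

def parse_version(text):
    """Turn '2.13-r2' into (2, 13); the distribution suffix is dropped."""
    head = text.split("-")[0]
    return tuple(int(number) for number in re.findall(r"\d+", head))

def glibc_looks_safe(version_text):
    """Return True if the version is whitelisted or at least FIXED_SINCE."""
    version = parse_version(version_text)
    return version in KNOWN_GOOD or version >= FIXED_SINCE

lib, version_text = platform.libc_ver()
if lib == "glibc" and version_text:
    print(version_text, glibc_looks_safe(version_text))
else:
    print("not glibc, or version unknown")
```

A test could call such a check before deciding whether a setlocale() failure should be treated as expected on the current machine.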
msg152461 - (view) | Author: Roundup Robot (python-dev) | Date: 2012-02-02 15:59 | |
New changeset a55ffb6c1993 by Stefan Krah in branch '3.2': Issue #1813: Revert workaround for a glibc bug on the Fedora buildbot. http://hg.python.org/cpython/rev/a55ffb6c1993 New changeset 4244e4348362 by Stefan Krah in branch 'default': Issue #1813: merge changeset that reverts a glibc workaround for the http://hg.python.org/cpython/rev/4244e4348362 New changeset 0b8917fc6db5 by Stefan Krah in branch '2.7': Issue #1813: backport changeset that reverts a glibc workaround for the http://hg.python.org/cpython/rev/0b8917fc6db5 |
|||
msg152462 - (view) | Author: Stefan Krah (skrah) * | Date: 2012-02-02 16:06 | |
I've upgraded the Fedora buildbot to Fedora-16. The specific glibc workaround should not be necessary any more. So the test will now fail again on all systems that a) have the bug and b) have the tr_TR locale installed. |
History | |||
---|---|---|---|
Date | User | Action | Args |
2022-04-11 14:56:29 | admin | set | github: 46138 |
2012-02-04 04:04:36 | Arfrever | set | status: open -> closed |
2012-02-02 16:06:37 | skrah | set | messages: + msg152462 |
2012-02-02 16:00:00 | python-dev | set | messages: + msg152461 |
2011-09-13 11:39:30 | skrah | set | messages: + msg143954 |
2011-08-02 12:06:31 | pitrou | set | messages: + msg141562 |
2011-08-02 12:01:12 | skrah | set | status: closed -> open messages: + msg141561 |
2011-08-02 11:34:29 | r.david.murray | set | messages: + msg141559 |
2011-08-02 10:21:35 | skrah | set | messages: + msg141551 |
2011-08-02 09:41:45 | skrah | set | messages: + msg141550 |
2011-07-28 23:10:01 | skrah | set | messages: + msg141322 |
2011-07-27 18:42:42 | r.david.murray | set | nosy: + r.david.murray messages: + msg141262 |
2011-07-26 23:34:59 | skrah | set | messages: + msg141196 |
2011-07-26 23:02:52 | pitrou | set | messages: + msg141193 |
2011-07-26 23:01:52 | skrah | set | messages: + msg141191 |
2011-07-26 22:50:18 | skrah | set | nosy: + skrah messages: + msg141190 |
2011-07-24 00:53:04 | pitrou | set | status: open -> closed versions: + Python 3.3, - Python 2.6, Python 3.1 messages: + msg141030 resolution: fixed stage: resolved |
2011-07-24 00:52:27 | python-dev | set | messages: + msg141029 |
2011-07-24 00:43:24 | python-dev | set | nosy: + python-dev messages: + msg141028 |
2011-07-15 15:35:32 | Arfrever | set | nosy: + Arfrever |
2011-07-15 09:14:54 | vstinner | set | messages: + msg140399 |
2011-05-23 20:22:14 | gkcn | set | nosy: + gkcn |
2010-10-27 11:27:26 | lemburg | set | messages: + msg119692 |
2010-10-27 10:30:19 | djc | set | nosy: + djc messages: + msg119686 |
2010-07-28 02:16:52 | vstinner | set | messages: + msg111765 |
2010-07-26 12:24:23 | BreamoreBoy | set | nosy: + BreamoreBoy messages: + msg111605 |
2010-05-14 12:38:50 | pitrou | set | nosy: + vstinner versions: + Python 3.1, Python 2.7, Python 3.2 |
2010-04-28 14:12:28 | jwilk | set | nosy: + jwilk |
2008-03-20 10:20:48 | lemburg | set | messages: + msg64162 |
2008-03-19 21:44:49 | jafo | set | priority: normal assignee: lemburg messages: + msg64109 keywords: + patch nosy: + jafo |
2008-02-16 22:20:15 | lemburg | set | nosy: + lemburg messages: + msg62472 |
2008-02-16 20:04:38 | pitrou | set | versions: + Python 2.6, - Python 2.5 |
2008-02-16 20:04:33 | pitrou | set | files: + turklocale.patch messages: + msg62466 |
2008-02-16 19:58:26 | pitrou | set | messages: + msg62464 |
2008-02-16 19:34:21 | pitrou | set | messages: + msg62463 |
2008-02-15 16:36:35 | arnimar | set | messages: + msg62433 |
2008-02-14 10:52:10 | pitrou | set | nosy: + pitrou messages: + msg62386 |
2008-02-13 23:03:06 | arnimar | set | components: + Library (Lib), - Interpreter Core |
2008-01-12 15:00:02 | arnimar | create |