msg119213 - (view) |
Author: Stephen Hansen (ixokai) |
Date: 2010-10-20 15:31 |
In the course of investigating issue10092, Georg discovered that the behavior of locale.normalize() on Mac is bad.
Basically, "en_US.UTF-8" is how the "correct" locale string should be spelled on the Mac. If you drop the dash, setlocale fails, and dropping the dash is exactly what locale.normalize does. As a result, you can't pass the return value of the function to setlocale, even though that's what it's documented to be for.
If that isn't clear, this should demonstrate (from /branches/py3k):
Top-2:build pythonbuildbot$ ./python.exe
Python 3.2a3+ (py3k:85631, Oct 17 2010, 06:45:22)
[GCC 4.2.1 (Apple Inc. build 5664)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import locale
[51767 refs]
>>> locale.normalize("en_US.UTF-8")
'en_US.UTF8'
[51770 refs]
>>> locale.setlocale(locale.LC_TIME, 'en_US.UTF8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/pythonbuildbot/test/build/Lib/locale.py", line 538, in setlocale
    return _setlocale(category, locale)
locale.Error: unsupported locale setting
[51816 refs]
>>> locale.setlocale(locale.LC_TIME, 'en_US.UTF-8')
'en_US.UTF-8'
[51816 refs]
The precise same behavior exists on my stock/system Python 2.6, too, fwiw. (Not that it can be fixed on 2.6, but maybe 2.7?)
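For readers who want to check this on their own system, the following is a minimal sketch (not part of the original report) of the round trip described above. The helper name `accepts_locale` is hypothetical; only `locale.normalize`, `locale.setlocale`, and `locale.Error` come from the standard library. The output depends on the platform, the Python version, and the installed locales.

```python
import locale


def accepts_locale(name, category=locale.LC_TIME):
    """Return True if setlocale() accepts `name` for `category`.

    Hypothetical helper: the previous setting is restored afterwards.
    """
    saved = locale.setlocale(category)  # query only, changes nothing
    try:
        locale.setlocale(category, name)
        return True
    except locale.Error:
        return False
    finally:
        locale.setlocale(category, saved)


if __name__ == "__main__":
    requested = "en_US.UTF-8"
    normalized = locale.normalize(requested)
    print(requested, "->", normalized)
    print("requested accepted: ", accepts_locale(requested))
    print("normalized accepted:", accepts_locale(normalized))
```

If the two results differ, the normalized name cannot safely be handed back to setlocale on that system.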
|
msg119216 - (view) |
Author: Ronald Oussoren (ronaldoussoren) * |
Date: 2010-10-20 15:46 |
This patch solves the immediate failure:
Index: Lib/locale.py
===================================================================
--- Lib/locale.py (revision 85743)
+++ Lib/locale.py (working copy)
@@ -396,6 +396,9 @@
else:
encoding = defenc
#print 'found encoding %r' % encoding
+ if sys.platform == 'darwin' and encoding == 'UTF8':
+ encoding = 'UTF-8'
+
if encoding:
return langname + '.' + encoding
else:
I'm not happy about hardcoding this specific exception, though. There should be a better solution than this.
Ronald
|
msg119236 - (view) |
Author: Marc-Andre Lemburg (lemburg) * |
Date: 2010-10-20 21:47 |
Could you tell me the values of localename, code, langname and encoding
at that step in the process?
We may need to add a locale_encoding_alias entry from 'UTF8' to 'UTF-8',
since the version with the hyphen is what the C lib uses.
|
msg119298 - (view) |
Author: Stephen Hansen (ixokai) |
Date: 2010-10-21 13:53 |
Marc-Andre, the locals() right before "if encoding:" (line 399) are:
>>> locale.normalize("en_US.UTF-8")
{'code': 'en_US.ISO8859-1', 'langname': 'en_US', 'encoding': 'UTF8', 'norm_encoding': 'utf_8', 'defenc': 'ISO8859-1', 'localename': 'en_US.UTF-8', 'lookup_name': 'en_us.utf-8', 'fullname': 'en_us.utf-8'}
'en_US.UTF8'
|
msg119301 - (view) |
Author: Marc-Andre Lemburg (lemburg) * |
Date: 2010-10-21 14:15 |
Thanks.
Line 646 in the alias table is wrong:
'utf_8': 'UTF8',
should read:
'utf_8': 'UTF-8',
I wonder why this wasn't reported earlier. Did glibc change
the UTF-8 spelling at some point? I do vaguely remember that I
had to remove the hyphen due to problems with setlocale() not
accepting 'UTF-8', but that was at the time I wrote that part
of locale.py, i.e. many years ago.
It doesn't appear to be necessary anymore. I checked on openSUSE
10.3 and 11.3. Both work fine with 'UTF-8' and 'UTF8'.
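As a quick way to see which spelling a given Python build uses, the alias table can be inspected directly. This is a sketch, assuming the module-level dictionary `locale.locale_encoding_alias` that holds the entry quoted above; the value printed depends on the Python version.

```python
import locale

# Look up the spelling that locale.py maps the normalized codec name
# 'utf_8' to. Builds that include the change proposed above are
# expected to report 'UTF-8'; older ones report 'UTF8'.
spelling = locale.locale_encoding_alias.get("utf_8")
print("utf_8 ->", spelling)

# normalize() consults this table when it rebuilds the locale name,
# so the same spelling shows up in its return value.
print(locale.normalize("en_US.UTF-8"))
```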
|
msg119309 - (view) |
Author: Georg Brandl (georg.brandl) * |
Date: 2010-10-21 15:27 |
If other Posix-y systems accept both spellings and only Macs insist on the dash, we should probably indeed change the alias entry to use it.
|
msg122374 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2010-11-25 15:42 |
Mandriva and Debian also work fine with both "UTF8" and "UTF-8". For the record, the canonical spelling inside /usr/share/locale is "UTF-8". I suppose glibc does its own normalization.
|
msg123553 - (view) |
Author: Ronald Oussoren (ronaldoussoren) * |
Date: 2010-12-07 14:29 |
UTF-8 works on SuSE Enterprise Linux 9 and 10 as well.
BTW, neither UTF8 nor UTF-8 works on HPUX 10. That platform requires spelling it as utf8.
Sadly, this means that the following code doesn't work on HPUX 10:
>>> locale.setlocale(locale.LC_ALL, locale.getdefaultlocale())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/python2.7/lib/python2.7/locale.py", line 531, in setlocale
    return _setlocale(category, locale)
locale.Error: unsupported locale setting
That's because getdefaultlocale returns 'UTF8' as the encoding, even though LANG is set to 'nl_NL.utf8' (which is a working locale on the machine I tested).
BTW, I'm +1 on changing the alias table as Marc-Andre proposed.
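Since the accepted spelling differs between platforms (UTF-8 on the Mac, utf8 on HPUX 10, either UTF8 or UTF-8 on the Linux systems tested), application code can try several spellings in turn instead of relying on a single normalized name. The sketch below is one such workaround, not something proposed in this issue; the helper name `first_accepted_locale` and the candidate list are assumptions for illustration.

```python
import locale


def first_accepted_locale(candidates, category=locale.LC_ALL):
    """Return the first name in `candidates` that setlocale() accepts.

    Hypothetical helper. The accepted locale stays active; None is
    returned if every candidate is rejected.
    """
    for name in candidates:
        try:
            locale.setlocale(category, name)
            return name
        except locale.Error:
            continue  # this spelling is not supported here, try the next
    return None


if __name__ == "__main__":
    # Spellings of the same locale that were reported in this thread.
    spellings = ["nl_NL.UTF-8", "nl_NL.UTF8", "nl_NL.utf8"]
    print(first_accepted_locale(spellings))
```

The order of the candidates matters only on systems that accept more than one spelling; putting the hyphenated form first matches what the C library uses on most of the platforms mentioned here.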
|
msg123667 - (view) |
Author: MunSic JEONG (ruseel) |
Date: 2010-12-09 02:34 |
Ubuntu 10.04.1 LTS also works fine with both "UTF8" and "UTF-8".
|
msg129662 - (view) |
Author: Boris FELD (Boris.FELD) * |
Date: 2011-02-27 22:00 |
Bug confirmed on Python 2.5 and later, and on Python 3.x up to 3.2.
If it works with the dash, I agree with Marc-Andre's solution.
|
msg134271 - (view) |
Author: Piotr Sikora (PiotrSikora) |
Date: 2011-04-22 16:52 |
It's the same on OpenBSD (and I'm pretty sure it's true for other BSDs as well).
>>> locale.resetlocale()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.6/locale.py", line 523, in resetlocale
    _setlocale(category, _build_localename(getdefaultlocale()))
locale.Error: unsupported locale setting
>>> locale._build_localename(locale.getdefaultlocale())
'en_US.UTF8'
Works fine with Marc-Andre's alias table fix.
Any chance this will eventually be fixed in 2.x?
|
msg134450 - (view) |
Author: Marc-Andre Lemburg (lemburg) * |
Date: 2011-04-26 10:18 |
This can go into Python 2.7, and, of course, into the 3.x
branches.
|
msg135406 - (view) |
Author: Ronald Oussoren (ronaldoussoren) * |
Date: 2011-05-07 07:19 |
The attached patch implements the change that Marc-Andre proposed.
I intend to apply this patch to all active branches later today (after some more testing).
|
msg136150 - (view) |
Author: Roundup Robot (python-dev) |
Date: 2011-05-17 12:10 |
New changeset 932de36903e7 by Ronald Oussoren in branch '2.7':
(backport)Fix #10154 and #10090: locale normalizes the UTF-8 encoding to "UTF-8" instead of "UTF8"
http://hg.python.org/cpython/rev/932de36903e7
New changeset 28e410eb86af by Ronald Oussoren in branch '3.1':
Fix #10154 and #10090: locale normalizes the UTF-8 encoding to "UTF-8" instead of "UTF8"
http://hg.python.org/cpython/rev/28e410eb86af
New changeset 454d13e535ff by Ronald Oussoren in branch '3.2':
(merge) Fix #10154 and #10090: locale normalizes the UTF-8 encoding to "UTF-8" instead of "UTF8"
http://hg.python.org/cpython/rev/454d13e535ff
|
msg136154 - (view) |
Author: Roundup Robot (python-dev) |
Date: 2011-05-17 12:49 |
New changeset 3d7cb852a176 by Ronald Oussoren in branch 'default':
Fix for issue 10154, merge from 3.2
http://hg.python.org/cpython/rev/3d7cb852a176
|
Date | User | Action | Args |
2022-04-11 14:57:07 | admin | set | github: 54363 |
2014-10-02 08:28:35 | serhiy.storchaka | link | issue1176504 superseder |
2011-05-17 12:49:51 | python-dev | set | messages: + msg136154 |
2011-05-17 12:14:26 | ronaldoussoren | set | status: open -> closed |
2011-05-17 12:14:03 | ronaldoussoren | set | resolution: fixed; stage: needs patch -> resolved |
2011-05-17 12:10:12 | python-dev | set | nosy: + python-dev; messages: + msg136150 |
2011-05-07 08:07:34 | vstinner | set | nosy: + vstinner |
2011-05-07 07:19:43 | ronaldoussoren | set | files: + issue10154.patch; keywords: + patch; messages: + msg135406 |
2011-04-26 10:18:54 | lemburg | set | messages: + msg134450; title: locale.normalize strips "-" from UTF-8, which fails on Mac -> locale.normalize strips "-" from UTF-8, which fails on Mac |
2011-04-23 15:46:10 | eric.araujo | set | title: locale.normalize strips "-" from UTF-8, which fails on Mac -> locale.normalize strips "-" from UTF-8, which fails on Mac; stage: needs patch; versions: + Python 3.3, - Python 2.6, Python 2.5 |
2011-04-22 16:52:24 | PiotrSikora | set | nosy: + PiotrSikora; messages: + msg134271 |
2011-02-27 22:00:08 | Boris.FELD | set | nosy: + Boris.FELD; messages: + msg129662; versions: + Python 2.6, Python 2.5 |
2010-12-09 02:34:31 | ruseel | set | messages: + msg123667 |
2010-12-07 14:29:42 | ronaldoussoren | set | messages: + msg123553 |
2010-11-25 15:42:48 | pitrou | set | nosy: + pitrou; messages: + msg122374 |
2010-11-25 02:12:40 | ruseel | set | nosy: + ruseel |
2010-10-22 17:37:08 | eric.araujo | link | issue10090 dependencies |
2010-10-21 15:27:04 | georg.brandl | set | nosy: + georg.brandl; messages: + msg119309 |
2010-10-21 14:15:06 | lemburg | set | messages: + msg119301 |
2010-10-21 13:53:57 | ixokai | set | messages: + msg119298 |
2010-10-20 21:47:40 | lemburg | set | nosy: + lemburg; title: locale.normalize strips "-" from UTF-8, which fails on Mac -> locale.normalize strips "-" from UTF-8, which fails on Mac; messages: + msg119236 |
2010-10-20 15:49:01 | ronaldoussoren | set | files: - smime.p7s |
2010-10-20 15:46:22 | ronaldoussoren | set | files: + smime.p7s; messages: + msg119216 |
2010-10-20 15:31:23 | ixokai | create | |