classification
Title: locale.normalize strips "-" from UTF-8, which fails on Mac
Type: behavior Stage: resolved
Components: Library (Lib), macOS Versions: Python 3.1, Python 3.2, Python 3.3, Python 2.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: ronaldoussoren Nosy List: Boris.FELD, PiotrSikora, georg.brandl, ixokai, lemburg, pitrou, python-dev, ronaldoussoren, ruseel, vstinner
Priority: normal Keywords: patch

Created on 2010-10-20 15:31 by ixokai, last changed 2011-05-17 12:49 by python-dev. This issue is now closed.

Files
File name         Uploaded                          Description
issue10154.patch  ronaldoussoren, 2011-05-07 07:19
Messages (15)
msg119213 - Author: Stephen Hansen (ixokai) (Python triager) Date: 2010-10-20 15:31
While investigating issue10092, Georg discovered that locale.normalize() misbehaves on Mac.

Basically, "en_US.UTF-8" is how the locale string must be spelled on the Mac; if the dash is dropped, setlocale rejects it. locale.normalize drops the dash, so its return value can't be passed to setlocale, even though that's what it's documented to be for.

If that isn't clear, this should demonstrate (from /branches/py3k):


Top-2:build pythonbuildbot$ ./python.exe
Python 3.2a3+ (py3k:85631, Oct 17 2010, 06:45:22) 
[GCC 4.2.1 (Apple Inc. build 5664)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import locale
[51767 refs]
>>> locale.normalize("en_US.UTF-8")
'en_US.UTF8'
[51770 refs]
>>> locale.setlocale(locale.LC_TIME, 'en_US.UTF8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/pythonbuildbot/test/build/Lib/locale.py", line 538, in setlocale
    return _setlocale(category, locale)
locale.Error: unsupported locale setting
[51816 refs]
>>> locale.setlocale(locale.LC_TIME, 'en_US.UTF-8')
'en_US.UTF-8'
[51816 refs]

The precise same behavior exists on my stock/system Python 2.6, too, fwiw. (Not that it can be fixed on 2.6, but maybe 2.7?)
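
The round trip shown in the session above can be checked on any platform with a small helper. This is only a sketch: the name `setlocale_accepts` is hypothetical and not part of the locale module, and the result for "en_US.UTF-8" depends on the Python version and on which locales the C library provides.

```python
import locale


def setlocale_accepts(name, category=locale.LC_TIME):
    """Return True if setlocale() accepts locale.normalize(name).

    Hypothetical helper for illustration; not part of the locale module.
    """
    previous = locale.setlocale(category)  # query only, changes nothing
    try:
        locale.setlocale(category, locale.normalize(name))
        return True
    except locale.Error:
        return False
    finally:
        locale.setlocale(category, previous)  # restore the caller's setting


for candidate in ("C", "en_US.UTF-8"):
    status = "accepted" if setlocale_accepts(candidate) else "rejected"
    print(candidate, "->", locale.normalize(candidate), status)
```

On an affected build, the second line would report the normalized name as rejected, while passing "en_US.UTF-8" directly to setlocale succeeds.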
msg119216 - Author: Ronald Oussoren (ronaldoussoren) * (Python committer) Date: 2010-10-20 15:46
This patch solves the immediate failure:

Index: Lib/locale.py
===================================================================
--- Lib/locale.py	(revision 85743)
+++ Lib/locale.py	(working copy)
@@ -396,6 +396,9 @@
         else:
             encoding = defenc
         #print 'found encoding %r' % encoding
+        if sys.platform == 'darwin' and encoding == 'UTF8':
+            encoding = 'UTF-8'
+
         if encoding:
             return langname + '.' + encoding
         else:

I'm not happy about hardcoding this specific exception, though; there should be a better solution than this.

Ronald
msg119236 - Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2010-10-20 21:47
Ronald Oussoren wrote:
> 
> Ronald Oussoren <ronaldoussoren@mac.com> added the comment:
> 
> This patch solves the immediate failure:
> 
> Index: Lib/locale.py
> ===================================================================
> --- Lib/locale.py	(revision 85743)
> +++ Lib/locale.py	(working copy)
> @@ -396,6 +396,9 @@
>          else:
>              encoding = defenc
>          #print 'found encoding %r' % encoding
> +        if sys.platform == 'darwin' and encoding == 'UTF8':
> +            encoding = 'UTF-8'
> +
>          if encoding:
>              return langname + '.' + encoding
>          else:
> 
> I'm not happy about hardcoding this specific exception though, there should be a better solution than this.

Could you tell me the values of localename, code, langname and encoding
at that step in the process?

We may need to add a locale_encoding_alias entry mapping 'UTF8' to 'UTF-8',
since the version with the hyphen is what the C lib uses.
msg119298 - Author: Stephen Hansen (ixokai) (Python triager) Date: 2010-10-21 13:53
Marc-Andre, the locals() right before "if encoding:" (line 399) are:

>>> locale.normalize("en_US.UTF-8")
{'code': 'en_US.ISO8859-1', 'langname': 'en_US', 'encoding': 'UTF8', 'norm_encoding': 'utf_8', 'defenc': 'ISO8859-1', 'localename': 'en_US.UTF-8', 'lookup_name': 'en_us.utf-8', 'fullname': 'en_us.utf-8'}
'en_US.UTF8'
msg119301 - Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2010-10-21 14:15
Stephen Hansen wrote:
> 
> Stephen Hansen <me+python@ixokai.io> added the comment:
> 
> Marc-Andre, the locals() right before "if encoding:" (line 399) are:
> 
> >>> locale.normalize("en_US.UTF-8")
> {'code': 'en_US.ISO8859-1', 'langname': 'en_US', 'encoding': 'UTF8', 'norm_encoding': 'utf_8', 'defenc': 'ISO8859-1', 'localename': 'en_US.UTF-8', 'lookup_name': 'en_us.utf-8', 'fullname': 'en_us.utf-8'}
> 'en_US.UTF8'

Thanks.

Line 646 in the alias table is wrong:

    'utf_8':                        'UTF8',

should read:

    'utf_8':                        'UTF-8',

I wonder why this wasn't reported earlier - did glibc change
the UTF-8 spelling at some point? I do vaguely remember that I
had to remove the hyphen due to problems with setlocale() not
accepting 'UTF-8', but that was at the time I wrote that part
of locale.py, i.e. many years ago.

Removing the hyphen doesn't appear to be necessary anymore. I checked
on openSUSE 10.3 and 11.3. Both work fine with 'UTF-8' and 'UTF8'.
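
To illustrate why a single alias entry decides the outcome, here is a toy model of the lookup. It is a sketch under loud assumptions: `toy_normalize`, `OLD_ALIASES` and `NEW_ALIASES` are hypothetical names, and the real tables and normalization steps in Lib/locale.py are considerably larger.

```python
# Toy model of the encoding-alias lookup. The real tables in Lib/locale.py
# are much larger and the real normalize() performs more steps.
OLD_ALIASES = {"utf_8": "UTF8"}    # entry as quoted above
NEW_ALIASES = {"utf_8": "UTF-8"}   # entry as proposed above


def toy_normalize(localename, aliases):
    """Rewrite the encoding part of 'lang.encoding' through an alias table."""
    langname, sep, encoding = localename.partition(".")
    if not sep:
        return langname  # no encoding part to rewrite
    key = encoding.lower().replace("-", "_")  # 'UTF-8' -> 'utf_8'
    return langname + "." + aliases.get(key, encoding)


print(toy_normalize("en_US.UTF-8", OLD_ALIASES))  # en_US.UTF8
print(toy_normalize("en_US.UTF-8", NEW_ALIASES))  # en_US.UTF-8
```

With the old entry the hyphen is lost on the way out; with the proposed entry the name survives unchanged.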
msg119309 - Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2010-10-21 15:27
If other POSIX-y systems accept both spellings and only Macs insist on the dash, we should indeed change the alias entry to use the dash.
msg122374 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-11-25 15:42
Mandriva and Debian also work fine with both "UTF8" and "UTF-8". For the record, the canonical spelling inside /usr/share/locale is "UTF-8". I suppose glibc does its own normalization.
msg123553 - Author: Ronald Oussoren (ronaldoussoren) * (Python committer) Date: 2010-12-07 14:29
UTF-8 works on SuSE Enterprise Linux 9 and 10 as well.

BTW, neither UTF8 nor UTF-8 works on HPUX 10. That platform requires the spelling utf8.

Sadly, this means that the following code doesn't work on HPUX 10:

>>> locale.setlocale(locale.LC_ALL, locale.getdefaultlocale())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/python2.7/lib/python2.7/locale.py", line 531, in setlocale
    return _setlocale(category, locale)
locale.Error: unsupported locale setting

That's because getdefaultlocale returns 'UTF8' as the encoding, even though LANG is set to 'nl_NL.utf8' (which is a working locale on the machine I tested).

BTW, I'm +1 on changing the alias table as Marc-Andre proposed.
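
Because platforms disagree on the spelling (UTF-8, UTF8 or utf8), code that has to run everywhere may want to probe several variants. The following is a minimal sketch, not a recommended fix: `spelling_variants` and `first_accepted` are hypothetical helpers, and which variant is accepted depends entirely on the local C library.

```python
import locale


def spelling_variants(encoding):
    """Return common spellings of an encoding name, without duplicates."""
    bare = encoding.replace("-", "").replace("_", "")
    candidates = [encoding, bare.upper(), bare.lower()]
    return list(dict.fromkeys(candidates))  # keep order, drop repeats


def first_accepted(language, encoding, category=locale.LC_TIME):
    """Return the first 'language.encoding' name setlocale() accepts, else None."""
    previous = locale.setlocale(category)  # query only
    try:
        for variant in spelling_variants(encoding):
            name = language + "." + variant
            try:
                locale.setlocale(category, name)
                return name
            except locale.Error:
                continue  # try the next spelling
        return None
    finally:
        locale.setlocale(category, previous)  # restore the caller's setting


print(spelling_variants("UTF-8"))  # ['UTF-8', 'UTF8', 'utf8']
print(first_accepted("nl_NL", "UTF-8"))
```

The second call prints None when the nl_NL locale is not installed at all, so a None result does not by itself identify a spelling problem.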
msg123667 - Author: MunSic JEONG (ruseel) Date: 2010-12-09 02:34
Ubuntu 10.04.1 LTS also works fine with both "UTF8" and "UTF-8".
msg129662 - Author: Boris FELD (Boris.FELD) * Date: 2011-02-27 22:00
Bug confirmed on Python 2.5 and later, and on Python 3.2 and earlier.

If it works with the dash, I agree with Marc-Andre's solution.
msg134271 - Author: Piotr Sikora (PiotrSikora) Date: 2011-04-22 16:52
It's the same on OpenBSD (and I'm pretty sure it's true for other BSDs as well).

>>> locale.resetlocale()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.6/locale.py", line 523, in resetlocale
    _setlocale(category, _build_localename(getdefaultlocale()))
locale.Error: unsupported locale setting
>>> locale._build_localename(locale.getdefaultlocale())
'en_US.UTF8'

Works fine with Marc-Andre's alias table fix.

Any chance this will eventually be fixed in 2.x?
msg134450 - Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2011-04-26 10:18
Piotr Sikora wrote:
> 
> Piotr Sikora <piotr.sikora@frickle.com> added the comment:
> 
> It's the same on OpenBSD (and I'm pretty sure it's true for other BSDs as well).
> 
> >>> locale.resetlocale()
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/usr/local/lib/python2.6/locale.py", line 523, in resetlocale
>     _setlocale(category, _build_localename(getdefaultlocale()))
> locale.Error: unsupported locale setting
> >>> locale._build_localename(locale.getdefaultlocale())
> 'en_US.UTF8'
> 
> Works fine with Marc-Andre's alias table fix.
> 
> Any chances this will be eventually fixed in 2.x?

This can go into Python 2.7, and, of course, into the 3.x
branches.
msg135406 - Author: Ronald Oussoren (ronaldoussoren) * (Python committer) Date: 2011-05-07 07:19
The attached patch implements the change that Marc-Andre proposed.

I intend to apply this patch to all active branches later today (after some more testing).
msg136150 - Author: Roundup Robot (python-dev) (Python triager) Date: 2011-05-17 12:10
New changeset 932de36903e7 by Ronald Oussoren in branch '2.7':
(backport)Fix #10154 and #10090: locale normalizes the UTF-8 encoding to "UTF-8" instead of "UTF8"
http://hg.python.org/cpython/rev/932de36903e7

New changeset 28e410eb86af by Ronald Oussoren in branch '3.1':
Fix #10154 and #10090: locale normalizes the UTF-8 encoding to "UTF-8" instead of "UTF8"
http://hg.python.org/cpython/rev/28e410eb86af

New changeset 454d13e535ff by Ronald Oussoren in branch '3.2':
(merge) Fix #10154 and #10090: locale normalizes the UTF-8 encoding to "UTF-8" instead of "UTF8"
http://hg.python.org/cpython/rev/454d13e535ff
msg136154 - Author: Roundup Robot (python-dev) (Python triager) Date: 2011-05-17 12:49
New changeset 3d7cb852a176 by Ronald Oussoren in branch 'default':
Fix for issue 10154, merge from 3.2
http://hg.python.org/cpython/rev/3d7cb852a176
History
Date                 User              Action  Args
2014-10-02 08:28:35  serhiy.storchaka  link    issue1176504 superseder
2011-05-17 12:49:51  python-dev        set     messages: + msg136154
2011-05-17 12:14:26  ronaldoussoren    set     status: open -> closed
2011-05-17 12:14:03  ronaldoussoren    set     resolution: fixed
                                               stage: needs patch -> resolved
2011-05-17 12:10:12  python-dev        set     nosy: + python-dev
                                               messages: + msg136150
2011-05-07 08:07:34  vstinner          set     nosy: + vstinner
2011-05-07 07:19:43  ronaldoussoren    set     files: + issue10154.patch
                                               keywords: + patch
                                               messages: + msg135406
2011-04-26 10:18:54  lemburg           set     messages: + msg134450
                                               title: locale.normalize strips "-" from UTF-8, which fails on Mac -> locale.normalize strips "-" from UTF-8, which fails on Mac
2011-04-23 15:46:10  eric.araujo       set     title: locale.normalize strips "-" from UTF-8, which fails on Mac -> locale.normalize strips "-" from UTF-8, which fails on Mac
                                               stage: needs patch
                                               versions: + Python 3.3, - Python 2.6, Python 2.5
2011-04-22 16:52:24  PiotrSikora       set     nosy: + PiotrSikora
                                               messages: + msg134271
2011-02-27 22:00:08  Boris.FELD        set     nosy: + Boris.FELD
                                               messages: + msg129662
                                               versions: + Python 2.6, Python 2.5
2010-12-09 02:34:31  ruseel            set     messages: + msg123667
2010-12-07 14:29:42  ronaldoussoren    set     messages: + msg123553
2010-11-25 15:42:48  pitrou            set     nosy: + pitrou
                                               messages: + msg122374
2010-11-25 02:12:40  ruseel            set     nosy: + ruseel
2010-10-22 17:37:08  eric.araujo       link    issue10090 dependencies
2010-10-21 15:27:04  georg.brandl      set     nosy: + georg.brandl
                                               messages: + msg119309
2010-10-21 14:15:06  lemburg           set     messages: + msg119301
2010-10-21 13:53:57  ixokai            set     messages: + msg119298
2010-10-20 21:47:40  lemburg           set     nosy: + lemburg
                                               title: locale.normalize strips "-" from UTF-8, which fails on Mac -> locale.normalize strips "-" from UTF-8, which fails on Mac
                                               messages: + msg119236
2010-10-20 15:49:01  ronaldoussoren    set     files: - smime.p7s
2010-10-20 15:46:22  ronaldoussoren    set     files: + smime.p7s
                                               messages: + msg119216
2010-10-20 15:31:23  ixokai            create