classification
Title: locale.normalize does not take unicode strings
Type:
Stage: patch review
Components: Unicode
Versions: Python 2.7
process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To: barry
Nosy List: barry, ezio.melotti, jtaylor, lemburg, pitrou, python-dev
Priority: normal
Keywords: patch

Created on 2011-08-15 11:12 by jtaylor, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name: issue12752.diff
Uploaded: barry, 2011-08-15 20:48
Messages (7)
msg142118 - Author: Julian Taylor (jtaylor) Date: 2011-08-15 11:12
Passing a unicode string to locale.normalize gives the following traceback with Python 2.7:

~$ python2.7 -c 'import locale; locale.normalize(u"en_US")'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/lib/python2.7/locale.py", line 358, in normalize
    fullname = localename.translate(_ascii_lower_map)
TypeError: character mapping must return integer, None or unicode

It works with Python 2.6, and it also works with non-unicode (byte) strings in 2.7.
msg142122 - Author: Julian Taylor (jtaylor) Date: 2011-08-15 11:59
This is a regression introduced by the fix for http://bugs.python.org/issue1813

It breaks some user code, e.g. code that passes on the result of wx.Locale.GetCanonicalName, which returns unicode.
Example bugs:
https://bugs.launchpad.net/ubuntu/+source/update-manager/+bug/824734
https://bugs.launchpad.net/ubuntu/+source/playonlinux/+bug/825421
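
For user code that has to run on an affected 2.7 interpreter, one possible stopgap is to coerce the name to a byte string before calling locale.normalize. The following is only a sketch and not part of the original report: the helper name safe_normalize is hypothetical, and it assumes locale names are pure ASCII.

```python
import locale
import sys

PY2 = sys.version_info[0] == 2


def safe_normalize(localename):
    """Call locale.normalize() with a type it is known to accept.

    Hypothetical helper; assumes the locale name is pure ASCII.
    """
    # On Python 2, anything that is not a byte string (str) is assumed to be
    # unicode, e.g. a value obtained from wx.Locale.GetCanonicalName.
    if PY2 and not isinstance(localename, str):
        localename = localename.encode("ascii")
    # On Python 3 the text string is passed through unchanged.
    return locale.normalize(localename)


print(safe_normalize(u"en_US"))
```

A name containing non-ASCII characters would raise UnicodeEncodeError on Python 2 with this approach, which seems acceptable for locale identifiers.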
msg142123 - Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2011-08-15 12:01
This looks like a side-effect of the change Antoine made to the locale
module when trying to make the case mapping work in a locale-independent
way.
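
The traceback is consistent with a byte-string translation table being handed to unicode.translate, which expects a mapping from code points to code points, unicode strings, or None. The sketch below illustrates the two table shapes. The table construction is an assumption (only the name _ascii_lower_map appears in the traceback), and the names used here are illustrative.

```python
# 256-entry table for byte strings: ASCII 'A'-'Z' map to 'a'-'z', every
# other byte maps to itself. Going through bytearray keeps this valid on
# both Python 2 and Python 3.
BYTE_LOWER_TABLE = bytes(bytearray(
    x + 32 if ord("A") <= x <= ord("Z") else x
    for x in range(256)
))

# Text (unicode) strings need a mapping keyed by code point instead.
TEXT_LOWER_MAP = dict((x, x + 32) for x in range(ord("A"), ord("Z") + 1))

# Each string type is translated with the table shape it expects.
print(b"en_US".translate(BYTE_LOWER_TABLE))
print(u"en_US".translate(TEXT_LOWER_MAP))
```

Neither table depends on the current LC_CTYPE setting, which appears to be the point of the earlier change; the regression comes from applying the byte-string shape to a unicode argument.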
msg142146 - Author: Barry A. Warsaw (barry) * (Python committer) Date: 2011-08-15 20:39
A cheap way of fixing this would be to test for str-ness of localename and, if it's a unicode, just localename.encode('ascii').

Or is that completely insane?
msg142147 - Author: Barry A. Warsaw (barry) * (Python committer) Date: 2011-08-15 20:47
For example:


diff -r fb49394f75ed Lib/locale.py
--- a/Lib/locale.py	Mon Aug 15 14:24:15 2011 +0300
+++ b/Lib/locale.py	Mon Aug 15 16:47:23 2011 -0400
@@ -355,6 +355,8 @@
 
     """
     # Normalize the locale name and extract the encoding
+    if isinstance(localename, unicode):
+        localename = localename.encode('ascii')
     fullname = localename.translate(_ascii_lower_map)
     if ':' in fullname:
         # ':' is sometimes used as encoding delimiter.
diff -r fb49394f75ed Lib/test/test_locale.py
--- a/Lib/test/test_locale.py	Mon Aug 15 14:24:15 2011 +0300
+++ b/Lib/test/test_locale.py	Mon Aug 15 16:47:23 2011 -0400
@@ -412,6 +412,11 @@
         locale.setlocale(locale.LC_CTYPE, loc)
         self.assertEqual(loc, locale.getlocale())
 
+    def test_normalize_issue12752(self):
+        # Issue #1813 caused a regression where locale.normalize() would no
+        # longer accept unicode strings.
+        self.assertEqual(locale.normalize(u'en_US'), 'en_US.ISO8859-1')
+
 
 def test_main():
     tests = [
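
One consequence of encoding with 'ascii' is that a unicode locale name containing non-ASCII characters would presumably raise UnicodeEncodeError before any lookup happens. A small sketch of that behaviour, independent of the locale module (the helper name and the second sample name are made up for illustration):

```python
def to_ascii_bytes(name):
    """Encode a text locale name to ASCII, as the patch above does."""
    return name.encode("ascii")


# A plain ASCII name encodes cleanly.
print(to_ascii_bytes(u"en_US"))

# A name with a non-ASCII character is rejected by the codec.
try:
    to_ascii_bytes(u"fr_FR\u00e9")
except UnicodeEncodeError as exc:
    print("rejected: %s" % exc.reason)
```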
msg142149 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-08-15 22:40
The proposed resolution looks ok. Another possibility is simply to use .lower() if the string is a unicode string, since that will bypass the C locale.
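
A rough sketch of that alternative, dispatching on the string type. This is not the committed fix; the function and table names are hypothetical, and the code is written so that it also runs on Python 3, where bytes plays the role of the 2.x str type.

```python
# Byte-string table: ASCII upper case to lower case, other bytes unchanged.
_BYTE_LOWER_TABLE = bytes(bytearray(
    x + 32 if 65 <= x <= 90 else x for x in range(256)
))


def lower_localename(name):
    """Lowercase a locale name without consulting the C locale (sketch)."""
    if isinstance(name, bytes):
        # Byte strings: fixed ASCII table, so LC_CTYPE cannot interfere.
        return name.translate(_BYTE_LOWER_TABLE)
    # Text strings: .lower() follows Unicode case rules, not the C locale.
    return name.lower()


print(lower_localename(b"en_US"))
print(lower_localename(u"en_US"))
```

One difference from the encode('ascii') approach is that .lower() also lowercases non-ASCII letters and returns a unicode result, so the type of the return value would follow the type of the argument.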
msg142152 - Author: Roundup Robot (python-dev) (Python triager) Date: 2011-08-15 23:51
New changeset 0d64fe6c737f by Barry Warsaw in branch '2.7':
The simplest possible fix for the regression in bug 12752 by encoding unicodes
http://hg.python.org/cpython/rev/0d64fe6c737f
History
Date User Action Args
2022-04-11 14:57:20 admin set github: 56961
2011-08-15 23:51:53 barry set status: open -> closed; resolution: fixed
2011-08-15 23:51:01 python-dev set nosy: + python-dev; messages: + msg142152
2011-08-15 23:17:33 barry set assignee: barry
2011-08-15 22:40:25 pitrou set nosy: + pitrou; messages: + msg142149; stage: test needed -> patch review
2011-08-15 20:48:24 barry set files: + issue12752.diff; keywords: + patch
2011-08-15 20:47:48 barry set messages: + msg142147
2011-08-15 20:39:18 barry set nosy: + barry; messages: + msg142146
2011-08-15 12:01:47 lemburg set nosy: + lemburg; messages: + msg142123
2011-08-15 11:59:22 jtaylor set messages: + msg142122
2011-08-15 11:13:38 ezio.melotti set nosy: + ezio.melotti; stage: test needed
2011-08-15 11:12:41 jtaylor create