Issue 12752: locale.normalize does not take unicode strings



This issue has been migrated to GitHub: https://github.com/python/cpython/issues/56961

classification

Title:	locale.normalize does not take unicode strings
Type:		Stage:	patch review
Components:	Unicode	Versions:	Python 2.7

process

Status:	closed	Resolution:	fixed
Dependencies:		Superseder:
Assigned To:	barry	Nosy List:	barry, ezio.melotti, jtaylor, lemburg, pitrou, python-dev
Priority:	normal	Keywords:	patch

Created on 2011-08-15 11:12 by jtaylor, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name	Uploaded	Description	Edit
issue12752.diff	barry, 2011-08-15 20:48		review

Messages (7)
msg142118 - (view)	Author: Julian Taylor (jtaylor)	Date: 2011-08-15 11:12
Using unicode strings with locale.normalize gives the following traceback with Python 2.7:

    ~$ python2.7 -c 'import locale; locale.normalize(u"en_US")'
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/usr/lib/python2.7/locale.py", line 358, in normalize
        fullname = localename.translate(_ascii_lower_map)
    TypeError: character mapping must return integer, None or unicode

With Python 2.6 it works, and it also works with non-unicode strings in 2.7.
msg142122 - (view)	Author: Julian Taylor (jtaylor)	Date: 2011-08-15 11:59
This is a regression introduced by the fix for http://bugs.python.org/issue1813. It breaks some user code, e.g. wx.Locale.GetCanonicalName returns unicode. Example bugs:
https://bugs.launchpad.net/ubuntu/+source/update-manager/+bug/824734
https://bugs.launchpad.net/ubuntu/+source/playonlinux/+bug/825421
msg142123 - (view)	Author: Marc-Andre Lemburg (lemburg) *	Date: 2011-08-15 12:01
Julian Taylor wrote:
>
> New submission from Julian Taylor <jtaylor.debian@googlemail.com>:
>
> using unicode strings for locale.normalize gives following traceback with python2.7:
>
> ~$ python2.7 -c 'import locale; locale.normalize(u"en_US")'
> Traceback (most recent call last):
>   File "<string>", line 1, in <module>
>   File "/usr/lib/python2.7/locale.py", line 358, in normalize
>     fullname = localename.translate(_ascii_lower_map)
> TypeError: character mapping must return integer, None or unicode
>
> with python2.6 it works and it also works with non-unicode strings in 2.7

This looks like a side-effect of the change Antoine made to the locale module when trying to make the case mapping work in a non-locale dependent way.
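The type mismatch behind the traceback can be sketched in Python 3 terms, where bytes plays the role of the Python 2 byte string and str the role of unicode. This is only an illustration: the table below is an assumed stand-in for the _ascii_lower_map named in the traceback (one entry per byte value, with A-Z mapped to a-z), and the names ascii_lower_table, text_table and bad_table are hypothetical.

```python
# Assumed shape of the lowercase table: 256 entries, A-Z mapped to a-z.
ascii_lower_table = bytes(
    x + 32 if ord("A") <= x <= ord("Z") else x for x in range(256)
)

# Byte strings accept a 256-byte translation table directly.
lowered_bytes = b"en_US".translate(ascii_lower_table)

# Text strings instead need a mapping from code point to
# code point (or to a text string, or None).
text_table = {x: ascii_lower_table[x] for x in range(256)}
lowered_text = "en_US".translate(text_table)

# A table whose entries are byte strings is rejected by the text
# translate with a TypeError, analogous to the error in the report.
bad_table = [bytes([b]) for b in ascii_lower_table]
try:
    "en_US".translate(bad_table)
    error = None
except TypeError as exc:
    error = exc
```

The same table works for one string type and fails for the other, which is why the report sees byte strings still working in 2.7 while unicode strings fail.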
msg142146 - (view)	Author: Barry A. Warsaw (barry) *	Date: 2011-08-15 20:39
A cheap way of fixing this would be to test for str-ness of localename and, if it's a unicode, just call localename.encode('ascii'). Or is that completely insane?
msg142147 - (view)	Author: Barry A. Warsaw (barry) *	Date: 2011-08-15 20:47
For example:

diff -r fb49394f75ed Lib/locale.py
--- a/Lib/locale.py	Mon Aug 15 14:24:15 2011 +0300
+++ b/Lib/locale.py	Mon Aug 15 16:47:23 2011 -0400
@@ -355,6 +355,8 @@
 
     """
     # Normalize the locale name and extract the encoding
+    if isinstance(localename, unicode):
+        localename = localename.encode('ascii')
     fullname = localename.translate(_ascii_lower_map)
     if ':' in fullname:
         # ':' is sometimes used as encoding delimiter.
diff -r fb49394f75ed Lib/test/test_locale.py
--- a/Lib/test/test_locale.py	Mon Aug 15 14:24:15 2011 +0300
+++ b/Lib/test/test_locale.py	Mon Aug 15 16:47:23 2011 -0400
@@ -412,6 +412,11 @@
         locale.setlocale(locale.LC_CTYPE, loc)
         self.assertEqual(loc, locale.getlocale())
 
+    def test_normalize_issue12752(self):
+        # Issue #1813 caused a regression where locale.normalize() would no
+        # longer accept unicode strings.
+        self.assertEqual(locale.normalize(u'en_US'), 'en_US.ISO8859-1')
+
 
 def test_main():
     tests = [
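The encode-first idea in the patch can be tried in isolation with a small sketch. The helper name to_ascii_bytes and the text_type alias are hypothetical and not part of the patch; the alias only exists so the snippet runs on both Python 2 and Python 3.

```python
try:
    text_type = unicode  # Python 2: the text type the patch checks for
except NameError:
    text_type = str      # Python 3: str is already the text type


def to_ascii_bytes(localename):
    """Hypothetical helper: encode text to ASCII bytes, pass anything else through."""
    if isinstance(localename, text_type):
        return localename.encode('ascii')
    return localename
```

One consequence of this approach is that a text locale name containing non-ASCII characters raises UnicodeEncodeError at the encode step instead of reaching the translation table.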
msg142149 - (view)	Author: Antoine Pitrou (pitrou) *	Date: 2011-08-15 22:40
The proposed resolution looks OK. Another possibility is simply to use .lower() if the string is a unicode string, since that will bypass the C locale.
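The .lower() alternative might look roughly like the following, written as a Python 3 sketch in which str stands in for unicode and bytes for the Python 2 byte string. The names ASCII_LOWER_TABLE and lower_locale_name are hypothetical, and the table is an assumed equivalent of the one used in locale.py, not a copy of it.

```python
# Assumed ASCII-only lowercase table: 256 entries, A-Z mapped to a-z.
ASCII_LOWER_TABLE = bytes(
    x + 32 if ord("A") <= x <= ord("Z") else x for x in range(256)
)


def lower_locale_name(localename):
    """Hypothetical helper: lowercase a locale name by type."""
    if isinstance(localename, str):
        # Text strings: use the Unicode-aware lower().
        return localename.lower()
    # Byte strings: use the fixed ASCII table.
    return localename.translate(ASCII_LOWER_TABLE)
```

The two branches are not strictly equivalent: lower() on text also lowercases non-ASCII letters, while the table leaves every byte outside A-Z untouched. For plain ASCII locale names such as en_US the results agree.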
msg142152 - (view)	Author: Roundup Robot (python-dev)	Date: 2011-08-15 23:51
New changeset 0d64fe6c737f by Barry Warsaw in branch '2.7':
The simplest possible fix for the regression in bug 12752 by encoding unicodes
http://hg.python.org/cpython/rev/0d64fe6c737f

History
Date	User	Action	Args
2022-04-11 14:57:20	admin	set	github: 56961
2011-08-15 23:51:53	barry	set	status: open -> closed resolution: fixed
2011-08-15 23:51:01	python-dev	set	nosy: + python-dev messages: + msg142152
2011-08-15 23:17:33	barry	set	assignee: barry
2011-08-15 22:40:25	pitrou	set	nosy: + pitrou messages: + msg142149 stage: test needed -> patch review
2011-08-15 20:48:24	barry	set	files: + issue12752.diff keywords: + patch
2011-08-15 20:47:48	barry	set	messages: + msg142147
2011-08-15 20:39:18	barry	set	nosy: + barry messages: + msg142146
2011-08-15 12:01:47	lemburg	set	nosy: + lemburg messages: + msg142123
2011-08-15 11:59:22	jtaylor	set	messages: + msg142122
2011-08-15 11:13:38	ezio.melotti	set	nosy: + ezio.melotti stage: test needed
2011-08-15 11:12:41	jtaylor	create