classification
Title: locale.setlocale does not work with unicode strings
Type: Stage: resolved
Components: Unicode Versions: Python 2.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: ezio.melotti, lemburg, loewis, python-dev, serhiy.storchaka, terry.reedy, tierlieb, vstinner
Priority: normal Keywords: patch

Created on 2015-11-27 14:54 by tierlieb, last changed 2015-11-29 23:11 by Arfrever. This issue is now closed.

Files
File name Uploaded Description Edit
setlocale_unicode.patch vstinner, 2015-11-27 22:11 review
Messages (13)
msg255461 - (view) Author: (tierlieb) Date: 2015-11-27 14:54
Within locale.py, in setlocale, you have this piece of code:

    if locale and type(locale) is not type(""):
        # convert to string
        locale = normalize(_build_localename(locale))

That does not work with unicode strings, as I found out after wondering for quite a while what the difference between my tests and my production code was...

So either expand the check here to include type(u"") or make _build_localename smarter.
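
A minimal sketch of the failure mode described above, not the actual locale.py code. On Python 2.7, `u"..."` has type `unicode`, so the exact-type test treats it as a (language, encoding) pair. To keep the sketch runnable on Python 3 as well, where `u""` is plain `str`, a hypothetical `str` subclass named `LocaleName` stands in for "a string whose type is not exactly `str`". The function name `treated_as_pair` is likewise made up for illustration.

```python
def treated_as_pair(locale):
    """Mirror the quoted check: True means the value would be handed to
    _build_localename as if it were a (language, encoding) pair."""
    return bool(locale) and type(locale) is not type("")


class LocaleName(str):
    """Hypothetical stand-in for a string type other than plain str
    (on Python 2.7, unicode plays this role)."""


plain = "en_US.UTF-8"
pair = ("en_US", "UTF-8")
other = LocaleName("en_US.UTF-8")

print(treated_as_pair(plain))   # False: passed through as a string
print(treated_as_pair(pair))    # True: correctly treated as a pair
print(treated_as_pair(other))   # True: a string misclassified as a pair

# Unpacking a whole locale name into two variables cannot succeed,
# which is where the confusing error comes from.
try:
    language, encoding = other
except ValueError as exc:
    print("unpacking failed: %s" % exc)
```

An `isinstance` check would classify `other` as a string, because it also matches subclasses and, on Python 2, can be given more than one string type.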
msg255481 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2015-11-27 20:09
The doc for setlocale(category [,locale]) says "locale may be a string, or an iterable of two strings (language code and encoding)".  The purpose of _build_localename is to handle an iterable of two strings.  This looks like an enhancement request, which is not allowed for 2.7.  I suspect that the locale module and its doc predate the addition of unicode.  I think this should be closed.
msg255494 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2015-11-27 21:47
I wouldn't say this is a feature request. 

What the code wanted to check is "if this is an iterable of two strings, convert these to a locale string". I have no idea why the doc string uses "iterable". IMO, a tuple of two strings would have been fine and make the test case a lot simpler - too late to fix, though.

If the code works with Unicode strings, I think we can change the test to:

if locale and not isinstance(locale, basestring):
    ...

In Python 3, the function will only accept Unicode strings, so no need to fix things there.

@tierlieb: Could you provide a patch with test for this ? Thanks.
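
The suggested test can be sketched as follows. This is an illustration under assumptions, not the committed patch: `string_types`, `build_localename` (a simplified stand-in for the private `_build_localename`), and `as_locale_string` are names invented here, and the `try`/`except NameError` fallback is only there so the sketch also runs on Python 3, where `basestring` no longer exists.

```python
import locale

try:
    string_types = basestring   # Python 2: common base of str and unicode
except NameError:
    string_types = str          # Python 3: basestring is gone


def build_localename(pair):
    """Simplified, hypothetical stand-in for locale._build_localename."""
    language, encoding = pair
    if language is None:
        language = "C"
    if encoding is None:
        return language
    return language + "." + encoding


def as_locale_string(value):
    """Convert a (language, encoding) pair to a locale string and
    leave any kind of string (or None) untouched."""
    if value and not isinstance(value, string_types):
        return locale.normalize(build_localename(value))
    return value


print(as_locale_string(u"de_DE"))             # a string passes through as-is
print(as_locale_string(("en_US", "UTF-8")))   # a pair is joined and normalized
```

With `isinstance`, any string type skips the pair-handling branch, so a unicode locale name is no longer unpacked as if it were two items.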
msg255496 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2015-11-27 22:11
I don't see the benefit of supporting Unicode strings for setlocale() arguments: locale names are always encodable to ASCII, so loc.encode('ascii') is enough to work around the issue.

But well, I think it's ok if it doesn't make the code much more complex ;-)

I wrote a patch, what do you think? Is it worth it? ;-)
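
The workaround can be sketched like this, assuming the caller holds a Python 2 `unicode` object and wants the native `str` that the unpatched `setlocale()` expects. The helper name `native_locale_name` is hypothetical; on Python 3 the text is already a `str` and is returned unchanged.

```python
import locale


def native_locale_name(name):
    """Return `name` as the native str type of the running interpreter."""
    if isinstance(name, str):
        return name               # already native (always true on Python 3)
    # Python 2 only: `name` is a unicode object; locale names are ASCII.
    return name.encode("ascii")


name = native_locale_name(u"C")
locale.setlocale(locale.LC_CTYPE, name)   # the "C" locale always exists
```

A name containing non-ASCII characters would raise `UnicodeEncodeError` in the helper rather than reaching `setlocale()`.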
msg255498 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2015-11-27 22:28
On 27.11.2015 23:11, STINNER Victor wrote:
> 
> STINNER Victor added the comment:
> 
> I don't see the benefit of supporting Unicode strings for setlocale() arguments: locale names are always encodable to ASCII, so loc.encode('ascii') is enough to work around the issue.
> 
> But well, I think it's ok if it doesn't make the code much more complex ;-)
> 
> I wrote a patch, what do you think?

Thanks :-)

> Is it worth it? ;-)

I think so, since the current failure for Unicode is rather
obscure.

BTW: Why did you use (_str, _unicode) instead of basestring ?
msg255501 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2015-11-27 22:50
> BTW: Why did you use (_str, _unicode) instead of basestring ?

Serhiy usually insists that technically, it's possible to compile Python 2.7 without Unicode support. I don't believe that anyone uses this crazy feature, but well, it was easier to use _unicode (which is already defined) than trying to run a poll on python users :-)
msg255503 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2015-11-27 22:55
New changeset 7841e9b614eb by Victor Stinner in branch '2.7':
Closes #25742: locale.setlocale() now accepts a Unicode string for its second
https://hg.python.org/cpython/rev/7841e9b614eb
msg255504 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2015-11-27 22:56
On 27.11.2015 23:50, STINNER Victor wrote:
> 
> STINNER Victor added the comment:
> 
>> BTW: Why did you use (_str, _unicode) instead of basestring ?
> 
> Serhiy usually insists that technically, it's possible to compile Python 2.7 without Unicode support. I don't believe that anyone uses this crazy feature, but well, it was easier to use _unicode (which is already defined) than trying to run a poll on python users :-)

Hmm, but basestring is always defined, even when Python is compiled
without Unicode support (which I agree is not used much these
days). unicode won't exist in such a Python version, so basestring
is actually safer to use than the tuple.
msg255506 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2015-11-27 23:00
Marc-Andre Lemburg added the comment:
> Hmm, but basestring is always defined, even when Python is compiled
> without Unicode support (...)

Oh, I didn't know. Well, I already pushed my patch and it works. Feel
free to modify locale.py to use basestring. I'm not interested in
spending time on this *minor* issue anymore ;-)

Thanks for the review, by the way.
msg255507 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2015-11-27 23:03
On 28.11.2015 00:00, STINNER Victor wrote:
> 
> STINNER Victor added the comment:
> 
> Marc-Andre Lemburg added the comment:
>> Hmm, but basestring is always defined, even when Python is compiled
>> without Unicode support (...)
> 
> Oh, I didn't know. Well, I already pushed my patch and it works. Feel
> free to modify locale.py to use basestring. I'm not interested in
> spending time on this *minor* issue anymore ;-)

No big deal. There are probably lots more places in the stdlib which
break without Unicode compiled in... :-)

> Thanks for the review, by the way.

Thanks for the patch.
msg255514 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2015-11-28 00:11
Marc-Andre Lemburg added the comment:
> No big deal. There are probably lots more places in the stdlib which
> break without Unicode compiled in... :-)

Well, to have more fun, try to run any Python application with a
Python compiled without Unicode support *and* without thread support
:-D
msg255574 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-11-29 14:17
http://buildbot.python.org/all/builders/x86%20XP-4%202.7/builds/3517/steps/test/logs/stdio
======================================================================
ERROR: test_setlocale_unicode (test.test_locale.TestMiscellaneous)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "d:\cygwin\home\db3l\buildarea\2.7.bolen-windows\build\lib\test\test_locale.py", line 497, in test_setlocale_unicode
    old_loc = locale.getlocale(locale.LC_ALL)
  File "d:\cygwin\home\db3l\buildarea\2.7.bolen-windows\build\lib\locale.py", line 565, in getlocale
    raise TypeError, 'category LC_ALL is not supported'
TypeError: category LC_ALL is not supported

----------------------------------------------------------------------
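
The traceback shows `locale.getlocale()` rejecting `LC_ALL`; the `locale` documentation lists `LC_ALL` as the one category `getlocale()` does not accept. A hedged sketch of one way a test can save and restore the full locale instead: call `locale.setlocale(category)` without a second argument, which only queries the current setting. The context manager name `preserved_locale` is invented here, and this is not necessarily what the follow-up changeset does.

```python
import locale
from contextlib import contextmanager


@contextmanager
def preserved_locale(category=locale.LC_ALL):
    """Restore the locale for `category` when the block exits."""
    # With no second argument, setlocale() returns the current setting
    # as an opaque string and changes nothing.
    saved = locale.setlocale(category)
    try:
        yield saved
    finally:
        locale.setlocale(category, saved)


with preserved_locale():
    # Anything changed in here is undone on exit.
    locale.setlocale(locale.LC_NUMERIC, "C")
```

The saved value is treated as an opaque string and passed straight back to `setlocale()`, so it never has to be parsed into a (language, encoding) pair.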
msg255576 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2015-11-29 15:33
New changeset d7481ebeaa4f by Victor Stinner in branch '2.7':
Issue #25742: Try to fix test_locale on Windows
https://hg.python.org/cpython/rev/d7481ebeaa4f
History
Date                 User              Action  Args
2015-11-29 23:11:34  Arfrever          set     status: open -> closed
2015-11-29 15:33:27  python-dev        set     messages: + msg255576
2015-11-29 14:17:42  serhiy.storchaka  set     status: closed -> open; nosy: + serhiy.storchaka; messages: + msg255574
2015-11-28 00:11:05  vstinner          set     messages: + msg255514
2015-11-27 23:03:47  lemburg           set     messages: + msg255507
2015-11-27 23:00:41  vstinner          set     messages: + msg255506
2015-11-27 22:56:23  lemburg           set     messages: + msg255504
2015-11-27 22:55:39  python-dev        set     status: open -> closed; nosy: + python-dev; messages: + msg255503; resolution: fixed; stage: resolved
2015-11-27 22:50:44  vstinner          set     messages: + msg255501
2015-11-27 22:28:22  lemburg           set     messages: + msg255498
2015-11-27 22:11:14  vstinner          set     files: + setlocale_unicode.patch; keywords: + patch; messages: + msg255496
2015-11-27 21:47:06  lemburg           set     messages: + msg255494
2015-11-27 20:09:15  terry.reedy       set     nosy: + terry.reedy, lemburg, loewis; messages: + msg255481
2015-11-27 14:54:09  tierlieb          create