classification
Title: localeconv() does not encode returned strings
Type: behavior
Stage:
Components: Extension Modules
Versions: Python 3.0
process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To:
Nosy List: loewis, pitrou
Priority: normal
Keywords:

Created on 2008-02-01 18:13 by pitrou, last changed 2008-03-08 10:43 by loewis. This issue is now closed.

Files
File name: pylocaleconv.patch
Uploaded: pitrou, 2008-02-01 18:50
Messages (6)
msg61970 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2008-02-01 18:13
Some values in the dict returned by localeconv() may be non-ASCII
strings, yet they are not decoded according to the locale's character
set. This can be observed when the currency symbol is the euro sign:

>>> import locale
>>> locale.setlocale(locale.LC_MONETARY, 'fr_FR.UTF-8')
'fr_FR.UTF-8'
>>> locale.localeconv()['currency_symbol']
'\xe2\x82\xac'
>>> locale.setlocale(locale.LC_MONETARY, 'fr_FR.ISO8859-15')
'fr_FR.ISO8859-15'
>>> locale.localeconv()['currency_symbol']
'\xa4'

localeconv() is defined in the _locale module, which has no knowledge of
the current encoding, but the locale module does. So we could redefine
localeconv() in locale as a wrapper that decodes the returned strings
using the locale's encoding.
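
The wrapper idea above can be sketched roughly as follows. This is only an illustration, not the attached patch: the helper name decode_localeconv and the sample dicts are hypothetical, and the raw values are assumed to be byte strings as in the session shown earlier.

```python
def decode_localeconv(raw, encoding):
    """Return a copy of `raw` with every bytes value decoded using `encoding`.

    Non-bytes values (ints, lists such as 'grouping') are passed through as-is.
    """
    decoded = {}
    for key, value in raw.items():
        if isinstance(value, bytes):
            decoded[key] = value.decode(encoding)
        else:
            decoded[key] = value
    return decoded


# Hypothetical raw results mirroring the two locales in the report.
raw_utf8 = {"currency_symbol": b"\xe2\x82\xac", "frac_digits": 2, "grouping": [3, 3, 0]}
raw_latin9 = {"currency_symbol": b"\xa4", "frac_digits": 2, "grouping": [3, 3, 0]}

# Both byte sequences name the same character once the right codec is applied.
euro_from_utf8 = decode_localeconv(raw_utf8, "utf-8")["currency_symbol"]
euro_from_latin9 = decode_localeconv(raw_latin9, "iso8859-15")["currency_symbol"]
print(euro_from_utf8 == euro_from_latin9)
```

In a real wrapper the encoding would have to come from the active locale rather than being passed in by hand, which is where the extra per-call cost would come from.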
msg61971 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2008-02-01 18:50
Here is a patch fixing the problem. Note, however, that it will make
localeconv() noticeably slower. Perhaps _locale.localeconv should grow an
encoding parameter instead.
msg61973 - Author: Martin v. Löwis (loewis) * (Python committer) Date: 2008-02-01 19:32
The locale module is completely broken; don't try to work around that
breakage. One option would be to remove it entirely.
msg61977 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2008-02-01 21:18
Perhaps it is broken - it does look rather fragile - but are there any
plans to design a replacement?
msg61987 - Author: Martin v. Löwis (loewis) * (Python committer) Date: 2008-02-02 01:08
> Perhaps it is broken - it does look rather fragile - but are there any
> plans to design a replacement?

If people could contribute an ICU wrapper, that would be nice. However,
it's unlikely to happen. So I'd rather rewrite _locale to use wchar_t
functions, and give up on systems where these are not available, or
where wchar_t is not Unicode (not sure how to detect the latter case).
msg63395 - Author: Martin v. Löwis (loewis) * (Python committer) Date: 2008-03-08 10:43
I have now found a way to fix this, by relying on wchar_t functions. It's
fixed in r61306.
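
For readers who want to check the behaviour on a current interpreter, here is a small sketch. It assumes a Python 3 build that includes this fix, and it deliberately avoids calling setlocale() so that it does not depend on any particular locale being installed; the variable names are illustrative only.

```python
import locale

# Without a setlocale() call, LC_NUMERIC and LC_MONETARY stay at the
# default "C" locale, so this runs on any system.
conv = locale.localeconv()

# Keys whose values are still raw byte strings; on a fixed interpreter
# this list is expected to be empty.
bytes_keys = [key for key, value in conv.items() if isinstance(value, bytes)]

# Keys whose values are text (for example currency_symbol, decimal_point).
text_keys = sorted(key for key, value in conv.items() if isinstance(value, str))

print("bytes-valued keys:", bytes_keys)
print("text-valued keys:", text_keys)
```

To reproduce the original report, one would additionally call locale.setlocale(locale.LC_MONETARY, 'fr_FR.UTF-8') first, which only works where that locale is available.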
History
Date                 User    Action  Args
2008-03-08 10:43:45  loewis  set     status: open -> closed; resolution: fixed; messages: + msg63395
2008-02-02 01:08:27  loewis  set     messages: + msg61987
2008-02-01 21:18:51  pitrou  set     messages: + msg61977
2008-02-01 19:32:39  loewis  set     nosy: + loewis; messages: + msg61973
2008-02-01 18:50:32  pitrou  set     files: + pylocaleconv.patch; messages: + msg61971
2008-02-01 18:13:47  pitrou  create