classification
Title: Add PyUnicode_DecodeLocale and PyUnicode_DecodeLocaleAndSize
Type:
Stage:
Components: Unicode
Versions: Python 3.3
process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To:
Nosy List: ezio.melotti, haypo, loewis, python-dev, skrah
Priority: normal
Keywords: patch

Created on 2011-12-08 23:02 by haypo, last changed 2011-12-17 06:15 by haypo. This issue is now closed.

Files
File name Uploaded
pyunicode_decodelocale.patch haypo, 2011-12-08 23:03
pyunicode_decodelocale-2.patch haypo, 2011-12-09 19:26
Messages (6)
msg149060 - Author: STINNER Victor (haypo) * (Python committer) Date: 2011-12-08 23:02
To decode a byte string from the locale encoding (LC_CTYPE), PyUnicode_DecodeFSDefault() can be used, but this function uses a constant encoding set at startup (the locale encoding at startup). The right method is currently to call _Py_char2wchar() and then PyUnicode_FromWideChar(). _Py_char2wchar() is a low-level function: it doesn't raise a nice Python exception, but just returns NULL on error and writes a message to stderr using fprintf() (!).

The attached patch adds PyUnicode_DecodeLocale() and PyUnicode_DecodeLocaleAndSize() to offer a high-level API to decode data from the *current* locale encoding. These functions fail with an OSError or MemoryError if decoding fails (instead of a generic ValueError), and don't write to stderr anymore. They have a surrogateescape argument to choose whether to escape undecodable bytes or to fail with an error.
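
The two decoding modes described above can be mimicked at the Python level. This is only a rough sketch, not the C implementation from the patch: decode_locale() is a hypothetical helper, the encoding parameter exists only so the demonstration is deterministic, and Python's codec raises UnicodeDecodeError in strict mode, which differs from the exceptions named in this message.

```python
import locale


def decode_locale(data, surrogateescape=False, encoding=None):
    """Decode bytes using the current locale encoding (hypothetical helper)."""
    if encoding is None:
        # Ask for the encoding of the *current* LC_CTYPE locale without
        # modifying the locale (do_setlocale=False).
        encoding = locale.getpreferredencoding(False)
    errors = "surrogateescape" if surrogateescape else "strict"
    return data.decode(encoding, errors)


raw = b"caf\xe9"  # byte 0xE9 is not valid ASCII

# Escape mode: the undecodable byte becomes the lone surrogate U+DCE9.
escaped = decode_locale(raw, surrogateescape=True, encoding="ascii")

# Encoding with the same error handler restores the original bytes.
restored = escaped.encode("ascii", "surrogateescape")
```

In strict mode the same input fails with an exception instead of producing an escaped string, which is the choice the surrogateescape argument is meant to expose.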

The patch only uses the function in _localemodule.c, but other functions may have to be fixed to use the new function. The tzname_encoding.patch of issue #5905 should maybe use it for example.
msg149116 - Author: STINNER Victor (haypo) * (Python committer) Date: 2011-12-09 19:26
I fixed issue #5905 (strptime fails in non-UTF locale). The fix is not enough if the locale is changed in Python.

Updated the patch to fix time.strftime() (if wcsftime() is not available).
msg149654 - Author: STINNER Victor (haypo) * (Python committer) Date: 2011-12-17 02:55
changeset:   74002:279b0aee0cfb
user:        Victor Stinner <victor.stinner@haypocalc.com>
date:        Fri Dec 16 23:56:01 2011 +0100
files:       Doc/c-api/unicode.rst Include/unicodeobject.h Modules/_localemodule.c Modules/main.c Modules/timemodule.c
description:
Add PyUnicode_DecodeLocaleAndSize() and PyUnicode_DecodeLocale()

 * PyUnicode_DecodeLocaleAndSize() and PyUnicode_DecodeLocale() decode a string
   from the current locale encoding
 * _Py_char2wchar() writes an "error code" in the size argument to indicate
   if the function failed because of memory allocation failure or because of a
   decoding error. The function doesn't write the error message directly to
   stderr.
 * Fix time.strftime() (if wcsftime() is missing): decode strftime() result
   from the current locale encoding, not from the filesystem encoding.
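
The last item distinguishes the filesystem encoding from the current locale encoding. As a small illustration, both can be inspected from Python; encoding_snapshot() is a hypothetical helper, and the values it reports depend on the platform and interpreter settings, so the two may well be identical on a given system.

```python
import locale
import sys


def encoding_snapshot():
    """Return the two encodings that the changeset distinguishes."""
    return {
        # Chosen once, at interpreter startup.
        "filesystem": sys.getfilesystemencoding(),
        # Queried at call time from the current LC_CTYPE locale,
        # without modifying the locale (do_setlocale=False).
        "current_locale": locale.getpreferredencoding(False),
    }


snapshot = encoding_snapshot()
```

Calling the helper again after locale.setlocale() is one way to check whether the current locale encoding has diverged from the one picked at startup.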
msg149655 - Author: Roundup Robot (python-dev) Date: 2011-12-17 03:46
New changeset 88198b93ff2f by Victor Stinner in branch 'default':
Issue #13560: Add PyUnicode_EncodeLocale()
http://hg.python.org/cpython/rev/88198b93ff2f

New changeset 51412b4b81ae by Victor Stinner in branch 'default':
Issue #13560: os.strerror() now uses the current locale encoding instead of UTF-8
http://hg.python.org/cpython/rev/51412b4b81ae
msg149658 - Author: Roundup Robot (python-dev) Date: 2011-12-17 04:46
New changeset 07802351ccad by Victor Stinner in branch 'default':
Issue #13560: Locale codec functions use the classic "errors" parameter,
http://hg.python.org/cpython/rev/07802351ccad
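
A rough Python sketch of how a classic "errors" string could be mapped onto the earlier escape-or-fail choice. It assumes that only "strict" and "surrogateescape" are accepted, as the Python 3.3 C API documentation states for the locale codec functions; locale_error_handler() is a hypothetical name used for illustration only.

```python
def locale_error_handler(errors=None):
    """Map an "errors" string to a surrogateescape flag (hypothetical helper)."""
    # A missing handler is treated as "strict".
    if errors is None or errors == "strict":
        return False
    if errors == "surrogateescape":
        return True
    # Assumption: any other handler is rejected.
    raise ValueError("unsupported error handler: %r" % (errors,))


use_escape = locale_error_handler("surrogateescape")
```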
msg149661 - Author: STINNER Victor (haypo) * (Python committer) Date: 2011-12-17 06:15
Ok, I think that the current code is good enough to close the issue. I opened a more global issue about the Python codec: #13619.
History
Date User Action Args
2011-12-17 06:15:00 haypo set status: open -> closed; resolution: fixed; messages: + msg149661
2011-12-17 04:46:26 python-dev set messages: + msg149658
2011-12-17 03:46:07 python-dev set nosy: + python-dev; messages: + msg149655
2011-12-17 02:55:55 haypo set messages: + msg149654
2011-12-09 19:26:30 haypo set files: + pyunicode_decodelocale-2.patch; messages: + msg149116
2011-12-09 08:47:32 skrah set nosy: + skrah
2011-12-08 23:03:05 haypo set files: + pyunicode_decodelocale.patch; keywords: + patch
2011-12-08 23:02:16 haypo create