Message 149060 - Python tracker

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author	vstinner
Recipients	ezio.melotti, loewis, vstinner
Date	2011-12-08.23:02:15
SpamBayes Score	4.8596127e-12
Marked as misclassified	No
Message-id	<1323385336.87.0.489685352091.issue13560@psf.upfronthosting.co.za>
In-reply-to

Content
To decode byte string from the locale encoding (LC_CTYPE), PyUnicode_DecodeFSDefault() can be used, but this function uses a constant encoding set at startup (the locale encoding at startup). The right method is currently to call _Py_char2wchar() and then PyUnicode_FromWideChar(). _Py_char2wchar() is a low level function, it doesn't raise nice Python exception, but just return NULL on error and write a message to stderr using fprintf() (!). Attached patch adds PyUnicode_DecodeLocale() and PyUnicode_DecodeLocaleAndSize() to offer a high level API to decode data from the current locale encoding. These functions fail with an OSError or MemoryError if decoding fails (instead of a generic ValueError), and don't write to stderr anymore. They are a surrogateescape argument to choose to escape undecodable bytes or to fail with an error. The patch only uses the function in _localemodule.c, but other functions may have to be fixed to use the new function. The tzname_encoding.patch of issue #5905 should maybe use it for example.

To decode byte string from the locale encoding (LC_CTYPE), PyUnicode_DecodeFSDefault() can be used, but this function uses a constant encoding set at startup (the locale encoding at startup). The right method is currently to call _Py_char2wchar() and then PyUnicode_FromWideChar(). _Py_char2wchar() is a low level function, it doesn't raise nice Python exception, but just return NULL on error and write a message to stderr using fprintf() (!).

Attached patch adds PyUnicode_DecodeLocale() and PyUnicode_DecodeLocaleAndSize() to offer a high level API to decode data from the *current* locale encoding. These functions fail with an OSError  or MemoryError if decoding fails (instead of a generic ValueError), and don't write to stderr anymore. They are a surrogateescape argument to choose to escape undecodable bytes or to fail with an error.

The patch only uses the function in _localemodule.c, but other functions may have to be fixed to use the new function. The tzname_encoding.patch of issue #5905 should maybe use it for example.

History
Date	User	Action	Args
2011-12-08 23:02:16	vstinner	set	recipients: + vstinner, loewis, ezio.melotti
2011-12-08 23:02:16	vstinner	set	messageid: <1323385336.87.0.489685352091.issue13560@psf.upfronthosting.co.za>
2011-12-08 23:02:16	vstinner	link	issue13560 messages
2011-12-08 23:02:15	vstinner	create