Author vstinner
Recipients vstinner
Date 2013-07-07.14:00:36
Message-id <1373205636.55.0.562779126658.issue18395@psf.upfronthosting.co.za>
Content
The Python C API has two very useful functions: _Py_char2wchar() and _Py_wchar2char(). They must be used to correctly handle undecodable byte sequences. Both functions use the surrogateescape error handler (PEP 383). _Py_char2wchar() also forces the ASCII encoding on FreeBSD and Solaris when the LC_CTYPE locale is C.
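
To illustrate what "ASCII encoding with surrogateescape" means when decoding, here is a minimal sketch in plain C. It is not the code from Python/fileutils.c: the helper name decode_ascii_surrogateescape() is hypothetical, it only covers the forced-ASCII case (the locale-based decoding path is left out), and it uses malloc()/free() where the real functions would use Python's allocator.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper, not the CPython implementation: decode a
   NUL-terminated byte string as ASCII with the surrogateescape error
   handler (PEP 383). Bytes 0x80..0xFF are not valid ASCII, so each one
   is mapped to the lone surrogate U+DC80..U+DCFF instead of failing.
   Returns a malloc()ed wide string (free it with free()), or NULL on
   memory allocation failure. If size is not NULL, *size receives the
   number of wide characters, excluding the terminating null. */
static wchar_t *
decode_ascii_surrogateescape(const char *arg, size_t *size)
{
    const unsigned char *in = (const unsigned char *)arg;
    size_t len, i;
    wchar_t *res;

    assert(arg != NULL);
    len = strlen(arg);
    /* ASCII is a single-byte encoding: one wide character per byte. */
    res = malloc((len + 1) * sizeof(wchar_t));
    if (res == NULL)
        return NULL;

    for (i = 0; i < len; i++) {
        if (in[i] < 0x80)
            res[i] = (wchar_t)in[i];             /* plain ASCII */
        else
            res[i] = (wchar_t)(0xDC00 + in[i]);  /* escaped byte */
    }
    res[len] = L'\0';
    if (size != NULL)
        *size = len;
    return res;
}
```

With this sketch, the byte 0xFF decodes to U+DCFF, so the original byte can be recovered later when the string is encoded back with the same error handler.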

Py_Main() expects an array of wide character strings (wchar_t*) for the command line arguments, whereas main() gets an array of byte strings (char*). _Py_char2wchar() must be used to be able to call Py_Main().

I propose making them public under the following names:

wchar_t* Py_DecodeLocale(const char* arg, size_t *size);
char* Py_EncodeLocale(const wchar_t *text, size_t *error_pos);
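
The error_pos parameter of the encoder may deserve an illustration. Below is a minimal sketch in plain C of the reverse operation, again restricted to ASCII with surrogateescape. The helper name encode_ascii_surrogateescape() is hypothetical, and the convention used here for error_pos (index of the first unencodable character on error, (size_t)-1 otherwise) is an assumption of this sketch.

```c
#include <assert.h>
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical helper, not the CPython implementation: encode a wide
   string to ASCII with the surrogateescape error handler (PEP 383).
   Lone surrogates U+DC80..U+DCFF are turned back into the bytes
   0x80..0xFF. Any other non-ASCII character is an encoding error:
   the function returns NULL and, if error_pos is not NULL, stores the
   index of that character in *error_pos. Otherwise *error_pos is set
   to (size_t)-1 (assumed convention). The result is malloc()ed. */
static char *
encode_ascii_surrogateescape(const wchar_t *text, size_t *error_pos)
{
    size_t len, i;
    unsigned char *res;

    assert(text != NULL);
    len = wcslen(text);
    if (error_pos != NULL)
        *error_pos = (size_t)-1;
    res = malloc(len + 1);
    if (res == NULL)
        return NULL;

    for (i = 0; i < len; i++) {
        unsigned long ch = (unsigned long)text[i];
        if (ch < 0x80) {
            res[i] = (unsigned char)ch;             /* plain ASCII */
        }
        else if (ch >= 0xDC80 && ch <= 0xDCFF) {
            res[i] = (unsigned char)(ch - 0xDC00);  /* escaped byte */
        }
        else {
            /* Character that ASCII cannot represent. */
            if (error_pos != NULL)
                *error_pos = i;
            free(res);
            return NULL;
        }
    }
    res[len] = '\0';
    return (char *)res;
}
```

A caller can tell an encoding error from a memory error by checking whether *error_pos is still (size_t)-1 after a NULL return.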

See Python/fileutils.c for more information about these functions.


Python 3.3 already has higher-level functions (calling _Py_char2wchar() and _Py_wchar2char()):

PyObject* PyUnicode_DecodeLocale(const char *str, const char *errors);
PyObject* PyUnicode_EncodeLocale(PyObject *unicode, const char *errors);

But these functions cannot be used before Python is initialized.
History
Date User Action Args
2013-07-07 14:00:36 vstinner set recipients: + vstinner
2013-07-07 14:00:36 vstinner set messageid: <1373205636.55.0.562779126658.issue18395@psf.upfronthosting.co.za>
2013-07-07 14:00:36 vstinner link issue18395 messages
2013-07-07 14:00:36 vstinner create