classification
Title: Make _Py_char2wchar() and _Py_wchar2char() public
Type:
Stage:
Components:
Versions: Python 3.4
process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To:
Nosy List: barry, haypo, josh.r, python-dev, takluyver
Priority: normal
Keywords:

Created on 2013-07-07 14:00 by haypo, last changed 2014-08-01 10:36 by haypo. This issue is now closed.

Messages (5)
msg192557 - Author: STINNER Victor (haypo) (Python committer) Date: 2013-07-07 14:00
The Python C API has two very useful functions: _Py_char2wchar() and _Py_wchar2char(). They must be used to correctly handle undecodable byte sequences. Both functions use the surrogateescape error handler (PEP 383). _Py_char2wchar() also forces the ASCII encoding on FreeBSD and Solaris when the LC_CTYPE locale is C.
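
To illustrate what the surrogateescape error handler does with undecodable bytes, here is a minimal Python-level sketch of the PEP 383 round trip. It is not the C implementation in Python/fileutils.c: it assumes UTF-8 stands in for the locale encoding, and the helper names decode_arg and encode_arg are hypothetical.

```python
# Sketch of the PEP 383 round trip at the Python level.
# Assumption: UTF-8 stands in for the locale encoding; decode_arg and
# encode_arg are hypothetical helper names, not part of any API.


def decode_arg(raw: bytes, encoding: str = "utf-8") -> str:
    """Decode bytes, mapping each undecodable byte 0xNN to U+DCNN."""
    return raw.decode(encoding, "surrogateescape")


def encode_arg(text: str, encoding: str = "utf-8") -> bytes:
    """Encode text, turning U+DC80..U+DCFF back into the original bytes."""
    return text.encode(encoding, "surrogateescape")


raw = b"caf\xe9"              # Latin-1 bytes, invalid as UTF-8
text = decode_arg(raw)        # the stray 0xE9 becomes the lone surrogate U+DCE9
roundtrip = encode_arg(text)  # the original bytes are recovered unchanged
```

Because the escaped bytes survive the round trip, a command line argument or file name that is not valid in the locale encoding can still be passed through Python and handed back to the operating system intact.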

Py_Main() expects an array of wide character strings (wchar_t*) for the command line arguments, whereas main() gets an array of byte strings (char*). _Py_char2wchar() must be used to be able to call Py_Main().

I propose the following names:

wchar_t* Py_DecodeLocale(const char* arg, size_t *size);
char* Py_EncodeLocale(const wchar_t *text, size_t *error_pos);

See Python/fileutils.c for more information about these functions.


Python 3.3 already has higher-level functions (calling _Py_char2wchar() and _Py_wchar2char()):

PyObject* PyUnicode_DecodeLocale(const char *str, const char *errors);
PyObject* PyUnicode_EncodeLocale(PyObject *unicode, const char *errors);

But these functions cannot be used before Python is initialized.
msg223393 - Author: Josh Rosenberg (josh.r) Date: 2014-07-18 00:12
How often do people need to do platform-independent locale encoding before Python is initialized? Encouraging use of platform-dependent wchar_t's seems like a bad idea when PyUnicode has abstracted away the difference ever since 3.3 was released.
msg223396 - Author: Thomas Kluyver (takluyver) Date: 2014-07-18 00:45
You seem to need wchar_t to call Py_Main and Py_SetProgramName.

I think there's an example in the docs which is wrong, because it appears to pass a char* to Py_SetProgramName:
https://docs.python.org/3.4/extending/embedding.html#very-high-level-embedding
msg223430 - Author: STINNER Victor (haypo) (Python committer) Date: 2014-07-18 20:35
> You seem to need wchar_t to call Py_Main and Py_SetProgramName.

Yes, exactly.
msg224483 - Author: Roundup Robot (python-dev) Date: 2014-08-01 10:34
New changeset 93a798c7f270 by Victor Stinner in branch 'default':
Issue #18395: Rename ``_Py_char2wchar()`` to :c:func:`Py_DecodeLocale`, rename
http://hg.python.org/cpython/rev/93a798c7f270

New changeset 94d0e842b9ea by Victor Stinner in branch 'default':
Issue #18395, #22108: Update embedded Python examples to decode correctly
http://hg.python.org/cpython/rev/94d0e842b9ea
History
Date                 User        Action  Args
2014-08-01 13:06:05  zach.ware   link    issue20466 superseder
2014-08-01 10:36:33  haypo       set     status: open -> closed; resolution: fixed
2014-08-01 10:34:47  python-dev  set     nosy: + python-dev; messages: + msg224483
2014-07-18 20:35:52  haypo       set     messages: + msg223430
2014-07-18 00:45:41  takluyver   set     messages: + msg223396
2014-07-18 00:12:09  josh.r      set     nosy: + josh.r; messages: + msg223393
2014-07-17 23:38:54  takluyver   set     nosy: + takluyver
2013-07-07 14:08:55  barry       set     nosy: + barry
2013-07-07 14:00:36  haypo       create