python c api wchar_t*/char* passing contradiction #66306

jj · 2014-07-30T17:10:42Z

BPO	22108
Nosy	@loewis, @vstinner, @ezio-melotti, @zware

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


GitHub fields:

assignee = None
closed_at = <Date 2014-08-01.13:05:07.999>
created_at = <Date 2014-07-30.17:10:42.463>
labels = ['build', 'expert-unicode']
title = 'python c api wchar_t*/char* passing contradiction'
updated_at = <Date 2014-08-01.13:07:10.027>
user = 'https://bugs.python.org/jj'

bugs.python.org fields:

activity = <Date 2014-08-01.13:07:10.027>
actor = 'zach.ware'
assignee = 'none'
closed = True
closed_date = <Date 2014-08-01.13:05:07.999>
closer = 'loewis'
components = ['Unicode']
creation = <Date 2014-07-30.17:10:42.463>
creator = 'jj'
dependencies = []
files = []
hgrepos = []
issue_num = 22108
keywords = []
message_count = 13.0
messages = ['224327', '224329', '224340', '224364', '224400', '224406', '224444', '224462', '224482', '224485', '224490', '224491', '224496']
nosy_count = 6.0
nosy_names = ['loewis', 'vstinner', 'ezio.melotti', 'python-dev', 'zach.ware', 'jj']
pr_nums = []
priority = 'normal'
resolution = 'fixed'
stage = 'resolved'
status = 'closed'
superseder = None
type = 'compile error'
url = 'https://bugs.python.org/issue22108'
versions = ['Python 3.4', 'Python 3.5']

jj · 2014-07-30T17:10:42Z

The documentation and the code example at
https://docs.python.org/3.5/extending/embedding.html#very-high-level-embedding

#include <Python.h>

int
main(int argc, char *argv[])
{
  Py_SetProgramName(argv[0]);  /* optional but recommended */
  Py_Initialize();
  PyRun_SimpleString("from time import time,ctime\n"
                     "print('Today is', ctime(time()))\n");
  Py_Finalize();
  return 0;
}

contradicts the actual implementation of the code:
http://hg.python.org/cpython/file/tip/Include/pythonrun.h#l25

which leads to compiler errors: Py_SetProgramName expects a wchar_t*, but argv[0] is a char*. To fix them, ugly char* to wchar_t* conversions are needed.

Also, I was hoping that Python 3.3 had finally switched from wchar_t to char and UTF-8;
at least that's how I understood PEP-393 http://python.org/dev/peps/pep-0393/

see also:

http://stackoverflow.com/questions/21591908/python-3-3-c-string-handling-wchar-t-vs-char

=> Are the docs wrong (which I hope they are not; the example is straightforward and simple with a char*),
or is CPython wrong?

loewis · 2014-07-30T17:47:53Z

You were misinterpreting PEP-393 - it is only about the representation of string objects, and doesn't affect any pre-existing API. Changing Py_SetProgramName is not possible without breaking existing code, so it could only happen in Python 4.

A proper solution might be adding Py_SetProgramNameUTF8, but it could trick people into believing that argv[0] actually is UTF-8 on their system, which it might not be. Providing Py_SetProgramNameASCII might be better, but it could fail if argv[0] contains non-ASCII characters. Yet another solution could be to expose _Py_char2wchar to the developer.

In any case: yes, the example is outdated, and only valid for Python 2.
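
The conversion discussed above, turning argv[0] from char* into the wchar_t* that Py_SetProgramName expects, can be sketched with the standard C library alone. This is only an illustration under stated assumptions: the helper name decode_program_name is hypothetical and not part of the Python C API, and it relies on mbstowcs, so the result depends on the current C locale (a real program would typically call setlocale(LC_ALL, "") first). CPython 3.5 and later provide Py_DecodeLocale for this job, which is the better choice in real embedding code.

```c
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper (NOT part of the Python C API): decode a
 * locale-encoded char string into a newly allocated wchar_t string.
 * Returns NULL if the input is NULL, if it cannot be decoded in the
 * current C locale, or if memory runs out.
 * The caller releases the result with free(). */
static wchar_t *
decode_program_name(const char *name)
{
    size_t capacity;
    wchar_t *wide;

    if (name == NULL)
        return NULL;

    /* A multibyte string never decodes to more wide characters than it
     * has bytes, so strlen(name) + 1 elements always leave room for the
     * terminating null. */
    capacity = strlen(name) + 1;
    wide = malloc(capacity * sizeof(wchar_t));
    if (wide == NULL)
        return NULL;

    /* mbstowcs() returns (size_t)-1 on an invalid multibyte sequence. */
    if (mbstowcs(wide, name, capacity) == (size_t)-1) {
        free(wide);
        return NULL;
    }
    return wide;
}
```

An embedding program would pass the result of decode_program_name(argv[0]) to Py_SetProgramName, treat a NULL return as a fatal error, and keep the buffer alive until after Py_Finalize().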

vstinner · 2014-07-30T18:53:38Z

This issue is why I created the issue bpo-18395.

jj · 2014-07-30T23:57:17Z

I'd say Python should definitely change its internal string type to char*. Exposing "handy" wchar_t->char conversion functions doesn't resolve the data representation problem.

loewis · 2014-07-31T12:43:52Z

Jonas, why do you say that?

zware · 2014-07-31T14:39:44Z

See also bpo-20466 (which has a patch for this, but I cannot speak for its effectiveness).

I'd be in favor of closing that issue and this one as duplicates of bpo-18395, and noting in bpo-18395 that the embedding example must be updated before that issue is closed.

jj · 2014-07-31T20:03:00Z

Martin, I think the most intuitive and easiest way of working with strings in C is plain char arrays.

Starting with the main() argv being char*, most programmers probably just go with char* and all the encoding just works.
This is because the encoding only comes into play in the user input software (xorg, keyboard input) and the user output software (your terminal emulator, the GUI, ...).
No matter what data your program receives, the encoding only matters for the output display software, which has to select the correct visual representation.
Requiring a conversion to wide chars just increases the interface complexity and adds unneeded data transformations that UTF-8 makes obsolete.

What I'd really like to see in CPython is that the internal storage (and the way it's exposed in the C-API) is just raw bytes (=> char*).

This allows super-easy integration in C projects that probably all just use char as their string type (see the doc example mentioned earlier).

PEP-393 states: "(..) the specification chooses UTF-8 as the recommended way of exposing strings to C code."

And for that, I think using char instead of wchar_t is a better solution for interface developers.

vstinner · 2014-08-01T00:46:57Z

> What I'd really like to see in CPython is that the internal storage (and the way it's exposed in the C-API) is just raw bytes (=> char*).

Python is portable, and we care about Windows. On Windows, wchar_t* is the native type for strings (e.g. command line, environment variables).

python-dev · 2014-08-01T10:34:47Z

New changeset 94d0e842b9ea by Victor Stinner in branch 'default':
Issue bpo-18395, bpo-22108: Update embedded Python examples to decode correctly
http://hg.python.org/cpython/rev/94d0e842b9ea

vstinner · 2014-08-01T10:37:21Z

I updated the embedding and extending examples but I didn't try them.

@jonas: Can you please try the updated examples?

jj · 2014-08-01T12:20:38Z

Indeed, that should do it, thanks.

I'd still plead for Python 4 (?) always using char* internally to make this conversion obsolete ;) (except on Windows)

vstinner · 2014-08-01T12:21:45Z

> I'd still plead for Python 4 (?) always using char* internally to make this conversion obsolete ;) (except on Windows)

I don't understand your proposition. We try to have duplicating functions for char* and wchar*.

loewis · 2014-08-01T13:05:08Z

Jonas: Python's string type is a Unicode character type, unlike C's (which is wishy-washy when it comes to characters outside of the "basic execution character set"). So just declaring that all APIs take UTF-8 will *not* allow for easy integration with other C code; instead, it will be a source of mojibake.

In any case, this issue appears to be resolved now; thanks for the patch.
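
The mojibake concern can be made concrete with a small sketch. A char* carries no encoding information, so bytes that arrive from argv, a file name, or a legacy library may not be UTF-8 at all. For instance, "café" in Latin-1 ends with the single byte 0xE9, which is not well-formed UTF-8. The function below is a hypothetical illustration, not CPython code: it is a simplified structural check of lead and continuation bytes and does not reject every overlong or surrogate encoding.

```c
#include <stddef.h>

/* Hypothetical helper: return 1 if the first len bytes of s have the
 * lead-byte / continuation-byte structure of UTF-8, else 0.
 * Simplified on purpose: it does not catch all overlong forms or
 * encoded surrogates. */
static int
looks_like_utf8(const unsigned char *s, size_t len)
{
    size_t i = 0;

    while (i < len) {
        unsigned char c = s[i];
        size_t extra;   /* number of continuation bytes expected */

        if (c < 0x80)
            extra = 0;                      /* ASCII */
        else if (c >= 0xC2 && c <= 0xDF)
            extra = 1;                      /* 2-byte sequence */
        else if (c >= 0xE0 && c <= 0xEF)
            extra = 2;                      /* 3-byte sequence */
        else if (c >= 0xF0 && c <= 0xF4)
            extra = 3;                      /* 4-byte sequence */
        else
            return 0;                       /* invalid lead byte */

        if (extra > len - i - 1)
            return 0;                       /* sequence is truncated */

        for (size_t k = 1; k <= extra; k++) {
            if ((s[i + k] & 0xC0) != 0x80)
                return 0;                   /* not a continuation byte */
        }
        i += extra + 1;
    }
    return 1;
}
```

Given the bytes 'c', 'a', 'f', 0xC3, 0xA9 (UTF-8) the check succeeds, while for 'c', 'a', 'f', 0xE9 (Latin-1) it fails, because 0xE9 announces a three-byte sequence that never arrives. An API that simply assumed UTF-8 would have to either reject such input or display it incorrectly.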

jj mannequin added the topic-unicode and build (The build process and cross-build) labels Jul 30, 2014

loewis mannequin closed this as completed Aug 1, 2014

ezio-melotti transferred this issue from another repository Apr 10, 2022
