python c api wchar_t*/char* passing contradiction #66306
The documentation and the code example at

```c
#include <Python.h>

int
main(int argc, char *argv[])
{
    Py_SetProgramName(argv[0]);  /* optional but recommended */
    Py_Initialize();
    PyRun_SimpleString("from time import time,ctime\n"
                       "print('Today is', ctime(time()))\n");
    Py_Finalize();
    return 0;
}
```

contradict the actual implementation of the code, which leads to compiler errors. To fix them, ugly char-to-wchar_t conversions are needed. I was also hoping that Python 3.3 had finally switched from wchar_t to char and UTF-8. See also: http://stackoverflow.com/questions/21591908/python-3-3-c-string-handling-wchar-t-vs-char

=> Are the docs wrong (which I hope they are not; the example is straightforward and dead simple with a char*),
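A minimal sketch of the "ugly" conversion mentioned above, from a char* argv[0] to the wchar_t* that Py_SetProgramName() expects in Python 3, using only standard C: `mbstowcs()` decodes the bytes according to the current locale. The helper name `to_wide` is hypothetical. Since Python 3.5 the C API also provides `Py_DecodeLocale()` (released with `PyMem_RawFree()`) for this purpose, which additionally maps undecodable bytes through the surrogateescape handler.

```c
#include <locale.h>
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical helper (not part of the Python C API): convert a
   locale-encoded char* such as argv[0] into a newly allocated wchar_t*.
   Returns NULL if the bytes are invalid in the current locale or if
   allocation fails. The caller must free() the result. */
wchar_t *to_wide(const char *s)
{
    /* First pass: ask mbstowcs() for the length, excluding L'\0'. */
    size_t n = mbstowcs(NULL, s, 0);
    if (n == (size_t)-1)
        return NULL;            /* invalid multibyte sequence */

    wchar_t *w = malloc((n + 1) * sizeof *w);
    if (w == NULL)
        return NULL;

    /* Second pass: the actual conversion, including the terminator. */
    if (mbstowcs(w, s, n + 1) == (size_t)-1) {
        free(w);
        return NULL;
    }
    return w;
}
```

A caller would run `setlocale(LC_ALL, "")` once at startup so that `mbstowcs()` follows the user's locale rather than the default "C" locale, pass the result to `Py_SetProgramName()`, and free it only after `Py_Finalize()`, because the name must stay valid while the interpreter runs.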
You were misinterpreting PEP-393: it is only about the representation of string objects and doesn't affect any pre-existing API. Changing Py_SetProgramName is not possible without breaking existing code, so it could only happen in Python 4. A proper solution might be adding Py_SetProgramNameUTF8, but it could trick people into believing that argv[0] actually is UTF-8 on their system, which it might not be. Providing Py_SetProgramNameASCII might be better, but it could fail if argv[0] contains non-ASCII characters. Yet another solution could be to expose _Py_char2wchar to developers. In any case, yes: the example is outdated and only valid for Python 2.
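The trade-off described for a Py_SetProgramNameASCII-style API can be illustrated with a minimal ASCII-only conversion. The function `ascii_to_wide` is hypothetical and not part of the Python C API: it never misinterprets bytes, but it has to refuse any argv[0] containing a non-ASCII byte.

```c
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical ASCII-only conversion (illustration only). Returns a newly
   allocated wide string, or NULL if any byte is outside 0x00-0x7F or if
   allocation fails. Assumes wchar_t holds Unicode code points, as on
   mainstream Windows and POSIX platforms. The caller must free() it. */
wchar_t *ascii_to_wide(const char *s)
{
    size_t len = 0;
    while (s[len] != '\0') {
        if ((unsigned char)s[len] > 0x7F)
            return NULL;        /* non-ASCII byte: refuse to guess */
        len++;
    }

    wchar_t *w = malloc((len + 1) * sizeof *w);
    if (w == NULL)
        return NULL;
    for (size_t i = 0; i <= len; i++)   /* also copies the terminator */
        w[i] = (wchar_t)(unsigned char)s[i];
    return w;
}
```

The failure is explicit rather than silent: a UTF-8 or Latin-1 encoded "café" yields NULL instead of a wrongly decoded name.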
This issue is why I created bpo-18395.
I'd say Python should definitely change its internal string type to char*. Exposing "handy" wchar_t->char conversion functions does not resolve the underlying data representation issue.
Jonas, why do you say that?
Martin, I think the most intuitive and easiest way to work with strings in C is plain char arrays. Starting with main()'s argv being char*, most programmers probably just use char* and the encoding simply works. What I'd really like to see in CPython is that the internal storage (and the way it is exposed in the C API) is just raw bytes (=> char*). This allows very easy integration into C projects, which probably all use char as their string type (see the documentation example mentioned earlier). PEP-393 states: "(..) the specification chooses UTF-8 as the recommended way of exposing strings to C code." For that, I think using char instead of wchar_t is a better solution for interface developers.
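For reference, exposing strings as UTF-8, as the PEP-393 quote recommends, means each Unicode code point becomes one to four bytes. A minimal encoder sketch follows; `utf8_encode` is a hypothetical name and this is not CPython's own implementation.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical encoder: writes the UTF-8 form of code point cp into out
   (at least 4 bytes) and returns the number of bytes written, or 0 if cp
   is a UTF-16 surrogate or lies above U+10FFFF. */
size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    if (cp < 0x80) {                       /* 1 byte: ASCII */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                      /* 2 bytes: 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return 0;                          /* surrogates are not encodable */
    if (cp < 0x10000) {                    /* 3 bytes */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                  /* 4 bytes */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;                              /* beyond the Unicode range */
}
```

For example, U+00E9 ("é") encodes to the two bytes C3 A9, while ASCII text passes through unchanged; that is what makes UTF-8 convenient for char*-based C code.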
Python is portable; we care about Windows. On Windows, wchar_t* is the native type for strings (e.g. the command line and environment variables).
New changeset 94d0e842b9ea by Victor Stinner in branch 'default':
I updated the embedding and extending examples but I didn't try them. @jonas: Can you please try the updated examples?
Indeed, that should do it, thanks. I still plead for Python 4(?) to always use char* internally, which would make this conversion obsolete ;) (except on Windows)
I don't understand your proposal. We try to provide duplicate functions for char* and wchar_t*.
Jonas: Python's string type is a Unicode character type, unlike C's (which is wishy-washy when it comes to characters outside of the "basic execution character set"). So just declaring that all APIs take UTF-8 will *not* allow for easy integration with other C code; instead, it will be a source of mojibake. In any case, this issue appears to be resolved now; thanks for the patch.
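The mojibake point can be made concrete: the text "café" is the bytes 63 61 66 E9 in Latin-1 but 63 61 66 C3 A9 in UTF-8, and a char* carries no record of which one it holds. A minimal UTF-8 well-formedness check (the name `is_valid_utf8` is hypothetical) shows that a Latin-1 argv[0] is not valid UTF-8, so an API that simply declared every char* to be UTF-8 would have to reject such input or decode it into garbage.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical checker: returns true if the NUL-terminated byte string is
   well-formed UTF-8 (no overlong forms, no surrogates, max U+10FFFF). */
bool is_valid_utf8(const char *str)
{
    const unsigned char *s = (const unsigned char *)str;
    while (*s) {
        unsigned char c = s[0];
        size_t n;                /* number of continuation bytes */
        unsigned long cp;        /* decoded code point */

        if (c < 0x80) { s++; continue; }           /* ASCII */
        else if (c >= 0xC2 && c <= 0xDF) { n = 1; cp = c & 0x1F; }
        else if (c >= 0xE0 && c <= 0xEF) { n = 2; cp = c & 0x0F; }
        else if (c >= 0xF0 && c <= 0xF4) { n = 3; cp = c & 0x07; }
        else return false;       /* 0x80-0xC1, 0xF5-0xFF cannot start a sequence */

        for (size_t i = 1; i <= n; i++) {
            /* Continuation bytes must be 10xxxxxx; this also stops at NUL. */
            if ((s[i] & 0xC0) != 0x80)
                return false;
            cp = (cp << 6) | (s[i] & 0x3F);
        }
        /* Reject overlong 3/4-byte forms, surrogates, and values > U+10FFFF. */
        if ((n == 2 && cp < 0x800) || (n == 3 && cp < 0x10000) ||
            (cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
            return false;
        s += n + 1;
    }
    return true;
}
```

Passing this check does not prove the bytes were meant as UTF-8; it only rules out some misinterpretations, which is why a declared-UTF-8 API cannot replace knowing the actual encoding.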