Improve support of PEP 383 (surrogates) in Python3: meta-issue #52489
If the full path to the python3 binary contains a non-ASCII character and the file system encoding is ASCII, Python fails with: The file system encoding is set to ASCII if there is no locale (e.g. LANG=C). The problem is that the command line arguments, especially argv[0], are stored in a wchar_t* string that uses surrogates to represent undecodable bytes. The attached patch fixes calculate_path() and the import functions to support surrogates. Details:
The patch is a work in progress: there are some FIXMEs (I don't know whether the string should be encoded/decoded using surrogates or not). I only tested the ASCII and UTF-8 file system encodings. I don't know if we can support more encodings. Python has only a few builtin encodings. Other encodings are implemented in Python: we have to import them, but we need the codec to import a module, so... I don't think that Windows is affected by this issue because it has a better API for unicode filenames and command line arguments, and most patched functions are surrounded by #ifndef WINDOWS ... #endif
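For readers unfamiliar with PEP 383, the surrogate mechanism described above can be sketched in pure Python with the `surrogateescape` error handler. This is only an illustration, not code from the patch; the path below is a hypothetical example.

```python
# Sketch of PEP 383: an undecodable byte becomes a lone surrogate on decode
# and is restored to the original byte on encode.
raw = b"/home/caf\xe9/python3"  # hypothetical path; 0xE9 is not valid ASCII

decoded = raw.decode("ascii", errors="surrogateescape")
# The byte 0xE9 is mapped to the lone surrogate U+DCE9.
assert decoded == "/home/caf\udce9/python3"

# Encoding with the same handler gives back the original bytes.
roundtrip = decoded.encode("ascii", errors="surrogateescape")
assert roundtrip == raw

# Without the handler, the lone surrogate cannot be encoded at all.
try:
    decoded.encode("utf-8")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True
```

Any code path that encodes such a string without `surrogateescape` fails in the same way as the strict encode at the end, which is why the fix has to touch every place where the path is converted.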
If I understood correctly, my patch is also required to import a module having a non-ASCII full path if the file system encoding is ASCII.
Oh, it doesn't work: get_codeset() returns NULL, because the codec register is empty when get_codeset() is called (with my patch).
New patch fixing more issues about undecodable filenames. Lib/test/test_subprocess.py | 4 - TODO:
I restored the code that sets the system encoding. The patch also fixes _posixsubprocess.fork_exec() to support an undecodable current working directory.
New version of the patch: all tests pass except 3 (test_ftplib, test_pep3120, test_traceback).
I committed the platform.py patch as r80166 (trunk) and r80167 (py3k), but quickly reverted it because the patch on trunk broke the Python bootstrap. The patch might be applied, but only on py3k and with more tests (to ensure that it doesn't break bootstrap on any OS) :-)
Updated patch:
$ diffstat ~/surrogates-7.patch
Doc/library/tarfile.rst | 15 +--
Include/moduleobject.h | 1
Lib/platform.py | 12 +-
Lib/subprocess.py | 2
Lib/tarfile.py | 14 --
Lib/test/regrtest.py | 5 -
Lib/test/test_import.py | 5 +
Lib/test/test_reprlib.py | 4
Lib/test/test_subprocess.py | 4
Lib/test/test_tarfile.py | 4
Lib/test/test_urllib.py | 8 +
Lib/test/test_urllib2.py | 4
Lib/test/test_xml_etree.py | 6 +
Lib/traceback.py | 10 +-
Lib/unittest/runner.py | 4
Modules/_ctypes/callproc.c | 12 +-
Modules/_ssl.c | 10 +-
Modules/_tkinter.c | 6 -
Modules/getpath.c | 100 ++++++++++++++++++--
Modules/main.c | 46 +++++
Modules/posixmodule.c | 18 ++-
I haven't reviewed the patch in detail yet, but it seems to me that it fixes independent issues. -1000 on that. One problem, one bug report in the tracker, one commit. If this issue is about the import machinery not working anymore if there is a non-ASCII character in the path, then why the heck does it touch posixmodule.c???? As for modules that have non-ASCII characters in their module name: this is, again, an unrelated issue (ISTM), so if you want to deal with it, please create a new issue.
Right. At first I only wanted to fix the import machinery, but then I fixed a lot of "independent" issues in order to test the patch on import. All fixes are related to surrogates. I'm splitting the big patch into small parts: see the dependency list of this issue. I will open a new issue for the import machinery. But this patch requires extra changes which are now discussed in new issues.
I opened issue bpo-8391 for this change: "os.execvpe() doesn't support surrogates in env".
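As a rough illustration of what supporting surrogates in an environment means, the sketch below encodes environment variables to bytes with `surrogateescape` so that undecodable bytes are passed on unchanged. The helper name `encode_env` and the sample variable are hypothetical; this is not the change made for bpo-8391.

```python
def encode_env(env, encoding="utf-8"):
    """Encode str keys and values to bytes, turning lone surrogates
    back into the raw bytes they stand for (hypothetical helper)."""
    encoded = {}
    for key, value in env.items():
        encoded_key = key.encode(encoding, "surrogateescape")
        encoded[encoded_key] = value.encode(encoding, "surrogateescape")
    return encoded


# "\udce9" stands for the undecodable byte 0xE9.
env = {"LABEL": "caf\udce9"}
env_bytes = encode_env(env, "ascii")
```

Here `env_bytes` is `{b"LABEL": b"caf\xe9"}`, whereas a strict `"caf\udce9".encode("ascii")` would raise `UnicodeEncodeError`.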
I opened a different issue to use surrogates in the Python module path: bpo-8611, but the issue is not specific to surrogates ("Python3 doesn't support locale different than utf8 and an non-ASCII path (POSIX)").
I created a new svn branch for my work on import in unicode. I will open a new issue, so I am closing this one.
Remove dependency on bpo-6697 to be able to close this issue.