classification
Title: On Windows, don't encode filenames in the import machinery
Type:
Stage: resolved
Components: Interpreter Core, Unicode, Windows
Versions: Python 3.3
process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To:
Nosy List: Drekin, Steven.Velez, amaury.forgeotdarc, eric.snow, haypo, pitrou, python-dev, terry.reedy
Priority: normal
Keywords: patch

Created on 2011-03-20 23:58 by haypo, last changed 2013-08-26 20:33 by python-dev. This issue is now closed.

Files
File name Uploaded Description
dynload_win.patch haypo, 2011-03-21 11:13
parser_unicode-3.patch haypo, 2013-08-26 14:11
Messages (14)
msg131574 - Author: STINNER Victor (haypo) (Python committer) Date: 2011-03-20 23:58
Since #3080, Python 3.3 handles module paths and names as Unicode in the import machinery. But in three remaining places, it still encodes filenames (to the ANSI code page):


a) _PyImport_LoadDynamicModule()

It should pass the PyObject* directly (instead of a char*) to _PyImport_GetDynLoadFunc(), but only on Windows (we may give the Windows variant a different function name). _PyImport_GetDynLoadFunc() in dynload_win.c has to be patched to use the Unicode API (e.g. LoadLibraryEx => LoadLibraryExW).
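To illustrate why encoding the path to the ANSI code page is a problem, here is a minimal Python sketch. It assumes "cp1252" as a stand-in for a Western European ANSI code page, and the path and helper names are hypothetical. It only shows the data loss on the string level; it does not call LoadLibraryEx or any other Windows API.

```python
# Hypothetical extension-module path containing characters outside cp1252.
PATH = "C:\\Users\\Łukasz\\模块.pyd"


def is_representable(text, codepage="cp1252"):
    """Return True if every character of `text` exists in `codepage`."""
    try:
        text.encode(codepage)
    except UnicodeEncodeError:
        return False
    return True


def ansi_round_trip(text, codepage="cp1252"):
    """Encode to the code page, replacing unencodable characters, then decode."""
    return text.encode(codepage, errors="replace").decode(codepage)


print(is_representable(PATH))   # False
print(ansi_round_trip(PATH))    # C:\Users\?ukasz\??.pyd
```

The round-tripped string no longer names the same file, which is why passing the Unicode object down to a wide-character API avoids the issue altogether.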


b) write_compiled_module()

The problem is implementing open_exclusive() for Windows using Unicode. open_exclusive() uses open() on Windows, but open() expects the filename as a byte string. We could use _Py_fopen() (_wfopen), but this function has no option to open the file in exclusive mode (O_EXCL flag). GNU has an extension, the "x" flag in the file mode, but Windows doesn't support it.

The file is passed to marshal functions like PyMarshal_WriteLongToFile(), so the file has to be a FILE*.
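The two-step idea discussed here (open a descriptor exclusively from a Unicode name, then wrap it in a stream for the marshal writer) can be sketched at the Python level. This is only an illustration of the technique, not the C change that was committed; open_exclusive() and write_marshalled() below are hypothetical helpers, and the file name is made up.

```python
import marshal
import os
import tempfile


def open_exclusive(path, mode=0o666):
    """Create `path` for binary writing; fail if the file already exists."""
    flags = os.O_CREAT | os.O_EXCL | os.O_WRONLY
    flags |= getattr(os, "O_BINARY", 0)  # O_BINARY only exists on Windows
    fd = os.open(path, flags, mode)      # raises FileExistsError if present
    try:
        return os.fdopen(fd, "wb")       # wrap the descriptor in a file object
    except BaseException:
        os.close(fd)
        raise


def write_marshalled(path, obj):
    """Write `obj` in marshal format to a freshly created file."""
    with open_exclusive(path) as fp:
        marshal.dump(obj, fp)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as directory:
        target = os.path.join(directory, "модуль.pyc")  # non-ASCII file name
        write_marshalled(target, {"answer": 42})
        try:
            open_exclusive(target)
        except FileExistsError:
            print("second exclusive open refused")
```

Because the path stays a str all the way to os.open(), no encoding to a byte string takes place in this sketch.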


c) parse_source_module()

=> covered by the issue #10785.
msg131631 - Author: STINNER Victor (haypo) (Python committer) Date: 2011-03-21 10:53
open_exclusive() was created by:

changeset:   14708:89b2aee43e0b
branch:      legacy-trunk
user:        Guido van Rossum <guido@python.org>
date:        Wed Sep 20 20:31:38 2000 +0000
files:       Python/import.c
description:
On Unix, use O_EXCL when creating the .pyc/.pyo files, to avoid a race condition
msg131633 - Author: STINNER Victor (haypo) (Python committer) Date: 2011-03-21 11:13
dynload_win.patch: Fix part (a), _PyImport_LoadDynamicModule().
msg132971 - Author: Roundup Robot (python-dev) Date: 2011-04-04 21:13
New changeset 1b7f484bab6e by Victor Stinner in branch 'default':
Issue #11619: _PyImport_LoadDynamicModule() doesn't encode the path to bytes
http://hg.python.org/cpython/rev/1b7f484bab6e
msg134117 - Author: Roundup Robot (python-dev) Date: 2011-04-20 01:28
New changeset e4e92d68ba3a by Victor Stinner in branch 'default':
Close #11619: write_compiled_module() doesn't encode the filename
http://hg.python.org/cpython/rev/e4e92d68ba3a
msg134118 - Author: STINNER Victor (haypo) (Python committer) Date: 2011-04-20 01:43
> c) parse_source_module()
> => covered by the issue #10785.

Issue #10785 didn't change parse_source_module(): it still encodes the filename.

We need Unicode versions of PyParser_ASTFromFile() and PyAST_Compile(): new variants of these functions that accept the filename as a Unicode string.

For PyParser_ASTFromFile(): #10785 prepared the work.

For PyAST_Compile(): struct compiler stores the filename as a byte string; the filename should be stored as Unicode.
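Seen from Python code, the goal is that a filename handed to the compiler comes back unchanged as a str, both on the code object and on a SyntaxError. A minimal sketch, assuming a hypothetical non-ASCII path and a hypothetical helper syntax_error_filename():

```python
# Hypothetical script path with non-ASCII characters.
FILENAME = "C:\\Users\\Bartoš\\скрипт.py"

# The compiler records the filename on the resulting code object.
code = compile("def f():\n    return 1\n", FILENAME, "exec")


def syntax_error_filename(source, filename):
    """Return the filename attached to a SyntaxError, or None if it compiles."""
    try:
        compile(source, filename, "exec")
    except SyntaxError as exc:
        return exc.filename
    return None


print(code.co_filename == FILENAME)                      # True
print(syntax_error_filename("x = (", FILENAME) == FILENAME)  # True
```

The file does not need to exist for this check, since compile() only stores the name it is given.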
msg134183 - Author: STINNER Victor (haypo) (Python committer) Date: 2011-04-20 21:16
compile_filename.patch:
 - Add PyErr_ProgramTextObject() and PyErr_WarnExplicitObject() functions
 - Store the filename as Unicode in compile.c
 - Remove the filename from the get_ref_type() fatal error (I have never seen this fatal error, and I hope it never happens, so the filename should not really matter here)

The patch prepares the ground for passing the filename to the compiler directly as Unicode.
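The Python-level counterpart of an explicit warning carrying a filename is warnings.warn_explicit(). The sketch below, with a hypothetical helper capture_explicit_warning() and a made-up filename, checks that a non-ASCII filename is preserved on the recorded warning; it says nothing about how the C functions in the patch are implemented.

```python
import warnings


def capture_explicit_warning(message, filename, lineno):
    """Issue a UserWarning for `filename`:`lineno` and return what was recorded."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # make sure the warning is not filtered
        warnings.warn_explicit(message, UserWarning, filename, lineno)
    return caught


recorded = capture_explicit_warning("deprecated syntax", "пакет/модуль.py", 3)
print(recorded[0].filename, recorded[0].lineno)
```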
msg134285 - Author: STINNER Victor (haypo) (Python committer) Date: 2011-04-22 23:44
Another huge patch to support Unicode filenames: parser_unicode.patch


 Doc/c-api/exceptions.rst |   26 +++++++++++---
 Include/ast.h            |    5 ++
 Include/compile.h        |   15 +++++++-
 Include/parsetok.h       |   42 ++++++++++++++++++-----
 Include/pyerrors.h       |    7 +++
 Include/pythonrun.h      |   16 ++++++++
 Include/symtable.h       |    6 ++-
 Include/warnings.h       |    8 ++++
 Modules/parsermodule.c   |   49 +++++++++++++++++----------
 Modules/symtablemodule.c |   10 +++--
 Parser/parsetok.c        |   82 +++++++++++++++++++++++++++++++++++++--------
 Python/_warnings.c       |   31 +++++++++++------
 Python/ast.c             |   40 ++++++++++++----------
 Python/compile.c         |   69 +++++++++++++++++++++-----------------
 Python/errors.c          |   57 ++++++++++++++++++++++---------
 Python/future.c          |   27 +++++++++++---
 Python/import.c          |   20 +++--------
 Python/pythonrun.c       |   85 +++++++++++++++++++++++++++++++++++++----------
 Python/symtable.c        |   73 +++++++++++++++++++++++++++-------------
 19 files changed, 480 insertions(+), 188 deletions(-)

It adds new variants of the following functions, which are undocumented:
 - PyAST_FromNode
 - PyFuture_FromAST
 - PyAST_Compile
 - PyParser_ParseFileFlagsEx
 - PyParser_ParseStringFlagsFilenameEx
 - PyErr_ProgramText
 - PyParser_ASTFromString
 - PyParser_ASTFromFile
 - PySymtable_Build

We might remove these functions, but they are part of the public API, even though they are undocumented.
msg178882 - Author: STINNER Victor (haypo) (Python committer) Date: 2013-01-03 01:37
The patch is really huge for such a rare use case, so I prefer to close the issue as "wont fix". Common cases with non-ASCII names are already handled correctly in Python 3.3.
msg193837 - Author: Adam Bartoš (Drekin) Date: 2013-07-28 18:52
Is there a chance this will be fixed at least in Python 4?
msg194616 - Author: Steven Velez (Steven.Velez) Date: 2013-08-07 14:29
This may be a small use case, but it is a use case nonetheless. In my situation, I am distributing a frozen Python package that runs under the user's home directory. If the user's name has international characters, this will fail.

I expect we will have similar problems with our application, which embeds Python and also runs from within the user directory...
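A rough way to check this scenario on a given interpreter is to import a module from a directory whose name contains non-ASCII characters. The sketch below uses a plain source module rather than a frozen or embedded application, and the directory name, module name, and helpers are all hypothetical.

```python
import importlib
import os
import sys
import tempfile


def import_from_directory(directory, module_name):
    """Import `module_name` with `directory` temporarily first on sys.path."""
    sys.path.insert(0, directory)
    try:
        importlib.invalidate_caches()  # the directory was created just now
        return importlib.import_module(module_name)
    finally:
        sys.path.remove(directory)


def demo():
    """Create a module under a non-ASCII 'home' directory and import it."""
    with tempfile.TemporaryDirectory() as root:
        home = os.path.join(root, "Bartoš")  # stands in for a user directory
        os.mkdir(home)
        source = os.path.join(home, "hello_unicode_path.py")
        with open(source, "w", encoding="utf-8") as fp:
            fp.write("GREETING = 'hi'\n")
        module = import_from_directory(home, "hello_unicode_path")
        return module.GREETING, module.__file__


if __name__ == "__main__":
    print(demo())
```

If the import machinery encoded the directory name to a code page that cannot represent it, the import in demo() would be expected to fail rather than return the greeting.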
msg196002 - Author: STINNER Victor (haypo) (Python committer) Date: 2013-08-23 18:57
I am reopening the issue because some users are now requesting this feature.

I updated parser_unicode.patch to the latest Python version. The new patch has just one minor nit: test_symtable crashes :-D

I will investigate the crash later.
msg196208 - Author: STINNER Victor (haypo) (Python committer) Date: 2013-08-26 14:11
> I updated parser_unicode.patch to the latest Python version. The new patch has just one minor nit: test_symtable crashes :-D

Fixed in the new patch: parser_unicode-3.patch
msg196245 - Author: Roundup Robot (python-dev) Date: 2013-08-26 20:33
New changeset df2fdd42b375 by Victor Stinner in branch 'default':
Close #11619: The parser and the import machinery do not encode Unicode
http://hg.python.org/cpython/rev/df2fdd42b375
History
Date User Action Args
2013-08-26 20:33:50 python-dev set status: open -> closed
    resolution: fixed
    messages: + msg196245
2013-08-26 14:41:09 haypo set files: - parser_unicode-2.patch
2013-08-26 14:41:08 haypo set files: - parser_unicode.patch
2013-08-26 14:11:18 haypo set files: + parser_unicode-3.patch
    messages: + msg196208
2013-08-23 18:57:36 haypo set status: closed -> open
    files: + parser_unicode-2.patch
    resolution: wont fix -> (no value)
    messages: + msg196002
2013-08-07 15:14:25 eric.snow set nosy: + eric.snow
2013-08-07 14:29:53 Steven.Velez set nosy: + Steven.Velez
    messages: + msg194616
2013-07-28 18:52:07 Drekin set nosy: + Drekin
    messages: + msg193837
2013-01-03 01:37:47 haypo set status: open -> closed
    resolution: wont fix
    messages: + msg178882
2011-04-22 23:44:58 haypo set files: - compile_filename.patch
2011-04-22 23:44:53 haypo set files: + parser_unicode.patch
    messages: + msg134285
2011-04-20 21:16:53 haypo set files: + compile_filename.patch
    messages: + msg134183
2011-04-20 01:43:39 haypo set status: closed -> open
    resolution: fixed -> (no value)
    messages: + msg134118
2011-04-20 01:28:32 python-dev set status: open -> closed
    resolution: fixed
    messages: + msg134117
    stage: resolved
2011-04-04 21:13:52 python-dev set nosy: + python-dev
    messages: + msg132971
2011-03-21 11:13:42 haypo set files: + dynload_win.patch
    messages: + msg131633
    keywords: + patch
    nosy: terry.reedy, amaury.forgeotdarc, pitrou, haypo
2011-03-21 10:53:56 haypo set nosy: terry.reedy, amaury.forgeotdarc, pitrou, haypo
    messages: + msg131631
2011-03-21 00:59:19 terry.reedy set nosy: + terry.reedy
2011-03-21 00:56:42 terry.reedy link issue10828 superseder
2011-03-20 23:58:51 haypo create