Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

On Windows, don't encode filenames in the import machinery #55828

Closed
vstinner opened this issue Mar 20, 2011 · 14 comments
Closed

On Windows, don't encode filenames in the import machinery #55828

vstinner opened this issue Mar 20, 2011 · 14 comments
Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs) OS-windows topic-unicode

Comments

@vstinner
Copy link
Member

BPO 11619
Nosy @terryjreedy, @amauryfa, @pitrou, @vstinner, @ericsnowcurrently
Files
  • dynload_win.patch
  • parser_unicode-3.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2013-08-26.20:33:50.833>
    created_at = <Date 2011-03-20.23:58:51.271>
    labels = ['interpreter-core', 'expert-unicode', 'OS-windows']
    title = "On Windows, don't encode filenames in the import machinery"
    updated_at = <Date 2013-08-26.20:33:50.832>
    user = 'https://github.com/vstinner'

    bugs.python.org fields:

    activity = <Date 2013-08-26.20:33:50.832>
    actor = 'python-dev'
    assignee = 'none'
    closed = True
    closed_date = <Date 2013-08-26.20:33:50.833>
    closer = 'python-dev'
    components = ['Interpreter Core', 'Unicode', 'Windows']
    creation = <Date 2011-03-20.23:58:51.271>
    creator = 'vstinner'
    dependencies = []
    files = ['21321', '31472']
    hgrepos = []
    issue_num = 11619
    keywords = ['patch']
    message_count = 14.0
    messages = ['131574', '131631', '131633', '132971', '134117', '134118', '134183', '134285', '178882', '193837', '194616', '196002', '196208', '196245']
    nosy_count = 8.0
    nosy_names = ['terry.reedy', 'amaury.forgeotdarc', 'pitrou', 'vstinner', 'python-dev', 'eric.snow', 'Drekin', 'Steven.Velez']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = None
    url = 'https://bugs.python.org/issue11619'
    versions = ['Python 3.3']

    @vstinner
    Copy link
    Member Author

    With bpo-3080, Python 3.3 does now manipulate module paths and names as Unicode in the import machinery. But in 3 remaining places, it does encode filenames (to the ANSI code page):

    a) _PyImport_LoadDynamicModule()

    It should pass directly the PyObject* (instead of a char*) to _PyImport_GetDynLoadFunc(), but only on Windows (we may change the function name for Windows). _PyImport_GetDynLoadFunc() of dynload_win.c has to be patched to use the Unicode API (eg. LoadLibraryEx => LoadLibraryExW).

    b) write_compiled_module()

    The problem is to implement open_exclusive() for Windows using Unicode. open_exclusive() uses open() on Windows, but open() expects the filename as a byte string. We may use _Py_fopen() (_wfopen), but this function doesn't have an option to open the file in exclusive mode (O_EXCL flag). GNU has an extension: "x" flag in the file mode, but Windows doesn't support it.

    The file is passed to marshal functions like PyMarshal_WriteLongToFile(), and so the file have to be a FILE*.

    c) parse_source_module()

    => covered by the issue bpo-10785.

    @vstinner vstinner added interpreter-core (Objects, Python, Grammar, and Parser dirs) topic-unicode OS-windows labels Mar 20, 2011
    @vstinner
    Copy link
    Member Author

    open_exclusive() was created by:

    changeset: 14708:89b2aee43e0b
    branch: legacy-trunk
    user: Guido van Rossum <guido@python.org>
    date: Wed Sep 20 20:31:38 2000 +0000
    files: Python/import.c
    description:
    On Unix, use O_EXCL when creating the .pyc/.pyo files, to avoid a race condition

    @vstinner
    Copy link
    Member Author

    dynload_win.patch: Fix part (a), _PyImport_LoadDynamicModule().

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Apr 4, 2011

    New changeset 1b7f484bab6e by Victor Stinner in branch 'default':
    Issue bpo-11619: _PyImport_LoadDynamicModule() doesn't encode the path to bytes
    http://hg.python.org/cpython/rev/1b7f484bab6e

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Apr 20, 2011

    New changeset e4e92d68ba3a by Victor Stinner in branch 'default':
    Close bpo-11619: write_compiled_module() doesn't encode the filename
    http://hg.python.org/cpython/rev/e4e92d68ba3a

    @python-dev python-dev mannequin closed this as completed Apr 20, 2011
    @vstinner
    Copy link
    Member Author

    c) parse_source_module()
    => covered by the issue bpo-10785.

    Issue bpo-10785 didn't change parse_source_module(): it does still encode the filename.

    We need Unicode version of PyParser_ASTFromFile() and PyAST_Compile(): a new version of these functions accepting a filename as a Unicode string.

    For PyParser_ASTFromFile(): bpo-10785 prepared the work.

    For PyAST_Compile(): struct compiler stores the filename as a byte string, the filename should be stored as Unicode.

    @vstinner vstinner reopened this Apr 20, 2011
    @vstinner
    Copy link
    Member Author

    compile_filename.patch:

    • Add PyErr_ProgramTextObject() and PyErr_WarnExplicitObject() functions
    • Store the filename as Unicode in compile.c
    • Remove the filename from get_ref_type() fatal error (I never see such fatal error, and I hope that it does never happen, so the filename should not really matter here)

    The patch prepares the work to pass the filename to the compiler directly as Unicode.

    @vstinner
    Copy link
    Member Author

    Another huge patch to support Unicode filenames: parser_unicode.patch

    Doc/c-api/exceptions.rst | 26 +++++++++++---
    Include/ast.h | 5 ++
    Include/compile.h | 15 +++++++-
    Include/parsetok.h | 42 ++++++++++++++++++-----
    Include/pyerrors.h | 7 +++
    Include/pythonrun.h | 16 ++++++++
    Include/symtable.h | 6 ++-
    Include/warnings.h | 8 ++++
    Modules/parsermodule.c | 49 +++++++++++++++++----------
    Modules/symtablemodule.c | 10 +++--
    Parser/parsetok.c | 82 +++++++++++++++++++++++++++++++++++++--------
    Python/_warnings.c | 31 +++++++++++------
    Python/ast.c | 40 ++++++++++++----------
    Python/compile.c | 69 +++++++++++++++++++++-----------------
    Python/errors.c | 57 ++++++++++++++++++++++---------
    Python/future.c | 27 +++++++++++---
    Python/import.c | 20 +++--------
    Python/pythonrun.c | 85 +++++++++++++++++++++++++++++++++++++----------
    Python/symtable.c | 73 +++++++++++++++++++++++++++-------------
    19 files changed, 480 insertions(+), 188 deletions(-)

    It creates new functions of the following functions which are undocumented:

    • PyAST_FromNode
    • PyFuture_FromAST
    • PyAST_Compile
    • PyParser_ParseFileFlagsEx
    • PyParser_ParseStringFlagsFilenameEx
    • PyErr_ProgramText
    • PyParser_ASTFromString
    • PyParser_ASTFromFile
    • PySymtable_Build

    We might remove these functions, but they are part of the public API (but they are undocumented).

    @vstinner
    Copy link
    Member Author

    vstinner commented Jan 3, 2013

    The patch is really huge for such a very rare use case, so I prefer to close the issue as wont fix. Common cases with non-ASCII names are already handled correctly in Python 3.3.

    @vstinner vstinner closed this as completed Jan 3, 2013
    @Drekin
    Copy link
    Mannequin

    Drekin mannequin commented Jul 28, 2013

    Is there a chance this will be fixed at least in Python 4?

    @StevenVelez
    Copy link
    Mannequin

    StevenVelez mannequin commented Aug 7, 2013

    This may be a small use case, but a use case none-the less. In my situation, I am distributing a frozen python package and it runs under the users home directory. If the user's name has international characters, this will fail.

    I expect we will have similar problems when dealing with our application which embeds python and is also running from within the user directory...

    @vstinner
    Copy link
    Member Author

    I reopen the issue because some users are now requesting this feature.

    I updated parser_unicode.patch to the last Python version. The new patch has just a minor nit: test_symtable does crash :-D

    I will investigate the crash later.

    @vstinner vstinner reopened this Aug 23, 2013
    @vstinner
    Copy link
    Member Author

    I updated parser_unicode.patch to the last Python version. The new patch has just a minor nit: test_symtable does crash :-D

    Fixed in new patch: parser_unicode-3.patch

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Aug 26, 2013

    New changeset df2fdd42b375 by Victor Stinner in branch 'default':
    Close bpo-11619: The parser and the import machinery do not encode Unicode
    http://hg.python.org/cpython/rev/df2fdd42b375

    @python-dev python-dev mannequin closed this as completed Aug 26, 2013
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    interpreter-core (Objects, Python, Grammar, and Parser dirs) OS-windows topic-unicode
    Projects
    None yet
    Development

    No branches or pull requests

    1 participant