Python 3.6 cannot reopen .pyc file with non-ASCII path #76562

Closed
tianjg mannequin opened this issue Dec 20, 2017 · 27 comments
Labels
3.8 (only security fixes); 3.9 (only security fixes); 3.10 (only security fixes); interpreter-core (Objects, Python, Grammar, and Parser dirs); OS-windows; topic-unicode; type-bug (An unexpected behavior, bug, or error)

Comments

@tianjg
Mannequin

tianjg mannequin commented Dec 20, 2017

BPO 32381
Nosy @pfmoore, @vstinner, @tjguk, @ezio-melotti, @zware, @eryksun, @zooba, @izbyshev, @ZackerySpytz
PRs
  • bpo-32381: .pyc files with non-ASCII paths cannot be reopened on Windows #14699
  • bpo-32381: Fix PyRun_SimpleFileExFlags() encoding #23642
  • [3.9] bpo-32381: Fix PyRun_SimpleFileExFlags() encoding (GH-23642) #23692
  • [3.8] bpo-32381: Fix PyRun_SimpleFileExFlags() encoding (GH-23642) (GH-23692) #23696
  • bpo-32381: Rewrite PyErr_ProgramText() #23700
  • bpo-32381: Add _PyRun_SimpleFileObject() #23709
  • bpo-32381: Remove unused _Py_fopen() function #23711
  • bpo-32381: Add _PyRun_AnyFileObject() #23723
  • bpo-32381: pymain_run_command() uses PyCF_IGNORE_COOKIE #23724
  • bpo-32381: pymain_run_file() uses PySys_FormatStderr() #23778
  • Files
  • 20171218111240.jpg
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = None
    closed_at = <Date 2020-12-08.17:12:35.886>
    created_at = <Date 2017-12-20.06:07:24.842>
    labels = ['interpreter-core', 'type-bug', '3.8', 'OS-windows', '3.10', 'expert-unicode', '3.9']
    title = 'Python 3.6 cannot reopen .pyc file with non-ASCII path'
    updated_at = <Date 2021-01-06.09:13:10.218>
    user = 'https://bugs.python.org/tianjg'

    bugs.python.org fields:

    activity = <Date 2021-01-06.09:13:10.218>
    actor = 'vstinner'
    assignee = 'none'
    closed = True
    closed_date = <Date 2020-12-08.17:12:35.886>
    closer = 'izbyshev'
    components = ['Interpreter Core', 'Unicode', 'Windows']
    creation = <Date 2017-12-20.06:07:24.842>
    creator = 'tianjg'
    dependencies = []
    files = ['47341']
    hgrepos = []
    issue_num = 32381
    keywords = ['patch', '3.6regression']
    message_count = 27.0
    messages = ['308705', '308707', '308709', '308723', '308831', '308832', '347896', '382502', '382503', '382504', '382509', '382510', '382512', '382514', '382734', '382743', '382750', '382751', '382752', '382770', '382776', '382804', '382806', '383063', '383067', '383653', '384477']
    nosy_count = 11.0
    nosy_names = ['paul.moore', 'vstinner', 'tim.golden', 'ezio.melotti', 'zach.ware', 'eryksun', 'steve.dower', 'izbyshev', 'ZackerySpytz', 'Tianjg', 'tianjg']
    pr_nums = ['14699', '23642', '23692', '23696', '23700', '23709', '23711', '23723', '23724', '23778']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue32381'
    versions = ['Python 3.8', 'Python 3.9', 'Python 3.10']

    @tianjg
    Mannequin Author

    tianjg mannequin commented Dec 20, 2017

    I have a problem: Python 3.6 cannot reopen a .pyc file with a Chinese path, while Python 3.5 can reopen the same .pyc file, as shown in the picture.

    @tianjg tianjg mannequin added build The build process and cross-build OS-windows labels Dec 20, 2017
    @eryksun
    Contributor

    eryksun commented Dec 20, 2017

    run_file encodes the file path via PyUnicode_EncodeFSDefault, which encodes as UTF-8 in Windows, starting with 3.6. PyRun_SimpleFileExFlags subsequently tries to open this encoded path via _Py_fopen, which calls fopen. The CRT expects an ANSI encoded path, so only the common ASCII subset will work. Non-ASCII paths will fail.

    This could be addressed in _Py_fopen by decoding the path and calling _wfopen instead of fopen.

    Executing a .pyc also fails in 3.5 if the wide-character path can't be encoded as ANSI, but the 3.5 branch only accepts security fixes.
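
The mismatch described above can be illustrated without Windows. This is only a sketch: the helper name `misread_as_ansi` is hypothetical, and cp936 is an assumed ANSI code page (a guess based on the reporter's Chinese path; the real one depends on the system locale).

```python
def misread_as_ansi(path, ansi_codepage="cp936"):
    """Return the path as seen by a consumer that reads UTF-8 bytes as ANSI.

    ansi_codepage is an assumption for illustration; the real value is
    whatever ANSI code page the Windows system is configured with.
    """
    utf8_bytes = path.encode("utf-8")  # what the 3.6+ filesystem encoding produces
    # An ANSI-based fopen() would interpret those bytes in the ANSI code page.
    return utf8_bytes.decode(ansi_codepage, errors="replace")


ascii_path = "hello.pyc"
chinese_path = "北京市.pyc"

# ASCII bytes mean the same thing in both encodings, so the path survives.
print(misread_as_ansi(ascii_path) == ascii_path)      # True
# Non-ASCII bytes are reinterpreted, so a different file name is looked up.
print(misread_as_ansi(chinese_path) == chinese_path)  # False
```

This matches the observation that only the common ASCII subset works.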

    @eryksun eryksun added 3.7 (EOL) end of life interpreter-core (Objects, Python, Grammar, and Parser dirs) topic-unicode labels Dec 20, 2017
    @eryksun eryksun changed the title python3.6 can not reopen .pyc file with Chinese path Python 3.6 cannot reopen .pyc file with non-ASCII path Dec 20, 2017
    @eryksun eryksun added type-bug An unexpected behavior, bug, or error and removed build The build process and cross-build labels Dec 20, 2017
    @tianjg
    Mannequin Author

    tianjg mannequin commented Dec 20, 2017

    Thanks a lot. What should I do to reopen a .pyc file with a non-ASCII path using
    Python 3.6 in cmd? Could you give me some code examples? Thank you again,
    and I look forward to hearing from you.

    2017-12-20 15:35 GMT+08:00 Eryk Sun <report@bugs.python.org>:

    Eryk Sun <eryksun@gmail.com> added the comment:

    run_file encodes the file path via PyUnicode_EncodeFSDefault, which
    encodes as UTF-8 in Windows, starting with 3.6. PyRun_SimpleFileExFlags
    subsequently tries to open this encoded path via _Py_fopen, which calls
    fopen. The CRT expects an ANSI encoded path, so only the common ASCII
    subset will work. Non-ASCII paths will fail.

    This could be addressed in _Py_fopen by decoding the path and calling
    _wfopen instead of fopen.

    Executing a .pyc also fails in 3.5 if the wide-character path can't be
    encoded as ANSI, but the 3.5 branch only accepts security fixes.

    ----------
    components: +Interpreter Core, Unicode
    nosy: +eryksun, ezio.melotti, vstinner
    stage: -> test needed
    title: python3.6 can not reopen .pyc file with Chinese path -> Python 3.6
    cannot reopen .pyc file with non-ASCII path
    type: compile error -> behavior
    versions: +Python 3.7


    Python tracker <report@bugs.python.org>
    <https://bugs.python.org/issue32381>


    @eryksun
    Contributor

    eryksun commented Dec 20, 2017

    Workarounds: (1) force 3.6 to use the legacy ANSI filesystem encoding by setting the environment variable PYTHONLEGACYWINDOWSFSENCODING. (2) Use 8.3 DOS names, if creating them is enabled on the volume. You can check their value in CMD via dir /x. (3) Create alternative directory symbolic links or junctions with ASCII names via CMD's mklink command.
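
Workaround (1) might be scripted roughly as follows. The helper names `legacy_fs_env` and `run_pyc_with_legacy_encoding` are hypothetical, and the value "1" is an arbitrary choice (the variable only needs to be a non-empty string). The variable has an effect only on Windows, and per the earlier comment this can only help when the path is encodable in the ANSI code page.

```python
import os
import subprocess
import sys


def legacy_fs_env(base_env=None):
    """Return a copy of the environment with the legacy FS encoding enabled."""
    env = dict(os.environ if base_env is None else base_env)
    # Any non-empty value enables the legacy behaviour; "1" is just a convention.
    env["PYTHONLEGACYWINDOWSFSENCODING"] = "1"
    return env


def run_pyc_with_legacy_encoding(pyc_path):
    """Run a .pyc file in a child interpreter that uses the legacy encoding."""
    return subprocess.run([sys.executable, pyc_path], env=legacy_fs_env())
```

In CMD, the equivalent would be to set the variable before invoking python on the .pyc file.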

    @vstinner
    Member

    run_file() gets a wchar_t* string which comes from wmain() argv.

    run_file() encodes the wchar_t* using PyUnicode_EncodeFSDefault().

    Later, PyRun_SimpleFileExFlags() calls indirectly fopen() with the encoded filename.

    This could be addressed in _Py_fopen by decoding the path and calling _wfopen instead of fopen.

    I agree that it's the correct fix.

    I would make _Py_fopen() more compatible with the PEP-529.
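
The idea behind the proposed fix (decode the encoded path back to text, then open it through a text-based API) can be mimicked at the Python level. This is a sketch of the principle, not the C patch: `os.fsencode()` and `os.fsdecode()` are used here as stand-ins for the filesystem-encoding step, and `open_via_decoded_path` is a hypothetical helper name.

```python
import os
import tempfile


def open_via_decoded_path(encoded_path):
    """Decode a filesystem-encoded bytes path, then open it by its text name."""
    text_path = os.fsdecode(encoded_path)  # undo the filesystem encoding
    return open(text_path, "rb")           # open by text name, not raw bytes


with tempfile.TemporaryDirectory() as tmp:
    original = os.path.join(tmp, "北京市.bin")
    with open(original, "wb") as f:
        f.write(b"data")

    encoded = os.fsencode(original)  # bytes, like the path run_file() hands over
    with open_via_decoded_path(encoded) as f:
        payload = f.read()

print(payload)
```

Because encoding and decoding use the same codec, the original name is recovered no matter what the ANSI code page is.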

    @vstinner
    Member

    I would make _Py_fopen() more compatible with the PEP-529.

    Typo: It* would

    @ZackerySpytz ZackerySpytz mannequin added 3.8 only security fixes 3.9 only security fixes labels Jul 11, 2019
    @vstinner
    Member

    Hum. In fact, this problem can be fixed differently: modify PyRun_xxx() functions to pass the filename as a Unicode string. Maybe pass it as a wchar_t* string or even a Python str object.

    @eryksun eryksun added the 3.10 only security fixes label Dec 4, 2020
    @izbyshev
    Mannequin

    izbyshev mannequin commented Dec 4, 2020

    Thanks, Eryk, for catching the dup, I missed it somehow.

    @ZackerySpytz: do you plan to proceed with your PR? If not, I can pick it up -- this issue broke the software I develop after upgrade to 3.8.

    I filed bpo-42569 to hopefully clarify the status of _Py_fopen() which became murky to me.

    @vstinner
    Member

    vstinner commented Dec 4, 2020

    I can reproduce the issue on Python 3.10 with a script called 北京市.py which contains: print("hello").

    c:\> python 北京市.py
    hello

    c:\>python __pycache__\北京市.cpython-310.pyc
    python: Can't reopen .pyc file

    And with my PR 23642 fix, it works as expected:

    C:\>python __pycache__\北京市.cpython-310.pyc
    hello
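
The reproduction above can be automated with a small script. This is a sketch under assumptions: the helper name `run_non_ascii_pyc` is hypothetical, and the .pyc is written next to the source in a temporary directory rather than under `__pycache__`. On an affected Windows build the child process would be expected to fail with the "Can't reopen .pyc file" message; on a fixed build it prints hello.

```python
import pathlib
import py_compile
import subprocess
import sys
import tempfile


def run_non_ascii_pyc(stem="北京市"):
    """Compile a one-line script with a non-ASCII name and run its .pyc."""
    with tempfile.TemporaryDirectory() as tmp:
        source = pathlib.Path(tmp) / f"{stem}.py"
        source.write_text('print("hello")\n', encoding="utf-8")
        # Write the bytecode to an explicit path so it is easy to locate.
        pyc = py_compile.compile(
            str(source), cfile=str(source.with_suffix(".pyc")), doraise=True
        )
        # Capture raw bytes to avoid depending on the console encoding.
        return subprocess.run([sys.executable, pyc], capture_output=True)


result = run_non_ascii_pyc()
print(result.returncode, result.stdout, result.stderr)
```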

    @vstinner
    Member

    vstinner commented Dec 4, 2020

    bpo-42568 is marked as a duplicate of this issue.

    @izbyshev
    Mannequin

    izbyshev mannequin commented Dec 4, 2020

    Thanks for the patch, Victor, it looks good.

    Just so it doesn't get lost: the problem with the contract of PyErr_ProgramText() which I mentioned in my dup 42568 is still there.

    @vstinner
    Member

    vstinner commented Dec 4, 2020

    Just so it doesn't get lost: the problem with the contract of PyErr_ProgramText() which I mentioned in my dup 42568 is still there.

    It seems like PyErr_ProgramText() is no longer used in Python. PyErr_ProgramTextObject() is used instead, and it passes the filename as a Python object to _Py_fopen_obj().

    @izbyshev
    Mannequin

    izbyshev mannequin commented Dec 4, 2020

    It seems like PyErr_ProgramText() is no longer used in Python.

    Isn't it a part of the public API? I can't find it in the docs, but it seems to be declared in the public header.

    @vstinner
    Member

    vstinner commented Dec 4, 2020

    Isn't it a part of the public API? I can't find it in the docs, but it seems to be declared in the public header.

    The Python C API has a strange history...

    @vstinner
    Member

    vstinner commented Dec 8, 2020

    New changeset b6d98c1 by Victor Stinner in branch 'master':
    bpo-32381: Fix PyRun_SimpleFileExFlags() encoding (GH-23642)
    b6d98c1

    @vstinner
    Member

    vstinner commented Dec 8, 2020

    New changeset f0e42ae by Victor Stinner in branch '3.9':
    bpo-32381: Fix PyRun_SimpleFileExFlags() encoding (GH-23642) (GH-23692)
    f0e42ae

    @vstinner
    Member

    vstinner commented Dec 8, 2020

    New changeset b5cf308 by Victor Stinner in branch '3.8':
    bpo-32381: Fix PyRun_SimpleFileExFlags() encoding (GH-23642) (GH-23692) (GH-23696)
    b5cf308

    @vstinner
    Member

    vstinner commented Dec 8, 2020

    It's now fixed in 3.8, 3.9 and master branches.

    Thanks for the bug report tianjg.

    @vstinner vstinner removed the 3.7 (EOL) end of life label Dec 8, 2020
    @vstinner vstinner closed this as completed Dec 8, 2020
    @izbyshev
    Mannequin

    izbyshev mannequin commented Dec 8, 2020

    Thanks for the fix and backports!

    @izbyshev izbyshev mannequin added the 3.7 (EOL) end of life label Dec 8, 2020
    @izbyshev izbyshev mannequin reopened this Dec 8, 2020
    @izbyshev izbyshev mannequin removed the 3.7 (EOL) end of life label Dec 8, 2020
    @izbyshev izbyshev mannequin closed this as completed Dec 8, 2020
    @vstinner
    Member

    vstinner commented Dec 8, 2020

    New changeset 815506d by Victor Stinner in branch 'master':
    bpo-32381: Rewrite PyErr_ProgramText() (GH-23700)
    815506d

    @vstinner
    Member

    vstinner commented Dec 8, 2020

    New changeset 550e467 by Victor Stinner in branch 'master':
    bpo-32381: Add _PyRun_SimpleFileObject() (GH-23709)
    550e467

    @vstinner
    Member

    vstinner commented Dec 9, 2020

    New changeset ca06440 by Victor Stinner in branch 'master':
    bpo-32381: Remove unused _Py_fopen() function (GH-23711)
    ca06440

    @vstinner
    Member

    vstinner commented Dec 9, 2020

    New changeset a82f63f by Victor Stinner in branch 'master':
    bpo-32381: Add _PyRun_AnyFileObject() (GH-23723)
    a82f63f

    @vstinner
    Member

    My PR 23778 fixes the encoding/error handler used when writing the filename to stderr when the file does not exist:

    $ LANG= PYTHONCOERCECLOCALE=0 ./python -X utf8=0 héllo.py
    ./python: can't open file '/home/vstinner/python/master/h\udcc3\udca9llo.py': [Errno 2] No such file or directory
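
The `\udcc3\udca9` sequence in that message is what the UTF-8 bytes of "é" look like after being decoded with an ASCII filesystem encoding and the surrogateescape error handler, then written out with backslashreplace. A minimal sketch of that round trip in pure Python follows; it is assumed here to mirror what the interpreter does in this configuration, and the variable names are illustrative only.

```python
# UTF-8 bytes of the file name as they arrive from the command line.
raw_name = "héllo.py".encode("utf-8")  # b'h\xc3\xa9llo.py'

# With an ASCII filesystem encoding, undecodable bytes become lone surrogates.
decoded = raw_name.decode("ascii", errors="surrogateescape")

# Writing to a stream that uses backslashreplace escapes the surrogates.
shown = decoded.encode("ascii", errors="backslashreplace").decode("ascii")
print(shown)  # h\udcc3\udca9llo.py

# The original bytes are still recoverable, so the file lookup is unaffected.
restored = decoded.encode("ascii", errors="surrogateescape")
print(restored == raw_name)  # True
```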

    @vstinner
    Member

    New changeset ceb4202 by Victor Stinner in branch 'master':
    bpo-32381: pymain_run_file() uses PySys_FormatStderr() (GH-23778)
    ceb4202

    @vstinner
    Member

    New changeset a124916 by Victor Stinner in branch 'master':
    bpo-32381: pymain_run_command() uses PyCF_IGNORE_COOKIE (GH-23724)
    a124916

    @vstinner
    Member

    vstinner commented Jan 6, 2021

    boost-python was using the removed private _Py_fopen() function, so I proposed boostorg/python#344 to replace _Py_fopen() with fopen().

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022