classification
Title: Python 3.6 cannot reopen .pyc file with non-ASCII path
Type: behavior Stage: resolved
Components: Interpreter Core, Unicode, Windows Versions: Python 3.10, Python 3.9, Python 3.8
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: Tianjg, ZackerySpytz, eryksun, ezio.melotti, izbyshev, paul.moore, steve.dower, tianjg, tim.golden, vstinner, zach.ware
Priority: normal Keywords: 3.6regression, patch

Created on 2017-12-20 06:07 by tianjg, last changed 2021-01-06 09:13 by vstinner. This issue is now closed.

Files
File name Uploaded Description
20171218111240.jpg tianjg, 2017-12-20 06:07
Pull Requests
URL Status Linked
PR 14699 closed ZackerySpytz, 2019-07-11 04:52
PR 23642 merged vstinner, 2020-12-04 16:26
PR 23692 merged vstinner, 2020-12-08 13:46
PR 23696 merged vstinner, 2020-12-08 15:19
PR 23700 merged vstinner, 2020-12-08 16:57
PR 23709 merged vstinner, 2020-12-08 23:00
PR 23711 merged vstinner, 2020-12-08 23:36
PR 23723 merged vstinner, 2020-12-09 20:24
PR 23724 merged vstinner, 2020-12-09 21:40
PR 23778 merged vstinner, 2020-12-15 14:35
Messages (27)
msg308705 - (view) Author: (tianjg) Date: 2017-12-20 06:07
I have a problem: Python 3.6 cannot reopen a .pyc file with a Chinese path, while Python 3.5 can reopen the same .pyc file, as shown in the attached picture.
msg308707 - (view) Author: Eryk Sun (eryksun) * (Python triager) Date: 2017-12-20 07:35
run_file encodes the file path via PyUnicode_EncodeFSDefault, which encodes as UTF-8 in Windows, starting with 3.6. PyRun_SimpleFileExFlags subsequently tries to open this encoded path via _Py_fopen, which calls fopen. The CRT expects an ANSI encoded path, so only the common ASCII subset will work. Non-ASCII paths will fail.

This could be addressed in _Py_fopen by decoding the path and calling _wfopen instead of fopen. 

Executing a .pyc also fails in 3.5 if the wide-character path can't be encoded as ANSI, but the 3.5 branch only accepts security fixes.
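
The encoding mismatch described above can be illustrated outside the interpreter. The following is a minimal sketch, not CPython's actual code path: it encodes a path as UTF-8 (what the message says PyUnicode_EncodeFSDefault produces on Windows since 3.6) and then decodes those bytes with an ANSI code page, roughly as an ANSI API such as fopen would interpret them. The helper name and the choice of cp1252 are assumptions made for illustration; the real ANSI code page depends on the Windows locale.

```python
def as_seen_by_ansi_api(path, ansi_codepage="cp1252"):
    """Return the path an ANSI API would see if handed UTF-8 bytes.

    Hypothetical helper: ansi_codepage stands in for the system's
    ANSI code page, which varies by Windows locale.
    """
    utf8_bytes = path.encode("utf-8")
    # errors="replace" keeps the sketch from raising on byte sequences
    # that are undefined in the chosen code page.
    return utf8_bytes.decode(ansi_codepage, errors="replace")


for name in ("hello.pyc", "北京市.cpython-36.pyc"):
    seen = as_seen_by_ansi_api(name)
    status = "unchanged" if seen == name else "mangled"
    # !a prints an ASCII-only repr, so this works on any console encoding.
    print(f"{name!a} -> {seen!a} ({status})")
```

The ASCII-only name survives the round trip, while the non-ASCII name turns into a different string, which is why the second open of the .pyc cannot find the file.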
msg308709 - (view) Author: (Tianjg) Date: 2017-12-20 09:53
Thanks a lot. What should I do to reopen a .pyc file with a non-ASCII path using Python 3.6 in cmd? Could you give me some code examples? Thank you again; I look forward to hearing from you.

msg308723 - (view) Author: Eryk Sun (eryksun) * (Python triager) Date: 2017-12-20 11:31
Workarounds: (1) force 3.6 to use the legacy ANSI filesystem encoding by setting the environment variable PYTHONLEGACYWINDOWSFSENCODING. (2) Use 8.3 DOS names, if creating them is enabled on the volume. You can check their value in CMD via `dir /x`. (3) Create alternative directory symbolic links or junctions with ASCII names via CMD's `mklink` command.
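
Workaround (1) can also be applied from a launcher script rather than by setting the variable globally. This is a minimal sketch under stated assumptions: the helper names are hypothetical, and it only helps when the path can be represented in the system's ANSI code page. The environment variable itself is documented as being enabled by any non-empty value and is only honored on Windows.

```python
import os
import subprocess
import sys


def legacy_fs_env(base_env=None):
    """Copy an environment and enable the legacy ANSI filesystem encoding."""
    env = dict(os.environ if base_env is None else base_env)
    # Any non-empty value enables the legacy behavior (Windows, 3.6+).
    env["PYTHONLEGACYWINDOWSFSENCODING"] = "1"
    return env


def run_with_legacy_encoding(pyc_path, python=sys.executable):
    """Run pyc_path in a child interpreter started with the legacy encoding."""
    return subprocess.run([python, pyc_path], env=legacy_fs_env())
```

For example, `run_with_legacy_encoding(r"__pycache__\北京市.cpython-36.pyc")` would start a child interpreter with the variable set, leaving the parent process and other programs unaffected.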
msg308831 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-12-20 22:35
run_file() gets a wchar_t* string which comes from wmain() argv.

run_file() encodes the wchar_t* using PyUnicode_EncodeFSDefault().

Later, PyRun_SimpleFileExFlags() calls indirectly fopen() with the encoded filename.

> This could be addressed in _Py_fopen by decoding the path and calling _wfopen instead of fopen. 

I agree that it's the correct fix.

I would make _Py_fopen() more compatible with the PEP 529.
msg308832 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-12-20 22:35
> I would make _Py_fopen() more compatible with the PEP 529.

Typo: It* would
msg347896 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-07-14 09:06
Hum. In fact, this problem can be fixed differently: modify the PyRun_xxx() functions to pass the filename as a Unicode string. Maybe pass it as a wchar_t* string or even a Python str object.
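
The idea of keeping the filename as a Python str also suggests a workaround for affected versions that is not mentioned in this thread: load the .pyc from Python code, so the path never goes through the byte-encoded fopen call. The following is a minimal sketch, assuming the bytecode was produced by the same Python version that loads it; `run_pyc` is a hypothetical helper, while `SourcelessFileLoader`, `spec_from_loader`, and `module_from_spec` are existing importlib APIs.

```python
import importlib.machinery
import importlib.util


def run_pyc(path, name="__main__"):
    """Execute a sourceless .pyc file given its path as a str.

    Hypothetical helper. The loader validates the .pyc header and
    raises ImportError if the magic number does not match.
    """
    loader = importlib.machinery.SourcelessFileLoader(name, path)
    spec = importlib.util.spec_from_loader(name, loader)
    module = importlib.util.module_from_spec(spec)
    # Runs the module's code object in the new module's namespace.
    loader.exec_module(module)
    return module
```

With the default `name="__main__"`, code guarded by `if __name__ == "__main__":` in the compiled script runs as it would from the command line.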
msg382502 - (view) Author: Alexey Izbyshev (izbyshev) * (Python triager) Date: 2020-12-04 16:22
Thanks, Eryk, for catching the dup, I missed it somehow.

@ZackerySpytz: do you plan to proceed with your PR? If not, I can pick it up -- this issue broke the software I develop after upgrading to 3.8.

I filed issue 42569 to hopefully clarify the status of _Py_fopen() which became murky to me.
msg382503 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-12-04 16:39
I can reproduce the issue on Python 3.10 with a script called 北京市.py which contains: print("hello").

c:\> python 北京市.py
hello

c:\>python __pycache__\北京市.cpython-310.pyc
python: Can't reopen .pyc file

And with my PR 23642 fix, it works as expected:

C:\>python __pycache__\北京市.cpython-310.pyc
hello
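
The manual steps above can be scripted to check a given interpreter. This is a rough sketch, not part of the fix: the helper name is hypothetical, and it writes the .pyc next to the source instead of under __pycache__. On an interpreter with the fix (or on a non-Windows platform) the child should print hello; on an affected Windows build it is expected to fail with the "Can't reopen .pyc file" error.

```python
import os
import py_compile
import subprocess
import sys
import tempfile


def run_compiled_script(directory, stem="北京市"):
    """Compile a one-line script with a non-ASCII name and run its .pyc."""
    source = os.path.join(directory, stem + ".py")
    with open(source, "w", encoding="utf-8") as f:
        f.write('print("hello")\n')
    # py_compile.compile() returns the path of the bytecode file it wrote.
    pyc = py_compile.compile(source, cfile=os.path.join(directory, stem + ".pyc"))
    # Pass the .pyc path on the command line, as in the transcript.
    return subprocess.run([sys.executable, pyc], capture_output=True, text=True)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        result = run_compiled_script(tmp)
        print("exit code:", result.returncode)
        print("stdout:", result.stdout.strip())
        print("stderr:", ascii(result.stderr.strip()))
```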
msg382504 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-12-04 16:43
bpo-42568 is marked as a duplicate of this issue.
msg382509 - (view) Author: Alexey Izbyshev (izbyshev) * (Python triager) Date: 2020-12-04 16:55
Thanks for the patch, Victor, it looks good.

Just so it doesn't get lost: the problem with the contract of PyErr_ProgramText() which I mentioned in my dup 42568 is still there.
msg382510 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-12-04 17:01
> Just so it doesn't get lost: the problem with the contract of PyErr_ProgramText() which I mentioned in my dup 42568 is still there.

It seems like PyErr_ProgramText() is no longer used in Python. PyErr_ProgramTextObject() is used instead, and it passes the filename as a Python object to _Py_fopen_obj().
msg382512 - (view) Author: Alexey Izbyshev (izbyshev) * (Python triager) Date: 2020-12-04 17:12
> It seems like PyErr_ProgramText() is no longer used in Python.

Isn't it a part of the public API? I can't find it in the docs, but it seems to be declared in the public header.
msg382514 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-12-04 17:16
> Isn't it a part of the public API? I can't find it in the docs, but it seems to be declared in the public header.

The Python C API has a strange history...
msg382734 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-12-08 13:38
New changeset b6d98c10fff6f320f8fdf595c3f9a05d8be4e31d by Victor Stinner in branch 'master':
bpo-32381: Fix PyRun_SimpleFileExFlags() encoding (GH-23642)
https://github.com/python/cpython/commit/b6d98c10fff6f320f8fdf595c3f9a05d8be4e31d
msg382743 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-12-08 15:16
New changeset f0e42ae03c41ec32fcb3064772f46ff7f2c5ff3b by Victor Stinner in branch '3.9':
bpo-32381: Fix PyRun_SimpleFileExFlags() encoding (GH-23642) (GH-23692)
https://github.com/python/cpython/commit/f0e42ae03c41ec32fcb3064772f46ff7f2c5ff3b
msg382750 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-12-08 16:42
New changeset b5cf308de8b19bf8f77053013f7e8a944159e1aa by Victor Stinner in branch '3.8':
bpo-32381: Fix PyRun_SimpleFileExFlags() encoding (GH-23642) (GH-23692) (GH-23696)
https://github.com/python/cpython/commit/b5cf308de8b19bf8f77053013f7e8a944159e1aa
msg382751 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-12-08 16:43
It's now fixed in 3.8, 3.9 and master branches.

Thanks for the bug report tianjg.
msg382752 - (view) Author: Alexey Izbyshev (izbyshev) * (Python triager) Date: 2020-12-08 17:11
Thanks for the fix and backports!
msg382770 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-12-08 22:51
New changeset 815506d852daabc40e14ff0987c1142c0205fbe7 by Victor Stinner in branch 'master':
bpo-32381: Rewrite PyErr_ProgramText() (GH-23700)
https://github.com/python/cpython/commit/815506d852daabc40e14ff0987c1142c0205fbe7
msg382776 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-12-08 23:33
New changeset 550e4673be538d98b6ddf5550b3922539cf5c4b2 by Victor Stinner in branch 'master':
bpo-32381: Add _PyRun_SimpleFileObject() (GH-23709)
https://github.com/python/cpython/commit/550e4673be538d98b6ddf5550b3922539cf5c4b2
msg382804 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-12-09 19:54
New changeset ca064402079f889226cb107b26b329891431c319 by Victor Stinner in branch 'master':
bpo-32381: Remove unused _Py_fopen() function (GH-23711)
https://github.com/python/cpython/commit/ca064402079f889226cb107b26b329891431c319
msg382806 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-12-09 21:37
New changeset a82f63f5af027a0eab0f0812d750b804368cbd25 by Victor Stinner in branch 'master':
bpo-32381: Add _PyRun_AnyFileObject() (GH-23723)
https://github.com/python/cpython/commit/a82f63f5af027a0eab0f0812d750b804368cbd25
msg383063 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-12-15 15:03
My PR 23778 fixes the encoding/error handler used when writing the filename to stderr if the file does not exist:

$ LANG= PYTHONCOERCECLOCALE=0 ./python -X utf8=0 héllo.py
./python: can't open file '/home/vstinner/python/master/h\udcc3\udca9llo.py': [Errno 2] No such file or directory
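
The escaped form in that error message can be reproduced with Python's standard error handlers. This is a minimal sketch of the encoding behavior only, not the code of PR 23778, and it assumes an ASCII filesystem encoding, which the command above appears to arrange.

```python
raw = b"h\xc3\xa9llo.py"  # UTF-8 bytes of "héllo.py"

# With an ASCII filesystem encoding, undecodable bytes become lone surrogates.
decoded = raw.decode("ascii", errors="surrogateescape")
print(ascii(decoded))

# surrogateescape round-trips back to the original bytes...
assert decoded.encode("ascii", errors="surrogateescape") == raw

# ...while backslashreplace yields a printable form for an ASCII stderr.
printable = decoded.encode("ascii", errors="backslashreplace").decode("ascii")
print(printable)
```

The second print shows `h\udcc3\udca9llo.py`, matching the filename in the message: each undecodable byte is kept as a surrogate code point and written out as an escape instead of causing an encoding error.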
msg383067 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-12-15 15:25
New changeset ceb420251c1d635520049fbb7b5269a73d63fb58 by Victor Stinner in branch 'master':
bpo-32381: pymain_run_file() uses PySys_FormatStderr() (GH-23778)
https://github.com/python/cpython/commit/ceb420251c1d635520049fbb7b5269a73d63fb58
msg383653 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-12-23 18:17
New changeset a12491681f08a33abcca843f5150330740c91111 by Victor Stinner in branch 'master':
bpo-32381: pymain_run_command() uses PyCF_IGNORE_COOKIE (GH-23724)
https://github.com/python/cpython/commit/a12491681f08a33abcca843f5150330740c91111
msg384477 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-01-06 09:13
boost-python was using the removed private _Py_fopen() function; I proposed https://github.com/boostorg/python/pull/344 to replace _Py_fopen() with fopen().
History
Date User Action Args
2021-01-06 09:13:10 vstinner set messages: + msg384477
2020-12-23 18:17:04 vstinner set messages: + msg383653
2020-12-15 15:25:36 vstinner set messages: + msg383067
2020-12-15 15:03:37 vstinner set messages: + msg383063
2020-12-15 14:35:34 vstinner set pull_requests: + pull_request22636
2020-12-09 21:40:53 vstinner set pull_requests: + pull_request22585
2020-12-09 21:37:55 vstinner set messages: + msg382806
2020-12-09 20:24:27 vstinner set pull_requests: + pull_request22584
2020-12-09 19:54:40 vstinner set messages: + msg382804
2020-12-08 23:36:58 vstinner set pull_requests: + pull_request22574
2020-12-08 23:33:02 vstinner set messages: + msg382776
2020-12-08 23:00:16 vstinner set pull_requests: + pull_request22573
2020-12-08 22:51:35 vstinner set messages: + msg382770
2020-12-08 17:12:35 izbyshev set status: open -> closed; stage: patch review -> resolved; resolution: fixed; versions: - Python 3.7
2020-12-08 17:11:57 izbyshev set status: closed -> open; versions: + Python 3.7; messages: + msg382752; resolution: fixed -> (no value); stage: resolved -> patch review
2020-12-08 16:57:31 vstinner set pull_requests: + pull_request22567
2020-12-08 16:43:19 vstinner set status: open -> closed; versions: - Python 3.7; messages: + msg382751; resolution: fixed; stage: patch review -> resolved
2020-12-08 16:42:39 vstinner set messages: + msg382750
2020-12-08 15:19:43 vstinner set pull_requests: + pull_request22563
2020-12-08 15:16:12 vstinner set messages: + msg382743
2020-12-08 13:46:09 vstinner set pull_requests: + pull_request22561
2020-12-08 13:38:42 vstinner set messages: + msg382734
2020-12-04 17:16:40 vstinner set messages: + msg382514
2020-12-04 17:12:39 izbyshev set messages: + msg382512
2020-12-04 17:01:06 vstinner set messages: + msg382510
2020-12-04 16:55:19 izbyshev set messages: + msg382509
2020-12-04 16:43:54 vstinner set messages: + msg382504
2020-12-04 16:39:38 vstinner set messages: + msg382503
2020-12-04 16:26:10 vstinner set pull_requests: + pull_request22510
2020-12-04 16:22:33 izbyshev set nosy: + izbyshev; messages: + msg382502
2020-12-04 15:24:00 eryksun link issue42568 superseder
2020-12-04 15:23:25 eryksun set versions: + Python 3.10
2019-07-14 09:06:09 vstinner set messages: + msg347896
2019-07-11 06:24:20 ZackerySpytz set nosy: + ZackerySpytz; versions: + Python 3.8, Python 3.9, - Python 3.6
2019-07-11 04:52:57 ZackerySpytz set keywords: + patch; stage: test needed -> patch review; pull_requests: + pull_request14498
2017-12-20 22:35:31 vstinner set messages: + msg308832
2017-12-20 22:35:02 vstinner set messages: + msg308831
2017-12-20 14:24:03 r.david.murray set keywords: + 3.6regression
2017-12-20 11:31:49 eryksun set messages: + msg308723
2017-12-20 09:53:43 Tianjg set nosy: + Tianjg; messages: + msg308709
2017-12-20 07:35:55 eryksun set type: compile error -> behavior; title: python3.6 can not reopen .pyc file with Chinese path -> Python 3.6 cannot reopen .pyc file with non-ASCII path; components: + Interpreter Core, Unicode; nosy: + ezio.melotti, eryksun, vstinner; versions: + Python 3.7; messages: + msg308707; stage: test needed
2017-12-20 06:07:24 tianjg create