classification
Title: Cannot install package with unicode module names on Windows
Type: behavior Stage: resolved
Components: Distutils Versions: Python 3.8, Python 3.7, Python 3.6
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: eric.araujo Nosy List: dstufft, eric.araujo, jkloth, julien.malard, miss-islington, serhiy.storchaka
Priority: normal Keywords: patch

Created on 2018-08-17 17:06 by julien.malard, last changed 2018-09-23 11:11 by serhiy.storchaka. This issue is now closed.

Pull Requests
URL Status Linked Edit
PR 8799 merged julien.malard, 2018-08-17 17:10
PR 9117 merged miss-islington, 2018-09-08 20:32
PR 9118 merged miss-islington, 2018-09-08 20:32
PR 9126 merged serhiy.storchaka, 2018-09-09 14:15
PR 9503 closed miss-islington, 2018-09-23 06:13
PR 9504 closed miss-islington, 2018-09-23 06:13
PR 9506 merged serhiy.storchaka, 2018-09-23 06:51
PR 9510 merged miss-islington, 2018-09-23 07:32
Messages (14)
msg323699 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2018-08-18 10:32
Please provide more details. How can the issue be reproduced? What did you get, and what did you expect to get?

It seems the code just before the lines modified by your PR is intended to solve this issue. Why doesn't it work?
msg323714 - (view) Author: Julien Malard (julien.malard) * Date: 2018-08-18 14:55
Hello,

Yes, it does seem odd that that code does not work. On my Windows machine (Windows 7, 64-bit, running 32-bit Python) I checked, and it seems that the code in the if block immediately preceding my PR does not run at all, hence the error.

For a reproducible example, my Taqdir package, which consists mostly of packages and modules with Unicode names, runs into this issue (and installs successfully after my proposed fix here combined with a separate PR in pip). Perhaps the most easily accessible example would be the Appveyor build (https://ci.appveyor.com/project/julienmalard/Tinamit) for my Tinamit project, which has Taqdir as a dependency.

Thanks!

-Julien Malard

msg324861 - (view) Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2018-09-08 20:31
New changeset 0afada163c7ef25c3a9d46ed445481fb69f2ecaf by Éric Araujo (Julien Malard) in branch 'master':
bpo-34421 avoid unicode error in distutils logging (GH-8799)
https://github.com/python/cpython/commit/0afada163c7ef25c3a9d46ed445481fb69f2ecaf
msg324862 - (view) Author: miss-islington (miss-islington) Date: 2018-09-08 20:44
New changeset 3b36642924a51e6bceb7033916c3049764817166 by Miss Islington (bot) in branch '3.6':
bpo-34421 avoid unicode error in distutils logging (GH-8799)
https://github.com/python/cpython/commit/3b36642924a51e6bceb7033916c3049764817166
msg324863 - (view) Author: miss-islington (miss-islington) Date: 2018-09-08 20:53
New changeset 77b92b15a5e5c84b91d3fd9d02f63db432fa8903 by Miss Islington (bot) in branch '3.7':
bpo-34421 avoid unicode error in distutils logging (GH-8799)
https://github.com/python/cpython/commit/77b92b15a5e5c84b91d3fd9d02f63db432fa8903
msg324874 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2018-09-09 07:06
I would prefer to use the backslashreplace error handler rather than the unicode-escape codec, just as is done a few lines above, but with ASCII encoding.

    msg = msg.encode('ascii', 'backslashreplace').decode('ascii')

It is still not clear to me why the current code intended to handle this problem doesn't work in this case. We need to find the cause and fix the existing solution.
msg324878 - (view) Author: Jeremy Kloth (jkloth) * Date: 2018-09-09 09:47
The existing re-code solution is not being triggered, as the `errors` in this case is 'surrogateescape' with an encoding of 'cp1252'.

Here, pip is using subprocess.Popen() to have Python run setup.py.  During execution, distutils logs a filename, 'taqdir\\\u0634\u0645\u0627\u0631.py', which has characters not encodable in cp1252.

I think that here, Python is not configuring its stdin/stdout/stderr streams correctly when run as a subprocess connected to pipes.  Or, at least, subprocess.Popen() isn't passing the right (or enough) information to Python to get itself configured.

There should ultimately be a way to have Python (in a subprocess, on Windows) pass through Unicode untouched to its calling process.  I suppose it would mean setting the PYTHONIOENCODING envvar when using subprocess.

After all that, it seems that:
1) pip needs to be changed so that the Python subprocesses it calls can transmit Unicode losslessly,
2) the `errors` check in distutils.log should perhaps be changed to include 'surrogateescape' (the heart of this issue).
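The failure mode described in this message can be reproduced without pip or Windows. The following is a minimal sketch, assuming an in-memory stream stands in for the cp1252 pipe; `make_stream` and `write_guarded` are hypothetical helpers written for illustration and are not the distutils.log code itself.

```python
import io

# Filename from the report; the Arabic letters have no cp1252 mapping.
name = 'taqdir\\\u0634\u0645\u0627\u0631.py'


def make_stream(errors):
    """Build an in-memory text stream that mimics a cp1252 pipe (assumption)."""
    return io.TextIOWrapper(io.BytesIO(), encoding='cp1252', errors=errors)


def write_guarded(stream, msg):
    """Hypothetical guard: re-encode only when the error handler is 'strict'."""
    if stream.errors == 'strict':
        enc = stream.encoding
        msg = msg.encode(enc, 'backslashreplace').decode(enc)
    stream.write(msg + '\n')
    stream.flush()


# With 'strict', the guard runs and the unencodable letters are escaped.
strict_stream = make_stream('strict')
write_guarded(strict_stream, name)

# With 'surrogateescape', the guard is skipped. That handler only maps lone
# surrogates back to bytes, so ordinary unencodable characters still raise.
escape_stream = make_stream('surrogateescape')
try:
    write_guarded(escape_stream, name)
    failed = False
except UnicodeEncodeError:
    failed = True

print(strict_stream.buffer.getvalue())
print('surrogateescape stream failed:', failed)
```

A check on `errors == 'strict'` alone therefore leaves streams configured with 'surrogateescape' unprotected, which matches point 2 above.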
msg324888 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2018-09-09 14:25
PR 9126 makes distutils.log use "backslashreplace" instead of "unicode-escape" and simplifies the code (it is more efficient now, although the performance of logging is not critical).

"unicode-escape" escapes all non-ASCII characters, even encodable ones. It also escapes control characters like \t, \b, \r or \x1b (which starts control sequences for ANSI compatible terminals), which may not be desirable.
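The difference between the two approaches can be seen on a short string. This is a small illustration only, assuming cp1252 as the stream encoding (as in the report); the sample text is made up for the example.

```python
# 'é' is encodable in cp1252; the Arabic letter U+0634 is not.
text = 'caf\u00e9\t\u0634'

# The unicode-escape codec escapes every non-ASCII character and the tab.
via_unicode_escape = text.encode('unicode-escape').decode('ascii')

# backslashreplace with the stream encoding escapes only what cannot be encoded.
via_backslashreplace = text.encode('cp1252', 'backslashreplace').decode('cp1252')

# backslashreplace with ASCII escapes all non-ASCII but keeps the tab intact.
via_ascii = text.encode('ascii', 'backslashreplace').decode('ascii')

print(repr(via_unicode_escape))    # 'caf\\xe9\\t\\u0634'
print(repr(via_backslashreplace))  # 'café\t\\u0634'
print(repr(via_ascii))             # 'caf\\xe9\t\\u0634'
```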
msg324889 - (view) Author: Julien Malard (julien.malard) * Date: 2018-09-09 14:46
Hello,

Thanks for the insights and better fixes. Regarding (1), do you have any pointers on how or where to fix pip? I have an in-progress pull request there (https://github.com/pypa/pip/pull/5712) to fix a related unicode error during installation and could perhaps combine both solutions.

Thanks!

-Julien
msg324907 - (view) Author: Jeremy Kloth (jkloth) * Date: 2018-09-10 02:57
For pip, in call_subprocess() (given here in rough pseudo-code)

is_python = (cmd[0] == sys.executable)
kwds = {}
if is_python:
    env['PYTHONIOENCODING'] = 'utf8'
    kwds['encoding'] = 'utf8'
proc = Popen(..., **kwds)
.
.
.
if stdout is not None:
    while True:
        line = proc.stdout.readline()
        # When running Python, the output is already Unicode
        if not is_python:
            line = console_to_str(line)
        if not line:
            break


Hopefully, there is enough context to figure out the exact placement.
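The idea in the pseudo-code above can be tried in isolation. Below is a minimal runnable sketch, assuming Python 3.6+ (for the `encoding` argument of `Popen`); `run_python_utf8` is a hypothetical helper for illustration and is not pip's call_subprocess().

```python
import os
import subprocess
import sys


def run_python_utf8(args):
    """Run a child Python with UTF-8 pipes and return its output lines as str."""
    env = dict(os.environ)
    # Ask the child interpreter to encode its standard streams as UTF-8.
    env['PYTHONIOENCODING'] = 'utf8'
    proc = subprocess.Popen(
        [sys.executable] + list(args),
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        env=env,
        encoding='utf8',  # decode the pipe as UTF-8 on the parent side
    )
    # The pipe is already in text mode, so no console decoding step is needed.
    lines = [line.rstrip('\r\n') for line in proc.stdout]
    proc.wait()
    return lines


# The child receives ASCII-only source and prints a non-cp1252 filename.
child_code = "print('\\u0634\\u0645\\u0627\\u0631.py')"
output = run_python_utf8(['-c', child_code])
print(output)
```

Both sides have to agree on the encoding: the environment variable configures the child, and the `encoding` argument configures how the parent decodes the pipe.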
msg324921 - (view) Author: Julien Malard (julien.malard) * Date: 2018-09-10 12:26
Thanks! Will give it a try and reference this conversation here as background.
msg326137 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2018-09-23 06:13
New changeset 4b860fd777e983f5d2a6bd1288e2b53099c6a803 by Serhiy Storchaka in branch 'master':
bpo-34421: Improve distutils logging for non-ASCII strings. (GH-9126)
https://github.com/python/cpython/commit/4b860fd777e983f5d2a6bd1288e2b53099c6a803
msg326143 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2018-09-23 07:31
New changeset c73df53569f86d0c7742bafa55958c53d57a02e4 by Serhiy Storchaka in branch '3.7':
bpo-34421: Improve distutils logging for non-ASCII strings. (GH-9126) (GH-9506)
https://github.com/python/cpython/commit/c73df53569f86d0c7742bafa55958c53d57a02e4
msg326145 - (view) Author: miss-islington (miss-islington) Date: 2018-09-23 07:54
New changeset 0b67995bfa45393585e2e0017c82c706c4a04b04 by Miss Islington (bot) in branch '3.6':
bpo-34421: Improve distutils logging for non-ASCII strings. (GH-9126) (GH-9506)
https://github.com/python/cpython/commit/0b67995bfa45393585e2e0017c82c706c4a04b04
History
Date User Action Args
2018-09-23 11:11:42 serhiy.storchaka set status: open -> closed
    stage: patch review -> resolved
2018-09-23 07:54:02 miss-islington set messages: + msg326145
2018-09-23 07:32:07 miss-islington set pull_requests: + pull_request8916
2018-09-23 07:31:56 serhiy.storchaka set messages: + msg326143
2018-09-23 06:51:00 serhiy.storchaka set pull_requests: + pull_request8913
2018-09-23 06:13:23 miss-islington set pull_requests: + pull_request8911
2018-09-23 06:13:14 miss-islington set stage: commit review -> patch review
    pull_requests: + pull_request8910
2018-09-23 06:13:03 serhiy.storchaka set messages: + msg326137
2018-09-10 12:26:50 julien.malard set messages: + msg324921
2018-09-10 02:57:57 jkloth set messages: + msg324907
2018-09-09 14:46:55 julien.malard set messages: + msg324889
2018-09-09 14:25:50 serhiy.storchaka set messages: + msg324888
    stage: patch review -> commit review
2018-09-09 14:15:47 serhiy.storchaka set stage: commit review -> patch review
    pull_requests: + pull_request8579
2018-09-09 09:47:56 jkloth set nosy: + jkloth
    messages: + msg324878
2018-09-09 07:06:29 serhiy.storchaka set messages: + msg324874
2018-09-08 20:53:02 miss-islington set messages: + msg324863
2018-09-08 20:44:23 miss-islington set status: pending -> open
    nosy: + miss-islington
    messages: + msg324862
2018-09-08 20:35:11 eric.araujo set status: open -> pending
    versions: + Python 3.7, Python 3.8
    resolution: fixed
    assignee: eric.araujo
    type: crash -> behavior
    stage: patch review -> commit review
2018-09-08 20:32:13 miss-islington set pull_requests: + pull_request8571
2018-09-08 20:32:05 miss-islington set pull_requests: + pull_request8570
2018-09-08 20:31:29 eric.araujo set messages: + msg324861
2018-08-18 14:55:28 julien.malard set messages: + msg323714
2018-08-18 10:32:21 serhiy.storchaka set nosy: + serhiy.storchaka
    messages: + msg323699
2018-08-17 17:10:09 julien.malard set keywords: + patch
    stage: patch review
    pull_requests: + pull_request8274
2018-08-17 17:06:15 julien.malard create