classification
Title: ensurepip/venv broken on Windows if path includes unicode
Type:
Stage: resolved
Components: Unicode, Windows
Versions: Python 3.6, Python 3.5
process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To: steve.dower
Nosy List: Dima.Tisnek, Marcus.Smith, brett.cannon, dstufft, eric.snow, eryksun, ezio.melotti, jayvdb, ncoghlan, paul.moore, python-dev, steve.dower, tim.golden, vstinner, zach.ware
Priority: normal
Keywords: patch

Created on 2015-11-28 09:56 by Dima.Tisnek, last changed 2016-09-10 00:35 by steve.dower. This issue is now closed.

Files
File name | Uploaded | Description
issue25758_1.patch | eryksun, 2015-11-28 15:56 |
Messages (6)
msg255534 - Author: Dima Tisnek (Dima.Tisnek) * Date: 2015-11-28 09:56
One of my students installed Python 3.5 on Windows 10 to the default location, where the user name "Łukasz" contains a non-ASCII character.

Now "-m venv" and "-m ensurepip" do not work:

C:\Users\Łukasz>C:\Users\Łukasz\AppData\Local\Programs\Python\Python35-32\python.exe -m venv workshops
Error: Command '['C:\\Users\\\u0141ukasz\\workshops\\Scripts\\python.exe', '-Im', 'ensurepip', '--upgrade', '--default-pip']' returned non-zero exit status 1

C:\Users\Łukasz>C:\Users\Łukasz\AppData\Local\Programs\Python\Python35-32\python.exe -m ensurepip
Traceback (most recent call last):
  File "C:\Users\\u0141ukasz\AppData\Local\Programs\Python\Python35-32\lib\runpy.py", line 170, in _run_module_as_main
    "__main__", mod_spec)
  File "C:\Users\\u0141ukasz\AppData\Local\Programs\Python\Python35-32\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "C:\Users\\u0141ukasz\AppData\Local\Programs\Python\Python35-32\lib\ensurepip\__main__.py", line 4, in <module>
    ensurepip._main()
  File "C:\Users\\u0141ukasz\AppData\Local\Programs\Python\Python35-32\lib\ensurepip\__init__.py", line 209, in _main
    default_pip=args.default_pip,
  File "C:\Users\\u0141ukasz\AppData\Local\Programs\Python\Python35-32\lib\ensurepip\__init__.py", line 116, in bootstrap
    _run_pip(args + [p[0] for p in _PROJECTS], additional_paths)
  File "C:\Users\\u0141ukasz\AppData\Local\Programs\Python\Python35-32\lib\ensurepip\__init__.py", line 40, in _run_pip
    import pip
  File "<frozen importlib._bootstrap>", line 969, in _find_and_load
  File "<frozen importlib._bootstrap>", line 954, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 896, in _find_spec
  File "<frozen importlib._bootstrap_external>", line 1136, in find_spec
  File "<frozen importlib._bootstrap_external>", line 1112, in _get_spec
  File "<frozen importlib._bootstrap_external>", line 1093, in _legacy_get_spec
  File "<frozen importlib._bootstrap>", line 444, in spec_from_loader
  File "<frozen importlib._bootstrap_external>", line 530, in spec_from_file_location
UnicodeEncodeError: 'mbcs' codec can't encode characters in position 0--1: invalid character
msg255544 - Author: Eryk Sun (eryksun) * (Python triager) Date: 2015-11-28 15:56
The problem is that the compile_source function in Modules/zipimport.c calls PyUnicode_EncodeFSDefault to get an encoded string to pass as the filename argument of Py_CompileString. On Windows, the default filesystem encoding is the ANSI codepage (i.e. 'mbcs'). Apparently your system's ANSI codepage doesn't map the "Ł" character.
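Whether the encoding step fails depends on which ANSI codepage the machine uses. The sketch below uses Python's built-in cp125x codecs (available on every platform) as stand-ins for Windows ANSI codepages and reports which of them can represent a given path. The function name is hypothetical, and the report does not say which codepage the student's system actually used.

```python
def codepages_that_can_encode(path, codepages=("cp1250", "cp1251", "cp1252", "cp1253")):
    """Return the subset of the given Windows ANSI codepages that can encode path.

    Hypothetical helper: it only illustrates that a narrow (8-bit) codepage
    covers a limited character set, so a path may or may not be encodable.
    """
    usable = []
    for cp in codepages:
        try:
            path.encode(cp)  # raises UnicodeEncodeError if any character is unmapped
        except UnicodeEncodeError:
            continue
        usable.append(cp)
    return usable


if __name__ == "__main__":
    # "Ł" exists in the Central European codepage (cp1250) but not in the
    # Western European one (cp1252).
    print(codepages_that_can_encode("C:\\Users\\Łukasz"))
```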

I reproduced the problem more simply by copying pip-7.1.2-py2.py3-none-any.whl to a subdirectory named "Łukasz", adding the wheel's path to sys.path, and then attempting to execute "import pip".
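A self-contained variant of that reproduction might look like the following sketch. It assumes a tiny hand-built zip archive (with a hypothetical module name, demo_mod) is an adequate stand-in for the pip wheel, since both are imported through zipimport. On an affected Python 3.5 build whose ANSI codepage lacks "Ł", the import step is where the reported UnicodeEncodeError surfaced; on a fixed interpreter it should succeed.

```python
import importlib
import os
import sys
import tempfile
import zipfile


def import_from_non_ascii_dir(dir_name="Łukasz", module_name="demo_mod"):
    """Place a zip archive in a directory whose name contains non-ASCII
    characters, add it to sys.path, and import a module from it via zipimport."""
    base = tempfile.mkdtemp()
    target_dir = os.path.join(base, dir_name)
    os.mkdir(target_dir)
    zip_path = os.path.join(target_dir, "demo.zip")

    # A one-line module stands in for the pip wheel used in the report.
    with zipfile.ZipFile(zip_path, "w") as zf:
        zf.writestr(module_name + ".py", "VALUE = 42\n")

    sys.path.insert(0, zip_path)
    try:
        # zipimport compiles the module source using a filename that
        # contains the non-ASCII directory name.
        module = importlib.import_module(module_name)
    finally:
        sys.path.remove(zip_path)
    return module, zip_path


if __name__ == "__main__":
    mod, path = import_from_non_ascii_dir()
    print(mod.VALUE, mod.__file__)
```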

One solution is to replace Py_CompileString with Py_CompileStringObject. This way compile_source doesn't have to worry about encoding its pathname argument. A minimal patch is attached, but it needs a test.
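The difference between the two C functions can be mimicked at the Python level, where compile() accepts the filename as a str object. This is only an analogy, not the patch itself: the first hypothetical function narrows the filename to bytes in an assumed ANSI codepage first (as the Py_CompileString path did via PyUnicode_EncodeFSDefault), while the second passes the str straight through (as Py_CompileStringObject allows). cp1252 is an assumed example codepage.

```python
def compile_source_encoded(source, path, codepage="cp1252"):
    # Analogue of the old approach: the filename must first become bytes in
    # the ANSI codepage, which fails for characters the codepage lacks.
    filename_bytes = path.encode(codepage)  # may raise UnicodeEncodeError
    return compile(source, filename_bytes.decode(codepage), "exec")


def compile_source_object(source, path):
    # Analogue of Py_CompileStringObject: the filename stays a str object,
    # so there is no encoding step that can fail.
    return compile(source, path, "exec")


if __name__ == "__main__":
    p = "C:\\Users\\Łukasz\\demo.zip\\demo_mod.py"
    print(compile_source_object("x = 1\n", p).co_filename)
    try:
        compile_source_encoded("x = 1\n", p)
    except UnicodeEncodeError as exc:
        print("encode-first approach failed:", exc)
```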
msg274095 - Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2016-09-01 04:44
As a test case for handling non-ASCII characters in the name of the zipfile itself, I believe it should be sufficient to add 'Ł' to TEMP_DIR and TEMP_ZIP in https://hg.python.org/cpython/file/default/Lib/test/test_zipimport.py
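A minimal unittest sketch of that idea, assuming a throwaway temporary directory and a zip archive whose own name contains "Ł". The class, module, and file names are hypothetical; they are not the actual TEMP_DIR/TEMP_ZIP values in test_zipimport.py.

```python
import importlib
import os
import shutil
import sys
import tempfile
import unittest
import zipfile


class NonAsciiZipNameTest(unittest.TestCase):
    MODULE = "ziptestmod_l"  # hypothetical module name

    def setUp(self):
        # Put "Ł" in both the directory name and the archive name.
        self.temp_dir = tempfile.mkdtemp(suffix="_Ł")
        self.zip_path = os.path.join(self.temp_dir, "junkŁ.zip")
        with zipfile.ZipFile(self.zip_path, "w") as zf:
            zf.writestr(self.MODULE + ".py", "ANSWER = 'ok'\n")
        sys.path.insert(0, self.zip_path)

    def tearDown(self):
        # Undo every global side effect of the import.
        sys.path.remove(self.zip_path)
        sys.modules.pop(self.MODULE, None)
        sys.path_importer_cache.pop(self.zip_path, None)
        shutil.rmtree(self.temp_dir)

    def test_import_from_non_ascii_zip(self):
        mod = importlib.import_module(self.MODULE)
        self.assertEqual(mod.ANSWER, "ok")
        self.assertTrue(mod.__file__.startswith(self.zip_path))


if __name__ == "__main__":
    unittest.main()
```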
msg274382 - Author: Steve Dower (steve.dower) * (Python committer) Date: 2016-09-05 01:18
From a quick read, it looks like issue27781 (PEP 529) will resolve this?

Not encoding the path at all is obviously better, but maybe I'll add this as supporting evidence to the PEP...
msg275511 - Author: Roundup Robot (python-dev) (Python triager) Date: 2016-09-10 00:34
New changeset 663a62bcf9c9 by Steve Dower in branch '3.5':
Issue #25758: Prevents zipimport from unnecessarily encoding a filename (patch by Eryk Sun)
https://hg.python.org/cpython/rev/663a62bcf9c9

New changeset ead30e7262d5 by Steve Dower in branch 'default':
Issue #25758: Prevents zipimport from unnecessarily encoding a filename (patch by Eryk Sun)
https://hg.python.org/cpython/rev/ead30e7262d5
msg275512 - Author: Steve Dower (steve.dower) * (Python committer) Date: 2016-09-10 00:35
Applied Eryk's patch and updated the test to reproduce the issue (it no longer reproduces on 3.6 with PEP 529 applied, but it definitely did on 3.5).
History
Date | User | Action | Args
2016-09-10 00:35:46 | steve.dower | set | status: open -> closed
    messages: + msg275512
    assignee: steve.dower
    resolution: fixed
    stage: resolved
2016-09-10 00:34:06 | python-dev | set | nosy: + python-dev
    messages: + msg275511
2016-09-05 01:18:39 | steve.dower | set | messages: + msg274382
2016-09-01 04:44:34 | ncoghlan | set | messages: + msg274095
2016-09-01 01:40:33 | jayvdb | set | nosy: + jayvdb
2015-11-28 15:56:25 | eryksun | set | files: + issue25758_1.patch
    versions: + Python 3.6
    nosy: + eryksun
    messages: + msg255544
    keywords: + patch
2015-11-28 11:15:36 | SilentGhost | set | nosy: + eric.snow, brett.cannon, tim.golden, zach.ware, steve.dower
    components: + Windows
2015-11-28 10:23:33 | lac | set | nosy: + paul.moore, ncoghlan, dstufft, Marcus.Smith
2015-11-28 09:56:53 | Dima.Tisnek | create