classification
Title: ensurepip and distutils' build_scripts fails on Windows when path to Python contains accented characters
Type: behavior Stage: resolved
Components: Distutils, Library (Lib) Versions: Python 3.4, Python 3.5
process
Status: closed Resolution: out of date
Dependencies: Superseder:
Assigned To: Nosy List: dstufft, eric.araujo, eryksun, gmljosea, jayvdb, pitrou, steve.dower
Priority: normal Keywords:

Created on 2014-11-17 02:03 by gmljosea, last changed 2021-02-03 18:21 by steve.dower. This issue is now closed.

Messages (4)
msg231263 - Author: José Alberto Goncalves (gmljosea) Date: 2014-11-17 02:03
Summary:

Python 3.4's venv works fine on Windows, and pip works fine when installing both pure Python libraries and extension modules. However, when the virtual environment is under a path with non-ASCII characters, installing a package that specifies console_scripts or scripts (like pip or mutagen, respectively) fails with encoding errors.

I looked around the Internet for a solution but the best I could find was Issue #10419, which is over 3 years old and is marked as resolved, and couldn't find any other open issue about this.

Details of my case:

I created a Python 3.4 (32-bit) virtualenv via Python Tools for Visual Studio on Windows 8.1 (64-bit), using the latest Python you can download from the homepage. The virtualenv is in a folder under my home directory (C:\Users\José Alberto\), whose path contains an accented character.

Via PowerShell I activated the virtualenv and tried to execute pip install mutagen (https://pypi.python.org/pypi/mutagen; it is relevant because it specifies scripts in its setup.py). The installation failed with the following error:

Downloading/unpacking mutagen
  Running setup.py (path:C:\Users\José Alberto\Documents\podtimizer\env_podtimizer\build\mutagen\setup.py) egg_info for package mutagen

Installing collected packages: mutagen
  Running setup.py install for mutagen
    Traceback (most recent call last):
      File "C:\Python34\lib\distutils\command\build_scripts.py", line 114, in copy_scripts
        shebang.decode('utf-8')
    UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe9 in position 14: invalid continuation byte

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "C:\Users\JosÚ Alberto\Documents\podtimizer\env_podtimizer\build\mutagen\setup.py", line 277, in <module>
        """
      File "C:\Python34\lib\distutils\core.py", line 148, in setup
        dist.run_commands()
      File "C:\Python34\lib\distutils\dist.py", line 955, in run_commands
        self.run_command(cmd)
      File "C:\Python34\lib\distutils\dist.py", line 974, in run_command
        cmd_obj.run()
      File "C:\Users\JosÚ Alberto\Documents\podtimizer\env_podtimizer\lib\site-packages\setuptools-6.0.2-py3.4.egg\setuptools\command\install.py", line 61, in run
      File "C:\Python34\lib\distutils\command\install.py", line 539, in run
        self.run_command('build')
      File "C:\Python34\lib\distutils\cmd.py", line 313, in run_command
        self.distribution.run_command(command)
      File "C:\Python34\lib\distutils\dist.py", line 974, in run_command
        cmd_obj.run()
      File "C:\Python34\lib\distutils\command\build.py", line 126, in run
        self.run_command(cmd_name)
      File "C:\Python34\lib\distutils\cmd.py", line 313, in run_command
        self.distribution.run_command(command)
      File "C:\Python34\lib\distutils\dist.py", line 974, in run_command
        cmd_obj.run()
      File "C:\Python34\lib\distutils\command\build_scripts.py", line 50, in run
        self.copy_scripts()
      File "C:\Python34\lib\distutils\command\build_scripts.py", line 118, in copy_scripts
        "from utf-8".format(shebang))
    ValueError: The shebang (b'#!C:\\Users\\Jos\xe9 Alberto\\Documents\\podtimizer\\env_podtimizer\\Scripts\\python.exe\n') is not decodable from utf-8

I looked around the Internet for a solution, but the best I could find was Issue #10419, which is over 3 years old and is marked as closed and resolved. The last comment mentions a fix that was committed to Distribute around that time, with the caveat that entry point script creation would fail if the path contained unencodable characters (which sounds exactly like the problem I'm having). I couldn't find an open issue to follow up on this.

I went to the source of the error, around Lib/distutils/command/build_scripts.py:106. Since this is Windows, os.fsencode() uses the 'mbcs' encoding (as reported by Python). The code then tries to decode the result back using utf-8, and it blows up:

>>> import os
>>> os.fsencode('C:\\Users\\José Alberto\\')
b'C:\\Users\\Jos\xe9 Alberto\\'
>>> 'C:\\Users\\José Alberto\\'.encode('utf-8')
b'C:\\Users\\Jos\xc3\xa9 Alberto\\'
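
The mismatch can be reproduced outside Windows with a minimal sketch. It assumes cp1252 as a stand-in for 'mbcs' (the real ANSI code page depends on the Windows locale), and the interpreter path is only illustrative:

```python
# Illustrative interpreter path containing an accented character.
executable = 'C:\\Users\\José Alberto\\env\\Scripts\\python.exe'

# Assumption: cp1252 stands in for 'mbcs', which only exists on Windows.
shebang = b'#!' + executable.encode('cp1252') + b'\n'

# Decoding the ANSI-encoded bytes as UTF-8 fails on the lone 0xe9 byte.
try:
    shebang.decode('utf-8')
    decode_error = None
except UnicodeDecodeError as exc:
    decode_error = exc

# Encoding the same path as UTF-8 gives bytes that round-trip cleanly.
utf8_shebang = b'#!' + executable.encode('utf-8') + b'\n'
round_trip = utf8_shebang.decode('utf-8')
```

Here decode_error.start is the offset of the é byte within the shebang line, the same kind of position the traceback reports.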

I commented out both try..except blocks after the os.fsencode call and it worked, but commenting out code whose purpose I don't fully understand doesn't seem like a good strategy.

While testing for the above, I found I couldn't finish installing pip successfully on a virtualenv using just the Python installed from python.org.

In PowerShell I created several virtualenvs using C:\Python34\python.exe -m venv. The envs were created successfully, but the installation of pip's console_scripts failed silently. I could still run python -m pip and install packages, but the pip.exe files were not created.

I removed pip from the environment's site-packages directory and tried to reinstall it via python -m ensurepip, but instead got the following error:

Installing collected packages: pip
Cleaning up...
  Removing temporary dir C:\Users\José Alberto\test_env3\build...
Exception:
Traceback (most recent call last):
  File "C:\Users\JOSALB~1\AppData\Local\Temp\tmpax00n0z5\pip-1.5.6-py2.py3-none-any.whl\pip\_vendor\distlib\scripts.py", line 124, in _get_shebang
    shebang.decode('utf-8')
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe9 in position 15: invalid continuation byte

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\JOSALB~1\AppData\Local\Temp\tmpax00n0z5\pip-1.5.6-py2.py3-none-any.whl\pip\basecommand.py", line 122, in main
    status = self.run(options, args)
  File "C:\Users\JOSALB~1\AppData\Local\Temp\tmpax00n0z5\pip-1.5.6-py2.py3-none-any.whl\pip\commands\install.py", line 283, in run
    requirement_set.install(install_options, global_options, root=options.root_path)
  File "C:\Users\JOSALB~1\AppData\Local\Temp\tmpax00n0z5\pip-1.5.6-py2.py3-none-any.whl\pip\req.py", line 1435, in install
    requirement.install(install_options, global_options, *args, **kwargs)
  File "C:\Users\JOSALB~1\AppData\Local\Temp\tmpax00n0z5\pip-1.5.6-py2.py3-none-any.whl\pip\req.py", line 671, in install
    self.move_wheel_files(self.source_dir, root=root)
  File "C:\Users\JOSALB~1\AppData\Local\Temp\tmpax00n0z5\pip-1.5.6-py2.py3-none-any.whl\pip\req.py", line 901, in move_wheel_files
    pycompile=self.pycompile,
  File "C:\Users\JOSALB~1\AppData\Local\Temp\tmpax00n0z5\pip-1.5.6-py2.py3-none-any.whl\pip\wheel.py", line 325, in move_wheel_files
    generated.extend(maker.make(spec))
  File "C:\Users\JOSALB~1\AppData\Local\Temp\tmpax00n0z5\pip-1.5.6-py2.py3-none-any.whl\pip\_vendor\distlib\scripts.py", line 311, in make
    self._make_script(entry, filenames, options=options)
  File "C:\Users\JOSALB~1\AppData\Local\Temp\tmpax00n0z5\pip-1.5.6-py2.py3-none-any.whl\pip\_vendor\distlib\scripts.py", line 201, in _make_script
    shebang = self._get_shebang('utf-8', options=options)
  File "C:\Users\JOSALB~1\AppData\Local\Temp\tmpax00n0z5\pip-1.5.6-py2.py3-none-any.whl\pip\_vendor\distlib\scripts.py", line 127, in _get_shebang
    'The shebang (%r) is not decodable from utf-8' % shebang)
ValueError: The shebang (b'#!"C:\\Users\\Jos\xe9 Alberto\\test_env3\\Scripts\\python.exe"\n') is not decodable from utf-8

This is exactly the same issue I was running into with build_scripts, but this time in similar code within ensurepip's pip wheel. I again tried commenting out the utf-8 encoding checks, and although ensurepip then finished successfully, the executables failed with "Couldn't create process". This is as far as I could go with my very limited understanding of encoding issues and pip, so I decided to write this issue.

Is it possible to fix this? Is there something I can do to help?
msg231267 - Author: Eryk Sun (eryksun) * (Python triager) Date: 2014-11-17 06:51
On Windows, shouldn't copy_scripts use UTF-8 instead of os.fsencode (MBCS)? The Python launcher executes the shebang line on Windows, and it defaults to UTF-8 if a script doesn't have a BOM. See line 1105 in maybe_handle_shebang:

https://hg.python.org/cpython/file/ab2c023a9432/PC/launcher.c#l1064
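
A minimal sketch of this suggestion, assuming the shebang line is simply encoded as UTF-8 instead of going through os.fsencode. The helper name build_shebang is hypothetical and is not part of distutils:

```python
def build_shebang(executable: str, encoding: str = 'utf-8') -> bytes:
    """Return a shebang line for `executable`, encoded with `encoding`.

    Hypothetical helper for illustration; not the distutils implementation.
    """
    return ('#!' + executable + '\n').encode(encoding)


# Illustrative path; the accented character becomes the two bytes C3 A9.
shebang = build_shebang('C:\\Users\\José Alberto\\env\\Scripts\\python.exe')

# The bytes decode as UTF-8, so a utf-8 decodability check would pass.
decoded = shebang.decode('utf-8')
```

Whether the launcher then starts the interpreter correctly from such a path is not something this sketch verifies.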
msg231302 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2014-11-17 21:56
> On Windows, shouldn't copy_scripts use UTF-8 instead of os.fsencode
> (MBCS)? The Python launcher executes the shebang line on Windows, and 
> it defaults to UTF-8 if a script doesn't have a BOM.

Good catch! It seems you're right. Do you want to provide a patch + tests?
msg386364 - Author: Steve Dower (steve.dower) * (Python committer) Date: 2021-02-03 18:21
Distutils is now deprecated (see PEP 632) and all tagged issues are being closed. From now until removal, only release blocking issues will be considered for distutils.

If this issue does not relate to distutils, please remove the component and reopen it. If you believe it still requires a fix, most likely the issue should be re-reported at https://github.com/pypa/setuptools
History
Date | User | Action | Args
2021-02-03 18:21:31 | steve.dower | set | status: open -> closed; nosy: + steve.dower; messages: + msg386364; resolution: out of date; stage: needs patch -> resolved
2016-09-01 01:40:37 | jayvdb | set | nosy: + jayvdb
2014-11-17 21:56:39 | pitrou | set | stage: needs patch; versions: + Python 3.5
2014-11-17 21:56:31 | pitrou | set | nosy: + pitrou; messages: + msg231302
2014-11-17 06:51:20 | eryksun | set | nosy: + eryksun; messages: + msg231267
2014-11-17 02:19:34 | ezio.melotti | link | issue22887 superseder
2014-11-17 02:03:25 | gmljosea | create |