
Author gmljosea
Recipients dstufft, eric.araujo, gmljosea
Date 2014-11-17.02:03:18
Message-id <1416189804.36.0.639606237849.issue22887@psf.upfronthosting.co.za>
In-reply-to
Content
Summary:

Python 3.4's venv works fine on Windows, and pip works fine when installing both pure Python libraries and extension modules. However, when the virtual environment is under a path with non-ASCII characters, installing a package that specifies console_scripts or scripts (like pip or mutagen, respectively) fails with encoding errors.

I looked around the Internet for a solution, but the best I could find was Issue #10419, which is over 3 years old and is marked as resolved. I couldn't find any other open issue about this.

Details of my case:

I created a Python 3.4 (32-bit) virtualenv via Python Tools for Visual Studio on Windows 8.1 (64-bit), using the latest Python you can download from the homepage. The virtualenv lives in a folder under my home directory (C:\Users\José Alberto\), which happens to contain an accented character.

Via PowerShell I activated the virtualenv and ran pip install mutagen (https://pypi.python.org/pypi/mutagen; it is relevant because it specifies scripts in its setup.py). The installation failed with the following error:

Downloading/unpacking mutagen
  Running setup.py (path:C:\Users\José Alberto\Documents\podtimizer\env_podtimizer\build\mutagen\setup.py) egg_info for package mutagen

Installing collected packages: mutagen
  Running setup.py install for mutagen
    Traceback (most recent call last):
      File "C:\Python34\lib\distutils\command\build_scripts.py", line 114, in copy_scripts
        shebang.decode('utf-8')
    UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe9 in position 14: invalid continuation byte

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "C:\Users\JosÚ Alberto\Documents\podtimizer\env_podtimizer\build\mutagen\setup.py", line 277, in <module>
        """
      File "C:\Python34\lib\distutils\core.py", line 148, in setup
        dist.run_commands()
      File "C:\Python34\lib\distutils\dist.py", line 955, in run_commands
        self.run_command(cmd)
      File "C:\Python34\lib\distutils\dist.py", line 974, in run_command
        cmd_obj.run()
      File "C:\Users\JosÚ Alberto\Documents\podtimizer\env_podtimizer\lib\site-packages\setuptools-6.0.2-py3.4.egg\setuptools\command\install.py", line 61, in run
      File "C:\Python34\lib\distutils\command\install.py", line 539, in run
        self.run_command('build')
      File "C:\Python34\lib\distutils\cmd.py", line 313, in run_command
        self.distribution.run_command(command)
      File "C:\Python34\lib\distutils\dist.py", line 974, in run_command
        cmd_obj.run()
      File "C:\Python34\lib\distutils\command\build.py", line 126, in run
        self.run_command(cmd_name)
      File "C:\Python34\lib\distutils\cmd.py", line 313, in run_command
        self.distribution.run_command(command)
      File "C:\Python34\lib\distutils\dist.py", line 974, in run_command
        cmd_obj.run()
      File "C:\Python34\lib\distutils\command\build_scripts.py", line 50, in run
        self.copy_scripts()
      File "C:\Python34\lib\distutils\command\build_scripts.py", line 118, in copy_scripts
        "from utf-8".format(shebang))
    ValueError: The shebang (b'#!C:\\Users\\Jos\xe9 Alberto\\Documents\\podtimizer\\env_podtimizer\\Scripts\\python.exe\n') is not decodable from utf-8

I looked around the Internet for a solution, but the best I could find was Issue #10419, which is over 3 years old and is marked as closed and resolved. The last comment mentions a fix that was committed to Distribute around that time, with the caveat that entry point script creation would fail if the path contained unencodable characters (which sounds exactly like the problem I'm having). I couldn't find an open issue to follow up on this.

I went to the source of the error, around Lib/distutils/command/build_scripts.py:106. Since this is Windows, os.fsencode() encodes the path with 'mbcs' (as reported by Python); the code then tries to decode the result as utf-8, and it blows up:

>>> import os
>>> os.fsencode('C:\\Users\\José Alberto\\')
b'C:\\Users\\Jos\xe9 Alberto\\'
>>> 'C:\\Users\\José Alberto\\'.encode('utf-8')
b'C:\\Users\\Jos\xc3\xa9 Alberto\\'
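
The mismatch can be reproduced on any platform with a minimal sketch. It assumes that 'mbcs' resolves to code page 1252 on this machine (an assumption, though consistent with the 0xe9 byte shown above), so cp1252 is used as a stand-in. The shebang is the one reported in the first traceback.

```python
# Assumption: 'mbcs' maps to cp1252 on the affected machine; cp1252 is used
# here as a portable stand-in so the sketch also runs outside Windows.
ANSI_ENCODING = 'cp1252'

executable = ('C:\\Users\\José Alberto\\Documents\\podtimizer'
              '\\env_podtimizer\\Scripts\\python.exe')

# Build the shebang bytes the way the traceback shows them.
shebang = b'#!' + executable.encode(ANSI_ENCODING) + b'\n'

# Decoding the ANSI-encoded bytes as utf-8 fails at the accented character.
try:
    shebang.decode('utf-8')
    decode_error = None
except UnicodeDecodeError as exc:
    decode_error = exc

print(decode_error)
```

The failing offset is the position of the é in the shebang, which matches the "position 14" reported by build_scripts.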

I commented out both try..except blocks after the os.fsencode call and it worked, but commenting out code whose purpose I don't fully understand doesn't seem like a good strategy.
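
For context, the two checks appear to exist because a script's first line is read as utf-8 before any coding cookie can take effect, so the shebang has to be valid utf-8 and also valid in the script's declared encoding. The following is a simplified sketch of that logic, not the distutils source. The name check_shebang and its parameters are hypothetical, and cp1252 again stands in for 'mbcs'.

```python
def check_shebang(executable, script_encoding='utf-8', fs_encoding='cp1252'):
    """Build a shebang line and validate it, roughly as build_scripts does.

    Hypothetical helper for illustration; fs_encoding is an assumed
    stand-in for the encoding os.fsencode() would use.
    """
    shebang = b'#!' + executable.encode(fs_encoding) + b'\n'

    # Check 1: the shebang precedes any coding cookie, so it must be utf-8.
    try:
        shebang.decode('utf-8')
    except UnicodeDecodeError:
        raise ValueError(
            'The shebang ({!r}) is not decodable from utf-8'.format(shebang))

    # Check 2: it must also be decodable in the script's declared encoding.
    try:
        shebang.decode(script_encoding)
    except UnicodeDecodeError:
        raise ValueError(
            'The shebang ({!r}) is not decodable from the script encoding ({})'
            .format(shebang, script_encoding))

    return shebang
```

Under this sketch an ASCII-only interpreter path passes both checks, while the accented path fails the first one unless the bytes were produced as utf-8 in the first place. Removing the checks would leave a generated script whose first line is not valid utf-8.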

While testing for the above, I found I couldn't finish installing pip successfully on a virtualenv using just the Python installed from python.org.

In PowerShell I created several virtualenvs using C:\Python34\python.exe -m venv. The envs were created successfully, but the installation of pip's console_scripts failed silently. I could still run python -m pip and install packages, but the pip.exe files were not created.

I removed pip from the environment's site-packages directory and tried to reinstall it via python -m ensurepip, but instead got the following error:

Installing collected packages: pip
Cleaning up...
  Removing temporary dir C:\Users\José Alberto\test_env3\build...
Exception:
Traceback (most recent call last):
  File "C:\Users\JOSALB~1\AppData\Local\Temp\tmpax00n0z5\pip-1.5.6-py2.py3-none-any.whl\pip\_vendor\distlib\scripts.py", line 124, in _get_shebang
    shebang.decode('utf-8')
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe9 in position 15: invalid continuation byte

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\JOSALB~1\AppData\Local\Temp\tmpax00n0z5\pip-1.5.6-py2.py3-none-any.whl\pip\basecommand.py", line 122, in main
    status = self.run(options, args)
  File "C:\Users\JOSALB~1\AppData\Local\Temp\tmpax00n0z5\pip-1.5.6-py2.py3-none-any.whl\pip\commands\install.py", line 283, in run
    requirement_set.install(install_options, global_options, root=options.root_path)
  File "C:\Users\JOSALB~1\AppData\Local\Temp\tmpax00n0z5\pip-1.5.6-py2.py3-none-any.whl\pip\req.py", line 1435, in install
    requirement.install(install_options, global_options, *args, **kwargs)
  File "C:\Users\JOSALB~1\AppData\Local\Temp\tmpax00n0z5\pip-1.5.6-py2.py3-none-any.whl\pip\req.py", line 671, in install
    self.move_wheel_files(self.source_dir, root=root)
  File "C:\Users\JOSALB~1\AppData\Local\Temp\tmpax00n0z5\pip-1.5.6-py2.py3-none-any.whl\pip\req.py", line 901, in move_wheel_files
    pycompile=self.pycompile,
  File "C:\Users\JOSALB~1\AppData\Local\Temp\tmpax00n0z5\pip-1.5.6-py2.py3-none-any.whl\pip\wheel.py", line 325, in move_wheel_files
    generated.extend(maker.make(spec))
  File "C:\Users\JOSALB~1\AppData\Local\Temp\tmpax00n0z5\pip-1.5.6-py2.py3-none-any.whl\pip\_vendor\distlib\scripts.py", line 311, in make
    self._make_script(entry, filenames, options=options)
  File "C:\Users\JOSALB~1\AppData\Local\Temp\tmpax00n0z5\pip-1.5.6-py2.py3-none-any.whl\pip\_vendor\distlib\scripts.py", line 201, in _make_script
    shebang = self._get_shebang('utf-8', options=options)
  File "C:\Users\JOSALB~1\AppData\Local\Temp\tmpax00n0z5\pip-1.5.6-py2.py3-none-any.whl\pip\_vendor\distlib\scripts.py", line 127, in _get_shebang
    'The shebang (%r) is not decodable from utf-8' % shebang)
ValueError: The shebang (b'#!"C:\\Users\\Jos\xe9 Alberto\\test_env3\\Scripts\\python.exe"\n') is not decodable from utf-8

This is exactly the same issue I was running into with build_scripts, but this time in similar code within the pip wheel bundled with ensurepip. I again tried commenting out the utf-8 encoding checks, and although ensurepip now finished successfully, the generated executables failed with "Couldn't create process". This is as far as I could go with my very limited understanding of encoding issues and pip, so I decided to file this issue.

Is it possible to fix this? Is there something I can do to help?
History
Date                 User      Action  Args
2014-11-17 02:03:26  gmljosea  set     recipients: + gmljosea, eric.araujo, dstufft
2014-11-17 02:03:24  gmljosea  set     messageid: <1416189804.36.0.639606237849.issue22887@psf.upfronthosting.co.za>
2014-11-17 02:03:24  gmljosea  link    issue22887 messages
2014-11-17 02:03:18  gmljosea  create