distutils command build_scripts fails with UnicodeDecodeError #54628
As suggested in bpo-9561, I'm creating a new bug for the encoding problem in build_scripts: if a script file can't be decoded with the (locale-dependent) default encoding, "build_scripts" fails with UnicodeDecodeError. This is reproducible, e.g., with LANG=C and a script file containing non-ASCII characters near the beginning (so that they're read by a single readline() call). I'm attaching a patch that uses "surrogateescape", as proposed for bpo-6011.
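A minimal sketch of the failure and of what "surrogateescape" changes, assuming an ASCII locale codec (roughly what LANG=C gives you). The script bytes are made up for illustration; they are not taken from the attached patch.

```python
# Bytes of a hypothetical script whose second line contains U+00E9
# encoded as UTF-8 (b"\xc3\xa9").
raw = b"#!/usr/bin/python\n# caf\xc3\xa9\nprint('hi')\n"

# Strict decoding with an ASCII codec fails, as build_scripts did.
try:
    raw.decode("ascii")
except UnicodeDecodeError as exc:
    strict_error = exc
else:
    strict_error = None

# With "surrogateescape", each undecodable byte becomes a lone surrogate
# (U+DC80..U+DCFF) instead of raising.
text = raw.decode("ascii", "surrogateescape")

# Encoding back with the same codec and handler restores the exact bytes,
# so the script can be copied without being altered.
round_trip = text.encode("ascii", "surrogateescape")

print(type(strict_error).__name__)  # UnicodeDecodeError
print(repr(text.splitlines()[1]))   # '# caf\udcc3\udca9'
print(round_trip == raw)            # True
```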
I’m not sure how I feel about using surrogateescape. The distutils source is very similar across 2.7, 3.1, 3.2 and default, especially after the Great Revert and freeze last year, which restored buggy-but-known behavior while the distutils2 project was created and allowed to fix things and break stuff. Haypo added a fix using surrogateescape in 3.2, so it couldn’t be backported to all stable branches. You could say that at least it was fixed in one version, which is good. I’m undecided between applying the patch (if a test is provided) and raising an exception instead of silently changing behavior.
Éric Araujo wrote:
I think this patch should be applied to all 3.x versions, since Python shouldn't really care about the script file's encoding. I don't think this is needed for 2.7, since Python 2.x's open() …
Alternatively, it's possible to use binary mode. I'm attaching a patch that shows this approach.
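A minimal sketch of the binary-mode idea: the script is read and written as bytes, so the locale encoding never comes into play. The helper `rewrite_shebang` and the in-memory files are hypothetical stand-ins; unlike this sketch, distutils only rewrites a first line that matches its shebang regex.

```python
import io


def rewrite_shebang(src, dst, interpreter):
    """Copy a script from binary file src to dst, replacing a '#!' first line.

    Everything stays bytes, so non-ASCII content is copied verbatim.
    """
    first = src.readline()
    if first.startswith(b"#!"):
        dst.write(b"#!" + interpreter + b"\n")
    else:
        dst.write(first)
    # The rest of the script is copied without any decoding.
    dst.write(src.read())


# io.BytesIO stands in for open(path, "rb") / open(path, "wb").
src = io.BytesIO(b"#!/usr/bin/env python\n# caf\xc3\xa9\nprint('hi')\n")
dst = io.BytesIO()
rewrite_shebang(src, dst, b"/opt/python3/bin/python3")
print(dst.getvalue())
```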
Was the patch tested in 2.7 only? I think the first_line_re needs to be changed to bytes too. (3.x would have disallowed mixing bytes and str for a regex.)
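The bytes/str restriction can be seen directly. The pattern below is modeled on the distutils shebang regex and should be read as an assumption, not a copy of the module.

```python
import re

line = b"#!/usr/bin/python\n"

# The same shebang pattern compiled twice: once as str, once as bytes.
str_re = re.compile(r"^#!.*python[0-9.]*([ \t].*)?$")
bytes_re = re.compile(rb"^#!.*python[0-9.]*([ \t].*)?$")

# A bytes pattern matches bytes read in binary mode.
print(bytes_re.match(line) is not None)  # True

# A str pattern applied to bytes raises TypeError in Python 3.
try:
    str_re.match(line)
    mixed_ok = True
except TypeError:
    mixed_ok = False
print(mixed_ok)  # False
```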
Which patch do you mean?
Indeed, I missed those two lines.
Apparently setuptools.command.easy_install.get_script_header() imports distutils.command.build_scripts.first_line_re and matches it against a str object, which raises TypeError if the regex is bytes. If breaking compatibility is not acceptable, then the surrogateescape patch should be applied.
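Under the surrogateescape approach the first line stays a str, so a str pattern like the one external code imports keeps working. This is a minimal sketch; the pattern is modeled on distutils' and the non-ASCII interpreter path is invented for illustration.

```python
import re

# str pattern, as code importing first_line_re would expect.
first_line_re = re.compile(r"^#!.*python[0-9.]*([ \t].*)?$")

# A first line whose interpreter path contains non-ASCII bytes.
raw_first_line = b"#!/opt/caf\xc3\xa9/bin/python -O\n"

# Decoding with surrogateescape never fails, and the result is a str.
line = raw_first_line.decode("ascii", "surrogateescape")
match = first_line_re.match(line)
print(repr(match.group(1)))

# The undecodable bytes survive when the line is written back.
assert line.encode("ascii", "surrogateescape") == raw_first_line
```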
Hey, I already had this bug and also wrote a patch: copy_script-2.patch, attached to bpo-6011. It is very similar to build_scripts-binary_mode.patch (it reads the file in binary mode to avoid the encode/decode dance), but it also checks that the path to the Python program is decodable from UTF-8 and from the script encoding. Éric Araujo doesn't want to apply copy_script-2.patch to Python 3 before distutils2 is ported to Python 3 and included in Python (3.3): see msg124648. Five months later, distutils2 is not yet included in Python 3, the patch is not committed yet, and we now have a duplicate issue (and 3 patches for a single bug) :-) This situation sucks. How can we move forward? What is the status of distutils2? Is it ported to Python 3? Is it ready for inclusion in Python 3? Once distutils2 is part of Python 3.3, should we still fix distutils bugs? I suppose that few people will use Python 3.3 soon, if only because it will not be released before August 2012 (PEP-398) :-) So users will keep hitting this bug until everybody moves to 3.3 (or later)... I think that we should fix this bug today. I don't really care about distutils2 today because it is not yet part of Python.
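The extra check described for copy_script-2.patch might look roughly like this. It is a hypothetical reconstruction, not the patch itself: the helper name `check_shebang` and its messages are made up, and the script encoding is detected with `tokenize.detect_encoding()` (PEP 263 cookie or BOM, defaulting to UTF-8). Presumably the goal is that the rewritten script still decodes when Python later reads it.

```python
import io
import tokenize


def check_shebang(shebang, script_bytes):
    """Hypothetical check: raise ValueError unless the new '#!' line (bytes)
    decodes both as UTF-8 and as the script's declared encoding.

    Returns the detected script encoding.
    """
    try:
        shebang.decode("utf-8")
    except UnicodeDecodeError:
        raise ValueError("shebang %r is not decodable from utf-8" % (shebang,)) from None

    # detect_encoding() reads at most two lines looking for a coding cookie.
    encoding, _ = tokenize.detect_encoding(io.BytesIO(script_bytes).readline)
    try:
        shebang.decode(encoding)
    except UnicodeDecodeError:
        raise ValueError(
            "shebang %r is not decodable from the script encoding %s"
            % (shebang, encoding)
        ) from None
    return encoding


print(check_shebang(b"#!/usr/bin/python3\n", b"#!/x\nprint(1)\n"))  # utf-8
```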
Setuptools is not compatible with 3.x TTBOMK; distribute is, but could …
is not commited yet,
Looks like you’re not following PyCon reports, or Tarek’s mails to …
Yes: we’ll fix bugs in packaging and distutils. Packaging releases will …
copy_script-2.patch uses os.fsencode(), which doesn't exist in Python 3.1. |
Correct: with Python 3.1, you can use filename.encode(sys.getfilesystemencoding(), 'surrogateescape'). But you must use os.fsencode() with Python >= 3.2 because on Windows you cannot use surrogateescape with MBCS (you should use the strict error handler).
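A small compatibility helper following that advice might look like this; the name `fsencode_compat` is hypothetical. The Windows remark above describes Python 3.2 with the MBCS filesystem encoding; since Python 3.6 (PEP 529), Windows defaults to UTF-8 for filesystem paths, and `os.fsencode()` still picks the right handler either way.

```python
import os
import sys


def fsencode_compat(name):
    """Hypothetical helper: encode a filename (str) to bytes.

    Prefer os.fsencode() (Python >= 3.2), which chooses the platform's
    error handler; fall back to surrogateescape on Python 3.1, which
    has no os.fsencode().
    """
    if hasattr(os, "fsencode"):
        return os.fsencode(name)
    return name.encode(sys.getfilesystemencoding(), "surrogateescape")


print(fsencode_compat("setup.py"))  # b'setup.py'
```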
Please commit any patch before the releases of Python 3.1.4 and 3.2.1 (3.2.1 rc1 is planned for 2011-05-14).
New changeset 6ad356525381 by Victor Stinner in branch 'default':
New changeset 47236a0cfb15 by Victor Stinner in branch '3.2':
New changeset fd7d4639dae2 by Victor Stinner in branch '3.1':
Issue fixed in Python 3.1, 3.2 and 3.3. Thanks to Arfrever, I realized that this issue concerns not only the compilation of Python itself with a non-ASCII prefix (issue bpo-6011), but also the installation of any Python script containing a non-ASCII character. So I also fixed it in Python 3.1, where I replaced os.fsencode(name) with name.encode(sys.getfilesystemencoding(), 'surrogateescape').
I have committed the fix for Distribute: (However, Distribute would fail to create entry point scripts if sys.executable contained unencodable characters.)
New changeset cc5cfeaa4a8d by Victor Stinner in branch 'default': |