distutils command build_scripts fails with UnicodeDecodeError #54628

Closed
hagen mannequin opened this issue Nov 14, 2010 · 20 comments
Labels: release-blocker; stdlib (Python modules in the Lib dir); type-bug (An unexpected behavior, bug, or error)

Comments

@hagen
Mannequin

hagen mannequin commented Nov 14, 2010

BPO 10419
Nosy @malemburg, @birkenfeld, @vstinner, @benjaminp, @tarekziade, @merwok, @mgorny
Files
  • surrogateescape.patch: use surrogateescape for reading and writing script files
  • build_scripts-binary_mode.patch: Use binary mode for reading and writing script files

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    GitHub fields:

    assignee = 'https://github.com/tarekziade'
    closed_at = <Date 2011-05-10.22:15:39.120>
    created_at = <Date 2010-11-14.20:32:31.235>
    labels = ['type-bug', 'library', 'release-blocker']
    title = 'distutils command build_scripts fails with UnicodeDecodeError'
    updated_at = <Date 2011-05-19.13:18:46.660>
    user = 'https://bugs.python.org/hagen'

    bugs.python.org fields:

    activity = <Date 2011-05-19.13:18:46.660>
    actor = 'python-dev'
    assignee = 'tarek'
    closed = True
    closed_date = <Date 2011-05-10.22:15:39.120>
    closer = 'python-dev'
    components = ['Distutils', 'Distutils2']
    creation = <Date 2010-11-14.20:32:31.235>
    creator = 'hagen'
    dependencies = []
    files = ['19607', '21821']
    hgrepos = []
    issue_num = 10419
    keywords = ['patch']
    message_count = 20.0
    messages = ['121207', '134630', '134661', '134678', '134680', '134681', '134773', '134894', '134934', '134936', '134937', '134971', '134972', '135374', '135749', '135752', '135754', '135756', '135786', '136289']
    nosy_count = 11.0
    nosy_names = ['lemburg', 'georg.brandl', 'vstinner', 'benjamin.peterson', 'tarek', 'eric.araujo', 'hagen', 'Arfrever', 'alexis', 'mgorny', 'python-dev']
    pr_nums = []
    priority = 'release blocker'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue10419'
    versions = ['3rd party', 'Python 3.1', 'Python 2.7', 'Python 3.2', 'Python 3.3']

    @hagen
    Mannequin Author

    hagen mannequin commented Nov 14, 2010

    As suggested in bpo-9561, I'm creating a new bug for the encoding problem in build_scripts: if a script file can't be decoded with the (locale-dependent) standard encoding, then "build_scripts" fails with UnicodeDecodeError. Reproducible e.g. with LANG=C and a script file containing non-ASCII characters near the beginning (so that they're read on a single readline()).

    Attaching a patch that uses "surrogateescape", as proposed for bpo-6011.
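A minimal sketch of the surrogateescape approach described above; this is not the attached patch. The file names, the `copy_with_shebang` helper, and the explicit `encoding="ascii"` (standing in for a `LANG=C` locale) are assumptions made for illustration.

```python
import os
import tempfile

# Hypothetical script contents: 0xE9 is not valid ASCII, so strict decoding fails.
original = b"#!python\n# caf\xe9\nprint('hi')\n"


def copy_with_shebang(src, dst, shebang):
    """Copy src to dst, replacing the first line with the given shebang."""
    with open(src, encoding="ascii", errors="surrogateescape", newline="") as f:
        f.readline()      # drop the old "#!python" line
        rest = f.read()   # undecodable bytes become lone surrogates
    with open(dst, "w", encoding="ascii", errors="surrogateescape", newline="") as f:
        f.write(shebang + "\n")
        f.write(rest)     # surrogates are written back as the original bytes


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "script_in.py")
    dst = os.path.join(tmp, "script_out.py")
    with open(src, "wb") as f:
        f.write(original)

    # The strict error handler reproduces the reported failure.
    try:
        with open(src, encoding="ascii") as f:
            f.read()
        strict_failed = False
    except UnicodeDecodeError:
        strict_failed = True

    copy_with_shebang(src, dst, "#!/usr/bin/python3")
    with open(dst, "rb") as f:
        result = f.read()
```

The point of the round trip is that the non-ASCII byte survives unchanged even though the script's real encoding is never known.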

    @hagen hagen mannequin added the type-crash A hard crash of the interpreter, possibly with a core dump label Nov 14, 2010
    @hagen hagen mannequin assigned tarekziade Nov 14, 2010
    @hagen hagen mannequin added the stdlib Python modules in the Lib dir label Nov 14, 2010
    @abalkin abalkin added type-bug An unexpected behavior, bug, or error and removed type-crash A hard crash of the interpreter, possibly with a core dump labels Feb 4, 2011
    @merwok
    Member

    merwok commented Apr 27, 2011

    I’m not sure how I feel about using surrogateescape. The distutils source is very similar across 2.7, 3.1, 3.2 and default, especially after the Great Revert and freeze last year to restore buggy-but-known behavior while the distutils2 project was created and allowed to fix things and break stuff. Haypo added a fix using surrogateescape in 3.2, so it couldn’t be backported to all stable branches. You may say that at least it was fixed in one version, which is something good. I don’t know if I’d prefer to apply the patch (if a test is provided) or to raise an exception instead of silently changing behavior.

    @malemburg
    Member

    I think this patch should be applied to all 3.x versions, since
    all of them are affected by the same problem: reading a file with
    unknown encoding, adding a shebang and writing it back again.

    Python shouldn't really care about the script file's encoding and
    since the "surrogateescape" error handler is the only way to
    more or less cleanly get around the problem, I'm +1 on adding the
    patch to the 3.x series.

    I don't think this is needed for 2.7, since Python 2.x's open()
    doesn't care about the file encoding anyway.

    @malemburg malemburg changed the title distutils command build_scripts fails with UnicodeDecodeError distutils command build_scripts fails with UnicodeDecodeError Apr 28, 2011
    @Arfrever
    Mannequin

    Arfrever mannequin commented Apr 28, 2011

    Alternatively it's possible to use binary mode. I'm attaching the patch, which shows this possibility.
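A rough sketch of the binary-mode alternative, assuming a bytes pattern modeled on distutils' `first_line_re`. The `shebang_re` and `rewrite_shebang` names are hypothetical and this is not the attached patch.

```python
import re

# Bytes pattern modeled on distutils' first_line_re (assumption: same shape,
# compiled from bytes so that it can match data read in binary mode).
shebang_re = re.compile(rb"^#!.*python[0-9.]*([ \t].*)?$")


def rewrite_shebang(data, executable):
    """Return data with its "#!...python" first line pointing at executable.

    Both arguments are bytes; nothing is decoded, so the script's
    encoding never matters.
    """
    first_line, sep, rest = data.partition(b"\n")
    match = shebang_re.match(first_line)
    if match is None:
        return data  # not a Python shebang: leave the script untouched
    post_interp = match.group(1) or b""  # interpreter options such as " -u"
    return b"#!" + executable + post_interp + sep + rest


converted = rewrite_shebang(b"#!python -u\n# caf\xe9\n", b"/usr/bin/python3")
untouched = rewrite_shebang(b"print(1)\n", b"/usr/bin/python3")
```

Because the data stays as bytes from read to write, there is no encode/decode step that could fail.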

    @Arfrever Arfrever mannequin changed the title distutils command build_scripts fails with UnicodeDecodeError distutils command build_scripts fails with UnicodeDecodeError Apr 28, 2011
    @merwok
    Member

    merwok commented Apr 28, 2011

    Was the patch tested in 2.7 only? I think the first_line_re needs to be changed to bytes too. (3.x would have disallowed mixing bytes and str for a regex.)

    @Arfrever
    Mannequin

    Arfrever mannequin commented Apr 28, 2011

    Which patch do you mean?
    (My patch already changes first_line_re to bytes. My patch was tested only with 3.2. Lib/distutils/command/build_scripts.py is currently identical in 3.1, 3.2 and 3.3.)

    @merwok
    Member

    merwok commented Apr 29, 2011

    Indeed, I missed those two lines.

    @Arfrever
    Mannequin

    Arfrever mannequin commented Apr 30, 2011

    Apparently setuptools.command.easy_install.get_script_header() imports distutils.command.build_scripts.first_line_re and checks if this regex matches a str object, which results in TypeError. If breaking compatibility is not acceptable, then the surrogateescape patch should be applied.
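The incompatibility can be illustrated in isolation. In this sketch the pattern is a simplified stand-in for a bytes version of `first_line_re`, not the real regex.

```python
import re

# Simplified stand-in for a bytes version of first_line_re.
bytes_re = re.compile(rb"^#!.*python")

# Matching bytes works as expected.
bytes_match = bytes_re.match(b"#!/usr/bin/python")

# Matching a str with a bytes pattern is rejected by Python 3.
try:
    bytes_re.match("#!/usr/bin/python")
    raised = False
except TypeError:
    raised = True
```

Any third-party code that imports the regex and applies it to `str` would hit this `TypeError` once the pattern's type changes.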

    @vstinner
    Member

    vstinner commented May 1, 2011

    Hey, I already had this bug and I also wrote a patch: copy_script-2.patch attached to bpo-6011. It is very similar to build_scripts-binary_mode.patch (read the file in binary mode to avoid the encode/decode dance). But it also checks that the path to the Python program is decodable from UTF-8 and from the script encoding.
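One way such a decodability check could look, as a hedged sketch rather than the contents of copy_script-2.patch. The `check_shebang` helper is hypothetical; `tokenize.detect_encoding()` is the standard-library function that reads a PEP 263 coding cookie.

```python
import io
import tokenize


def check_shebang(shebang, script):
    """Return the script's declared encoding if the shebang is usable.

    Raises UnicodeDecodeError when the shebang bytes cannot be decoded
    from UTF-8 or from the encoding declared by the script itself.
    """
    encoding, _ = tokenize.detect_encoding(io.BytesIO(script).readline)
    shebang.decode("utf-8")   # the interpreter path must be valid UTF-8
    shebang.decode(encoding)  # and valid in the script's own encoding
    return encoding


latin1_script = b"# -*- coding: latin-1 -*-\nx = 1\n"
detected = check_shebang(b"#!/usr/bin/python3\n", latin1_script)

# A path containing a byte sequence that is not valid UTF-8 is rejected.
try:
    check_shebang(b"#!/opt/caf\xe9/python3\n", b"x = 1\n")
    rejected = False
except UnicodeDecodeError:
    rejected = True
```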

    Éric Araujo doesn't want to apply copy_script-2.patch on Python 3 before distutils2 is ported to Python 3 and included in Python (3.3): read msg124648. Five months later: distutils2 is not yet included in Python 3, the patch is not committed yet, and we now have a duplicate issue (and 3 patches for a single bug) :-)

    This situation sucks. How can we move forward? What is the status of distutils2? Is it ported to Python3? Is it ready for an inclusion into Python3?

    Once distutils2 is part of Python 3.3, should we still fix distutils bugs or not? I suppose that few people use Python 3.3, maybe because it will not be released before August 2012 (PEP 398) :-) So users will continue to have this bug until everybody moves to 3.3 (or later)...

    I think that we should fix this bug today. I don't really care about distutils2 today because it is not yet part of Python.

    @merwok
    Member

    merwok commented May 1, 2011

    Apparently setuptools.command.easy_install.get_script_header() imports
    distutils.command.build_scripts.first_line_re and checks if this regex
    matches a str object, which results in TypeError. If breaking
    compatibility is not acceptable, then the surrogateescape patch should
    be applied.

    Setuptools is not compatible with 3.x TTBOMK; distribute is, but could
    be fixed quickly, so there is no compat problem with these
    libraries. However, the public/private status of first_line_re is
    unclear, so there could be other projects out there depending on its
    type. Given that there is already one patch in distutils that uses
    surrogateescape, I think we could accept another similar patch.

    @merwok
    Member

    merwok commented May 1, 2011

    is not committed yet,

    and we have now a duplicate issue (and 3 patches for a single bug) :-)

    Feel free to close duplicate issues.

    Looks like you’re not following PyCon reports, or Tarek’s mails to
    python-dev. distutils2 has been ported to 3.3 under the name
    “packaging”; there is a repo on bitbucket (tarek/cpython) with this
    code. Tarek will produce a patch from this repo and push it to the main
    repository soon.

    Yes: we’ll fix bugs in packaging and distutils. Packaging releases will
    be backported for 2.4-3.2 under the name “distutils2”.

    @Arfrever
    Mannequin

    Arfrever mannequin commented May 2, 2011

    copy_script-2.patch uses os.fsencode(), which doesn't exist in Python 3.1.

    @vstinner
    Member

    vstinner commented May 2, 2011

    copy_script-2.patch uses os.fsencode(), which doesn't exist in Python 3.1.

    Correct, with Python 3.1, you can use filename.encode(sys.getfilesystemencoding(), 'surrogateescape'). But you must use os.fsencode() with Python >= 3.2 because on Windows, you cannot use surrogateescape with MBCS (you should use the strict error handler).
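The two spellings side by side, as a small sketch. The helper names and the interpreter path are hypothetical, and the example uses an ASCII path so that it behaves the same on every platform.

```python
import os
import sys


def encode_path_py31(name):
    """Fallback described for Python 3.1, where os.fsencode() does not exist."""
    return name.encode(sys.getfilesystemencoding(), "surrogateescape")


def encode_path(name):
    """Preferred on Python >= 3.2: os.fsencode() picks the error handler itself."""
    return os.fsencode(name)


path = "/usr/bin/python3"  # hypothetical interpreter path
encoded = encode_path(path)
fallback = encode_path_py31(path)
```

Letting `os.fsencode()` choose the error handler avoids hard-coding `surrogateescape` on platforms where it is not the right choice.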

    @Arfrever
    Mannequin

    Arfrever mannequin commented May 6, 2011

    Please commit any patch before releases of Python 3.1.4 and 3.2.1. (3.2.1 rc1 is planned on 2011-05-14.)

    @python-dev
    Mannequin

    python-dev mannequin commented May 10, 2011

    New changeset 6ad356525381 by Victor Stinner in branch 'default':
    Close bpo-10419, issue bpo-6011: build_scripts command of distutils handles correctly
    http://hg.python.org/cpython/rev/6ad356525381

    @python-dev python-dev mannequin closed this as completed May 10, 2011
    @python-dev
    Mannequin

    python-dev mannequin commented May 10, 2011

    New changeset 47236a0cfb15 by Victor Stinner in branch '3.2':
    Close bpo-10419, issue bpo-6011: build_scripts command of distutils handles correctly
    http://hg.python.org/cpython/rev/47236a0cfb15

    @python-dev
    Mannequin

    python-dev mannequin commented May 10, 2011

    New changeset fd7d4639dae2 by Victor Stinner in branch '3.1':
    Issue bpo-10419: Fix build_scripts command of distutils to handle correctly
    http://hg.python.org/cpython/rev/fd7d4639dae2

    @vstinner
    Member

    Issue fixed in Python 3.1, 3.2, 3.3.

    Thanks to Arfrever, I realized that this issue concerns not only the compilation of Python itself with a non-ASCII prefix (issue bpo-6011), but also the installation of any Python script containing a non-ASCII character. So I also fixed it in Python 3.1. I replaced os.fsencode(name) with name.encode(sys.getfilesystemencoding(), 'surrogateescape') in 3.1.

    @Arfrever
    Mannequin

    Arfrever mannequin commented May 11, 2011

    I have committed the fix for Distribute:
    https://bitbucket.org/tarek/distribute/changeset/97f12f8f6bf1

    (However Distribute would fail to create entry points scripts if sys.executable contained unencodable characters.)

    @python-dev
    Mannequin

    python-dev mannequin commented May 19, 2011

    New changeset cc5cfeaa4a8d by Victor Stinner in branch 'default':
    Issue bpo-10419, issue bpo-6011: port 6ad356525381 fix from distutils to packaging
    http://hg.python.org/cpython/rev/cc5cfeaa4a8d

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022