msg121207 - (view) |
Author: Hagen Fürstenau (hagen) |
Date: 2010-11-14 20:32 |
As suggested in issue 9561, I'm creating a new bug for the encoding problem in build_scripts: If a script file can't be decoded with the (locale-dependent) standard encoding, then "build_scripts" fails with UnicodeDecodeError. Reproducible e.g. with LANG=C and a script file containing non-ASCII chars near the beginning (so that they're read on a single readline()).
Attaching a patch that uses "surrogateescape", as proposed for issue 6011.
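To make the failure mode concrete, here is a minimal sketch (an illustration, not the attached patch). It uses an explicit "ascii" codec as a stand-in for what a LANG=C locale would select, and the script contents are made up for the example. Strict decoding raises UnicodeDecodeError, while "surrogateescape" maps each undecodable byte to a lone surrogate and restores the original bytes on encoding.

```python
# Bytes of a hypothetical script containing a non-ASCII character
# ("é", stored as the two UTF-8 bytes 0xC3 0xA9).
raw = "#!/usr/bin/python\n# caf\u00e9\nprint('hi')\n".encode("utf-8")

# Strict decoding with an ASCII codec (what LANG=C typically implies) fails.
try:
    raw.decode("ascii")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# With surrogateescape, each undecodable byte becomes a lone surrogate
# (0xC3 -> U+DCC3, 0xA9 -> U+DCA9) ...
text = raw.decode("ascii", errors="surrogateescape")

# ... and encoding with the same handler restores the original bytes.
restored = text.encode("ascii", errors="surrogateescape")

print(strict_failed, restored == raw)
```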
|
msg134630 - (view) |
Author: Éric Araujo (eric.araujo) * |
Date: 2011-04-27 23:38 |
I’m not sure how I feel about using surrogateescape. The distutils source is very similar across 2.7, 3.1, 3.2 and default, especially after the Great Revert and freeze last year to restore buggy-but-known behavior while the distutils2 project was created and allowed to fix things and break stuff. Haypo added a fix using surrogateescape in 3.2, so it couldn’t be backported to all stable branches. You may say that at least it was fixed in one version, which is something good. I don’t know if I’d prefer to apply the patch (if a test is provided) or to raise an exception instead of silently changing behavior.
|
msg134661 - (view) |
Author: Marc-Andre Lemburg (lemburg) * |
Date: 2011-04-28 08:20 |
I think this patch should be applied to all 3.x versions, since
all of them are affected by the same problem: reading a file with
unknown encoding, adding a shebang and writing it back again.
Python shouldn't really care about the script file's encoding and
since the "surrogateescape" error handler is the only way to
more or less cleanly get around the problem, I'm +1 on adding the
patch to the 3.x series.
I don't think this is needed for 2.7, since Python 2.x's open()
doesn't care about the file encoding anyway.
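The read, add-a-shebang, write-back cycle described above could look roughly like the following sketch. The function name copy_with_shebang, the file names and the interpreter path are hypothetical, and "ascii" again stands in for the locale encoding under LANG=C; this is not the code of the attached patch.

```python
import os
import tempfile

def copy_with_shebang(src, dst, interpreter):
    """Copy src to dst, replacing a '#!...python' first line (sketch only)."""
    # surrogateescape turns undecodable bytes into lone surrogates instead of
    # raising; newline="" keeps the original line endings untouched.
    with open(src, encoding="ascii", errors="surrogateescape", newline="") as f:
        first_line = f.readline()
        rest = f.read()
    if first_line.startswith("#!") and "python" in first_line:
        first_line = "#!" + interpreter + "\n"
    # Writing with the same codec and handler turns the surrogates back
    # into the original bytes.
    with open(dst, "w", encoding="ascii", errors="surrogateescape", newline="") as f:
        f.write(first_line + rest)

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "script.py")
    dst = os.path.join(tmp, "script.out")
    # 0xE9 is "é" in Latin-1; the script's encoding is unknown to the copier.
    original = b"#!/usr/bin/python\n# caf\xe9\nprint('hi')\n"
    with open(src, "wb") as f:
        f.write(original)
    copy_with_shebang(src, dst, "/opt/python/bin/python3")
    with open(dst, "rb") as f:
        result = f.read()
```

Everything after the first line comes back byte for byte. Note that in this sketch an interpreter path containing non-ASCII characters would still fail to encode on write.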
|
msg134678 - (view) |
Author: Arfrever Frehtes Taifersar Arahesis (Arfrever) * |
Date: 2011-04-28 14:29 |
Alternatively it's possible to use binary mode. I'm attaching the patch, which shows this possibility.
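As a rough sketch of the binary-mode idea (not the attached patch itself): the script is handled as bytes throughout, so no decoding happens at all. The pattern below is modelled on distutils' first_line_re, converted to bytes; the helper name adjust_shebang and the paths are made up for the example.

```python
import re

# Bytes version of a shebang pattern modelled on distutils' first_line_re.
FIRST_LINE_RE = re.compile(rb"^#!.*python[0-9.]*([ \t].*)?$")

def adjust_shebang(data: bytes, executable: bytes) -> bytes:
    """Replace a Python shebang in data, keeping any interpreter options."""
    first, sep, rest = data.partition(b"\n")
    match = FIRST_LINE_RE.match(first)
    if match is None:
        # Not a Python shebang: leave the script untouched.
        return data
    post_interp = match.group(1) or b""
    return b"#!" + executable + post_interp + sep + rest

script = b"#!/usr/bin/python -u\n# caf\xe9\nprint('hi')\n"
adjusted = adjust_shebang(script, b"/opt/py/bin/python3")
```

The trade-off is that first_line_re becomes a bytes pattern, which matters to any code that imports it and matches it against str.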
|
msg134680 - (view) |
Author: Éric Araujo (eric.araujo) * |
Date: 2011-04-28 14:48 |
Was the patch tested in 2.7 only? I think the first_line_re needs to be changed to bytes too. (3.x would have disallowed mixing bytes and str for a regex.)
|
msg134681 - (view) |
Author: Arfrever Frehtes Taifersar Arahesis (Arfrever) * |
Date: 2011-04-28 14:52 |
Which patch do you mean?
(My patch already changes first_line_re to bytes. My patch was tested only with 3.2. Lib/distutils/command/build_scripts.py is currently identical in 3.1, 3.2 and 3.3.)
|
msg134773 - (view) |
Author: Éric Araujo (eric.araujo) * |
Date: 2011-04-29 15:19 |
Indeed, I missed those two lines.
|
msg134894 - (view) |
Author: Arfrever Frehtes Taifersar Arahesis (Arfrever) * |
Date: 2011-04-30 23:43 |
Apparently setuptools.command.easy_install.get_script_header() imports distutils.command.build_scripts.first_line_re and checks if this regex matches a str object, which results in TypeError. If breaking compatibility is not acceptable, then the surrogateescape patch should be applied.
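The incompatibility can be reproduced in isolation with a small sketch. The pattern is a bytes version modelled on first_line_re and the sample line is made up; this does not import setuptools or distutils.

```python
import re

# A bytes pattern, as first_line_re would be after the binary-mode patch.
bytes_re = re.compile(rb"^#!.*python[0-9.]*([ \t].*)?$")
line = "#!/usr/bin/python\n"  # a str, as a caller expecting a str regex would pass

# Python 3 refuses to match a bytes pattern against a str.
try:
    bytes_re.match(line)
    error = None
except TypeError as exc:
    error = exc

# A caller can adapt by encoding the line before matching.
encoded_match = bytes_re.match(line.encode("utf-8"))
```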
|
msg134934 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2011-05-01 22:21 |
Hey, I already had this bug and I also wrote a patch: copy_script-2.patch attached to #6011. It is very similar to build_scripts-binary_mode.patch (read the file in binary mode to avoid the encode/decode dance). But it also checks that the path to the Python program is decodable from UTF-8 and from the script encoding.
Éric Araujo doesn't want to apply copy_script-2.patch on Python 3 before distutils2 is ported to Python 3 and included into Python (3.3): read msg124648. Five months later: distutils2 is not yet included in Python 3, the patch is not committed yet, and we now have a duplicate issue (and 3 patches for a single bug) :-)
This situation sucks. How can we move forward? What is the status of distutils2? Is it ported to Python 3? Is it ready for inclusion in Python 3?
Once distutils2 is part of Python 3.3, should we fix distutils bugs or not? I suppose that few people use Python 3.3, maybe because it will not be released before August 2012 (PEP 398) :-) So users will continue to have this bug until everybody moves to 3.3 (or later)...
I think that we should fix this bug today. I don't really care about distutils2 today because it is not yet part of Python.
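For illustration, the check mentioned at the start of this message (the interpreter path must be decodable from UTF-8 and from the script's own encoding) might be sketched as follows. This is an assumption about the general shape of such a check, not a copy of copy_script-2.patch; the helper name build_shebang and the paths are hypothetical.

```python
import io
import tokenize

def build_shebang(script_bytes: bytes, executable: bytes,
                  post_interp: bytes = b"") -> bytes:
    """Return a shebang line, or raise ValueError if it is not decodable."""
    # detect_encoding looks at a BOM or a PEP 263 coding cookie in the first
    # two lines and falls back to UTF-8. It raises SyntaxError for an invalid
    # cookie; that case is not handled in this sketch.
    encoding, _ = tokenize.detect_encoding(io.BytesIO(script_bytes).readline)
    shebang = b"#!" + executable + post_interp + b"\n"
    # First check: the shebang must be valid UTF-8.
    try:
        shebang.decode("utf-8")
    except UnicodeDecodeError:
        raise ValueError(
            "The shebang ({!r}) is not decodable from utf-8".format(shebang))
    # Second check: it must also be decodable with the script's encoding.
    try:
        shebang.decode(encoding)
    except UnicodeDecodeError:
        raise ValueError(
            "The shebang ({!r}) is not decodable from the script encoding "
            "({})".format(shebang, encoding))
    return shebang

shebang = build_shebang(b"print('hi')\n", b"/usr/bin/python3", b" -u")
```

Failing early with ValueError avoids writing a script whose first line the interpreter could not read back.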
|
msg134936 - (view) |
Author: Éric Araujo (eric.araujo) * |
Date: 2011-05-01 22:27 |
> Apparently setuptools.command.easy_install.get_script_header() imports
> distutils.command.build_scripts.first_line_re and checks if this regex
> matches a str object, which results in TypeError. If breaking
> compatibility is not acceptable, then the surrogateescape patch should
> be applied.
Setuptools is not compatible with 3.x TTBOMK; distribute is, but could
be fixed quickly, so there is no compat problem with this (these)
library(ries). However, the public/private status of first_line_re is
unclear, so there could be other projects out there depending on its
type. Given that there is already one patch in distutils that uses
surrogateescape, I think we could accept another similar patch.
|
msg134937 - (view) |
Author: Éric Araujo (eric.araujo) * |
Date: 2011-05-01 22:35 |
> [...] the patch is not committed yet,
> and we have now a duplicate issue (and 3 patches for a single bug) :-)
Feel free to close duplicate issues.
Looks like you’re not following PyCon reports, or Tarek’s mails to
python-dev. distutils2 has been ported to 3.3 under the name
“packaging”; there is a repo on bitbucket (tarek/cpython) with this
code. Tarek will produce a patch from this repo and push it to the main
repository soon.
Yes: we’ll fix bugs in packaging and distutils. Packaging releases will
be backported for 2.4-3.2 under the name “distutils2”.
|
msg134971 - (view) |
Author: Arfrever Frehtes Taifersar Arahesis (Arfrever) * |
Date: 2011-05-02 13:51 |
copy_script-2.patch uses os.fsencode(), which doesn't exist in Python 3.1.
|
msg134972 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2011-05-02 13:53 |
> copy_script-2.patch uses os.fsencode(), which doesn't exist in Python 3.1.
Correct, with Python 3.1, you can use filename.encode(sys.getfilesystemencoding(), 'surrogateescape'). But you must use os.fsencode() with Python >= 3.2 because on Windows, you cannot use surrogateescape with MBCS (you should use the strict error handler).
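A small compatibility helper along these lines could cover both cases. The name encode_filename is made up for this sketch; os.fsencode() and sys.getfilesystemencoding() are the real calls discussed above.

```python
import os
import sys

def encode_filename(name: str) -> bytes:
    """Encode a file name to bytes on Python 3.1 and on 3.2 or later."""
    if hasattr(os, "fsencode"):
        # Python >= 3.2: os.fsencode() chooses the right error handler
        # for the platform.
        return os.fsencode(name)
    # Python 3.1: no os.fsencode(), so encode by hand with surrogateescape.
    return name.encode(sys.getfilesystemencoding(), "surrogateescape")

encoded = encode_filename("python3")
```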
|
msg135374 - (view) |
Author: Arfrever Frehtes Taifersar Arahesis (Arfrever) * |
Date: 2011-05-06 22:15 |
Please commit any patch before releases of Python 3.1.4 and 3.2.1. (3.2.1 rc1 is planned on 2011-05-14.)
|
msg135749 - (view) |
Author: Roundup Robot (python-dev) |
Date: 2011-05-10 22:15 |
New changeset 6ad356525381 by Victor Stinner in branch 'default':
Close #10419, issue #6011: build_scripts command of distutils handles correctly
http://hg.python.org/cpython/rev/6ad356525381
|
msg135752 - (view) |
Author: Roundup Robot (python-dev) |
Date: 2011-05-10 22:32 |
New changeset 47236a0cfb15 by Victor Stinner in branch '3.2':
Close #10419, issue #6011: build_scripts command of distutils handles correctly
http://hg.python.org/cpython/rev/47236a0cfb15
|
msg135754 - (view) |
Author: Roundup Robot (python-dev) |
Date: 2011-05-10 22:59 |
New changeset fd7d4639dae2 by Victor Stinner in branch '3.1':
Issue #10419: Fix build_scripts command of distutils to handle correctly
http://hg.python.org/cpython/rev/fd7d4639dae2
|
msg135756 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2011-05-10 23:06 |
Issue fixed in Python 3.1, 3.2, 3.3.
Thanks to Arfrever, I realized that this issue concerns not only the compilation of Python itself with a non-ASCII prefix (issue #6011), but also the installation of any Python script containing a non-ASCII character. So I also fixed it in Python 3.1. I replaced os.fsencode(name) with name.encode(sys.getfilesystemencoding(), 'surrogateescape') in 3.1.
|
msg135786 - (view) |
Author: Arfrever Frehtes Taifersar Arahesis (Arfrever) * |
Date: 2011-05-11 17:10 |
I have committed the fix for Distribute:
https://bitbucket.org/tarek/distribute/changeset/97f12f8f6bf1
(However, Distribute would fail to create entry point scripts if sys.executable contained unencodable characters.)
|
msg136289 - (view) |
Author: Roundup Robot (python-dev) |
Date: 2011-05-19 13:18 |
New changeset cc5cfeaa4a8d by Victor Stinner in branch 'default':
Issue #10419, issue #6011: port 6ad356525381 fix from distutils to packaging
http://hg.python.org/cpython/rev/cc5cfeaa4a8d
|
|
Date | User | Action | Args
2022-04-11 14:57:08 | admin | set | github: 54628
2011-05-19 13:18:46 | python-dev | set | messages: + msg136289
2011-05-11 17:10:22 | Arfrever | set | messages: + msg135786
2011-05-10 23:06:21 | vstinner | set | messages: + msg135756
2011-05-10 22:59:44 | python-dev | set | messages: + msg135754
2011-05-10 22:32:15 | python-dev | set | messages: + msg135752
2011-05-10 22:15:39 | python-dev | set | status: open -> closed; nosy: + python-dev; messages: + msg135749; resolution: fixed; stage: resolved
2011-05-07 09:49:32 | vstinner | set | priority: normal -> release blocker; nosy: + benjamin.peterson, georg.brandl
2011-05-06 22:15:08 | Arfrever | set | messages: + msg135374
2011-05-02 13:53:21 | vstinner | set | messages: + msg134972
2011-05-02 13:51:24 | Arfrever | set | messages: + msg134971
2011-05-01 22:35:59 | eric.araujo | set | messages: + msg134937
2011-05-01 22:27:47 | eric.araujo | set | messages: + msg134936
2011-05-01 22:21:05 | vstinner | set | messages: + msg134934
2011-04-30 23:43:37 | Arfrever | set | messages: + msg134894
2011-04-29 15:19:19 | eric.araujo | set | messages: + msg134773
2011-04-28 14:52:40 | Arfrever | set | messages: + msg134681
2011-04-28 14:48:59 | eric.araujo | set | messages: + msg134680
2011-04-28 14:29:28 | Arfrever | set | files: + build_scripts-binary_mode.patch; messages: + msg134678; title: distutils command build_scripts fails with UnicodeDecodeError -> distutils command build_scripts fails with UnicodeDecodeError
2011-04-28 08:20:33 | lemburg | set | nosy: + lemburg; title: distutils command build_scripts fails with UnicodeDecodeError -> distutils command build_scripts fails with UnicodeDecodeError; messages: + msg134661
2011-04-27 23:38:56 | eric.araujo | set | versions: + 3rd party, Python 2.7; nosy: + alexis; messages: + msg134630; components: + Distutils2
2011-04-27 17:16:50 | Arfrever | set | nosy: + vstinner, Arfrever; versions: + Python 3.3
2011-02-04 03:44:00 | belopolsky | set | nosy: tarek, eric.araujo, hagen, mgorny; type: crash -> behavior
2010-11-15 08:13:31 | mgorny | set | nosy: + mgorny
2010-11-14 20:32:31 | hagen | create |