classification
Title: python doesn't build if prefix contains non-ascii characters
Type: behavior Stage:
Components: Distutils Versions: Python 3.1, Python 3.2, Python 2.7
process
Status: closed Resolution: fixed
Dependencies: 9841 Superseder:
Assigned To: haypo Nosy List: dstufft, haypo, merwok, nils, tarek, zegreek
Priority: normal Keywords: patch

Created on 2009-05-13 11:01 by zegreek, last changed 2014-03-24 21:44 by haypo. This issue is now closed.

Files
File name Uploaded Description
tb.txt zegreek, 2009-05-13 11:01 configure line and traceback
text_file.diff zegreek, 2009-05-13 11:03 patch to distutils.text_file.py
textio.diff zegreek, 2009-05-14 21:16 patch to Modules/_io/textio.c (alternative to the first)
distutils_makefile_encoding.patch haypo, 2010-09-10 23:32
copy_script.patch haypo, 2010-11-09 00:59
copy_script-2.patch haypo, 2010-12-25 22:16
test_distutils_surrogateescape.diff zegreek, 2011-05-13 13:11
Messages (47)
msg87674 - Author: Baptiste Carvello (zegreek) Date: 2009-05-13 11:01
I have tried to build python (version 3.1 beta 1) on linux and install
it to a non-standard prefix which contains non-ascii utf-8 characters
(my locale being utf-8). The build directory's path is ascii-only. The
exact configure line is given in the attached file 'tb.txt'.

Then the 'make' command fails at the stage where python extensions are
built, with the traceback displayed in file tb.txt (in short:
UnicodeDecodeError: 'ascii' codec can't decode byte ... ).

The problem is triggered when 'distutils.sysconfig.get_config_vars'
tries to parse the Makefile. The Makefile is opened with
'distutils.text_file.TextFile', which in turn calls 'io.open' with no
'encoding' parameter. At this stage of the build, the 'locale' module is
not available (because '_collections' is not available either), so
'locale.getpreferredencoding' cannot be called and the encoding falls
back to ascii (a quick look at 'Modules/_io/textio.c' suggests that this
fallback mechanism is already designed to be used at build time).

The solution I propose would be to use 'sys.getfilesystemencoding' as the
first fallback, as it is defined at build time on most systems:
Windows, Mac, and POSIX systems where 'CODESET' exists in 'langinfo.h'.
Given that in build routines non-ascii characters are only likely to be
encountered in filesystem paths, this seems a reasonable behavior.

The attached patch 'text_file.diff' implements this strategy in
'distutils.text_file', and then calls 'io.open' with the appropriate
'encoding' parameter. It could be argued, however, that this new
fallback is of general interest and should be implemented directly in
'Modules/_io/textio.c'. If you deem so, I could try to come up with a
new patch.

The attached patch solves the problem on my system, and does not
introduce test failures (which is expected, as the new fallback should
only make a difference at build time).
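A minimal sketch of the proposed fallback, written as a standalone helper rather than as a change to 'distutils.text_file' (the names `build_file_encoding` and `read_build_file` are hypothetical, and this is not the code in text_file.diff). It assumes that, as described above, the early-build failure shows up as an ImportError when the locale machinery is not yet importable:

```python
import io
import sys


def build_file_encoding():
    """Hypothetical helper: choose the encoding used to read build files
    such as the Makefile."""
    try:
        # Normally the locale encoding is the right choice...
        import locale
        return locale.getpreferredencoding(False)
    except ImportError:
        # ...but early in the build 'locale' may not be importable, so fall
        # back to the filesystem encoding. It may be None on Python < 3.2,
        # hence the final ASCII fallback.
        return sys.getfilesystemencoding() or 'ascii'


def read_build_file(path):
    """Read a build file with an explicit encoding instead of relying on
    the default chosen inside io.open()."""
    with io.open(path, encoding=build_file_encoding()) as f:
        return f.read()
```

Passing the encoding explicitly leaves the default behavior of io.open() itself untouched; only the caller that reads build files changes.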

Cheers,
Baptiste
msg87675 - Author: Baptiste Carvello (zegreek) Date: 2009-05-13 11:03
And here comes the patch
msg87766 - Author: Baptiste Carvello (zegreek) Date: 2009-05-14 21:16
OK, here is also the patch to 'Modules/_io/textio.c', as it is in fact
quite trivial. Choose which one you prefer :-)

Baptiste
msg87772 - Author: Tarek Ziadé (tarek) * (Python committer) Date: 2009-05-14 21:58
Thanks, I'll hopefully work on this over the weekend.
msg114568 - Author: STINNER Victor (haypo) * (Python committer) Date: 2010-08-21 21:42
This issue may be related to #9561.
msg114571 - Author: STINNER Victor (haypo) * (Python committer) Date: 2010-08-21 21:46
Changing the heuristic that _io.TextIOWrapper() uses to choose the encoding is a bad idea.
msg116060 - Author: STINNER Victor (haypo) * (Python committer) Date: 2010-09-10 23:32
New patch:
 - add encoding option to TextFile constructor
 - parse_makefile() uses the heuristic from text_file.diff

Note: sys.getfilesystemencoding() is always set in Python 3.2 (but it may be None in Python 2.x and Python < 3.2).
msg116087 - Author: Baptiste Carvello (zegreek) Date: 2010-09-11 10:21
Hello,

I just tried your patch on the latest svn (r84707), but I found that the problem I reported can no longer be reproduced. First, '_locale' now seems to be built earlier. Also, a fallback has been introduced in 'locale.getpreferredencoding': when '_locale' cannot be imported, the encoding is now parsed from the environment variables (cf. 'Lib/locale.py', line 558 and below). It looks like 'locale.getpreferredencoding' is now no more likely to fail than 'sys.getfilesystemencoding'. So I'm not sure a patch is still needed at all.

In case a patch still makes sense, note that there is now also a 'Lib/sysconfig.py', which also has a '_parse_makefile' function. This function uses logic similar to the one in 'Lib/distutils/sysconfig.py', even though it uses the builtin 'open', so it would need the same fix, if one is needed.

Cheers, B.
msg116263 - Author: Éric Araujo (merwok) * (Python committer) Date: 2010-09-13 01:01
Victor: I would like #9841 to be solved first so that we’re sure what file(s) need to be patched (and where to add a test for parsing).

Baptiste: Can you try to reproduce your bug with 2.7 and 3.1?

Tarek: I can take this over if you want.
msg116297 - Author: Baptiste Carvello (zegreek) Date: 2010-09-13 12:17
Eric: the bug does not exist with 2.7, as the Makefile is read as bytes. It exists with 3.1.2.

By the way, when I say the bug is solved on 3.2, I only mean the narrow problem of using a non-ascii prefix that *is* decodable with the current locale. I do not mean the more general problem that arises, for example, when building with the 'C' locale, as discussed in issue9561. With a 'C' locale, the build also fails with 3.2.
msg116390 - Author: STINNER Victor (haypo) * (Python committer) Date: 2010-09-14 12:02
For a non-ascii directory name with an ascii locale encoding (e.g. the C locale), we have 3 choices:
 a- read the Makefile as a binary file
 b- use PEP 383
 c- refuse to compile

(a) doesn't seem easy because it looks like distutils uses the unicode type for all paths. (b) requires patching distutils so that reading (and writing?) the Makefile uses errors='surrogateescape'.

About (c), it could be a temporary solution. But I also think that a non-ascii directory name combined with an ascii locale encoding is a rare use case.
msg116391 - Author: STINNER Victor (haypo) * (Python committer) Date: 2010-09-14 12:04
Warning: "use PEP 383" may impact other distutils components, because the path may be written into other files, which means that we would have to use errors='surrogateescape' for those files too.
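A small illustration of option (b) and of the warning above, assuming an ASCII locale encoding and a UTF-8 encoded prefix in the Makefile (the sample line and the helper names are made up). Decoding with errors='surrogateescape' maps each undecodable byte to a lone surrogate, and encoding with the same handler restores the original bytes; a strict encode of the same text fails, which is why every file the path ends up in needs the handler too:

```python
# Made-up Makefile line with a UTF-8 encoded, non-ASCII prefix.
RAW_LINE = b'prefix=\t/home/user/T\xc3\xa9l\xc3\xa9chargements/PyInstall\n'


def decode_makefile_line(raw, encoding='ascii'):
    """PEP 383: undecodable bytes become lone surrogates U+DC80..U+DCFF."""
    return raw.decode(encoding, 'surrogateescape')


def encode_makefile_line(text, encoding='ascii'):
    """Encoding with the same error handler gives back the original bytes."""
    return text.encode(encoding, 'surrogateescape')


def strict_encode_fails(text, encoding='ascii'):
    """Writing the decoded text with the default 'strict' handler fails."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return True
    return False


text = decode_makefile_line(RAW_LINE)
print(encode_makefile_line(text) == RAW_LINE)  # True
print(strict_encode_fails(text))               # True
```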
msg119074 - Author: Éric Araujo (merwok) * (Python committer) Date: 2010-10-18 20:19
+1 for distutils_makefile_encoding.patch.  The doc is not updated because it does not exist, so that’s okay; tests for the new behavior are missing, though.
msg119102 - Author: STINNER Victor (haypo) * (Python committer) Date: 2010-10-19 02:03
I'm not sure that the patch is still needed on Python 3.2.
msg119353 - Author: Éric Araujo (merwok) * (Python committer) Date: 2010-10-21 23:43
I think it is:

$ pwd
/tmp/éric
$ LC_ALL=C ./python
Fatal Python error: Py_Initialize: Unable to get the locale encoding
SystemError: NULL result without error in PyObject_Call
Abandon
msg119368 - Author: STINNER Victor (haypo) * (Python committer) Date: 2010-10-22 08:51
> $ LC_ALL=C ./python
> Fatal Python error: Py_Initialize: Unable to get the locale encoding
> SystemError: NULL result without error in PyObject_Call
> Abandon

What is your Python version? I fixed Python 3.2, but I don't plan to fix Python 3.1 for this problem (Python doesn't work if it is installed in a non-ascii directory).
msg119383 - Author: Éric Araujo (merwok) * (Python committer) Date: 2010-10-22 14:23
I’m trying to build py3k on posix in a subdir called sep-build-dir-éric, with locale set to C.  I get these errors:
gcc [...] -DSVNVERSION="\"`LC_ALL=C svnversion ..`\"" -o Modules/getbuildinfo.o ../Modules/getbuildinfo.c
svn: Error converting entry in directory '..' to UTF-8
svn: Can't convert string from native encoding to 'UTF-8':
svn: sep-build-dir-?\195?\169ric
[...]
gcc -pthread   -Xlinker -export-dynamic -o python Modules/python.o libpython3.2m.a -lpthread -ldl  -lutil   -lm  
Could not find platform dependent libraries <exec_prefix>
Consider setting $PYTHONHOME to <prefix>[:<exec_prefix>]
Fatal Python error: Py_Initialize: Unable to get the locale encoding
SystemError: NULL result without error in PyObject_Call
Aborted
make: *** [sharedmods] Error 134
msg119389 - Author: Éric Araujo (merwok) * (Python committer) Date: 2010-10-22 15:12
Building in the same directory works.
msg119409 - Author: STINNER Victor (haypo) * (Python committer) Date: 2010-10-23 00:13
> I’m trying to build py3k on posix in a subdir called
> sep-build-dir-éric, with locale set to C.

Ah yes, this particular use case doesn't work: r85800 should fix it. Please retry.
msg119420 - Author: Éric Araujo (merwok) * (Python committer) Date: 2010-10-23 08:30
Same errors.
msg119449 - Author: STINNER Victor (haypo) * (Python committer) Date: 2010-10-23 17:04
> Same errors.

Please describe exactly how you reproduced the error (write each command).

r85805 fixes another bug related to this problem. It is a better fix than distutils_makefile_encoding.patch: it uses the surrogateescape error handler to decode the Makefile.
msg119847 - Author: Éric Araujo (merwok) * (Python committer) Date: 2010-10-29 03:37
I can’t reliably reproduce it, but here you go:

$ pwd
/home/wok/python/3.2/sep-build-dir-éric♥
$ ../configure --prefix $PWD
[okay]
$ make
[snip gcc and ar]
ranlib libpython3.2m.a
gcc -pthread   -Xlinker -export-dynamic -o python Modules/python.o libpython3.2m.a -lpthread -ldl  -lutil   -lm  
Could not find platform dependent libraries <exec_prefix>
Consider setting $PYTHONHOME to <prefix>[:<exec_prefix>]
Fatal Python error: Py_Initialize: Unable to get the locale encoding
SystemError: NULL result without error in PyObject_Call
Aborted
make: *** [sharedmods] Error 134

Setting PYTHONHOME to ., .:., $PWD or $PWD:$PWD did not help.
msg119848 - Author: Éric Araujo (merwok) * (Python committer) Date: 2010-10-29 03:39
Of course, first set LANG and LC_ALL to C and export them.
msg119891 - Author: Baptiste Carvello (zegreek) Date: 2010-10-29 12:17
Hello,

I can reproduce the exact same error as Éric. The end of the output is a little bit more informative here:

Could not find platform dependent libraries <exec_prefix>
Consider setting $PYTHONHOME to <prefix>[:<exec_prefix>]
Fatal Python error: Py_Initialize: Unable to get the locale encoding
SystemError: NULL result without error in PyObject_Call
/bin/sh: line 1: 13513 Aborted                 CC='gcc -pthread' LDSHARED='gcc -pthread -shared  ' OPT='-DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes' ./python -E ./setup.py build
make: *** [sharedmods] Error 134

I only kept the part of the log that is output if I rerun "make". The number 13513 must be a PID; it is not stable across invocations.

Running just "./python -E ./setup.py build", or just "./python setup.py build", also aborts with

Could not find platform dependent libraries <exec_prefix>
Consider setting $PYTHONHOME to <prefix>[:<exec_prefix>]
Fatal Python error: Py_Initialize: Unable to get the locale encoding
SystemError: NULL result without error in PyObject_Call
Aborted

The instructions to reproduce, used in my home directory (which has an ASCII path), are:

svn co http://svn.python.org/projects/python/branches/py3k
cd py3k/
export LC_ALL=C
export LANG=C
./configure --prefix=/home/baptiste/Desktop/Téléchargements/PyInstall
make
make
./python -E ./setup.py build
./python setup.py build

The svn revision pulled was 85926.

Hope this helps,

Baptiste
msg119894 - Author: Baptiste Carvello (zegreek) Date: 2010-10-29 12:52
A little bit more information:

the error message comes from Python/pythonrun.c, line 736, in function initfsencoding.

This part of the code is protected with a preprocessor #if:

#if defined(HAVE_LANGINFO_H) && defined(CODESET)

so I tried replacing that with #if 0. However, the function then fails on line 750. The comment on line 749 states:

/* Such error can only occurs in critical situations: no more
 * memory, import a module of the standard library failed,
 * etc. */

It looks like that is not the case, and that Py_FileSystemDefaultEncoding has no reasonable default when "python" is called from the build directory with the "C" locale.

For the record, when running the system "python" with the "C" locale, the filesystem encoding gets set to 'ANSI_X3.4-1968'.

Last thing, in case it is of any use: all my testing is done on an amd64 Debian stable system.
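'ANSI_X3.4-1968' is the glibc name of the ASCII codeset reported for the C locale, and Python resolves it to its ascii codec. A quick check, using the prefix from the reproduction steps above, shows why such a prefix cannot be represented in that encoding (`encodable` is a hypothetical helper, not part of the build):

```python
import codecs

# The codeset name seen above for the C locale.
C_LOCALE_CODESET = 'ANSI_X3.4-1968'
PREFIX = '/home/baptiste/Desktop/Téléchargements/PyInstall'


def encodable(text, codeset):
    """Return True if *text* can be encoded strictly in *codeset*."""
    try:
        text.encode(codeset)
    except UnicodeEncodeError:
        return False
    return True


print(codecs.lookup(C_LOCALE_CODESET).name)  # ascii
print(encodable(PREFIX, C_LOCALE_CODESET))   # False
```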

Cheers,
Baptiste
msg120178 - Author: STINNER Victor (haypo) * (Python committer) Date: 2010-11-01 21:22
I reproduced the problem with the following commands:

cd py3k
export LANG=C
export LC_ALL=C
make distclean
./configure --with-pydebug --prefix=/home/haypo/tmp/py3ké
make

It looks like the problem is that the srcdir variable of Makefile.pre is ".". In this case, the VPATH variable is not set, and so calculate_path() fails to retrieve the source code directory.

The configure script contains a strange comment (even though I cannot find "VPATH" in configure.in):

# VPATH may cause trouble with some makes, so we remove sole $(srcdir),
# ${srcdir} and @srcdir@ entries from VPATH if srcdir is ".", strip leading and
# trailing colons and then remove the whole line if VPATH becomes empty
# (actually we leave an empty line to preserve line numbers).
msg120818 - Author: STINNER Victor (haypo) * (Python committer) Date: 2010-11-08 23:07
> Fatal Python error: Py_Initialize: Unable to get the locale encoding
> SystemError: NULL result without error in PyObject_Call

Gotcha! r86341 fixes PyUnicode_EncodeFS(): it now raises an error if _Py_char2wchar_() fails.

The real problem is that PREFIX is not decoded using _Py_charw2char(), but using a C hack: >L"" PREFIX<. It should use _Py_charw2char() as I did for VPATH in r85800.
msg120821 - Author: Éric Araujo (merwok) * (Python committer) Date: 2010-11-08 23:25
I should have worn my Stupid Hat today, because I’ve made stupid commits and emails.  I hadn’t updated my checkout when I made my test.  The bad news is that I still have a bug after svn up:

gcc -pthread   -Xlinker -export-dynamic -o python Modules/python.o libpython3.2dmu.a -lpthread -ldl  -lutil   -lm  
Fatal Python error: Py_Initialize: Unable to get the locale encoding
UnicodeEncodeError: 'filesystemencoding' codec can't encode character '\xe9' in position 35: Invalid or incomplete multibyte or wide character
Aborted
make: *** [sharedmods] Error 134
msg120827 - Author: STINNER Victor (haypo) * (Python committer) Date: 2010-11-09 00:30
> The real problem is that PREFIX is not decoded using _Py_charw2char() ...

Fixed by r86345.
msg120829 - Author: STINNER Victor (haypo) * (Python committer) Date: 2010-11-09 00:59
Now I get an error in the copy_scripts() function of Lib/distutils/command/build_scripts.py: this function adjusts the Python shebang of installed Python scripts. The problem is that the shebang contains a non-ASCII character, whereas the script is written using the locale encoding, which is ASCII in my test.

test_httpservers has a similar issue: CGIHTTPServerTestCase fails if the full path of the Python executable is not encodable to utf-8. I simply fixed this issue by skipping the test:

    try:
        # The python executable path is written as the first line of the
        # CGI Python script. The encoding cookie cannot be used, and so the
        # path should be encodable to the default script encoding (utf-8)
        self.pythonexe.encode('utf-8')
    except UnicodeEncodeError:
        self.tearDown()
        raise self.skipTest(
            "Python executable path is not encodable to utf-8")

--

The attached patch, copy_script.patch, fixes this issue by reading the Python script in binary mode. It ensures that the shebang is decodable from both utf-8 and the script encoding.

It checks with utf-8 because the shebang is always written before the encoding cookie, so the parser reads the shebang with its default encoding, which is utf-8. For example, with a shebang not decodable from utf-8:
---
  File "./test.py", line 1
SyntaxError: Non-UTF-8 code starting with '\xff' in file ./blo.py on line 1, but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details
---

It checks with the script encoding because, when the parser hits the encoding cookie, it restarts reading the whole file with that encoding, and so it decodes the shebang from the script encoding.
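A rough sketch of the approach described above: the script is handled as bytes, its encoding is taken from the coding cookie, and the new shebang is checked against both UTF-8 and that encoding. `rewrite_shebang` is a hypothetical name and this is not the code in copy_script.patch; it assumes the script already starts with a shebang line and uses tokenize.detect_encoding() as a stand-in for the parser's own cookie lookup:

```python
import io
import tokenize


def rewrite_shebang(script, executable):
    """Hypothetical sketch: replace the shebang of *script* (bytes) with one
    pointing at *executable* (bytes), refusing shebangs the Python parser
    could not read back."""
    lines = script.splitlines(keepends=True)
    if not lines or not lines[0].startswith(b'#!'):
        raise ValueError('script does not start with a shebang line')
    # PEP 263: look for a coding cookie on line 1 or 2; detect_encoding()
    # returns 'utf-8' when there is none.
    encoding, _ = tokenize.detect_encoding(io.BytesIO(script).readline)
    shebang = b'#!' + executable + b'\n'
    # Line 1 is read as UTF-8 before any cookie is seen; if there is a
    # cookie, the whole file (shebang included) is re-read in that encoding.
    for enc in ('utf-8', encoding):
        try:
            shebang.decode(enc)
        except UnicodeDecodeError:
            raise ValueError('shebang %r is not decodable from %s'
                             % (shebang, enc)) from None
    # Everything after the first line is copied byte for byte.
    return shebang + b''.join(lines[1:])
```

Working on bytes means the body of the script is never decoded or re-encoded, so its own encoding does not matter beyond the shebang check.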
msg121551 - Author: Éric Araujo (merwok) * (Python committer) Date: 2010-11-19 16:49
My build error seems actually unrelated to encoding issues.  Working directory is ASCII-only, locale is UTF-8.

$ ./configure --with-pydebug
[snip]
$ make
[snip]
ranlib libpython3.2dm.a
gcc -pthread   -Xlinker -export-dynamic -o python Modules/python.o libpython3.2dm.a -lpthread -ldl  -lutil   -lm  
Could not find platform dependent libraries <exec_prefix>
Consider setting $PYTHONHOME to <prefix>[:<exec_prefix>]
Segmentation fault
make: *** [sharedmods] Erreur 139
msg121564 - Author: STINNER Victor (haypo) * (Python committer) Date: 2010-11-19 20:11
> My build error seems actually unrelated to encoding issues.  Working
> directory is ASCII-only, locale is UTF-8.
> 
> $ ./configure --with-pydebug
> [snip]
> $ make
> [snip]
> ranlib libpython3.2dm.a
> gcc -pthread   -Xlinker -export-dynamic -o python Modules/python.o
> libpython3.2dm.a -lpthread -ldl  -lutil   -lm Could not find platform
> dependent libraries <exec_prefix>
> Consider setting $PYTHONHOME to <prefix>[:<exec_prefix>]
> Segmentation fault
> make: *** [sharedmods] Erreur 139

Can you retry in gdb to dump the backtrace?

Maybe try to clean up your local copy with "make distclean".

As expected, I cannot reproduce your bug. Please give all the commands needed to reproduce the bug, and as much information as possible.
msg121566 - Author: Éric Araujo (merwok) * (Python committer) Date: 2010-11-19 20:21
I will try tomorrow, thanks for reminding me.

That was a fresh clone.

I did.
msg121810 - Author: Éric Araujo (merwok) * (Python committer) Date: 2010-11-20 23:04
Regarding your fix to copy_script, I will have to ask python-dev about PEP 291 in py3k (i.e., should 3.2 code really be compatible with 2.3?).
msg124571 - Author: Éric Araujo (merwok) * (Python committer) Date: 2010-12-23 21:55
I can’t reproduce the crash when building in the source dir (and tests pass except for ctypes because its configure script refuses my directory path), and I can’t build in a subdir*, so this bug looks fixed.

* Error messages:
gcc: Parser/tokenizer_pgen.o: No such file or directory
gcc: Parser/printgrammar.o: No such file or directory
gcc: Parser/pgenmain.o: No such file or directory
msg124597 - Author: Baptiste Carvello (zegreek) Date: 2010-12-24 14:46
Hello,

the patch solves the bug for me as well (using locale "C", the filesystem encoding is utf-8). However, I do not understand why the patch checks that the shebang line decodes with both utf-8 and the file's encoding. The shebang line is only used by the kernel to locate the interpreter, so neither of these should matter. Or have I misunderstood the patch?

Cheers,
Baptiste
msg124646 - Author: STINNER Victor (haypo) * (Python committer) Date: 2010-12-25 21:46
On Friday, 24 December 2010 at 14:46 +0000, Baptiste Carvello wrote:
> the patch solves the bug for me as well (using locale "C", the 
> filesystem encoding is utf-8). However, I do not understand why the 
> patch checks that the shebang line decodes with both utf-8 and the 
> file's encoding. The shebang line is only used by the kernel to locate 
> the interpreter, so none of these should matter. Or have I misuderstood 
> the patch?

The shebang is read by 3 different readers:

 a) the shell reads the first line: if it starts with "#!", it's a
shebang: it reads the command and options and executes them
 b) Python searches for a "#coding:xxx" cookie in the first or the second
line using a binary parser
 c) Python reads the file using the Python encoding: the encoding written in
the #coding:xxx header, or UTF-8 by default

(a) The shell reads the file as a binary file; it doesn't care about the
encoding. It reads byte strings and passes them to the kernel.

(b) The parser starts with the default encoding, UTF-8. Even if the file
encoding is not UTF-8, all lines before the #coding:xxx cookie (Python only
checks for the cookie in the first or the second line) are read as UTF-8.
The shebang has to be written on the first line, so the cookie cannot
be written before the shebang => the shebang has to be decodable from
UTF-8.

(c) If the file encoding is not UTF-8, a #coding:xxx cookie is used and the
whole file (including the shebang) has to be decodable from this
encoding => the shebang has to be decodable from the file encoding.

So the shebang has to be decodable from UTF-8 and from the file
encoding.
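Reader (b) can be observed from Python with the standard library's tokenize.detect_encoding(), which follows the same PEP 263 rules: it only looks at the first two lines and decodes them as UTF-8 while searching for the cookie. The scripts below are made-up examples, and tokenize is only a stand-in for the C parser here:

```python
import io
import tokenize


def script_encoding(source):
    """Return the encoding PEP 263 assigns to *source* (bytes)."""
    return tokenize.detect_encoding(io.BytesIO(source).readline)[0]


# Cookie on line 2, right after the shebang: honoured.
print(script_encoding(b'#!/usr/bin/python3\n# coding: latin-1\n'))  # iso-8859-1
# Cookie on line 3: too late, the default UTF-8 is used.
print(script_encoding(b'#!/usr/bin/python3\nimport sys\n# coding: latin-1\n'))  # utf-8

# A shebang that is not valid UTF-8 is rejected before the cookie on
# line 2 is ever reached.
try:
    script_encoding(b'#!/home/haypo/tmp/py3k\xff/bin/python3.2\n# coding: latin-1\n')
except SyntaxError as exc:
    print('SyntaxError:', exc)
```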

I should maybe add a comment about that in the patch.

Example of (b) issue:
---
$ ./build/scripts-3.2/2to3
  File "./build/scripts-3.2/2to3", line 1
SyntaxError: Non-UTF-8 code starting with '\xff' in file ./build/scripts-3.2/2to3 on line 1, but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details
---
The shebang is b'#!/home/haypo/tmp/py3k\xff/bin/python3.2\n', my locale
encoding is UTF-8, and the file has no encoding cookie (it is
encoded in UTF-8).

--

copy_script.patch fixes the issue when the configure prefix is not ASCII
(especially when the prefix is not decodable from UTF-8).
msg124647 - Author: STINNER Victor (haypo) * (Python committer) Date: 2010-12-25 22:16
Updated copy_script patch: add comments to explain why the shebang has to be decodable from UTF-8 and from the script encoding.
msg124648 - Author: Éric Araujo (merwok) * (Python committer) Date: 2010-12-25 22:35
Baptiste: I meant that I couldn’t reproduce the bug, not that the patch had solved it.

Victor: Your patch uses os.fsencode, so porting to distutils2 won’t be easy.  Tarek and I have instated this policy http://wiki.python.org/moin/Distutils/FixingBugs to make sure we don’t have to reopen 50 bugs when distutils2 gets into the stdlib.

d2 has to stay compatible with 2.4-2.7, and with 3.1-3.2 in the near future.
msg124650 - Author: STINNER Victor (haypo) * (Python committer) Date: 2010-12-25 22:56
On Saturday, 25 December 2010 at 22:35 +0000, Éric Araujo wrote:
> Victor: Your patch uses os.fsencode, so porting to distutils2 won’t be
> easy.

In Python 3.1, you can replace name=os.fsencode(name) with
name=name.encode(sys.getfilesystemencoding(), 'surrogateescape'). The
mbcs codec doesn't support surrogateescape: in Python 3.2 it now raises
an error (and so os.fsencode() uses the strict error handler if the
encoding is mbcs), whereas Python 3.1 just ignores the error handler (it
uses 'ignore' to encode).

In Python 2, self.executable is already a byte string (you don't need
os.fsencode()).
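A sketch of the compatibility shim this suggests, for code that has to run on Python 2, 3.1 and 3.2+. The helper name `fsencode_compat` is hypothetical, and the comment about mbcs restates the caveat above rather than anything tested here:

```python
import os
import sys


def fsencode_compat(name):
    """Hypothetical helper: encode a path for the filesystem the way
    os.fsencode() does, with the fallbacks described above."""
    if isinstance(name, bytes):
        # Python 2 style: the path is already a byte string.
        return name
    if hasattr(os, 'fsencode'):
        # Python 3.2+.
        return os.fsencode(name)
    # Python 3.1 fallback. On Windows (mbcs), 3.1 reportedly ignores the
    # error handler, so undecodable characters may be silently dropped.
    return name.encode(sys.getfilesystemencoding(), 'surrogateescape')
```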
msg124651 - Author: Éric Araujo (merwok) * (Python committer) Date: 2010-12-25 22:58
I suggest you wait for distutils2 to be 3.x-compatible and then adapt your patch to fix the bug when used with 3.2, keeping backward compat.
msg127016 - Author: Nils Philippsen (nils) Date: 2011-01-25 15:33
NB: it's not the shell but the kernel that interprets the shebang line (and if the shebang line is missing, the script ends up being run by /bin/sh, causing funny effects when it reaches the first import line and you happen to have ImageMagick installed).
msg135750 - Author: STINNER Victor (haypo) * (Python committer) Date: 2011-05-10 22:23
New changeset 6ad356525381 by Victor Stinner in branch 'default':
Close #10419, issue #6011: build_scripts command of distutils handles correctly
http://hg.python.org/cpython/rev/6ad356525381
msg135755 - Author: STINNER Victor (haypo) * (Python committer) Date: 2011-05-10 23:04
I fixed #10419 in Python 3.2 and 3.3 (I applied my copy_script-2.patch fix). It is now possible to compile and install Python 3.2 and 3.3 with a non-ASCII prefix, so this issue can be closed.

If you have issues when compiling Python with a non-ASCII prefix (and a locale encoding different from UTF-8), reopen the issue or open a more specific one.

--

I don't want to backport fixes for this issue to Python 3.1, because it would take too much effort to make Python 3.1 handle non-ASCII paths correctly (not only for this specific issue). I consider Python 3.1 a stable release that should not be touched too much, and a non-ASCII path with a locale encoding different from UTF-8 is a corner case. If you have this issue with Python 3.1, please upgrade to Python 3.2 :-)
msg135903 - Author: Baptiste Carvello (zegreek) Date: 2011-05-13 13:11
Indeed, I retried with 534a9e274d88 (that was the tip of 3.2 sometime yesterday) and my original problem is solved. Thank you.

While I was at it, I ran "make test", and got 3 unusual skips and 1 failure.

The skips are test_sax, test_xml_etree and test_xml_etree_c, and they are skipped on purpose when the example XML filename is not encodable to utf8. No problem here.

The failure is in test_distutils. 3 individual tests are failing: test_simple_built, test_debug_mode and test_record. The cause of this failure is that the "install" command installs a test distribution to a path containing sys.prefix. This is not a problem per se, but later test_simple_built tries to zip this distribution and cannot construct a valid archive name. A similar problem happens when test_record tries to write the distribution's filenames to a record file (and test_debug_mode fails because of test_record).

Imho those failures cannot be fixed, so the only possible improvement is to skip those tests. The attached trivial patch does just that, but I'm not sure it's worth patching distutils for that.
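A minimal sketch of skipping such tests, in the spirit of the test_httpservers check quoted earlier in this thread. This is not the attached patch; `encodable` and `InstallPathTests` are hypothetical names. It assumes sys.prefix was decoded with surrogateescape, in which case its undecodable bytes appear as lone surrogates that a strict UTF-8 encode rejects:

```python
import sys
import unittest


def encodable(path, encoding='utf-8'):
    """Return True if *path* can be encoded strictly with *encoding*.
    Lone surrogates from surrogateescape decoding make this fail."""
    try:
        path.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


class InstallPathTests(unittest.TestCase):
    """Hypothetical test case for code that writes install paths to files."""

    @unittest.skipUnless(encodable(sys.prefix),
                         'sys.prefix is not encodable to utf-8')
    def test_record_line(self):
        # Stand-in for writing an installed file name to a record file.
        line = (sys.prefix + '\n').encode('utf-8')
        self.assertTrue(line.endswith(b'\n'))
```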

Cheers,
Baptiste
msg214727 - Author: Éric Araujo (merwok) * (Python committer) Date: 2014-03-24 21:19
Victor, was this ticket kept open only for the backport to distutils2?  If everything is fixed in Python stdlib and docs, then it could be closed, as distutils2 development has stopped.
msg214738 - Author: STINNER Victor (haypo) * (Python committer) Date: 2014-03-24 21:44
> Victor, was this ticket kept open only for the backport to distutils2?

I have no idea :) It's probably fine now. Open a fresh issue if it's not fixed yet; this issue has too long a history.
History
Date User Action Args
2014-03-24 21:44:06 haypo set status: open -> closed
    resolution: fixed
    messages: + msg214738
2014-03-24 21:19:29 merwok set versions: - 3rd party
    nosy: + dstufft
    messages: + msg214727
    components: - Distutils2
2013-10-13 17:45:53 georg.brandl set status: open
2011-05-13 14:21:43 haypo set status: closed -> (no value)
    resolution: fixed -> (no value)
2011-05-13 13:19:19 merwok set assignee: tarek -> haypo
2011-05-13 13:11:57 zegreek set files: + test_distutils_surrogateescape.diff
    messages: + msg135903
2011-05-10 23:04:43 haypo set status: open -> closed
    resolution: fixed
    messages: + msg135755
2011-05-10 22:23:14 haypo set messages: + msg135750
2011-01-25 15:33:04 nils set nosy: haypo, tarek, merwok, zegreek, nils
    messages: + msg127016
2010-12-25 22:58:54 merwok set nosy: haypo, tarek, merwok, zegreek, nils
    messages: + msg124651
2010-12-25 22:56:50 haypo set nosy: haypo, tarek, merwok, zegreek, nils
    messages: + msg124650
2010-12-25 22:35:27 merwok set nosy: haypo, tarek, merwok, zegreek, nils
    messages: + msg124648
2010-12-25 22:16:28 haypo set files: + copy_script-2.patch
    nosy: haypo, tarek, merwok, zegreek, nils
    messages: + msg124647
2010-12-25 21:46:46 haypo set nosy: haypo, tarek, merwok, zegreek, nils
    messages: + msg124646
2010-12-24 14:46:33 zegreek set nosy: haypo, tarek, merwok, zegreek, nils
    messages: + msg124597
2010-12-23 21:55:12 merwok set nosy: haypo, tarek, merwok, zegreek, nils
    messages: + msg124571
2010-11-20 23:04:15 merwok set messages: + msg121810
2010-11-19 20:21:58 merwok set messages: + msg121566
2010-11-19 20:11:12 haypo set messages: + msg121564
2010-11-19 16:49:57 merwok set messages: + msg121551
2010-11-09 00:59:23 haypo set files: + copy_script.patch
    messages: + msg120829
2010-11-09 00:30:10 haypo set messages: + msg120827
2010-11-08 23:26:07 merwok set messages: - msg120820
2010-11-08 23:25:49 merwok set messages: + msg120821
2010-11-08 23:21:28 merwok set messages: + msg120820
2010-11-08 23:07:24 haypo set messages: + msg120818
2010-11-01 21:22:16 haypo set messages: + msg120178
2010-10-29 12:52:26 zegreek set messages: + msg119894
2010-10-29 12:17:59 zegreek set messages: + msg119891
2010-10-29 03:39:05 merwok set messages: + msg119848
2010-10-29 03:37:42 merwok set messages: + msg119847
2010-10-23 17:04:22 haypo set messages: + msg119449
2010-10-23 08:30:30 merwok set messages: + msg119420
2010-10-23 00:13:53 haypo set messages: + msg119409
2010-10-22 15:12:22 merwok set messages: + msg119389
2010-10-22 14:23:12 merwok set messages: + msg119383
2010-10-22 08:51:53 haypo set messages: + msg119368
2010-10-21 23:43:08 merwok set messages: + msg119353
2010-10-19 02:03:11 haypo set messages: + msg119102
2010-10-18 20:19:06 merwok set messages: + msg119074
2010-10-05 11:08:39 nils set nosy: + nils
2010-09-29 23:44:43 merwok set versions: + 3rd party
2010-09-14 12:04:40 haypo set messages: + msg116391
2010-09-14 12:02:45 haypo set messages: + msg116390
2010-09-13 12:17:12 zegreek set messages: + msg116297
2010-09-13 01:01:19 merwok set nosy: haypo, tarek, merwok, zegreek
    dependencies: + sysconfig and distutils.sysconfig differ in subtle ways
    messages: + msg116263
    components: + Distutils2
2010-09-11 10:21:35 zegreek set messages: + msg116087
2010-09-10 23:32:15 haypo set files: + distutils_makefile_encoding.patch
    messages: + msg116060
2010-08-21 21:46:31 haypo set messages: + msg114571
2010-08-21 21:42:00 haypo set messages: + msg114568
2010-08-21 17:11:32 merwok set nosy: + haypo, merwok
    versions: + Python 2.7, Python 3.2
2009-05-14 21:58:29 tarek set messages: + msg87772
2009-05-14 21:16:37 zegreek set files: + textio.diff
    messages: + msg87766
2009-05-13 11:03:34 zegreek set files: + text_file.diff
    keywords: + patch
    messages: + msg87675
2009-05-13 11:01:48 zegreek create