Author zegreek
Recipients tarek, zegreek
Date 2009-05-13.11:01:43
SpamBayes Score 0.0
Marked as misclassified No
Message-id <1242212510.55.0.349129306137.issue6011@psf.upfronthosting.co.za>
In-reply-to
Content
I have tried to build Python (version 3.1 beta 1) on Linux and install
it to a non-standard prefix that contains non-ASCII UTF-8 characters
(my locale is UTF-8). The build directory's path is ASCII-only. The
exact configure line is given in the attached file 'tb.txt'.

The 'make' command then fails at the stage where Python extensions are
built, with the traceback shown in the file tb.txt (in short:
UnicodeDecodeError: 'ascii' codec can't decode byte ...).

The problem is triggered when 'distutils.sysconfig.get_config_vars'
tries to parse the Makefile. The Makefile is opened with
'distutils.text_file.TextFile', which in turn calls 'io.open' with no
'encoding' parameter. At this stage of the build, the 'locale' module is
not available (because '_collections' is not available yet), so
'locale.getpreferredencoding' cannot be called and the encoding falls
back to ASCII (a quick look at 'Modules/_io/textio.c' suggests that this
fallback mechanism is already intended to be used at build time).

The solution I propose is to use 'sys.getfilesystemencoding' as the
first fallback, since it is defined at build time on most systems:
Windows, Mac, and POSIX systems where 'CODESET' exists in 'langinfo.h'.
Given that, in build routines, non-ASCII characters are likely to be
encountered only in filesystem paths, this seems a reasonable behavior.
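The proposed fallback order can be sketched as follows. The function name 'build_time_encoding' is hypothetical and this is not the code from the attached patch; it only illustrates the order "locale's preferred encoding, then 'sys.getfilesystemencoding()', then ASCII".

```python
import sys

def build_time_encoding():
    """Hypothetical helper illustrating the proposed fallback order."""
    try:
        import locale
    except ImportError:
        # At build time 'locale' may not be importable yet.
        pass
    else:
        return locale.getpreferredencoding(False)
    # Older Python 3 releases could return None here when the codeset
    # (e.g. nl_langinfo(CODESET) on POSIX) is unknown.
    fs_encoding = sys.getfilesystemencoding()
    if fs_encoding:
        return fs_encoding
    # Last resort, matching the previous behaviour.
    return "ascii"

print(build_time_encoding())
```

On a normal interpreter the first branch wins; the second branch only matters in the build-time situation described above, where 'locale' cannot be imported.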

The attached patch 'text_file.diff' implements this strategy in
'distutils.text_file' and then calls 'io.open' with the appropriate
'encoding' parameter. It could be argued, however, that this new
fallback is of general interest and should be implemented directly in
'Modules/_io/textio.c'. If you think so, I could try to come up with a
new patch.
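Passing an explicit 'encoding' to 'io.open', rather than relying on its default, looks roughly like this. It is a sketch, not the contents of 'text_file.diff': 'open_makefile' is a hypothetical helper that uses 'sys.getfilesystemencoding()' with an ASCII fallback, and the demo assumes a UTF-8 filesystem encoding, as on the reporter's system.

```python
import io
import os
import sys
import tempfile

def open_makefile(path):
    """Hypothetical helper: open a Makefile with an explicit encoding
    instead of io.open's locale-dependent default."""
    encoding = sys.getfilesystemencoding() or "ascii"
    return io.open(path, "r", encoding=encoding)

# Demo: write a tiny Makefile fragment with a non-ASCII prefix, encoded with
# the same (assumed UTF-8) filesystem encoding, then read it back.
fd, path = tempfile.mkstemp(suffix=".mk")
try:
    with os.fdopen(fd, "w", encoding=sys.getfilesystemencoding() or "ascii") as f:
        f.write("prefix=\t\t/opt/pythön\n")
    with open_makefile(path) as f:
        contents = f.read()
finally:
    os.remove(path)

print(contents, end="")
```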

The attached patch solves the problem on my system and does not
introduce test failures (as expected, since the new fallback should
only make a difference at build time).

Cheers,
Baptiste
History
Date                 User     Action  Args
2009-05-13 11:01:50  zegreek  set     recipients: + zegreek, tarek
2009-05-13 11:01:50  zegreek  set     messageid: <1242212510.55.0.349129306137.issue6011@psf.upfronthosting.co.za>
2009-05-13 11:01:48  zegreek  link    issue6011 messages
2009-05-13 11:01:46  zegreek  create