classification
Title: Failed compile in a non-ASCII path
Type: compile error
Stage: needs patch
Components: Distutils
Versions: Python 3.1, Python 3.2
process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To: tarek
Nosy List: eric.araujo, pitrou, tarek, vstinner
Priority: normal
Keywords: patch

Created on 2010-05-08 17:38 by pitrou, last changed 2010-05-19 20:49 by vstinner. This issue is now closed.

Files
File name Uploaded Description
distutils_log_backslashreplace.patch vstinner, 2010-05-14 21:22
setup_stdout_backslashreplace.patch vstinner, 2010-05-14 21:31
distutils_log_backslashreplace-2.patch vstinner, 2010-05-19 12:41
Messages (24)
msg105320 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-05-08 17:38
Distutils fails logging characters which are not part of the "default encoding":

http://www.python.org/dev/buildbot/builders/AMD64%20Ubuntu%20wide%203.x/builds/1062/steps/compile/logs/stdio

[...]
  File "/home/buildbot/cpython-ucs4-nonascii-\u20ac/3.x.pitrou-ubuntu-wide/build/Lib/distutils/dir_util.py", line 67, in mkpath
    log.info("creating %s", head)
  File "/home/buildbot/cpython-ucs4-nonascii-\u20ac/3.x.pitrou-ubuntu-wide/build/Lib/distutils/log.py", line 40, in info
    self._log(INFO, msg, args)
  File "/home/buildbot/cpython-ucs4-nonascii-\u20ac/3.x.pitrou-ubuntu-wide/build/Lib/distutils/log.py", line 30, in _log
    stream.write('%s\n' % msg)
UnicodeEncodeError: 'ascii' codec can't encode character '\u20ac' in position 81: ordinal not in range(128)
[54439 refs]
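The failure in the traceback above can be reproduced in isolation. This is a minimal sketch, not the distutils code itself: an in-memory buffer stands in for the buildbot's piped stdout, the wrapper is forced to ASCII with the strict error handler, and the path is a shortened, illustrative one.

```python
import io

# Assumption: io.BytesIO stands in for the pipe; "ascii" + "strict" mimics
# the stream encoding seen in the buildbot log.
stream = io.TextIOWrapper(io.BytesIO(), encoding="ascii", errors="strict")

# Illustrative path only, shortened from the one in the traceback.
msg = "creating /home/buildbot/cpython-ucs4-nonascii-\u20ac/build"

error = None
try:
    stream.write("%s\n" % msg)
    stream.flush()
except UnicodeEncodeError as exc:
    error = exc

print(type(error).__name__, error)
```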
msg105341 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2010-05-08 22:50
What is your locale and your locale encoding? distutils uses ASCII, but I'm not 
sure that your locale encoding is ASCII, because Python fails to start if the 
locale is ASCII and the path contains a non-ASCII character.

See also issue #8611: "Python3 doesn't support locale different than utf8 and 
an non-ASCII path (POSIX)".
msg105343 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-05-08 23:03
As I answered on IRC, this is on a buildbot environment. Compiling from the command line works fine, but not from the buildbot environment.

From the command line, the locale is as follows:
$ locale
LANG=en_GB.UTF-8
LC_CTYPE="en_GB.UTF-8"
LC_NUMERIC="en_GB.UTF-8"
LC_TIME="en_GB.UTF-8"
LC_COLLATE="en_GB.UTF-8"
LC_MONETARY="en_GB.UTF-8"
LC_MESSAGES="en_GB.UTF-8"
LC_PAPER="en_GB.UTF-8"
LC_NAME="en_GB.UTF-8"
LC_ADDRESS="en_GB.UTF-8"
LC_TELEPHONE="en_GB.UTF-8"
LC_MEASUREMENT="en_GB.UTF-8"
LC_IDENTIFICATION="en_GB.UTF-8"
LC_ALL=

But I guess this information is not useful.
msg105344 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-05-08 23:06
Ok, this might be more interesting:

$ cat | ./python -i
Python 3.2a0 (py3k:81010, May  8 2010, 23:25:47) 
[GCC 4.3.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import locale
>>> locale.getpreferredencoding()
'UTF-8'
>>> import sys
>>> sys.stdin 
<_io.TextIOWrapper name='<stdin>' encoding='ascii'>
>>> sys.getdefaultencoding()
'utf-8'

As you see, stdin uses ascii...
msg105345 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-05-08 23:20
Anyway, regardless of the actual stdout encoding, distutils should be able to log messages without crashing, IMO.
msg105346 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2010-05-08 23:33
Conditions to reproduce the bug: don't use make -j N (unset MAKEFLAGS), and stdout and stderr should not be a TTY => use make 2>&1|cat.
msg105347 - (view) Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2010-05-08 23:33
Another long-standing encoding bug was fixed: It was impossible to use e.g. an author name with non-ASCII characters. The fix makes distutils use UTF-8. Of course, it’s more complicated to choose an encoding for terminal I/O.
msg105349 - (view) Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2010-05-08 23:40
(I wrote before I saw Victor’s reply) Does it work with PYTHONIOENCODING set to UTF-8?
msg105350 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2010-05-08 23:46
If the standard output is not a TTY, Python uses ASCII encoding for sys.stdout: ./python -c "import sys;print(sys.stdout.encoding)"|cat => ascii.

This issue reminds me of #8533 (regrtest: use backslashreplace error handler for stdout).
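A hedged way to check this observation from a script is to run a child interpreter with its stdout connected to a pipe and ask it what it sees. The helper name below is hypothetical. The encoding reported depends on the Python version and the environment: the ASCII result quoted above was observed on the 3.2 development build, and other setups may report something else.

```python
import json
import subprocess
import sys

# Code run in the child: report whether stdout is a TTY and its encoding.
CHILD = (
    "import json, sys; "
    "print(json.dumps([sys.stdout.isatty(), sys.stdout.encoding]))"
)

def piped_stdout_info():
    """Hypothetical helper: inspect a child's stdout when it is a pipe."""
    out = subprocess.run(
        [sys.executable, "-c", CHILD],
        stdout=subprocess.PIPE,
        check=True,
    ).stdout
    is_tty, encoding = json.loads(out.decode("ascii"))
    return is_tty, encoding

is_tty, encoding = piped_stdout_info()
print("isatty:", is_tty, "encoding:", encoding)
```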
msg105352 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-05-08 23:57
> If the standard output is not a TTY, Python uses ASCII encoding for 
> sys.stdout

We could perhaps fix this too, if python-dev agrees.
msg105353 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2010-05-08 23:59
The attached patch escapes non-ASCII characters in the log message using ASCII+backslashreplace (but keeps the unicode type).
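The idea of escaping with ASCII+backslashreplace while keeping a text string can be sketched as an encode/decode round trip. The function name is hypothetical and this is not the attached patch, only an illustration of the technique it describes.

```python
def escape_non_ascii(msg):
    """Hypothetical helper: replace non-ASCII characters with backslash
    escapes and return a str (not bytes)."""
    return msg.encode("ascii", "backslashreplace").decode("ascii")

escaped = escape_non_ascii("cpython-ucs4-nonascii-\u20ac")
print(escaped)
```

The result contains only ASCII characters, so writing it to an ASCII stream cannot raise UnicodeEncodeError.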
msg105354 - (view) Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2010-05-09 00:02
Wasn’t PYTHONIOENCODING added for such cases?
msg105355 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-05-09 00:04
> Wasn’t PYTHONIOENCODING added for such cases?

Yes, it was, but it's a very bad workaround. In most if not all cases,
people will set PYTHONIOENCODING to their system's default encoding.
Therefore, they shouldn't have to set an environment variable at all.
msg105356 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2010-05-09 00:48
> > If the standard output is not a TTY, Python uses ASCII encoding 
> > for sys.stdout

> We could perhaps fix this too, if python-dev agrees.

Open a new issue please if you consider that as a bug.
msg105357 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2010-05-09 00:50
I tried to recompile Python with "export PYTHONIOENCODING=ascii" but it doesn't fail. That's because the Makefile calls "./python -E ./setup.py -q build": -E ignores environment variables.
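The effect of -E can be observed with a small experiment. This sketch assumes nothing about the Makefile: it sets PYTHONIOENCODING for a child interpreter and compares the child's stdout encoding with and without -E. The helper name is hypothetical, and cp437 is an arbitrary choice that is unlikely to be anyone's default.

```python
import codecs
import os
import subprocess
import sys

def child_stdout_encoding(extra_args=()):
    """Hypothetical helper: return the normalized stdout encoding of a child
    interpreter started with PYTHONIOENCODING=cp437 in its environment."""
    env = dict(os.environ, PYTHONIOENCODING="cp437")
    cmd = [sys.executable, *extra_args, "-c",
           "import sys; print(sys.stdout.encoding)"]
    out = subprocess.run(cmd, env=env, stdout=subprocess.PIPE, check=True).stdout
    # Normalize aliases so that e.g. "CP437" and "cp437" compare equal.
    return codecs.lookup(out.decode("ascii").strip()).name

honoured = child_stdout_encoding()        # variable is read
ignored = child_stdout_encoding(["-E"])   # -E ignores PYTHON* variables
print(honoured, ignored)
```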
msg105726 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2010-05-14 17:49
New patch: use sys.stdout.encoding instead of ASCII. (If stdout is not a TTY, sys.stdout.encoding is ASCII.)
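Using the stream's own encoding instead of hard-coded ASCII means that only characters the stream really cannot represent get escaped. A minimal sketch, with a hypothetical function name and an assumed fallback to ASCII when the stream has no encoding attribute:

```python
import io

def escape_for_stream(msg, stream):
    """Hypothetical helper: escape only what `stream` cannot encode."""
    # Assumption: fall back to ASCII if the stream does not report an encoding.
    encoding = getattr(stream, "encoding", None) or "ascii"
    return msg.encode(encoding, "backslashreplace").decode(encoding)

ascii_stream = io.TextIOWrapper(io.BytesIO(), encoding="ascii")
latin1_stream = io.TextIOWrapper(io.BytesIO(), encoding="latin-1")
utf8_stream = io.TextIOWrapper(io.BytesIO(), encoding="utf-8")

text = "caf\u00e9 \u20ac"
for s in (ascii_stream, latin1_stream, utf8_stream):
    print(s.encoding, escape_for_stream(text, s))
```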
msg105764 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2010-05-14 21:22
Conditions to reproduce the bug:
 - non-ASCII directory
 - don't use make -j N (no MAKEFLAGS environment variable)
 - write make output into a pipe, e.g. make 2>&1|cat

My last two patches are not enough: other functions also write non-ASCII strings to the log, whereas the log encoding is ASCII. The right fix is to use the backslashreplace error handler for the log. Two solutions:
 a) replace sys.stdout by a new file using backslashreplace: I tried this solution for regrtest.py: #8533. My patch for regrtest.py doesn't work on Windows because of a newline issue.
 b) emulate backslashreplace only in the distutils log

I prefer (a) because it can be implemented in setup.py without touching distutils (tarek told me that distutils shouldn't be patched too much).

distutils_log_backslashreplace.patch implements (b).
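Solution (b) can be sketched as follows. This is an illustration under stated assumptions, not the contents of the patch: log_message is a hypothetical stand-in for a distutils-style _log() method, and the in-memory stream and path are made up for the demo.

```python
import io

def log_message(stream, msg, args=()):
    """Hypothetical stand-in for a distutils-style _log() method."""
    if args:
        msg = msg % args
    # Emulate the backslashreplace error handler only when the stream
    # would otherwise raise on unencodable characters.
    if getattr(stream, "errors", None) == "strict":
        encoding = stream.encoding or "ascii"
        msg = msg.encode(encoding, "backslashreplace").decode(encoding)
    stream.write("%s\n" % msg)
    stream.flush()

# Demo with an ASCII stream backed by an in-memory buffer.
raw = io.BytesIO()
ascii_stream = io.TextIOWrapper(raw, encoding="ascii", errors="strict",
                                newline="\n")
log_message(ascii_stream, "creating %s", ("/tmp/nonascii-\u20ac/build",))
print(raw.getvalue())
```

Every caller of the log benefits without being changed, which is the main attraction of fixing the problem at this level.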
msg105765 - (view) Author: Tarek Ziadé (tarek) * (Python committer) Date: 2010-05-14 21:27
I'll look at that asap. Although, all these patches should have some tests demonstrating the bugs.
msg105766 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2010-05-14 21:31
setup_stdout_backslashreplace.patch implements point (a).

I don't know which solution is better. This issue is not specific to compiling CPython: other programs may also fail in non-ASCII paths, so solution (b) may be better.
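Solution (a) amounts to rewrapping the existing output stream with the backslashreplace error handler. The sketch below is not the contents of setup_stdout_backslashreplace.patch; the function name is hypothetical, and newline handling is pinned to "\n" only to keep the demo predictable. It does not address the Windows newline problem mentioned earlier in the thread.

```python
import io

def with_backslashreplace(stream):
    """Hypothetical helper: return a new text wrapper over the same binary
    buffer that escapes unencodable characters instead of raising."""
    stream.flush()
    return io.TextIOWrapper(
        stream.buffer,
        encoding=stream.encoding,
        errors="backslashreplace",
        newline="\n",  # assumption made for the demo, see lead-in
        line_buffering=stream.line_buffering,
    )

# Demo: an ASCII "stdout" backed by an in-memory buffer.
raw = io.BytesIO()
strict_stdout = io.TextIOWrapper(raw, encoding="ascii", errors="strict")
safe_stdout = with_backslashreplace(strict_stdout)
safe_stdout.write("creating nonascii-\u20ac\n")
safe_stdout.flush()
print(raw.getvalue())
```

In a script, the same helper would be applied as sys.stdout = with_backslashreplace(sys.stdout), keeping a reference to the original stream so that it is not closed prematurely.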
msg106056 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2010-05-19 12:41
> Although, all these patches should have some tests demonstrating the bugs

As discussed on IRC, fixing distutils.log instead of setup.py is better. The new patch includes unit tests.
msg106082 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2010-05-19 17:00
I committed the patch on distutils.log in r81359 (py3k). I'm waiting for the buildbot before backporting to 3.1.
msg106094 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2010-05-19 20:32
> I committed the patch on distutils.log in r81359 (py3k)

Backported to 3.1 as r81363.
msg106098 - (view) Author: Tarek Ziadé (tarek) * (Python committer) Date: 2010-05-19 20:46
I thought you had a unit test, I don't see any in your commit
msg106099 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2010-05-19 20:49
> I thought you had a unit test, I don't see any in your commit

"patch -p0 < ... && svn ci" doesn't include new files, and I forgot to add it.

r81361 includes the new file.
History
Date | User | Action | Args
2010-05-19 20:49:14 | vstinner | set | messages: + msg106099
2010-05-19 20:46:58 | tarek | set | messages: + msg106098
2010-05-19 20:33:37 | vstinner | set | status: open -> closed
2010-05-19 20:32:57 | vstinner | set | status: pending -> open; messages: + msg106094
2010-05-19 17:00:44 | vstinner | set | status: open -> pending; resolution: fixed; messages: + msg106082
2010-05-19 12:41:57 | vstinner | set | files: + distutils_log_backslashreplace-2.patch; messages: + msg106056
2010-05-19 11:56:24 | vstinner | set | files: - distutils_spawn_log.patch
2010-05-14 21:31:40 | vstinner | set | files: + setup_stdout_backslashreplace.patch; messages: + msg105766
2010-05-14 21:27:59 | tarek | set | messages: + msg105765
2010-05-14 21:22:44 | vstinner | set | files: + distutils_log_backslashreplace.patch; messages: + msg105764
2010-05-14 17:49:19 | vstinner | set | files: - distutils_spawn_toascii.patch
2010-05-14 17:49:14 | vstinner | set | files: + distutils_spawn_log.patch; messages: + msg105726
2010-05-09 00:50:36 | vstinner | set | messages: + msg105357
2010-05-09 00:48:23 | vstinner | set | messages: + msg105356
2010-05-09 00:04:28 | pitrou | set | messages: + msg105355
2010-05-09 00:02:57 | eric.araujo | set | messages: + msg105354
2010-05-08 23:59:44 | vstinner | set | files: + distutils_spawn_toascii.patch; keywords: + patch; messages: + msg105353
2010-05-08 23:57:12 | pitrou | set | messages: + msg105352
2010-05-08 23:46:23 | vstinner | set | messages: + msg105350
2010-05-08 23:40:01 | eric.araujo | set | messages: + msg105349
2010-05-08 23:33:57 | eric.araujo | set | nosy: + eric.araujo; messages: + msg105347
2010-05-08 23:33:42 | vstinner | set | messages: + msg105346
2010-05-08 23:20:52 | pitrou | set | messages: + msg105345
2010-05-08 23:06:17 | pitrou | set | messages: + msg105344
2010-05-08 23:03:21 | pitrou | set | messages: + msg105343
2010-05-08 22:50:11 | vstinner | set | messages: + msg105341
2010-05-08 17:38:40 | pitrou | create |