msg105320 - (view) |
Author: Antoine Pitrou (pitrou) *  |
Date: 2010-05-08 17:38 |
Distutils fails to log characters which are not part of the "default encoding":
http://www.python.org/dev/buildbot/builders/AMD64%20Ubuntu%20wide%203.x/builds/1062/steps/compile/logs/stdio
[...]
File "/home/buildbot/cpython-ucs4-nonascii-\u20ac/3.x.pitrou-ubuntu-wide/build/Lib/distutils/dir_util.py", line 67, in mkpath
log.info("creating %s", head)
File "/home/buildbot/cpython-ucs4-nonascii-\u20ac/3.x.pitrou-ubuntu-wide/build/Lib/distutils/log.py", line 40, in info
self._log(INFO, msg, args)
File "/home/buildbot/cpython-ucs4-nonascii-\u20ac/3.x.pitrou-ubuntu-wide/build/Lib/distutils/log.py", line 30, in _log
stream.write('%s\n' % msg)
UnicodeEncodeError: 'ascii' codec can't encode character '\u20ac' in position 81: ordinal not in range(128)
[54439 refs]
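The failure in the traceback above can be reproduced in isolation. The following is only a minimal sketch, not distutils code: `write_log_line` is a hypothetical stand-in for the unguarded `stream.write('%s\n' % msg)` call, the path is made up, and an in-memory ASCII text stream stands in for the buildbot's stdout.

```python
import io


def write_log_line(stream, msg):
    # Hypothetical stand-in for the unguarded write shown in the traceback.
    stream.write('%s\n' % msg)


# In-memory text stream that encodes as ASCII, assumed to behave like
# the stdout of the buildbot environment.
ascii_stream = io.TextIOWrapper(io.BytesIO(), encoding='ascii')

try:
    # Made-up path containing the euro sign (U+20AC).
    write_log_line(ascii_stream, 'creating /tmp/nonascii-\u20ac/build')
    ascii_stream.flush()
except UnicodeEncodeError as exc:
    failure = exc
else:
    failure = None

print(repr(failure))
```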
|
msg105341 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2010-05-08 22:50 |
What is your locale and your locale encoding? distutils uses ASCII but I'm not
sure that your locale encoding is ASCII, because Python fails to start if the
locale is ASCII and the path contains a non ASCII character.
See also issue #8611: "Python3 doesn't support locale different than utf8 and
an non-ASCII path (POSIX)".
|
msg105343 - (view) |
Author: Antoine Pitrou (pitrou) *  |
Date: 2010-05-08 23:03 |
As I answered on IRC, this is on a buildbot environment. Compiling from the command line works fine, but not from the buildbot environment.
From the command line, the locale is as follows:
$ locale
LANG=en_GB.UTF-8
LC_CTYPE="en_GB.UTF-8"
LC_NUMERIC="en_GB.UTF-8"
LC_TIME="en_GB.UTF-8"
LC_COLLATE="en_GB.UTF-8"
LC_MONETARY="en_GB.UTF-8"
LC_MESSAGES="en_GB.UTF-8"
LC_PAPER="en_GB.UTF-8"
LC_NAME="en_GB.UTF-8"
LC_ADDRESS="en_GB.UTF-8"
LC_TELEPHONE="en_GB.UTF-8"
LC_MEASUREMENT="en_GB.UTF-8"
LC_IDENTIFICATION="en_GB.UTF-8"
LC_ALL=
But I guess this information is not useful.
|
msg105344 - (view) |
Author: Antoine Pitrou (pitrou) *  |
Date: 2010-05-08 23:06 |
Ok, this might be more interesting:
$ cat | ./python -i
Python 3.2a0 (py3k:81010, May 8 2010, 23:25:47)
[GCC 4.3.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import locale
>>> locale.getpreferredencoding()
'UTF-8'
>>> import sys
>>> sys.stdin
<_io.TextIOWrapper name='<stdin>' encoding='ascii'>
>>> sys.getdefaultencoding()
'utf-8'
As you see, stdin uses ascii...
|
msg105345 - (view) |
Author: Antoine Pitrou (pitrou) *  |
Date: 2010-05-08 23:20 |
Anyway, regardless of the actual stdout encoding, distutils should be able to log messages without crashing, IMO.
|
msg105346 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2010-05-08 23:33 |
Conditions to reproduce the bug: don't use make -j N (unset MAKEFLAGS), stdout and stderr should not be a TTY => use make 2>&1|cat.
|
msg105347 - (view) |
Author: Éric Araujo (eric.araujo) *  |
Date: 2010-05-08 23:33 |
Another long-standing encoding bug was fixed: It was impossible to use e.g. an author name with non-ASCII characters. The fix makes distutils use UTF-8. Of course, it’s more complicated to choose an encoding for terminal I/O.
|
msg105349 - (view) |
Author: Éric Araujo (eric.araujo) *  |
Date: 2010-05-08 23:40 |
(I wrote before I saw Victor’s reply) Does it work with PYTHONIOENCODING set to UTF-8?
|
msg105350 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2010-05-08 23:46 |
If the standard output is not a TTY, Python uses ASCII encoding for sys.stdout: ./python -c "import sys;print(sys.stdout.encoding)"|cat => ascii.
This issue reminds me of #8533 (regrtest: use backslashreplace error handler for stdout).
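The check on a piped stdout can also be scripted. This is a sketch only: `piped_stdout_encoding` is a hypothetical helper name, and the value it returns depends on the interpreter version, the locale and the environment, so it will not necessarily be `ascii`.

```python
import subprocess
import sys


def piped_stdout_encoding():
    """Return the sys.stdout.encoding reported by a child interpreter
    whose stdout is a pipe rather than a TTY."""
    result = subprocess.run(
        [sys.executable, '-c', 'import sys; print(sys.stdout.encoding)'],
        stdout=subprocess.PIPE,  # a pipe, like "... | cat"
        check=True,
    )
    # Encoding names are plain ASCII, so decoding as ASCII is safe here.
    return result.stdout.decode('ascii').strip()


print(piped_stdout_encoding())
```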
|
msg105352 - (view) |
Author: Antoine Pitrou (pitrou) *  |
Date: 2010-05-08 23:57 |
> If the standard output is not a TTY, Python uses ASCII encoding for
> sys.stdout
We could perhaps fix this too, if python-dev agrees.
|
msg105353 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2010-05-08 23:59 |
The attached patch escapes non-ASCII characters in the log message using ASCII+backslashreplace (but keeps the unicode type).
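For illustration, escaping with ASCII+backslashreplace while keeping a text string can be written as an encode/decode round trip. The helper name `escape_non_ascii` is hypothetical and is not taken from the attached patch.

```python
def escape_non_ascii(msg):
    # Encode to ASCII, turning unencodable characters into backslash
    # escapes, then decode back so the result is still a str.
    return msg.encode('ascii', 'backslashreplace').decode('ascii')


print(escape_non_ascii('creating nonascii-\u20ac'))
```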
|
msg105354 - (view) |
Author: Éric Araujo (eric.araujo) *  |
Date: 2010-05-09 00:02 |
Wasn’t PYTHONIOENCODING added for such cases?
|
msg105355 - (view) |
Author: Antoine Pitrou (pitrou) *  |
Date: 2010-05-09 00:04 |
> Wasn’t PYTHONIOENCODING added for such cases?
Yes, it was, but it's a very bad workaround. In most if not all cases,
people will set PYTHONIOENCODING to their system's default encoding.
Therefore, they shouldn't have to set an environment variable at all.
|
msg105356 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2010-05-09 00:48 |
> > If the standard output is not a TTY, Python uses ASCII encoding
> > for sys.stdout
> We could perhaps fix this too, if python-dev agrees.
Please open a new issue if you consider that a bug.
|
msg105357 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2010-05-09 00:50 |
I tried to recompile Python with "export PYTHONIOENCODING=ascii" but it doesn't fail. That's because the Makefile calls "./python -E ./setup.py -q build": -E ignores environment variables.
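The effect of -E can be observed with a small experiment. This is a sketch under assumptions: `child_stdout_encoding` is a hypothetical helper, and `latin-1` is an arbitrary value chosen only because it is easy to recognise in the output.

```python
import os
import subprocess
import sys


def child_stdout_encoding(extra_args=(), env_encoding='latin-1'):
    """Run a child interpreter with PYTHONIOENCODING set and return the
    sys.stdout.encoding it reports."""
    env = dict(os.environ, PYTHONIOENCODING=env_encoding)
    cmd = [sys.executable, *extra_args,
           '-c', 'import sys; print(sys.stdout.encoding)']
    out = subprocess.run(cmd, env=env, stdout=subprocess.PIPE,
                         check=True).stdout
    return out.decode('ascii').strip()


# Without -E the variable is honoured; with -E it is ignored, so the
# second value falls back to whatever the environment would give anyway.
print(child_stdout_encoding())
print(child_stdout_encoding(['-E']))
```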
|
msg105726 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2010-05-14 17:49 |
New patch: use sys.stdout.encoding instead of ASCII. (If stdout is not a TTY, sys.stdout.encoding is ASCII.)
|
msg105764 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2010-05-14 21:22 |
Conditions to reproduce the bug:
- non ASCII directory
- don't use make -j N (no MAKEFLAGS environment variable)
- write make output into a pipe, eg. make 2>&1|cat
My last 2 patches are not enough: there are other functions writing non-ASCII strings to the log, whereas the log encoding is ASCII. The right fix is to use the backslashreplace error handler for the log. Two solutions:
a) replace sys.stdout by a new file using backslashreplace: I tried this solution for regrtest.py: #8533. My patch for regrtest.py doesn't work on Windows because of a newline issue
b) emulate backslashreplace only in distutils log
I prefer (a) because it can be implemented in setup.py without touching distutils (tarek told me that distutils shouldn't be patched too much).
distutils_log_backslashreplace.patch implements (b).
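A minimal sketch of what option (b) could look like: try the normal write first and emulate backslashreplace only if it fails. `log_write` is a hypothetical function name, and the sketch assumes the stream exposes a usable `encoding` attribute.

```python
import io


def log_write(stream, msg):
    try:
        stream.write('%s\n' % msg)
    except UnicodeEncodeError:
        # Emulate the backslashreplace error handler: escape whatever the
        # stream's encoding cannot represent, then write the escaped text.
        encoding = stream.encoding
        msg = msg.encode(encoding, 'backslashreplace').decode(encoding)
        stream.write('%s\n' % msg)


# Usage with an in-memory ASCII stream standing in for a piped stdout.
buffer = io.BytesIO()
stream = io.TextIOWrapper(buffer, encoding='ascii', newline='\n')
log_write(stream, 'creating nonascii-\u20ac')
stream.flush()
print(buffer.getvalue())
```

Messages that the stream can already encode are written unchanged, so the fallback costs nothing in the common case.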
|
msg105765 - (view) |
Author: Tarek Ziadé (tarek) *  |
Date: 2010-05-14 21:27 |
I'll look at that asap. Although, all these patches should have some tests demonstrating the bugs.
|
msg105766 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2010-05-14 21:31 |
setup_stdout_backslashreplace.patch implements point (a).
I don't know which solution is better. This issue is not specific to compiling CPython: other programs may fail in non-ASCII paths, so solution (b) is maybe better.
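For comparison, option (a) can be sketched as rewrapping a text stream's binary buffer with the same encoding but the backslashreplace error handler. This is an illustration, not the content of setup_stdout_backslashreplace.patch: `with_backslashreplace` is a hypothetical helper, and the sketch does not address the Windows newline issue mentioned earlier.

```python
import io


def with_backslashreplace(stream):
    """Return a new text stream over the same binary buffer, using the
    same encoding but the backslashreplace error handler."""
    encoding = stream.encoding
    line_buffering = stream.line_buffering
    # detach() flushes the old wrapper and hands over its buffer; the old
    # wrapper must not be used afterwards.
    raw = stream.detach()
    return io.TextIOWrapper(raw, encoding=encoding,
                            errors='backslashreplace',
                            line_buffering=line_buffering)


# Usage with an in-memory ASCII stream; for the real thing one would
# assign sys.stdout = with_backslashreplace(sys.stdout).
buffer = io.BytesIO()
stream = with_backslashreplace(io.TextIOWrapper(buffer, encoding='ascii'))
stream.write('nonascii-\u20ac')
stream.flush()
print(buffer.getvalue())
```

On Python 3.7 and later, `sys.stdout.reconfigure(errors='backslashreplace')` achieves a similar effect without creating a new wrapper.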
|
msg106056 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2010-05-19 12:41 |
> Although, all these patches should have some tests demonstrating the bugs
As discussed on IRC, fixing distutils.log instead of setup.py is better. The new patch includes unit tests.
|
msg106082 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2010-05-19 17:00 |
I committed the patch to distutils.log in r81359 (py3k). I'm waiting for the buildbot before backporting to 3.1.
|
msg106094 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2010-05-19 20:32 |
> I committed the patch to distutils.log in r81359 (py3k)
Backported to 3.1 as r81363.
|
msg106098 - (view) |
Author: Tarek Ziadé (tarek) *  |
Date: 2010-05-19 20:46 |
I thought you had a unit test, I don't see any in your commit
|
msg106099 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2010-05-19 20:49 |
> I thought you had a unit test, I don't see any in your commit
"patch -p0 < ... && svn ci" doesn't include new files. I forgot it.
r81361 includes the new file.
|
|
Date | User | Action | Args |
2022-04-11 14:57:00 | admin | set | github: 52909 |
2010-05-19 20:49:14 | vstinner | set | messages:
+ msg106099 |
2010-05-19 20:46:58 | tarek | set | messages:
+ msg106098 |
2010-05-19 20:33:37 | vstinner | set | status: open -> closed |
2010-05-19 20:32:57 | vstinner | set | status: pending -> open
messages:
+ msg106094 |
2010-05-19 17:00:44 | vstinner | set | status: open -> pending resolution: fixed messages:
+ msg106082
|
2010-05-19 12:41:57 | vstinner | set | files:
+ distutils_log_backslashreplace-2.patch
messages:
+ msg106056 |
2010-05-19 11:56:24 | vstinner | set | files:
- distutils_spawn_log.patch |
2010-05-14 21:31:40 | vstinner | set | files:
+ setup_stdout_backslashreplace.patch
messages:
+ msg105766 |
2010-05-14 21:27:59 | tarek | set | messages:
+ msg105765 |
2010-05-14 21:22:44 | vstinner | set | files:
+ distutils_log_backslashreplace.patch
messages:
+ msg105764 |
2010-05-14 17:49:19 | vstinner | set | files:
- distutils_spawn_toascii.patch |
2010-05-14 17:49:14 | vstinner | set | files:
+ distutils_spawn_log.patch
messages:
+ msg105726 |
2010-05-09 00:50:36 | vstinner | set | messages:
+ msg105357 |
2010-05-09 00:48:23 | vstinner | set | messages:
+ msg105356 |
2010-05-09 00:04:28 | pitrou | set | messages:
+ msg105355 |
2010-05-09 00:02:57 | eric.araujo | set | messages:
+ msg105354 |
2010-05-08 23:59:44 | vstinner | set | files:
+ distutils_spawn_toascii.patch keywords:
+ patch messages:
+ msg105353
|
2010-05-08 23:57:12 | pitrou | set | messages:
+ msg105352 |
2010-05-08 23:46:23 | vstinner | set | messages:
+ msg105350 |
2010-05-08 23:40:01 | eric.araujo | set | messages:
+ msg105349 |
2010-05-08 23:33:57 | eric.araujo | set | nosy:
+ eric.araujo messages:
+ msg105347
|
2010-05-08 23:33:42 | vstinner | set | messages:
+ msg105346 |
2010-05-08 23:20:52 | pitrou | set | messages:
+ msg105345 |
2010-05-08 23:06:17 | pitrou | set | messages:
+ msg105344 |
2010-05-08 23:03:21 | pitrou | set | messages:
+ msg105343 |
2010-05-08 22:50:11 | vstinner | set | messages:
+ msg105341 |
2010-05-08 17:38:40 | pitrou | create | |