classification
Title: sys.displayhook: use backslashreplace error handler if repr(value) is not encodable to sys.stdout
Type:
Stage:
Components: Unicode
Versions: Python 3.2
process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To:
Nosy List: amaury.forgeotdarc, belopolsky, ezio.melotti, lemburg, vstinner
Priority: normal
Keywords: patch

Created on 2010-12-02 01:43 by vstinner, last changed 2010-12-04 17:25 by vstinner. This issue is now closed.

Files
File name                        Uploaded
displayhook_unencodable.patch    vstinner, 2010-12-02 01:43
Messages (4)
msg123031 - Author: STINNER Victor (vstinner) (Python committer) Date: 2010-12-02 01:43
On Windows, the Python interpreter fails to display a result if the stdout encoding is unable to encode it. This problem has existed since Python 3.0; see e.g. issue #1602.

This problem is not specific to Windows. Even if the stdout encoding is UTF-8 (the default encoding on Mac OS X and most Linux distributions), it fails on surrogate characters, because the UTF-8 encoder refuses surrogate characters in Python 3. See e.g. issue #5110.
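The surrogate case can be reproduced without the interactive prompt. The following is only a minimal sketch; the variable names are illustrative and not taken from the patch:

```python
# A lone surrogate code point; Python 3's UTF-8 codec rejects it in strict mode.
text = "\udc80"

try:
    text.encode("utf-8")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

# With the 'backslashreplace' error handler the same string encodes to an
# ASCII escape sequence instead of raising.
escaped = text.encode("utf-8", "backslashreplace")

print(strict_failed)
print(escaped)
```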

Even if a Python (core? :-)) developer can see this behaviour as expected, several users (including me) don't like it and would prefer to see the result instead of a Unicode exception. The problem is that, except for simple commands, we can't tell directly whether the error comes from the command itself or from a failure to print its result.

This issue is specific to sys.displayhook, the callback used by the Python interpreter to display the result of a command. It doesn't concern print() or sys.stdout.write().

--

The best solution would be to check whether the terminal is able to render a character, but this is not possible for technical reasons. The best we can do is catch the UnicodeEncodeError and retry with an error handler other than sys.stdout.errors, one that cannot fail. 'backslashreplace' is a good candidate.
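In Python terms, the fallback described above could look roughly like the sketch below. It is not the C patch itself: display_repr is a hypothetical helper name, it takes the stream as a parameter instead of using sys.stdout, and it leaves out the handling of builtins._ that the real hook performs.

```python
import io


def display_repr(value, stream):
    """Write repr(value) to stream, escaping characters it cannot encode."""
    if value is None:
        return
    text = repr(value)
    try:
        stream.write(text)
    except UnicodeEncodeError:
        # Retry with an error handler that cannot fail.
        data = text.encode(stream.encoding, "backslashreplace")
        if hasattr(stream, "buffer"):
            # Write the escaped bytes directly to the binary layer.
            stream.buffer.write(data)
        else:
            # No binary layer: decode the escaped bytes and write them as text.
            stream.write(data.decode(stream.encoding, "strict"))
    stream.write("\n")


# Usage: an ASCII-only text stream standing in for a limited console.
raw = io.BytesIO()
out = io.TextIOWrapper(raw, encoding="ascii", newline="\n")
display_repr("caf\xe9 \U0001F600", out)
out.flush()
print(raw.getvalue())
```

Only the display path changes in this sketch; the stream's own error handler (stream.errors) is never modified, so other writes to the stream still fail as before.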

Ezio Melotti implemented this solution and attached a patch to issue #9198.

I wrote a new version of his patch. Changes:

 - Create a subfunction (for better readability)
 - Clear the UnicodeEncodeError before calling sys_displayhook_unencodable() (the exception would be lost anyway on the next error, e.g. if PyObject_GetAttrString() fails)
 - Clear the (AttributeError) exception if PyObject_GetAttrString(outf, "buffer") fails
 - Add a unit test: test ASCII, ISO-8859-1 and UTF-8 with non-ASCII, surrogate and non-BMP (printable or not) characters
 - Complete and update the sys.displayhook documentation
 - Fix a refleak if stdout_encoding_str == NULL
 - Use PyObject_CallMethod() instead of PyTuple_Pack() + PyEval_CallObject() for shorter (and more readable) code

--

I don't know how to test the case where sys.stdout.write(repr(value)) fails and sys.stdout has no buffer attribute. Maybe a mock object should be used in the unit test?
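One possible shape for such a mock, as a rough sketch only: TextOnlyStream and write_escaped are hypothetical names, and the helper imitates the no-buffer branch in Python rather than exercising the C code.

```python
class TextOnlyStream:
    """Hypothetical stand-in for a sys.stdout replacement with no .buffer."""

    def __init__(self, encoding):
        self.encoding = encoding
        self.chunks = []

    def write(self, text):
        # Mimic a strict text stream: raise UnicodeEncodeError if not encodable.
        text.encode(self.encoding, "strict")
        self.chunks.append(text)
        return len(text)


def write_escaped(stream, text):
    """Sketch of the no-buffer fallback: escape, decode back, write as text."""
    try:
        stream.write(text)
    except UnicodeEncodeError:
        data = text.encode(stream.encoding, "backslashreplace")
        stream.write(data.decode(stream.encoding, "strict"))


stream = TextOnlyStream("ascii")
write_escaped(stream, repr("\xe9"))
print(stream.chunks)
```

A real test would presumably install such an object as sys.stdout and call sys.displayhook() directly, then inspect what was written.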
msg123032 - Author: STINNER Victor (vstinner) (Python committer) Date: 2010-12-02 01:48
This issue is opposed to PEP 3138:

<< Default error-handler of sys.stdout should be 'backslashreplace'.

Stuff written to stdout might be consumed by another program that might misinterpret the escapes. For interactive session, it is possible to make 'backslashreplace' error-handler to default, but may add confusion of the kind "it works in interactive mode but not when redirecting to a file". >>

But if you read #1602, #5110 and #9198, you will see that not everybody agrees with the PEP.
msg123034 - Author: STINNER Victor (vstinner) (Python committer) Date: 2010-12-02 01:52
> This issue is opposed to PEP 3138:
>
> << Default error-handler of sys.stdout should be 'backslashreplace'....

Oops, sorry: no, it is not opposed to the PEP (this issue doesn't propose changing the default error handler of sys.stdout).
msg123374 - Author: STINNER Victor (vstinner) (Python committer) Date: 2010-12-04 17:25
Committed to Python 3.2 (r87054).
History
Date                 User      Action  Args
2010-12-04 17:25:50  vstinner  set     status: open -> closed; resolution: fixed; messages: + msg123374
2010-12-02 01:52:40  vstinner  set     messages: + msg123034
2010-12-02 01:49:27  vstinner  set     nosy: + lemburg, amaury.forgeotdarc
2010-12-02 01:48:10  vstinner  set     messages: + msg123032
2010-12-02 01:43:25  vstinner  create