Printing Unicode chars from the interpreter in a non-UTF8 terminal raises an error (Py3) #49360
In Py2.x
>>> u'\u2620'
outputs u'\u2620' whereas
>>> print u'\u2620'
raises an error.
Instead, in Py3.x, both
>>> '\u2620'
and
>>> print('\u2620')
raise an error if the terminal doesn't use an encoding able to display
the character (e.g. the Windows terminal used for these examples). This is caused by the new string representation defined in the PEP-3138[1]. Consider also the following example:
Py2:
>>> [u'\u2620']
[u'\u2620']
Py3:
>>> ['\u2620']
UnicodeEncodeError: 'charmap' codec can't encode character '\u2620' in
position 9: character maps to <undefined>
This means that there is no way to print lists (or other objects) that
contain characters that can't be encoded.
Two workarounds may be:
1) encode all the elements of the list, but it's not practical;
2) use ascii(), but it adds extra "" around the output and escapes
backslashes and apostrophes (and it won't be possible to use _[0] in the
next line).
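A minimal sketch of the two workarounds, assuming a hypothetical two-element list named `items` (not from the report). It only shows what each approach produces; it does not change how the interpreter displays values.

```python
items = ['\u2620', 'caf\u00e9']  # hypothetical sample data

# Workaround 1: encode every element by hand. This yields bytes objects
# and has to be repeated for each container you want to show.
encoded = [s.encode('ascii', 'backslashreplace') for s in items]

# Workaround 2: ascii() builds the repr with all non-ASCII chars escaped.
escaped = ascii(items)

print(encoded)
print(escaped)

# At the ">>>" prompt the result of ascii() is itself shown through repr(),
# which is where the extra quotes and doubled backslashes come from.
print(repr(escaped))
```

Both printed forms are pure ASCII, so they display on any terminal, but neither keeps the original list available as `_`.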
Also note that in Py3
>>> ['\ud800']
['\ud800']
>>> _[0]
'\ud800'
works, because U+D800 belongs to the category "Cs (Other, Surrogate)"
and it is escaped[2].
The best solution is probably to change the default error-handler of the
Python3 interactive interpreter to 'backslashreplace' in order to avoid
this behavior, but I don't know if it's possible only for ">>> foo" and
not for ">>> print(foo)" (print() should still raise an error as it does
in Py2). This proposal has already been refused in the PEP-3138[3] but there are
I think this should be rediscussed and possibly changed, because, even |
To be clear, this issue only affects the interpreter.
It doesn't add extra "" if you replace repr() by ascii() in the
Hum, it implies that sys.stdout has a different behaviour in the |
You can change the display hook with a site.py script (which have
import sys
def hook(message):
    print(ascii(message))
sys.displayhook = hook
Example (run python in an empty environment to get ASCII charset):
$ env -i PYTHONPATH=$PWD ./python
Python 3.1a0 (py3k:69105M, Jan 30 2009, 10:36:27)
>>> import sys
>>> sys.stdout.encoding
'ANSI_X3.4-1968'
>>> "\xe9"
'\xe9'
>>> print("\xe9")
Traceback (most recent call last):
(...)
UnicodeEncodeError: 'ascii' codec can't encode character '\xe9' (...) |
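The ASCII-only stdout from this session can also be approximated without touching the environment. The sketch below assumes an in-memory stream stands in for the terminal; `try_write` is a hypothetical helper, not part of any patch in this thread.

```python
import io

def try_write(text, encoding):
    """Return the bytes a strict stream would emit, or None if encoding fails."""
    raw = io.BytesIO()
    stream = io.TextIOWrapper(raw, encoding=encoding, errors='strict',
                              newline='\n')
    try:
        stream.write(text)
        stream.flush()
    except UnicodeEncodeError:
        return None
    return raw.getvalue()

# repr() keeps the raw character, so a strict ASCII stream rejects it.
print(try_write(repr('\xe9'), 'ascii'))
# ascii() escapes the character first, so the same stream accepts it.
print(try_write(ascii('\xe9'), 'ascii'))
```

This is the difference the hook relies on: the value is the same, only the text handed to the stream changes.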
This seems to solve the problem, but apparently the interactive "_" |
Oh yeah, original sys.displayhook uses a special hack for the _ global
import sys
import builtins
def hook(message):
    if message is None:
        return
    builtins._ = message
    print(ascii(message))
sys.displayhook = hook |
Here is a patch to use ascii() directly in sys_displayhook() (with an |
Victor, I'm not sure whether you are proposing that
Overall, I fail to see the bug in this report. Python 3.0 works as |
This seems to fix the problem:
import sys
import builtins
def hook(message):
    if message is None:
        return
    builtins._ = message
    try:
        print(repr(message))
    except UnicodeEncodeError:
        print(ascii(message))
sys.displayhook = hook
Just to clarify:
* The current Py3 behavior works fine in UTF8 terminals
* It doesn't work on non-UTF8 terminals if they can't encode the chars
(they raise an error)
* It only affects the interactive interpreter
* This new patch escapes the chars instead of raising an error only on
non-UTF8 terminals and only when printed as ">>> foo" (without print())
and leaves the other behaviors unchanged
* This is related to Py3 only
Apparently the patch provided by Victor always escapes the non-ascii
This only changes the behavior of ">>> foo", so it can not lead to
confusion ("It works in the interpreter but not in the script"). In a
script one can't write "foo" alone but "print(foo)" and the behavior of
"print(foo)" is the same in both the interpreter and the scripts (with
the patch applied):
>>> ['\u2620']
['\u2620']
>>> print(['\u2620'])
UnicodeEncodeError: 'charmap' codec can't encode character '\u2620' in
position 2: character maps to <undefined>
I think that the PEP-3138 didn't consider this issue. Its purpose is to
This is an improvement and I can't see any negative side-effect.
Attached there's a txt with more examples, on Py2 and Py3, on |
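For anyone without a non-UTF8 console at hand, the repr()-then-ascii() fallback hook from this thread can be exercised against in-memory streams. This is only a sketch: `fallback_displayhook` and `shown_on` are hypothetical names, and the hook is called directly rather than installed as sys.displayhook.

```python
import builtins
import contextlib
import io

def fallback_displayhook(value):
    """Show repr(value); fall back to ascii(value) if stdout can't encode it."""
    if value is None:
        return
    builtins._ = value
    try:
        print(repr(value))
    except UnicodeEncodeError:
        print(ascii(value))

def shown_on(encoding, value):
    """Return what the hook displays on a strict stream using `encoding`."""
    raw = io.BytesIO()
    stream = io.TextIOWrapper(raw, encoding=encoding, errors='strict',
                              newline='\n')
    with contextlib.redirect_stdout(stream):
        fallback_displayhook(value)
    stream.flush()
    return raw.getvalue().decode(encoding)

sample = ['\u2620', '\xe8']
for enc in ('utf-8', 'ascii', 'cp437'):
    # ascii() around the result keeps this loop printable on any terminal.
    print(enc, ascii(shown_on(enc, sample)))
```

Note that with cp437 the whole value is escaped even though '\xe8' alone would be encodable there, because the fallback switches to ascii() for the entire repr.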
The idea is to avoid unicode error (by replacing not printable characters by
It's just to make Python3 interpreter a little bit more "user friendly" on
Problem: using different (encoding) rules for the display hook and for print()
may disturb new users (Why does ">>> chr(...)" work whereas ">>>
print(chr(...))" fails?). |
This is the same behavior that Python2.x has (with the only difference
that Py2 always shows the char as u'\uXXXX' if >0x7F whereas Py3 /tries/
to display it):
>>> unichr(0x0100)
u'\u0100'
>>> print unichr(0x0100)
UnicodeEncodeError: 'charmap' codec can't encode character u'\u0100' in
position 0: character maps to <undefined> |
I've also noticed that if an error contains non-encodable characters,
they are escaped:
>>> raise ValueError("\u2620 can't be printed here, but '\u00e8' works
fine!")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: \u2620 can't be printed here, but 'è' works fine!
but:
>>> "\u2620 can't be printed here, but '\u00e8' works fine!"
UnicodeEncodeError: 'charmap' codec can't encode character '\u2620' in
position 1: character maps to <undefined>
The mechanism used to escape errors is even better than my patch,
because it escapes only the chars that can't be encoded, instead of
escaping every non-ascii char when at least one char can't be encoded:
>>> "\u2620 can't be printed here, but '\u00e8' works fine!"
"\u2620 can't be printed here, but '\xe8' works fine!"
I wonder if we can reuse the same mechanism here.
By the way, the patch I proposed in msg80852 is just a proof of concept, |
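One way to escape only the characters the terminal can't encode is a round trip through the 'backslashreplace' error handler. The sketch below assumes the terminal encoding is known (cp437 is used as a stand-in for a Windows console); `escape_unencodable` is a hypothetical helper, and this is not necessarily the mechanism the traceback printer uses.

```python
def escape_unencodable(text, encoding):
    """Escape only the characters that `encoding` cannot represent."""
    # Unencodable chars become \xNN / \uNNNN escapes; everything else
    # survives the encode/decode round trip unchanged.
    return text.encode(encoding, 'backslashreplace').decode(encoding)

message = "\u2620 can't be printed here, but '\u00e8' works fine!"

# With cp437 only U+2620 is escaped; U+00E8 is encodable and is kept.
shown = escape_unencodable(repr(message), 'cp437')

# With plain ASCII both characters are escaped.
shown_ascii = escape_unencodable(repr(message), 'ascii')
```

A display hook could write `shown` instead of calling ascii() on the whole value, which would keep encodable characters readable.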
martin> IIUC, this patch breaks PEP-3138,
After reading the PEP-3138, it's clear that this issue is not a bug, and
Windows users who want to get the Python2 behaviour can use my display
We can not fix this issue, so I choose to close it. If anyone wants to |
In the first message I said that this breaks the PEP-3138 because I
sys.displayhook provides a way to change the behavior of the interactive
interpreter only when ">>> foo" is used. The PEP doesn't seem to say
anything about how ">>> foo" should behave.
Moreover, in the alternate solutions 1 they considered to use
This is exactly the behavior I intended to have, and, being a unique |
My proposal to make backslashreplace a default error handler
Does something like |
What I'm proposing is not to change the default error handler to |