Author ezio.melotti
Recipients ezio.melotti
Date 2009-01-30.15:26:45
In Py2.x
>>> u'\u2620'
outputs u'\u2620', whereas
>>> print u'\u2620'
raises an error.

Instead, in Py3.x, both
>>> '\u2620'
>>> print('\u2620')
raise an error if the terminal's encoding cannot represent the character
(e.g. the Windows terminal used for these examples).

This is caused by the new string representation defined in PEP 3138[1].

Consider also the following example, first in Py2.x and then in Py3.x:
>>> [u'\u2620']
[u'\u2620']
>>> ['\u2620']
UnicodeEncodeError: 'charmap' codec can't encode character '\u2620' in
position 9: character maps to <undefined>
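
The failure can be reproduced without a Windows terminal by encoding the repr by hand. This is only a minimal sketch: cp437 is an assumed stand-in for the console encoding (the code page actually used for the examples above is not stated), and try_encode is a hypothetical helper.

```python
# cp437 is an assumed stand-in for a legacy console code page.
CONSOLE_ENCODING = "cp437"


def try_encode(text, encoding=CONSOLE_ENCODING):
    """Return the encoded bytes, or the UnicodeEncodeError that was raised."""
    try:
        return text.encode(encoding)
    except UnicodeEncodeError as exc:
        return exc


# In Python 3 the repr of the list keeps U+2620 as a literal character ...
list_repr = repr(['\u2620'])

# ... so turning that repr into bytes for a cp437 stream fails.
result = try_encode(list_repr)
print(type(result).__name__)
```

The same repr encodes without trouble to UTF-8, which is why the problem only shows up on terminals with a limited encoding.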

This means that there is no way to print lists (or other objects) that
contain characters the terminal can't encode.
Two possible workarounds are:
1) encode all the elements of the list, which is not practical;
2) use ascii(), but it wraps the output in an extra pair of quotes and
escapes backslashes and apostrophes (and _[0] can't be used on the
next line).
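
The two workarounds can be sketched as follows. The list contents and variable names are made up for illustration; only str.encode(), the 'backslashreplace' error handler and ascii() come from Python itself.

```python
items = ['\u2620', "it's"]

# Workaround 1: escape every element by hand before displaying the list.
escaped_items = [
    s.encode('ascii', 'backslashreplace').decode('ascii') for s in items
]

# Workaround 2: ascii() escapes the whole repr in one call, but it returns
# a str, so echoing it at the prompt wraps it in another pair of quotes,
# and `_` then refers to that str rather than to the list.
as_ascii = ascii(items)

print(escaped_items)
print(as_ascii)
print(repr(as_ascii))
```

Both results contain only ASCII characters, so they can be written to any terminal, at the cost of losing direct access to the original objects.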
Also note that in Py3
>>> ['\ud800']
>>> _[0]
works, because U+D800 belongs to the category "Cs (Other, Surrogate)"
and it is escaped[2].
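
A small check of the difference between the two characters, using only the standard unicodedata module. ascii() is applied to the repr purely so that the output is safe to print on any terminal.

```python
import unicodedata

skull = '\u2620'      # printable, category "So"
surrogate = '\ud800'  # lone surrogate, category "Cs"

for ch in (skull, surrogate):
    # repr() keeps printable characters as they are and escapes the rest.
    print(unicodedata.category(ch), ch.isprintable(), ascii(repr(ch)))
```

Because repr() already escapes the surrogate, its repr is pure ASCII and never reaches the terminal's encoder as a non-encodable character.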

The best solution is probably to change the default error handler of the
Python 3 interactive interpreter to 'backslashreplace' in order to avoid
this behavior, but I don't know if it's possible to do so only for
">>> foo" and not for ">>> print(foo)" (print() should still raise an
error as it does in Py2).
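
One way to get this effect for ">>> foo" only is a custom sys.displayhook, since print() does not go through it. The following is a rough sketch, not a proposed patch: escaped_repr and backslash_displayhook are hypothetical names, and falling back to 'ascii' when the stream reports no encoding is an assumption.

```python
import builtins
import sys


def escaped_repr(value, encoding):
    """Return repr(value) with characters that `encoding` cannot
    represent replaced by backslash escapes."""
    return repr(value).encode(encoding, 'backslashreplace').decode(encoding)


def backslash_displayhook(value):
    """Hypothetical replacement for sys.displayhook; print() is untouched."""
    if value is None:
        return
    builtins._ = None
    # Assumption: fall back to ASCII if the stream reports no encoding.
    encoding = getattr(sys.stdout, 'encoding', None) or 'ascii'
    sys.stdout.write(escaped_repr(value, encoding) + '\n')
    # `_` keeps the original object, so `_[0]` still works afterwards.
    builtins._ = value


# To try it at the interactive prompt:
# sys.displayhook = backslash_displayhook
```

With this hook installed, evaluating ['\u2620'] on a terminal that can't encode U+2620 would show the escaped form, while print('\u2620') would still raise UnicodeEncodeError.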

This proposal has already been rejected in PEP 3138[3], but there are
no links to the discussion that led to this decision.

I think this should be discussed again and possibly changed because, even
if I can't see the "listOfJapaneseStrings"[4], I would still rather see a
sequence of escaped chars than a UnicodeEncodeError.

Date                 User          Action  Args
2009-01-30 15:26:49  ezio.melotti  set     recipients: + ezio.melotti
2009-01-30 15:26:49  ezio.melotti  set     messageid: <>
2009-01-30 15:26:47  ezio.melotti  link    issue5110 messages
2009-01-30 15:26:45  ezio.melotti  create