Message 80820 - Python tracker

Author	ezio.melotti
Recipients	ezio.melotti
Date	2009-01-30.15:26:45
Message-id	<1233329209.55.0.55287436123.issue5110@psf.upfronthosting.co.za>
In-reply-to

Content

In Py2.x
>>> u'\u2620'
outputs u'\u2620' whereas
>>> print u'\u2620'
raises an error.

Instead, in Py3.x, both
>>> '\u2620'
and
>>> print('\u2620')
raise an error if the terminal's encoding can't represent the
character (e.g. the Windows terminal used for these examples).

This is caused by the new string representation defined in PEP 3138[1].
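
A minimal sketch that reproduces the cause without a Windows console. Using 'cp437' as a stand-in for the terminal encoding is an assumption; any codec that lacks U+2620 should behave the same way.

```python
# Python 3: repr() leaves printable non-ASCII characters unescaped (PEP 3138).
shown = repr('\u2620')
print(len(shown))  # two quotes plus the character itself

# The interactive prompt has to encode that text for the terminal.
# 'cp437' is only a stand-in for a legacy console encoding (assumption).
try:
    shown.encode('cp437')
    encodable = True
except UnicodeEncodeError:
    encodable = False

print(encodable)
```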

Consider also the following example:
Py2:
>>> [u'\u2620']
[u'\u2620']
Py3:
>>> ['\u2620']
UnicodeEncodeError: 'charmap' codec can't encode character '\u2620' in
position 9: character maps to <undefined>

This means that there is no way to print lists (or other objects) that
contain characters that can't be encoded.
Two possible workarounds are:
1) encode all the elements of the list, but that's not practical;
2) use ascii(), but it adds extra quotes around the output and escapes
backslashes and apostrophes (and it won't be possible to use _[0] in the
next line).
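
The two workarounds can be sketched roughly as follows. The sample list and the choice of the 'unicode_escape' codec for the first workaround are assumptions made for illustration only.

```python
items = ['\u2620', "it's"]

# Workaround 1: encode every element by hand. The result is a list of
# bytes objects, no longer the original strings.
encoded = [s.encode('unicode_escape') for s in items]

# Workaround 2: ascii() escapes everything outside ASCII, but it returns
# a str. At the prompt that str is shown wrapped in another pair of
# quotes, and `_` is bound to the string instead of the list.
escaped = ascii(items)

print(encoded)
print(escaped)
```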
 
Also note that in Py3
>>> ['\ud800']
['\ud800']
>>> _[0]
'\ud800'
works, because U+D800 belongs to the category "Cs (Other, Surrogate)"
and it is escaped[2].
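
The difference can be checked directly. This short sketch only uses unicodedata and str.isprintable() to show why repr() escapes the surrogate but not U+2620.

```python
import unicodedata

lone = '\ud800'
skull = '\u2620'

# U+D800 is a surrogate (category Cs) and is not printable,
# so repr() writes it as an escape sequence.
print(unicodedata.category(lone), lone.isprintable())

# U+2620 is printable, so repr() keeps the raw character.
print(unicodedata.category(skull), skull.isprintable())

lone_repr = repr(lone)
skull_repr = repr(skull)
```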

The best solution is probably to change the default error-handler of the
Python3 interactive interpreter to 'backslashreplace' in order to avoid
this behavior, but I don't know if it's possible only for ">>> foo" and
not for ">>> print(foo)" (print() should still raise an error as it does
in Py2).
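
A rough sketch of how this could look from user code, assuming the echo of ">>> foo" goes through sys.displayhook while print() writes to sys.stdout directly. The name escaping_displayhook is hypothetical, and this is not a claim about how the interpreter itself would implement the change.

```python
import builtins
import sys

def escaping_displayhook(value):
    """Echo repr(value), escaping what sys.stdout cannot encode."""
    if value is None:
        return
    builtins._ = None
    text = repr(value)
    encoding = getattr(sys.stdout, 'encoding', None) or 'ascii'
    # Round-trip through the terminal encoding: unencodable characters
    # become backslash escapes, everything else is left untouched.
    safe = text.encode(encoding, 'backslashreplace').decode(encoding)
    sys.stdout.write(safe + '\n')
    builtins._ = value

# Only the ">>> foo" echo uses sys.displayhook; print(foo) is unaffected
# and keeps the strict error handling of sys.stdout.
sys.displayhook = escaping_displayhook
```

Because `_` is still bound to the original object, `_[0]` keeps working on the next line, which is the drawback of the ascii() workaround.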

This proposal was already rejected in PEP 3138[3], but there are
no links to the discussion that led to this decision.

I think this should be discussed again and possibly changed, because, even
if I can't see the "listOfJapaneseStrings"[4], I would still rather see a
sequence of escaped chars than a UnicodeEncodeError.

[1]: http://www.python.org/dev/peps/pep-3138/
[2]: http://www.python.org/dev/peps/pep-3138/#specification
[3]: http://www.python.org/dev/peps/pep-3138/#rejected-proposals
[4]: http://www.python.org/dev/peps/pep-3138/#motivation

History
Date	User	Action	Args
2009-01-30 15:26:49	ezio.melotti	set	recipients: + ezio.melotti
2009-01-30 15:26:49	ezio.melotti	set	messageid: <1233329209.55.0.55287436123.issue5110@psf.upfronthosting.co.za>
2009-01-30 15:26:47	ezio.melotti	link	issue5110 messages
2009-01-30 15:26:45	ezio.melotti	create