Message 109702 - Python tracker


This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author	ezio.melotti
Recipients	Rhamphoryncus, amaury.forgeotdarc, ezio.melotti, lemburg, loewis, vstinner
Date	2010-07-09.09:49:04
SpamBayes Score	4.413009e-11
Marked as misclassified	No
Message-id	<1278668949.82.0.0997720973536.issue9198@psf.upfronthosting.co.za>
In-reply-to

Content

Here is a patch to "fix" sys_displayhook (note: the patch is just a proof of concept; it seems to work fine, but I still have to clean it up, add comments, rename and reorganize some variables, and add tests).
This is an example output while using iso-8859-1 as IO encoding:

wolf@linuxvm:~/dev/py3k$ PYTHONIOENCODING=iso-8859-1 ./python
Python 3.2a0 (py3k:82643:82644M, Jul  9 2010, 11:39:25)
[GCC 4.4.1] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys; sys.stdout.encoding, sys.stdin.encoding
('iso-8859-1', 'iso-8859-1')
>>> 'ascii string'
'ascii string'  # works fine
>>> 'some accented chars: öäå'
'some accented chars: öäå'  # works fine - these chars are encodable
>>> 'a snowman: \u2603'
'a snowman: \u2603'  # non-encodable - the char is escaped instead of raising an error
>>> 'snowman: \u2603, and accented öäå'
'snowman: \u2603, and accented öäå' # only non-encodable chars are escaped
>>> # the behavior of print is still the same:
>>> print('some accented chars: öäå') 
some accented chars: öäå
>>> print('a snowman: \u2603')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'latin-1' codec can't encode character '\u2603' in position 11: ordinal not in range(256)
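For comparison, the displayhook behavior shown above (escape only the characters the output encoding cannot represent, leave everything else as-is) can be approximated in pure Python with the standard `backslashreplace` error handler. This is a minimal sketch, not the patch itself (which changes the C function `sys_displayhook`); the helper names `display_text` and `displayhook` are hypothetical.

```python
import builtins
import sys


def display_text(value, encoding):
    """Hypothetical helper: repr(value) with only the characters that
    `encoding` cannot represent turned into backslash escapes."""
    text = repr(value)
    # 'backslashreplace' rewrites each unencodable character as \xNN,
    # \uNNNN or \UNNNNNNNN; encodable ones such as 'ö' pass through.
    return text.encode(encoding, 'backslashreplace').decode(encoding)


def displayhook(value):
    """Hypothetical pure-Python displayhook that escapes instead of raising."""
    if value is None:
        return  # the interactive prompt prints nothing for None
    encoding = sys.stdout.encoding or 'ascii'  # assumption: fall back to ASCII
    sys.stdout.write(display_text(value, encoding) + '\n')
    builtins._ = value  # keep the usual '_' behavior of the prompt
```

Installing it with `sys.displayhook = displayhook` affects only how the prompt echoes values; `print()` still raises `UnicodeEncodeError` for the snowman, as in the session above.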

-------------------------------------

While testing the patch with PYTHONIOENCODING=iso-8859-1, I also found this weird issue, which is *not* related to the patch: I managed to reproduce it on a clean py3k using PYTHONIOENCODING=iso-8859-1:
>>> 'òàùèì  óáúéí  öäüëï'
'ò�\xa0ùèì  óáúé�\xad  öäüëï'
>>> 'òàùèì  óáúéí  öäüëï'.encode('iso-8859-1')
b'\xc3\xb2\xc3\xa0\xc3\xb9\xc3\xa8\xc3\xac  \xc3\xb3\xc3\xa1\xc3\xba\xc3\xa9\xc3\xad  \xc3\xb6\xc3\xa4\xc3\xbc\xc3\xab\xc3\xaf'
>>> 'òàùèì'.encode('utf-8')
b'\xc3\x83\xc2\xb2\xc3\x83\xc2\xa0\xc3\x83\xc2\xb9\xc3\x83\xc2\xa8\xc3\x83\xc2\xac'

I think there might be a conflict between the IO encoding I specified and the one my terminal actually uses, but I couldn't figure out exactly what's going on (it is also weird that only 'à' and 'í' are not displayed correctly). Unless this behavior is expected, I'll open another issue about it.
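One possible explanation, offered only as an unverified hypothesis: the terminal sends UTF-8 bytes while Python, because of PYTHONIOENCODING, decodes stdin as iso-8859-1. The `.encode()` results above are consistent with this, since encoding the string back to iso-8859-1 yields exactly the UTF-8 bytes of the accented letters. The sketch below replays that assumption offline and shows why only 'à' and 'í' would be affected: their second UTF-8 byte decodes to a non-printable character, which `repr()` escapes.

```python
# Assumption (not confirmed in the report): the terminal sends UTF-8,
# but Python decodes stdin as iso-8859-1 because of PYTHONIOENCODING.
typed = 'òàùèì  óáúéí  öäüëï'

# What the interpreter would receive: UTF-8 bytes misread as Latin-1.
received = typed.encode('utf-8').decode('iso-8859-1')

# repr() escapes characters rejected by str.isprintable(). Only 'à'
# (-> 'Ã' + U+00A0 no-break space) and 'í' (-> 'Ã' + U+00AD soft hyphen)
# produce such characters; every other pair stays printable.
nonprintable = sorted({ch for ch in received if not ch.isprintable()})
print(nonprintable)  # ['\xa0', '\xad']

# The echoed repr is written back as Latin-1. Printable pairs reproduce the
# original UTF-8 bytes, but for the escaped ones a lone 0xC3 byte is followed
# by an ASCII backslash, which a UTF-8 terminal shows as U+FFFD.
output = repr(received).encode('iso-8859-1')
shown = output.decode('utf-8', 'replace')  # what a UTF-8 terminal would display
```

Under this hypothesis `shown` matches the garbled line in the session (with '�' being U+FFFD), which would make this a terminal/IO-encoding mismatch rather than a bug in the displayhook.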

History
Date	User	Action	Args
2010-07-09 09:49:10	ezio.melotti	set	recipients: + ezio.melotti, lemburg, loewis, amaury.forgeotdarc, Rhamphoryncus, vstinner
2010-07-09 09:49:09	ezio.melotti	set	messageid: <1278668949.82.0.0997720973536.issue9198@psf.upfronthosting.co.za>
2010-07-09 09:49:08	ezio.melotti	link	issue9198 messages
2010-07-09 09:49:06	ezio.melotti	create