classification
Title: console w/ cp65001 displays extra characters for non-ascii strings.
Type: behavior
Stage:
Components: IO, Unicode, Windows
Versions: Python 3.3
process
Status: closed
Resolution: duplicate
Dependencies:
Superseder:
Assigned To:
Nosy List: ezio.melotti, loewis, metolone, vstinner
Priority: normal
Keywords:

Created on 2012-03-08 04:35 by metolone, last changed 2012-04-05 11:46 by vstinner. This issue is now closed.

Messages (3)
msg155149 - Author: Mark Tolonen (metolone) Date: 2012-03-08 04:35
This is on Windows 7 SP1. Run 'chcp 65001', then start Python from the console. Extra characters are printed after any string that contains non-ASCII characters. At a guess, the output code is using the UTF-8 byte length of the string where the character count is expected (or vice versa).

Python 3.3.0a1 (default, Mar  4 2012, 17:27:59) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> print('hello')
hello
>>> print('p\u012bny\u012bn')
pīnyīn
n
>>> print('\u012b'*10)
īīīīīīīīīī
�īīīī
�ī
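The session above is consistent with one possible mechanism, sketched below as an assumption rather than a confirmed diagnosis: the console write accepts all of the UTF-8 bytes but reports the number of characters it wrote, so the caller treats the call as a partial write and resends the tail of the buffer. The names `simulated_console_write`, `write_all`, and `simulate_print` are hypothetical and are not part of Python or the Windows API.

```python
def simulated_console_write(data: bytes, sink: list) -> int:
    """Hypothetical console write (an assumption, not the real API).

    It displays every byte it is given, but returns the number of
    characters displayed instead of the number of bytes consumed.
    """
    text = data.decode("utf-8", errors="replace")
    sink.append(text)
    return len(text)


def write_all(data: bytes, sink: list) -> None:
    """Retry loop that trusts the return value as a byte count."""
    while data:
        written = simulated_console_write(data, sink)
        # If 'written' is smaller than len(data), the tail is sent again.
        data = data[written:]


def simulate_print(s: str) -> str:
    """Return what the simulated console would display for print(s)."""
    sink = []
    write_all((s + "\n").encode("utf-8"), sink)
    return "".join(sink)


if __name__ == "__main__":
    for sample in ("hello", "p\u012bny\u012bn", "\u012b" * 10):
        # ascii() keeps the demo printable on any console encoding.
        print(ascii(simulate_print(sample)))
```

Under this assumption, ASCII-only strings are unaffected because byte and character counts agree, while each non-ASCII character leaves one extra byte to be resent. A resend that starts in the middle of a two-byte sequence would show up as a replacement character, as in the last example of the session.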
msg155158 - Author: STINNER Victor (vstinner) (Python committer) Date: 2012-03-08 11:48
See issue #1602.
msg157568 - Author: STINNER Victor (vstinner) (Python committer) Date: 2012-04-05 11:46
I'm quite sure that this issue is a duplicate of #1602.
History
Date                 User          Action  Args
2012-04-05 11:46:33  vstinner      set     status: open -> closed; resolution: duplicate; messages: + msg157568
2012-03-08 11:48:57  vstinner      set     messages: + msg155158
2012-03-08 10:28:00  ezio.melotti  set     nosy: + loewis, vstinner
2012-03-08 04:35:32  metolone      create