classification
Title: windows console doesn't print or input Unicode
Type: behavior Stage: needs patch
Components: Unicode, Windows Versions: Python 3.5
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: David.Sankel, Drekin, akira, amaury.forgeotdarc, christian.heimes, christoph, davidsarah, ezio.melotti, flox, giampaolo.rodola, hippietrail, lemburg, mark, mhammond, ncoghlan, pitrou, santa4nt, smerlin, sorin, steve.dower, terry.reedy, tim.golden, tzot, v+python
Priority: high Keywords: patch

Created on 2007-12-12 09:56 by mark, last changed 2014-07-28 12:45 by haypo.

Files
File name Uploaded Description
sys_write_stdout.patch haypo, 2010-11-04 15:15
unicode2.py v+python, 2011-01-09 06:52
doc-patch.diff davidsarah, 2011-01-12 05:32 Proposed changes to user-visible documentation
unicode3.py haypo, 2011-10-19 11:55
win_console.patch haypo, 2011-10-19 20:57
test_win_console.py haypo, 2011-10-19 20:58
streams.py Drekin, 2014-07-26 20:33
Messages (95)
msg58487 - Author: Mark Summerfield (mark) Date: 2007-12-12 09:56
I am not sure if this is a Python bug or simply a limitation of cmd.exe.

I am using Windows XP Home.
I run cmd.exe with the /u option and I have set my console font to
"Lucida Console" (the only TrueType font offered), and I run chcp 65001
to set the utf8 code page.
When I run the following program:

for x in range(32, 2000):
    print("{0:5X} {0:c}".format(x))

one blank line is output.

But if I do chcp 1252 the program prints up to 7F before hitting a
unicode encoding error.

This is different behaviour from Python 2.5.1 which (with a suitably
modified print line) after chcp 65001 prints up to 7F and then fails
with "IOError: [Errno 0] Error".
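You can check where the chcp 1252 run stops without a console. The sketch below uses a hypothetical helper, `first_unencodable`, which is not part of any library. It returns the first code point in the test's range that a codec cannot encode. For cp1252 this is U+0080, which matches the output ending at 7F.

```python
def first_unencodable(encoding, start=32, stop=2000):
    """Return the first code point in range(start, stop) that `encoding`
    cannot encode, or None if all of them can be encoded."""
    for cp in range(start, stop):
        try:
            chr(cp).encode(encoding)
        except UnicodeEncodeError:
            return cp
    return None
```

For example, `first_unencodable("cp1252")` stops at the same place as the print loop. `first_unencodable("utf-8")` returns None for this range.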
msg58621 - Author: Mark Summerfield (mark) Date: 2007-12-14 11:31
I've looked into this a bit more, and from what I can see, code page
65001 just doesn't work, so it is a Windows problem, not a Python problem.
A possible solution might be to read/write UTF-16, which "managed" Windows
applications can do.
msg58651 - Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2007-12-15 02:08
We are aware of multiple Windows related problems. We are planning to
rewrite parts of the Windows specific API to use the widechar variants.
Maybe that will help.
msg87086 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2009-05-03 23:57
Yes, it is a Windows problem. There simply doesn't seem to be a true
Unicode codepage for command-line apps. Recommend closing.
msg88059 - Author: Χρήστος Γεωργίου (Christos Georgiou) (tzot) * Date: 2009-05-19 00:08
Just in case it helps, here is the behaviour on Win XP Pro, Python 2.5.1:

First, I added an alias for 'cp65001' to 'utf_8' in
Lib/encodings/aliases.py .

Then, I opened a command prompt with a bitmap font.

c:\windows\system32>python
Python 2.5.1 (r251:54863, Apr 18 2007, 08:51:08) [MSC v.1310 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> print u"\N{EM DASH}"
—

I switched the font to Lucida Console and retried (without exiting the
Python interpreter, although the behaviour is the same when exiting and
entering again):

>>> print u"\N{EM DASH}"
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IOError: [Errno 13] Permission denied

Then I tried (by pressing Alt+0233 for é, which is invalid in my normal
cp1253 codepage):

>>> print u"née"

and the interpreter exits without any information. The same happens for:

>>> a=u"née"

Then I created a UTF-8 text file named 'test65001.py':

# -*- coding: utf_8 -*-
a=u"néeα"
print a

and tried to run it directly from the command line:

c:\windows\system32>python d:\src\PYTHON\test65001.py
néeαTraceback (most recent call last):
  File "d:\src\PYTHON\test65001.py", line 4, in <module>
    print a
IOError: [Errno 2] No such file or directory

You see? It printed all the characters before failing.

Also the following works:

c:\windows\system32>echo heéε
heéε

and

c:\windows\system32>echo heéε >D:\src\PYTHON\dummy.txt

successfully creates a UTF-8 file (without a UTF-8 BOM at the
beginning).

So it's possible that it is a Python bug, or at least something can be
done about it.
msg88077 - Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2009-05-19 09:46
An immediate thing to do is to declare cp65001 as an encoding:

Index: Lib/encodings/aliases.py
===================================================================
--- Lib/encodings/aliases.py    (revision 72757)
+++ Lib/encodings/aliases.py    (working copy)
@@ -511,6 +511,7 @@
     'utf8'               : 'utf_8',
     'utf8_ucs2'          : 'utf_8',
     'utf8_ucs4'          : 'utf_8',
+    'cp65001'            : 'utf_8',

     ## uu_codec codec
     #'uu'                 : 'uu_codec',

This is not enough unfortunately, because the win32 API function
WriteFile() returns the number of characters written, not the number of
(utf8) bytes:

>>> print("\u0124\u0102" + 'abc')
ĤĂabc
c
[44420 refs]
>>>
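The stray `c` is what a write-until-done loop produces when the OS reports characters instead of bytes. "ĤĂabc\n" is 6 characters but 8 UTF-8 bytes. The writer therefore concludes that the last 2 bytes (`c\n`) were not written and sends them again. The simulation below models this. `buggy_console_write` is a hypothetical stand-in for WriteFile under code page 65001, not a real API.

```python
def buggy_console_write(data: bytes) -> int:
    """Hypothetical model of the reported bug: every byte reaches the screen,
    but the returned count is the number of decoded characters."""
    return len(data.decode("utf-8", errors="replace"))


def write_all(data: bytes, write) -> list:
    """Usual 'write until done' loop of a buffered writer.

    Returns the list of byte strings handed to write(), so that the
    duplicated tail is visible."""
    calls = []
    while data:
        calls.append(data)
        written = write(data)
        data = data[written:]  # trust the returned count
    return calls


calls = write_all("\u0124\u0102abc\n".encode("utf-8"), buggy_console_write)
# calls[0] is the full 8-byte string; calls[1] is the "c\n" that gets repeated.
```

The character count can never exceed the byte count, and it is at least 1 for non-empty input, so the loop always terminates. It simply re-sends a tail that was already displayed.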

Additionally, there is a bug in ReadFile, which returns an empty
string (and no error) when a non-ASCII character is entered; that is
the behavior of an EOF condition...

Maybe the solution is to use the win32 console API directly...
msg92854 - Author: Χρήστος Γεωργίου (Christos Georgiou) (tzot) * Date: 2009-09-19 00:38
Another note:
if one creates a dummy Stream object (having a softspace attribute and a
write method that writes using os.write, as in
http://stackoverflow.com/questions/878972/windows-cmd-encoding-change-causes-python-crash/1432462#1432462
) to replace sys.stdout and sys.stderr, then writes occur correctly,
without issues. Pre-requisites:
chcp 65001, Lucida Console font and cp65001 as an alias for UTF-8 in
encodings/aliases.py
This is Python 2.5.4 on Windows.
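A Python 3 adaptation of that idea might look like the sketch below. The class name `OsWriteStream` is made up, and `softspace` is dropped because it is a Python 2 detail. The class encodes each string to UTF-8 and issues a single `os.write()`. Presumably this sidesteps the retry logic that the miscounted return value confuses (msg88077). Ignoring the return value is a deliberate trade-off: a genuinely short write would be silently truncated.

```python
import os


class OsWriteStream:
    """Minimal text stream: encode to UTF-8 and hand the bytes straight to
    os.write(), bypassing Python's buffered I/O layer."""

    def __init__(self, fd: int, encoding: str = "utf-8"):
        self._fd = fd
        self.encoding = encoding

    def write(self, text: str) -> int:
        # One os.write() per call. The returned byte count is ignored on
        # purpose, so a miscounted result cannot trigger a re-write of the tail.
        os.write(self._fd, text.encode(self.encoding))
        return len(text)

    def flush(self) -> None:
        pass  # nothing is buffered here

    def fileno(self) -> int:
        return self._fd
```

On Windows, with the prerequisites listed above in place, you would install it with `sys.stdout = OsWriteStream(sys.stdout.fileno())`, and likewise for `sys.stderr`.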
msg94445 - Author: Glenn Linderman (v+python) Date: 2009-10-25 00:06
With Python 3.1.1, the following batch file seems to be necessary to use
UTF-8 successfully from an XP console:

set PYTHONIOENCODING=UTF-8
cmd /u /k chcp 65001
set PYTHONIOENCODING=
exit

The cmd line seems to be necessary because of Windows compatibility
issues, but it seems that Python should notice the cp65001 code page
and not need the PYTHONIOENCODING setting.
msg94480 - Author: Mark Summerfield (mark) Date: 2009-10-26 09:07
Glenn Linderman's fix pretty well works for me on XP Home. I can print
every Unicode character up to and including U+D7FF (although most just
come out as rectangles, at least I don't get encoding errors).

It fails at U+D800 with message:

UnicodeEncodeError: 'utf-8' codec can't encode character '\ud800' in
position 17: surrogates not allowed

I also tried U+D801 and got the same error.

Nonetheless, this is *much* better than before.
msg94483 - Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2009-10-26 09:19
Mark Summerfield wrote:
> 
> Mark Summerfield <mark@qtrac.eu> added the comment:
> 
> Glenn Linderman's fix pretty well works for me on XP Home. I can print
> every Unicode character up to and including U+D7FF (although most just
> come out as rectangles, at least I don't get encoding errors).
> 
> It fails at U+D800 with message:
> 
> UnicodeEncodeError: 'utf-8' codec can't encode character '\ud800' in
> position 17: surrogates not allowed
> 
> I also tried U+D801 and got the same error.

That's normal and expected: D800 is the start of the surrogate
range. Surrogates are only valid in pairs (in UTF-16) and cannot be
encoded on their own in UTF-8.
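You can confirm this behaviour from Python. A lone surrogate has no UTF-8 encoding. A character beyond U+FFFF, by contrast, is a single code point that becomes a surrogate pair only when encoded as UTF-16. A loop like Mark's test can skip the surrogate block. Both helper names below are made up for illustration.

```python
def utf8_error_reason(text):
    """Return the UnicodeEncodeError reason for encoding `text` as UTF-8,
    or None if it encodes cleanly."""
    try:
        text.encode("utf-8")
    except UnicodeEncodeError as exc:
        return exc.reason
    return None


def non_surrogate_code_points(start, stop):
    """Yield code points in range(start, stop), skipping U+D800..U+DFFF."""
    for cp in range(start, stop):
        if 0xD800 <= cp <= 0xDFFF:
            continue
        yield cp
```

For instance, `utf8_error_reason("\ud800")` gives the same "surrogates not allowed" reason as the traceback above. Meanwhile, `"\U0001F600".encode("utf-16-le")` yields the pair D83D DE00 as four little-endian bytes.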
msg94496 - Author: Glenn Linderman (v+python) Date: 2009-10-26 17:06
The choice of the Lucida Console or the Consolas font cures most of the
rectangle problems. Those are just a limitation of the font selected
for the console window.
msg108173 - Author: Christoph Burgmer (christoph) Date: 2010-06-19 12:04
Will this bug be tackled for Python 2.7?

And is there a way to get hold of the access denied error?

Here are my steps to reproduce:

I started the console with "cmd /u /k chcp 65001"
_______________________________________________________________________
Aktive Codepage: 65001.

C:\Dokumente und Einstellungen\root>set PYTHONIOENCODING=UTF-8

C:\Dokumente und Einstellungen\root>d:

D:\>cd Python31

D:\Python31>python
Python 3.1.2 (r312:79149, Mar 21 2010, 00:41:52) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> print("\u573a")
场
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IOError: [Errno 13] Permission denied
>>>
_______________________________________________________________________

I see a rectangle on screen, but copy and paste evidently works.
msg108228 - Author: STINNER Victor (haypo) * (Python committer) Date: 2010-06-20 09:00
> Maybe the solution is to use the win32 console API directly...

Yes, it is the best solution because it avoids the horrible mbcs encoding.

About cp65001: it is not *exactly* the same encoding as UTF-8, and so it cannot be used as an alias to utf-8: see issue #6058.
msg116801 - Author: Mark Lawrence (BreamoreBoy) * Date: 2010-09-18 15:39
@Brian/Tim what's your take on this?
msg120414 - Author: STINNER Victor (haypo) * (Python committer) Date: 2010-11-04 15:09
I wrote a small function to call WriteConsoleOutputA() and WriteConsoleOutputW() in Python to do some tests. It works correctly, except if I change the code page using the chcp command. It looks like the problem is that the chcp command changes the console code page and the ANSI code page, but it should only change the ANSI code page (and not the console code page).


chcp command
============

The chcp command changes the console code page, but in practice, the console still expects the OEM code page (eg. cp850 on my French setup). Example:

C:\...> python.exe -c "import sys; print(sys.stdout.encoding)"
cp850
C:\...> chcp 65001
C:\...> python.exe
Fatal Python error: Py_Initialize: can't initialize sys standard streams
LookupError: unknown encoding: cp65001
C:\...> SET PYTHONIOENCODING=utf-8
C:\...> python.exe
>>> import sys
>>> sys.stdout.write("\xe9\n")
é
2
>>> sys.stdout.buffer.write("\xe9\n".encode("utf8"))
é
3
>>> sys.stdout.buffer.write("\xe9\n".encode("cp850"))
é
2

os.device_encoding(1) uses GetConsoleOutputCP(), which gives 65001. Should it use GetOEMCP() instead? Or should the chcp command be fixed?

Setting the console code page looks like a bad idea, because if I type "é" on my keyboard, a random character (eg. U+0002) is displayed instead...


WriteConsoleOutputA() and WriteConsoleOutputW()
===============================================

Without touching the code page
------------------------------

If the character can be rendered by the current font (eg. U+00E9): WriteConsoleOutputA() and WriteConsoleOutputW() work correctly.

If the character cannot be rendered by the current font, but there is a replacement character (eg. U+0141 replaced by U+0041): WriteConsoleOutputA() cannot be used (U+0141 cannot be encoded to the code page), WriteConsoleOutputW() writes U+0141 but the console contains U+0041 (I checked using ReadConsoleOutputW()) and U+0041 is displayed. It works like the mbcs encoding; the behaviour looks correct.

If the character cannot be rendered by the current font and there is no replacement character (eg. U+042D): WriteConsoleOutputA() cannot be used (U+042D cannot be encoded to the code page), WriteConsoleOutputW() writes U+042D but U+003F (?) is displayed instead. The behaviour looks correct.

chcp 65001
----------

Using "chcp 65001" command (+ "set PYTHONIOENCODING=utf-8" to avoid the fatal error), it becomes worse: the result depends on the font...

Using raster font:
 - (ANSI) write "\xe9".encode("cp850") using WriteConsoleOutputA() displays U+00e9 (é), whereas the console output code page is cp65001 (I checked using GetConsoleOutputCP())
 - (ANSI) write "\xe9".encode("utf-8") using WriteConsoleOutputA() displays é (mojibake!)
 - (UNICODE) write "\xe9" using WriteConsoleOutputW() displays... a random character (U+0002, U+0008, U+0069, U+00b0, ...)

Using Lucida (TrueType font): 
 - (ANSI) write "\xe9".encode("cp850") using WriteConsoleOutputA() displays U+0000 !?
 - (UNICODE) write "\xe9" using WriteConsoleOutputW() works correctly (display U+00e9), even with "\u0141", it works correctly (display U+0141)
msg120415 - Author: STINNER Victor (haypo) * (Python committer) Date: 2010-11-04 15:15
sys_write_stdout.patch: Create a sys.write_stdout() test function that calls WriteConsoleOutputA() or WriteConsoleOutputW() depending on the input type (bytes or str).
msg120416 - Author: Χρήστος Γεωργίου (Christos Georgiou) (tzot) * Date: 2010-11-04 15:22
http://blogs.msdn.com/b/michkap/archive/2008/03/18/8306597.aspx

If you want any kind of Unicode output in the console, the font must be an “official” MS console TTF (“official” as defined by the Windows version); I believe only Lucida Console and Consolas are the ones with all MS private settings turned on inside the font file.
msg120700 - Author: STINNER Victor (haypo) * (Python committer) Date: 2010-11-08 01:26
I don't understand exactly the goal of this issue. Different people described various bugs of the Windows console, but I don't see any problem with Python here. It looks like it's just not possible to display Unicode correctly with the Windows console (the whole Unicode charset, not the current code page subset).

- 65001 code page: it's not the same encoding as UTF-8 and so it cannot be set as an alias to utf-8 (see #6058) => nothing to do, or maybe document the PYTHONIOENCODING=utf-8 workaround... But if you do that, you may get strange errors when writing to stdout or stderr like "IOError: [Errno 13] Permission denied" or "IOError: [Errno 2] No such file or directory" ...
- the chcp command sets the console encoding, which is stupid because the console still expects text encoded to the previous code page => Windows (chcp command) bug; the chcp command should not be used (it doesn't solve any problem, it just makes the situation worse)
- use the console API instead of read()/write() to fix this issue: it doesn't work, the console is completely buggy (msg120414) => Windows (console) bug
- using the "Lucida Console" font avoids some issues => I don't think that the Python interpreter should configure the console (using SetCurrentConsoleFontEx?), it's not the role of Python

To me, there is nothing to do, and so I close the bug.

If you would like to fix a particular Python bug, open a new specific issue. If you think I'm wrong, that Python should fix this issue, and you know how, please reopen it.
msg125823 - Author: David-Sarah Hopwood (davidsarah) Date: 2011-01-09 05:32
It is certainly possible to write Unicode to the console successfully using WriteConsoleW. This works regardless of the console code page, including 65001. The code at http://tahoe-lafs.org/trac/tahoe-lafs/browser/src/allmydata/windows/fixups.py does so (it's for Python 2.x, but you'd be calling WriteConsoleW from C anyway).

WriteConsoleW has one bug that I know of, which is that it fails when writing more than 26608 characters at once (http://tahoe-lafs.org/trac/tahoe-lafs/ticket/1232). That's easy to work around by limiting the amount of data passed in a single call.

Fonts are not Python's problem, but encoding is. It doesn't make sense to fail to output the right characters just because some users might not have selected fonts that can display those characters. This bug should be reopened.

(For completeness, it is possible to display Unicode on the console using fonts other than Lucida Console and Consolas, but it requires a registry hack: http://stackoverflow.com/questions/878972/windows-cmd-encoding-change-causes-python-crash/3259271#3259271 .)
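The workaround can be sketched portably. Split the text into slices that each fit a budget of UTF-16 code units, the unit WriteConsoleW counts in, and take care not to cut a surrogate pair in half. `utf16_chunks` is a hypothetical helper. The 10,000-unit default is an arbitrary safe value well under the 26608-character failure point. Each slice would then be passed to its own WriteConsoleW call.

```python
def utf16_chunks(text, max_units=10000):
    """Yield consecutive slices of `text`, each needing at most `max_units`
    UTF-16 code units, never splitting a character that needs a surrogate
    pair (code points above U+FFFF take two units)."""
    if max_units < 2:
        raise ValueError("max_units must be at least 2")
    start = 0
    units = 0
    for i, ch in enumerate(text):
        width = 2 if ord(ch) > 0xFFFF else 1
        if units + width > max_units:
            yield text[start:i]
            start, units = i, 0
        units += width
    if start < len(text):
        yield text[start:]
```

For example, `list(utf16_chunks("a\U0001F600b", 2))` keeps the emoji whole in its own slice instead of splitting its two code units across calls.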
msg125824 - Author: Glenn Linderman (v+python) Date: 2011-01-09 06:52
Interesting!

I was able to tweak David-Sarah's code to work with Python 3.x, mostly doing things that 2to3 would probably do: changing unicode() to str(), dropping u from u'...', etc.

I skipped the unmangling of command-line arguments, because it produced an error I didn't understand, about needing a buffer protocol.  But I'll attach David-Sarah's code + tweaks + a test case showing output of the Cyrillic alphabet to a console with code page 437 (at least, on my Win7-64 box, that is what it is).

Nice work, David-Sarah.  I'm quite sure this is not in a form usable inside Python 3, but it shows exactly what could be done inside Python 3 to make things work... and gives us a workaround if Python 3 is not fixed.
msg125826 - Author: David-Sarah Hopwood (davidsarah) Date: 2011-01-09 07:28
Glenn Linderman wrote:
> I skipped the unmangling of command-line arguments, because it produced an error I didn't understand, about needing a buffer protocol.

If I understand correctly, that part isn't needed on Python 3 because issue2128 is already fixed there.
msg125833 - Author: STINNER Victor (haypo) * (Python committer) Date: 2011-01-09 09:03
> It is certainly possible to write Unicode to the console
> successfully using WriteConsoleW

Did you try with characters not encodable to the code page and with characters that cannot be rendered by the font?

See msg120414 for my tests with WriteConsoleOutputW.
msg125852 - Author: David-Sarah Hopwood (davidsarah) Date: 2011-01-09 19:23
haypo wrote:
> davidsarah wrote:
>> It is certainly possible to write Unicode to the console
>> successfully using WriteConsoleW
>
> Did you try with characters not encodable to the code page and with characters that cannot be rendered by the font?

Yes, characters not encodable to the code page do work (as confirmed by Glenn Linderman, since code page 437 does not include Cyrillic).

Characters that cannot be rendered by the font print as missing-glyph boxes, as expected. They don't cause any other problem, and they can be cut-and-pasted to other Unicode-aware applications, showing up as the original characters.

> See msg120414 for my tests with WriteConsoleOutputW

Even if it handled encoding correctly, WriteConsoleOutputW (http://msdn.microsoft.com/en-us/library/ms687404%28v=vs.85%29.aspx) would not be the right API to use in any case, because it prints to a rectangle of characters without scrolling. WriteConsoleW does scroll in the same way that printing to a console output stream normally would. (Redirection to a non-console stream can be detected and handled differently, as the code in unicode2.py does.)
msg125877 - Author: Glenn Linderman (v+python) Date: 2011-01-10 02:27
I would certainly be delighted if someone would reopen this issue, and figure out how to translate unicode2.py to Python internals so that Python's console I/O on Windows would support Unicode "out of the box".

Otherwise, I'll have to include the equivalent of unicode2.py in all my Python programs, because right now, I'm including instructions for the user to (1) choose Lucida or Consolas font if they can't figure out any other font that gets rid of the square boxes, (2) chcp 65001, (3) set PYTHONIOENCODING=UTF-8.

Having this capability inside Python (or my programs) will enable me to eliminate two-thirds of the geeky instructions for my users. But it seems like a very appropriate capability to have within Python, especially Python 3.x with its preference for, and support of, Unicode in so many other ways.
msg125889 - Author: David-Sarah Hopwood (davidsarah) Date: 2011-01-10 09:50
I'll have a look at the Py3k I/O internals and see what I can do.
(Reopening a bug appears to need Coordinator permissions.)
msg125890 - Author: Tim Golden (tim.golden) * (Python committer) Date: 2011-01-10 10:05
Reopening as there seems to be some possibility of progress
msg125898 - Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2011-01-10 11:50
The script unicode2.py uses the console STD_OUTPUT_HANDLE iff sys.stdout.fileno()==1.
But is it always the case? What about pythonw.exe? 
Also some applications may redirect fd=1: I'm sure that py.test does this http://pytest.org/capture.html#setting-capturing-methods-or-disabling-capturing and IIRC Apache also redirects file descriptors.
msg125899 - Author: STINNER Victor (haypo) * (Python committer) Date: 2011-01-10 12:33
amaury> The script unicode2.py uses the console STD_OUTPUT_HANDLE iff
amaury> sys.stdout.fileno()==1

Interesting article about the Windows console:
http://blogs.msdn.com/b/michkap/archive/2008/03/18/8306597.aspx

There is an example which has many tests to check that stdout is the Windows
console (and not a pipe or something else).
msg125938 - Author: David-Sarah Hopwood (davidsarah) Date: 2011-01-10 22:15
> The script unicode2.py uses the console STD_OUTPUT_HANDLE iff sys.stdout.fileno()==1.

You may have missed "if not_a_console(hStdout): real_stdout = False".
not_a_console uses GetFileType and GetConsoleMode to check whether that handle is directed to something other than a console.

> But is it always the case?

The technique used here for detecting a console is almost the same as the code for IsConsoleRedirected at http://blogs.msdn.com/b/michkap/archive/2010/05/07/10008232.aspx , or in WriteLineRight at http://blogs.msdn.com/b/michkap/archive/2010/04/07/9989346.aspx (I got it from that blog, can't remember exactly which page).

[This code will give a false positive in the strange corner case that stdout/stderr is redirected to a console *input* handle. It might be better to use GetConsoleScreenBufferInfo instead of GetConsoleMode, as suggested by http://stackoverflow.com/questions/3648711/detect-nul-file-descriptor-isatty-is-bogus/3650507#3650507 .]

> What about pythonw.exe?

I just tested that, using pythonw run from cmd.exe with stdout redirected to a file; it works as intended. It also works (for both console and non-console cases) when the handles are inherited from a parent process.

Incidentally, what's the earliest supported Windows version for Py3k? I see that http://www.python.org/download/windows/ mentions Windows ME. I can fairly easily make it fall back to never using WriteConsoleW on Windows ME, if that's necessary.
msg125942 - Author: David-Sarah Hopwood (davidsarah) Date: 2011-01-10 22:21
Note: Michael Kaplan's code checks whether GetConsoleMode failed due to ERROR_INVALID_HANDLE. My code intentionally doesn't do that, because it is correct and conservative to fall back to the non-console behaviour when there is *any* error from GetConsoleMode. (It could also fail due to not having the GENERIC_READ right on the handle, for example.)
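The console check described in these two messages can be written independently of ctypes. In the sketch below, the Win32 calls are injected as plain callables. On Windows they would wrap GetFileType and GetConsoleMode; here they are assumptions, so the decision logic can be exercised anywhere. Any GetConsoleMode failure counts as "not a console", which is the conservative fallback just described. The constant values are the documented Win32 ones.

```python
FILE_TYPE_CHAR = 0x0002     # Win32: character device (console, NUL, ...)
FILE_TYPE_REMOTE = 0x8000   # Win32: flag bit, masked off before comparing
INVALID_HANDLE_VALUE = -1


def not_a_console(handle, get_file_type, get_console_mode_ok):
    """Return True unless `handle` looks like a console.

    get_file_type(handle) -> int         stands in for GetFileType
    get_console_mode_ok(handle) -> bool  True iff GetConsoleMode succeeded
    """
    if handle is None or handle == INVALID_HANDLE_VALUE:
        return True
    # Disk files and pipes are not character devices.
    if (get_file_type(handle) & ~FILE_TYPE_REMOTE) != FILE_TYPE_CHAR:
        return True
    # NUL is also a character device, but GetConsoleMode fails on it;
    # any failure, whatever the error code, means "not a console".
    return not get_console_mode_ok(handle)
```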
msg125947 - Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2011-01-10 22:47
Even if python.exe starts normally, py.test for example uses os.dup2() to redirect the file descriptors 1 and 2 to temporary files. sys.stdout.fileno() is still 1, the STD_OUTPUT_HANDLE did not change, but normal print() now goes to a file; but the proposed script won't detect this and will write to the console...
Somehow we should extract the file handle from the file descriptor, with a call to _get_osfhandle() for example.
msg125956 - Author: David-Sarah Hopwood (davidsarah) Date: 2011-01-10 23:38
"... os.dup2() ..."

Good point, thanks.

It would work to change os.dup2 so that if its second argument is 0, 1, or 2, it calls _get_osfhandle to get the Windows handle for that fd, and then reruns the console-detection logic. That would even allow Unicode output to work after redirection to a different console.

Programs that directly called the CRT dup2 or SetStdHandle would bypass this. Can we consider such programs to be broken? Methinks a documentation patch for os.dup2 would be sufficient, something like:

"When fd1 refers to the standard input, output, or error handles (0, 1 and 2 respectively), this function also ensures that state associated with Python's initial sys.{stdin,stdout,stderr} streams is correctly updated if needed. It should therefore be used in preference to calling the C library's dup2, or similar APIs such as SetStdHandle on Windows."
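The proposed os.dup2 change amounts to re-running console detection whenever one of the standard fds is replaced. The sketch below uses hypothetical names (`dup2_and_refresh`, `console_state`). `os.isatty` stands in for the real Windows test, and `watched` is exposed only so that the logic can be exercised on a spare fd.

```python
import os

# Hypothetical per-fd cache of "does this fd refer to a console?".
console_state = {}


def is_console(fd):
    """Portable stand-in for the Windows check, which would look up the
    handle with msvcrt.get_osfhandle(fd) (not GetStdHandle) and then test
    it with GetFileType and GetConsoleMode."""
    return os.isatty(fd)


def dup2_and_refresh(fd, fd2, watched=(0, 1, 2)):
    """os.dup2() followed by re-running console detection for standard fds."""
    os.dup2(fd, fd2)
    if fd2 in watched:
        console_state[fd2] = is_console(fd2)
```

Code that calls the C library's dup2 or SetStdHandle directly would still bypass this refresh, as noted above.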
msg126286 - Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2011-01-14 19:00
http://www.python.org/dev/peps/pep-0011/ says

Name:             Win9x, WinME, NT4
    Unsupported in:   Python 2.6 (warning in 2.5 installer)
    Code removed in:  Python 2.6

Only xp+ now. email sent to webmaster@...

Even if the best fix only applies to win7, please include it.
msg126288 - Author: Brian Curtin (brian.curtin) * (Python committer) Date: 2011-01-14 19:06
I think we even agreed to drop 2000, although the PEP hasn't been updated and I couldn't find the supposed email where this was said.

For implementing functionality that isn't supported on all Windows versions or architectures, you can look at PC/winreg.c for a few examples. DisableReflectionKey is a good example off the top of my head.
msg126303 - Author: STINNER Victor (haypo) * (Python committer) Date: 2011-01-14 23:31
Here are some results of my test of unicode2.py. I'm testing py3k on Windows XP, OEM: cp850, ANSI: cp1252.

Raster fonts
------------

With a fresh console, unicode2.py displays "?????????????????". input() accepts characters encodable to the OEM code page.

If I set the code page to 65001 (chcp program+set PYTHONIOENCODING=utf-8; or SetConsoleCP() + SetConsoleOutputCP()), it displays weird characters. input() accepts ASCII characters, but non-ASCII characters (encodable to the console and OEM code pages) display weird characters (smileys! control characters?).

Lucida console
--------------

With my system code page (OEM: cp850), characters not encodable to the code pages are displayed correctly. I can type some non-ASCII characters (encodable to the code page). If I copy/paste characters not encodable to the code page, they are replaced by a similar glyph (eg. Ł => L) or by ? (€ => ?).

If I set the code page to 65001, all characters are still correctly displayed. But I cannot type non-ASCII characters anymore: input() fails with EOFError (I suppose that Python gets control characters).

Redirect output to a pipe
-------------------------

I patched unicode2.py to use sys.stdout.buffer instead of sys.stdout for UnicodeOutput stream. I also patched UnicodeOutput to replace \n by \r\n. 

It works correctly with any character. No UTF-8 BOM is written. But "Here 1" is written at the end. I suppose that sys.stdout should be flushed before the creation of UnicodeOutput.

But it always uses UTF-8. I don't know whether UTF-8 is well supported by every application on Windows.

Without unicode2.py, only characters encodable to OEM code page are supported, and \n is used as end of line string.
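For the redirected case, both patches described above (UTF-8 output, and "\n" written as "\r\n") correspond to existing io.TextIOWrapper options. The plain "utf-8" codec also writes no BOM. A minimal check, where `make_redirected_writer` is a hypothetical name:

```python
import io


def make_redirected_writer(buffer):
    """Text layer for a redirected stdout: UTF-8 without BOM, CRLF line ends."""
    # newline="\r\n" translates every "\n" written into "\r\n".
    return io.TextIOWrapper(buffer, encoding="utf-8", newline="\r\n")


buf = io.BytesIO()
out = make_redirected_writer(buf)
out.write("\u20ac-\u0141\n")
out.flush()
data = buf.getvalue()
```

In a real program the buffer would be `sys.stdout.buffer`. The "Here 1" ordering problem above also suggests flushing the old `sys.stdout` before wrapping its buffer.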

Let's try to summarize
----------------------

Tests:
 d1) Display characters encodable to the console code page
 t1) Type characters encodable to the console code page
 d2) Display characters not encodable to any code page
 t2) Type characters not encodable to any code page

I'm using Windows with OEM=cp850 and ANSI=cp1252. For test (t2), I copy €-Ł and paste it to the console (right click on the window title > Edit > Paste).

Raster fonts, console=cp850:

d1) ok
t1) ok
d2) FAIL: €-Ł is displayed ?-L
t2) FAIL: €-Ł is read as ?-L

Raster fonts, console=cp65001:

d1) FAIL: é is displayed as 2 strange glyphs
t1) FAIL: EOFError
d2) FAIL: only display unreadable glyphs
t2) FAIL: EOFError

Lucida console, console=cp850:

d1) ok
t1) ok
d2) ok
t2) FAIL: €-Ł is read as ?-L

Lucida console, console=cp65001:

d1) ok
t1) FAIL: EOFError
d2) ok
t2) FAIL: EOFError

So, setting the console code page to 65001 doesn't solve any issue, but it breaks the input (input with the keyboard or pasting text).

With Raster fonts or Lucida console, it's possible to display characters encodable to the code page. But that is not new; it's already possible with Python 3. For characters not encodable to the code page, however, it works with unicode2.py and Lucida console, which is something new :-)

For the input, I suppose that we need also to use a Windows console function, to support unencodable characters.
msg126304 - Author: STINNER Victor (haypo) * (Python committer) Date: 2011-01-14 23:38
> ..., because right now, I'm including instructions for the user to
> (1) choose Lucida or Consolas font if they can't figure out
>     any other font that gets rid of the square boxes
> (2) chcp 65001
> (3) set PYTHONIOENCODING=UTF-8

Why do you set the code page to 65001? In all my tests (on Windows XP), it always breaks the standard input.
msg126308 - Author: Glenn Linderman (v+python) Date: 2011-01-15 01:46
Victor said:
Why do you set the code page to 65001? In all my tests (on Windows XP), it always breaks the standard input.

My response:
Because when I searched Windows for Unicode and/or UTF-8 stuff, I found 65001, and it seems like it might help, and it does a bit.  And then I find PYTHONIOENCODING, and that helps some.  And that got me something that works better enough than what I had before, so I quit searching.

You did a better job of analyzing and testing all the cases. I will have to go subtract the 65001 part and confirm your results; maybe it is useless now that other pieces of the puzzle are in place. Certainly with David-Sarah's code it seems not to be needed. Whether it was a necessary part of the previous workaround I am not sure, because of the limited number of cases I tried (I was trying to find something that worked well enough, but didn't have enough knowledge to find David-Sarah's solution, nor a good enough testing methodology to try the pieces independently).

Thank you for your interest in this issue.
msg126319 - Author: sorin (sorin) Date: 2011-01-15 08:51
Remember that cp65001 cannot be set on Windows. Also please read http://blogs.msdn.com/b/michkap/archive/2010/10/07/10072032.aspx and contact the author, Michael Kaplan from Microsoft, if you have more questions. I'm sure he will be glad to help.
msg127782 - Author: David-Sarah Hopwood (davidsarah) Date: 2011-02-03 02:47
Feedback from Julie Solon of Microsoft:

> These console functions share a per-process heap that is 64K. There is some overhead, the heap can get fragmented, and calls from multiple threads all affect how much is available for this buffer. 

> I am working to update the documentation for this function [WriteConsoleW] and other affected functions with information along these lines, and will post it within the next week or two.

I replied thanking her and asking for clarification:

When you say that the heap can get fragmented, is this true only when
there are concurrent calls to the console functions, or can it occur
even with single-threaded use? I'm trying to determine whether acquiring
a process-global lock while calling these functions would be sufficient
to ensure that the available heap space will not be unexpectedly low.
(This assumes that the functions are not used outside the lock by other
libraries in the same process.)

ReadConsoleW seems also to be affected, incidentally.

I've asked for clarification about whether acquiring a process-global lock when using these functions ...
Julie
msg131657 - Author: STINNER Victor (haypo) * (Python committer) Date: 2011-03-21 14:25
I did some tests with WriteConsoleW():
 - with raster fonts, U+00E9 is displayed as é, U+0141 as L and U+042D as ? => good (work as expected)
 - with TrueType font (Lucida), U+00E9 is displayed as é, U+0141 as Ł and U+042D as Э => perfect! (all characters are rendered correctly)

Now I agree that WriteConsoleW() is the best solution to fix this issue.

My test code (added to Python/sysmodule.c):
---------
static PyObject *
sys_write_stdout(PyObject *self, PyObject *args)
{
    PyObject *textobj;
    wchar_t *text;
    DWORD written, total;
    Py_ssize_t len, chunk;
    HANDLE console;
    BOOL ok;

    if (!PyArg_ParseTuple(args, "U:write_stdout", &textobj))
        return NULL;

    console = GetStdHandle(STD_OUTPUT_HANDLE);
    if (console == INVALID_HANDLE_VALUE) {
        PyErr_SetFromWindowsErr(GetLastError());
        return NULL;
    }

    text = PyUnicode_AS_UNICODE(textobj);
    len = PyUnicode_GET_SIZE(textobj);
    total = 0;
    while (len != 0) {
        if (len > 10000)
            /* WriteConsoleW() is limited to 64 KB (32,768 UTF-16 units), but
               this limit depends on the heap usage. Use a safe limit of 10,000
               UTF-16 units.
               http://tahoe-lafs.org/trac/tahoe-lafs/ticket/1232 */
            chunk = 10000;
        else
            chunk = len;
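        ok = WriteConsoleW(console, text, (DWORD)chunk, &written, NULL);
        if (!ok) {
            /* Report the failure instead of silently returning a short count. */
            PyErr_SetFromWindowsErr(0);
            return NULL;
        }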
        text += written;
        len -= written;
        total += written;
    }
    return PyLong_FromUnsignedLong(total);
}
---------
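The 10,000-unit cap in the C code above counts UTF-16 code units, and a cut at a fixed offset can land between the two halves of a surrogate pair. A minimal Python sketch of the chunking step, assuming the goal is to keep each piece within the limit without splitting a character (utf16_chunks is a hypothetical helper, not part of any patch in this issue):

```python
from typing import Iterator


def utf16_chunks(text: str, limit: int = 10000) -> Iterator[str]:
    """Yield pieces of `text` that are each at most `limit` UTF-16 code units.

    Hypothetical helper; `limit` must be at least 2 so that one
    supplementary character (a surrogate pair) always fits.
    """
    if limit < 2:
        raise ValueError("limit must be >= 2")
    piece, units = [], 0
    for ch in text:  # iterating a str yields whole code points, so pairs stay intact
        width = 2 if ord(ch) > 0xFFFF else 1
        if units + width > limit:
            yield "".join(piece)
            piece, units = [], 0
        piece.append(ch)
        units += width
    if piece:
        yield "".join(piece)


# 25,000 BMP characters split as 10,000 + 10,000 + 5,000 units
sizes = [len(p) for p in utf16_chunks("x" * 25000)]
```

Each piece would then be passed to a single WriteConsoleW call; with limit=2, "ab" followed by an emoji and "c" splits into three pieces, keeping the emoji whole.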


The question is now how to integrate WriteConsoleW() into Python without breaking the API, for example:
 - Should sys.stdout be a TextIOWrapper or not?
 - Should sys.stdout.fileno() return 1 or raise an error?
 - What about sys.stdout.buffer: should sys.stdout.buffer.write() call WriteConsoleA(), or should sys.stdout not have a buffer attribute? I think that many modules and programs now rely on sys.stdout.buffer to write bytes directly to stdout. There is at least python -m base64.
 - Should we use ReadConsoleW() for stdin?
msg131854 - (view) Author: David-Sarah Hopwood (davidsarah) Date: 2011-03-23 04:54
(For anyone wondering about the hold-up on this bug, I ended up switching to Ubuntu. Not to worry, I now have Python 3 building in XP under VirtualBox -- which is further than I ever got with my broken Vista install :-/ It seems to behave identically to native XP as far as this bug is concerned.)

Victor STINNER wrote:
> The question is now how to integrate WriteConsoleW() into Python without breaking the API, for example:
> - Should sys.stdout be a TextIOWrapper or not?

It pretty much has to be a TextIOWrapper for compatibility. Also it's easier to implement it that way, because the text stream object has to be able to fall back to using the buffer if the fd is redirected.

> - Should sys.stdout.fileno() return 1 or raise an error?

Return sys.stdout.buffer.fileno(), which is 1 unless redirected.

This is the Right Thing because in Windows, fds are an abstraction of the C runtime library, and the C runtime allows an fd to be associated with a console. In that case, from the application's point of view it is still writing to the same fd. In fact, we'd be implementing this by calling the WriteConsoleW win32 API directly in order to avoid bugs in the CRT's Unicode support, but that's an implementation detail.

> - What about sys.stdout.buffer: should sys.stdout.buffer.write() call WriteConsoleA(), or should sys.stdout not have a buffer attribute?

I was thinking that sys.std{out,err}.buffer would still be set up exactly as they are now. Then if an app writes to that buffer, it will get interleaved with any writes via the text stream. (The writes to the buffer go to the underlying fd, which probably ends up calling WriteFile at the win32 level.)

> I think that many modules and programs now rely on sys.stdout.buffer to write directly bytes into stdout. There is at least python -m base64.

That would just work. The only caveat would be that if you write a partial line to the buffer object (or if you set the buffer object to be fully buffered and write to it), and then write to the text stream, the buffer wouldn't be flushed before the text is written. I think that is fine as long as it is documented.

If an app sets the .buffer attribute of sys.std{out,err}, it would fall back to using that buffer in the same way as when the fd is redirected.

> - Should we use ReadConsoleW() for stdin?

Yes. I'll probably start with a patch that just handles std{out,err}, though.
msg132060 - (view) Author: David-Sarah Hopwood (davidsarah) Date: 2011-03-25 00:39
I wrote:
> The only caveat would be that if you write a partial line to the buffer object (or if you set the buffer object to be fully buffered and write to it), and then write to the text stream, the buffer wouldn't be flushed before the text is written.

Actually it looks like that already happens (because the sys.std{out,err} TextIOWrappers are line-buffered separately from their underlying buffers), so it would not be an incompatibility:

$ python3 -c 'import sys; sys.stdout.write("foo"); sys.stdout.buffer.write(b"bar"); sys.stdout.write("baz\n")'
barfoobaz
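The same ordering can be reproduced without a console by attaching a line-buffered TextIOWrapper to an in-memory byte stream; io.BytesIO here is only a stand-in for the real stdout buffer:

```python
import io

raw = io.BytesIO()  # stand-in for the stdout byte buffer
stream = io.TextIOWrapper(raw, encoding="utf-8", line_buffering=True)

stream.write("foo")          # held in the TextIOWrapper's own pending buffer
stream.buffer.write(b"bar")  # goes straight to the underlying byte stream
stream.write("baz\n")        # newline + line_buffering: pending text is flushed now

print(raw.getvalue())        # b'barfoobaz\n'
```

The text layer keeps "foo" until it sees the newline, so the bytes written through .buffer overtake it.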
msg132061 - (view) Author: David-Sarah Hopwood (davidsarah) Date: 2011-03-25 00:54
I wrote:
$ python3 -c 'import sys; sys.stdout.write("foo"); sys.stdout.buffer.write(b"bar"); sys.stdout.write("baz\n")'
barfoobaz

Hmm, the behaviour actually would differ here: the proposed implementation would print

foobaz
bar

(the "foobaz\n" is written by a call to WriteConsoleW and then the "bar" gets flushed to stdout when the process exits).

But since the naive expectation is "foobarbaz\n" and you already have to flush after each call in order to get that, I think this change in behaviour would be unlikely to affect correct applications.
msg132062 - (view) Author: Glenn Linderman (v+python) Date: 2011-03-25 00:59
Presently, a correct application only needs to flush between a sequence of writes and a sequence of buffer.writes.

Don't assume the flush happens after every write, for a correct application.
msg132064 - (view) Author: David-Sarah Hopwood (davidsarah) Date: 2011-03-25 01:21
Glenn Linderman wrote:
> Presently, a correct application only needs to flush between a sequence of writes and a sequence of buffer.writes.

Right. The new requirement would be that a correct app also needs to flush between a sequence of buffer.writes (that end in an incomplete line, or always if PYTHONUNBUFFERED or python -u is used), and a sequence of writes.

> Don't assume the flush happens after every write, for a correct application.

It's rather hard to implement this without any change in behaviour. Or rather, it isn't hard if the TextIOWrapper were to flush its underlying buffer before each time it writes to the console, but I'd be concerned about the extra overhead of that call. I'd prefer not to do that unless the new requirement above leads to incompatibilities in practice.
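Under the proposed design, an application that mixes the two layers would flush at each switch. A sketch with a hypothetical helper, using io.BytesIO as a stand-in for the console buffer:

```python
import io


def write_bytes_between_text(stream, data):
    """Hypothetical helper: keep text and bytes in call order on one stream."""
    stream.flush()             # push pending text down to stream.buffer (and flush it)
    stream.buffer.write(data)
    stream.buffer.flush()      # the extra flush the proposed design would require


raw = io.BytesIO()
out = io.TextIOWrapper(raw, encoding="utf-8", line_buffering=True)
out.write("foo")
write_bytes_between_text(out, b"bar")
out.write("baz\n")
print(raw.getvalue())          # b'foobarbaz\n'
```

The first flush covers the existing requirement (text before bytes); the second covers the new one (bytes before text, once text goes to WriteConsoleW instead of the buffer).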
msg132065 - (view) Author: Glenn Linderman (v+python) Date: 2011-03-25 01:30
Would it suffice if the new scheme internally flushed after every buffer.write?  It wouldn't be needed after write, because the correct application would already do one there?

Am I off-base in supposing that the performance of buffer.write is expected to include a flush (because it isn't expected to be buffered)?
msg132067 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2011-03-25 02:12
On Friday 25 March 2011 at 00:54 +0000, David-Sarah Hopwood wrote:
> But since the naive expectation is "foobarbaz\n" and you already have
> to flush after each call in order to get that, I think this change in
> behaviour would be unlikely to affect correct applications.

I would not call this "naive". "foobaz\nbar" is really weird. I think
that sys.stdout and sys.stdout.buffer will both have to flush after each
write, or they may be desynchronized.

Some developers already think that adding sys.stdout.flush() after
print("Processing.. ", end='') is too hard (#11633). So I cannot imagine
how they would react if they had to do it explicitly after every
print(), sys.stdout.write() and sys.stdout.buffer.write() call.
msg132184 - (view) Author: David-Sarah Hopwood (davidsarah) Date: 2011-03-25 23:37
First a minor correction:
> The new requirement would be that a correct app also needs to flush between a sequence of buffer.writes (that end in an incomplete line, or always if PYTHONUNBUFFERED or python -u is used), and a sequence of writes.

That should be "and only if PYTHONUNBUFFERED or python -u is not used".

I also said:
> If an app sets the .buffer attribute of sys.std{out,err}, it would fall back to using that buffer in the same way as when the fd is redirected.

but the .buffer attribute is readonly, so this case can't occur.

Glenn Linderman wrote:
> Would it suffice if the new scheme internally flushed after every buffer.write?  It wouldn't be needed after write, because the correct application would already do one there?

Yes, that would be sufficient.

> Am I off-base in supposing that the performance of buffer.write is expected to include a flush (because it isn't expected to be buffered)?

It is expected to be line-buffered. So an app might expect that printing characters one-at-a-time will have reasonable performance.

In any case, given that the buffer of the initial std{out,err} will always be a BufferedWriter object (since .buffer is readonly), it would be possible for the TextIOWrapper to test a dirty flag in the BufferedWriter, in order to check efficiently whether the buffer needs flushing on each write. I've looked at the implementation complexity cost of this, and it doesn't seem too bad.

A similar issue arises for stdin: to maintain strict compatibility, every read from a TextIOWrapper attached to an input console would have to drain the buffer of its buffer object, in case the app has read from it. This is a bit tricky because the bytes drained from the buffer have to be converted to Unicode, so what happens if they end part-way through a multibyte character? Ugh, I'll have to think about that one.

Victor STINNER wrote:
> Some developers already think that adding sys.stdout.flush() after
> print("Processing.. ", end='') is too hard (#11633).

IIUC, that bug is about the behaviour of 'print', and didn't suggest changing the fact that sys.stdout is line-buffered.


By the way, are these changes going to be in a major release? If I understand correctly, the layout of structs (for standard library types not prefixed with '_', such as 'buffered' in bufferedio.c or 'textio' in textio.c) can change with major releases but not with minor releases, correct?
msg132191 - (view) Author: David-Sarah Hopwood (davidsarah) Date: 2011-03-26 00:18
I wrote:
> A similar issue arises for stdin: to maintain strict compatibility, every read from a TextIOWrapper attached to an input console would have to drain the buffer of its buffer object, in case the app has read from it. This is a bit tricky because the bytes drained from the buffer have to be converted to Unicode, so what happens if they end part-way through a multibyte character? Ugh, I'll have to think about that one.

It seems like there is no correct way for an app to read from both sys.stdin and sys.stdin.buffer (even without these console changes). It must choose one or the other.
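For the stdin case, the usual way to handle drained bytes that end part-way through a multibyte character is an incremental decoder, which keeps the incomplete tail until more bytes arrive. A sketch using the standard codecs module (decode_drained is a hypothetical helper and the chunk boundaries are made up for illustration):

```python
import codecs


def decode_drained(chunks, encoding="utf-8"):
    """Decode byte chunks that may split a multibyte character at a boundary."""
    decoder = codecs.getincrementaldecoder(encoding)()
    pieces = [decoder.decode(chunk) for chunk in chunks]
    # final=True raises if the stream ends in the middle of a character
    pieces.append(decoder.decode(b"", final=True))
    return pieces


# b"\xc3\xa9" is U+00E9 in UTF-8; the first chunk ends after its first byte.
result = decode_drained([b"caf\xc3", b"\xa9!"])  # ['caf', '\u00e9!', '']
```

A one-shot b"caf\xc3".decode("utf-8") would raise UnicodeDecodeError instead of waiting for the second byte.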
msg132208 - (view) Author: Glenn Linderman (v+python) Date: 2011-03-26 01:45
David-Sarah said:
In any case, given that the buffer of the initial std{out,err} will always be a BufferedWriter object (since .buffer is readonly), it would be possible for the TextIOWrapper to test a dirty flag in the BufferedWriter, in order to check efficiently whether the buffer needs flushing on each write. I've looked at the implementation complexity cost of this, and it doesn't seem too bad.

So if flush checks that bit, maybe TextIOWrapper could just call buffer.flush, and it would be fast if clean and slow if dirty?  Calling it at the beginning of a Text level write, that is, which would let the char-at-a-time calls to buffer.write be fast.

And I totally agree with msg132191
msg132266 - (view) Author: David-Sarah Hopwood (davidsarah) Date: 2011-03-26 19:22
Glenn wrote:
> So if flush checks that bit, maybe TextIOWriter could just call buffer.flush, and it would be fast if clean and slow if dirty?

Yes. I'll benchmark how much overhead is added by the calls to flush; there's no point in breaking the abstraction boundary of BufferedWriter if it doesn't give a significant performance benefit. (I suspect that it might not, because Windows is very slow at scrolling a console, which might make the cost of flushing insignificant in comparison.)
msg132268 - (view) Author: Glenn Linderman (v+python) Date: 2011-03-26 19:27
David-Sarah wrote:
> Windows is very slow at scrolling a console, which might make the cost of flushing insignificant in comparison.

Just for the record, I noticed a huge speedup in Windows console scrolling when I switched from WinXP to Win7 on a faster computer :)
How much is due to the XP->7 switch and how much to the faster computer, I cannot say, but it seemed much more significant than other speedups in other software.  The point?  Benchmark it on Win7, not XP.
msg145898 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2011-10-19 11:52
I did more tests on the Windows console. I focused my tests on output.

To sum up, if we implement sys.stdout using WriteConsoleW() and sys.stdout.buffer.raw using WriteConsoleA():

 - print() will not fail anymore on unencodable characters, because the string is no longer encoded to the console code page
 - if you set the console font to a TrueType font, most characters will be displayed correctly
 - you don't need to change the (console) code page to CP_UTF8 (65001) anymore if you just use print()
 - you still need cp65001 if the output (stdout and/or stderr) is redirected or if you use directly sys.stdout.buffer or sys.stderr.buffer

Other facts:

 - locale.getpreferredencoding() returns the ANSI code page
 - sys.stdin.encoding is the console encoding (GetConsoleCP())
 - sys.stdout.encoding and sys.stderr.encoding are the console output code page (GetConsoleOutputCP())
 - sys.stdout is not a TTY if the output is redirected, e.g. "python script.py|more"
 - sys.stderr is not a TTY if the output is redirected, e.g. "python script.py 2>&1|more" (this example redirects stdout and stderr, I don't know how to redirect only stderr)
 - WriteConsoleW() is not affected by the console output code page (GetConsoleOutputCP)
 - WriteConsoleA() is indirectly affected by the console output code page: if a string cannot be encoded to the console output code page (e.g. sys.stdout.encoding), you cannot call WriteConsoleA with the result...
 - If the console font is a raster font and the font doesn't contain a character, the console tries to find a similar glyph, or it falls back to the character '?'
 - If the console font is a TrueType font, it is able to display most Unicode characters
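The code-page limitation behind the WriteConsoleA points can be checked without a console: a character can only go through the byte path if it encodes to the active code page, while WriteConsoleW takes the string as-is. A small sketch (encodable is a hypothetical helper; Python ships cp850 and cp1252 codecs on every platform):

```python
def encodable(text, codepage):
    """True if `text` survives encoding to `codepage` (what byte-based output needs)."""
    try:
        text.encode(codepage)
    except UnicodeEncodeError:
        return False
    return True


# The three test characters from msg131657: U+00E9, U+0141, U+042D
for ch in ("\u00e9", "\u0141", "\u042d"):
    print(f"U+{ord(ch):04X}", [cp for cp in ("cp850", "cp1252", "utf-8") if encodable(ch, cp)])
# U+00E9 ['cp850', 'cp1252', 'utf-8']
# U+0141 ['utf-8']
# U+042D ['utf-8']
```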
msg145899 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2011-10-19 11:55
unicode3.py replaces sys.stdout, sys.stdout.buffer, sys.stderr and sys.stderr.buffer to use WriteConsoleW() and WriteConsoleA(). It also displays a lot of information about encodings and displays some characters (I wrote my tests for cp850, cp1252 and cp65001).
msg145963 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2011-10-19 20:57
win_console.patch: a more complete prototype

 * patch the site module to replace sys.stdout and sys.stderr with UnicodeConsole and BytesConsole classes, which use WriteConsoleW and WriteConsoleA
 * UnicodeConsole inherits from io.TextIOBase and BytesConsole inherits from io.RawIOBase
 * Revert the workaround for WriteConsoleA bug from io.FileIO

sys.stdout and/or sys.stderr are only replaced if they are not redirected.
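A skeleton of the two-class layout described in the patch notes, with plain callables standing in for WriteConsoleW and WriteConsoleA so it runs anywhere. The class names follow the patch; everything else, including the encoding value reported, is an assumption. The text class exposes encoding and errors properties, since the thread reports failures when those attributes are missing:

```python
import io


class BytesConsole(io.RawIOBase):
    """Raw byte stream; `sink` is a hypothetical stand-in for a WriteConsoleA call."""

    def __init__(self, sink):
        super().__init__()
        self._sink = sink

    def writable(self):
        return True

    def write(self, b):
        data = bytes(b)
        self._sink(data)
        return len(data)


class UnicodeConsole(io.TextIOBase):
    """Text stream that hands str straight to `sink` (stand-in for WriteConsoleW),
    so nothing is encoded to the console code page."""

    def __init__(self, sink, buffer, encoding="utf-8"):
        super().__init__()
        self._sink = sink
        self._buffer = buffer
        self._encoding = encoding  # assumed value; only reported, never used to encode

    @property
    def buffer(self):
        return self._buffer

    @property
    def encoding(self):
        return self._encoding

    @property
    def errors(self):
        return "strict"

    def writable(self):
        return True

    def write(self, s):
        self._sink(s)
        return len(s)


screen = []
stdout = UnicodeConsole(screen.append, io.BufferedWriter(BytesConsole(screen.append)))
stdout.write("\u0141\u00f3d\u017a\n")
stdout.buffer.write(b"raw bytes\n")
stdout.buffer.flush()
```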
msg145964 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2011-10-19 20:58
test_win_console.py: Small script to test win_console.patch. Write some characters into sys.stdout.buffer (WriteConsoleA) and sys.stdout (WriteConsoleW). The test is written for cp850, cp1252 and cp65001 code pages.
msg146471 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2011-10-26 23:50
I added a cp65001 codec to Python 3.3: see issue #13216.
msg148990 - (view) Author: Matt Mackall (Matt.Mackall) Date: 2011-12-07 21:18
The underlying cause of Python's write exceptions with cp65001 is:

The ANSI C write() function as implemented by the Windows console returns the number of _characters_ written rather than the number of _bytes_, which Python reasonably interprets as a "short write error". It then consults errno, which gives the effectively random error message seen.

This can be bypassed by using os.write(sys.stdout.fileno(), utf8str), which will a) succeed and b) return a count <= len(utf8str).

With os.write() and an appropriate font, the Windows console will correctly display a large number of characters.

Possible workaround: clear errno before calling write, check for non-zero errno after. The vast majority of (non-Python) applications never check the return value of write, so don't encounter this problem.
msg157569 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2012-04-05 11:47
The issue #14227 has been marked as a duplicate of this issue. Copy of msg155149:

This is on Windows 7 SP1.  Run 'chcp 65001' then Python from a console.  Note the extra characters when non-ASCII characters are in the string.  At a guess it appears to be using the UTF-8 byte length of the internal representation instead of the character count.

Python 3.3.0a1 (default, Mar  4 2012, 17:27:59) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> print('hello')
hello
>>> print('p\u012bny\u012bn')
pīnyīn
n
>>> print('\u012b'*10)
īīīīīīīīīī
�īīīī
�ī
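The extra "n" matches the cause described in msg148990. A toy model, assuming the console displays every byte it is given but returns the number of decoded characters, and that the caller retries the "unwritten" remainder as a short-write handler would (both functions are hypothetical):

```python
def buggy_console_write(console, data):
    """Model of the reported bug: all bytes are shown, but the return value
    is the number of decoded characters rather than the number of bytes."""
    text = data.decode("utf-8", errors="replace")
    console.append(text)
    return len(text)


def naive_write_all(console, data):
    """Retry loop that trusts the return value."""
    while data:
        written = buggy_console_write(console, data)
        data = data[written:]


screen = []
naive_write_all(screen, "p\u012bny\u012bn\n".encode("utf-8"))
print(ascii("".join(screen)))  # 'p\u012bny\u012bn\nn\n'
```

The 9-byte UTF-8 string decodes to 7 characters, so bytes 7 and 8 ("n\n") are written a second time.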
msg160812 - (view) Author: Glenn Linderman (v+python) Date: 2012-05-16 08:57
Has something incompatible changed between 3.2.2 and 3.2.3 with respect to this bug?

I have a program that had an earlier version of the workaround (Michael's original, I think), and it worked fine, then I upgraded from 3.2.2 to 3.2.3 due to testing for issue 14811 and then the old workaround started complaining about no attribute 'errors'.

So I grabbed unicode3.py, but it does the same thing:

AttributeError: 'UnicodeConsole' object has no attribute 'errors'

I have no clue how to fix this, other than going back to Python 3.2.2...
msg160813 - (view) Author: Glenn Linderman (v+python) Date: 2012-05-16 08:58
Oh, and is this issue going to be fixed for 3.3, so we don't have to use the workaround in the future?
msg160897 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2012-05-16 17:54
Glenn, I do not know what you are using the interactive interpreter for, but for the unicode BMP, the Idle shell generally works better. I only use CommandPrompt for cross-checking behavior.
msg161151 - (view) Author: Giampaolo Rodola' (giampaolo.rodola) * (Python committer) Date: 2012-05-19 18:55
Not sure whether a solution has already been proposed because the issue is very long, but I just bumped into this on Windows and came up with this:


from __future__ import print_function
import sys

def safe_print(s):
    try:
        print(s)
    except UnicodeEncodeError:
        if sys.version_info >= (3,):
            print(s.encode('utf8').decode(sys.stdout.encoding))
        else:
            print(s.encode('utf8'))

safe_print(u"\N{EM DASH}")


Couldn't python do the same thing internally?
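On Python 3, the except branch of safe_print() re-decodes UTF-8 bytes with the console code page, which prints mojibake and can itself raise. A sketch of what that branch produces, next to a backslashreplace fallback; the latter is an assumption of this note, not something proposed in the thread, and it only avoids the exception rather than displaying the character:

```python
def safe_print_fallback(s, encoding):
    """What the UnicodeEncodeError branch of safe_print() prints on Python 3."""
    return s.encode("utf8").decode(encoding)


def backslash_fallback(s, encoding):
    """Assumed alternative: escape characters the code page cannot represent."""
    return s.encode(encoding, "backslashreplace").decode(encoding)


# EM DASH is E2 80 94 in UTF-8, which cp437 shows as three unrelated characters.
garbled = safe_print_fallback("\N{EM DASH}", "cp437")
escaped = backslash_fallback("a\N{EM DASH}b", "cp437")  # 'a\\u2014b'
```

With cp1252 the first approach can fail outright: U+0141 encodes to C5 81, and 0x81 is undefined in cp1252.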
msg161153 - (view) Author: David-Sarah Hopwood (davidsarah) Date: 2012-05-19 19:25
Giampaolo: See #msg120700 for why that won't work, and the subsequent comments for what will work instead (basically, using WriteConsoleW and a workaround for a Windows API bug). Also see the prototype win_console.patch from Victor Stinner: #msg145963
msg161308 - (view) Author: Glenn Linderman (v+python) Date: 2012-05-21 23:58
I actually had to go back to 3.1.2 to get it to run, I guess I had never run with Unicode output after installing 3.2.  So it isn't an incompatibility between 3.2.2 and 3.2.3, but more likely a change between 3.1 and 3.2 that invalidates this patch and workaround.  At least it is easier to keep 3.1.x and 3.2.x on the same system!

Terry, applications for non-programmers that want to emit Unicode on the console... so the IDLE shell isn't appropriate.
msg161651 - (view) Author: Glenn Linderman (v+python) Date: 2012-05-26 08:58
A little more empirical info: the missing "errors" attribute doesn't show up except for input. print works fine.
msg164572 - (view) Author: Glenn Linderman (v+python) Date: 2012-07-03 05:14
For the win_console.patch, it seems like adding the line

self.errors='strict'

inside UnicodeOutput.__init__ resolves the problem with input causing exceptions.

Not sure if the sys_write_stdout.patch has the same sort of problem. Sure hope this issue makes it into 3.3.
msg164578 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2012-07-03 05:56
3.3b0, Win7, 64 bit. Original test script stops at
File "C:\Programs\Python33\lib\encodings\cp437.py", line 19, in encode
  return codecs.charmap_encode(input,self.errors,encoding_map)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\x80' in position 6:

I am slightly puzzled because cp437 is an extended ascii codepage and there *is* a character for 0x80
https://en.wikipedia.org/wiki/Code_page_437

If I add .encode('latin1'), it does not print the pentagon for 0x7e, but does print \x7e to \xff.

Someone wrote elsewhere that 3.3 could use cp65001. True?
msg164580 - (view) Author: Glenn Linderman (v+python) Date: 2012-07-03 06:47
My fix for this "errors" error, might be similar to what is needed for issue 12967, although I don't know if my fix is really correct... just that it gets past the error, and 'strict' is the default for TextIOWrapper.

I'm not at all sure why there is now (since 3.2) an interaction between input on stdin and the particulars of the output class for stdout. But I'm not at all an expert in Python internals or Python IO.

I'm not sure whether or not you applied the patch to your b0, if not, that is what I'm running, too... but using the win_console.patch as supporting code.  The original test script didn't use the supporting code.

If you did patch your b0 with unicode3.py, then you shouldn't need to do a chcp to write any Unicode characters; someone reported that doing a chcp caused problems, but I don't know how to apply the patch or build a Python with it, so can't really test all the cases. Victor did add a cp65001 codec in a different issue, not sure how that is relevant here, other than for the tests he wrote.
msg164618 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2012-07-03 19:44
I was reporting stock, as distributed 3.3b0.

Is unicode3.py something to run once or import in each app that wants unicode output? Either way, if it is possible to fix the console, why is it not distributed with the fix?

>Terry, applications for non-programmers that want to emit Unicode on the console... so the IDLE shell isn't appropriate.

Someone just posted on python-list about a problem with that.

Hmm. Maybe IDLE should gain a batch-mode console window -- basically a stripped down version of the current shell -- a minimal auto-gui for apps.
msg164619 - (view) Author: Glenn Linderman (v+python) Date: 2012-07-03 19:54
Terry said:
Is unicode3.py something to run once or import in each app that wants unicode output? 

I say:
The latter... import it.

Terry said:
Either way, if it is possible to fix the console, why is it not distributed with the fix?

I say:
Not sure what you are asking here. Yes, it is possible to fix the console, but this fix depends on the version-specific internals of the Python IO system... so unicode3.py works with Python 3.1, but not Python 3.2 or 3.3.  I haven't tested whether my patched unicode3.py still works on Python 3.1 (I imagine it would, since the fix just adds something that Python 3.1 would probably ignore).

So my opinion is the fix is better done inside Python than inside the application.
msg170899 - (view) Author: (Drekin) Date: 2012-09-21 16:20
Hello, I'm trying to handle Unicode input and output in Windows console and found this issue. Will this be solved in 3.3 final? I tried to write a solution (file attached) based on the solutions here – rewriting sys.stdin and sys.stdout so they use ReadConsoleW and WriteConsoleW.

Output works well, but there are few problems with input. First, the Python interactive interpreter actually doesn't use sys.stdin but standard C stdin. It's implemented over file pointer (PyRun_InteractiveLoopFlags, PyRun_InteractiveOneFlags in pythonrun). But still the interpreter uses sys.stdin.encoding (assigning sys.stdin something, that doesn't have encoding==None freezes the interpreter). Wouldn't it make more sense if it used sys.__stdin__.encoding?

However, input() (which uses stdin.readline) works as expected. There's a small problem with KeyboardInterrupt. Since signals are processed asynchronously, it's raised at a random place and behaves weirdly. time.sleep(0.01) after the C call works well, but it's an ugly solution.

When code.interact() is used instead of the standard interpreter, it works as expected. Is there a way of changing the interpreter loop? Some hook which calls code.interact() at the right place? The patch can be applied in site or sitecustomize, but calling code.interact() there obviously doesn't work.

Some other remarks:
- When sys.stdin or sys.stdout doesn't define encoding and errors, input() raises TypeError: bad argument type for built-in operation.
- input() raises KeyboardInterrupt on Ctrl-C in Python 3.2 but not in Python 3.3rc2.
msg170915 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2012-09-21 20:51
> Will this [issue] be solved in 3.3 final?

No. It would be a huge change and RC2 has already been released. No
new features are accepted after 3.3.0 beta 1:
http://www.python.org/dev/peps/pep-0398/

I'm not really motivated to work on this issue, because it is really
hard to get something working in all cases. Using
ReadConsoleW/WriteConsoleW helps, but it doesn't solve all issues as
you said.
msg170999 - (view) Author: (Drekin) Date: 2012-09-22 14:27
I have finished a solution that works for me. It bypasses the standard Python interactive interpreter and uses its own REPL based on code.interact(). This REPL is activated by an ugly hack since PYTHONSTARTUP doesn't apply when some file is run (python -i somefile.py). Why does it work like that? The startup script could find out whether a file is run or not. If anybody knows how to get rid of the time.sleep() used to wait for KeyboardInterrupt, or how to get rid of PromptHack, please let me know. The "patch" can be activated by win_unicode_console_2.enable(change_console=True, use_hack=True) in site, sitecustomize or usercustomize.
msg185135 - (view) Author: (Drekin) Date: 2013-03-24 13:02
Hello. I have made a small upgrade of the workaround.
• win_unicode_console.enable_streams() sets sys.stdin, stdout and stderr to custom file-like objects which use the Windows functions ReadConsoleW and WriteConsoleW to handle Unicode data properly. This can be done in sitecustomize.py to take effect automatically.

• Since the Python interactive console doesn't use sys.stdin for getting input (I still don't know the reason for this), there is an alternative REPL based on code.interact(). win_unicode_console.InteractiveConsole.enable() sets it up. To set it up automatically, put the enabling code into a startup file and set the PYTHONSTARTUP environment variable. This works for an interactive session (just running python with no script).

• Since there is no hook to run InteractiveConsole.enable() when a script is run interactively (-i flag), that is after the script and before the interactive session, I have written a helper script i.py. It just runs given script and then enters an interactive mode using InteractiveConsole. Just put i.py into site-packages and run "py -m i script.py arguments" instead of "py -i script.py arguments".
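code.interact() accepts a readfunc argument, which is one way a replacement REPL like the one described above can read lines through its own input path instead of the C-level stdin. A sketch that drives it from a list of lines; the list is only a stand-in for a ReadConsoleW-backed reader, run_repl is hypothetical, and exitmsg needs Python 3.6 or later:

```python
import code


def run_repl(lines, namespace):
    """Run code.interact() with lines supplied by `readfunc` instead of stdin."""
    feed = iter(lines)

    def readfunc(prompt):
        # A real replacement would read through the custom sys.stdin here.
        try:
            return next(feed)
        except StopIteration:
            raise EOFError from None  # ends the interact() loop

    code.interact(banner="", readfunc=readfunc, local=namespace, exitmsg="")
    return namespace


ns = run_repl(["x = 6 * 7", "y = x + 1"], {})
```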

It's a shame that in the year 2013 one cannot simply run Python console on Windows and enter Unicode characters. I'm not saying it's just Python fault, but there is a workaround on Python side.
msg197700 - (view) Author: (Drekin) Date: 2013-09-14 10:15
Hello again. I have rewritten the custom stdio objects and implemented them as raw io reading and writing bytes in UTF-16-LE encoding. They are then wrapped in standard BufferedReader/Writer and TextIOWrapper objects. This approach also solves a bug of wrong string length given to WriteConsoleW when the string contained a supplementary character. Since we are waiting for the Ctrl-C signal to arrive, this implementation doesn't suffer from http://bugs.python.org/issue18597 . It seems to work when the main script is executed; however, it doesn't work in the Python interactive REPL since the REPL doesn't use sys.stdin for input. It does use its encoding, though, which results in a mess when sys.stdin is changed to an object with a different encoding like UTF-16-LE. See http://bugs.python.org/issue17620 .
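The supplementary-character bug comes from passing len(text), which counts code points, where WriteConsoleW expects a count of UTF-16 code units. A sketch of the count, assuming the text is handed over as UTF-16-LE (utf16_length is a hypothetical helper):

```python
def utf16_length(text):
    """Number of UTF-16 code units, i.e. what WriteConsoleW's length argument counts."""
    return len(text.encode("utf-16-le", "surrogatepass")) // 2


sample = "a\U0001F600"  # 'a' plus one supplementary character
print(len(sample), utf16_length(sample))  # 2 3
```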
msg197751 - (view) Author: Glenn Linderman (v+python) Date: 2013-09-15 06:20
Hi Drekin. Thanks for your work in progressing this issue. There have been a variety of techniques proposed for this issue, but it sounds like yours has built on what the others learned, and is close to complete, together with issue 17620.

Is this in a form that can be used with Python 3.3? or 3.4 alpha? Can it be loaded externally from a script, or must it be compiled into Python, or both?

I've been using a variant of davidsarah's patch for 2 years now, but would like to take yours out for a spin. Is there a Complete Idiot's guide to using your patch? :)
msg197752 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2013-09-15 06:32
From reading the module,
  import stream; stream.enable()
replaces sys.stdin/out/err with new classes.
msg197773 - (view) Author: (Drekin) Date: 2013-09-15 13:26
Glenn Linderman: Yes, I have built on what the others learned. To answer your question, I made it and tested it in Python 3.3; it should also work in 3.4, and from what I've tried, it actually works. As Terry J. Reedy says, you can just load the module and enable the streams. I do this automatically on startup using sitecustomize. However, as I said, this currently messes up the interactive session because of http://bugs.python.org/issue17620 . I have made a workaround – a custom REPL built on the stdlib module code – and also a helper script which runs the main script and then runs the custom REPL (I couldn't find any standard hook which would run the custom REPL). I'm uploading the full code. I will delete it if this isn't the appropriate place.

Things like this could be fixed more easily if more core interpreter logic took place in stdlib. E. g. the code for interactive REPL. Few days ago I started some discussion on python ideas: https://mail.python.org/pipermail/python-ideas/2013-August/023000.html .
msg221175 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2014-06-21 12:27
The fact Unicode doesn't work at the command prompt makes it look like Unicode on Windows just plain doesn't work, even in Python 3. Steve, if you (or a colleague) could provide some insight on getting this to work properly, that would be greatly appreciated.
msg221178 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2014-06-21 15:12
My understanding is that the best way to write Unicode to the console is through WriteConsoleW(), which seems to be where this discussion ended up. The only apparent sticking point is that this would cause an ordering incompatibility with `stdout.write(); stdout.buffer.write(); stdout.write()`.

Last I heard, the official "advice" was to use PowerShell. Clearly everyone's keen to jump on that... (I'm not even sure it's an instant fix either - PS is a much better shell for file manipulation and certainly handles encoding better than type/echo/etc., but I think it will still go back to the OEM CP for executables.)

One other point that came up was UTF-8 handling after redirecting output to a file. I don't see an issue there - UTF-8 is going to be one of the first guesses (with or without a BOM) for text that is not UTF-16, and apps that assume something else are no worse off than with any other codepage.

So I don't have any great answers, sorry. I'd love to see the defaults handle it properly, but opt-in scripts like Drekin's may be the best way to enable it broadly.
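As an illustration of the WriteConsoleW() route, here is a minimal ctypes sketch. The names `utf16_units` and `write_console` are hypothetical, and the sketch does no chunking, so it ignores the 32767-byte limit raised in the next message. WriteConsoleW counts UTF-16 code units, not Python characters: a character outside the Basic Multilingual Plane takes two units. The call also fails when stdout is redirected to a file or pipe, because WriteConsoleW only works on a real console handle.

```python
import sys


def utf16_units(text: str) -> int:
    """Length of text in UTF-16 code units, which is what WriteConsoleW
    counts (a non-BMP character such as an emoji takes two units)."""
    return len(text.encode("utf-16-le")) // 2


def write_console(text: str) -> int:
    """Sketch: write text to the Windows console with WriteConsoleW.

    On Windows, returns the number of UTF-16 code units written.
    Elsewhere, falls back to sys.stdout and returns characters written."""
    if sys.platform != "win32":
        return sys.stdout.write(text)
    import ctypes
    from ctypes import wintypes

    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    kernel32.GetStdHandle.argtypes = [wintypes.DWORD]
    kernel32.GetStdHandle.restype = wintypes.HANDLE
    kernel32.WriteConsoleW.argtypes = [
        wintypes.HANDLE, wintypes.LPCWSTR, wintypes.DWORD,
        ctypes.POINTER(wintypes.DWORD), wintypes.LPVOID,
    ]
    kernel32.WriteConsoleW.restype = wintypes.BOOL

    STD_OUTPUT_HANDLE = -11 & 0xFFFFFFFF  # (DWORD)-11
    handle = kernel32.GetStdHandle(STD_OUTPUT_HANDLE)
    written = wintypes.DWORD(0)
    # Fails (e.g. ERROR_INVALID_HANDLE) if stdout is not a console.
    if not kernel32.WriteConsoleW(handle, text, utf16_units(text),
                                  ctypes.byref(written), None):
        raise ctypes.WinError(ctypes.get_last_error())
    return written.value
```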
msg223403 - (view) Author: (Drekin) Date: 2014-07-18 09:04
I have made some updates to the streams code: better error handling (getting the error number via GetLastError() and raising an exception when zero bytes are written for non-empty input). This prevents the infinite loop in BufferedWriter.flush() when there is an odd number of bytes (WriteConsoleW accepts UTF-16-LE, so only an even number of bytes can be written). It also prevents the same infinite loop when the buffer is too big to write at once (see http://bugs.python.org/issue11395 ). A limit of 32767 bytes was added to the raw write.
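The error-handling fixes described in this message can be sketched as an `io.RawIOBase` subclass. This is an illustration under assumptions, not the code from the attached files: `ConsoleRawWriter` and the injected `write_utf16` callable are hypothetical. On Windows the callable would wrap WriteConsoleW and return the number of bytes written (twice the number of code units); injecting it keeps the logic testable without a console.

```python
import io

# The message cites a 32767-byte limit per raw write.
MAX_RAW_WRITE = 32767


class ConsoleRawWriter(io.RawIOBase):
    """Sketch of a raw console writer with the fixes described above.

    write_utf16 is a hypothetical callable that hands UTF-16-LE bytes
    (always an even count) to the console and returns how many bytes
    were actually written."""

    def __init__(self, write_utf16):
        self._write_utf16 = write_utf16

    def writable(self):
        return True

    def write(self, b):
        data = bytes(b)
        if not data:
            return 0
        # Cap the size and round down to whole UTF-16 code units.
        # (This sketch does not avoid splitting a surrogate pair.)
        chunk = data[:min(len(data), MAX_RAW_WRITE) & ~1]
        written = self._write_utf16(chunk) if chunk else 0
        if written == 0:
            # Returning 0 for non-empty input would make
            # BufferedWriter.flush() retry forever, so fail loudly instead.
            # A real implementation would report GetLastError() here.
            raise OSError("console write made no progress "
                          "(odd trailing byte or console error)")
        return written
```

Wrapped in `io.BufferedWriter`, a write that makes no progress now raises instead of making `flush()` loop forever, and oversized buffers are drained in chunks.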
msg223404 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2014-07-18 09:14
@Drekin: Please don't upload ZIP files to the bug tracker. It would be much better to have a project on GitHub, Mercurial or something else, to keep the history of the source code. You may also try to list all the people who contributed to this code.

You may also create a project on pypi.python.org to share your code. This bug tracker is not the best place for that.

Once the code is considered mature (well tested, widely used), we can try to integrate it into Python.
msg223507 - (view) Author: (Drekin) Date: 2014-07-20 10:48
@Victor Stinner: You are right. So I did it. Here are the links to GitHub and PyPI: https://github.com/Drekin/win-unicode-console, https://pypi.python.org/pypi/win_unicode_console.

I also tried to delete the files, but it seems that it is only possible to unlink a file from the issue, but the file itself remains. Is it possible to manage the files?
msg223509 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2014-07-20 11:56
Thanks Drekin - I'll point folks to your project as a good place to provide initial feedback, and if that seems promising we can look at potentially integrating the various fixes into Python 3.5.
msg223945 - (view) Author: Mark Summerfield (mark) Date: 2014-07-25 13:24
I used pip to install the win_unicode_console package on Windows 7 with Python 3.3.

It works, but it wouldn't freeze with cx_Freeze because there's no __init__.py file in the win_unicode_console directory.
msg223946 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2014-07-25 13:27
Hmm, I'm not sure if that would be a bug in cx_Freeze or CPython - I don't think we've tried freezing or zipimporting namespace packages... (either way, adding the __init__.py to win_unicode_console would likely be the quickest fix)
msg223947 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2014-07-25 13:28
Since there is now an external project fixing the support of Windows console, I suggest to close this issue as "wontfix". In a few months, if we get enough feedback on this project, we may reconsider integrating it into Python. What do you think?

https://pypi.python.org/pypi/win_unicode_console.

> I used pip to install the win_unicode_console package ...

Please don't use Python bug tracker to report bugs to the package.
msg223948 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2014-07-25 13:34
The poor interaction with the Windows command line is still a bug in CPython - we could mark it closed/later but I don't see any value in doing so.

I see Drekin's win_unicode_console module as similar to my own contextlib2 - used to prove the concept, and perhaps iterate on some of the details, but the ultimate long term solution is to fix CPython itself.
msg223949 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2014-07-25 13:40
> The poor interaction with the Windows command line is still a bug in CPython - we could mark it closed/later but I don't see any value in doing so.

I don't see any value in keeping the issue open since nobody has worked on it in the last 7 years. I just want to make it clear that we will *not* fix this issue.

Well, in fact I spent a lot of hours trying to find a way to fix the issue, and my conclusion is that it's not possible to correctly handle Unicode (input and output) in a Windows console. Please read the whole issue for the details.

The win_unicode_console project may improve the Unicode support, but I'm convinced that it still has various issues because it is just not possible to handle all cases.

A workaround is to avoid the Windows console and use IDLE or another shell. Maybe try PowerShell, but PowerShell has at least one issue with code page 65001 (Microsoft's UTF-8): see issue #21927.
msg223951 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2014-07-25 13:52
Based on Steve's last post, the main challenge is that the IO model assumes a bytes-based streaming API - it isn't really set up to cope with a UTF-16 buffering layer.

However, that's not substantially different from the situation when the standard streams are replaced with StringIO objects, and they don't have an underlying buffer object at all. That may be a suitable model for Windows console IO as well - present it to the user in a way that doesn't expose an underlying bytes-based API at all.

Now, it may not be feasible to implement this until we get the startup code cleaned up, but I'm not going to squash interest in improving the situation when it's one of the major culprits behind the "Unicode is even more broken in Python 3 than it is in Python 2" meme.
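A minimal sketch of the StringIO-like model described above: a text stream that passes `str` straight to a console write function and has no bytes-based `.buffer` underneath. `ConsoleTextWriter` and its `write_text` callable are hypothetical; on Windows `write_text` might wrap WriteConsoleW. As with `io.StringIO`, code that reaches for `sys.stdout.buffer` would not work with such a stream.

```python
import io


class ConsoleTextWriter(io.TextIOBase):
    """Hypothetical text-only console stream: str in, no .buffer attribute."""

    def __init__(self, write_text):
        # write_text: callable taking a str (e.g. a WriteConsoleW wrapper).
        self._write_text = write_text

    def writable(self):
        return True

    def write(self, s):
        if not isinstance(s, str):
            raise TypeError("write() argument must be str, not "
                            + type(s).__name__)
        if self.closed:
            raise ValueError("I/O operation on closed stream")
        self._write_text(s)
        return len(s)
```

With this design there is no UTF-16 buffering layer to reconcile with the bytes-based IO stack: text goes to the console as text.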
msg223952 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2014-07-25 13:53
Changing targets to Python 3.5, since this is almost certainly going to be too invasive for a maintenance release.
msg224019 - (view) Author: Glenn Linderman (v+python) Date: 2014-07-26 04:30
This bug deserves to stay open with its high priority (for whatever good that does these last seven years, although I appreciate all the efforts put forth, and have been making heavy use of the workarounds in the patches), because when working with Unicode data in programs, even exception messages are not properly displayed... instead, they cause a secondary exception of not being able to display the data of the original exception to the console.

And writing Unicode data to the console as part of an interactive or command-line program has to be done either in the hope that the data only includes characters the console can display, to avoid the failures, or with lots of special encoding calls and character substitutions for code points not in the console's repertoire. Remember that the console is supposed to be human-readable, not numerically encoded as ascii() would do.

ascii() is sort of OK for exception messages, but since that doesn't happen by default, the initial message to the console with Unicode data often doesn't appear, and the message has to be repeated with reworked parameters after the failure, which impedes productivity.
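The manual encoding calls and character substitutions described above can be delegated to a codec error handler. A minimal sketch, assuming a console whose code page is cp437 (a common OEM code page); the helper name `console_safe` is hypothetical. With `errors="backslashreplace"`, characters outside the code page are written as escapes instead of raising UnicodeEncodeError. This avoids the crash, but the output is not fully human-readable, so it is a workaround rather than a fix. (Python 3.7 and later also offer `sys.stdout.reconfigure(errors="backslashreplace")`.)

```python
import io


def console_safe(text: str, encoding: str = "cp437") -> str:
    """Replace characters the console encoding cannot represent with
    backslash escapes, so printing the result never raises."""
    return text.encode(encoding, errors="backslashreplace").decode(encoding)


# The same idea applied to a whole stream: a TextIOWrapper created with an
# error handler does not fail on unencodable characters. A BytesIO stands
# in for the console's byte stream here.
raw_bytes = io.BytesIO()
stream = io.TextIOWrapper(raw_bytes, encoding="cp437",
                          errors="backslashreplace", newline="\n")
stream.write("na\u00efve \u2603\n")
stream.flush()
```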
msg224086 - (view) Author: (Drekin) Date: 2014-07-26 20:33
I have deleted all my old files and added only my current implementation of the stream objects, which is the only part relevant to this issue.

@Mark Summerfield: I have added __init__.py to the new version of win_unicode_console. If there is any problem, you can start an issue on project GitHub site or contact me.

@Victor Stinner, @Nick Coghlan: What's wrong with treating Windows wide strings as UTF-16-LE encoded bytes and building the raw stream objects around that?
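A sketch of the layering proposed here, assuming a raw stream that accepts UTF-16-LE bytes. The in-memory `RecordingConsoleRaw` class and the `make_console_stdout` helper are hypothetical stand-ins; a real raw stream would pass the bytes to WriteConsoleW. The standard `io.BufferedWriter` and `io.TextIOWrapper` are stacked on top with `encoding="utf-16-le"`, so `print()` works unchanged. One consequence is that anything written directly to the `.buffer` layer must itself be UTF-16-LE.

```python
import io


class RecordingConsoleRaw(io.RawIOBase):
    """Hypothetical stand-in for a raw console stream: records the
    UTF-16-LE bytes a real implementation would pass to WriteConsoleW."""

    def __init__(self):
        self.received = bytearray()

    def writable(self):
        return True

    def write(self, b):
        data = bytes(b)
        n = len(data) & ~1  # only whole UTF-16 code units
        if data and n == 0:
            raise OSError("cannot write half a UTF-16 code unit")
        self.received += data[:n]
        return n


def make_console_stdout(raw):
    """Stack the standard buffered and text layers over the raw stream."""
    buffered = io.BufferedWriter(raw)
    return io.TextIOWrapper(buffered, encoding="utf-16-le",
                            newline="\n", line_buffering=True)


raw = RecordingConsoleRaw()
out = make_console_stdout(raw)
print("\u0395\u03bb\u03bb\u03b7\u03bd\u03b9\u03ba\u03ac \u2603 \U0001F600", file=out)
```

Because `utf-16-le` emits no BOM and every encoded character is a whole number of code units, the raw layer only ever sees even byte counts from the text layer.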
msg224095 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2014-07-27 01:55
Drekin, you're right, that's a much better way to go, I just didn't think it through :)
History
Date                 User                Action  Args
2014-07-28 12:45:35  haypo               set     nosy: - haypo
2014-07-27 01:55:57  ncoghlan            set     messages: + msg224095
2014-07-26 20:33:55  Drekin              set     files: + streams.py
                                                 messages: + msg224086
2014-07-26 20:09:47  Drekin              set     files: - win_unicode_console.zip
2014-07-26 20:07:52  Drekin              set     files: - win_unicode_console.zip
2014-07-26 20:07:12  Drekin              set     files: - streams.py
2014-07-26 20:05:58  Drekin              set     files: - win_unicode_console_3.py
2014-07-26 20:05:03  Drekin              set     files: - i.py
2014-07-26 20:02:41  Drekin              set     files: - win_unicode_console_2.py
2014-07-26 04:30:46  v+python            set     messages: + msg224019
2014-07-25 13:53:50  ncoghlan            set     messages: + msg223952
                                                 versions: + Python 3.5, - Python 3.3, Python 3.4
2014-07-25 13:52:39  ncoghlan            set     messages: + msg223951
2014-07-25 13:40:42  haypo               set     messages: + msg223949
2014-07-25 13:34:01  ncoghlan            set     messages: + msg223948
2014-07-25 13:28:43  haypo               set     messages: + msg223947
2014-07-25 13:27:48  ncoghlan            set     messages: + msg223946
2014-07-25 13:24:59  mark                set     messages: + msg223945
2014-07-20 11:56:54  ncoghlan            set     messages: + msg223509
2014-07-20 10:48:20  Drekin              set     messages: + msg223507
2014-07-20 10:33:49  Drekin              set     files: - win_unicode_console.py
2014-07-18 09:14:01  haypo               set     messages: + msg223404
2014-07-18 09:04:06  Drekin              set     files: + win_unicode_console.zip
                                                 messages: + msg223403
2014-06-21 15:12:49  steve.dower         set     messages: + msg221178
2014-06-21 12:27:21  ncoghlan            set     priority: normal -> high
                                                 nosy: + ncoghlan, steve.dower
                                                 messages: + msg221175
2014-06-21 12:20:46  ncoghlan            set     priority: low -> normal
2014-04-11 17:43:23  terry.reedy         link    issue21164 superseder
2013-09-15 13:26:32  Drekin              set     files: + win_unicode_console.zip
                                                 messages: + msg197773
2013-09-15 06:32:11  terry.reedy         set     messages: + msg197752
2013-09-15 06:20:35  v+python            set     messages: + msg197751
2013-09-14 10:15:17  Drekin              set     files: + streams.py
                                                 messages: + msg197700
2013-03-24 13:40:48  flox                set     nosy: + flox
2013-03-24 13:03:32  Drekin              set     files: + win_unicode_console_3.py
2013-03-24 13:02:25  Drekin              set     files: + i.py
                                                 messages: + msg185135
                                                 versions: + Python 3.4
2012-09-22 14:27:21  Drekin              set     files: + win_unicode_console_2.py
                                                 messages: + msg170999
2012-09-21 20:51:32  haypo               set     messages: + msg170915
2012-09-21 16:20:08  Drekin              set     files: + win_unicode_console.py
                                                 nosy: + Drekin
                                                 messages: + msg170899
2012-07-03 19:54:34  v+python            set     messages: + msg164619
2012-07-03 19:44:59  terry.reedy         set     messages: + msg164618
2012-07-03 06:47:08  v+python            set     messages: + msg164580
2012-07-03 05:56:18  terry.reedy         set     messages: + msg164578
2012-07-03 05:14:25  v+python            set     messages: + msg164572
2012-05-26 08:58:19  v+python            set     messages: + msg161651
2012-05-21 23:59:27  brian.curtin        set     nosy: - brian.curtin
2012-05-21 23:58:47  v+python            set     messages: + msg161308
2012-05-19 20:44:27  Matt.Mackall        set     nosy: - Matt.Mackall
2012-05-19 19:25:21  davidsarah          set     messages: + msg161153
2012-05-19 18:55:40  giampaolo.rodola    set     messages: + msg161151
2012-05-19 18:24:54  giampaolo.rodola    set     nosy: + giampaolo.rodola
2012-05-16 17:54:02  terry.reedy         set     messages: + msg160897
2012-05-16 08:58:12  v+python            set     messages: + msg160813
2012-05-16 08:57:27  v+python            set     messages: + msg160812
2012-04-05 11:47:07  haypo               set     messages: + msg157569
2012-03-11 18:18:17  loewis              link    issue14253 superseder
2012-01-21 05:59:05  akira               set     nosy: + akira
2011-12-07 21:18:22  Matt.Mackall        set     nosy: + Matt.Mackall
                                                 messages: + msg148990
2011-10-26 23:50:17  haypo               set     messages: + msg146471
2011-10-19 20:58:45  haypo               set     files: + test_win_console.py
                                                 messages: + msg145964
2011-10-19 20:57:38  haypo               set     files: + win_console.patch
                                                 messages: + msg145963
2011-10-19 20:42:02  pitrou              set     nosy: + mhammond, brian.curtin
2011-10-19 11:55:13  haypo               set     files: + unicode3.py
                                                 messages: + msg145899
2011-10-19 11:52:57  haypo               set     messages: + msg145898
2011-04-10 20:33:45  smerlin             set     nosy: + smerlin
2011-03-26 19:28:00  v+python            set     messages: + msg132268
2011-03-26 19:22:48  davidsarah          set     messages: + msg132266
2011-03-26 01:45:12  v+python            set     messages: + msg132208
2011-03-26 00:28:45  brian.curtin        set     nosy: - brian.curtin
2011-03-26 00:18:52  davidsarah          set     messages: + msg132191
2011-03-25 23:37:05  davidsarah          set     messages: + msg132184
2011-03-25 02:12:29  haypo               set     messages: + msg132067
2011-03-25 01:30:03  v+python            set     messages: + msg132065
2011-03-25 01:21:22  davidsarah          set     messages: + msg132064
2011-03-25 00:59:19  v+python            set     messages: + msg132062
2011-03-25 00:54:35  davidsarah          set     messages: + msg132061
2011-03-25 00:39:48  davidsarah          set     messages: + msg132060
2011-03-23 04:54:35  davidsarah          set     nosy: lemburg, terry.reedy, tzot, amaury.forgeotdarc, pitrou, haypo, christian.heimes, tim.golden, mark, christoph, ezio.melotti, v+python, hippietrail, sorin, brian.curtin, davidsarah, santa4nt, David.Sankel
                                                 messages: + msg131854
2011-03-21 14:25:19  haypo               set     nosy: lemburg, terry.reedy, tzot, amaury.forgeotdarc, pitrou, haypo, christian.heimes, tim.golden, mark, christoph, ezio.melotti, v+python, hippietrail, sorin, brian.curtin, davidsarah, santa4nt, David.Sankel
                                                 messages: + msg131657
2011-03-04 17:47:18  santa4nt            set     nosy: + santa4nt
2011-02-11 16:06:19  hippietrail         set     nosy: + hippietrail
2011-02-03 02:47:04  davidsarah          set     nosy: lemburg, terry.reedy, tzot, amaury.forgeotdarc, pitrou, haypo, christian.heimes, tim.golden, mark, christoph, ezio.melotti, v+python, sorin, brian.curtin, davidsarah, David.Sankel
                                                 messages: + msg127782
2011-01-15 08:51:05  sorin               set     nosy: lemburg, terry.reedy, tzot, amaury.forgeotdarc, pitrou, haypo, christian.heimes, tim.golden, mark, christoph, ezio.melotti, v+python, sorin, brian.curtin, davidsarah, David.Sankel
                                                 messages: + msg126319
2011-01-15 01:46:39  v+python            set     nosy: lemburg, terry.reedy, tzot, amaury.forgeotdarc, pitrou, haypo, christian.heimes, tim.golden, mark, christoph, ezio.melotti, v+python, sorin, brian.curtin, davidsarah, David.Sankel
                                                 messages: + msg126308
2011-01-14 23:38:10  haypo               set     nosy: lemburg, terry.reedy, tzot, amaury.forgeotdarc, pitrou, haypo, christian.heimes, tim.golden, mark, christoph, ezio.melotti, v+python, sorin, brian.curtin, davidsarah, David.Sankel
                                                 messages: + msg126304
2011-01-14 23:31:45  haypo               set     nosy: lemburg, terry.reedy, tzot, amaury.forgeotdarc, pitrou, haypo, christian.heimes, tim.golden, mark, christoph, ezio.melotti, v+python, sorin, brian.curtin, davidsarah, David.Sankel
                                                 messages: + msg126303
2011-01-14 19:06:13  brian.curtin        set     nosy: lemburg, terry.reedy, tzot, amaury.forgeotdarc, pitrou, haypo, christian.heimes, tim.golden, mark, christoph, ezio.melotti, v+python, sorin, brian.curtin, davidsarah, David.Sankel
                                                 messages: + msg126288
2011-01-14 19:00:02  terry.reedy         set     nosy: + terry.reedy
                                                 messages: + msg126286
2011-01-12 05:32:09  davidsarah          set     files: + doc-patch.diff
                                                 nosy: lemburg, tzot, amaury.forgeotdarc, pitrou, haypo, christian.heimes, tim.golden, mark, christoph, ezio.melotti, v+python, sorin, brian.curtin, davidsarah, David.Sankel
                                                 title: windows console doesn't print utf8 (Py30a2) -> windows console doesn't print or input Unicode
2011-01-10 23:38:43  davidsarah          set     nosy: lemburg, tzot, amaury.forgeotdarc, pitrou, haypo, christian.heimes, tim.golden, mark, christoph, ezio.melotti, v+python, sorin, brian.curtin, davidsarah, David.Sankel
                                                 messages: + msg125956
2011-01-10 22:47:09  amaury.forgeotdarc  set     nosy: lemburg, tzot, amaury.forgeotdarc, pitrou, haypo, christian.heimes, tim.golden, mark, christoph, ezio.melotti, v+python, sorin, brian.curtin, davidsarah, David.Sankel
                                                 messages: + msg125947
2011-01-10 22:21:59  davidsarah          set     nosy: lemburg, tzot, amaury.forgeotdarc, pitrou, haypo, christian.heimes, tim.golden, mark, christoph, ezio.melotti, v+python, sorin, brian.curtin, davidsarah, David.Sankel
                                                 messages: + msg125942
2011-01-10 22:15:04  davidsarah          set     nosy: lemburg, tzot, amaury.forgeotdarc, pitrou, haypo, christian.heimes, tim.golden, mark, christoph, ezio.melotti, v+python, sorin, brian.curtin, davidsarah, David.Sankel
                                                 messages: + msg125938
2011-01-10 12:33:17  haypo               set     nosy: lemburg, tzot, amaury.forgeotdarc, pitrou, haypo, christian.heimes, tim.golden, mark, christoph, ezio.melotti, v+python, sorin, brian.curtin, davidsarah, David.Sankel
                                                 messages: + msg125899
2011-01-10 11:50:02  amaury.forgeotdarc  set     nosy: lemburg, tzot, amaury.forgeotdarc, pitrou, haypo, christian.heimes, tim.golden, mark, christoph, ezio.melotti, v+python, sorin, brian.curtin, davidsarah, David.Sankel
                                                 messages: + msg125898
2011-01-10 10:07:44  tim.golden          set     nosy: lemburg, tzot, amaury.forgeotdarc, pitrou, haypo, christian.heimes, tim.golden, mark, christoph, ezio.melotti, v+python, sorin, brian.curtin, davidsarah, David.Sankel
                                                 versions: + Python 3.3, - Python 3.1, Python 3.2
2011-01-10 10:05:11  tim.golden          set     status: closed -> open
                                                 nosy: - BreamoreBoy
                                                 messages: + msg125890
                                                 resolution: not a bug ->
2011-01-10 09:50:41  davidsarah          set     nosy: lemburg, tzot, amaury.forgeotdarc, pitrou, haypo, christian.heimes, tim.golden, mark, christoph, ezio.melotti, v+python, sorin, brian.curtin, davidsarah, BreamoreBoy, David.Sankel
                                                 messages: + msg125889
2011-01-10 02:27:28  v+python            set     nosy: lemburg, tzot, amaury.forgeotdarc, pitrou, haypo, christian.heimes, tim.golden, mark, christoph, ezio.melotti, v+python, sorin, brian.curtin, davidsarah, BreamoreBoy, David.Sankel
                                                 messages: + msg125877
2011-01-09 19:23:56  davidsarah          set     nosy: lemburg, tzot, amaury.forgeotdarc, pitrou, haypo, christian.heimes, tim.golden, mark, christoph, ezio.melotti, v+python, sorin, brian.curtin, davidsarah, BreamoreBoy, David.Sankel
                                                 messages: + msg125852
2011-01-09 09:03:08  haypo               set     nosy: lemburg, tzot, amaury.forgeotdarc, pitrou, haypo, christian.heimes, tim.golden, mark, christoph, ezio.melotti, v+python, sorin, brian.curtin, davidsarah, BreamoreBoy, David.Sankel
                                                 messages: + msg125833
2011-01-09 07:28:45  davidsarah          set     nosy: lemburg, tzot, amaury.forgeotdarc, pitrou, haypo, christian.heimes, tim.golden, mark, christoph, ezio.melotti, v+python, sorin, brian.curtin, davidsarah, BreamoreBoy, David.Sankel
                                                 messages: + msg125826
2011-01-09 06:52:50  v+python            set     files: + unicode2.py
                                                 nosy: lemburg, tzot, amaury.forgeotdarc, pitrou, haypo, christian.heimes, tim.golden, mark, christoph, ezio.melotti, v+python, sorin, brian.curtin, davidsarah, BreamoreBoy, David.Sankel
                                                 messages: + msg125824
2011-01-09 05:32:00  davidsarah          set     nosy: + davidsarah
                                                 messages: + msg125823
2010-11-08 01:26:30  haypo               set     status: open -> closed
                                                 resolution: not a bug
                                                 messages: + msg120700
2010-11-04 15:22:02  tzot                set     messages: + msg120416
2010-11-04 15:15:02  haypo               set     files: + sys_write_stdout.patch
                                                 keywords: + patch
                                                 messages: + msg120415
2010-11-04 15:09:59  haypo               set     messages: + msg120414
2010-11-04 03:07:38  David.Sankel        set     nosy: + David.Sankel
2010-09-18 15:39:35  BreamoreBoy         set     nosy: + tim.golden, brian.curtin, BreamoreBoy
                                                 messages: + msg116801
2010-06-20 09:00:57  haypo               set     messages: + msg108228
2010-06-19 12:09:58  pitrou              set     stage: test needed -> needs patch
                                                 versions: + Python 3.2, - Python 3.0
2010-06-19 12:05:00  christoph           set     nosy: + christoph
                                                 messages: + msg108173
2010-01-12 16:09:36  sorin               set     nosy: + sorin
2009-10-26 17:06:21  v+python            set     messages: + msg94496
2009-10-26 09:19:55  lemburg             set     nosy: + lemburg
                                                 messages: + msg94483
2009-10-26 09:07:28  mark                set     messages: + msg94480
2009-10-25 00:06:49  v+python            set     nosy: + v+python
                                                 messages: + msg94445
2009-09-19 00:38:48  tzot                set     messages: + msg92854
2009-05-19 09:46:13  amaury.forgeotdarc  set     messages: + msg88077
2009-05-19 07:54:22  pitrou              set     nosy: + amaury.forgeotdarc
2009-05-19 00:09:03  tzot                set     nosy: + tzot
                                                 messages: + msg88059
2009-05-03 23:57:04  pitrou              set     nosy: + pitrou
                                                 messages: + msg87086
2009-05-03 23:51:10  haypo               set     nosy: haypo, christian.heimes, mark, ezio.melotti
                                                 components: + Windows
2009-05-03 23:50:37  haypo               set     nosy: haypo, christian.heimes, mark, ezio.melotti
                                                 components: - Windows
2009-04-27 23:38:12  ajaksu2             set     nosy: + haypo, ezio.melotti
                                                 versions: + Python 3.1
                                                 stage: test needed
2008-01-06 22:29:44  admin               set     keywords: - py3k
                                                 versions: Python 3.0
2007-12-15 02:08:14  christian.heimes    set     priority: low
                                                 keywords: + py3k
                                                 messages: + msg58651
                                                 nosy: + christian.heimes
2007-12-14 11:31:28  mark                set     messages: + msg58621
2007-12-12 09:56:30  mark                create