classification
Title: windows console doesn't print or input Unicode
Type: behavior Stage: resolved
Components: Unicode, Windows Versions: Python 3.6
process
Status: closed Resolution: fixed
Dependencies: Superseder: Add interactive console tests
View: 28217
Assigned To: steve.dower Nosy List: David.Sankel, Drekin, Jonitis, THRlWiTi, akira, amaury.forgeotdarc, berker.peksag, christoph, davidsarah, davispuh, dead1ne, eryksun, escapewindow, ezio.melotti, flox, giampaolo.rodola, gurnec, hippietrail, lemburg, lilydjwg, mark, martin.panter, mhammond, ncoghlan, ned.deily, paul.moore, piotr.dobrogost, pitrou, python-dev, santoso.wijaya, smerlin, steve.dower, stijn, terry.reedy, tim.golden, tzot, v+python, wiz21
Priority: high Keywords: patch

Created on 2007-12-12 09:56 by mark, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name Uploaded Description
sys_write_stdout.patch vstinner, 2010-11-04 15:15
unicode2.py v+python, 2011-01-09 06:52
doc-patch.diff davidsarah, 2011-01-12 05:32 Proposed changes to user-visible documentation
unicode3.py vstinner, 2011-10-19 11:55
win_console.patch vstinner, 2011-10-19 20:57
test_win_console.py vstinner, 2011-10-19 20:58
streams.py Drekin, 2014-07-26 20:33
wincontest.py dead1ne, 2015-11-09 20:56 Example io.TextIOWrapper subclass using WideCharToMultiByte
winconsoleio.diff steve.dower, 2016-08-13 17:17
1602_2.patch steve.dower, 2016-08-31 04:28
1602_3.patch steve.dower, 2016-09-05 22:24
1602_4.patch steve.dower, 2016-09-06 23:50
1602_5.patch steve.dower, 2016-09-07 20:49
1602_6.patch steve.dower, 2016-09-07 23:08
Messages (148)
msg58487 - (view) Author: Mark Summerfield (mark) * Date: 2007-12-12 09:56
I am not sure if this is a Python bug or simply a limitation of cmd.exe.

I am using Windows XP Home.
I run cmd.exe with the /u option and I have set my console font to
"Lucida Console" (the only TrueType font offered), and I run chcp 65001
to set the utf8 code page.
When I run the following program:

for x in range(32, 2000):
    print("{0:5X} {0:c}".format(x))

one blank line is output.

But if I do chcp 1252 the program prints up to 7F before hitting a
unicode encoding error.

This is different behaviour from Python 2.5.1 which (with a suitably
modified print line) after chcp 65001 prints up to 7F and then fails
with "IOError: [Errno 0] Error".
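
The chcp 1252 result can be reproduced without a console. cp1252 has no mapping for U+0080, so a loop like the one above fails right after 7F. A minimal sketch (the helper name first_unencodable is made up for illustration):

```python
def first_unencodable(encoding, start=32, stop=2000):
    """Return the first code point in [start, stop) that `encoding` cannot encode, or None."""
    for code_point in range(start, stop):
        try:
            chr(code_point).encode(encoding)
        except UnicodeEncodeError:
            return code_point
    return None


# Prints the failing code point in hex for cp1252.
print("{:X}".format(first_unencodable("cp1252")))
```

With "utf-8" the same helper returns None for this range, since every code point below 2000 is encodable.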
msg58621 - (view) Author: Mark Summerfield (mark) * Date: 2007-12-14 11:31
I've looked into this a bit more, and from what I can see, code page
65001 just doesn't work---so it is a Windows problem not a Python problem.
A possible solution might be to read/write UTF16 which "managed" Windows
applications can do.
msg58651 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2007-12-15 02:08
We are aware of multiple Windows related problems. We are planning to
rewrite parts of the Windows specific API to use the widechar variants.
Maybe that will help.
msg87086 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2009-05-03 23:57
Yes, it is a Windows problem. There simply doesn't seem to be a true
Unicode codepage for command-line apps. Recommend closing.
msg88059 - (view) Author: Χρήστος Γεωργίου (Christos Georgiou) (tzot) * Date: 2009-05-19 00:08
Just in case it helps, this behaviour is on Win XP Pro, Python 2.5.1:

First, I added an alias for 'cp65001' to 'utf_8' in
Lib/encodings/aliases.py .

Then, I opened a command prompt with a bitmap font.

c:\windows\system32>python
Python 2.5.1 (r251:54863, Apr 18 2007, 08:51:08) [MSC v.1310 32 bit
(Intel)] on
win32
Type "help", "copyright", "credits" or "license" for more information.
>>> print u"\N{EM DASH}"
—

I switched the font to Lucida Console, and retried (without exiting the
python interpreter, although the behaviour is the same when exiting and
entering again):

>>> print u"\N{EM DASH}"
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IOError: [Errno 13] Permission denied

Then I tried (by pressing Alt+0233 for é, which is invalid in my normal
cp1253 codepage):

>>> print u"née"

and the interpreter exits without any information. So it does for:

>>> a=u"née"

Then I created a UTF-8 text file named 'test65001.py':

# -*- coding: utf_8 -*-
a=u"néeα"
print a

and tried to run it directly from the command line:

c:\windows\system32>python d:\src\PYTHON\test65001.py
néeαTraceback (most recent call last):
  File "d:\src\PYTHON\test65001.py", line 4, in <module>
    print a
IOError: [Errno 2] No such file or directory

You see? It printed all the characters before failing.

Also the following works:

c:\windows\system32>echo heéε
heéε

and

c:\windows\system32>echo heéε >D:\src\PYTHON\dummy.txt

successfully creates a UTF-8 file (without a UTF-8 BOM at the
beginning).

So it's possible that it is a python bug, or at least something can be
done about it.
msg88077 - (view) Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2009-05-19 09:46
An immediate thing to do is to declare cp65001 as an encoding:

Index: Lib/encodings/aliases.py
===================================================================
--- Lib/encodings/aliases.py    (revision 72757)
+++ Lib/encodings/aliases.py    (working copy)
@@ -511,6 +511,7 @@
     'utf8'               : 'utf_8',
     'utf8_ucs2'          : 'utf_8',
     'utf8_ucs4'          : 'utf_8',
+    'cp65001'            : 'utf_8',

     ## uu_codec codec
     #'uu'                 : 'uu_codec',

This is not enough unfortunately, because the win32 API function
WriteFile() returns the number of characters written, not the number of
(utf8) bytes:

>>> print("\u0124\u0102" + 'abc')
ĤĂabc
c
[44420 refs]
>>>
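
The stray "c" is consistent with a count mismatch. The following is only a simulation: it assumes, as described above, that the write reports characters while the caller expects bytes, and that the caller then retries whatever it believes is left over. No Windows API is called.

```python
text = "\u0124\u0102abc\n"      # what print() sends, including the newline
data = text.encode("utf-8")     # 8 bytes: two 2-byte characters plus "abc\n"

reported = len(text)            # 6: a count of characters, not of bytes
leftover = data[reported:]      # bytes the caller believes are still unwritten

print(len(data), reported, leftover)
```

The leftover is the final "c" plus the newline, which matches the extra line in the session above.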

Additionally, there is a bug in ReadFile(), which returns an empty
string (and no error) when a non-ASCII character is entered; that is
indistinguishable from an EOF condition...

Maybe the solution is to use the win32 console API directly...
msg92854 - (view) Author: Χρήστος Γεωργίου (Christos Georgiou) (tzot) * Date: 2009-09-19 00:38
Another note:
if one creates a dummy Stream object (having a softspace attribute and a
write method that writes using os.write, as in
http://stackoverflow.com/questions/878972/windows-cmd-encoding-change-causes-python-crash/1432462#1432462
) to replace sys.stdout and sys.stderr, then writes occur correctly,
without issues. Pre-requisites:
chcp 65001, Lucida Console font and cp65001 as an alias for UTF-8 in
encodings/aliases.py
This is Python 2.5.4 on Windows.
msg94445 - (view) Author: Glenn Linderman (v+python) * Date: 2009-10-25 00:06
With Python 3.1.1, the following batch file seems to be necessary to use
UTF-8 successfully from an XP console:

set PYTHONIOENCODING=UTF-8
cmd /u /k chcp 65001
set PYTHONIOENCODING=
exit

the cmd line seems to be necessary because of Windows having
compatibility issues, but it seems that Python should notice the cp65001
and not need the PYTHONIOENCODING stuff.
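
The effect of PYTHONIOENCODING can be checked from Python itself. This is a sketch for a current Python 3 (subprocess.run did not exist in 3.1); the helper name child_stdout_encoding is made up, and the child's stdout is a pipe rather than a console.

```python
import codecs
import os
import subprocess
import sys


def child_stdout_encoding(io_encoding):
    """Start a child interpreter with PYTHONIOENCODING set and report its sys.stdout.encoding."""
    env = dict(os.environ, PYTHONIOENCODING=io_encoding)
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.stdout.encoding)"],
        env=env,
        stdout=subprocess.PIPE,
        check=True,
    )
    return result.stdout.decode("ascii").strip()


name = child_stdout_encoding("utf-8")
# The spelling reported by the child may vary, so normalize it through codecs.lookup().
print(codecs.lookup(name).name)
```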
msg94480 - (view) Author: Mark Summerfield (mark) * Date: 2009-10-26 09:07
Glenn Linderman's fix pretty well works for me on XP Home. I can print
every Unicode character up to and including U+D7FF (although most just
come out as rectangles, at least I don't get encoding errors).

It fails at U+D800 with message:

UnicodeEncodeError: 'utf-8' codec can't encode character '\ud800' in
position 17: surrogates not allowed

I also tried U+D801 and got the same error.

Nonetheless, this is *much* better than before.
msg94483 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2009-10-26 09:19
Mark Summerfield wrote:
> 
> Mark Summerfield <mark@qtrac.eu> added the comment:
> 
> Glenn Linderman's fix pretty well works for me on XP Home. I can print
> every Unicode character up to and including U+D7FF (although most just
> come out as rectangles, at least I don't get encoding errors).
> 
> It fails at U+D800 with message:
> 
> UnicodeEncodeError: 'utf-8' codec can't encode character '\ud800' in
> position 17: surrogates not allowed
> 
> I also tried U+D801 and got the same error.

That's normal and expected: D800 is the start of the surrogate
range, and surrogates are only allowed in pairs (in UTF-16); a lone
surrogate cannot be encoded as UTF-8.
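
The error can be reproduced away from the console. A short sketch (the helper name utf8_error_reason is made up): a lone surrogate is refused by the UTF-8 codec, while a code point beyond the BMP encodes fine and shows up as a surrogate pair only in UTF-16.

```python
def utf8_error_reason(text):
    """Return the codec's reason for refusing `text`, or None if it encodes fine."""
    try:
        text.encode("utf-8")
    except UnicodeEncodeError as exc:
        return exc.reason
    return None


print(utf8_error_reason("\ud800"))               # lone high surrogate
print(utf8_error_reason("\U00010000"))           # first code point beyond the BMP
print("\U00010000".encode("utf-16-le").hex())    # the D800 DC00 pair, little-endian
```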
msg94496 - (view) Author: Glenn Linderman (v+python) * Date: 2009-10-26 17:06
The choice of the Lucida Console or the Consolas font cures most of the
rectangle problems.  Those are just a limitation of the selected font
for the console window.
msg108173 - (view) Author: Christoph Burgmer (christoph) Date: 2010-06-19 12:04
Will this bug be tackled for Python 2.7?

And is there a way to get hold of the access denied error?

Here are my steps to reproduce:

I started the console with "cmd /u /k chcp 65001"
_______________________________________________________________________
Aktive Codepage: 65001.

C:\Dokumente und Einstellungen\root>set PYTHONIOENCODING=UTF-8

C:\Dokumente und Einstellungen\root>d:

D:\>cd Python31

D:\Python31>python
Python 3.1.2 (r312:79149, Mar 21 2010, 00:41:52) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> print("\u573a")
场
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IOError: [Errno 13] Permission denied
>>>
_______________________________________________________________________

I see a rectangle on screen but obviously c&p works.
msg108228 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2010-06-20 09:00
> Maybe the solution is to use the win32 console API directly...

Yes, it is the best solution because it avoids the horrible mbcs encoding.

About cp65001: it is not *exactly* the same encoding as utf-8 and so it cannot be used as an alias to utf-8: see issue #6058.
msg116801 - (view) Author: Mark Lawrence (BreamoreBoy) * Date: 2010-09-18 15:39
@Brian/Tim what's your take on this?
msg120414 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2010-11-04 15:09
I wrote a small function to call WriteConsoleOutputA() and WriteConsoleOutputW() from Python to do some tests. It works correctly, except if I change the code page using the chcp command. It looks like the problem is that the chcp command changes both the console code page and the ANSI code page, but it should only change the ANSI code page (and not the console code page).


chcp command
============

The chcp command changes the console code page, but in practice, the console still expects the OEM code page (eg. cp850 on my french setup). Example:

C:\...> python.exe -c "import sys; print(sys.stdout.encoding)"
cp850
C:\...> chcp 65001
C:\...> python.exe
Fatal Python error: Py_Initialize: can't initialize sys standard streams
LookupError: unknown encoding: cp65001
C:\...> SET PYTHONIOENCODING=utf-8
C:\...> python.exe
>>> import sys
>>> sys.stdout.write("\xe9\n")
é
2
>>> sys.stdout.buffer.write("\xe9\n".encode("utf8"))
é
3
>>> sys.stdout.buffer.write("\xe9\n".encode("cp850"))
é
2

os.device_encoding(1) uses GetConsoleOutputCP() which gives 65001. It should maybe use GetOEMCP() instead? Or chcp command should be fixed?
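
For reference, os.device_encoding() only reports an encoding when the descriptor is attached to a terminal or console; for anything else it returns None. A small sketch using a pipe as the non-terminal case (what it prints for fd 1 depends on how the script is run):

```python
import os

read_end, write_end = os.pipe()
try:
    pipe_encoding = os.device_encoding(write_end)   # a pipe is not a terminal
finally:
    os.close(read_end)
    os.close(write_end)

print("pipe:", pipe_encoding)
print("fd 1:", os.device_encoding(1))
```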

Setting the console code page looks like a bad idea, because if I type "é" on my keyboard, a random character (e.g. U+0002) is displayed instead...


WriteConsoleOutputA() and WriteConsoleOutputW()
===============================================

Without touching the code page
------------------------------

If the character can be rendered by the current font (eg. U+00E9): WriteConsoleOutputA() and WriteConsoleOutputW() work correctly.

If the character cannot be rendered by the current font, but there is a replacement character (eg. U+0141 replaced by U+0041): WriteConsoleOutputA() cannot be used (U+0141 cannot be encoded to the code page), WriteConsoleOutputW() writes U+0141 but the console contains U+0041 (I checked using ReadConsoleOutputW()) and U+0041 is displayed. It works like the mbcs encoding, the behaviour looks correct.

If the character cannot be rendered by the current font, and there is no replacement character (eg. U+042D): WriteConsoleOutputA() cannot be used (U+042D cannot be encoded to the code page), WriteConsoleOutputW() writes U+042D but U+003F (?) is displayed instead. The behaviour looks correct.

chcp 65001
----------

Using "chcp 65001" command (+ "set PYTHONIOENCODING=utf-8" to avoid the fatal error), it becomes worse: the result depends on the font...

Using raster font:
 - (ANSI) write "\xe9".encode("cp850") using WriteConsoleOutputA() displays U+00e9 (é), whereas the console output code page is cp65001 (I checked using GetConsoleOutputCP())
 - (ANSI) write "\xe9".encode("utf-8") using WriteConsoleOutputA() displays é (mojibake!)
 - (UNICODE) write "\xe9" using WriteConsoleOutputW() displays... a random character (U+0002, U+0008, U+0069, U+00b0, ...)

Using Lucida (TrueType font): 
 - (ANSI) write "\xe9".encode("cp850") using WriteConsoleOutputA() displays U+0000 !?
 - (UNICODE) write "\xe9" using WriteConsoleOutputW() works correctly (display U+00e9), even with "\u0141", it works correctly (display U+0141)
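
The mojibake case can be imitated with codecs alone: bytes written in one encoding and interpreted in another. This sketch does not touch the console, and the helper name misread is made up; it only shows that one UTF-8 encoded character becomes two unrelated cp850 characters, and that the damage is reversible while the bytes are intact.

```python
def misread(text, written_as, read_as):
    """Encode `text` with one codec and decode the bytes with another."""
    return text.encode(written_as).decode(read_as)


garbled = misread("\xe9", "utf-8", "cp850")
print(len(garbled), [hex(ord(ch)) for ch in garbled])

# Undoing the wrong decoding recovers the original character.
restored = garbled.encode("cp850").decode("utf-8")
print(restored == "\xe9")
```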
msg120415 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2010-11-04 15:15
sys_write_stdout.patch: Create sys.write_stdout() test function to call WriteConsoleOutputA() or WriteConsoleOutputW() depending on the input types (bytes or str).
msg120416 - (view) Author: Χρήστος Γεωργίου (Christos Georgiou) (tzot) * Date: 2010-11-04 15:22
http://blogs.msdn.com/b/michkap/archive/2008/03/18/8306597.aspx

If you want any kind of Unicode output in the console, the font must be an “official” MS console TTF (“official” as defined by the Windows version); I believe only Lucida Console and Consolas are the ones with all MS private settings turned on inside the font file.
msg120700 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2010-11-08 01:26
I don't exactly understand the goal of this issue. Different people have described various bugs of the Windows console, but I don't see any problem with Python here. It looks like it's just not possible to display Unicode correctly with the Windows console (the whole Unicode character set, not just the current code page subset).

- 65001 code page: it's not the same encoding as utf-8 and so it cannot be set as an alias to utf-8 (see #6058) => nothing to do, or maybe document the PYTHONIOENCODING=utf-8 workaround... But if you do that, you may get strange errors when writing to stdout or stderr like "IOError: [Errno 13] Permission denied" or "IOError: [Errno 2] No such file or directory" ...
- chcp command sets the console encoding, which is stupid because the console still expects text encoded to the previous code page => Windows (chcp command) bug; the chcp command should not be used (it doesn't solve any problem, it just makes the situation worse)
- use the console API instead of read()/write() to fix this issue: it doesn't work, the console is completely buggy (msg120414) => Windows (console) bug
- using the "Lucida Console" font avoids some issues => I don't think that the Python interpreter should configure the console (using SetCurrentConsoleFontEx?), it's not the role of Python

To me, there is nothing to do, and so I close the bug.

If you would like to fix a particular Python bug, open a new specific issue. If you consider that I'm wrong, that Python should fix this issue, and you know how, please reopen it.
msg125823 - (view) Author: David-Sarah Hopwood (davidsarah) Date: 2011-01-09 05:32
It is certainly possible to write Unicode to the console successfully using WriteConsoleW. This works regardless of the console code page, including 65001. The code at http://tahoe-lafs.org/trac/tahoe-lafs/browser/src/allmydata/windows/fixups.py does so (it's for Python 2.x, but you'd be calling WriteConsoleW from C anyway).

WriteConsoleW has one bug that I know of, which is that it fails when writing more than 26608 characters at once (http://tahoe-lafs.org/trac/tahoe-lafs/ticket/1232). That's easy to work around by limiting the amount of data passed in a single call.
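
The workaround amounts to slicing the text before each call. A minimal sketch, assuming a hypothetical `write` callable that stands in for whatever wraps WriteConsoleW; the limit of 10000 is an arbitrary value chosen to stay well below the reported threshold.

```python
def write_in_chunks(write, text, limit=10000):
    """Pass `text` to `write` in pieces of at most `limit` characters; return the number of calls."""
    calls = 0
    for start in range(0, len(text), limit):
        write(text[start:start + limit])
        calls += 1
    return calls


# Usage with a list standing in for the console.
pieces = []
calls = write_in_chunks(pieces.append, "x" * 25000)
print(calls, [len(piece) for piece in pieces])
```

Note that a Python 3 str is measured in code points, while the console counts UTF-16 units. A chunk of 10000 code points is at most 20000 units, which still leaves a margin, but a real implementation would also want to avoid splitting between the two halves of a surrogate pair once the text has been converted to UTF-16.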

Fonts are not Python's problem, but encoding is. It doesn't make sense to fail to output the right characters just because some users might not have selected fonts that can display those characters. This bug should be reopened.

(For completeness, it is possible to display Unicode on the console using fonts other than Lucida Console and Consolas, but it requires a registry hack: http://stackoverflow.com/questions/878972/windows-cmd-encoding-change-causes-python-crash/3259271#3259271 .)
msg125824 - (view) Author: Glenn Linderman (v+python) * Date: 2011-01-09 06:52
Interesting!

I was able to tweak David-Sarah's code to work with Python 3.x, mostly doing things that 2to3 would probably do: changing  unicode() to str(), dropping u from u'...', etc.

I skipped the unmangling of command-line arguments, because it produced an error I didn't understand, about needing a buffer protocol.  But I'll attach David-Sarah's code + tweaks + a test case showing output of the Cyrillic alphabet to a console with code page 437 (at least, on my Win7-64 box, that is what it is).

Nice work, David-Sarah.  I'm quite sure this is not in a form usable inside Python 3, but it shows exactly what could be done inside Python 3 to make things work... and gives us a workaround if Python 3 is not fixed.
msg125826 - (view) Author: David-Sarah Hopwood (davidsarah) Date: 2011-01-09 07:28
Glenn Linderman wrote:
> I skipped the unmangling of command-line arguments, because it produced an error I didn't understand, about needing a buffer protocol.

If I understand correctly, that part isn't needed on Python 3 because issue2128 is already fixed there.
msg125833 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-01-09 09:03
> It is certainly possible to write Unicode to the console 
> successfully using WriteConsoleW

Did you try with characters not encodable to the code page and with characters that cannot be rendered by the font?

See msg120414 for my tests with WriteConsoleOutputW.
msg125852 - (view) Author: David-Sarah Hopwood (davidsarah) Date: 2011-01-09 19:23
haypo wrote:
> davidsarah wrote:
>> It is certainly possible to write Unicode to the console 
>> successfully using WriteConsoleW
>
> Did you try with characters not encodable to the code page and with characters that cannot be rendered by the font?

Yes, characters not encodable to the code page do work (as confirmed by Glenn Linderman, since code page 437 does not include Cyrillic).

Characters that cannot be rendered by the font print as missing-glyph boxes, as expected. They don't cause any other problem, and they can be cut-and-pasted to other Unicode-aware applications, showing up as the original characters.

> See msg120414 for my tests with WriteConsoleOutputW

Even if it handled encoding correctly, WriteConsoleOutputW (http://msdn.microsoft.com/en-us/library/ms687404%28v=vs.85%29.aspx) would not be the right API to use in any case, because it prints to a rectangle of characters without scrolling. WriteConsoleW does scroll in the same way that printing to a console output stream normally would. (Redirection to a non-console stream can be detected and handled differently, as the code in unicode2.py does.)
msg125877 - (view) Author: Glenn Linderman (v+python) * Date: 2011-01-10 02:27
I would certainly be delighted if someone would reopen this issue, and figure out how to translate unicode2.py to Python internals so that Python's console I/O on Windows would support Unicode "out of the box".

Otherwise, I'll have to include the equivalent of unicode2.py in all my Python programs, because right now, I'm including instructions for the user to (1) choose Lucida or Consolas font if they can't figure out any other font that gets rid of the square boxes (2) chcp 65001 (3) set PYTHONIOENCODING=UTF-8.

Having this capability inside Python (or my programs) will enable me to eliminate two-thirds of the geeky instructions for my users.  But it seems like a very appropriate capability to have within Python, especially Python 3.x with its preference for and support of Unicode in so many other ways.
msg125889 - (view) Author: David-Sarah Hopwood (davidsarah) Date: 2011-01-10 09:50
I'll have a look at the Py3k I/O internals and see what I can do.
(Reopening a bug appears to need Coordinator permissions.)
msg125890 - (view) Author: Tim Golden (tim.golden) * (Python committer) Date: 2011-01-10 10:05
Reopening as there seems to be some possibility of progress
msg125898 - (view) Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2011-01-10 11:50
The script unicode2.py uses the console STD_OUTPUT_HANDLE iff sys.stdout.fileno()==1.
But is it always the case? What about pythonw.exe? 
Also some applications may redirect fd=1: I'm sure that py.test does this http://pytest.org/capture.html#setting-capturing-methods-or-disabling-capturing and IIRC Apache also redirects file descriptors.
msg125899 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-01-10 12:33
amaury> The script unicode2.py uses the console STD_OUTPUT_HANDLE iff
amaury> sys.stdout.fileno()==1

Interesting article about the Windows console:
http://blogs.msdn.com/b/michkap/archive/2008/03/18/8306597.aspx

There is an example which has many tests to check that stdout is the windows 
console (and not a pipe or something else).
msg125938 - (view) Author: David-Sarah Hopwood (davidsarah) Date: 2011-01-10 22:15
> The script unicode2.py uses the console STD_OUTPUT_HANDLE iff sys.stdout.fileno()==1.

You may have missed "if not_a_console(hStdout): real_stdout = False".
not_a_console uses GetFileType and GetConsoleMode to check whether that handle is directed to something other than a console.

> But is it always the case?

The technique used here for detecting a console is almost the same as the code for IsConsoleRedirected at http://blogs.msdn.com/b/michkap/archive/2010/05/07/10008232.aspx , or in WriteLineRight at http://blogs.msdn.com/b/michkap/archive/2010/04/07/9989346.aspx (I got it from that blog, can't remember exactly which page).

[This code will give a false positive in the strange corner case that stdout/stderr is redirected to a console *input* handle. It might be better to use GetConsoleScreenBufferInfo instead of GetConsoleMode, as suggested by http://stackoverflow.com/questions/3648711/detect-nul-file-descriptor-isatty-is-bogus/3650507#3650507 .]

> What about pythonw.exe?

I just tested that, using pythonw run from cmd.exe with stdout redirected to a file; it works as intended. It also works (for both console and non-console cases) when the handles are inherited from a parent process.

Incidentally, what's the earliest supported Windows version for Py3k? I see that http://www.python.org/download/windows/ mentions Windows ME. I can fairly easily make it fall back to never using WriteConsoleW on Windows ME, if that's necessary.
msg125942 - (view) Author: David-Sarah Hopwood (davidsarah) Date: 2011-01-10 22:21
Note: Michael Kaplan's code checks whether GetConsoleMode failed due to ERROR_INVALID_HANDLE. My code intentionally doesn't do that, because it is correct and conservative to fall back to the non-console behaviour when there is *any* error from GetConsoleMode. (It could also fail due to not having the GENERIC_READ right on the handle, for example.)
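
The detection logic described in these messages can be sketched with ctypes. This is not the code from unicode2.py: the function name is_console is made up, the Windows branch follows the GetFileType/GetConsoleMode approach described above (going from a file descriptor to a handle with msvcrt.get_osfhandle), and on other platforms it simply falls back to os.isatty so the example can run anywhere.

```python
import ctypes
import os
import sys

FILE_TYPE_CHAR = 0x0002     # GetFileType() value for character devices
FILE_TYPE_REMOTE = 0x8000   # flag bit that may be set alongside the type


def is_console(fd):
    """Best-effort check that descriptor `fd` refers to a console (a terminal elsewhere)."""
    if sys.platform != "win32":
        return os.isatty(fd)

    import msvcrt
    from ctypes import wintypes

    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    kernel32.GetFileType.argtypes = [wintypes.HANDLE]
    kernel32.GetFileType.restype = wintypes.DWORD
    kernel32.GetConsoleMode.argtypes = [wintypes.HANDLE, ctypes.POINTER(wintypes.DWORD)]
    kernel32.GetConsoleMode.restype = wintypes.BOOL

    try:
        handle = msvcrt.get_osfhandle(fd)
    except OSError:
        return False
    if (kernel32.GetFileType(handle) & ~FILE_TYPE_REMOTE) != FILE_TYPE_CHAR:
        return False
    mode = wintypes.DWORD()
    # Any failure of GetConsoleMode is treated conservatively as "not a console".
    return bool(kernel32.GetConsoleMode(handle, ctypes.byref(mode)))


print(is_console(1))
```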
msg125947 - (view) Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2011-01-10 22:47
Even if python.exe starts normally, py.test for example uses os.dup2() to redirect the file descriptors 1 and 2 to temporary files. sys.stdout.fileno() is still 1, the STD_OUTPUT_HANDLE did not change, but normal print() now goes to a file; but the proposed script won't detect this and will write to the console...
Somehow we should extract the file handle from the file descriptor, with a call to _get_osfhandle() for example.
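
The effect of os.dup2() can be shown without touching the real stdout. In this sketch a pipe's write end plays the role of descriptor 1 (an assumption made only to keep the example harmless; the function name redirect_demo is made up): after dup2 the descriptor number is unchanged, yet writes end up in the temporary file.

```python
import os
import tempfile


def redirect_demo():
    """Redirect a descriptor with os.dup2() and return what ends up in the target file."""
    read_end, write_end = os.pipe()
    with tempfile.TemporaryFile() as target:
        os.dup2(target.fileno(), write_end)   # same number, now refers to the temp file
        os.write(write_end, b"hello")
        os.close(write_end)
        os.close(read_end)
        target.seek(0)
        return target.read()


print(redirect_demo())
```

Any check based only on the descriptor number (or on a handle fetched once at startup) cannot notice this kind of redirection, which is why the handle has to be looked up from the descriptor at the time of the write.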
msg125956 - (view) Author: David-Sarah Hopwood (davidsarah) Date: 2011-01-10 23:38
"... os.dup2() ..."

Good point, thanks.

It would work to change os.dup2 so that if its second argument is 0, 1, or 2, it calls _get_osfhandle to get the Windows handle for that fd, and then reruns the console-detection logic. That would even allow Unicode output to work after redirection to a different console.

Programs that directly called the CRT dup2 or SetStdHandle would bypass this. Can we consider such programs to be broken? Methinks a documentation patch for os.dup2 would be sufficient, something like:

"When fd1 refers to the standard input, output, or error handles (0, 1 and 2 respectively), this function also ensures that state associated with Python's initial sys.{stdin,stdout,stderr} streams is correctly updated if needed. It should therefore be used in preference to calling the C library's dup2, or similar APIs such as SetStdHandle on Windows."
msg126286 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2011-01-14 19:00
http://www.python.org/dev/peps/pep-0011/ says

Name:             Win9x, WinME, NT4
    Unsupported in:   Python 2.6 (warning in 2.5 installer)
    Code removed in:  Python 2.6

Only xp+ now. email sent to webmaster@...

Even if the best fix only applies to win7, please include it.
msg126288 - (view) Author: Brian Curtin (brian.curtin) * (Python committer) Date: 2011-01-14 19:06
I think we even agreed to drop 2000, although the PEP hasn't been updated and I couldn't find the supposed email where this was said.

For implementing functionality that isn't supported on all Windows versions or architectures, you can look at PC/winreg.c for a few examples. DisableReflectionKey is a good example off the top of my head.
msg126303 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-01-14 23:31
Here are some results of my test of unicode2.py. I'm testing py3k on Windows XP, OEM: cp850, ANSI: cp1252.

Raster fonts
------------

With a fresh console, unicode2.py displays "?????????????????". input() accepts characters encodable to the OEM code page.

If I set the code page to 65001 (chcp program+set PYTHONIOENCODING=utf-8; or SetConsoleCP() + SetConsoleOutputCP()), it displays weird characters. input() accepts ASCII characters, but non-ASCII characters (encodable to the console and OEM code pages) display weird characters (smileys! control characters?).

Lucida console
--------------

With my system code page (OEM: cp850), characters not encodable to the code pages are displayed correctly. I can type some non-ASCII characters (encodable to the code page). If I copy/paste characters not encodable to the code page, they are replaced by a similar glyph (eg. Ł => L) or by ? (€ => ?).

If I set the code page to 65001, all characters are still correctly displayed. But I cannot type non-ASCII characters anymore: input() fails with EOFError (I suppose that Python gets control characters).

Redirect output to a pipe
-------------------------

I patched unicode2.py to use sys.stdout.buffer instead of sys.stdout for UnicodeOutput stream. I also patched UnicodeOutput to replace \n by \r\n. 

It works correctly with any character. No UTF-8 BOM is written. But "Here 1" is written at the end. I suppose that sys.stdout should be flushed before the creation of UnicodeOutput.

But it always uses UTF-8. I don't know if UTF-8 is well supported by applications on Windows.

Without unicode2.py, only characters encodable to OEM code page are supported, and \n is used as end of line string.

Let's try to summarize
----------------------

Tests:
 d1) Display characters encodable to the console code page
 t1) Type characters encodable to the console code page
 d2) Display characters not encodable to any code page
 t2) Type characters not encodable to any code page

I'm using Windows with OEM=cp850 and ANSI=cp1252. For test (t2), I copy €-Ł and paste it to the console (right click on the window title > Edit > Paste).

Raster fonts, console=cp850:

d1) ok
t1) ok
d2) FAIL: €-Ł is displayed ?-L
t2) FAIL: €-Ł is read as ?-L

Raster fonts, console=cp65001:

d1) FAIL: é is displayed as 2 strange glyphs
t1) FAIL: EOFError
d2) FAIL: only display unreadable glyphs
t2) FAIL: EOFError

Lucida console, console=cp850:

d1) ok
t1) ok
d2) ok
t2) FAIL: €-Ł is read as ?-L

Lucida console, console=cp65001:

d1) ok
t1) FAIL: EOFError
d2) ok
t2) FAIL: EOFError

So, setting the console code page to 65001 doesn't solve any issue, but it breaks the input (input with the keyboard or pasting text).

With Raster fonts or Lucida console, it's possible to display characters encodable to the code page. But it is not new, it's already possible with Python 3. But for characters not encodable to the code page, it works with unicode2.py and Lucida console, which is something new :-)

For the input, I suppose that we need also to use a Windows console function, to support unencodable characters.
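
Which of the test characters are "encodable to the code page" can be checked with Python's own cp850 codec. A small sketch (the helper name encodable is made up). Python's codec has no best-fit mapping, so with errors="replace" both unencodable characters become "?"; the "L" seen for Ł in the results above comes from the Windows conversion, not from this codec.

```python
def encodable(char, encoding):
    """Return True if `char` can be encoded with `encoding`."""
    try:
        char.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


for char in ("\xe9", "\u20ac", "\u0141"):   # é, € and Ł
    print(
        "U+{:04X}".format(ord(char)),
        encodable(char, "cp850"),
        char.encode("cp850", errors="replace"),
    )
```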
msg126304 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-01-14 23:38
> ..., because right now, I'm including instructions for the user to 
> (1) choose Lucida or Consolas font if they can't figure out 
>     any other font that gets rid of the square boxes 
> (2) chcp 65001 
> (3) set PYTHONIOENCODING=UTF-8

Why do you set the code page to 65001? In all my tests (on Windows XP), it always breaks the standard input.
msg126308 - (view) Author: Glenn Linderman (v+python) * Date: 2011-01-15 01:46
Victor said:
Why do you set the code page to 65001? In all my tests (on Windows XP), it always breaks the standard input.

My response:
Because when I searched Windows for Unicode and/or UTF-8 stuff, I found 65001, and it seems like it might help, and it does a bit.  And then I find PYTHONIOENCODING, and that helps some.  And that got me something that works better enough than what I had before, so I quit searching.

You did a better job of analyzing and testing all the cases.  I will have to go subtract the 65001 part and confirm your results; maybe it is useless now that other pieces of the puzzle are in place.  Certainly with David-Sarah's code it seems not to be needed. Whether it was a necessary part of the previous workaround I am not sure, because of the limited number of cases I tried (I was trying to find something that worked well enough, but had neither enough knowledge to find David-Sarah's solution nor a good enough testing methodology to try the pieces independently).

Thank you for your interest in this issue.
msg126319 - (view) Author: Sorin Sbarnea (ssbarnea) * Date: 2011-01-15 08:51
Remember that cp65001 cannot be set on Windows. Also please read http://blogs.msdn.com/b/michkap/archive/2010/10/07/10072032.aspx and contact the author, Michael Kaplan from Microsoft, if you have more questions. I'm sure he will be glad to help.
msg127782 - (view) Author: David-Sarah Hopwood (davidsarah) Date: 2011-02-03 02:47
Feedback from Julie Solon of Microsoft:

> These console functions share a per-process heap that is 64K. There is some overhead, the heap can get fragmented, and calls from multiple threads all affect how much is available for this buffer. 

> I am working to update the documentation for this function [WriteConsoleW] and other affected functions with information along these lines, and will post it within the next week or two.

I replied thanking her and asking for clarification:

When you say that the heap can get fragmented, is this true only when
there are concurrent calls to the console functions, or can it occur
even with single-threaded use? I'm trying to determine whether acquiring
a process-global lock while calling these functions would be sufficient
to ensure that the available heap space will not be unexpectedly low.
(This assumes that the functions are not used outside the lock by other
libraries in the same process.)

ReadConsoleW seems also to be affected, incidentally.

I've asked for clarification about whether acquiring a process-global lock when using these functions ...
Julie
msg131657 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-03-21 14:25
I did some tests with WriteConsoleW():
 - with raster fonts, U+00E9 is displayed as é, U+0141 as L and U+042D as ? => good (works as expected)
 - with TrueType font (Lucida), U+00E9 is displayed as é, U+0141 as Ł and U+042D as Э => perfect! (all characters are rendered correctly)

Now I agree that WriteConsoleW() is the best solution to fix this issue.

My test code (added to Python/sysmodule.c):
---------
static PyObject *
sys_write_stdout(PyObject *self, PyObject *args)
{
    PyObject *textobj;
    wchar_t *text;
    DWORD written, total;
    Py_ssize_t len, chunk;
    HANDLE console;
    BOOL ok;

    if (!PyArg_ParseTuple(args, "U:write_stdout", &textobj))
        return NULL;

    console = GetStdHandle(STD_OUTPUT_HANDLE);
    if (console == INVALID_HANDLE_VALUE) {
        PyErr_SetFromWindowsErr(GetLastError());
        return NULL;
    }

    text = PyUnicode_AS_UNICODE(textobj);
    len = PyUnicode_GET_SIZE(textobj);
    total = 0;
    while (len != 0) {
        /* WriteConsoleW() is limited to 64 KB (32,768 UTF-16 units), but
           this limit depends on the heap usage. Use a safe limit of 10,000
           UTF-16 units.
           http://tahoe-lafs.org/trac/tahoe-lafs/ticket/1232 */
        if (len > 10000)
            chunk = 10000;
        else
            chunk = len;
        ok = WriteConsoleW(console, text, (DWORD)chunk, &written, NULL);
        /* Stop on failure, or if nothing was written, to avoid looping
           forever; the caller gets the count written so far. */
        if (!ok || written == 0)
            break;
        text += written;
        len -= written;
        total += written;
    }
    return PyLong_FromUnsignedLong(total);
}
---------
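
A minimal Python sketch of the same chunking loop, for readers who do not want to read the C. It is an illustration only: write_func is a hypothetical stand-in for a console call such as WriteConsoleW (it takes a str and returns how many characters it accepted), and the 10,000 limit is the safe value used in the test code above. Note that Python counts code points here, not UTF-16 units.

```python
def write_in_chunks(write_func, text, max_units=10000):
    """Write text through write_func in pieces of at most max_units characters.

    write_func is a hypothetical stand-in for a console write call: it takes
    a str and returns the number of characters it accepted.
    """
    total = 0
    while total < len(text):
        chunk = text[total:total + max_units]
        written = write_func(chunk)
        if written <= 0:
            # Stop instead of looping forever if nothing was accepted.
            break
        total += written
    return total
```

Passing a function that records the chunk sizes shows how a long string is split before it reaches the console call.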


The question is now how to integrate WriteConsoleW() into Python without breaking the API, for example:
 - Should sys.stdout be a TextIOWrapper or not?
 - Should sys.stdout.fileno() return 1 or raise an error?
 - What about sys.stdout.buffer: should sys.stdout.buffer.write() call WriteConsoleA(), or should sys.stdout not have a buffer attribute? I think that many modules and programs now rely on sys.stdout.buffer to write bytes directly into stdout. There is at least python -m base64.
 - Should we use ReadConsoleW() for stdin?
msg131854 - (view) Author: David-Sarah Hopwood (davidsarah) Date: 2011-03-23 04:54
(For anyone wondering about the hold-up on this bug, I ended up switching to Ubuntu. Not to worry, I now have Python 3 building in XP under VirtualBox -- which is further than I ever got with my broken Vista install :-/ It seems to behave identically to native XP as far as this bug is concerned.)

Victor STINNER wrote:
> The question is now how to integrate WriteConsoleW() into Python without breaking the API, for example:
> - Should sys.stdout be a TextIOWrapper or not?

It pretty much has to be a TextIOWrapper for compatibility. Also it's easier to implement it that way, because the text stream object has to be able to fall back to using the buffer if the fd is redirected.

> - Should sys.stdout.fileno() returns 1 or raise an error?

Return sys.stdout.buffer.fileno(), which is 1 unless redirected.

This is the Right Thing because in Windows, fds are an abstraction of the C runtime library, and the C runtime allows an fd to be associated with a console. In that case, from the application's point of view it is still writing to the same fd. In fact, we'd be implementing this by calling the WriteConsoleW win32 API directly in order to avoid bugs in the CRT's Unicode support, but that's an implementation detail.

> - What about sys.stdout.buffer: should sys.stdout.buffer.write() calls WriteConsoleA() or sys.stdout should not have a buffer attribute?

I was thinking that sys.std{out,err}.buffer would still be set up exactly as they are now. Then if an app writes to that buffer, it will get interleaved with any writes via the text stream. (The writes to the buffer go to the underlying fd, which probably ends up calling WriteFile at the win32 level.)

> I think that many modules and programs now rely on sys.stdout.buffer to write directly bytes into stdout. There is at least python -m base64.

That would just work. The only caveat would be that if you write a partial line to the buffer object (or if you set the buffer object to be fully buffered and write to it), and then write to the text stream, the buffer wouldn't be flushed before the text is written. I think that is fine as long as it is documented.

If an app sets the .buffer attribute of sys.std{out,err}, it would fall back to using that buffer in the same way as when the fd is redirected.

> - Should we use ReadConsoleW() for stdin?

Yes. I'll probably start with a patch that just handles std{out,err}, though.
msg132060 - (view) Author: David-Sarah Hopwood (davidsarah) Date: 2011-03-25 00:39
I wrote:
> The only caveat would be that if you write a partial line to the buffer object (or if you set the buffer object to be fully buffered and write to it), and then write to the text stream, the buffer wouldn't be flushed before the text is written.

Actually it looks like that already happens (because the sys.std{out,err} TextIOWrappers are line-buffered separately to their underlying buffers), so it would not be an incompatibility:

$ python3 -c 'import sys; sys.stdout.write("foo"); sys.stdout.buffer.write(b"bar"); sys.stdout.write("baz\n")'
barfoobaz
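
The ordering above can be reproduced without a console. This is a sketch under assumptions: an in-memory BytesIO stands in for the stdout file descriptor, and the wrapper is created with line_buffering=True to mimic an interactive sys.stdout.

```python
import io

raw = io.BytesIO()                 # stands in for the stdout file descriptor
buffer = io.BufferedWriter(raw)    # plays the role of sys.stdout.buffer
text = io.TextIOWrapper(buffer, encoding="utf-8", newline="\n",
                        line_buffering=True)  # plays the role of sys.stdout

text.write("foo")       # held in the text layer: no newline seen yet
buffer.write(b"bar")    # goes into the byte buffer ahead of the pending text
text.write("baz\n")     # newline: the text layer hands its data down and flushes

result = raw.getvalue()  # the bytes that reached the "file descriptor"
```

The text layer keeps "foo" pending until a newline arrives, so the bytes written through the buffer object reach the underlying stream first.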
msg132061 - (view) Author: David-Sarah Hopwood (davidsarah) Date: 2011-03-25 00:54
I wrote:
$ python3 -c 'import sys; sys.stdout.write("foo"); sys.stdout.buffer.write(b"bar"); sys.stdout.write("baz\n")'
barfoobaz

Hmm, the behaviour actually would differ here: the proposed implementation would print

foobaz
bar

(the "foobaz\n" is written by a call to WriteConsoleW and then the "bar" gets flushed to stdout when the process exits).

But since the naive expectation is "foobarbaz\n" and you already have to flush after each call in order to get that, I think this change in behaviour would be unlikely to affect correct applications.
msg132062 - (view) Author: Glenn Linderman (v+python) * Date: 2011-03-25 00:59
Presently, a correct application only needs to flush between a sequence of writes and a sequence of buffer.writes.

Don't assume the flush happens after every write, for a correct application.
msg132064 - (view) Author: David-Sarah Hopwood (davidsarah) Date: 2011-03-25 01:21
Glenn Linderman wrote:
> Presently, a correct application only needs to flush between a sequence of writes and a sequence of buffer.writes.

Right. The new requirement would be that a correct app also needs to flush between a sequence of buffer.writes (that end in an incomplete line, or always if PYTHONUNBUFFERED or python -u is used), and a sequence of writes.

> Don't assume the flush happens after every write, for a correct application.

It's rather hard to implement this without any change in behaviour. Or rather, it isn't hard if the TextIOWrapper were to flush its underlying buffer before each time it writes to the console, but I'd be concerned about the extra overhead of that call. I'd prefer not to do that unless the new requirement above leads to incompatibilities in practice.
msg132065 - (view) Author: Glenn Linderman (v+python) * Date: 2011-03-25 01:30
Would it suffice if the new scheme internally flushed after every buffer.write?  It wouldn't be needed after write, because the correct application would already do one there?

Am I off-base in supposing that the performance of buffer.write is expected to include a flush (because it isn't expected to be buffered)?
msg132067 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-03-25 02:12
On Friday 25 March 2011 at 00:54 +0000, David-Sarah Hopwood wrote:
> David-Sarah Hopwood <david-sarah@jacaranda.org> added the comment:
> 
> I wrote:
> $ python3 -c 'import sys; sys.stdout.write("foo");
> sys.stdout.buffer.write(b"bar"); sys.stdout.write("baz\n")'
> barfoobaz
> 
> Hmm, the behaviour actually would differ here: the proposed
> implementation would print
> 
> foobaz
> bar
> 
> (the "foobaz\n" is written by a call to WriteConsoleW and then the
> "bar" gets flushed to stdout when the process exits).
> 
> But since the naive expectation is "foobarbaz\n" and you already have
> to flush after each call in order to get that, I think this change in
> behaviour would be unlikely to affect correct applications.

I would not call this "naive". "foobaz\nbar" is really weird. I think
that sys.stdout and sys.stdout.buffer will both have to flush after each
write, or they may be desynchronized.

Some developers already think that adding sys.stdout.flush() after
print("Processing.. ", end='') is too hard (#11633). So I cannot imagine
how they would react if they had to do it explicitly after every
print, sys.stdout.write() and sys.stdout.buffer.write().
msg132184 - (view) Author: David-Sarah Hopwood (davidsarah) Date: 2011-03-25 23:37
First a minor correction:
> The new requirement would be that a correct app also needs to flush between a sequence of buffer.writes (that end in an incomplete line, or always if PYTHONUNBUFFERED or python -u is used), and a sequence of writes.

That should be "and only if PYTHONUNBUFFERED or python -u is not used".

I also said:
> If an app sets the .buffer attribute of sys.std{out,err}, it would fall back to using that buffer in the same way as when the fd is redirected.

but the .buffer attribute is readonly, so this case can't occur.

Glenn Linderman wrote:
> Would it suffice if the new scheme internally flushed after every buffer.write?  It wouldn't be needed after write, because the correct application would already do one there?

Yes, that would be sufficient.

> Am I off-base in supposing that the performance of buffer.write is expected to include a flush (because it isn't expected to be buffered)?

It is expected to be line-buffered. So an app might expect that printing characters one-at-a-time will have reasonable performance.

In any case, given that the buffer of the initial std{out,err} will always be a BufferedWriter object (since .buffer is readonly), it would be possible for the TextIOWrapper to test a dirty flag in the BufferedWriter, in order to check efficiently whether the buffer needs flushing on each write. I've looked at the implementation complexity cost of this, and it doesn't seem too bad.

A similar issue arises for stdin: to maintain strict compatibility, every read from a TextIOWrapper attached to an input console would have to drain the buffer of its buffer object, in case the app has read from it. This is a bit tricky because the bytes drained from the buffer have to be converted to Unicode, so what happens if they end part-way through a multibyte character? Ugh, I'll have to think about that one.

Victor STINNER wrote:
> Some developers already think that adding sys.stdout.flush() after
> print("Processing.. ", end='') is too hard (#11633).

IIUC, that bug is about the behaviour of 'print', and didn't suggest changing the fact that sys.stdout is line-buffered.


By the way, are these changes going to be in a major release? If I understand correctly, the layout of structs (for standard library types not prefixed with '_', such as 'buffered' in bufferedio.c or 'textio' in textio.c) can change with major releases but not with minor releases, correct?
msg132191 - (view) Author: David-Sarah Hopwood (davidsarah) Date: 2011-03-26 00:18
I wrote:
> A similar issue arises for stdin: to maintain strict compatibility, every read from a TextIOWrapper attached to an input console would have to drain the buffer of its buffer object, in case the app has read from it. This is a bit tricky because the bytes drained from the buffer have to be converted to Unicode, so what happens if they end part-way through a multibyte character? Ugh, I'll have to think about that one.

It seems like there is no correct way for an app to read from both sys.stdin, and sys.stdin.buffer (even without these console changes). It must choose one or the other.
msg132208 - (view) Author: Glenn Linderman (v+python) * Date: 2011-03-26 01:45
David-Sarah said:
In any case, given that the buffer of the initial std{out,err} will always be a BufferedWriter object (since .buffer is readonly), it would be possible for the TextIOWrapper to test a dirty flag in the BufferedWriter, in order to check efficiently whether the buffer needs flushing on each write. I've looked at the implementation complexity cost of this, and it doesn't seem too bad.

So if flush checks that bit, maybe TextIOWrapper could just call buffer.flush, and it would be fast if clean and slow if dirty?  Calling it at the beginning of a Text level write, that is, which would let the char-at-a-time calls to buffer.write be fast.

And I totally agree with msg132191
msg132266 - (view) Author: David-Sarah Hopwood (davidsarah) Date: 2011-03-26 19:22
Glenn wrote:
> So if flush checks that bit, maybe TextIOWriter could just call buffer.flush, and it would be fast if clean and slow if dirty?

Yes. I'll benchmark how much overhead is added by the calls to flush; there's no point in breaking the abstraction boundary of BufferedWriter if it doesn't give a significant performance benefit. (I suspect that it might not, because Windows is very slow at scrolling a console, which might make the cost of flushing insignificant in comparison.)
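
A rough sketch of the "flush the byte buffer before each text write" idea discussed above. It is not the proposed patch: ListRaw and FlushingConsoleText are hypothetical names, and console_write is a stand-in for a call to WriteConsoleW. Both layers append to one list so the resulting order is visible.

```python
import io

class ListRaw(io.RawIOBase):
    """Hypothetical raw byte sink that records each write in a shared list."""
    def __init__(self, sink):
        self.sink = sink

    def writable(self):
        return True

    def write(self, data):
        self.sink.append(bytes(data).decode("utf-8"))
        return len(data)

class FlushingConsoleText(io.TextIOBase):
    """Hypothetical text stream that flushes its byte buffer before writing."""
    def __init__(self, buffer, console_write):
        self._buffer = buffer
        self._console_write = console_write  # stand-in for WriteConsoleW

    def writable(self):
        return True

    def write(self, text):
        self._buffer.flush()        # cheap when nothing is pending
        self._console_write(text)
        return len(text)

sink = []
buffer = io.BufferedWriter(ListRaw(sink))
text = FlushingConsoleText(buffer, sink.append)

text.write("foo")
buffer.write(b"bar")
text.write("baz\n")
```

With the flush in place, the three writes land in the order they were issued, which is the behaviour an application author would naively expect.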
msg132268 - (view) Author: Glenn Linderman (v+python) * Date: 2011-03-26 19:27
David-Sarah wrote:
Windows is very slow at scrolling a console, which might make the cost of flushing insignificant in comparison.

Just for the record, I noticed a huge speedup in Windows console scrolling when I switched from WinXP to Win7 on a faster computer :)
How much is due to the XP->7 switch and how much to the faster computer, I cannot say, but it seemed much more significant than other speedups in other software.  The point?  Benchmark it on Win7, not XP.
msg145898 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-10-19 11:52
I did more tests on the Windows console. I focused my tests on output.

To sum up, if we implement sys.stdout using WriteConsoleW() and sys.stdout.buffer.raw using WriteConsoleA():

 - print() will not fail anymore on unencodable characters, because the string is no longer encoded to the console code page
 - if you set the console font to a TrueType font, most characters will be displayed correctly
 - you don't need to change the (console) code page to CP_UTF8 (65001) anymore if you just use print()
 - you still need cp65001 if the output (stdout and/or stderr) is redirected or if you use directly sys.stdout.buffer or sys.stderr.buffer

Other facts:

 - locale.getpreferredencoding() returns the ANSI code page
 - sys.stdin.encoding is the console encoding (GetConsoleCP())
 - sys.stdout.encoding and sys.stderr.encoding are the console output code page (GetConsoleOutputCP())
 - sys.stdout is not a TTY if the output is redirected, e.g. "python script.py|more"
 - sys.stderr is not a TTY if the output is redirected, e.g. "python script.py 2>&1|more" (this example redirects stdout and stderr, I don't know how to redirect only stderr)
 - WriteConsoleW() is not affected by the console output code page (GetConsoleOutputCP)
 - WriteConsoleA() is indirectly affected by the console output code page: if a string cannot be encoded to the console output code page (e.g. sys.stdout.encoding), you cannot call WriteConsoleA with the result...
 - If the console font is a raster font and the font doesn't contain a character, the console tries to find a similar glyph, or it falls back to the character '?'
 - If the console font is a TrueType font, it is able to display most Unicode characters
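
Several of the facts above can be checked from Python itself. The helper below is a small sketch (describe_streams is a hypothetical name) that only reports what the interpreter exposes; it does not call GetConsoleCP() or GetConsoleOutputCP() directly.

```python
import locale
import sys

def describe_streams():
    """Report the preferred encoding and the encoding/TTY state of std streams."""
    info = {"preferred_encoding": locale.getpreferredencoding(False)}
    for name in ("stdin", "stdout", "stderr"):
        stream = getattr(sys, name, None)
        try:
            is_tty = bool(stream is not None and stream.isatty())
        except (AttributeError, ValueError):
            # Replaced or closed streams may not support isatty().
            is_tty = False
        info[name] = {
            "encoding": getattr(stream, "encoding", None),
            "isatty": is_tty,
        }
    return info
```

Running it once directly and once with the output piped to another program shows the TTY flag changing when a stream is redirected.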
msg145899 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-10-19 11:55
unicode3.py replaces sys.stdout, sys.stdout.buffer, sys.stderr and sys.stderr.buffer to use WriteConsoleW() and WriteConsoleA(). It also displays a lot of information about encodings and displays some characters (I wrote my tests for cp850, cp1252 and cp65001).
msg145963 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-10-19 20:57
win_console.patch: a more complete prototype

 * patch the site module to replace sys.stdout and sys.stderr by UnicodeConsole and BytesConsole classes which use WriteConsoleW and WriteConsoleA
 * UnicodeConsole inherits from io.TextIOBase and BytesConsole inherits from io.RawIOBase
 * Revert the workaround for WriteConsoleA bug from io.FileIO

sys.stdout and/or sys.stderr are only replaced if they are not redirected.
msg145964 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-10-19 20:58
test_win_console.py: Small script to test win_console.patch. Write some characters into sys.stdout.buffer (WriteConsoleA) and sys.stdout (WriteConsoleW). The test is written for cp850, cp1252 and cp65001 code pages.
msg146471 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-10-26 23:50
I added a cp65001 codec to Python 3.3: see issue #13216.
msg148990 - (view) Author: Matt Mackall (Matt.Mackall) Date: 2011-12-07 21:18
The underlying cause of Python's write exceptions with cp65001 is:

The ANSI C write() function as implemented by the Windows console returns the number of _characters_ written rather than the number of _bytes_, which Python reasonably interprets as a "short write error". It then consults errno, which gives the effectively random error message seen.

This can be bypassed by using os.write(sys.stdout.fileno(), utf8str), which will a) succeed and b) return a count <= len(utf8str).

With os.write() and an appropriate font, the Windows console will correctly display a large number of characters.

Possible workaround: clear errno before calling write, check for non-zero errno after. The vast majority of (non-Python) applications never check the return value of write, so don't encounter this problem.
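
A minimal sketch of the os.write() bypass described above. write_utf8 is a hypothetical helper; it makes a single call and deliberately does not loop on the return value, since under cp65001 the count reported by the console may not be a byte count.

```python
import os

def write_utf8(fd, text):
    """Encode text as UTF-8 and pass it to os.write() in one call."""
    data = text.encode("utf-8")
    # The return value is intentionally not used to retry: on an affected
    # console it may count characters rather than bytes.
    os.write(fd, data)
    return len(data)
```

For example, write_utf8(sys.stdout.fileno(), "p\u012bny\u012bn\n") sends the bytes straight to file descriptor 1, bypassing the buffered stream objects (so flush sys.stdout first if it has pending text).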
msg157569 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2012-04-05 11:47
The issue #14227 has been marked as a duplicate of this issue. Copy of msg155149:

This is on Windows 7 SP1.  Run 'chcp 65001' then Python from a console.  Note the extra characters when non-ASCII characters are in the string.  At a guess it appears to be using the UTF-8 byte length of the internal representation instead of the character count.

Python 3.3.0a1 (default, Mar  4 2012, 17:27:59) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> print('hello')
hello
>>> print('p\u012bny\u012bn')
pīnyīn
n
>>> print('\u012b'*10)
īīīīīīīīīī
�īīīī
�ī
msg160812 - (view) Author: Glenn Linderman (v+python) * Date: 2012-05-16 08:57
Has something incompatible changed between 3.2.2 and 3.2.3 with respect to this bug?

I have a program that had an earlier version of the workaround (Michael's original, I think), and it worked fine, then I upgraded from 3.2.2 to 3.2.3 due to testing for issue 14811 and then the old workaround started complaining about no attribute 'errors'.

So I grabbed unicode3.py, but it does the same thing:

AttributeError: 'UnicodeConsole' object has no attribute 'errors'

I have no clue how to fix this, other than going back to Python 3.2.2...
msg160813 - (view) Author: Glenn Linderman (v+python) * Date: 2012-05-16 08:58
Oh, and is this issue going to be fixed for 3.3, so we don't have to use the workaround in the future?
msg160897 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2012-05-16 17:54
Glenn, I do not know what you are using the interactive interpreter for, but for the unicode BMP, the Idle shell generally works better. I only use CommandPrompt for cross-checking behavior.
msg161151 - (view) Author: Giampaolo Rodola' (giampaolo.rodola) * (Python committer) Date: 2012-05-19 18:55
Not sure whether a solution has already been proposed because the issue is very long, but I just bumped into this on Windows and came up with this:


from __future__ import print_function
import sys

def safe_print(s):
    try:
        print(s)
    except UnicodeEncodeError:
        if sys.version_info >= (3,):
            print(s.encode('utf8').decode(sys.stdout.encoding))
        else:
            print(s.encode('utf8'))

safe_print(u"\N{EM DASH}")


Couldn't python do the same thing internally?
msg161153 - (view) Author: David-Sarah Hopwood (davidsarah) Date: 2012-05-19 19:25
Giampaolo: See #msg120700 for why that won't work, and the subsequent comments for what will work instead (basically, using WriteConsoleW and a workaround for a Windows API bug). Also see the prototype win_console.patch from Victor Stinner: #msg145963
msg161308 - (view) Author: Glenn Linderman (v+python) * Date: 2012-05-21 23:58
I actually had to go back to 3.1.2 to get it to run, I guess I had never run with Unicode output after installing 3.2.  So it isn't an incompatibility between 3.2.2 and 3.2.3, but more likely a change between 3.1 and 3.2 that invalidates this patch and workaround.  At least it is easier to keep 3.1.x and 3.2.x on the same system!

Terry, applications for non-programmers that want to emit Unicode on the console... so the IDLE shell isn't appropriate.
msg161651 - (view) Author: Glenn Linderman (v+python) * Date: 2012-05-26 08:58
A little more empirical info: the missing "errors" attribute doesn't show up except for input. print works fine.
msg164572 - (view) Author: Glenn Linderman (v+python) * Date: 2012-07-03 05:14
For the win_console.patch, it seems like adding the line

self.errors='strict'

inside UnicodeOutput.__init__ resolves the problem with input causing exceptions.
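
As an illustration of why the attribute matters, here is a sketch of a replacement text stream that exposes both encoding and errors. ConsoleTextSketch and write_func are hypothetical names, not the classes from the patch. When subclassing io.TextIOBase the two attributes are defined as properties, because the base class already declares them as read-only attributes.

```python
import io

class ConsoleTextSketch(io.TextIOBase):
    """Hypothetical text stream exposing the attributes callers look up."""
    def __init__(self, write_func, encoding="utf-8", errors="strict"):
        self._write_func = write_func  # stand-in for a console write call
        self._encoding = encoding
        self._errors = errors

    @property
    def encoding(self):
        return self._encoding

    @property
    def errors(self):
        return self._errors

    def writable(self):
        return True

    def write(self, text):
        self._write_func(text)
        return len(text)
```

'strict' is used as the default here because it is also the default for TextIOWrapper.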

Not sure if the sys_write_stdout.patch has the same sort of problem. Sure hope this issue makes it into 3.3.
msg164578 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2012-07-03 05:56
3.3b0, Win7, 64 bit. Original test script stops at
File "C:\Programs\Python33\lib\encodings\cp437.py", line 19, in encode
  return codecs.charmap_encode(input,self.errors,encoding_map)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\x80' in position 6:

I am slightly puzzled because cp437 is an extended ascii codepage and there *is* a character for 0x80
https://en.wikipedia.org/wiki/Code_page_437

If I add .encode('latin1'), it does not print the pentagon for 0x7e, but does print \x7e to \xff.

Someone wrote elsewhere that 3.3 could use cp65001. True?
msg164580 - (view) Author: Glenn Linderman (v+python) * Date: 2012-07-03 06:47
My fix for this "errors" error, might be similar to what is needed for issue 12967, although I don't know if my fix is really correct... just that it gets past the error, and 'strict' is the default for TextIOWrapper.

I'm not at all sure why there is now (since 3.2) an interaction between input on stdin and the particulars of the output class for stdout. But I'm not at all an expert in Python internals or Python IO.

I'm not sure whether or not you applied the patch to your b0, if not, that is what I'm running, too... but using the win_console.patch as supporting code.  The original test script didn't use the supporting code.

If you did patch your b0 with unicode3.py, then you shouldn't need to do a chcp to write any Unicode characters; someone reported that doing a chcp caused problems, but I don't know how to apply the patch or build a Python with it, so can't really test all the cases. Victor did add a cp65001 codec using a different issue, not sure how that is relevant here, other than for the tests he wrote.
msg164618 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2012-07-03 19:44
I was reporting stock, as distributed 3.3b0.

Is unicode3.py something to run once or import in each app that wants unicode output? Either way, if it is possible to fix the console, why is Python not distributed with the fix?

>Terry, applications for non-programmers that want to emit Unicode on the console... so the IDLE shell isn't appropriate.

Someone just posted on python-list about a problem with that.

Hmm. Maybe IDLE should gain a batch-mode console window -- basically a stripped down version of the current shell -- a minimal auto-gui for apps.
msg164619 - (view) Author: Glenn Linderman (v+python) * Date: 2012-07-03 19:54
Terry said:
Is unicode3.py something to run once or import in each app that wants unicode output? 

I say:
The latter... import it.

Terry said:
Either way, if it is possible to fix the console, why is Python not distributed with the fix?

I say:
Not sure what you are asking here. Yes it is possible to fix the console, but this fix depends on the version-specific internals of the Python IO system... so unicode3.py works with Python 3.1, but not Python 3.2 or 3.3.  I haven't tested to see if my patched unicode3.py still works on Python 3.1 (I imagine it would, due to the nature of the fix just adding something that Python 3.1 probably would ignore).

So my opinion is the fix is better done inside Python than inside the application.
msg170899 - (view) Author: Adam Bartoš (Drekin) * Date: 2012-09-21 16:20
Hello, I'm trying to handle Unicode input and output in the Windows console and found this issue. Will this be solved in 3.3 final? I tried to write a solution (file attached) based on the solutions here: replacing sys.stdin and sys.stdout with objects that use ReadConsoleW and WriteConsoleW.

Output works well, but there are a few problems with input. First, the Python interactive interpreter actually doesn't use sys.stdin but standard C stdin. It's implemented over a file pointer (PyRun_InteractiveLoopFlags, PyRun_InteractiveOneFlags in pythonrun). But the interpreter still uses sys.stdin.encoding (assigning sys.stdin something that doesn't have encoding==None freezes the interpreter). Wouldn't it make more sense if it used sys.__stdin__.encoding?

However, input() (which uses stdin.readline) works as expected. There's a small problem with KeyboardInterrupt. Since signals are processed asynchronously, it's raised at a random place and it behaves weirdly. time.sleep(0.01) after the C call works well, but it's an ugly solution.

When code.interact() is used instead of the standard interpreter, it works as expected. Is there a way of changing the interpreter loop? Some hook which calls code.interact() at the right place? The patch can be applied in site or sitecustomize, but calling code.interact() there obviously doesn't work.

Some other remarks:
- When sys.stdin or sys.stdout doesn't define encoding and errors, input() raises TypeError: bad argument type for built-in operation.
- input() raises KeyboardInterrupt on Ctrl-C in Python 3.2 but not in Python 3.3rc2.
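
A sketch of the kind of replacement REPL mentioned above, built on the stdlib code module. StreamConsole is a hypothetical name; it reads each line from whatever stream it is given (for instance a replaced sys.stdin) instead of the C-level stdin that the built-in interactive loop uses.

```python
import code

class StreamConsole(code.InteractiveConsole):
    """Hypothetical console that reads its input from a Python stream."""
    def __init__(self, stream, locals=None):
        super().__init__(locals=locals)
        self._stream = stream

    def raw_input(self, prompt=""):
        # Called by interact() for every line of input.
        self.write(prompt)
        line = self._stream.readline()
        if not line:
            raise EOFError
        return line.rstrip("\n")
```

StreamConsole(sys.stdin).interact() would then run a loop that goes through the Python-level stream object, at the cost of losing whatever line editing the built-in prompt provides.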
msg170915 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2012-09-21 20:51
> Will this [issue] be solved in 3.3 final?

No. It would be a huge change and the RC2 was already released. No
new features are accepted after the version 3.3.0 beta 1:
http://www.python.org/dev/peps/pep-0398/

I'm not really motivated to work on this issue, because it is really
hard to get something working in all cases. Using
ReadConsoleW/WriteConsoleW helps, but it doesn't solve all issues as
you said.
msg170999 - (view) Author: Adam Bartoš (Drekin) * Date: 2012-09-22 14:27
I have finished a solution that works for me. It bypasses the standard Python interactive interpreter and uses its own REPL based on code.interact(). This REPL is activated by an ugly hack since PYTHONSTARTUP doesn't apply when some file is run (python -i somefile.py). Why does it work like that? The startup script could find out if a file is run or not. If anybody knows how to get rid of the time.sleep() used to wait for KeyboardInterrupt or how to get rid of PromptHack, please let me know. The "patch" can be activated by win_unicode_console_2.enable(change_console=True, use_hack=True) in site or sitecustomize or usercustomize.
msg185135 - (view) Author: Adam Bartoš (Drekin) * Date: 2013-03-24 13:02
Hello. I have made a small upgrade of the workaround.
• win_unicode_console.enable_streams() sets sys.stdin, stdout and stderr to custom filelike objects which use Windows functions ReadConsoleW and WriteConsoleW to handle unicode data properly. This can be done in sitecustomize.py to take effect automatically.

• Since the Python interactive console doesn't use sys.stdin for getting input (I still don't know the reason for this), there is an alternative REPL based on code.interact(). win_unicode_console.InteractiveConsole.enable() sets it up. To set it up automatically, put the enabling code into a startup file and set the PYTHONSTARTUP environment variable. This works for an interactive session (just running python with no script).

• Since there is no hook to run InteractiveConsole.enable() when a script is run interactively (-i flag), that is after the script and before the interactive session, I have written a helper script i.py. It just runs given script and then enters an interactive mode using InteractiveConsole. Just put i.py into site-packages and run "py -m i script.py arguments" instead of "py -i script.py arguments".

It's a shame that in the year 2013 one cannot simply run the Python console on Windows and enter Unicode characters. I'm not saying it's just Python's fault, but there is a workaround on the Python side.
msg197700 - (view) Author: Adam Bartoš (Drekin) * Date: 2013-09-14 10:15
Hello again. I have rewritten the custom stdio objects and implemented them as raw io reading and writing bytes in UTF-16-LE encoding. They are then wrapped in standard BufferedReader/Writer and TextIOWrapper objects. This approach also solves a bug of the wrong string length being given to WriteConsoleW when the string contained a supplementary character. Since we are waiting for the Ctrl-C signal to arrive, this implementation doesn't suffer from http://bugs.python.org/issue18597 . It seems to work when a main script is executed; however, it doesn't work in the Python interactive REPL since the REPL doesn't use sys.stdin for input. It does use its encoding, though, which results in a mess when sys.stdin is changed to an object with a different encoding like UTF-16-LE. See http://bugs.python.org/issue17620 .
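
The string-length problem with supplementary characters comes from the difference between code points and UTF-16 code units. A small sketch (utf16_units is a hypothetical helper name) that computes the unit count a UTF-16 API would expect:

```python
def utf16_units(text):
    """Return the number of UTF-16 code units needed to represent text."""
    # Each code unit is two bytes; characters outside the BMP take two units.
    return len(text.encode("utf-16-le")) // 2
```

For a string containing a character outside the Basic Multilingual Plane, len(text) is smaller than utf16_units(text), so passing len(text) as the length would cut the output short.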
msg197751 - (view) Author: Glenn Linderman (v+python) * Date: 2013-09-15 06:20
Hi Drekin. Thanks for your work in progressing this issue. There have been a variety of techniques proposed for this issue, but it sounds like yours has built on what the others learned, and is close to complete, together with issue 17620.

Is this in a form that can be used with Python 3.3? or 3.4 alpha? Can it be loaded externally from a script, or must it be compiled into Python, or both?

I've been using a variant of davidsarah's patch for 2 years now, but would like to take yours out for a spin. Is there a Complete Idiot's guide to using your patch? :)
msg197752 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2013-09-15 06:32
From reading the module,
  import stream; stream.enable()
replaces sys.stdin/out/err with new classes.
msg197773 - (view) Author: Adam Bartoš (Drekin) * Date: 2013-09-15 13:26
Glenn Linderman: Yes, I have built on what the others learned. As for your question, I made it and tested it in Python 3.3; it should also work in 3.4, and from what I've tried, it actually does. As Terry J. Reedy says, you can just load the module and enable the streams. I do this automatically on startup using sitecustomize. However, as I said, currently this messes up the interactive session because of http://bugs.python.org/issue17620 . I have made a workaround: a custom REPL built on the stdlib module code, and also a helper script which runs the main script and then runs the custom REPL (I couldn't find any standard hook which would run the custom REPL). I'm uploading the full code. I will delete it if this isn't an appropriate place.

Things like this could be fixed more easily if more core interpreter logic took place in stdlib. E. g. the code for interactive REPL. Few days ago I started some discussion on python ideas: https://mail.python.org/pipermail/python-ideas/2013-August/023000.html .
msg221175 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2014-06-21 12:27
The fact Unicode doesn't work at the command prompt makes it look like Unicode on Windows just plain doesn't work, even in Python 3. Steve, if you (or a colleague) could provide some insight on getting this to work properly, that would be greatly appreciated.
msg221178 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2014-06-21 15:12
My understanding is that the best way to write Unicode to the console is through WriteConsoleW(), which seems to be where this discussion ended up. The only apparent sticking point is that this would cause an ordering incompatibility with `stdout.write(); stdout.buffer.write(); stdout.write()`.

Last I heard, the official "advice" was to use PowerShell. Clearly everyone's keen to jump on that... (I'm not even sure it's an instant fix either - PS is a much better shell for file manipulation and certainly handles encoding better than type/echo/etc., but I think it will still go back to the OEM CP for executables.)

One other point that came up was UTF-8 handling after redirecting output to a file. I don't see an issue there - UTF-8 is going to be one of the first guesses (with or without a BOM) for text that is not UTF-16, and apps that assume something else are no worse off than with any other codepage.

So I don't have any great answers, sorry. I'd love to see the defaults handle it properly, but opt-in scripts like Drekin's may be the best way to enable it broadly.
msg223403 - (view) Author: Adam Bartoš (Drekin) * Date: 2014-07-18 09:04
I have made some updates in the streams code: better error handling (getting errno via GetLastError() and raising an exception when zero bytes are written on non-zero input). This prevents the infinite loop in BufferedIOWriter.flush() when there is an odd number of bytes (WriteConsoleW accepts UTF-16-LE, so only an even number of bytes is written). It also prevents the same infinite loop when the buffer is too big to write at once (see http://bugs.python.org/issue11395 ). A limit of 32767 bytes was added to the raw write.
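The two constraints described here (whole 2-byte units only, and a cap on the size of one raw write) can be sketched in pure Python. This is only an illustration, not the code from the attached streams module: `iter_console_chunks` and `MAX_WRITE_BYTES` are hypothetical names, and avoiding a split in the middle of a surrogate pair is an extra assumption of the sketch.

```python
MAX_WRITE_BYTES = 32767  # the limit mentioned above, used here as an upper bound


def iter_console_chunks(text, max_bytes=MAX_WRITE_BYTES):
    """Yield UTF-16-LE byte chunks that are each small enough for one raw write."""
    data = text.encode("utf-16-le", errors="surrogatepass")
    # Round the limit down to a whole number of 2-byte code units.
    step = max_bytes - (max_bytes % 2)
    if step < 2:
        raise ValueError("max_bytes must allow at least one UTF-16 code unit")
    pos = 0
    while pos < len(data):
        end = min(pos + step, len(data))
        # Assumption of this sketch: do not end a chunk on a high surrogate
        # (0xD800-0xDBFF), so that a surrogate pair stays in one chunk.
        if end < len(data) and end - pos > 2:
            unit = int.from_bytes(data[end - 2:end], "little")
            if 0xD800 <= unit <= 0xDBFF:
                end -= 2
        yield data[pos:end]
        pos = end
```

Each yielded chunk has an even length, so a writer that only accepts whole code units always makes progress and the flush loop terminates.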
msg223404 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2014-07-18 09:14
@Drekin: Please don't send ZIP files to the bug tracker. It would be much better to have a project on GitHub, in Mercurial, or somewhere else, to keep the history of the source code. You may try to list all the people who contributed to this code.

You may also create a project on pypi.python.org to share your code. This bug tracker is not the best place for that.

Once the code is considered mature (well tested, widely used), we can try to integrate it into Python.
msg223507 - (view) Author: Adam Bartoš (Drekin) * Date: 2014-07-20 10:48
@Victor Stinner: You are right. So I did it. Here are the links to GitHub and PyPI: https://github.com/Drekin/win-unicode-console, https://pypi.python.org/pypi/win_unicode_console.

I also tried to delete the files, but it seems that it is only possible to unlink a file from the issue, but the file itself remains. Is it possible to manage the files?
msg223509 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2014-07-20 11:56
Thanks Drekin - I'll point folks to your project as a good place to provide initial feedback, and if that seems promising we can look at potentially integrating the various fixes into Python 3.5
msg223945 - (view) Author: Mark Summerfield (mark) * Date: 2014-07-25 13:24
I used pip to install the win_unicode_console package on windows 7 python 3.3.

It works but wouldn't freeze with cx_freeze because there's no __init__.py file in the win_unicode_console directory.
msg223946 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2014-07-25 13:27
Hmm, I'm not sure if that would be a bug in cxFreeze or CPython - I don't think we've tried freezing or zipimporting namespace packages... (either way, adding the __init__.py to win_unicode_console would likely be the quickest fix)
msg223947 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2014-07-25 13:28
Since there is now an external project fixing the support of Windows console, I suggest to close this issue as "wontfix". In a few months, if we get enough feedback on this project, we may reconsider integrating it into Python. What do you think?

https://pypi.python.org/pypi/win_unicode_console.

> I used pip to install the win_unicode_console package ...

Please don't use Python bug tracker to report bugs to the package.
msg223948 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2014-07-25 13:34
The poor interaction with the Windows command line is still a bug in CPython - we could mark it closed/later but I don't see any value in doing so.

I see Drekin's win_unicode_console module as similar to my own contextlib2 - used to prove the concept, and perhaps iterate on some of the details, but the ultimate long term solution is to fix CPython itself.
msg223949 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2014-07-25 13:40
> The poor interaction with the Windows command line is still a bug in CPython - we could mark it closed/later but I don't see any value in doing so.

I don't see any value in keeping the issue open since nobody has worked on it for the last 7 years. I just want to make it clear that we will *not* fix this issue.

Well, in fact I spent a lot of hours trying to find a way to fix the issue, and my conclusion is that it's not possible to correctly handle Unicode (input and output) in a Windows console. Please read the whole issue for the details.

The win_unicode_console project may improve the Unicode support, but I'm convinced that it still has various issues because it is just not possible to handle all cases.

A workaround is to not use the Windows console, but use IDLE or another shell... Try maybe PowerShell. But PowerShell has at least an issue with the code page 65001 (Microsoft UTF-8): see the issue #21927.
msg223951 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2014-07-25 13:52
Based on Steve's last post, the main challenge is that the IO model assumes a bytes-based streaming API - it isn't really set up to cope with a UTF-16 buffering layer.

However, that's not substantially different from the situation when the standard streams are replaced with StringIO objects, and they don't have an underlying buffer object at all. That may be a suitable model for Windows console IO as well - present it to the user in a way that doesn't expose an underlying bytes-based API at all.

Now, it may not be feasible to implement this until we get the startup code cleaned up, but I'm not going to squash interest in improving the situation when it's one of the major culprits behind the "Unicode is even more broken in Python 3 than it is in Python 2" meme.
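A small illustration of the StringIO comparison made above, using only the standard library: a text-only replacement for sys.stdout accepts any Unicode text and simply has no `buffer` attribute. This shows the model being discussed, not how a console stream would actually be implemented.

```python
import contextlib
import io

capture = io.StringIO()
with contextlib.redirect_stdout(capture):
    # The text goes through the str-based layer only; no bytes are involved.
    print("αβγ ∫ €")

# A StringIO stream exposes no underlying bytes-based API.
has_buffer = hasattr(capture, "buffer")
text = capture.getvalue()
print(has_buffer, repr(text))
```

Code that writes to `sys.stdout.buffer` would fail against such a stream, which is the compatibility trade-off of hiding the bytes layer.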
msg223952 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2014-07-25 13:53
Changing targets to Python 3.5, since this is almost certainly going to be too invasive for a maintenance release.
msg224019 - (view) Author: Glenn Linderman (v+python) * Date: 2014-07-26 04:30
This bug deserves to stay open with its high priority (for whatever good that does these last seven years, although I appreciate all the efforts put forth, and have been making heavy use of the workarounds in the patches), because when working with Unicode data in programs, even exception messages are not properly displayed... instead, they cause a secondary exception of not being able to display the data of the original exception to the console.

And writing Unicode data to the console as part of an interactive or command line program has to either be done with the hopes that the data only includes characters in the console, to avoid the failures, or with lots of special encoding calls and character substitutions for code points not in the console repertoire. Remember that the console is supposed to be human readable, not encoded numerically as ascii() would do. 

ascii() is sort of OK for exception messages, but since that doesn't happen by default, the initial message to the console with Unicode data often doesn't appear, and an extra repetition after a failed message and a rework of the message parameters is required, which impedes productivity.
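The failure mode and the kind of substitution workaround described above can be reproduced without a console. This is a sketch under assumptions: an in-memory `io.BytesIO` stands in for the console, cp437 is picked as an example OEM code page, and `make_console` is a hypothetical helper.

```python
import io


def make_console(errors):
    """Return (text_stream, byte_sink) imitating a cp437 console stream."""
    sink = io.BytesIO()
    stream = io.TextIOWrapper(sink, encoding="cp437", errors=errors,
                              newline="\n", write_through=True)
    return stream, sink


# Strict error handling: a character outside the code page aborts the write.
strict_stream, _ = make_console("strict")
try:
    strict_stream.write("price: 5 \u20ac\n")
    failed = False
except UnicodeEncodeError:
    failed = True

# Substituting escapes keeps the message visible, at the cost of readability.
lenient_stream, lenient_sink = make_console("backslashreplace")
lenient_stream.write("price: 5 \u20ac\n")
lenient_stream.flush()
shown = lenient_sink.getvalue().decode("cp437")
print(failed, shown)
```

The escaped form is readable enough for diagnostics, but it is not what a human-facing console program wants to display.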
msg224086 - (view) Author: Adam Bartoš (Drekin) * Date: 2014-07-26 20:33
I have deleted all my old files and added only my current implementation of the stream objects as the only relevant part to this issue.

@Mark Summerfield: I have added __init__.py to the new version of win_unicode_console. If there is any problem, you can start an issue on project GitHub site or contact me.

@Victor Stinner, @Nick Coghlan: What's wrong with treating Windows wide strings as UTF-16-LE encoded bytes and building the raw stream objects around this?
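A minimal sketch of that idea, with the console replaced by a plain Python callable so it runs anywhere. `Utf16ConsoleRaw` and its `sink` argument are hypothetical; a real implementation would pass the data to WriteConsoleW instead, and this is not the code from the attached streams.py.

```python
import io


class Utf16ConsoleRaw(io.RawIOBase):
    """Raw writer that accepts UTF-16-LE bytes and forwards text to a sink."""

    def __init__(self, sink):
        super().__init__()
        self._sink = sink  # hypothetical callable standing in for WriteConsoleW

    def writable(self):
        return True

    def write(self, b):
        data = bytes(b)
        usable = len(data) - (len(data) % 2)  # whole 2-byte code units only
        if usable == 0 and data:
            raise ValueError("cannot write a partial UTF-16 code unit")
        self._sink(data[:usable].decode("utf-16-le", errors="surrogatepass"))
        return usable  # the buffered layer retries any remaining bytes


written = []
raw = Utf16ConsoleRaw(written.append)
# The ordinary bytes-based stack sits on top; only the encoding is unusual.
stream = io.TextIOWrapper(io.BufferedWriter(raw), encoding="utf-16-le",
                          newline="\n", line_buffering=True)
print("Привет, мир! \u20ac", file=stream)
stream.flush()
```

The point of the design is that `stream.buffer` still exists and still takes bytes; they just have to be UTF-16-LE.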
msg224095 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2014-07-27 01:55
Drekin, you're right, that's a much better way to go, I just didn't think it through :)
msg224596 - (view) Author: Mark Lawrence (BreamoreBoy) * Date: 2014-08-02 23:52
To ensure that we're all talking about the same thing, is everybody using the /u unicode output option or /a ansi (which I'm assuming is the default) when running cmd?
msg224605 - (view) Author: Glenn Linderman (v+python) * Date: 2014-08-03 02:20
Mark, the /U and /A switches to CMD only affect (as the help messages say) the output of internal CMD commands. So they would only affect interoperability between internal command output piped to a Python program. The biggest issue in this bug, however, is the output of Python programs not being properly displayed by the console window (often thought of or described as the CMD shell window).

While my biggest concerns have been with output, I suppose input can be an issue also, and running the output of echo, or other internal commands, into Python could be an issue as well. I have pasted a variety of data into Python programs beyond ASCII, but I'm not sure I've gone beyond ANSI or beyond Unicode BMP. Obviously, once output is working properly, input should also be tested and fixed, although I think output is more critical.

With the impetus of your question... I just took some text supplied in another context that has a bunch of characters from different repertoires, including non-BMP, and tried to paste it into the console window.  Here is the text:


こんにちは世界 - fine on Linux, all boxes on Windows (all boxes in Chrome on Linux too)
مرحبا، العالم! - fine on Linux and Windows
안녕하세요, 세계! - fine on Linux, just boxes and punctuation on Windows
(likewise in Chrome)
Привет, мир! - fine on Linux and Windows
Αυτή είναι μια δοκιμή - fine on both, but Google Translate has a
problem with this! It returned "Hello, world!" as the Greek for
"Hello, world!"... so I tried again with "This is a test".
𝓗𝓮𝓵𝓵𝓸, 𝔀𝓸𝓻𝓵𝓭! - not actually a language, but this is astral
In the console window, which I have configured using the Consolas font, the glyphs for the non-ASCII characters in the first two and last lines were boxes... likely Consolas doesn't support those characters. I had written a Python equivalent of "echo", including some workarounds originally posted in this issue, and got exactly the same output as input, with no errors produced. So it is a bit difficult to test characters outside the repertoire of whatever font is configured for the console window.  Perhaps someone that has Chinese or Korean fonts configured for their console window could report on further testing of the above or similar strings.
msg224690 - (view) Author: Adam Bartoš (Drekin) * Date: 2014-08-04 06:43
I think that boxes are OK; it's just a missing font. Without an active workaround there is just a UnicodeEncodeError (with cp852 for me). There is a problem with astral characters: I'm getting each box twice. It is possible that the Windows console doesn't handle astral characters at all, i.e. it doesn't interpret surrogate pairs.
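For reference, the "each box twice" observation matches how UTF-16 represents characters outside the BMP: one astral character becomes two 16-bit code units. A short check (`utf16_units` is just an illustrative helper):

```python
def utf16_units(text):
    """Number of 16-bit code units needed for text in UTF-16."""
    return len(text.encode("utf-16-le")) // 2


astral = "\U0001d4d7"  # the first character of the test string above
encoded = astral.encode("utf-16-le")
# Split the encoded form into its individual code units.
units = [hex(int.from_bytes(encoded[i:i + 2], "little")) for i in (0, 2)]

print(utf16_units("\u20ac"))  # a BMP character: 1 unit
print(utf16_units(astral))    # an astral character: 2 units
print(units)                  # the high and low surrogate
```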
msg227329 - (view) Author: Stefan Champailler (wiz21) Date: 2014-09-23 08:52
I don't know if this is 100% related, but here I go. Here's a session in a windows console (cmd.exe) :

Microsoft Windows [Version 6.1.7601]
Copyright (c) 2009 Microsoft Corporation.  All rights reserved.

C:\Users\stc>chcp 65001
Active code page: 65001

C:\Users\stc>\PORT-STCA2\opt\python3\python
Python 3.4.1 (v3.4.1:c0e311e010fc, May 18 2014, 10:38:22) [MSC v.1600 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> print '€'


C:\Users\stc>

So basically, the Python interpreter just quits without any message. Windows doesn't complain about Python crashing, though...

Best regards,

Stefan
msg227330 - (view) Author: Stefan Champailler (wiz21) Date: 2014-09-23 09:07
In my previous comment, I've shown :

print '€'

which is not valid Python 3.4.1 (I don't know why the interpreter didn't complain, though). So I tested again with the missing parentheses added:

C:\PORT-STCA2\pl-PRIVATE\horse>chcp 65001
Active code page: 65001

C:\PORT-STCA2\pl-PRIVATE\horse>python
Python 3.4.1 (v3.4.1:c0e311e010fc, May 18 2014, 10:38:22) [MSC v.1600 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> print("€")


C:\PORT-STCA2\pl-PRIVATE\horse>echo %PROCESSOR_IDENTIFIER%
Intel64 Family 6 Model 42 Stepping 7, GenuineIntel

Exactly the same behaviour.
msg227332 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2014-09-23 09:22
Drekin, it would be good to be able to incorporate some of your improvements for Python 3.5. Before we could do that, we'd need to review and agree to the PSF Contributor Agreement at https://www.python.org/psf/contrib/contrib-form/

The underlying licensing situation for CPython is a little messy (albeit in a way that doesn't impact users or redistributors), so we use the contributor agreement to ensure we continue to have the right to distribute Python under its current license without making the history any messier, and to preserve the option of switching to a simpler standard license at some point in the future (if it ever becomes feasible to do so).
msg227333 - (view) Author: Adam Bartoš (Drekin) * Date: 2014-09-23 09:35
Stefan Champailler:

The crash you see is maybe not a crash at all. First it has nothing to do with printing, the problem is reading of your input line. That explains why Python exited even before printing the traceback of the SyntaxError. If you try to read input using `sys.stdin.buffer.raw.read(100)` and type Unicode characters, it returns just empty bytes `b''`. So maybe Python REPL then thinks the input just ended and so standardly exits the interpreter.

Why are you using chcp 65001? As far as I know, it doesn't give you the ability to use Unicode in the console. It somehow helps with printing, but there are some issues. `print("\N{euro sign}")` prints the right character, but it prints additional blank line. `sys.stdout.write("\N{euro sign}")` and `sys.stdout.buffer.write("\N{euro sign}".encode("cp65001"))` does the same, but `sys.stdout.buffer.raw.write("\N{euro sign}".encode("cp65001"))` works as expected.

If you want to enter and display Unicode in Python on Windows console, try my package `win_unicode_console`, which tries to solve the issues. See https://pypi.python.org/pypi/win_unicode_console.
msg227337 - (view) Author: Adam Bartoš (Drekin) * Date: 2014-09-23 10:11
Nick Coghlan: Ok, done.
msg227338 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2014-09-23 10:21
Drekin: thanks! That should get processed by the PSF Secretary before too long, and the "*" to indicate you have signed it will appear by your name.
msg227347 - (view) Author: Stefan Champailler (wiz21) Date: 2014-09-23 11:26
Dear Drekin,

> The crash you see is maybe not a crash at all. First it has nothing 
> to do with printing, the problem is reading of your input line. 

I guessed that, but thanks for pointing out.

> So maybe Python REPL then thinks the input just ended and so standardly exits the interpreter.

Yes. I showed that because the line of code seemed perfectly valid and innocuous (I moved to Python 3 because I *need* good unicode/encodings support). The answer from the REPL is, to me, very surprising. I would have expected a badly displayed character at least and a syntax error at worst. I consider myself quite aware of unicode issues, but without any output from the REPL, I'd have a very hard time figuring out what went wrong, hence my bug report.

So even though this might not qualify as the worst bug in Python, I'd say it is actually quite misleading. But I have no complaints here; I'm very happy with Python in general. It's just that I thought I had to tell the dev team.

> Why are you using chcp 65001? 

I thought it'd help me with printing unicode (I tried CP437, but the problem is that the EURO sign is not there, and I *do* need the euro sign :-)). But I'll readily admit I didn't read all the stuff about encoding issues on the Windows console before trying.

>try my package `win_unicode_console`, which tries to solve the issues. 

I'll certainly do that.

Thank you for your answer

Stefan
msg227354 - (view) Author: Mark Hammond (mhammond) * (Python committer) Date: 2014-09-23 12:54
> The crash you see is maybe not a crash at all.

I'd call it a "crash" - the repl shouldn't exit.  But it's not necessarily part of *this* bug.
msg227373 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2014-09-23 18:47
Stefan, the Idle Shell handles the BMP subset of Unicode quite well.

>>> print('€')
€
>>>

It is superior to the Windows console in other ways too.  For instance, cut and paste work normally as for other Windows windows.

(cp65001 is known to be buggy and essentially useless. Check the results in any search engine.)
msg227374 - (view) Author: Adam Bartoš (Drekin) * Date: 2014-09-23 19:04
Idle shell handles Unicode characters well, but one cannot enter them using deadkey combinations. See http://bugs.python.org/issue22408.
msg227441 - (view) Author: Stefan Champailler (wiz21) Date: 2014-09-24 11:31
Thank you all for your quick and good answers. This level of responsiveness is truly amazing.

I've played a bit with IPython and it works just fine. I can type the euro sign directly with "Alt Gr - E" (so I didn't enter a unicode code). So the bug is basically solved for me. But the python-repl behaviour still looks strange to me. So here's a successful IPython session:

C:\PORT-STCA2\pl-PRIVATE\horse>chcp 65001
Active code page: 65001                            
                                                                
C:\PORT-STCA2\pl-PRIVATE\horse>ipython
Python 3.4.1 (v3.4.1:c0e311e010fc, May 18 2014, 10:38:22) [MSC v.1600 32 bit (Intel)]
Type "copyright", "credits" or "license" for more information.

IPython 2.2.0 -- An enhanced Interactive Python.
?         -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help      -> Python's own help system.
object?   -> Details about 'object', use 'object??' for extra details.
               
In [1]: print('€')         
€                                                                                      
In [2]:
msg227450 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2014-09-24 14:14
Aye, IPython has the advantage of running in a fully initialised browser, with the backend in a fully initialised Python environment.

CPython's setting up the standard streams for the default REPL at a much lower level, and there are quite a few problems with the way we're currently doing it.

I think Drekin's pointed the way towards substantially improving the situation for 3.5, though.
msg228191 - (view) Author: stijn (stijn) Date: 2014-10-02 08:50
New here, but I think this is the correct issue to get info about this unicode problem. On the windows console:

> chcp
Active code page: 437

> type utf.txt
Привет

> chcp 65001
Active code page: 65001

> type utf.txt
Привет

> python --version
Python 3.5.0a0

> cat utf.py
f = open('utf.txt')
l = f.readline()
print(l)
print(len(l))

> python utf.py
Привет
�²ÐµÑ‚
�‚


13

> cat utf_explicit.py
import codecs
f = codecs.open('utf.txt', encoding='utf-8', mode='r')
l = f.readline()
print(l)
print(len(l))

> python utf_explicit.py
Привет
ет


7

I partly read through the page but these things are a bit above my head. Could anyone explain
- how to figure out which codec is used by files returned by open()?
- is there a way to change it globally to utf-8?
- the last case is almost correct: it has the correct number of characters, but the print() still does something wrong. I got this working by using the stream patch, but got another example on which it is not correct, see below. Any way around this?

> type utf2.txt
aαbβcγdδ

> cat utf2.py
import streams
import codecs
streams.enable()
f = codecs.open('utf2.txt', encoding='utf-8', mode='r')
print(f.read(1))
print(f.read(1))
print(f.read(2))
print(f.read(4))

> python utf2.py
a
α
bβc
γdδ
msg228208 - (view) Author: Adam Bartoš (Drekin) * Date: 2014-10-02 10:39
stijn: You are mixing two issues here. One is reading text from a file. There is no problem with it. You just call open(path, encoding=the_encoding_of_the_file). Since the encoding of the file depends on the file, you should provide the information about it.

Another issue is interactively entering and displaying Unicode characters in the Python REPL in the Windows console. That's what this issue is about. The streams code you use is outdated; for a recent version see https://pypi.python.org/pypi/win_unicode_console and https://github.com/Drekin/win-unicode-console. It's an installable package which tries to solve the issue. The readme also contains a summary of the issue. Try the package and let me know if there is any problem.
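For the file-reading half, a self-contained sketch of the difference an explicit encoding makes. It uses a temporary file rather than the utf.txt from the report, and cp1252 is only an example of a wrong codec (which default codec open() picks depends on the machine).

```python
import locale
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "utf.txt")
with open(path, "w", encoding="utf-8", newline="\n") as f:
    f.write("Привет\n")

# Which codec does open() pick when none is given? It is recorded on the file object.
with open(path) as f:
    default_codec = f.encoding
preferred = locale.getpreferredencoding(False)

# Decoding with the file's real encoding gives 6 letters plus the newline.
with open(path, encoding="utf-8") as f:
    good = f.readline()

# Decoding the same bytes with a single-byte codec gives one character per byte.
with open(path, encoding="cp1252") as f:
    bad = f.readline()

print(default_codec, preferred, len(good), len(bad))
```

The lengths 7 and 13 correspond to the two numbers printed in the report above.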
msg228210 - (view) Author: stijn (stijn) Date: 2014-10-02 10:58
Drekin: you're right for both input and output. Using encoding with plain open() works just fine and using the latest win-unicode-console does give correct output for the second example as well. Thanks!
msg233347 - (view) Author: Glenn Linderman (v+python) * Date: 2015-01-03 04:26
Just to note that another side effect of this bug is that stepping through code where the source contains non-ASCII characters results in pdb producing an error when trying to print the source lines. This makes stepping through such source code impossible.

I mention it, because it hasn't been mentioned before, and debuggers are mysterious and low-level enough, that solutions that might work for normal code, may not solve working with the debugger...
msg233350 - (view) Author: Adam Bartoš (Drekin) * Date: 2015-01-03 10:27
I tried the following code:

import pdb
pdb.set_trace()
print(1 + 2)
print("αβγ∫")

When run in vanilla Python it indeed ends with UnicodeEncodeError as soon as it hits the line with non-ASCII characters. However, the solution via win_unicode_console package seems to work correctly. There is just an issue when you keep calling 'next' even after the main program ended. It ends with a RuntimeError after a few iterations. I didn't know that pdb can continue debugging after the main program has ended.
msg233916 - (view) Author: Dainis Jonitis (Jonitis) Date: 2015-01-13 09:25
Drekin's module at https://github.com/Drekin/win-unicode-console is great, but there is a small issue with it when running within the debugger in Visual Studio (Python Tools for Visual Studio 2.1 installed). The debugger already wraps stdout and stderr inside the visualstudio_py_debugger._DebuggerOutput wrapper, and it does not have the fileno() method which win-unicode-console stream.py check_stream() expects. I've created a potential fix for it at https://github.com/Drekin/win-unicode-console/pull/4/commits that checks whether the object has old_out and uses it to get to fileno. There might be much more robust ways to check for wrappers. I just wanted to make you aware, in case this code is used as the basis for Python 3.5.
msg233937 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2015-01-13 14:06
It sounds like the script should handle the case where someone has already changed stdout better. We wrap the streams in PTVS so we can forward the output into the IDE where Unicode will display properly anyway.

Our wrapper missing fileno is a bug on our side, but finding the original one will break output forwarding.
msg234019 - (view) Author: Adam Bartoš (Drekin) * Date: 2015-01-14 11:08
Note that win-unicode-console replaces the stdio streams rather than wrapping them. So the desired state would be Unicode stream objects wrapped by PTVS. There would be no problem if the win-unicode-console stream replacement occurred before PTVS wraps them, which should be the case when Unicode streams for Windows are handled by Python 3.5 itself. Is there any way to run custom Python code (like sitecustomize) before PTVS wraps the stdio streams?
msg234020 - (view) Author: Dainis Jonitis (Jonitis) Date: 2015-01-14 11:47
Presumably Unicode streams would also fix file redirects. Currently, if you want to redirect stdout output to file it throws. For example PowerShell:
 C:\Python34\python.exe .\test.py | out-file -Encoding utf8 -FilePath 'test.txt'
msg234096 - (view) Author: Adam Bartoš (Drekin) * Date: 2015-01-15 20:35
File redirection has nothing to do with win-unicode-console and this issue. When stdout is redirected, it is not a tty so win-unicode-console doesn't replace the stream object, which is the right thing to do. You got UnicodeEncodeError because Python creates sys.stdout with encoding based on your locale. In my case it is cp1250 which cannot encode whole Unicode. You can control the encoding used by setting PYTHONIOENCODING environment variable. For example, if you have a script producer.py, which prints some Unicode characters, and consumer.py, which just does print(input()), then "py producer.py | py consumer.py" shows that redirection works (when PYTHONIOENCODING is set e.g. to utf-8).
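A sketch of the redirected case, run from Python instead of a shell pipeline. The inline `producer` one-liner is an assumption standing in for producer.py; the stdout of the child process is a pipe, so its encoding is controlled by PYTHONIOENCODING.

```python
import os
import subprocess
import sys

# Child program: prints non-ASCII text (written with escapes so the command
# line itself stays pure ASCII).
producer = r"print('\u03b1\u03b2\u03b3 \u20ac')"

# Ask the child to encode its standard streams as UTF-8.
env = dict(os.environ, PYTHONIOENCODING="utf-8")
result = subprocess.run([sys.executable, "-c", producer],
                        stdout=subprocess.PIPE, env=env, check=True)

# The parent decodes the captured bytes with the same encoding.
decoded = result.stdout.decode("utf-8").strip()
print(decoded)
```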
msg234371 - (view) Author: Mark Hammond (mhammond) * (Python committer) Date: 2015-01-20 11:38
> File redirection has nothing to do with win-unicode-console

Thank you, that comment is spot on - there are multiple issues being conflated here. This bug is purely about the tty/console behaviour.
msg242884 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2015-05-11 08:07
It sounds like fixing this properly requires fixing issue 17620 first (so the interactive interpreter actually uses sys.stdin), so I've flagged that as a dependency.
msg254405 - (view) Author: (dead1ne) Date: 2015-11-09 20:56
I've tried addressing the output problem by subclassing TextIOWrapper to use the windows functions GetConsoleOutputCP and WideCharToMultiByte.

I've tested this as well as I can without figuring out how to install a better font for the windows console. It appears to work on both python 3.4 and 2.7 although there may be an issue with 2.7 and CJK Extension B and higher codepoints.

Hopefully this is useful in finally resolving the issue. Also I think some maintenance patch for 2.7 is in order as currently it fails utterly if you set the console to 65001 since it doesn't recognize it. Had to wrap all print statements in try/except so it wouldn't fail before testing the wrapper.
msg254407 - (view) Author: Adam Bartoš (Drekin) * Date: 2015-11-09 21:11
dead1ne: Hello, I'm maintaining a package that tries to solve this issue: https://github.com/Drekin/win-unicode-console . There are actually many related problems.
msg272596 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2016-08-13 17:17
I'm now actively working on this for 3.6.

I've attached my first pass at implementing an alternative raw IO stream that uses the *ConsoleW APIs instead of the CRT. It works fine for basic print() and input() (including handling redirection "properly", which is a separate issue to change the default encoding there, and not issue17620 yet).

I expect there to be many *many* compatibility issues with this change, so we really need everyone interested to try it out and see what doesn't work. So far I haven't even tried looking at readline hooks or similar (though maybe all those issues fall under issue17620?).

Any *specific, technical* information about compatibility issues would be appreciated (i.e. enough that I can fix the issue without having to completely reproduce your setup - I'll be working on doing those myself anyway, so simply saying "X is broken" isn't helpful yet).

It doesn't look like this will be available in 3.6.0a4, but I think I should be able to land it by the first beta.
msg272605 - (view) Author: Adam Bartoš (Drekin) * Date: 2016-08-13 18:40
Hello Steve, that's great you are working on this!

I've run through your patch and I have the following remarks:

• Since wide chars have two bytes, there may be a problem when someone wants to read or write an odd number of bytes. If the number is > 1, it's OK since the code may read or write fewer bytes, but when the number is exactly 1, the code should maybe raise some exception.

• WriteConsoleW always fails with ERROR_NOT_ENOUGH_MEMORY (8) if we try to write more than a certain number of bytes. For me, the number is something like 41000. Unfortunately, it depends on the actual heap usage of the console process. I do len = min(len, 32767) in write. The value chosen comes from issue11395 .

• If someone types something like ^Zfoo, the standard sys.stdin returns '' -- it ignores everything after EOF if it is the first byte read. I reproduce the behaviour in win_unicode_console to be compatible.

• There may be some issue when someone hits Ctrl-C on input. It seems that in that case, ReadConsoleW fails with ERROR_OPERATION_ABORTED (995) and some signal is asynchronously fired. It may happen that the corresponding KeyboardInterrupt exception occurs later than it should. In my Python/ctypes situation I do an ugly hack: I detect ERROR_OPERATION_ABORTED and in that case I sleep for 0.1 seconds to wait for the exception. I understand that the situation may be different in C.
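The ^Z remark above can be written as a tiny post-processing step on a line that has already been read. `apply_ctrl_z` is a hypothetical helper, not the win_unicode_console code, and leaving a Ctrl-Z that appears later in the line untouched is an assumption of this sketch.

```python
CTRL_Z = "\x1a"  # the character produced by Ctrl-Z


def apply_ctrl_z(line):
    """Return "" (EOF) when the line starts with Ctrl-Z, else the line unchanged."""
    return "" if line.startswith(CTRL_Z) else line


print(repr(apply_ctrl_z("\x1afoo\r\n")))
print(repr(apply_ctrl_z("foo\r\n")))
```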
msg272645 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2016-08-14 05:26
For compatibility, I think it may be good to add custom implementations of the buffer attribute and detach() method to stdin/out. They should be able to at least read and write ASCII bytes. It might be easiest to keep them as the current BufferedReader/Writer objects. Probably also make stdin/out.fileno() defer to the buffer attribute.

With the current patch that only allows reading and writing in UTF-16 pairs, I foresee a few problems:

* I assume stdin.buffer.raw.readline() will try to read one byte at a time, and will therefore always indicate EOF.
* Incompatibility with using stdin/out.buffer for ASCII character input and output. I suggest testing the patch with “python -m base64”, a use case mentioned earlier in this thread.
msg272662 - (view) Author: Adam Bartoš (Drekin) * Date: 2016-08-14 10:31
There is also the following consequence of (not) having the standard filenos: input() either considers the streams interactive or not. To consider them interactive, standard filenos and isatty are needed on sys.stdin and sys.stdout.

If the streams are considered interactive, input() goes via readlinehook machinery, otherwise it just writes and reads an ordinary file.

The latter means we don't have to touch readline machinery now, the downside is that custom rlcompleters like pyreadline won't work on input().
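A rough Python approximation of the condition described above. It is not CPython's actual implementation (that check lives in C); `looks_interactive` is a hypothetical name, and the descriptor numbers 0 and 1 are assumed to be the standard ones.

```python
import io


def looks_interactive(stdin, stdout):
    """Approximate the 'standard filenos and isatty' condition for input()."""
    try:
        return (stdin.fileno() == 0 and stdout.fileno() == 1
                and stdin.isatty() and stdout.isatty())
    except (AttributeError, OSError, ValueError):
        # Streams without a real descriptor (e.g. StringIO) are not interactive.
        return False


# A replacement stream with no file descriptor takes the plain-file path.
print(looks_interactive(io.StringIO(), io.StringIO()))
```

This is why replacement stream objects need to report the standard descriptors if input() is still expected to go through the readline hook.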
msg272675 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2016-08-14 15:25
The current patch actually only affects the raw IO, so the concern would be one of the wrappers trying to work in bytes when it should be dealing in characters. This should be no different from reading a UTF16 file, so either both work or both are broken.

The readline API is most annoying because it assumes strlen is valid for any encoded text (and at so many places it's near unfixable), but there's another issue for this part.

Also, I don't have answers for most of the questions in the review on the patch because I copied all of those bits from fileio.c. Can certainly clean parts of them up for the console API, but I count compatibility with the FileIO class a useful goal where possible.
msg272716 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2016-08-15 04:46
I'm fairly happy with where my current patch is at (not posted right now - too many different machines involved) and only one test is failing - test_cgi.

The problem seems to be that urllib.parse.unquote() takes an encoding parameter to decode utf-8 encoded bytes with, and cgi infers this parameter from sys.stdin. I don't have the slightest idea why unquote/unquote_to_bytes unconditionally encodes with utf-8 and then allows decoding with an arbitrary encoding, but I guess it works okay for ASCII-compatible encodings?

Unfortunately, utf-16-le is not ASCII compatible, and so this doesn't work. I'm not familiar enough with cgi or urllib.parse to know what to fix - any suggestions?
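
For illustration, a small sketch of why this breaks, using only urllib.parse.unquote() and str.encode(). The sample string '%41%42' is made up and is not taken from test_cgi:

```python
from urllib.parse import unquote

# An ASCII-compatible codec maps 'A' to the single byte 0x41;
# utf-16-le maps it to two bytes.
print('A'.encode('latin-1'))
print('A'.encode('utf-16-le'))

quoted = '%41%42'  # percent-encoded bytes 0x41 0x42, i.e. b'AB'

# The percent-decoded bytes are then decoded with the given encoding.
as_latin1 = unquote(quoted, encoding='latin-1')
as_utf16 = unquote(quoted, encoding='utf-16-le')

print(repr(as_latin1))
print(repr(as_utf16))
```

With latin-1 the two bytes come back as 'AB'; with utf-16-le they are read as a single 16-bit code unit, which is the mojibake the test trips over.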
msg272718 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2016-08-15 04:52
For more info here, cgi.parse has code like this:

def parse(fp, ...):
    if fp is None:
        fp = sys.stdin

    encoding = getattr(fp, 'encoding', 'latin-1')

    # later on...

    return urllib.parse.parse_qs(a_str, encoding=encoding, ...)

As an easy hack, I added this after assigning encoding:

    if len(' '.encode(encoding, errors='replace')) > 1:
        encoding = 'latin-1'

I have no idea if this is a good idea or not. The current behaviour of mojibake in the parsed result is certainly worse, since the choice of utf-16-le is entirely contained within the parse() function.
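
The same hack written as a standalone function, purely as a sketch. The name pick_query_encoding and the fallback parameter are hypothetical and do not exist in cgi:

```python
def pick_query_encoding(stream_encoding, fallback='latin-1'):
    """Return stream_encoding unless it needs more than one byte per space.

    A codec that encodes ' ' to more than one byte (utf-16-le, utf-32, ...)
    cannot be ASCII-compatible, so the fallback is used instead.
    """
    if len(' '.encode(stream_encoding, errors='replace')) > 1:
        return fallback
    return stream_encoding


for name in ('utf-8', 'cp1252', 'utf-16-le'):
    print(name, '->', pick_query_encoding(name))
```

Note that this only detects codecs that use several bytes per ASCII character. A single-byte codec that is not ASCII-compatible (cp037, for example) would still pass the check.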
msg272720 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2016-08-15 06:32
I think this CGI thing is a separate bug, just exacerbated by the stdin.encoding problem. :) The urllib.parse.parse_qs() function takes an encoding parameter to figure out what to do with percent-encoded values: "%A9" → b"\xA9".decode(...). This is different from the lower-level encoding of the query string itself: b"%A9".decode("ascii").

Maybe the best solution is just to remove the encoding argument, and let it revert to UTF-8, as it did before r87998. Or maybe it really should use the locale encoding. (Is that ASCII-compatible on Windows?) It really depends on where the query string was generated (in a browser, pre-computed URL, etc).
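
A short sketch of how much the result depends on that parameter, using a made-up query string 'name=%A9'. parse_qs() uses errors='replace' by default, so an undecodable byte becomes U+FFFD rather than raising:

```python
from urllib.parse import parse_qs

query = 'name=%A9'  # the value is the single percent-encoded byte 0xA9

# Interpreted as Latin-1, byte 0xA9 is the copyright sign.
latin1 = parse_qs(query, encoding='latin-1')

# Interpreted as UTF-8, a lone 0xA9 is invalid and gets replaced.
utf8 = parse_qs(query, encoding='utf-8')

print(latin1)
print(utf8)
```

Neither answer is "right" in general, which is why the correct choice depends on who generated the query string.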
msg273999 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2016-08-31 04:28
New patch attached (1602_2.patch - hopefully the review will work this time too).

I discovered while researching for the PEP that a decent amount of code expects to be able to write ASCII to sys.stdout.buffer (or sys.stdout.buffer.raw). As my first patch required utf-16-le at this point, it was going to cause havoc.

Rather than break that compatibility, I decided that exposing utf-8 and doing the reencoding at the latest possible stage was better. This is also more consistent with how other encoding issues are likely to be resolved, and shouldn't be any less performant, given that previously we were decoding to utf-16 anyway.

The downsides of this are that read(n) can now only read up to n/4 characters, and write(n) has a much more complicated time dealing with large buffers (as we need to cap the number of utf-16-le bytes but return the number of utf-8 bytes - it's not a direct relationship, so there's more work and a little bit of guessing in some cases).

On the upside, the readline handling is simpler as utf-8 is compatible with the existing interface and now sys.stdin.encoding is accurate. I've rolled that fix into this patch (just the myreadline.c change) as they really ought to go in together.
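
To illustrate the size mismatch mentioned above, here is a pure-Python sketch; it is not the C code from the patch. A code point takes up to 4 bytes in UTF-8, which is presumably where the n/4 bound comes from, and the number of UTF-8 bytes that fit under a cap on UTF-16 code units depends on the actual characters. The helper name utf8_bytes_within_utf16_cap is hypothetical, and it assumes the input is complete, valid UTF-8:

```python
def utf8_bytes_within_utf16_cap(data, max_units):
    """Return how many leading bytes of data fit in max_units UTF-16 units."""
    consumed = 0
    units = 0
    for ch in data.decode('utf-8'):
        # Characters outside the BMP need a surrogate pair (2 code units).
        need = 2 if ord(ch) > 0xFFFF else 1
        if units + need > max_units:
            break
        units += need
        consumed += len(ch.encode('utf-8'))
    return consumed


sample = 'a\u20ac\U0001F600'   # 1-, 3- and 4-byte UTF-8 sequences
data = sample.encode('utf-8')  # 8 bytes in total

for cap in (2, 3, 4):
    print(cap, utf8_bytes_within_utf16_cap(data, cap))
```

With this sample, a cap of 2 or 3 code units covers 4 UTF-8 bytes and a cap of 4 covers all 8, so the byte count returned to the caller cannot be derived from the cap by a fixed ratio.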
msg274449 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2016-09-05 22:24
Updated patch. This implements everything we've been discussing on python-dev.
msg274673 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2016-09-06 23:50
Latest patch is attached.

PEP acceptance is sounding likely, so feel free to critically review.
msg274884 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2016-09-07 20:49
Updated patch based on some suggestions from Eryk.

The PEP has been accepted, so now I just need to land it in the next two days.

Currently "normal" usage here is fine, and some edge cases match the Python 3.5 behaviour. I'm going to go through now and bulk out the tests to try and catch more problems, but modulo that I hope the implementation is nearly ready.
msg274906 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2016-09-07 23:08
I can't actually come up with many useful tests for this... so far I can validate that we can open the console IO object from 0, 1, 2, "CON", "CONIN$" and "CONOUT$", get fileno(), check readable()/writable() and close (multiple times without crashing).

Anything else requires a real console with a real person with a real keyboard.

But I fixed a couple of issues in fd handling as a result of the tests, so it's not a complete waste.
msg274912 - (view) Author: Berker Peksag (berker.peksag) * (Python committer) Date: 2016-09-07 23:35
I left some minor comments for Doc/whatsnew/3.6.rst on Rietveld.

In Lib/test/test_winconsoleio.py:

* self.assert_() (deprecated) can be replaced by self.assertTrue()

* We can add

  if __name__ == '__main__':
      unittest.main()
msg274939 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2016-09-08 01:03
Thanks! I've made the changes you suggested.
msg275003 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2016-09-08 12:10
+++ b/Lib/test/test_winconsoleio.py
+to real people with real keyborads.
Should be keyboards
There are still assert_() calls in this file (1602_6.patch). Did you miss them?

+++ b/Lib/io.py
+from _io import WindowsConsoleIO
+__all__.append('WindowsConsoleIO')
I think you should either document this class, or remove it from __all__ to clarify it is just an implementation detail.

+++ b/Modules/_io/winconsoleio.c
+_io_WindowsConsoleIO___init___impl
+    PyObject *decodedname = Py_None;
+    Py_INCREF(decodedname);
+    int d = PyUnicode_FSDecoder(nameobj, (void*)&decodedname);
Won’t this leak a reference to Py_None?
(Also, I think needless casting like in the last line can mask mistakes that the compiler would otherwise pick up. Imagine if you got the parameters around the wrong way.)

+read_console_w(HANDLE handle, DWORD maxlen, DWORD *readlen) {
+    /* If we didn't read a full buffer that time, don't try
+       again or we will block a second time. */
I’m not familiar with the Windows APIs involved, but this doesn’t seem robust. What if there were exactly one full buffer waiting, would the next call block without returning anything?
msg275004 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2016-09-08 12:15
Ah sorry I see Berker’s assert_() comment was _after_ you posted 1602_6.patch, so ignore that bit :)
msg275005 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2016-09-08 12:18
Also as I understand it, the open() function can return this new class, so the documentation at <https://docs.python.org/3.6/library/functions.html#open> needs updating.
msg275157 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2016-09-08 21:15
New changeset 6142d2d3c471 by Steve Dower in branch 'default':
Issue #1602: Windows console doesn't input or print Unicode (PEP 528)
https://hg.python.org/cpython/rev/6142d2d3c471
msg275362 - (view) Author: Eryk Sun (eryksun) * (Python triager) Date: 2016-09-09 18:00
Martin, the console should be in line-input mode, in which case ReadConsole will block if there isn't at least one line in the input buffer. It reads up to the lesser of a complete line or the number of UTF-16 codes requested. If the previous call read the entire request size but didn't stop on '\n', then we know the next call shouldn't block because the input buffer has at least one '\n' in it.

> I can validate that we can open the console IO object from 
> 0, 1, 2, "CON", "CONIN$" and "CONOUT$", get fileno(), check
> readable()/writable() and close (multiple times without 
> crashing).

I like the idea of having fileno() lazily get a file descriptor on demand, but _open_osfhandle is a low-level I/O function that uses _open flags -- not 'rb' (int 0x7262) or 'wb' (int 0x7762). ;-)

You can use _O_RDONLY | _O_BINARY or _O_WRONLY | _O_BINARY. But really those values would be ignored anyway. It's not actually opening the file, so it only cares about a few flags. Specifically, in lowio\osfinfo.cpp I see that it looks for _O_APPEND, _O_TEXT, and _O_NOINHERIT. 

On line 329, the following assignment

        if (self->writable)
            access |= GENERIC_WRITE;

should be `access = GENERIC_WRITE`. Requesting both read and write access is an invalid parameter when opening "CON", as can be seen here:

    >>> f = open('CON', 'wb', buffering=0)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    OSError: [WinError 87] The parameter is incorrect: 'CON'

CONOUT$ works, of course:

    >>> f = open('CONOUT$', 'wb', buffering=0)
    >>> f
    <_io._WindowsConsoleIO mode='wb' closefd=True>

Lastly, for a readall that starts with ^Z, you're still breaking out of the loop before incrementing len, which is thus 0 when subsequently checked. It ends up calling WideCharToMultiByte with len == 0, which fails.

    >>> sys.stdin.buffer.raw.read()
    ^Z
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    OSError: [WinError 87] The parameter is incorrect

> I can't actually come up with many useful tests for this... 

ctypes can be used to write to the input buffer and read from a screen buffer. For the latter it helps to first create and activate a scratch screen buffer, initialized to NULs to make it easy to read back everything that was written up to the current cursor position. I have existing ctypes code for this, written to solve the problem of a subprocess that stubbornly writes directly to the console instead of writing to stdout/stderr pipes.
msg275510 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2016-09-10 00:30
Okay so regarding blocking reads with a full buffer, what you are saying is the second check to break the read loop should be sufficient:

+/* If the buffer ended with a newline, break out */
+if (buf[*readlen - 1] == '\n')
+    break;
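
A small Python model of that loop, only to check the reasoning. FakeLineConsole and read_line_chunks are hypothetical names; the fake console imitates line-input mode as described above (a read returns the lesser of one complete line or the requested size, and would block if no complete line is buffered), and '\n' alone stands in for the line terminator:

```python
class FakeLineConsole:
    """Toy model of a console input buffer in line-input mode."""

    def __init__(self, pending):
        self.pending = pending
        self.calls = 0

    def read(self, n):
        self.calls += 1
        end = self.pending.find('\n')
        if end < 0:
            # The real call would block here waiting for the user.
            raise RuntimeError('would block: no complete line buffered')
        take = min(n, end + 1)
        chunk, self.pending = self.pending[:take], self.pending[take:]
        return chunk


def read_line_chunks(read_chunk, chunk_size):
    """Read one line in chunks without issuing a read that would block."""
    parts = []
    while True:
        chunk = read_chunk(chunk_size)
        parts.append(chunk)
        if len(chunk) < chunk_size:  # short read: nothing more to fetch
            break
        if chunk.endswith('\n'):     # full buffer that ended the line
            break
    return ''.join(parts)


exact = FakeLineConsole('abc\n')     # exactly one full 4-character buffer
longer = FakeLineConsole('abcdef\n')
print(repr(read_line_chunks(exact.read, 4)), exact.calls)
print(repr(read_line_chunks(longer.read, 4)), longer.calls)
```

In the "exactly one full buffer" case the newline check stops the loop after a single read, so the second, blocking read never happens.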
msg277047 - (view) Author: Dāvis (davispuh) * Date: 2016-09-20 17:11
Steve Dower (steve.dower)
> [...]
> Anything else requires a real console with a real person with a real keyboard.

FYI, not really, it is possible to fully automatically test console's output/input using WinAPI functions like WriteConsoleInput, GetConsoleScreenBufferInfo, ReadConsoleOutputCharacter

I wrote such a test very recently; you can look at it as an example: http://review.source.kitware.com/gitweb?p=KWSys.git;a=blob;f=testConsoleBuf.cxx;hb=HEAD

It tests all three cases: when the output is an actual console, a redirected pipe, and a file.
msg277048 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2016-09-20 17:41
Oh nice, I like that. We should definitely add some tests using that (though it seems like quite a big task... maybe I'll open a new issue for it).
msg277050 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2016-09-20 17:45
Created issue28217 for adding these tests.
History
Date User Action Args
2022-04-11 14:56:28adminsetgithub: 45943
2017-03-26 06:19:27martin.panterlinkissue23901 superseder
2017-03-26 06:06:32martin.panterlinkissue29907 superseder
2016-10-22 10:46:13THRlWiTisetnosy: + THRlWiTi
2016-09-20 17:45:06steve.dowersetsuperseder: Add interactive console tests
messages: + msg277050
2016-09-20 17:41:29steve.dowersetmessages: + msg277048
2016-09-20 17:11:07davispuhsetnosy: + davispuh
messages: + msg277047
2016-09-10 00:30:01martin.pantersetmessages: + msg275510
2016-09-09 18:00:28eryksunsetnosy: + eryksun
messages: + msg275362
2016-09-09 16:42:53steve.dowersetstatus: open -> closed
dependencies: - Python interactive console doesn't use sys.stdin for input
resolution: fixed
stage: patch review -> resolved
2016-09-09 08:56:06ssbarneasetnosy: - ssbarnea
2016-09-08 21:15:29python-devsetnosy: + python-dev
messages: + msg275157
2016-09-08 12:18:04martin.pantersetmessages: + msg275005
2016-09-08 12:15:14martin.pantersetmessages: + msg275004
2016-09-08 12:10:06martin.pantersetmessages: + msg275003
2016-09-08 01:03:35steve.dowersetmessages: + msg274939
2016-09-07 23:35:05berker.peksagsetnosy: + berker.peksag
messages: + msg274912
2016-09-07 23:08:19steve.dowersetfiles: + 1602_6.patch

messages: + msg274906
2016-09-07 20:49:15steve.dowersetfiles: + 1602_5.patch

messages: + msg274884
2016-09-06 23:50:32steve.dowersetfiles: + 1602_4.patch

messages: + msg274673
2016-09-05 22:24:15steve.dowersetfiles: + 1602_3.patch

messages: + msg274449
2016-08-31 04:28:35steve.dowersetfiles: + 1602_2.patch

messages: + msg273999
2016-08-29 21:25:44gurnecsetnosy: + gurnec
2016-08-15 10:34:50christian.heimessetnosy: - christian.heimes
2016-08-15 10:34:04BreamoreBoysetnosy: - BreamoreBoy
2016-08-15 06:32:00martin.pantersetmessages: + msg272720
2016-08-15 04:52:28steve.dowersetmessages: + msg272718
2016-08-15 04:46:35steve.dowersetmessages: + msg272716
2016-08-14 15:25:07steve.dowersetmessages: + msg272675
2016-08-14 10:31:16Drekinsetmessages: + msg272662
2016-08-14 05:26:53martin.pantersetnosy: + martin.panter

messages: + msg272645
stage: needs patch -> patch review
2016-08-13 18:40:42Drekinsetmessages: + msg272605
2016-08-13 17:17:41steve.dowersetfiles: + winconsoleio.diff

nosy: + ned.deily
versions: + Python 3.6, - Python 3.5
messages: + msg272596

assignee: steve.dower
2015-11-09 21:11:32Drekinsetmessages: + msg254407
2015-11-09 20:56:51dead1nesetfiles: + wincontest.py
nosy: + dead1ne
messages: + msg254405

2015-05-19 00:48:10escapewindowsetnosy: + escapewindow
2015-05-12 05:09:39ncoghlanlinkissue22555 dependencies
2015-05-11 09:59:40paul.mooresetnosy: + paul.moore
2015-05-11 08:07:25ncoghlansetdependencies: + Python interactive console doesn't use sys.stdin for input
messages: + msg242884
2015-04-16 10:05:01lilydjwgsetnosy: + lilydjwg
2015-04-10 08:24:13piotr.dobrogostsetnosy: + piotr.dobrogost
2015-02-13 21:42:39terry.reedylinkissue23424 superseder
2015-01-20 11:38:05mhammondsetmessages: + msg234371
2015-01-15 20:35:41Drekinsetmessages: + msg234096
2015-01-14 11:47:01Jonitissetmessages: + msg234020
2015-01-14 11:08:13Drekinsetmessages: + msg234019
2015-01-13 14:06:06steve.dowersetmessages: + msg233937
2015-01-13 09:25:15Jonitissetnosy: + Jonitis
messages: + msg233916
2015-01-03 10:27:07Drekinsetmessages: + msg233350
2015-01-03 04:26:39v+pythonsetmessages: + msg233347
2014-10-02 10:58:29stijnsetmessages: + msg228210
2014-10-02 10:39:37Drekinsetmessages: + msg228208
2014-10-02 08:50:55stijnsetnosy: + stijn
messages: + msg228191
2014-09-24 14:14:54ncoghlansetmessages: + msg227450
2014-09-24 11:31:17wiz21setmessages: + msg227441
2014-09-23 19:04:01Drekinsetmessages: + msg227374
2014-09-23 18:47:58terry.reedysetmessages: + msg227373
2014-09-23 12:54:40mhammondsetmessages: + msg227354
2014-09-23 11:26:14wiz21setmessages: + msg227347
2014-09-23 10:21:44ncoghlansetmessages: + msg227338
2014-09-23 10:11:59Drekinsetmessages: + msg227337
2014-09-23 09:35:53Drekinsetmessages: + msg227333
2014-09-23 09:22:11ncoghlansetmessages: + msg227332
2014-09-23 09:07:02wiz21setmessages: + msg227330
2014-09-23 08:52:46wiz21setnosy: + wiz21
messages: + msg227329
2014-08-04 06:43:50Drekinsetmessages: + msg224690
2014-08-03 02:20:55v+pythonsetmessages: + msg224605
2014-08-02 23:52:07BreamoreBoysetnosy: + BreamoreBoy
messages: + msg224596
2014-07-28 12:45:35vstinnersetnosy: - vstinner
2014-07-27 01:55:57ncoghlansetmessages: + msg224095
2014-07-26 20:33:55Drekinsetfiles: + streams.py

messages: + msg224086
2014-07-26 20:09:47Drekinsetfiles: - win_unicode_console.zip
2014-07-26 20:07:52Drekinsetfiles: - win_unicode_console.zip
2014-07-26 20:07:12Drekinsetfiles: - streams.py
2014-07-26 20:05:58Drekinsetfiles: - win_unicode_console_3.py
2014-07-26 20:05:03Drekinsetfiles: - i.py
2014-07-26 20:02:41Drekinsetfiles: - win_unicode_console_2.py
2014-07-26 04:30:46v+pythonsetmessages: + msg224019
2014-07-25 13:53:50ncoghlansetmessages: + msg223952
versions: + Python 3.5, - Python 3.3, Python 3.4
2014-07-25 13:52:39ncoghlansetmessages: + msg223951
2014-07-25 13:40:42vstinnersetmessages: + msg223949
2014-07-25 13:34:01ncoghlansetmessages: + msg223948
2014-07-25 13:28:43vstinnersetmessages: + msg223947
2014-07-25 13:27:48ncoghlansetmessages: + msg223946
2014-07-25 13:24:59marksetmessages: + msg223945
2014-07-20 11:56:54ncoghlansetmessages: + msg223509
2014-07-20 10:48:20Drekinsetmessages: + msg223507
2014-07-20 10:33:49Drekinsetfiles: - win_unicode_console.py
2014-07-18 09:14:01vstinnersetmessages: + msg223404
2014-07-18 09:04:06Drekinsetfiles: + win_unicode_console.zip

messages: + msg223403
2014-06-21 15:12:49steve.dowersetmessages: + msg221178
2014-06-21 12:27:21ncoghlansetpriority: normal -> high
nosy: + ncoghlan, steve.dower
messages: + msg221175

2014-06-21 12:20:46ncoghlansetpriority: low -> normal
2014-04-11 17:43:23terry.reedylinkissue21164 superseder
2013-09-15 13:26:32Drekinsetfiles: + win_unicode_console.zip

messages: + msg197773
2013-09-15 06:32:11terry.reedysetmessages: + msg197752
2013-09-15 06:20:35v+pythonsetmessages: + msg197751
2013-09-14 10:15:17Drekinsetfiles: + streams.py

messages: + msg197700
2013-03-24 13:40:48floxsetnosy: + flox
2013-03-24 13:03:32Drekinsetfiles: + win_unicode_console_3.py
2013-03-24 13:02:25Drekinsetfiles: + i.py

messages: + msg185135
versions: + Python 3.4
2012-09-22 14:27:21Drekinsetfiles: + win_unicode_console_2.py

messages: + msg170999
2012-09-21 20:51:32vstinnersetmessages: + msg170915
2012-09-21 16:20:08Drekinsetfiles: + win_unicode_console.py
nosy: + Drekin
messages: + msg170899

2012-07-03 19:54:34v+pythonsetmessages: + msg164619
2012-07-03 19:44:59terry.reedysetmessages: + msg164618
2012-07-03 06:47:08v+pythonsetmessages: + msg164580
2012-07-03 05:56:18terry.reedysetmessages: + msg164578
2012-07-03 05:14:25v+pythonsetmessages: + msg164572
2012-05-26 08:58:19v+pythonsetmessages: + msg161651
2012-05-21 23:59:27brian.curtinsetnosy: - brian.curtin
2012-05-21 23:58:47v+pythonsetmessages: + msg161308
2012-05-19 20:44:27Matt.Mackallsetnosy: - Matt.Mackall
2012-05-19 19:25:21davidsarahsetmessages: + msg161153
2012-05-19 18:55:40giampaolo.rodolasetmessages: + msg161151
2012-05-19 18:24:54giampaolo.rodolasetnosy: + giampaolo.rodola
2012-05-16 17:54:02terry.reedysetmessages: + msg160897
2012-05-16 08:58:12v+pythonsetmessages: + msg160813
2012-05-16 08:57:27v+pythonsetmessages: + msg160812
2012-04-05 11:47:07vstinnersetmessages: + msg157569
2012-03-11 18:18:17loewislinkissue14253 superseder
2012-01-21 05:59:05akirasetnosy: + akira
2011-12-07 21:18:22Matt.Mackallsetnosy: + Matt.Mackall
messages: + msg148990
2011-10-26 23:50:17vstinnersetmessages: + msg146471
2011-10-19 20:58:45vstinnersetfiles: + test_win_console.py

messages: + msg145964
2011-10-19 20:57:38vstinnersetfiles: + win_console.patch

messages: + msg145963
2011-10-19 20:42:02pitrousetnosy: + mhammond, brian.curtin
2011-10-19 11:55:13vstinnersetfiles: + unicode3.py

messages: + msg145899
2011-10-19 11:52:57vstinnersetmessages: + msg145898
2011-04-10 20:33:45smerlinsetnosy: + smerlin
2011-03-26 19:28:00v+pythonsetmessages: + msg132268
2011-03-26 19:22:48davidsarahsetmessages: + msg132266
2011-03-26 01:45:12v+pythonsetmessages: + msg132208
2011-03-26 00:28:45brian.curtinsetnosy: - brian.curtin
2011-03-26 00:18:52davidsarahsetmessages: + msg132191
2011-03-25 23:37:05davidsarahsetmessages: + msg132184
2011-03-25 02:12:29vstinnersetmessages: + msg132067
2011-03-25 01:30:03v+pythonsetmessages: + msg132065
2011-03-25 01:21:22davidsarahsetmessages: + msg132064
2011-03-25 00:59:19v+pythonsetmessages: + msg132062
2011-03-25 00:54:35davidsarahsetmessages: + msg132061
2011-03-25 00:39:48davidsarahsetmessages: + msg132060
2011-03-23 04:54:35davidsarahsetnosy: lemburg, terry.reedy, tzot, amaury.forgeotdarc, pitrou, vstinner, christian.heimes, tim.golden, mark, christoph, ezio.melotti, v+python, hippietrail, ssbarnea, brian.curtin, davidsarah, santoso.wijaya, David.Sankel
messages: + msg131854
2011-03-21 14:25:19vstinnersetnosy: lemburg, terry.reedy, tzot, amaury.forgeotdarc, pitrou, vstinner, christian.heimes, tim.golden, mark, christoph, ezio.melotti, v+python, hippietrail, ssbarnea, brian.curtin, davidsarah, santoso.wijaya, David.Sankel
messages: + msg131657
2011-03-04 17:47:18santoso.wijayasetnosy: + santoso.wijaya
2011-02-11 16:06:19hippietrailsetnosy: + hippietrail
2011-02-03 02:47:04davidsarahsetnosy: lemburg, terry.reedy, tzot, amaury.forgeotdarc, pitrou, vstinner, christian.heimes, tim.golden, mark, christoph, ezio.melotti, v+python, ssbarnea, brian.curtin, davidsarah, David.Sankel
messages: + msg127782
2011-01-15 08:51:05ssbarneasetnosy: lemburg, terry.reedy, tzot, amaury.forgeotdarc, pitrou, vstinner, christian.heimes, tim.golden, mark, christoph, ezio.melotti, v+python, ssbarnea, brian.curtin, davidsarah, David.Sankel
messages: + msg126319
2011-01-15 01:46:39v+pythonsetnosy: lemburg, terry.reedy, tzot, amaury.forgeotdarc, pitrou, vstinner, christian.heimes, tim.golden, mark, christoph, ezio.melotti, v+python, ssbarnea, brian.curtin, davidsarah, David.Sankel
messages: + msg126308
2011-01-14 23:38:10vstinnersetnosy: lemburg, terry.reedy, tzot, amaury.forgeotdarc, pitrou, vstinner, christian.heimes, tim.golden, mark, christoph, ezio.melotti, v+python, ssbarnea, brian.curtin, davidsarah, David.Sankel
messages: + msg126304
2011-01-14 23:31:45vstinnersetnosy: lemburg, terry.reedy, tzot, amaury.forgeotdarc, pitrou, vstinner, christian.heimes, tim.golden, mark, christoph, ezio.melotti, v+python, ssbarnea, brian.curtin, davidsarah, David.Sankel
messages: + msg126303
2011-01-14 19:06:13brian.curtinsetnosy: lemburg, terry.reedy, tzot, amaury.forgeotdarc, pitrou, vstinner, christian.heimes, tim.golden, mark, christoph, ezio.melotti, v+python, ssbarnea, brian.curtin, davidsarah, David.Sankel
messages: + msg126288
2011-01-14 19:00:02terry.reedysetnosy: + terry.reedy
messages: + msg126286
2011-01-12 05:32:09davidsarahsetfiles: + doc-patch.diff
nosy: lemburg, tzot, amaury.forgeotdarc, pitrou, vstinner, christian.heimes, tim.golden, mark, christoph, ezio.melotti, v+python, ssbarnea, brian.curtin, davidsarah, David.Sankel
title: windows console doesn't print utf8 (Py30a2) -> windows console doesn't print or input Unicode
2011-01-10 23:38:43davidsarahsetnosy: lemburg, tzot, amaury.forgeotdarc, pitrou, vstinner, christian.heimes, tim.golden, mark, christoph, ezio.melotti, v+python, ssbarnea, brian.curtin, davidsarah, David.Sankel
messages: + msg125956
2011-01-10 22:47:09amaury.forgeotdarcsetnosy: lemburg, tzot, amaury.forgeotdarc, pitrou, vstinner, christian.heimes, tim.golden, mark, christoph, ezio.melotti, v+python, ssbarnea, brian.curtin, davidsarah, David.Sankel
messages: + msg125947
2011-01-10 22:21:59davidsarahsetnosy: lemburg, tzot, amaury.forgeotdarc, pitrou, vstinner, christian.heimes, tim.golden, mark, christoph, ezio.melotti, v+python, ssbarnea, brian.curtin, davidsarah, David.Sankel
messages: + msg125942
2011-01-10 22:15:04davidsarahsetnosy: lemburg, tzot, amaury.forgeotdarc, pitrou, vstinner, christian.heimes, tim.golden, mark, christoph, ezio.melotti, v+python, ssbarnea, brian.curtin, davidsarah, David.Sankel
messages: + msg125938
2011-01-10 12:33:17vstinnersetnosy: lemburg, tzot, amaury.forgeotdarc, pitrou, vstinner, christian.heimes, tim.golden, mark, christoph, ezio.melotti, v+python, ssbarnea, brian.curtin, davidsarah, David.Sankel
messages: + msg125899
2011-01-10 11:50:02amaury.forgeotdarcsetnosy: lemburg, tzot, amaury.forgeotdarc, pitrou, vstinner, christian.heimes, tim.golden, mark, christoph, ezio.melotti, v+python, ssbarnea, brian.curtin, davidsarah, David.Sankel
messages: + msg125898
2011-01-10 10:07:44tim.goldensetnosy: lemburg, tzot, amaury.forgeotdarc, pitrou, vstinner, christian.heimes, tim.golden, mark, christoph, ezio.melotti, v+python, ssbarnea, brian.curtin, davidsarah, David.Sankel
versions: + Python 3.3, - Python 3.1, Python 3.2
2011-01-10 10:05:11tim.goldensetstatus: closed -> open

nosy: - BreamoreBoy
messages: + msg125890

resolution: not a bug -> (no value)
2011-01-10 09:50:41davidsarahsetnosy: lemburg, tzot, amaury.forgeotdarc, pitrou, vstinner, christian.heimes, tim.golden, mark, christoph, ezio.melotti, v+python, ssbarnea, brian.curtin, davidsarah, BreamoreBoy, David.Sankel
messages: + msg125889
2011-01-10 02:27:28v+pythonsetnosy: lemburg, tzot, amaury.forgeotdarc, pitrou, vstinner, christian.heimes, tim.golden, mark, christoph, ezio.melotti, v+python, ssbarnea, brian.curtin, davidsarah, BreamoreBoy, David.Sankel
messages: + msg125877
2011-01-09 19:23:56davidsarahsetnosy: lemburg, tzot, amaury.forgeotdarc, pitrou, vstinner, christian.heimes, tim.golden, mark, christoph, ezio.melotti, v+python, ssbarnea, brian.curtin, davidsarah, BreamoreBoy, David.Sankel
messages: + msg125852
2011-01-09 09:03:08vstinnersetnosy: lemburg, tzot, amaury.forgeotdarc, pitrou, vstinner, christian.heimes, tim.golden, mark, christoph, ezio.melotti, v+python, ssbarnea, brian.curtin, davidsarah, BreamoreBoy, David.Sankel
messages: + msg125833
2011-01-09 07:28:45davidsarahsetnosy: lemburg, tzot, amaury.forgeotdarc, pitrou, vstinner, christian.heimes, tim.golden, mark, christoph, ezio.melotti, v+python, ssbarnea, brian.curtin, davidsarah, BreamoreBoy, David.Sankel
messages: + msg125826
2011-01-09 06:52:50v+pythonsetfiles: + unicode2.py
nosy: lemburg, tzot, amaury.forgeotdarc, pitrou, vstinner, christian.heimes, tim.golden, mark, christoph, ezio.melotti, v+python, ssbarnea, brian.curtin, davidsarah, BreamoreBoy, David.Sankel
messages: + msg125824
2011-01-09 05:32:00davidsarahsetnosy: + davidsarah
messages: + msg125823
2010-11-08 01:26:30vstinnersetstatus: open -> closed
resolution: not a bug
messages: + msg120700
2010-11-04 15:22:02tzotsetmessages: + msg120416
2010-11-04 15:15:02vstinnersetfiles: + sys_write_stdout.patch
keywords: + patch
messages: + msg120415
2010-11-04 15:09:59vstinnersetmessages: + msg120414
2010-11-04 03:07:38David.Sankelsetnosy: + David.Sankel
2010-09-18 15:39:35BreamoreBoysetnosy: + tim.golden, brian.curtin, BreamoreBoy
messages: + msg116801
2010-06-20 09:00:57vstinnersetmessages: + msg108228
2010-06-19 12:09:58pitrousetstage: test needed -> needs patch
versions: + Python 3.2, - Python 3.0
2010-06-19 12:05:00christophsetnosy: + christoph
messages: + msg108173
2010-01-12 16:09:36ssbarneasetnosy: + ssbarnea
2009-10-26 17:06:21v+pythonsetmessages: + msg94496
2009-10-26 09:19:55lemburgsetnosy: + lemburg
messages: + msg94483
2009-10-26 09:07:28marksetmessages: + msg94480
2009-10-25 00:06:49v+pythonsetnosy: + v+python
messages: + msg94445
2009-09-19 00:38:48tzotsetmessages: + msg92854
2009-05-19 09:46:13amaury.forgeotdarcsetmessages: + msg88077
2009-05-19 07:54:22pitrousetnosy: + amaury.forgeotdarc
2009-05-19 00:09:03tzotsetnosy: + tzot
messages: + msg88059
2009-05-03 23:57:04pitrousetnosy: + pitrou
messages: + msg87086
2009-05-03 23:51:10vstinnersetnosy: vstinner, christian.heimes, mark, ezio.melotti
components: + Windows
2009-05-03 23:50:37vstinnersetnosy: vstinner, christian.heimes, mark, ezio.melotti
components: - Windows
2009-04-27 23:38:12ajaksu2setnosy: + vstinner, ezio.melotti
versions: + Python 3.1

stage: test needed
2008-01-06 22:29:44adminsetkeywords: - py3k
versions: Python 3.0
2007-12-15 02:08:14christian.heimessetpriority: low
keywords: + py3k
messages: + msg58651
nosy: + christian.heimes
2007-12-14 11:31:28marksetmessages: + msg58621
2007-12-12 09:56:30markcreate