
Author paul.moore
Recipients paul.moore, r.david.murray, steve.dower, tim.golden, vstinner, zach.ware
Date 2015-04-09.20:44:32
Message-id <1428612272.94.0.497417303255.issue23901@psf.upfronthosting.co.za>
Content
Generally, my understanding is that the console does pretty badly at supporting Unicode via the normal WriteFile APIs and the "code page" support (mainly because support for the UTF-8 code page is rubbish). But the WriteConsole API does, I believe, have pretty solid Unicode support (it's what PowerShell uses, for example). Typically, attempts to support Unicode for Python console output (e.g., win_unicode_console on PyPI) deal with this by making a file-like object that calls WriteConsole under the hood, and replacing sys.stdout with it. The problem with this approach is that the replacement isn't a "normal" text stream object (there's no underlying raw bytes buffer), so the result isn't seamless (although win_unicode_console is pretty good).

What I noticed is that the C runtime supports an _O_U8TEXT mode for console file descriptors, at the (bytes) write() level. So that could be seamlessly integrated into the bytes IO layer of the Python IO stack.

As far as I can tell from the description, the way it works is to treat a block of bytes written via write() as a UTF-8 string, decode it to Unicode and write it to the console via WriteConsole(). (I haven't checked the CRT source, but that seems like the most likely implementation.)
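
To make the guessed behaviour concrete, here is a minimal sketch of the conversion step only, not of the CRT itself. The function name utf8_block_to_console_units is hypothetical. It decodes a block of UTF-8 bytes and counts the UTF-16 code units, since the wide-character form of WriteConsole takes its length in those units rather than in bytes or code points.

```python
def utf8_block_to_console_units(data):
    """Decode a UTF-8 byte block and count its UTF-16 code units.

    Hypothetical helper: it only models the conversion that a console
    write would need, it does not call any Windows API.
    """
    # Step 1: treat the bytes handed to write() as UTF-8 text.
    text = data.decode("utf-8")
    # Step 2: the wide console API counts 16-bit units, so a character
    # outside the Basic Multilingual Plane counts as two.
    units = len(text.encode("utf-16-le")) // 2
    return text, units
```

Note that a single write() may end in the middle of a multi-byte character, which a one-shot decode like this would reject; any real implementation has to carry the incomplete tail over to the next call.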

Code speaks louder than words, obviously, and I do intend to produce a trial implementation. But that'll take a bit of time because I need to understand how the IO stack hangs together first.

An alternative approach would be a RawIOBase implementation that wrote bytes to the console by (re-)decoding them from UTF-8 and using WriteConsole, then wrapping that in the usual buffered IO and text IO layers (with the text IO layer using UTF-8 encoding). That may well be implementable in pure Python, and make a good prototype implementation. Hmm...
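
A rough pure-Python sketch of that alternative might look like the following. Everything named here (ConsoleRawWriter, win32_write_text, make_console_stream) is hypothetical and not part of any existing module. The raw layer takes the text-writing function as a parameter so the stack can be exercised without a console. The ctypes wrapper around WriteConsoleW is an untested assumption about how the call would be made, and it ignores partial writes and size limits.

```python
import codecs
import io


class ConsoleRawWriter(io.RawIOBase):
    """Raw byte sink that re-decodes UTF-8 and forwards text to a writer."""

    def __init__(self, write_text):
        super().__init__()
        self._write_text = write_text
        # An incremental decoder keeps the tail of a multi-byte character
        # that was split across two write() calls.
        self._decoder = codecs.getincrementaldecoder("utf-8")(errors="replace")

    def writable(self):
        return True

    def write(self, b):
        data = bytes(b)
        text = self._decoder.decode(data)
        if text:
            self._write_text(text)
        # Report every byte as consumed; undecoded tails are held internally.
        return len(data)


def win32_write_text(text):
    """Assumed Windows-only writer: pass text to WriteConsoleW via ctypes."""
    import ctypes
    from ctypes import wintypes

    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    kernel32.GetStdHandle.restype = wintypes.HANDLE
    kernel32.WriteConsoleW.argtypes = [
        wintypes.HANDLE, wintypes.LPCWSTR, wintypes.DWORD,
        wintypes.LPDWORD, wintypes.LPVOID,
    ]
    handle = kernel32.GetStdHandle(-11)  # STD_OUTPUT_HANDLE
    units = len(text.encode("utf-16-le")) // 2
    written = wintypes.DWORD(0)
    if not kernel32.WriteConsoleW(handle, text, units,
                                  ctypes.byref(written), None):
        raise ctypes.WinError(ctypes.get_last_error())


def make_console_stream(write_text):
    """Wrap the raw writer in the usual buffered and text layers."""
    raw = ConsoleRawWriter(write_text)
    buffered = io.BufferedWriter(raw)
    return io.TextIOWrapper(buffered, encoding="utf-8", line_buffering=True)
```

On Windows the idea would be something like sys.stdout = make_console_stream(win32_write_text). Because the result is an ordinary TextIOWrapper, sys.stdout.buffer still exists and accepts bytes, which is the "seamless" property the file-like-object approach lacks.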
History
Date User Action Args
2015-04-09 20:44:32 paul.moore set recipients: + paul.moore, vstinner, tim.golden, r.david.murray, zach.ware, steve.dower
2015-04-09 20:44:32 paul.moore set messageid: <1428612272.94.0.497417303255.issue23901@psf.upfronthosting.co.za>
2015-04-09 20:44:32 paul.moore link issue23901 messages
2015-04-09 20:44:32 paul.moore create