Author haypo
Recipients David.Sankel, amaury.forgeotdarc, brian.curtin, christian.heimes, christoph, davidsarah, ezio.melotti, haypo, hippietrail, lemburg, mark, pitrou, santoso.wijaya, sorin, terry.reedy, tim.golden, tzot, v+python
Date 2011-03-21.14:25:19
Message-id <1300717520.32.0.790975839962.issue1602@psf.upfronthosting.co.za>
Content
I did some tests with WriteConsoleW():
 - with raster fonts, U+00E9 is displayed as é, U+0141 as L and U+042D as ? => good (works as expected)
 - with a TrueType font (Lucida), U+00E9 is displayed as é, U+0141 as Ł and U+042D as Э => perfect! (all characters are rendered correctly)

Now I agree that WriteConsoleW() is the best solution to fix this issue.

My test code (added to Python/sysmodule.c):
---------
static PyObject *
sys_write_stdout(PyObject *self, PyObject *args)
{
    PyObject *textobj;
    wchar_t *text;
    DWORD written, total;
    Py_ssize_t len, chunk;
    HANDLE console;
    BOOL ok;

    if (!PyArg_ParseTuple(args, "U:write_stdout", &textobj))
        return NULL;

    console = GetStdHandle(STD_OUTPUT_HANDLE);
    if (console == INVALID_HANDLE_VALUE) {
        PyErr_SetFromWindowsErr(GetLastError());
        return NULL;
    }

    text = PyUnicode_AS_UNICODE(textobj);
    len = PyUnicode_GET_SIZE(textobj);
    total = 0;
    while (len != 0) {
        if (len > 10000)
            /* WriteConsoleW() is limited to 64 KB (32,768 UTF-16 units), but
               this limit depends on the heap usage. Use a safe limit of 10,000
               UTF-16 units.
               http://tahoe-lafs.org/trac/tahoe-lafs/ticket/1232 */
            chunk = 10000;
        else
            chunk = len;
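        /* chunk is at most 10,000 here, so the cast to DWORD is safe */
        ok = WriteConsoleW(console, text, (DWORD)chunk, &written, NULL);
        if (!ok)
            /* stop on error and return the number of units written so far */
            break;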
        text += written;
        len -= written;
        total += written;
    }
    return PyLong_FromUnsignedLong(total);
}
---------
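
To make the chunking logic easier to check without a Windows console, here is a minimal, portable sketch of the same loop. It is not part of the patch: write_units_fn, write_chunked(), demo_writer() and struct demo_stats are hypothetical names, and the callback only stands in for WriteConsoleW(). The sketch also stops when the writer reports success but writes nothing, which is an extra guard I am assuming is wanted to avoid an endless loop.

```c
#include <stddef.h>
#include <wchar.h>

/* Hypothetical stand-in for WriteConsoleW(): tries to write `count` units,
   stores the number actually written in *written, returns nonzero on
   success and 0 on failure. */
typedef int (*write_units_fn)(void *ctx, const wchar_t *text,
                              unsigned long count, unsigned long *written);

/* Same safe limit as in the test code above (10,000 UTF-16 units). */
#define CHUNK_LIMIT 10000

/* Write `len` units in chunks of at most CHUNK_LIMIT units.
   Returns the number of units written. Stops early if the writer fails
   or reports no progress (assumed guard against looping forever). */
static size_t
write_chunked(write_units_fn write_units, void *ctx,
              const wchar_t *text, size_t len)
{
    size_t total = 0;

    while (len != 0) {
        unsigned long chunk =
            (len > CHUNK_LIMIT) ? CHUNK_LIMIT : (unsigned long)len;
        unsigned long written = 0;

        if (!write_units(ctx, text, chunk, &written) || written == 0)
            break;
        /* A writer may accept fewer units than requested: advance by
           what was really written, not by the requested chunk. */
        text += written;
        len -= written;
        total += written;
    }
    return total;
}

/* Hypothetical demo writer: accepts at most 4096 units per call and
   records how it was called. */
struct demo_stats {
    unsigned long calls;        /* number of calls made */
    unsigned long max_request;  /* largest chunk ever requested */
};

static int
demo_writer(void *ctx, const wchar_t *text,
            unsigned long count, unsigned long *written)
{
    struct demo_stats *stats = ctx;

    (void)text;  /* the demo does not output anything */
    stats->calls++;
    if (count > stats->max_request)
        stats->max_request = count;
    *written = (count > 4096) ? 4096 : count;
    return 1;
}
```

Calling write_chunked(demo_writer, &stats, buf, 25000) with a 25,000-unit buffer never requests more than 10,000 units at once and keeps going after each partial write until everything is written.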


The question now is how to integrate WriteConsoleW() into Python without breaking the API, for example:
 - Should sys.stdout be a TextIOWrapper or not?
 - Should sys.stdout.fileno() return 1 or raise an error?
 - What about sys.stdout.buffer: should sys.stdout.buffer.write() call WriteConsoleA(), or should sys.stdout not have a buffer attribute at all? I think that many modules and programs now rely on sys.stdout.buffer to write bytes directly to stdout. There is at least python -m base64.
 - Should we use ReadConsoleW() for stdin?