
Title: [Windows] Python 2 mishandles console code page after setlocale
Type: behavior Stage: resolved
Components: Windows Versions: Python 2.7
Status: closed Resolution: out of date
Dependencies: Superseder:
Assigned To: Nosy List: Segev Finer, paul.moore, tim.golden, zach.ware
Priority: normal Keywords:

Created on 2018-07-30 17:49 by Segev Finer, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Messages (3)
msg322683 - Author: Segev Finer (Segev Finer) Date: 2018-07-30 17:49
Found by trying to use pip: https://github.com/pypa/pip/issues/5665.

This is likely affected by the console code page.

Python version: 2.7.15 64 bit
OS: Windows 10.0.17134.165 x64
The console locale is set to cp872.
The console font is Consolas.

Apparently, msvcrt performs charset conversion when writing to its file descriptors, based on the currently set locale, and it is even special-cased to handle the OEM console code page (you can see this in crt/src/write.c:_write_nolock if you have MSVC 2008).

When the "C" locale is set, no conversion is done: Python encodes to the OEM code page, and the bytes pass through to the console unscathed. But once you call setlocale, the CRT expects you to use the ANSI code page, while Python is still encoding to the OEM code page, which results in the error from fwrite shown in the reproduction at the end of this message.
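
To make the mismatch concrete, here is a minimal sketch of what happens to the bytes when text encoded for one code page is interpreted under another. The code pages cp437 (OEM) and cp1252 (ANSI) are assumptions chosen for illustration, not the reporter's configuration, and the snippet only models the byte-level disagreement; it does not reproduce the CRT's actual conversion path or the IOError.

```python
from __future__ import print_function

# Assumed code pages, for illustration only (not the reporter's setup).
OEM_CODEPAGE = 'cp437'
ANSI_CODEPAGE = 'cp1252'

text = u'\u2588'  # FULL BLOCK, the character used in the report

# What Python 2 would hand to the CRT: bytes in the OEM code page.
oem_bytes = text.encode(OEM_CODEPAGE)

# What a consumer assuming the ANSI code page would read from those bytes.
reinterpreted = oem_bytes.decode(ANSI_CODEPAGE)

# The character is not representable in the assumed ANSI code page at all.
try:
    text.encode(ANSI_CODEPAGE)
    ansi_encodable = True
except UnicodeEncodeError:
    ansi_encodable = False

print(repr(oem_bytes), repr(reinterpreted), ansi_encodable)
```

Under these assumptions the same byte decodes to a different character, which is the kind of disagreement that can surface once the CRT starts converting output according to the locale.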

file.encoding in Python 2 is also not settable directly from Python (C API only); it is only used for stdio and is set internally on startup: Python/pythonrun.c:349-378.

I found this question, which describes the behavior: Why printf can display non-ASCII characters when “C” locale is used?

    #!/usr/bin/env python2
    from __future__ import print_function
    import locale

    print(u' |\u2588')  # Works
    locale.setlocale(locale.LC_ALL, '')
    print(u' |\u2588')  # IOError: [Errno 42] Illegal byte sequence
msg322689 - Author: Steve Dower (steve.dower) (Python committer) Date: 2018-07-30 20:10
I have no intention of changing how encodings work in 2.7. Use codecs.open if you want to specify text IO encoding.

Someone else may be willing to look at it.
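
For reference, a minimal sketch of the codecs.open approach mentioned above, which writes text to a file with an explicitly chosen encoding. The temporary path and the choice of UTF-8 are assumptions for illustration; this applies to files opened by the program and does not change how stdout is handled.

```python
import codecs
import os
import tempfile

# Hypothetical output path in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), 'out.txt')

# codecs.open encodes unicode text with the given encoding on write,
# independent of the console code page or the current locale.
with codecs.open(path, 'w', encoding='utf-8') as f:
    f.write(u' |\u2588\n')

# Read the raw bytes back to see exactly what was written.
with open(path, 'rb') as f:
    raw = f.read()

print(repr(raw))
```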
msg367291 - Author: Zachary Ware (zach.ware) (Python committer) Date: 2020-04-26 03:19
As this issue is specific to 2.7, I'm closing it.
Date                 User         Action  Args
2022-04-11 14:59:04  admin        set     github: 78464
2020-04-26 03:19:09  zach.ware    set     status: open -> closed; resolution: out of date; messages: + msg367291; stage: resolved
2018-07-30 20:11:12  steve.dower  set     nosy: - steve.dower
2018-07-30 20:10:53  steve.dower  set     nosy: paul.moore, tim.golden, zach.ware, steve.dower, Segev Finer; messages: + msg322689
2018-07-30 17:49:57  Segev Finer  set     title: Python 2 mishandles console code page after setlocale -> [Windows] Python 2 mishandles console code page after setlocale
2018-07-30 17:49:31  Segev Finer  create