Author eryksun
Recipients Paul Monson, eryksun, methane, paul.moore, serhiy.storchaka, steve.dower, tim.golden, vstinner, zach.ware
Date 2019-05-09.03:31:49
Message-id <1557372709.91.0.0173508694847.issue36778@roundup.psfhosted.org>
In-reply-to
Content
> FYI, I expect cp65001 will be used more widely in near future,
[...]
> It seems use `SetConsoleOutputCP(65001)` and `SetConsoleCP(65001)`.

Unless PYTHONLEGACYWINDOWSSTDIO is defined, Python 3.6+ doesn't use the console's codepage-based interface (except for low-level os.read and os.write). Console files use the wide-character console API internally and have a "utf-8" encoding, so "cp65001" isn't a factor in this context.
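
A quick way to check which case applies on a given machine is to look at the standard streams directly. The following is only a minimal sketch: `describe_stdio` is a hypothetical helper name, and the values it reports depend on the platform and on whether stdout is attached to a console.

```python
import os
import sys


def describe_stdio():
    """Hypothetical helper: report how stdout is currently configured."""
    return {
        # A non-empty value opts back in to the legacy codepage-based console I/O.
        "legacy_stdio": bool(os.environ.get("PYTHONLEGACYWINDOWSSTDIO")),
        # True only when stdout is attached to a console or terminal.
        "isatty": sys.stdout.isatty(),
        # The encoding of the text layer wrapping stdout.
        "encoding": sys.stdout.encoding,
    }


if __name__ == "__main__":
    for key, value in describe_stdio().items():
        print(f"{key}: {value}")
```

If stdout is redirected to a file or pipe, it is not a console file, so the reported encoding may differ from what the console case above describes.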

This issue probably occurs due to the encoding returned by locale.getpreferredencoding(). This calls _locale._getdefaultlocale, which returns a tuple that mixes the user locale with the system ANSI codepage. For example, with ANSI set to UTF-8 (Windows 10):

    >>> _locale._getdefaultlocale()
    ('en_GB', 'cp65001')

The Universal CRT special cases CP_UTF8 (codepage 65001) as "utf8" and accepts "utf-8" as an alias. For example, after setting the ANSI codepage to UTF-8:

    >>> locale.setlocale(locale.LC_CTYPE, '')
    'English_United Kingdom.utf8'

Python could similarly special case CP_UTF8 as "utf-8" in _locale._getdefaultlocale.
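
As a rough illustration of that suggestion, the special case could look something like the sketch below. This is not the actual implementation in `_locale`; `normalize_ansi_encoding` and `normalized_default_locale` are hypothetical names, and the tuple passed in stands for the value returned by `_locale._getdefaultlocale()`.

```python
def normalize_ansi_encoding(name):
    """Hypothetical helper: report the CP_UTF8 codepage as 'utf-8'."""
    if name is not None and name.lower() == "cp65001":
        return "utf-8"
    # Any other codepage name is passed through unchanged.
    return name


def normalized_default_locale(default_locale):
    """Apply the special case to a (language, encoding) tuple."""
    language, encoding = default_locale
    return language, normalize_ansi_encoding(encoding)


if __name__ == "__main__":
    # Uses the tuple from the example above instead of calling the
    # Windows-only _locale._getdefaultlocale().
    print(normalized_default_locale(("en_GB", "cp65001")))
    print(normalized_default_locale(("en_GB", "cp1252")))
```

Doing the mapping at this level would mean that callers such as locale.getpreferredencoding() see "utf-8" without needing their own check.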