
Author eryksun
Recipients eryksun, lemburg, methane, vstinner
Date 2021-03-19.11:35:25
Message-id <1616153725.56.0.755889802744.issue43552@roundup.psfhosted.org>
Content
> Read the ANSI code page on Windows,

I don't see why the Windows implementation is inconsistent with POSIX here. If it were changed to be consistent, the default encoding at startup would remain the same, since setlocale(LC_CTYPE, "") uses the process code page from GetACP(). In many if not most cases, no one would be the wiser. But it seems to me that if a script calls setlocale(LC_CTYPE, "el_GR"), then it clearly wants to encode Greek text (code page 1253). open() with encoding passed as None or "locale" should respect this. Similarly if it calls setlocale(LC_CTYPE, ".UTF-8"), then it wants the default locale (language/region), but with UTF-8 encoding.
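
For comparison, here is a minimal sketch of the POSIX-side pattern that the paragraph above refers to: the encoding is read from the current LC_CTYPE locale, so it changes whenever a script calls setlocale(). The helper name encoding_after_setlocale is hypothetical and is not taken from CPython. The sketch is POSIX-only, since <langinfo.h> is not available on Windows, and it assumes the requested locale is installed on the system.

```c
#include <langinfo.h>
#include <locale.h>
#include <stddef.h>

/* Hypothetical helper (illustration only): switch LC_CTYPE to `name` and
   return the character encoding of the resulting locale. Returns NULL if
   the locale is not available, in which case the locale is left unchanged. */
const char *
encoding_after_setlocale(const char *name)
{
    if (setlocale(LC_CTYPE, name) == NULL) {
        return NULL;
    }
    /* nl_langinfo(CODESET) reflects the current LC_CTYPE locale,
       not the locale that was in effect at process startup. */
    return nl_langinfo(CODESET);
}
```

Passing "" selects the locale from the environment, which is the startup case. Passing a name such as "el_GR" would report whatever encoding that locale uses on the system, if it is installed. The exact strings returned are platform-dependent.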

The following is a snippet to get the current locale encoding with ucrt in Windows:

    #include <locale.h>
    #include <windows.h>  /* GetACP() */

    int cp = 0;
    __crt_locale_data_public *locale_data;

    /* _get_current_locale() returns a copy that must be freed. */
    _locale_t locale = _get_current_locale();
    if (locale) {
        locale_data = (__crt_locale_data_public *)locale->locinfo;
        cp = locale_data->_locale_lc_codepage;
        _free_locale(locale);
    }

    if (cp == 0) {
        /* "C" locale. The CRT in effect uses Latin-1 (cp28591), but
           Windows Python prefers the process code page. */
        cp = GetACP();
    }

With ucrt, the C runtime was changed to hide most of the locale definition that was previously public, but it intentionally defines __crt_locale_data_public, so I'm assuming it's there for programs to use. That said, the fact that we have to cast locinfo seems suspect to me. Steve Dower could maybe check with the ucrt devs to ensure that this is supported. 

There's also ___lc_codepage_func(), which gets the same value more simply and more efficiently, since the current locale data doesn't have to be copied and freed. However, it's documented as internal and could be removed (unlikely as that is).