Author vstinner
Recipients Naman-Bhalla, barry, benjamin.peterson, doko, ezio.melotti, jaysinh.shukla, mrabarnett, ncoghlan, paul.moore, serhiy.storchaka, steve.dower, tim.golden, vstinner, xtreak, zach.ware
Date 2019-02-28.23:05:43
Message-id <1551395144.26.0.557817599941.issue29571@roundup.psfhosted.org>
Content
> This seems to have broken test_re on Windows, see https://ci.appveyor.com/project/python/cpython/build/3.7.0a0.1

It seems like the ANSI code page is 1252 ("cp1252").

== CPython 3.7.0a0 (master:d31b28e16a2387d0251df948ef5d1b33d4357652, Mar 5 2017, 21:47:06) [MSC v.1900 32 bit (Intel)]
==   Windows-2012ServerR2-6.3.9600-SP0 little-endian
==   hash algorithm: siphash24 32bit
==  cwd: C:\projects\cpython\build\test_python_1844
==  encodings: locale=cp1252, FS=utf-8
Testing with flags: sys.flags(debug=0, inspect=0, interactive=0, optimize=0, dont_write_bytecode=0, no_user_site=0, no_site=0, ignore_environment=1, verbose=0, bytes_warning=2, quiet=0, hash_randomization=1, isolated=0)
Using random seed 5949816

...

FAIL: test_locale_flag (test.test_re.ReTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "C:\projects\cpython\lib\test\test_re.py", line 1422, in test_locale_flag
    self.assertTrue(pat.match(bletter))
AssertionError: None is not true

> getpreferredencoding() takes a completely different path on windows
> (returns a codepage) and isn't related to the C locale.

On my Windows 10 machine with Python 3.8, getpreferredencoding() (and getpreferredencoding(False)) returns "cp1252", while getlocale(LC_CTYPE)[1] returns "1252". Python has an alias "1252" for "cp1252".
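To check that the two spellings name the same codec, you can resolve both through codecs.lookup(), which applies Python's encoding alias table. This is a minimal sketch. The helper name same_codec() is hypothetical, and the printed preferred encoding depends on the platform and the configured locale.

```python
import codecs
import locale


def same_codec(a: str, b: str) -> bool:
    """Return True if two encoding names resolve to the same Python codec.

    same_codec() is a hypothetical helper: codecs.lookup() applies the
    alias table, so "1252" and "cp1252" both resolve to "cp1252".
    """
    return codecs.lookup(a).name == codecs.lookup(b).name


print(codecs.lookup("1252").name)     # cp1252
print(same_codec("1252", "cp1252"))   # True

# Platform-dependent: "cp1252" on the Windows 10 machine described above.
print(locale.getpreferredencoding(False))
```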

On Windows, getpreferredencoding() is implemented as _locale._getdefaultlocale()[1], and _getdefaultlocale()[1] is computed with:

    PyOS_snprintf(encoding, sizeof(encoding), "cp%d", GetACP());

In the end, it's the ANSI code page (1252).
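The C line above just prefixes the numeric ANSI code page with "cp". Below is a minimal Python sketch of the same computation. The names encoding_from_codepage() and ansi_codepage() are hypothetical helpers, and the ctypes call to the Win32 GetACP() function only works on Windows.

```python
import codecs
import sys


def encoding_from_codepage(acp: int) -> str:
    # Same formatting as the C code: PyOS_snprintf(..., "cp%d", GetACP())
    return "cp%d" % acp


def ansi_codepage():
    """Return the ANSI code page via Win32 GetACP(), or None off Windows."""
    if sys.platform != "win32":
        return None
    import ctypes  # ctypes.windll only exists on Windows
    return ctypes.windll.kernel32.GetACP()


name = encoding_from_codepage(1252)
print(name, codecs.lookup(name).name)   # cp1252 cp1252

acp = ansi_codepage()
if acp is not None:
    # Outside UTF-8 mode, this should match locale.getpreferredencoding(False).
    print(encoding_from_codepage(acp))
```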

--

I don't understand how change ace5c0fdd9b962e6e886c29dbcea72c53f051dc4 introduced a regression, and so I don't understand how the revert, commit 21a74312f2d1ddee71fade709af49d078085ec30, could have fixed anything.

--

On my PR 12099, two Windows CI jobs ran and both succeeded:

* AppVeyor: pythoninfo says "locale.encoding: cp1252"
  https://ci.appveyor.com/project/python/cpython/builds/22726025
* Windows PR Tests on Azure Pipeline: pythoninfo also says "locale.encoding: cp1252"

When change ace5c0fdd9b962e6e886c29dbcea72c53f051dc4 was merged, Python had no working Windows CI. Things have evolved a lot in the meantime.

I also manually tested my PR 12099 on my Windows 10 VM, which also uses cp1252: test_re passes.

--

For a bytes pattern, the re.LOCALE flag of re.compile() (combined with re.IGNORECASE) uses the following function from Modules/_sre.c:

    LOCAL(int)
    char_loc_ignore(SRE_CODE pattern, SRE_CODE ch)
    {
        return ch == pattern
            || (SRE_CODE) sre_lower_locale(ch) == pattern
            || (SRE_CODE) sre_upper_locale(ch) == pattern;
    }
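The Python model below shows why the result of char_loc_ignore() depends on the locale. It is a sketch, not the _sre implementation. The C code gets case mappings from sre_lower_locale()/sre_upper_locale(), which depend on the current C locale. Here, the mappings are passed in explicitly instead. The functions ascii_lower() and ascii_upper() and the partial cp1252 table are hypothetical stand-ins for the "C" locale and a cp1252 locale.

```python
import re


# Stand-ins for the "C" locale: bytes.lower()/upper() only fold ASCII letters.
def ascii_lower(ch: int) -> int:
    return bytes([ch]).lower()[0]


def ascii_upper(ch: int) -> int:
    return bytes([ch]).upper()[0]


# Hypothetical, partial cp1252 case table: only E-acute (0xC9 <-> 0xE9).
CP1252_LOWER = {0xC9: 0xE9}
CP1252_UPPER = {0xE9: 0xC9}


def cp1252_lower(ch: int) -> int:
    return CP1252_LOWER.get(ch, ascii_lower(ch))


def cp1252_upper(ch: int) -> int:
    return CP1252_UPPER.get(ch, ascii_upper(ch))


def char_loc_ignore(pattern: int, ch: int, lower, upper) -> bool:
    """Model of the C function: ch matches if it, or its lower/upper form, equals pattern."""
    return ch == pattern or lower(ch) == pattern or upper(ch) == pattern


print(char_loc_ignore(0xE9, 0xC9, ascii_lower, ascii_upper))    # False
print(char_loc_ignore(0xE9, 0xC9, cp1252_lower, cp1252_upper))  # True

# The real engine: the result depends on the process's current LC_CTYPE locale.
pat = re.compile(b"\xe9", re.LOCALE | re.IGNORECASE)
print(pat.match(b"\xc9"))
```

The two model calls differ only in the case tables they receive. This is the same kind of difference that makes test_locale_flag sensitive to the locale the test machine uses.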