classification
Title: test_site.StartupImportTests.test_startup_imports fails if default code page is cp65001
Type: behavior Stage: resolved
Components: Tests, Windows Versions: Python 3.8
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: Paul Monson, eryksun, inada.naoki, paul.moore, serhiy.storchaka, steve.dower, tim.golden, vstinner, zach.ware
Priority: normal Keywords: patch

Created on 2019-05-02 20:46 by Paul Monson, last changed 2019-06-04 15:09 by vstinner. This issue is now closed.

Pull Requests
URL Status Linked
PR 13069 closed Paul Monson, 2019-05-02 20:48
PR 13072 closed Paul Monson, 2019-05-03 01:23
PR 13110 closed vstinner, 2019-05-06 15:03
PR 13211 closed Paul Monson, 2019-05-08 21:13
PR 13230 merged vstinner, 2019-05-09 23:07
PR 13240 merged vstinner, 2019-05-10 23:35
PR 13807 merged vstinner, 2019-06-04 14:14
Messages (31)
msg341316 - Author: Paul Monson (Paul Monson) * Date: 2019-05-02 20:46
Windows desktop SKUs (with a Western European system locale) have a default ANSI codepage (returned by GetACP()) of 1252.  Windows IoT Core and Windows Nano Server have a default codepage of 65001 (UTF-8). 

This causes test_site.StartupImportTests.test_startup_imports to fail on Windows IoT Core and Windows Nano Server because cp65001.py is loaded instead of the frozen cp1252.py at startup.

I tried changing the default codepage to 65001 on my dev machine and rebuilding Python and it had no effect that I could tell on the generated frozen importlibs.

The simplest solutions would be to skip test_startup_imports, or to change it to pass when locale.getpreferredencoding() returns 'cp65001'.
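The second option could be expressed by relaxing the forbidden-module set only when the ANSI code page is 65001. This is a minimal sketch, not code from test_site: `forbidden_startup_modules` and `COLLECTION_MODS` are hypothetical names, the module list is taken from the failure output later in this thread, and it assumes `locale.getpreferredencoding(False)` reports 'cp65001' in that configuration (as it does on Windows when UTF-8 Mode is off).

```python
import locale
import sys

# Modules reported as unexpectedly imported at startup in the failing
# Windows IoT Core run (see the AssertionError later in this thread).
COLLECTION_MODS = frozenset({"collections", "functools", "heapq",
                             "keyword", "operator", "reprlib"})


def forbidden_startup_modules(preferred_encoding=None, platform=None):
    """Return the modules a startup-import check should reject.

    Hypothetical helper: with ANSI code page 65001, loading
    encodings/cp65001.py imports functools, which pulls in collections
    and its dependencies, so the check is relaxed in that case only.
    """
    if platform is None:
        platform = sys.platform
    if preferred_encoding is None:
        preferred_encoding = locale.getpreferredencoding(False)
    if platform == "win32" and preferred_encoding == "cp65001":
        return frozenset()
    return COLLECTION_MODS
```

Inside the test, the assertion would then read `self.assertFalse(modules & forbidden_startup_modules(), stderr)`.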
msg341323 - Author: Inada Naoki (inada.naoki) * (Python committer) Date: 2019-05-03 02:13
Could you paste how the test fails?
msg341324 - Author: Paul Monson (Paul Monson) * Date: 2019-05-03 04:46
======================================================================
FAIL: test_startup_imports (test.test_site.StartupImportTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "c:\docker\pythond\lib\test\test_site.py", line 542, in test_startup_imports
    self.assertFalse(modules.intersection(collection_mods), stderr)
AssertionError: {'operator', 'keyword', 'functools', 'heapq', 'collections', 'reprlib'} is not false : import _frozen_importlib # frozen
import _imp # builtin
import '_thread' # <class '_frozen_importlib.BuiltinImporter'>
import '_warnings' # <class '_frozen_importlib.BuiltinImporter'>
import '_weakref' # <class '_frozen_importlib.BuiltinImporter'>
import '_frozen_importlib_external' # <class '_frozen_importlib.FrozenImporter'>
import '_io' # <class '_frozen_importlib.BuiltinImporter'>
import 'marshal' # <class '_frozen_importlib.BuiltinImporter'>
import 'nt' # <class '_frozen_importlib.BuiltinImporter'>
import _thread # previously loaded ('_thread')
import '_thread' # <class '_frozen_importlib.BuiltinImporter'>
import _weakref # previously loaded ('_weakref')
import '_weakref' # <class '_frozen_importlib.BuiltinImporter'>
import 'winreg' # <class '_frozen_importlib.BuiltinImporter'>
# installing zipimport hook
import 'time' # <class '_frozen_importlib.BuiltinImporter'>
import 'zipimport' # <class '_frozen_importlib.FrozenImporter'>
# installed zipimport hook
# c:\docker\pythond\lib\encodings\__pycache__\__init__.cpython-38.pyc matches c:\docker\pythond\lib\encodings\__init__.py
# code object from 'c:\\docker\\pythond\\lib\\encodings\\__pycache__\\__init__.cpython-38.pyc'
# c:\docker\pythond\lib\__pycache__\codecs.cpython-38.pyc matches c:\docker\pythond\lib\codecs.py
# code object from 'c:\\docker\\pythond\\lib\\__pycache__\\codecs.cpython-38.pyc'
import '_codecs' # <class '_frozen_importlib.BuiltinImporter'>
import 'codecs' # <_frozen_importlib_external.SourceFileLoader object at 0x01D9DBD0>
# c:\docker\pythond\lib\encodings\__pycache__\aliases.cpython-38.pyc matches c:\docker\pythond\lib\encodings\aliases.py
# code object from 'c:\\docker\\pythond\\lib\\encodings\\__pycache__\\aliases.cpython-38.pyc'
import 'encodings.aliases' # <_frozen_importlib_external.SourceFileLoader object at 0x01EFF900>
import 'encodings' # <_frozen_importlib_external.SourceFileLoader object at 0x01D9DA50>
# c:\docker\pythond\lib\encodings\__pycache__\utf_8.cpython-38.pyc matches c:\docker\pythond\lib\encodings\utf_8.py
# code object from 'c:\\docker\\pythond\\lib\\encodings\\__pycache__\\utf_8.cpython-38.pyc'
import 'encodings.utf_8' # <_frozen_importlib_external.SourceFileLoader object at 0x01D9DCC0>
import '_signal' # <class '_frozen_importlib.BuiltinImporter'>
# c:\docker\pythond\lib\encodings\__pycache__\cp65001.cpython-38.pyc matches c:\docker\pythond\lib\encodings\cp65001.py
# code object from 'c:\\docker\\pythond\\lib\\encodings\\__pycache__\\cp65001.cpython-38.pyc'
# c:\docker\pythond\lib\__pycache__\functools.cpython-38.pyc matches c:\docker\pythond\lib\functools.py
# code object from 'c:\\docker\\pythond\\lib\\__pycache__\\functools.cpython-38.pyc'
# c:\docker\pythond\lib\__pycache__\abc.cpython-38.pyc matches c:\docker\pythond\lib\abc.py
# code object from 'c:\\docker\\pythond\\lib\\__pycache__\\abc.cpython-38.pyc'
import '_abc' # <class '_frozen_importlib.BuiltinImporter'>
import 'abc' # <_frozen_importlib_external.SourceFileLoader object at 0x01F16FC0>
# c:\docker\pythond\lib\collections\__pycache__\__init__.cpython-38.pyc matches c:\docker\pythond\lib\collections\__init__.py
# code object from 'c:\\docker\\pythond\\lib\\collections\\__pycache__\\__init__.cpython-38.pyc'
# c:\docker\pythond\lib\__pycache__\_collections_abc.cpython-38.pyc matches c:\docker\pythond\lib\_collections_abc.py
# code object from 'c:\\docker\\pythond\\lib\\__pycache__\\_collections_abc.cpython-38.pyc'
import '_collections_abc' # <_frozen_importlib_external.SourceFileLoader object at 0x01F423C0>
# c:\docker\pythond\lib\__pycache__\operator.cpython-38.pyc matches c:\docker\pythond\lib\operator.py
# code object from 'c:\\docker\\pythond\\lib\\__pycache__\\operator.cpython-38.pyc'
import '_operator' # <class '_frozen_importlib.BuiltinImporter'>
import 'operator' # <_frozen_importlib_external.SourceFileLoader object at 0x01F4D630>
# c:\docker\pythond\lib\__pycache__\keyword.cpython-38.pyc matches c:\docker\pythond\lib\keyword.py
# code object from 'c:\\docker\\pythond\\lib\\__pycache__\\keyword.cpython-38.pyc'
import 'keyword' # <_frozen_importlib_external.SourceFileLoader object at 0x01F58810>
# c:\docker\pythond\lib\__pycache__\heapq.cpython-38.pyc matches c:\docker\pythond\lib\heapq.py
# code object from 'c:\\docker\\pythond\\lib\\__pycache__\\heapq.cpython-38.pyc'
import '_heapq' # <class '_frozen_importlib.BuiltinImporter'>
import 'heapq' # <_frozen_importlib_external.SourceFileLoader object at 0x01F588D0>
import 'itertools' # <class '_frozen_importlib.BuiltinImporter'>
# c:\docker\pythond\lib\__pycache__\reprlib.cpython-38.pyc matches c:\docker\pythond\lib\reprlib.py
# code object from 'c:\\docker\\pythond\\lib\\__pycache__\\reprlib.cpython-38.pyc'
import 'reprlib' # <_frozen_importlib_external.SourceFileLoader object at 0x01F59900>
import '_collections' # <class '_frozen_importlib.BuiltinImporter'>
import 'collections' # <_frozen_importlib_external.SourceFileLoader object at 0x01F25810>
import '_functools' # <class '_frozen_importlib.BuiltinImporter'>
import 'functools' # <_frozen_importlib_external.SourceFileLoader object at 0x01EFFCC0>
import 'encodings.cp65001' # <_frozen_importlib_external.SourceFileLoader object at 0x01EFF9F0>
# c:\docker\pythond\lib\encodings\__pycache__\latin_1.cpython-38.pyc matches c:\docker\pythond\lib\encodings\latin_1.py
# code object from 'c:\\docker\\pythond\\lib\\encodings\\__pycache__\\latin_1.cpython-38.pyc'
import 'encodings.latin_1' # <_frozen_importlib_external.SourceFileLoader object at 0x01EFF810>
# c:\docker\pythond\lib\__pycache__\io.cpython-38.pyc matches c:\docker\pythond\lib\io.py
# code object from 'c:\\docker\\pythond\\lib\\__pycache__\\io.cpython-38.pyc'
import 'io' # <_frozen_importlib_external.SourceFileLoader object at 0x01D88DB0>
Python 3.8.0a3+ (heads/iot-merged-dirty:88716a51a3, Apr  5 2019, 11:11:18) [MSC v.1916 32 bit (ARM)] on win32
Type "help", "copyright", "credits" or "license" for more information.
# c:\docker\pythond\lib\__pycache__\site.cpython-38.pyc matches c:\docker\pythond\lib\site.py
# code object from 'c:\\docker\\pythond\\lib\\__pycache__\\site.cpython-38.pyc'
# c:\docker\pythond\lib\__pycache__\os.cpython-38.pyc matches c:\docker\pythond\lib\os.py
# code object from 'c:\\docker\\pythond\\lib\\__pycache__\\os.cpython-38.pyc'
# c:\docker\pythond\lib\__pycache__\stat.cpython-38.pyc matches c:\docker\pythond\lib\stat.py
# code object from 'c:\\docker\\pythond\\lib\\__pycache__\\stat.cpython-38.pyc'
import '_stat' # <class '_frozen_importlib.BuiltinImporter'>
import 'stat' # <_frozen_importlib_external.SourceFileLoader object at 0x01F25990>
# c:\docker\pythond\lib\__pycache__\ntpath.cpython-38.pyc matches c:\docker\pythond\lib\ntpath.py
# code object from 'c:\\docker\\pythond\\lib\\__pycache__\\ntpath.cpython-38.pyc'
# c:\docker\pythond\lib\__pycache__\genericpath.cpython-38.pyc matches c:\docker\pythond\lib\genericpath.py
# code object from 'c:\\docker\\pythond\\lib\\__pycache__\\genericpath.cpython-38.pyc'
import 'genericpath' # <_frozen_importlib_external.SourceFileLoader object at 0x01F9CDE0>
import 'ntpath' # <_frozen_importlib_external.SourceFileLoader object at 0x01F9C5D0>
import 'os' # <_frozen_importlib_external.SourceFileLoader object at 0x01F873F0>
# c:\docker\pythond\lib\__pycache__\_sitebuiltins.cpython-38.pyc matches c:\docker\pythond\lib\_sitebuiltins.py
# code object from 'c:\\docker\\pythond\\lib\\__pycache__\\_sitebuiltins.cpython-38.pyc'
import '_sitebuiltins' # <_frozen_importlib_external.SourceFileLoader object at 0x01F87FC0>
import 'site' # <_frozen_importlib_external.SourceFileLoader object at 0x01F16C60>
# cleanup[3] wiping _functools
# cleanup[3] wiping _collections
# cleanup[3] wiping heapq
# cleanup[3] wiping _heapq
# destroy _heapq
# cleanup[3] wiping _operator
# cleanup[3] wiping _collections_abc
# cleanup[3] wiping _abc
# cleanup[3] wiping encodings.utf_8
# cleanup[3] wiping encodings.aliases
# cleanup[3] wiping codecs
# cleanup[3] wiping _codecs
# cleanup[3] wiping winreg
# cleanup[3] wiping _weakref
# cleanup[3] wiping _thread
# cleanup[3] wiping nt
# cleanup[3] wiping marshal
# cleanup[3] wiping _io
# cleanup[3] wiping _frozen_importlib_external
# destroy io
# destroy nt
# destroy winreg
# destroy marshal
# cleanup[3] wiping _warnings
# cleanup[3] wiping _imp
# cleanup[3] wiping _frozen_importlib
# destroy _frozen_importlib_external
# destroy _imp
# destroy _warnings
# cleanup[3] wiping sys
# clear builtins._
# clear sys.path
# clear sys.argv
# clear sys.ps1
# clear sys.ps2
# clear sys.last_type
# clear sys.last_value
# clear sys.last_traceback
# clear sys.path_hooks
# clear sys.path_importer_cache
# clear sys.meta_path
# clear sys.__interactivehook__
# clear sys.flags
# clear sys.float_info
# restore sys.stdin
# restore sys.stdout
# restore sys.stderr
# cleanup[2] removing sys
# cleanup[2] removing builtins
# cleanup[2] removing _frozen_importlib
# cleanup[2] removing _imp
# cleanup[2] removing _warnings
# cleanup[2] removing _frozen_importlib_external
# cleanup[2] removing _io
# cleanup[2] removing marshal
# cleanup[2] removing nt
# cleanup[2] removing _thread
# cleanup[2] removing _weakref
# cleanup[2] removing winreg
# cleanup[2] removing time
# cleanup[2] removing zipimport
# destroy zipimport
# cleanup[2] removing _codecs
# cleanup[2] removing codecs
# cleanup[2] removing encodings.aliases
# cleanup[2] removing encodings
# destroy encodings
# cleanup[2] removing encodings.utf_8
# cleanup[2] removing _signal
# cleanup[2] removing __main__
# destroy __main__
# cleanup[2] removing _abc
# cleanup[2] removing abc
# cleanup[2] removing _collections_abc
# cleanup[2] removing _operator
# cleanup[2] removing operator
# destroy operator
# cleanup[2] removing keyword
# destroy keyword
# cleanup[2] removing _heapq
# cleanup[2] removing heapq
# cleanup[2] removing itertools
# cleanup[2] removing reprlib
# destroy reprlib
# cleanup[2] removing _collections
# cleanup[2] removing collections
# destroy collections
# cleanup[2] removing _functools
# cleanup[2] removing functools
# cleanup[2] removing encodings.cp65001
# cleanup[2] removing encodings.latin_1
# cleanup[2] removing io
# destroy io
# cleanup[2] removing _stat
# cleanup[2] removing stat
# cleanup[2] removing genericpath
# cleanup[2] removing ntpath
# cleanup[2] removing os.path
# cleanup[2] removing os
# cleanup[2] removing _sitebuiltins
# cleanup[2] removing site
# destroy site
# destroy time
# destroy _signal
# destroy itertools
# destroy _sitebuiltins
# destroy abc
# destroy ntpath
# destroy _stat
# destroy os
# destroy stat
# destroy genericpath
# cleanup[3] wiping encodings.latin_1
# cleanup[3] wiping encodings.cp65001
# destroy functools
# cleanup[3] wiping builtins
# destroy _functools
# destroy _collections_abc
# destroy _operator
# destroy heapq
# destroy _weakref
# destroy _collections
# destroy _thread
# destroy _abc
# destroy _frozen_importlib
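The failing assertion compares the modules a fresh interpreter has loaded after startup against modules that should stay unloaded. A rough standalone approximation, using only the standard library, is sketched below. The real test does more than this (it also runs the child with -v, which produced the import trace above), so treat `startup_modules` as a hypothetical helper for local investigation.

```python
import ast
import subprocess
import sys

# The modules named in the AssertionError above.
COLLECTION_MODS = {"collections", "functools", "heapq",
                   "keyword", "operator", "reprlib"}


def startup_modules(python=sys.executable):
    """Return the names of the modules loaded once startup has finished."""
    # -I runs the child in isolated mode (no PYTHON* environment variables,
    # no user site directory); the child prints its sorted module names.
    result = subprocess.run(
        [python, "-I", "-c", "import sys; print(sorted(sys.modules))"],
        capture_output=True, text=True, check=True)
    return set(ast.literal_eval(result.stdout))


if __name__ == "__main__":
    unexpected = startup_modules() & COLLECTION_MODS
    print("unexpected startup imports:", sorted(unexpected) or "none")
```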
msg341326 - Author: Inada Naoki (inada.naoki) * (Python committer) Date: 2019-05-03 06:12
@Victor It seems you added cp65001 as Windows-only encoding in bpo-13216.

How do you think about removing cp65001 encoding, and add 'cp65001' -> 'utf_8' alias which is available on all platforms?
msg341372 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-05-04 04:26
cp65001 is *not* utf-8: Microsoft decided to handle surrogates differently
for some reasons.
msg341377 - Author: Eryk Sun (eryksun) * (Python triager) Date: 2019-05-04 07:35
> cp65001 is *not* utf-8: Microsoft decided to handle surrogates 
> differently for some reasons.

Do you mean valid UTF-16 surrogate pairs? For example:

    >>> codecs.code_page_encode(65001, '\ud800\udc00')
    (b'\xf0\x90\x80\x80', 2)

PyUnicode_AsUnicodeAndSize is neutral about storing surrogate codes in a 16-bit wchar_t string. In particular, the Python string in this case contains two surrogate codes, but they're passed to WideCharToMultiByte as a UTF-16 surrogate pair for the single character U+10000.
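The distinction can be reproduced without Windows. In a Python str, '\ud800\udc00' is two separate surrogate code points: the strict UTF-8 codec rejects them, 'surrogatepass' encodes each one on its own, and only a round trip through UTF-16 code units (roughly what handing the string to WideCharToMultiByte as wchar_t does) pairs them into U+10000. An illustration using only standard codecs:

```python
# Two surrogate code points that form a valid UTF-16 pair.
pair = "\ud800\udc00"

# Strict UTF-8 refuses surrogates, even when they would form a pair.
try:
    pair.encode("utf-8")
except UnicodeEncodeError as exc:
    print("strict utf-8:", exc.reason)

# 'surrogatepass' encodes each surrogate separately (3 bytes each).
print(pair.encode("utf-8", "surrogatepass"))  # b'\xed\xa0\x80\xed\xb0\x80'

# Writing the code units as UTF-16 and decoding them again combines the
# pair into one character, similar to passing the string as wchar_t*.
combined = pair.encode("utf-16-le", "surrogatepass").decode("utf-16-le")
print(combined == "\U00010000", combined.encode("utf-8"))  # True b'\xf0\x90\x80\x80'
```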

Anyway, it seems to me this issue will be resolved if cp65001.py is rewritten without functools.partial.
msg341383 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2019-05-04 13:36
I think it is better to just make the check in the test conditional. It already contains some macOs specific conditions.
msg341401 - Author: Eryk Sun (eryksun) * (Python triager) Date: 2019-05-04 20:15
> I think it is better to just make the check in the test conditional.

Okay. The test verifies work done to minimize interpreter startup time, but probably the relative cost of importing functools (and thus collections et al) isn't significant compared to the overall cost of spawning a process in a Windows desktop environment. That may not be the case for Nano Server and IoT Core.
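A rough way to estimate that relative cost, assuming a wall-clock comparison is good enough, is to time child interpreters that do nothing against ones that also import functools; `python -X importtime -c "import functools"` (Python 3.7+) gives a finer per-module breakdown. The helper below is a hypothetical sketch, and its numbers depend heavily on the machine and on disk caching.

```python
import statistics
import subprocess
import sys
import time


def median_startup_seconds(code="pass", runs=5):
    """Median wall-clock time to start an isolated interpreter running `code`."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run([sys.executable, "-I", "-c", code], check=True)
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


if __name__ == "__main__":
    base = median_startup_seconds()
    with_functools = median_startup_seconds("import functools")
    print(f"bare startup:        {base * 1000:.1f} ms")
    print(f"+ import functools:  {with_functools * 1000:.1f} ms")
```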
msg341520 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-05-06 15:04
Paul Monson: I'm unable to reproduce your issue exactly, but I tried to reproduce it partially using PYTHONIOENCODING=cp65001.

My PR 13110 avoids "import functools" at startup. Can you please try it and check if it fix test_site?
msg341531 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-05-06 15:31
Victor:
> cp65001 is *not* utf-8: Microsoft decided to handle surrogates differently for some reasons.

Eryk:
> Do you mean valid UTF-16 surrogate pairs? (...)

Code page 65001 handles lone surrogate differently on Windows XP and older. It changed in Windows Vista:
https://unicodebook.readthedocs.io/operating_systems.html#encode-and-decode-functions

Steve Dower removed support for Vista from test_codecs.py 3 years ago:

commit f5aba58480bb0dd45181f609487ac2ecfcc98673
Author: Steve Dower <steve.dower@microsoft.com>
Date:   Tue Sep 6 19:42:27 2016 -0700

    Issue #27959: Adds oem encoding, alias ansi to mbcs, move aliasmbcs to codec lookup

Maybe it's time to remove Lib/encodings/cp65001.py and add an alias cp65001 => utf_8 in Lib/encodings/aliases.py? See bpo-32592.
msg341570 - Author: Paul Monson (Paul Monson) * Date: 2019-05-06 17:32
> Okay. The test verifies work done to minimize interpreter startup time, but probably the relative cost of importing functools (and thus collections et al) isn't significant compared to the overall cost of spawning a process in a Windows desktop environment. That may not be the case for Nano Server and IoT Core.

Is there an easy way to measure this?

> PYTHONIOENCODING=cp65001

I tried setting PYTHONIOENCODING=cp1252 on Windows IoT Core as a workaround and it didn't work.

Victor> My PR 13110 avoids "import functools" at startup. Can you please try it and check if it fix test_site?

I tried the PR and it fixes test_startup_imports, which seems promising. However, the PR breaks other test_site tests on Windows IoT Core, the same ones you pointed out in the PR discussion.
msg341671 - Author: Inada Naoki (inada.naoki) * (Python committer) Date: 2019-05-07 03:06
FYI, I expect cp65001 will be used more widely in near future,
because a non-UTF-8 default encoding hurts Developer eXperience (DX),
and Microsoft has been trying to improve DX in recent years.

Today, Microsoft announced a new Terminal application.
It seems use `SetConsoleOutputCP(65001)` and `SetConsoleCP(65001)`.

I think treating cp65001 as the proper "UTF-8" locale is better for all
Windows developers.
msg341924 - Author: Paul Monson (Paul Monson) * Date: 2019-05-08 18:27
cp65001 is the default codepage on Windows IoT Core and Windows NanoServer.  

There is also an option in Control Panel on Windows 10 desktop version 1809 (OS build 17763) and later which changes the default codepage to cp65001.
1. Run control.exe
2. Click "Clock and Region" > "Change date, time, or number formats"
3. Click the "Administrative" tab
4. Click the "Change system locale..." button
5. Check "Beta: Use Unicode UTF-8 for worldwide language support"
6. Click OK twice.
7. You will be prompted to reboot.
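After the reboot, one way to confirm the setting took effect is to ask Windows for the ANSI code page. This sketch calls the Win32 GetACP function through ctypes (the same value Lib/encodings/__init__.py reads via the private `_winapi.GetACP()`); `ansi_code_page` is a hypothetical helper that returns None on other platforms.

```python
import sys

CP_UTF8 = 65001  # Windows code page identifier for UTF-8


def ansi_code_page():
    """Return the ANSI code page reported by GetACP(), or None off Windows."""
    if sys.platform != "win32":
        return None
    import ctypes  # ctypes.windll only exists on Windows
    return ctypes.windll.kernel32.GetACP()


if __name__ == "__main__":
    acp = ansi_code_page()
    print("ANSI code page:", acp, "(UTF-8)" if acp == CP_UTF8 else "")
```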

> Code page 65001 handles lone surrogate differently on Windows XP and older.

If I read the docs correctly, a lone surrogate is an error.  I don't think a corner case like handling errors differently makes cp65001 not UTF-8.  Am I misunderstanding this point?
Also, why is Windows XP still relevant in this discussion?
msg341926 - Author: Steve Dower (steve.dower) * (Python committer) Date: 2019-05-08 18:38
The XP/Vista change is just context - we don't have to worry about OS that old any more.

If we remove the functools.partial call, does that help?
msg341947 - Author: Paul Monson (Paul Monson) * Date: 2019-05-08 21:19
Removing import functools from cp65001.py fixes test_startup_imports.

Victor proposed this PR: https://github.com/python/cpython/pull/13110
but the new test_codecs fails, I think because it's passing self on to the lambda.
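That suspicion matches how classes treat plain functions. codecs.BufferedIncrementalDecoder calls `self._buffer_decode(data, self.errors, final)`, and a lambda (or def) stored on the class becomes a bound method, so the instance arrives as the first argument. functools.partial objects were not descriptors in the Python versions discussed here, which is why the partial-based cp65001.py did not hit this. Wrapping the replacement in staticmethod() avoids the binding. Minimal sketch; `fake_code_page_decode` is a hypothetical stand-in for the Windows-only codecs.code_page_decode.

```python
import codecs


def fake_code_page_decode(code_page, data, errors="strict", final=False):
    """Hypothetical stand-in for codecs.code_page_decode (Windows-only).

    Decodes as UTF-8 and returns (text, bytes_consumed). It ignores
    `final`, so it is only suitable for complete input.
    """
    return data.decode("utf-8", errors), len(data)


class BrokenDecoder(codecs.BufferedIncrementalDecoder):
    # A lambda stored on the class becomes a method: the instance arrives
    # first, so the call receives one positional argument too many.
    _buffer_decode = lambda data, errors, final: fake_code_page_decode(
        65001, data, errors, final)


class FixedDecoder(codecs.BufferedIncrementalDecoder):
    # staticmethod() prevents binding, so only (data, errors, final) arrive.
    _buffer_decode = staticmethod(
        lambda data, errors, final: fake_code_page_decode(65001, data, errors, final))


try:
    BrokenDecoder().decode(b"abc", True)
except TypeError as exc:
    print("broken:", exc)
print("fixed:", FixedDecoder().decode(b"abc", True))  # fixed: abc
```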

I tried to build on Victor's change but there is still one test failure I haven't tracked down yet: https://github.com/python/cpython/pull/13211

FAIL: test_incremental_surrogatepass (test.test_codecs.CP65001Test)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "C:\master\pythond\lib\test\test_codecs.py", line 436, in test_incremental_surrogatepass
    self.assertEqual(dec.decode(data[i:], True), '\uD901')
AssertionError: '' != '\ud901'
+ \ud901
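For context, the check that fails here feeds the surrogatepass encoding of the lone high surrogate U+D901 to a fresh incremental decoder in two pieces, expecting nothing back from the first piece and the surrogate once the rest arrives with final=True. The sketch below runs the same steps against Python's built-in utf-8 codec as a reference for what a cp65001 decoder should do; it is reconstructed from the traceback, not copied from test_codecs.

```python
import codecs

# U+D901 encoded with surrogatepass is the 3-byte sequence ED A4 81.
data = "\ud901".encode("utf-8", "surrogatepass")

for i in range(1, len(data)):
    dec = codecs.getincrementaldecoder("utf-8")("surrogatepass")
    head = dec.decode(data[:i])          # incomplete sequence: buffered, ''
    tail = dec.decode(data[i:], True)    # final=True emits the surrogate
    print(i, repr(head), repr(tail))
```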
msg341955 - Author: Eryk Sun (eryksun) * (Python triager) Date: 2019-05-09 03:31
> FYI, I expect cp65001 will be used more widely in near future,
[...]
> It seems use `SetConsoleOutputCP(65001)` and `SetConsoleCP(65001)`.

Unless PYTHONLEGACYWINDOWSSTDIO is defined, Python 3.6+ doesn't use the console's codepage-based interface (except for low-level os.read and os.write). Console files use the wide-character console API internally, and have a "utf-8" encoding. "cp65001" isn't a factor in this context.

This issue probably occurs due to the encoding returned by locale.getpreferredencoding(). This calls _locale._getdefaultlocale, which returns a tuple that mixes the user locale with the system ANSI codepage. For example, with ANSI set to UTF-8 (Windows 10):

    >>> _locale._getdefaultlocale()
    ('en_GB', 'cp65001')

The Universal CRT special cases CP_UTF8 (codepage 65001) as "utf8" and accepts "utf-8" as an alias. For example, after setting the ANSI codepage to UTF-8:

    >>> locale.setlocale(locale.LC_CTYPE, '')
    'English_United Kingdom.utf8'

Python could similarly special case CP_UTF8 as "utf-8" in _locale._getdefaultlocale.
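A sketch of what that special case could look like, written in Python for readability. The real function is C code in Modules/_localemodule.c, this is only the proposal (not what CPython shipped), and `encoding_from_code_page` is a hypothetical name.

```python
CP_UTF8 = 65001  # Windows code page identifier for UTF-8


def encoding_from_code_page(code_page: int) -> str:
    """Map a Windows code page number to a Python encoding name.

    Hypothetical helper mirroring the proposal: report CP_UTF8 as "utf-8"
    (as the Universal CRT reports "utf8") instead of "cp65001".
    """
    if code_page == CP_UTF8:
        return "utf-8"
    return f"cp{code_page}"
```

On Windows the input would come from GetACP(), for example `encoding_from_code_page(_winapi.GetACP())`.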
msg341968 - Author: Inada Naoki (inada.naoki) * (Python committer) Date: 2019-05-09 05:42
@Eryk I didn't say the new Terminal will cause this issue.  I know about ConsoleIO too.

I just meant that Microsoft uses cp65001 more widely for better UTF-8 support nowadays.
So I want to make cp65001 an alias of UTF-8.


> Python could similarly special case CP_UTF8 as "utf-8" in _locale._getdefaultlocale.

I like this idea too.
msg342004 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-05-09 23:07
I wrote PR 13230 to remove Lib/encodings/cp65001.py and simply reuse Lib/encodings/utf_8.py.
msg342006 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-05-09 23:11
My PR 13110 (avoid functools) makes codecs.lookup('cp65001').encode() 2.7x slower:
https://github.com/python/cpython/pull/13110#issuecomment-491095964
417 ns +- 17 ns

My PR 13230 (remove cp65001.py) makes it 1.5x faster :-)
https://github.com/python/cpython/pull/13230#issuecomment-491099012
105 ns +- 3 ns

The reference is: 156 ns +- 3 ns.
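A comparable micro-benchmark can be sketched with the standard timeit module. The input string and iteration counts are assumptions, and the measurement includes the lambda call overhead, so absolute timings will not match the numbers above; only the relative comparison between codec names is meaningful. On Python 3.8+ 'cp65001' can be looked up on any platform.

```python
import codecs
import timeit


def ns_per_call(encoding, text="abc", number=200_000, repeat=5):
    """Best-of-`repeat` time in nanoseconds for one encode() call."""
    encode = codecs.lookup(encoding).encode
    # Timing a lambda adds a small constant per call to every result.
    best = min(timeit.repeat(lambda: encode(text), number=number, repeat=repeat))
    return best / number * 1e9


if __name__ == "__main__":
    for name in ("utf-8", "cp65001"):
        print(f"{name:8} {ns_per_call(name):6.1f} ns")
```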
msg342008 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-05-09 23:13
> Python could similarly special case CP_UTF8 as "utf-8" in _locale._getdefaultlocale.

I dislike lying in the locale module. This change is basically useless with my PR 13230.
msg342010 - Author: Eryk Sun (eryksun) * (Python triager) Date: 2019-05-09 23:34
> I dislike lying in the locale module. This change is basically useless 
> with my PR 13230.

Yes, functionally it's no different than using 'cp65001' as an alias. That said, the CRT special cases 65001 as "utf8":

    >>> locale.setlocale(locale.LC_CTYPE, '')
    'English_United Kingdom.utf8'
    >>> crt_locale = ctypes.CDLL('api-ms-win-crt-locale-l1-1-0', use_errno=True)
    >>> crt_locale.___lc_codepage_func()
    65001

So the suggested change makes the locale module internally consistent on Windows and more transparent for anyone who doesn't know off the top of their head that "cp65001" is just UTF-8.
msg342019 - Author: Paul Monson (Paul Monson) * Date: 2019-05-10 00:30
I can verify that PR 13110 fixes the issue with test_startup_imports on Windows IoT Core ARM32
msg342020 - Author: Paul Monson (Paul Monson) * Date: 2019-05-10 00:31
Sorry that was supposed to say:
I can verify that PR 13230 fixes the issue with test_startup_imports on Windows IoT Core ARM32
msg342025 - Author: Inada Naoki (inada.naoki) * (Python committer) Date: 2019-05-10 01:12
> I dislike lying in the locale module. This change is basically useless with my PR 13230.

Note that Python produces the "cpNNN" encoding name, not Windows.
https://github.com/python/cpython/blob/137be34180a20dba53948d126b961069f299f153/Modules/_localemodule.c#L395

So I don't think it is lie.  It is just "what encoding name we should choose when GetACP() returned 65001.".
With your PR 13230, cp65001 is truly utf-8.  So returning "utf-8" seems right behavior.
msg342027 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-05-10 01:19
New changeset d267ac20c309e37d85a986b4417aa8ab4d05dabc by Victor Stinner in branch 'master':
bpo-36778: cp65001 encoding becomes an alias to utf_8 (GH-13230)
https://github.com/python/cpython/commit/d267ac20c309e37d85a986b4417aa8ab4d05dabc
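With this change (Python 3.8+), 'cp65001' is resolved through the alias table to the utf_8 codec on every platform, not only on Windows. A quick check, assuming Python 3.8 or newer:

```python
import codecs

info = codecs.lookup("cp65001")
print(info.name)  # utf-8

# Both names now produce identical bytes, including for non-BMP characters.
text = "\u20ac \U0001F600"
assert text.encode("cp65001") == text.encode("utf-8")
```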
msg342029 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-05-10 01:23
About the ANSI code page, Lib/encodings/__init__.py calls _winapi.GetACP() to avoid relying on locale.getpreferredencoding() which lies when UTF-8 Mode is enabled:

            import _winapi
            ansi_code_page = "cp%s" % _winapi.GetACP()
            if encoding == ansi_code_page:
                import encodings.mbcs
                return encodings.mbcs.getregentry()
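The difference is easy to observe: with UTF-8 Mode turned on (-X utf8, Python 3.7+), locale.getpreferredencoding() reports UTF-8 regardless of the system code page, while GetACP() still returns the real ANSI code page. The sketch below checks the first half in a child interpreter and normalizes the returned name through codecs.lookup() rather than comparing its spelling literally; `preferred_encoding` is a hypothetical helper, and `_winapi` is private and Windows-only.

```python
import codecs
import subprocess
import sys


def preferred_encoding(*xoptions):
    """locale.getpreferredencoding(False) as seen by a child interpreter."""
    cmd = [sys.executable, "-I"]
    for option in xoptions:
        cmd += ["-X", option]
    cmd += ["-c", "import locale; print(locale.getpreferredencoding(False))"]
    out = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return out.stdout.strip()


if __name__ == "__main__":
    print("default:  ", preferred_encoding())
    print("-X utf8:  ", codecs.lookup(preferred_encoding("utf8")).name)
    if sys.platform == "win32":
        import _winapi  # private module; Windows only
        print("GetACP():  cp%s" % _winapi.GetACP())
```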

INADA-san:
> So I don't think it is lie.  It is just "what encoding name we should choose when GetACP() returned 65001.".
> With your PR 13230, cp65001 is truly utf-8.  So returning "utf-8" seems right behavior.

Well, feel free to propose a PR. I have no strong opinion on this level of detail :-)
msg342032 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-05-10 01:27
Paul Monson: Your initial issue has been fixed in the master branch.

I'm not sure what Windows IoT Core and Windows Nano Server are. Do you care about Python 3.7? If someone wants to support running test_site with the ANSI code page set to 65001, I suggest fixing test_site directly, like PR 13072, in Python 3.7. My attempt to avoid functools made the cp65001 codec way slower. Fixing one specific test should not make Python that much slower ;-)
msg342047 - Author: Paul Monson (Paul Monson) * Date: 2019-05-10 02:38
Thanks Victor!  Since we aren't backporting ARM32 changes, I don't think it's important to fix this test in 3.7.  I am trying to get the buildbot tests for Windows ARM32 to zero errors.

Windows IoT Core runs on Raspberry Pi and similar devices: https://developer.microsoft.com/en-us/windows/iot

Windows NanoServer is a very small version of Windows Server for running in Docker containers hosted on Windows Server.
msg342053 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-05-10 03:30
> Since we aren't backporting ARM32 changes, I don't think it's important to fix this test in 3.7.  I am trying to get the buildbot tests for Windows ARM32 to zero errors.

Ok, thanks. I'm closing the issue.
msg342290 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-05-13 08:42
New changeset 3aef48e3157f52a8bcdbacf47a35d0016348735e by Victor Stinner in branch 'master':
bpo-36778: Update cp65001 codec documentation (GH-13240)
https://github.com/python/cpython/commit/3aef48e3157f52a8bcdbacf47a35d0016348735e
msg344588 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-06-04 15:09
New changeset ca612a9728b83472d9d286bbea74972d426ed344 by Victor Stinner in branch 'master':
bpo-36778: Remove outdated comment from CodePageTest (GH-13807)
https://github.com/python/cpython/commit/ca612a9728b83472d9d286bbea74972d426ed344
History
Date | User | Action | Args
2019-06-04 15:09:15 | vstinner | set | messages: + msg344588
2019-06-04 14:14:03 | vstinner | set | pull_requests: + pull_request13692
2019-05-13 08:42:36 | vstinner | set | messages: + msg342290
2019-05-10 23:35:32 | vstinner | set | pull_requests: + pull_request13150
2019-05-10 03:30:36 | vstinner | set | status: open -> closed; resolution: fixed; messages: + msg342053; stage: patch review -> resolved
2019-05-10 02:38:36 | Paul Monson | set | messages: + msg342047
2019-05-10 01:27:05 | vstinner | set | messages: + msg342032
2019-05-10 01:23:50 | vstinner | set | messages: + msg342029
2019-05-10 01:19:56 | vstinner | set | messages: + msg342027
2019-05-10 01:12:16 | inada.naoki | set | messages: + msg342025
2019-05-10 00:31:33 | Paul Monson | set | messages: + msg342020
2019-05-10 00:30:38 | Paul Monson | set | messages: + msg342019
2019-05-09 23:34:07 | eryksun | set | messages: + msg342010
2019-05-09 23:13:41 | vstinner | set | messages: + msg342008
2019-05-09 23:11:53 | vstinner | set | messages: + msg342006
2019-05-09 23:07:54 | vstinner | set | messages: + msg342004
2019-05-09 23:07:24 | vstinner | set | pull_requests: + pull_request13138
2019-05-09 05:42:15 | inada.naoki | set | messages: + msg341968
2019-05-09 03:31:49 | eryksun | set | messages: + msg341955
2019-05-08 21:19:21 | Paul Monson | set | messages: + msg341947
2019-05-08 21:13:37 | Paul Monson | set | pull_requests: + pull_request13122
2019-05-08 18:38:36 | steve.dower | set | messages: + msg341926
2019-05-08 18:27:22 | Paul Monson | set | messages: + msg341924
2019-05-07 03:06:46 | inada.naoki | set | messages: + msg341671
2019-05-06 17:32:56 | Paul Monson | set | messages: + msg341570
2019-05-06 16:47:24 | Paul Monson | set | title: test_site.StartupImportTests.test_startup_imports fails if default code page is not cp1252 -> test_site.StartupImportTests.test_startup_imports fails if default code page is cp65001
2019-05-06 15:31:33 | vstinner | set | messages: + msg341531
2019-05-06 15:04:08 | vstinner | set | messages: + msg341520
2019-05-06 15:03:00 | vstinner | set | pull_requests: + pull_request13025
2019-05-04 20:15:39 | eryksun | set | messages: + msg341401
2019-05-04 13:36:42 | serhiy.storchaka | set | nosy: + serhiy.storchaka; messages: + msg341383
2019-05-04 07:35:12 | eryksun | set | nosy: + eryksun; messages: + msg341377
2019-05-04 04:26:34 | vstinner | set | messages: + msg341372
2019-05-03 06:12:18 | inada.naoki | set | nosy: + vstinner; messages: + msg341326
2019-05-03 04:46:30 | Paul Monson | set | messages: + msg341324
2019-05-03 02:13:18 | inada.naoki | set | nosy: + inada.naoki; messages: + msg341323
2019-05-03 01:23:14 | Paul Monson | set | pull_requests: + pull_request12988
2019-05-02 20:48:32 | Paul Monson | set | keywords: + patch; stage: patch review; pull_requests: + pull_request12986
2019-05-02 20:46:25 | Paul Monson | create |