classification
Title: Using _multibytecodec module on Windows, test_threading/embed get failure
Type: behavior Stage: resolved
Components: Extension Modules, Subinterpreters, Tests, Windows Versions: Python 3.10
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: corona10, erlendaasland, neonene, paul.moore, petr.viktorin, shihai1991, steve.dower, tim.golden, vstinner, zach.ware
Priority: normal Keywords: patch

Created on 2021-01-06 22:38 by neonene, last changed 2021-01-08 09:30 by vstinner. This issue is now closed.

Files
File name Uploaded Description
bug.py vstinner, 2021-01-07 22:18
Pull Requests
URL Status Linked
PR 24157 merged vstinner, 2021-01-07 22:45
Messages (12)
msg384541 - (view) Author: neonene (neonene) * Date: 2021-01-06 22:38
After https://github.com/python/cpython/commit/0b858cdd5d114f0890b11b6c4d6559d0ceb468ab
(bpo-1635741: Convert _multibytecodec to multi-phase init),

On Windows x64/x86 with a Chinese/Japanese/Korean system locale,
MultibyteCodec_Check() in multibytecodec.c returns false and
a TypeError is raised. This affects some tests and PGO training.



1) python -m test --verbose test_threading

======================================================================
FAIL: test_daemon_threads_fatal_error (test.test_threading.SubinterpThreadingTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "C:\cpython-0b858\lib\test\test_threading.py", line 1124, in test_daemon_threads_fatal_error
    self.assertIn("Fatal Python error: Py_EndInterpreter: "
AssertionError: 'Fatal Python error: Py_EndInterpreter: not the last thread' not found in 'TypeError: codec is unexpected type\nFatal Python error: _PyThreadState_Delete: tstate 00000000003FF980 is still current\nPython runtime state: initialized\n\nThread 0x00000710 (most recent call first):\n<no Python frame>\n'



2) python -m test --verbose test_embed

======================================================================
FAIL: test_audit_subinterpreter (test.test_embed.AuditingTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "C:\cpython-0b858\lib\test\test_embed.py", line 1433, in test_audit_subinterpreter
    self.run_embedded_interpreter("test_audit_subinterpreter")
  File "C:\cpython-0b858\lib\test\test_embed.py", line 104, in run_embedded_interpreter
    self.assertEqual(p.returncode, returncode,
AssertionError: 3221225477 != 0 : bad returncode 3221225477, stderr is 'TypeError: codec is unexpected type\nFatal Python error: _PyThreadState_Delete: tstate 000000000050CAF0 is still current\nPython runtime state: initialized\n\nThread 0x000009d8 (most recent call first):\n<no Python frame>\n'

======================================================================
FAIL: test_subinterps_different_ids (test.test_embed.EmbeddingTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "C:\cpython-0b858\lib\test\test_embed.py", line 169, in test_subinterps_different_ids
    for run in self.run_repeated_init_and_subinterpreters():
  File "C:\cpython-0b858\lib\test\test_embed.py", line 110, in run_repeated_init_and_subinterpreters
    out, err = self.run_embedded_interpreter("test_repeated_init_and_subinterpreters")
  File "C:\cpython-0b858\lib\test\test_embed.py", line 104, in run_embedded_interpreter
    self.assertEqual(p.returncode, returncode,
AssertionError: 3221225477 != 0 : bad returncode 3221225477, stderr is 'TypeError: codec is unexpected type\nFatal Python error: _PyThreadState_Delete: tstate 000000000041C960 is still current\nPython runtime state: initialized\n\nThread 0x00000a40 (most recent call first):\n<no Python frame>\n'

======================================================================
FAIL: test_subinterps_distinct_state (test.test_embed.EmbeddingTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "C:\cpython-0b858\lib\test\test_embed.py", line 177, in test_subinterps_distinct_state
    for run in self.run_repeated_init_and_subinterpreters():
  File "C:\cpython-0b858\lib\test\test_embed.py", line 110, in run_repeated_init_and_subinterpreters
    out, err = self.run_embedded_interpreter("test_repeated_init_and_subinterpreters")
  File "C:\cpython-0b858\lib\test\test_embed.py", line 104, in run_embedded_interpreter
    self.assertEqual(p.returncode, returncode,
AssertionError: 3221225477 != 0 : bad returncode 3221225477, stderr is 'TypeError: codec is unexpected type\nFatal Python error: _PyThreadState_Delete: tstate 000000000047C960 is still current\nPython runtime state: initialized\n\nThread 0x00000b34 (most recent call first):\n<no Python frame>\n'

======================================================================
FAIL: test_subinterps_main (test.test_embed.EmbeddingTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "C:\cpython-0b858\lib\test\test_embed.py", line 163, in test_subinterps_main
    for run in self.run_repeated_init_and_subinterpreters():
  File "C:\cpython-0b858\lib\test\test_embed.py", line 110, in run_repeated_init_and_subinterpreters
    out, err = self.run_embedded_interpreter("test_repeated_init_and_subinterpreters")
  File "C:\cpython-0b858\lib\test\test_embed.py", line 104, in run_embedded_interpreter
    self.assertEqual(p.returncode, returncode,
AssertionError: 3221225477 != 0 : bad returncode 3221225477, stderr is 'TypeError: codec is unexpected type\nFatal Python error: _PyThreadState_Delete: tstate 000000000032C960 is still current\nPython runtime state: initialized\n\nThread 0x00000bf0 (most recent call first):\n<no Python frame>\n'
msg384607 - (view) Author: Erlend Egeberg Aasland (erlendaasland) * Date: 2021-01-07 21:52
I'm unable to reproduce this on Windows 10 (amd64). What's your exact locale setting? Are you compiling with HEAD at 0b858cdd5d114f0890b11b6c4d6559d0ceb468ab?
msg384610 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-01-07 22:18
I can reproduce the issue on Windows configured for the Japanese language: ANSI code page cp932.

I managed to reproduce the bug on Linux with the attached bug.py.
msg384611 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-01-07 22:21
It took me a while to understand it: the _multibytecodec module itself is fine. The issue comes from the _codecs_jp module, which uses the legacy (single-phase) module initialization API:

    codec = _codecs_jp.getcodec('cp932')
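
For illustration, a minimal sketch of what that line yields, assuming a standard CPython build that ships the `_codecs_jp` extension module. The helper name `describe_codec` is made up for this example and is not part of CPython:

```python
import _codecs_jp  # CPython-specific C extension holding the Japanese codecs


def describe_codec(name):
    """Return the codec object's type name and a round-trip result.

    `describe_codec` is a hypothetical helper written for this example.
    """
    codec = _codecs_jp.getcodec(name)
    # encode() returns (bytes, number of characters consumed)
    encoded, consumed = codec.encode("日本語")
    # decode() returns (str, number of bytes consumed)
    decoded, _ = codec.decode(encoded)
    return type(codec).__name__, consumed, decoded


print(describe_codec("cp932"))
```

The object returned by `getcodec()` is the one whose type is checked by MultibyteCodec_Check() mentioned in the original report.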
msg384613 - (view) Author: Erlend Egeberg Aasland (erlendaasland) * Date: 2021-01-07 22:25
It should be sufficient to convert cjkcodecs.h to multi-phase init then? From what I can see, the support modules are stateless, right?
msg384616 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-01-07 22:36
I'm working on a fix.
msg384618 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-01-07 23:05
Attached PR 24157 should fix the issue.

> FAIL: test_daemon_threads_fatal_error (test.test_threading.SubinterpThreadingTests)

This test runs code in a subinterpreter which is run in a subprocess. The problem is not in the code run in the subinterpreter, but in the creation of sys.stdout in the subprocess.

The test creates a subprocess and redirects its stdout and stderr. In this case, Python doesn't create a _io._WindowsConsoleIO for sys.stdout.buffer.raw, but a regular _io.FileIO object. When the raw I/O is a _WindowsConsoleIO instance, create_stdio() of Python/pylifecycle.c forces the usage of the UTF-8 encoding. But for FileIO, it keeps the locale encoding.

If the locale encoding is "cp932", a CJK multibyte codec is used. In the main interpreter, it's fine. In a subinterpreter, we hit the bug in _codecs_jp, which doesn't use the new multi-phase initialization API.
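
To check which case applies in a given process, something like the following sketch may help. It assumes CPython (the `_multibytecodec` module is CPython-specific), and the helper names `uses_multibytecodec` and `describe_stdout` are invented for this example:

```python
import codecs
import sys

import _multibytecodec  # CPython-specific; backs the CJK codecs


def uses_multibytecodec(encoding):
    """Return True if `encoding` is implemented on top of _multibytecodec."""
    encoder_cls = codecs.lookup(encoding).incrementalencoder
    return (isinstance(encoder_cls, type)
            and issubclass(encoder_cls,
                           _multibytecodec.MultibyteIncrementalEncoder))


def describe_stdout():
    """Report the raw I/O class behind sys.stdout and its encoding."""
    buffer = getattr(sys.stdout, "buffer", None)
    raw = getattr(buffer, "raw", None)
    encoding = getattr(sys.stdout, "encoding", None)
    return {
        # e.g. _WindowsConsoleIO on a Windows console, FileIO when redirected
        "raw_type": type(raw).__name__ if raw is not None else None,
        "encoding": encoding,
        "multibyte": bool(encoding) and uses_multibytecodec(encoding),
    }


print(describe_stdout(), file=sys.stderr)
```

If `multibyte` is true for the stdio encoding, creating sys.stdout goes through one of the CJK codec modules, which is the path described above.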
msg384619 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-01-07 23:08
Simpler way to reproduce the issue with t.py script:
---
import test.support
import sys

import _testcapi

print(f"{sys.stdout.encoding=}", file=sys.stderr)

with test.support.SuppressCrashReport():
    _testcapi.run_in_subinterp("pass")
---

By default, UTF-8 is used, everything is fine:
-----
C:\> python t.py
sys.stdout.encoding='utf-8'
-----

Disable _WindowsConsoleIO with the PYTHONLEGACYWINDOWSSTDIO env var and we get the issue:
-----
C:\> set PYTHONLEGACYWINDOWSSTDIO=1

C:\> python t.py
Running Debug|x64 interpreter...
sys.stdout.encoding='cp932'
TypeError: codec is unexpected type
Fatal Python error: (...)
-----

Or redirect the output into a program or a file, which also disables _WindowsConsoleIO and reproduces the issue:
-----
C:\> python t.py|more
sys.stdout.encoding='cp932'
TypeError: codec is unexpected type
(...)
-----
msg384620 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-01-07 23:12
Ah, if you don't want to change the ANSI code page to cp932 (Japanese language) just to reproduce the issue, you can just set the stdio encoding:
-----
C:\> set PYTHONIOENCODING=cp932
C:\> python t.py|more
sys.stdout.encoding='cp932'

TypeError: codec is unexpected type
(...)
-----
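
The same environment-variable trick can be scripted on any platform. This is only a sketch: the helper name `child_stdout_encoding` is made up here, and it checks the encoding the child reports rather than reproducing the crash itself:

```python
import os
import subprocess
import sys


def child_stdout_encoding(io_encoding):
    """Start a child Python with PYTHONIOENCODING set and return the
    sys.stdout.encoding it reports (hypothetical helper for this example)."""
    env = dict(os.environ, PYTHONIOENCODING=io_encoding)
    proc = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.stdout.encoding)"],
        env=env,
        stdout=subprocess.PIPE,  # a pipe, so no console I/O object is involved
        check=True,
    )
    # The encoding name is plain ASCII, so it reads the same under a CJK codec.
    return proc.stdout.decode("ascii").strip()


print(child_stdout_encoding("cp932"))
```

With PYTHONIOENCODING=cp932 the child is expected to report cp932, whatever the ANSI code page or locale of the machine is.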
msg384621 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-01-07 23:15
New changeset 07f2cee93f1b619650403981c455f47bfed8d818 by Victor Stinner in branch 'master':
bpo-42846: Convert CJK codec extensions to multiphase init (GH-24157)
https://github.com/python/cpython/commit/07f2cee93f1b619650403981c455f47bfed8d818
msg384622 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-01-07 23:18
> 1) python -m test --verbose test_threading
> 2) python -m test --verbose test_embed

I manually ran these two tests with the cp932 ANSI code page: they now pass with my fix.

I also added a regression test to test_multibytecodec.py.
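
As a rough idea of what such a check can look like (this is not the test that was merged, only a sketch assuming the internal `_testcapi` module is installed; the function name `cjk_codec_works_in_subinterp` is invented for this example):

```python
import textwrap


def cjk_codec_works_in_subinterp(encoding="cp932"):
    """Round-trip text through `encoding` inside a subinterpreter.

    Returns True or False, or None when the test helper is unavailable.
    """
    try:
        import _testcapi  # internal CPython test module; may not be installed
    except ImportError:
        return None
    run_in_subinterp = getattr(_testcapi, "run_in_subinterp", None)
    if run_in_subinterp is None:
        return None
    code = textwrap.dedent(f"""
        import codecs
        text = "Python 3.10"
        encoder = codecs.getincrementalencoder({encoding!r})()
        if encoder.encode(text, True).decode({encoding!r}) != text:
            raise ValueError("round trip failed")
    """)
    # run_in_subinterp() returns 0 when the code ran without raising
    return run_in_subinterp(code) == 0


print(cjk_codec_works_in_subinterp())
```

On a build without the fix, the subinterpreter is where the "codec is unexpected type" error would be expected to show up.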

Thanks for your quick bug report neonene! It's now fixed.
msg384644 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-01-08 09:30
> bpo-42846: Convert CJK codec extensions to multiphase init (GH-24157)

I added a new test and the new test spotted a reference leak, likely an existing one: bpo-42866 "test test_multibytecodec: Test_IncrementalEncoder.test_subinterp() leaks references".
History
Date User Action Args
2021-01-08 09:30:21 vstinner set messages: + msg384644
2021-01-07 23:18:25 vstinner set status: open -> closed; resolution: fixed; messages: + msg384622; stage: patch review -> resolved
2021-01-07 23:15:29 vstinner set messages: + msg384621
2021-01-07 23:13:00 vstinner set messages: + msg384620
2021-01-07 23:08:48 vstinner set messages: + msg384619
2021-01-07 23:05:41 vstinner set messages: + msg384618
2021-01-07 22:45:55 vstinner set keywords: + patch; stage: patch review; pull_requests: + pull_request22985
2021-01-07 22:36:31 vstinner set messages: + msg384616
2021-01-07 22:25:30 erlendaasland set messages: + msg384613
2021-01-07 22:21:36 vstinner set messages: + msg384611
2021-01-07 22:18:28 vstinner set files: + bug.py; messages: + msg384610
2021-01-07 21:52:26 erlendaasland set messages: + msg384607
2021-01-06 22:38:22 neonene create