Using _multibytecodec module on Windows, test_threading/embed get failure #87012

Closed
neonene mannequin opened this issue Jan 6, 2021 · 12 comments
Labels
3.10, extension-modules, OS-windows, tests, topic-subinterpreters, type-bug

Comments

@neonene
Mannequin

neonene mannequin commented Jan 6, 2021

BPO 42846
Nosy @pfmoore, @vstinner, @tjguk, @encukou, @zware, @zooba, @corona10, @shihai1991, @neonene, @erlend-aasland
PRs
  • bpo-42846: Convert CJK codec extensions to multiphase init #24157
  • Files
  • bug.py
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = None
    closed_at = <Date 2021-01-07.23:18:25.235>
    created_at = <Date 2021-01-06.22:38:21.954>
    labels = ['extension-modules', 'type-bug', '3.10', 'expert-subinterpreters', 'tests', 'OS-windows']
    title = 'Using _multibytecodec module on Windows, test_threading/embed get failure'
    updated_at = <Date 2021-01-08.09:30:21.436>
    user = 'https://github.com/neonene'

    bugs.python.org fields:

    activity = <Date 2021-01-08.09:30:21.436>
    actor = 'vstinner'
    assignee = 'none'
    closed = True
    closed_date = <Date 2021-01-07.23:18:25.235>
    closer = 'vstinner'
    components = ['Extension Modules', 'Tests', 'Windows', 'Subinterpreters']
    creation = <Date 2021-01-06.22:38:21.954>
    creator = 'neonene'
    dependencies = []
    files = ['49727']
    hgrepos = []
    issue_num = 42846
    keywords = ['patch']
    message_count = 12.0
    messages = ['384541', '384607', '384610', '384611', '384613', '384616', '384618', '384619', '384620', '384621', '384622', '384644']
    nosy_count = 10.0
    nosy_names = ['paul.moore', 'vstinner', 'tim.golden', 'petr.viktorin', 'zach.ware', 'steve.dower', 'corona10', 'shihai1991', 'neonene', 'erlendaasland']
    pr_nums = ['24157']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue42846'
    versions = ['Python 3.10']

    @neonene
    Mannequin Author

    neonene mannequin commented Jan 6, 2021

    After 0b858cd
    (bpo-1635741: Convert _multibytecodec to multi-phase init),
    on Windows x64/x86 with a Chinese/Japanese/Korean system locale,
    MultibyteCodec_Check() in multibytecodec.c returns false and
    a TypeError is raised. This affects some tests and PGO training.

    1. python -m test --verbose test_threading

    ======================================================================
    FAIL: test_daemon_threads_fatal_error (test.test_threading.SubinterpThreadingTests)
    ----------------------------------------------------------------------

    Traceback (most recent call last):
      File "C:\cpython-0b858\lib\test\test_threading.py", line 1124, in test_daemon_threads_fatal_error
        self.assertIn("Fatal Python error: Py_EndInterpreter: "
    AssertionError: 'Fatal Python error: Py_EndInterpreter: not the last thread' not found in 'TypeError: codec is unexpected type\nFatal Python error: _PyThreadState_Delete: tstate 00000000003FF980 is still current\nPython runtime state: initialized\n\nThread 0x00000710 (most recent call first):\n<no Python frame>\n'

    2. python -m test --verbose test_embed

    ======================================================================
    FAIL: test_audit_subinterpreter (test.test_embed.AuditingTests)
    ----------------------------------------------------------------------

    Traceback (most recent call last):
      File "C:\cpython-0b858\lib\test\test_embed.py", line 1433, in test_audit_subinterpreter
        self.run_embedded_interpreter("test_audit_subinterpreter")
      File "C:\cpython-0b858\lib\test\test_embed.py", line 104, in run_embedded_interpreter
        self.assertEqual(p.returncode, returncode,
    AssertionError: 3221225477 != 0 : bad returncode 3221225477, stderr is 'TypeError: codec is unexpected type\nFatal Python error: _PyThreadState_Delete: tstate 000000000050CAF0 is still current\nPython runtime state: initialized\n\nThread 0x000009d8 (most recent call first):\n<no Python frame>\n'

    ======================================================================
    FAIL: test_subinterps_different_ids (test.test_embed.EmbeddingTests)
    ----------------------------------------------------------------------

    Traceback (most recent call last):
      File "C:\cpython-0b858\lib\test\test_embed.py", line 169, in test_subinterps_different_ids
        for run in self.run_repeated_init_and_subinterpreters():
      File "C:\cpython-0b858\lib\test\test_embed.py", line 110, in run_repeated_init_and_subinterpreters
        out, err = self.run_embedded_interpreter("test_repeated_init_and_subinterpreters")
      File "C:\cpython-0b858\lib\test\test_embed.py", line 104, in run_embedded_interpreter
        self.assertEqual(p.returncode, returncode,
    AssertionError: 3221225477 != 0 : bad returncode 3221225477, stderr is 'TypeError: codec is unexpected type\nFatal Python error: _PyThreadState_Delete: tstate 000000000041C960 is still current\nPython runtime state: initialized\n\nThread 0x00000a40 (most recent call first):\n<no Python frame>\n'

    ======================================================================
    FAIL: test_subinterps_distinct_state (test.test_embed.EmbeddingTests)
    ----------------------------------------------------------------------

    Traceback (most recent call last):
      File "C:\cpython-0b858\lib\test\test_embed.py", line 177, in test_subinterps_distinct_state
        for run in self.run_repeated_init_and_subinterpreters():
      File "C:\cpython-0b858\lib\test\test_embed.py", line 110, in run_repeated_init_and_subinterpreters
        out, err = self.run_embedded_interpreter("test_repeated_init_and_subinterpreters")
      File "C:\cpython-0b858\lib\test\test_embed.py", line 104, in run_embedded_interpreter
        self.assertEqual(p.returncode, returncode,
    AssertionError: 3221225477 != 0 : bad returncode 3221225477, stderr is 'TypeError: codec is unexpected type\nFatal Python error: _PyThreadState_Delete: tstate 000000000047C960 is still current\nPython runtime state: initialized\n\nThread 0x00000b34 (most recent call first):\n<no Python frame>\n'

    ======================================================================
    FAIL: test_subinterps_main (test.test_embed.EmbeddingTests)
    ----------------------------------------------------------------------

    Traceback (most recent call last):
      File "C:\cpython-0b858\lib\test\test_embed.py", line 163, in test_subinterps_main
        for run in self.run_repeated_init_and_subinterpreters():
      File "C:\cpython-0b858\lib\test\test_embed.py", line 110, in run_repeated_init_and_subinterpreters
        out, err = self.run_embedded_interpreter("test_repeated_init_and_subinterpreters")
      File "C:\cpython-0b858\lib\test\test_embed.py", line 104, in run_embedded_interpreter
        self.assertEqual(p.returncode, returncode,
    AssertionError: 3221225477 != 0 : bad returncode 3221225477, stderr is 'TypeError: codec is unexpected type\nFatal Python error: _PyThreadState_Delete: tstate 000000000032C960 is still current\nPython runtime state: initialized\n\nThread 0x00000bf0 (most recent call first):\n<no Python frame>\n'

    @neonene neonene mannequin added 3.10 only security fixes extension-modules C modules in the Modules dir tests Tests in the Lib/test dir OS-windows topic-subinterpreters type-bug An unexpected behavior, bug, or error labels Jan 6, 2021
    @erlend-aasland
    Contributor

    I'm unable to reproduce this on Windows 10 (amd64). What's your exact locale setting? Are you compiling with HEAD at 0b858cd?

    @vstinner
    Member

    vstinner commented Jan 7, 2021

    I can reproduce the issue on Windows configured for Japanese (ANSI code page cp932).

    I also managed to reproduce the bug on Linux with the attached bug.py.

    @vstinner
    Member

    vstinner commented Jan 7, 2021

    It took me a while to understand it: the _multibytecodec module itself is fine. The issue comes from the _codecs_jp module, which still uses the legacy module API:

        codec = _codecs_jp.getcodec('cp932')
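
For readers who want to see where that object sits from the Python side, here is a minimal sketch. It assumes a standard CPython build in which the `encodings.cp932` module exposes the low-level object as its module-level `codec` attribute; the helper names `cp932_codec_info` and `roundtrip` are only illustrative and do not come from this issue.

```python
import codecs
import encodings.cp932 as cp932_module


def cp932_codec_info():
    """Return the registered codec name and the type name of the
    low-level object that backs it (assumed to be the result of
    _codecs_jp.getcodec('cp932') on a standard CPython build)."""
    info = codecs.lookup("cp932")
    low_level = cp932_module.codec
    return info.name, type(low_level).__name__


def roundtrip(text):
    """Encode text to cp932 and decode it back through the public API."""
    data = text.encode("cp932")
    return data, data.decode("cp932")


if __name__ == "__main__":
    print(cp932_codec_info())
    print(roundtrip("\u3042"))  # HIRAGANA LETTER A
```

In the main interpreter this path works on both affected and fixed builds; the failure reported here only shows up when the same lookup happens inside a subinterpreter.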

    @erlend-aasland
    Contributor

    It should be sufficient to convert cjkcodecs.h to multi-phase init then? From what I can see, the support modules are stateless, right?

    @vstinner
    Member

    vstinner commented Jan 7, 2021

    I'm working on a fix.

    @vstinner
    Member

    vstinner commented Jan 7, 2021

    Attached PR 24157 should fix the issue.

    FAIL: test_daemon_threads_fatal_error (test.test_threading.SubinterpThreadingTests)

    This test runs code in a subinterpreter which is run in a subprocess. The problem is not in the code run in the subinterpreter, but the creation of sys.stdout in the subprocess.

    The test creates a subprocess and redirects its stdout and stderr. In this case, Python doesn't create a _io._WindowsConsoleIO for sys.stdout.buffer.raw, but a regular _io.FileIO object. When the raw I/O is a _WindowsConsoleIO instance, create_stdio() of Python/pylifecycle.c forces the usage of the UTF-8 encoding. But for FileIO, it keeps the locale encoding.

    If the locale encoding is "cp932", a CJK multibyte codec is used. In the main interpreter, it's fine. In a subinterpreter, we hit the bug in _codecs_jp, which doesn't use the new multi-phase initialization API.
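
To check which of the two cases a given process is in, a small helper like the one below can report the text encoding and the type of the raw I/O object behind a stream. This is only a sketch: `describe_stdio` is a hypothetical name, and it relies on the `buffer` and `raw` attributes, which a replaced or wrapped stream may not have.

```python
import io
import sys


def describe_stdio(stream):
    """Report the encoding of a text stream and the type of the raw
    I/O object underneath it, when those attributes are available."""
    buffer = getattr(stream, "buffer", None)
    # Buffered streams expose the raw object as .raw; otherwise fall
    # back to the buffer itself (for example an in-memory BytesIO).
    raw = getattr(buffer, "raw", buffer)
    raw_type = type(raw).__name__
    return {
        "encoding": getattr(stream, "encoding", None),
        "raw_type": raw_type,
        "is_console": raw_type == "_WindowsConsoleIO",
    }


if __name__ == "__main__":
    # Write to stderr so the report is visible even when stdout is piped.
    print(describe_stdio(sys.stdout), file=sys.stderr)
```

On Windows, running it once directly in a console and once with stdout piped should show the raw type switch between `_WindowsConsoleIO` and `FileIO`, as described above.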

    @vstinner
    Member

    vstinner commented Jan 7, 2021

    A simpler way to reproduce the issue, using this t.py script:
    ---

    import test.support
    import sys
    
    import _testcapi
    
    print(f"{sys.stdout.encoding=}", file=sys.stderr)
    
    with test.support.SuppressCrashReport():
        _testcapi.run_in_subinterp("pass")

    By default, UTF-8 is used, everything is fine:
    -----
    C:\> python t.py
    sys.stdout.encoding='utf-8'
    -----

    With _WindowsConsoleIO disabled through the PYTHONLEGACYWINDOWSSTDIO env var, we get the issue:
    -----
    C:\> set PYTHONLEGACYWINDOWSSTDIO=1

    C:\> python t.py
    Running Debug|x64 interpreter...
    sys.stdout.encoding='cp932'
    TypeError: codec is unexpected type
    Fatal Python error: (...)
    -----

    Redirecting the output into a program or a file also disables _WindowsConsoleIO and reproduces the issue:
    -----
    C:\> python t.py|more
    sys.stdout.encoding='cp932'
    TypeError: codec is unexpected type
    (...)
    -----

    @vstinner
    Member

    vstinner commented Jan 7, 2021

    Ah, if you don't want to change the ANSI code page to cp932 (Japanese language) just to reproduce the issue, you can just set the stdio encoding:
    -----
    C:\> set PYTHONIOENCODING=cp932
    C:\> python t.py|more
    sys.stdout.encoding='cp932'

    TypeError: codec is unexpected type
    (...)
    -----
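
The same reproduction can be driven from Python on any platform, which avoids depending on the shell. The sketch below is an assumption-laden illustration, not the test added by the fix: `child_stdout_encoding` and `CHILD_CODE` are hypothetical names, and the subinterpreter step only runs when the `_testcapi` test module is available in the build.

```python
import os
import subprocess
import sys

# Code run in the child process: report the stdout encoding first, then
# start a subinterpreter if the _testcapi test module can be imported.
CHILD_CODE = r"""
import sys
print(sys.stdout.encoding, flush=True)
try:
    import _testcapi
except ImportError:
    pass
else:
    _testcapi.run_in_subinterp("pass")
"""


def child_stdout_encoding(encoding="cp932"):
    """Run a child Python with PYTHONIOENCODING set and piped stdio.

    Returns the encoding reported by the child and its return code.
    """
    env = dict(os.environ, PYTHONIOENCODING=encoding)
    proc = subprocess.run(
        [sys.executable, "-c", CHILD_CODE],
        env=env,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    lines = proc.stdout.decode("ascii", "replace").splitlines()
    return (lines[0] if lines else ""), proc.returncode


if __name__ == "__main__":
    print(child_stdout_encoding("cp932"))
```

On an affected build the return code is expected to be non-zero, with "codec is unexpected type" on the child's stderr; on a fixed build the child should exit cleanly.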

    @vstinner
    Member

    vstinner commented Jan 7, 2021

    New changeset 07f2cee by Victor Stinner in branch 'master':
    bpo-42846: Convert CJK codec extensions to multiphase init (GH-24157)
    07f2cee

    @vstinner
    Member

    vstinner commented Jan 7, 2021

    1. python -m test --verbose test_threading
    2. python -m test --verbose test_embed

    I manually ran these two tests with the cp932 ANSI code page: they now pass with my fix.

    I also added a regression test to test_multibytecodec.py.

    Thanks for your quick bug report neonene! It's now fixed.

    @vstinner
    Member

    vstinner commented Jan 8, 2021

    bpo-42846: Convert CJK codec extensions to multiphase init (GH-24157)

    I added a new test, and the new test spotted a reference leak, likely an existing one: bpo-42866 "test test_multibytecodec: Test_IncrementalEncoder.test_subinterp() leaks references".

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022