classification
Title: test_multiprocessing_spawn fails on AMD64 Windows8 3.x
Type: Stage: resolved
Components: Tests, Windows Versions: Python 3.8
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: eric.snow, eryksun, lukasz.langa, pablogsal, paul.moore, pitrou, steve.dower, tim.golden, vstinner, zach.ware
Priority: normal Keywords:

Created on 2019-02-26 07:13 by pablogsal, last changed 2019-03-06 01:37 by vstinner. This issue is now closed.

Messages (12)
msg336625 - (view) Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2019-02-26 07:13
test test_multiprocessing_spawn failed
test_import (test.test_multiprocessing_spawn._TestImportStar) ... ok
======================================================================
FAIL: test_mymanager_context (test.test_multiprocessing_spawn.WithManagerTestMyManager)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "D:\buildarea\3.x.bolen-windows8\build\lib\test\_test_multiprocessing.py", line 2747, in test_mymanager_context
    self.assertIn(manager._process.exitcode, (0, -signal.SIGTERM))
AssertionError: 3221225477 not found in (0, -15)
----------------------------------------------------------------------
Ran 344 tests in 328.196s
FAILED (failures=1, skipped=40)
1 test failed again:
    test_multiprocessing_spawn
== Tests result: FAILURE then FAILURE ==

https://buildbot.python.org/all/#/builders/32/builds/2204/steps/3/logs/stdio
msg336626 - (view) Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2019-02-26 07:21
It also fails on Windows 7:

https://buildbot.python.org/all/#/builders/58/builds/1983/steps/3/logs/stdio
msg336627 - (view) Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2019-02-26 07:28
It seems that return code  3221225477 in Windows is:

#define STATUS_ACCESS_VIOLATION  ((NTSTATUS)0xC0000005L)

so this is a segfault in the manager.
msg336774 - (view) Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2019-02-27 17:30
Adding Łukasz, as I think this is a release blocker (the Windows 8 and 7 multiprocessing module may be causing segfaults).
msg336778 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2019-02-27 19:42
It's also possible that the child process is causing the segfault because of misconfiguration (e.g. broken environment variables).

And depending on the OS, abort() calls (via Py_FatalError) sometimes appear to be segfaults, so it could be any number of issues. (Aside - I'd love to replace the abort() calls with specific exit codes for configuration errors - they really mess up the crash data we see on Windows.)

I'll try some tests locally to see if this is reproducible, but if anyone can extract the original stdout/stderr from the buildbot, that would be helpful.
msg336782 - (view) Author: Eryk Sun (eryksun) * (Python triager) Date: 2019-02-27 22:19
> And depending on the OS, abort() calls (via Py_FatalError) sometimes 
> appear to be segfaults, so it could be any number of issues. 
> (Aside - I'd love to replace the abort() calls with specific exit
> codes for configuration errors - they really mess up the crash data 
> we see on Windows.)

In particular, with the Universal CRT, an unhandled abort() is implemented by a __fastfail intrinsic [1] (int 0x29 instruction in x86) with the argument FAST_FAIL_FATAL_APP_EXIT (7). 

Prior to Windows 8 this appears as an access violation. In Windows 8+ it's implemented as a second-chance STATUS_STACK_BUFFER_OVERRUN (0xC0000409) exception, which is overloaded from its previous use to support failure codes. (The old usage appears as the failure code FAST_FAIL_LEGACY_GS_VIOLATION, defined to be 0.) It starts as a second-chance exception in order to bypass normal exception handling (i.e. SEH, VEH, UnhandledExceptionFilter). The second-chance exception event is sent to an attached debugger and/or the session server (csrss.exe).

Python's normal signal handling for SIGABRT can't prevent this, since the C handler just sets a flag and returns. But enabling faulthandler sets a C signal handler that restores the previous handler and calls raise(SIGABRT). The default SIGABRT handler for the explicit raise() code path simply calls _exit(3). 

Alternatively, we could prevent the __fastfail call via _set_abort_behavior [2], if implemented in msvcrt. For example: msvcrt.set_abort_behavior(0, msvcrt.CALL_REPORTFAULT).

[1]: https://docs.microsoft.com/en-us/cpp/intrinsics/fastfail
[2]: https://docs.microsoft.com/en-us/cpp/c-runtime-library/reference/set-abort-behavior
msg336928 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-03-01 16:29
https://buildbot.python.org/all/#/builders/32/builds/2219

FAIL: test_mymanager_context_prestarted (test.test_multiprocessing_spawn.WithManagerTestMyManager)
Re-running failed tests in verbose mode
Re-running test 'test_multiprocessing_spawn' in verbose mode
FAIL: test_mymanager_context_prestarted (test.test_multiprocessing_spawn.WithManagerTestMyManager)
msg337039 - (view) Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2019-03-03 17:54
See also https://bugs.python.org/issue36114
msg337072 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-03-04 08:13
> It's also possible that the child process is causing the segfault because of misconfiguration (e.g. broken environment variables).

Maybe, but the test also produces core dump on FreeBSD: bpo-36114. It looks more like a real bug.

I set the priority again to release blocker to not forget this regression.
msg337077 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-03-04 08:41
test_mymanager and test_mymanager_context of test_multiprocessing_spawn.WithManagerTestMyManager failed in this build:

> https://buildbot.python.org/all/#/builders/58/builds/1983/steps/3/logs/stdio

ERROR: test_multiprocessing (test.test_venv.BasicTest)
FAIL: test_async_gen_asyncio_gc_aclose_09 (test.test_asyncgen.AsyncGenAsyncioTest)
FAIL: test_daemon_threads_shutdown_stderr_deadlock (test.test_io.CMiscIOTest)
    self.assertIn("Fatal Python error: could not acquire lock "
AssertionError: "Fatal Python error: could not acquire lock for <_io.BufferedWriter name='<stderr>'> at interpreter shutdown, possibly due to daemon threads" not found in (...)
FAIL: test_daemon_threads_shutdown_stdout_deadlock (test.test_io.CMiscIOTest)
    self.assertIn("Fatal Python error: could not acquire lock "
AssertionError: "Fatal Python error: could not acquire lock for <_io.BufferedWriter name='<stdout>'> at interpreter shutdown, possibly due to daemon threads" not found in ''
Re-running failed tests in verbose mode
Re-running test 'test_venv' in verbose mode
ERROR: test_multiprocessing (test.test_venv.BasicTest)
Re-running test 'test_asyncgen' in verbose mode
Re-running test 'test_multiprocessing_spawn' in verbose mode
FAIL: test_mymanager (test.test_multiprocessing_spawn.WithManagerTestMyManager)
FAIL: test_mymanager_context (test.test_multiprocessing_spawn.WithManagerTestMyManager)
Re-running test 'test_io' in verbose mode
FAIL: test_daemon_threads_shutdown_stderr_deadlock (test.test_io.CMiscIOTest)
    self.assertIn("Fatal Python error: could not acquire lock "
AssertionError: "Fatal Python error: could not acquire lock for <_io.BufferedWriter name='<stderr>'> at interpreter shutdown, possibly due to daemon threads" not found in (...)
FAIL: test_daemon_threads_shutdown_stdout_deadlock (test.test_io.CMiscIOTest)
    self.assertIn("Fatal Python error: could not acquire lock "
AssertionError: "Fatal Python error: could not acquire lock for <_io.BufferedWriter name='<stdout>'> at interpreter shutdown, possibly due to daemon threads" not found in ''
msg337221 - (view) Author: Eric Snow (eric.snow) * (Python committer) Date: 2019-03-05 16:16
This is resolved with gh-12159, no?
msg337265 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-03-06 01:37
> This is resolved with gh-12159, no?

I was waiting to see if buildbot workers feel better. It's the case, so I close the issue.
History
Date User Action Args
2019-03-06 01:37:25vstinnersetstatus: open -> closed
priority: release blocker -> normal
messages: + msg337265

resolution: fixed
stage: resolved
2019-03-05 16:16:39eric.snowsetnosy: + eric.snow
messages: + msg337221
2019-03-04 08:41:01vstinnersetmessages: + msg337077
2019-03-04 08:13:27vstinnersetpriority: high -> release blocker

messages: + msg337072
2019-03-03 17:54:44pablogsalsetmessages: + msg337039
2019-03-01 16:29:58vstinnersetmessages: + msg336928
2019-02-27 22:19:58eryksunsetnosy: + eryksun
messages: + msg336782
2019-02-27 19:42:44steve.dowersetmessages: + msg336778
2019-02-27 17:30:02pablogsalsetpriority: normal -> high
nosy: + lukasz.langa
messages: + msg336774

2019-02-26 12:26:22vstinnersetnosy: + vstinner
2019-02-26 07:28:34pablogsalsetnosy: + pitrou
messages: + msg336627
2019-02-26 07:21:47pablogsalsetmessages: + msg336626
2019-02-26 07:13:43pablogsalcreate