classification
Title: AIX: test_importlib regression (ENV change)
Type: Stage: resolved
Components: Tests Versions: Python 3.10, Python 3.9
process
Status: closed Resolution: duplicate
Dependencies: Superseder: Crash in _PyThreadState_DeleteExcept() at fork in the process child
View: 40092
Assigned To: Nosy List: Michael.Felt, vstinner
Priority: normal Keywords:

Created on 2021-03-30 07:30 by Michael.Felt, last changed 2021-03-31 22:20 by vstinner. This issue is now closed.

Files
File name Uploaded Description Edit
OpenPGP_0x722BFDB61F396FC2.asc Michael.Felt, 2021-03-31 14:56
OpenPGP_0x722BFDB61F396FC2.asc Michael.Felt, 2021-03-31 17:20
Messages (6)
msg389797 - (view) Author: Michael Felt (Michael.Felt) * Date: 2021-03-30 07:30
Since issue43517 test_importlib 'fails' (bot status) with ENV_CHANGED.

The core dump is caused by SIGTRAP. I need help to learn how to stop the core dump from being cleaned up so I can load it into dbx and hopefully understand/learn with sub-test is actually having issues.

e.g., see https://buildbot.python.org/all/#/builders/438/builds/1031/steps/5/logs/stdio for current bot exit status

Thx for assistance.
msg389798 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-03-30 07:40
A core dump is a very bad sign of health.

Can you please try to bisect which test is causing the segfault using bisect_cmd? Try the command:

./python -m test.bisect_cmd -o bisect --fail-env-changed test_importlib

The command should takes 10-60 minutes, and you should see the number of tests decreasing. I hope that at the end, the command will identify a single method triggering the crash.

At the end, the create "bisect" file contains the test methods causing the bug.

Then you can re-run tests with:

./python -m test --fail-env-changed test_importlib --matchfile=bisect


> The core dump is caused by SIGTRAP. I need help to learn how to stop the core dump from being cleaned up so I can load it into dbx and hopefully understand/learn with sub-test is actually having issues.

You can re-run the test without libregrtest by running it directly. Example:

./python -m test.test_importlib test_spec.Frozen_ModuleSpecMethodsTests.test_reload -v

It should create a core dump in the current directory, or you run the command in your favorite debugger.


My notes on debugging a Python crash: https://pythondev.readthedocs.io/debug_tools.html#core-dump
msg389911 - (view) Author: Michael Felt (Michael.Felt) * Date: 2021-03-31 14:56
On 30/03/2021 09:40, STINNER Victor wrote:
> STINNER Victor <vstinner@python.org> added the comment:
>
> A core dump is a very bad sign of health.
>
> Can you please try to bisect which test is causing the segfault using bisect_cmd? Try the command:
>
> ./python -m test.bisect_cmd -o bisect --fail-env-changed test_importlib
>
> The command should takes 10-60 minutes, and you should see the number of tests decreasing. I hope that at the end, the command will identify a single method triggering the crash.
>
> At the end, the create "bisect" file contains the test methods causing the bug.
>
> Then you can re-run tests with:
>
> ./python -m test --fail-env-changed test_importlib --matchfile=bisect
I like this!:

Fortunately, it did not take 10  minutes:

Writing 1 tests into bisect

Tests (1):
* 
test.test_importlib.test_threaded_import.ThreadedImportTests.test_multiprocessing_pool_circular_import

Output written into bisect
Bisection completed in 34 iterations and 0:00:28

>
>
>> The core dump is caused by SIGTRAP. I need help to learn how to stop the core dump from being cleaned up so I can load it into dbx and hopefully understand/learn with sub-test is actually having issues.
> You can re-run the test without libregrtest by running it directly. Example:
>
> ./python -m test.test_importlib test_spec.Frozen_ModuleSpecMethodsTests.test_reload -v

While - to use your example - using the normal verbose test syntax I do 
see the test you reference:

* test_reload 
(test.test_importlib.test_spec.Frozen_ModuleSpecMethodsTests) ... ok

but, when I run your example (and many variations) I always get 
something such as:

So, still - closer - but not quite there with:

+++++

./python -m test.test_importlib 
test_spec.Frozen_ModuleSpecMethodsTests.test_reload -v
test_spec (unittest.loader._FailedTest) ... ERROR

======================================================================
ERROR: test_spec (unittest.loader._FailedTest)
----------------------------------------------------------------------
AttributeError: module '__main__' has no attribute 'test_spec'

----------------------------------------------------------------------
Ran 1 test in 0.000s

FAILED (errors=1)
+++++

> It should create a core dump in the current directory, or you run the command in your favorite debugger.
>
>
> My notes on debugging a Python crash: https://pythondev.readthedocs.io/debug_tools.html#core-dump
>
> ----------
> nosy: +vstinner
>
> _______________________________________
> Python tracker <report@bugs.python.org>
> <https://bugs.python.org/issue43665>
> _______________________________________
>
msg389921 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-03-31 16:46
test.test_importlib.test_threaded_import.ThreadedImportTests.test_multiprocessing_pool_circular_import

This test comes from bpo-41567 and it simply runs a script. You can run the script directly:

./python -X dev Lib/test/test_importlib/partial/pool_in_threads.py ; echo $?
msg389923 - (view) Author: Michael Felt (Michael.Felt) * Date: 2021-03-31 17:20
On 31/03/2021 18:46, STINNER Victor wrote:
> STINNER Victor <vstinner@python.org> added the comment:
>
> test.test_importlib.test_threaded_import.ThreadedImportTests.test_multiprocessing_pool_circular_import
>
> This test comes from bpo-41567 and it simply runs a script. You can run 
the script directly:
>
> ./python -X dev Lib/test/test_importlib/partial/pool_in_threads.py ; echo $?

Status is zero -0 but I get a core dump!

aixtools@cpython2:[/home/aixtools/py3a-10.0]./python -X dev 
Lib/test/test_importlib/partial/pool_in_threads.py ; echo $?
0
aixtools@cpython2:[/home/aixtools/py3a-10.0]ls -ltr core
-rw-r--r--    1 aixtools staff     129435720 Mar 31 16:58 core

+++++++++++++++
Looking at the core dump - part 1.
+++++++++++++++

The address (0xd118ff80) hints at a shared library (pthread?)

I'll try to find it - but maybe you can just remind me - the CLI 
arguments to make Python more verbose about what it is doing.
Note: all addresses beginning with 0x1 (as in 0x101ca238) are program 
code addresses. 0x3 (0x30163ad8) imply a shared memory segment being 
created and used by the application.

Perhaps more relevant aree all these lines with what appear to be 
extreme argcount values, e.g., from the bottom:

_PyEval_Vector(tstate = 0x30187038, con = 0x20107b40, locals = 
0x30111768, args = 0x301219b8, argcount = 806560272, kwnames = 
0x301314d8), line 46 in "pycore_ceval.h"
_PyFunction_Vectorcall(func = (nil), stack = (nil), nargsf = 570570720, 
kwnames = (nil)), line 361 in "call.c"
_PyEval_EvalFrameDefault(tstate = (nil), f = 0x40000000, throwflag = 

-257396708), line 1431 in "abstract.h"
_PyEval_Vector(tstate = 0x22023cf0, con = 0x2000120c, locals = 
0x22023960, args = 0x220239f0, argcount = 3508213100, kwnames = 
0x22023cf0), line 46 in "pycore_ceval.h"
_PyFunction_Vectorcall(func = 0x220239f0, stack = (nil), nargsf = 
570572016, kwnames = 0x220239f0), line 361 in "call.c"
method_vectorcall(method = 0x10103bdc, args = (nil), nargsf = 0, kwnames 
= 0x20045994), line 119 in "abstract.h"
PyVectorcall_Call(callable = 0xdddddddd, tuple = 0xdddddddd, kwargs = 

0xdddddddd), line 255 in "call.c"
_PyObject_Call(tstate = (nil), callable = (nil), args = (nil), kwargs = 
(nil)), line 298 in "call.c"
thread_run(boot_raw = (nil)), line 1076 in "_threadmodule.c"

pythread_wrapper(arg = (nil)), line 240 in "thread_pthread.h"

+++++++++++++++++++++++++++++++++++++++++++
Looking at the core dump - the complete 'where' results:
+++++++++++++++++++++++++++++++++++++++++++

aixtools@cpython2:[/home/aixtools/py3a-10.0]dbx ./python core
Type 'help' for help.
[using memory image in core]
reading symbolic information ...

Trace/BPT trap in _internal_error at 0xd118ff80 ($t1)
0xd118ff80 (_internal_error+0x80) 80410014            lwz r2,0x14(r1)
(dbx) where
_internal_error(??, ??, ??) at 0xd118ff80
pth_usched._usched_dispatch_front._event_notify_locked@AF34_24(??, ??, 
??, ??, ??) at 0xd119fc2c
_event_notify(??, ??, ??) at 0xd119ee38
_cond_broadcast(??, ??, ??) at 0xd11af588
pthread_cond_signal@AF29_12(??, ??) at 0xd11b1568
pthread_cond_signal(??) at 0xd11afc88
PyThread_release_lock(lock = 0x10119354), line 692 in "thread_pthread.h"
release_sentinel(wr_raw = 0x10118f14), line 1289 in "_threadmodule.c"
PyThreadState_Clear(tstate = 0x101ca238), line 873 in "pystate.c"
_PyThreadState_DeleteExcept(runtime = 0xd5c234d0, tstate = 0x22021d30), 
line 987 in "pystate.c"
_PyEval_ReInitThreads(tstate = 0x30163ad8), line 506 in "ceval.c"
PyOS_AfterFork_Child(), line 600 in "posixmodule.c"
os_fork_impl(module = 0x3033a030), line 6659 in "posixmodule.c"
cfunction_vectorcall_NOARGS(func = 0x30337330, args = 0x302ec7d0, nargsf 
= 570564304, kwnames = 0x30337328), line 485 in "methodobject.c"
_PyEval_EvalFrameDefault(tstate = 0x302e058c, f = 0x302e847a, throwflag 
= 0), line 1431 in "abstract.h"
_PyEval_Vector(tstate = 0x302f2350, con = 0x20107b40, locals = 
0x302e8478, args = 0x302e79b8, argcount = 808016112, kwnames = 
0x302cd898), line 46 in "pycore_ceval.h"
_PyFunction_Vectorcall(func = 0x00000008, stack = 0x3004d6e8, nargsf = 
570564752, kwnames = 0x302f6030), line 361 in "call.c"
_PyEval_EvalFrameDefault(tstate = 0x101c9554, f = 0x20107b40, throwflag 
= 570565024), line 1431 in "abstract.h"
_PyEval_Vector(tstate = 0x00000018, con = 0x200084a0, locals = 
0x220221f0, args = 0x88000284, argcount = 269576140, kwnames = 
0x2007a390), line 46 in "pycore_ceval.h"
_PyFunction_Vectorcall(func = 0x2007a4e0, stack = 0x20107b40, nargsf = 
570565168, kwnames = 0x20045994), line 361 in "call.c"
_PyObject_FastCallDictTstate(tstate = 0x100a7e84, callable = 0x302ed780, 
args = 0x220222b0, nargsf = 537155988, kwargs = 0x100a8a54), line 142 in 
"call.c"
_PyObject_Call_Prepend(tstate = 0x1001fa24, callable = 0x302ce5d0, obj = 
0x22022320, args = 0x20045994, kwargs = 0x100feffc), line 431 in "call.c"
slot_tp_init(self = 0x302e06f0, args = 0x30122e76, kwds = 0x30122e70), 
line 7097 in "typeobject.c"
unnamed block in type_call(type = 0x10105284, args = 0x20107b40, kwds 
= 
0x220223e0), line 1058 in "typeobject.c"
type_call(type = 0x10105284, args = 0x20107b40, kwds = 0x220223e0), 
line 
1058 in "typeobject.c"
_PyObject_MakeTpCall(tstate = 0x101194f4, callable = 0x20107b40, args 
= 
0x22022460, nargs = 537155988, keywords = 0x10167b60), line 215 in "call.c"
_PyEval_EvalFrameDefault(tstate = 0x3030e870, f = 0x3013007e, throwflag 
= 806551608), line 1429 in "abstract.h"
_PyEval_Vector(tstate = (nil), con = 0x303384a0, locals = 0x30130038, 
args = 0x3012e0d8, argcount = 806421984, kwnames = 0x301385c8), line 46 
in "pycore_ceval.h"
_PyFunction_Vectorcall(func = 0x1004067c, stack = 0x20107b40, nargsf = 
570566176, kwnames = 0x00000001), line 361 in "call.c"
_PyEval_EvalFrameDefault(tstate = 0x100feffc, f = 0x3003ad12, throwflag 
= 805547256), line 1431 in "abstract.h"
_PyEval_Vector(tstate = 0x30299248, con = 0x00000002, locals = 
0x3003acf8, args = 0x3023b570, argcount = 807246344, kwnames = 
0x3023e5c8), line 46 in "pycore_ceval.h"
_PyFunction_Vectorcall(func = 0x302d2ca0, stack = 0x3012dd18, nargsf = 
570566624, kwnames = 0x300426a0), line 361 in "call.c"
_PyEval_EvalFrameDefault(tstate = 0x10146304, f = 0x20107b40, throwflag 
= 0), line 1431 in "abstract.h"
_PyEval_Vector(tstate = 0x30324334, con = 0x20107b40, locals = 
0x30205cb8, args = 0x30215fd0, argcount = 807427328, kwnames = 
0x3023e550), line 46 in "pycore_ceval.h"
_PyFunction_Vectorcall(func = 0x10144378, stack = 0x20107b40, nargsf = 
570567072, kwnames = 0x20045994), line 361 in "call.c"
_PyEval_EvalFrameDefault(tstate = 0x200405e8, f = 0x20107b40, throwflag 
= 537971048), line 1431 in "abstract.h"
_PyEval_Vector(tstate = 0x00000052, con = 0x2010c9a8, locals = 
0x2010c968, args = 0x301d9c90, argcount = 806702304, kwnames = 
0x3023e208), line 46 in "pycore_ceval.h"
_PyFunction_Vectorcall(func = 0x00000008, stack = 0x3023c4a8, nargsf = 
570567520, kwnames = 0x30237e30), line 361 in "call.c"
_PyEval_EvalFrameDefault(tstate = 0x101c9554, f = (nil), throwflag = 

807029456), line 1431 in "abstract.h"
_PyEval_Vector(tstate = 0x300f9e98, con = 0x30237d18, locals = (nil), 
args = 0x302968d0, argcount = 537134568, kwnames = 0x30124a78), line 46 
in "pycore_ceval.h"
_PyFunction_Vectorcall(func = 0x2007a4e0, stack = 0x20107b40, nargsf = 
570567936, kwnames = 0x20045994), line 361 in "call.c"
unnamed block in _PyObject_FastCallDictTstate(tstate = 0x10167bb8, 
callable = 0x30233d50, args = 0x22022d80, nargsf = 1, kwargs = (nil)), 
line 153 in "call.c"
_PyObject_FastCallDictTstate(tstate = 0x10167bb8, callable = 0x30233d50, 
args = 0x22022d80, nargsf = 1, kwargs = (nil)), line 153 in "call.c"
_PyObject_Call_Prepend(tstate = 0x1001fa24, callable = 0x302968d0, obj = 
0x22022df0, args = 0x20045994, kwargs = 0x100feffc), line 431 in "call.c"
slot_tp_init(self = 0x3021a8d0, args = 0x20107b40, kwds = 0x22022e70), 
line 7097 in "typeobject.c"
unnamed block in type_call(type = 0x100a8684, args = 0x30206db0, kwds 
= 
0x22022eb0), line 1058 in "typeobject.c"
type_call(type = 0x100a8684, args = 0x30206db0, kwds = 0x22022eb0), 
line 
1058 in "typeobject.c"
_PyObject_MakeTpCall(tstate = 0x10119354, callable = 0x200c8404, args 
= 
0x22022f30, nargs = 537155988, keywords = 0x10167b60), line 215 in "call.c"
_PyEval_EvalFrameDefault(tstate = 0x2007a4e0, f = 0x20107b40, throwflag 
= 570568752), line 1429 in "abstract.h"
_PyEval_Vector(tstate = 0x100aa6fc, con = 0x200084a0, locals = 
0x220230a0, args = 0x20045994, argcount = 269575348, kwnames = 
0x20036690), line 46 in "pycore_ceval.h"
_PyFunction_Vectorcall(func = 0x302bc760, stack = 0x20107b40, nargsf = 
570568944, kwnames = 0x10458e60), line 361 in "call.c"
method_vectorcall(method = 0x30126e68, args = 0x30239030, nargsf = 
570569072, kwnames = 0x20045994), line 119 in "abstract.h"
_PyEval_EvalFrameDefault(tstate = (nil), f = 0x3012c898, throwflag = 
0), 
line 1431 in "abstract.h"
_PyEval_Vector(tstate = 0x30123bb0, con = 0x200084a0, locals = 
0x220232d0, args = 0x00000008, argcount = 352, kwnames = 0x20036690), 
line 46 in "pycore_ceval.h"
_PyFunction_Vectorcall(func = 0x3012ddd8, stack = 0x30232f80, nargsf = 
570569504, kwnames = 0x20045994), line 361 in "call.c"
PyVectorcall_Call(callable = 0x100a7e84, tuple = 0x3012c898, kwargs = 

0x22023370), line 255 in "call.c"
_PyObject_Call(tstate = 0x100aa6fc, callable = 0x20107b40, args = (nil), 
kwargs = 0x20045994), line 298 in "call.c"
do_call_core(tstate = 0x302c03b0, trace_info = 0x200084a0, func = 
0x22023460, callargs = (nil), kwdict = 0x10167b60), line 5974 in "ceval.c"
_PyEval_EvalFrameDefault(tstate = 0x302bb478, f = 0x3005d44c, throwflag 
= 805688288), line 4324 in "ceval.c"
_PyEval_Vector(tstate = 0x30187198, con = 0x3005d428, locals = 
0x3005d3e0, args = 0x301219b8, argcount = 806048208, kwnames = 
0x301316b8), line 46 in "pycore_ceval.h"
_PyFunction_Vectorcall(func = (nil), stack = (nil), nargsf = 570570272, 
kwnames = (nil)), line 361 in "call.c"
_PyEval_EvalFrameDefault(tstate = (nil), f = (nil), throwflag = 
806556888), line 1431 in "abstract.h"
_PyEval_Vector(tstate = 0x30187038, con = 0x20107b40, locals = 
0x30111768, args = 0x301219b8, argcount = 806560272, kwnames = 
0x301314d8), line 46 in "pycore_ceval.h"
_PyFunction_Vectorcall(func = (nil), stack = (nil), nargsf = 570570720, 
kwnames = (nil)), line 361 in "call.c"
_PyEval_EvalFrameDefault(tstate = (nil), f = 0x40000000, throwflag = 

-257396708), line 1431 in "abstract.h"
_PyEval_Vector(tstate = 0x22023cf0, con = 0x2000120c, locals = 
0x22023960, args = 0x220239f0, argcount = 3508213100, kwnames = 
0x22023cf0), line 46 in "pycore_ceval.h"
_PyFunction_Vectorcall(func = 0x220239f0, stack = (nil), nargsf = 
570572016, kwnames = 0x220239f0), line 361 in "call.c"
method_vectorcall(method = 0x10103bdc, args = (nil), nargsf = 0, kwnames 
= 0x20045994), line 119 in "abstract.h"
PyVectorcall_Call(callable = 0xdddddddd, tuple = 0xdddddddd, kwargs = 

0xdddddddd), line 255 in "call.c"
_PyObject_Call(tstate = (nil), callable = (nil), args = (nil), kwargs = 
(nil)), line 298 in "call.c"
thread_run(boot_raw = (nil)), line 1076 in "_threadmodule.c"
pythread_wrapper(arg = (nil)), line 240 in "thread_pthread.h"
(dbx)

>
> ----------
>
> _______________________________________
> Python tracker <report@bugs.python.org>
> <https://bugs.python.org/issue43665>
> _______________________________________
>
msg389936 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-03-31 22:20
Interesting part of the traceback:

pthread_cond_signal@AF29_12(??, ??) at 0xd11b1568
...
release_sentinel
...
_PyThreadState_DeleteExcept
...
PyOS_AfterFork_Child(), line 600 in "posixmodule.c"

It sounds like a variant of bpo-40092: "Crash in _PyThreadState_DeleteExcept() at fork in the process child".

See also the parent issue: https://bugs.python.org/issue40068#msg365031
History
Date User Action Args
2021-03-31 22:20:40vstinnersetstatus: open -> closed
superseder: Crash in _PyThreadState_DeleteExcept() at fork in the process child
resolution: duplicate
stage: resolved
2021-03-31 22:20:23vstinnersetmessages: + msg389936
2021-03-31 17:20:44Michael.Feltsetfiles: + OpenPGP_0x722BFDB61F396FC2.asc

messages: + msg389923
title: [AIX] test_importlib: test_multiprocessing_pool_circular_import() killed by SIGTRAP (coredump) -> AIX: test_importlib regression (ENV change)
2021-03-31 16:47:38vstinnersettitle: AIX: test_importlib regression (ENV change) -> [AIX] test_importlib: test_multiprocessing_pool_circular_import() killed by SIGTRAP (coredump)
2021-03-31 16:46:58vstinnersetmessages: + msg389921
2021-03-31 14:56:09Michael.Feltsetfiles: + OpenPGP_0x722BFDB61F396FC2.asc

messages: + msg389911
2021-03-30 07:40:06vstinnersetnosy: + vstinner
messages: + msg389798
2021-03-30 07:30:31Michael.Feltcreate