classification
Title: `RecursionError` during deallocation
Type:
Stage: resolved
Components:
Versions: Python 3.7
process
Status: closed
Resolution:
Dependencies:
Superseder:
Assigned To:
Nosy List: Mark.Shannon, andrewvaughanj
Priority: normal
Keywords:

Created on 2021-03-01 10:47 by andrewvaughanj, last changed 2021-04-12 14:16 by andrewvaughanj. This issue is now closed.

Files
File name Uploaded Description
python_bt.txt.gz andrewvaughanj, 2021-03-01 10:47
Messages (6)
msg387852 - Author: Andrew V. Jones (andrewvaughanj) Date: 2021-03-01 10:47
I am currently porting some code from Python 2.7.14 to Python 3.7.5, but the process running the Python code terminates in the following way:

```
#0  0x00002aaaaef63337 in raise () from /lib64/libc.so.6
#1  0x00002aaaaef64a28 in abort () from /lib64/libc.so.6
#2  0x00002aaaae726e18 in fatal_error (prefix=0x0, msg=0x2aaaae8091f0 "Cannot recover from stack overflow.", status=-1) at Python/pylifecycle.c:2187
#3  0x00002aaaae727603 in Py_FatalError (msg=0x9bf0 <Address 0x9bf0 out of bounds>) at Python/pylifecycle.c:2197
#4  0x00002aaaae6ede2b in _Py_CheckRecursiveCall (where=<optimized out>) at Python/ceval.c:489
#5  0x00002aaaae62b61d in _PyMethodDef_RawFastCallDict (method=0x2aaaaeae2740 <textiowrapper_methods+160>, self=0x2aaabb1d4d70, args=0x0, nargs=0, kwargs=0x0) at Objects/call.c:464
#6  0x00002aaaae62b6a9 in _PyCFunction_FastCallDict (func=0x2aaabeaa5690, args=0x6, nargs=0, kwargs=0x0) at Objects/call.c:586
#7  0x00002aaaae62c56c in _PyObject_CallFunctionVa (callable=0x9bf0, format=<optimized out>, va=<optimized out>, is_size_t=<optimized out>) at Objects/call.c:935
#8  0x00002aaaae62cc80 in callmethod (is_size_t=<optimized out>, va=<optimized out>, format=<optimized out>, callable=<optimized out>) at Objects/call.c:1031
#9  _PyObject_CallMethodId (obj=<optimized out>, name=<optimized out>, format=0x0) at Objects/call.c:1100
#10 0x00002aaaae724c51 in flush_std_files () at Python/pylifecycle.c:1083
#11 0x00002aaaae72704f in fatal_error (prefix=0x0, msg=<optimized out>, status=-1) at Python/pylifecycle.c:2175
#12 0x00002aaaae727603 in Py_FatalError (msg=0x9bf0 <Address 0x9bf0 out of bounds>) at Python/pylifecycle.c:2197
#13 0x00002aaaae6ede2b in _Py_CheckRecursiveCall (where=<optimized out>) at Python/ceval.c:489
#14 0x00002aaaae62ba3d in _PyObject_FastCallDict (callable=0x2aaabeab8790, args=<optimized out>, nargs=<optimized out>, kwargs=0x0) at Objects/call.c:120
#15 0x00002aaaae62c2f0 in object_vacall (callable=0x2aaabeab8790, vargs=0x7ffffff54d40) at Objects/call.c:1202
#16 0x00002aaaae62c3fd in PyObject_CallFunctionObjArgs (callable=0x9bf0) at Objects/call.c:1267
#17 0x00002aaaae6c1bf0 in PyObject_ClearWeakRefs (object=<optimized out>) at Objects/weakrefobject.c:872
#18 0x00002aaaae4b26f6 in instance_dealloc () from /home/LOCAL/avj/build/vc21__90601_pyedg_improvements/vc/lib64/libboost_python37.so.1.69.0
#19 0x00002aaaae67c3e0 in subtype_dealloc (self=0x2aaabeab9e40) at Objects/typeobject.c:1176
#20 0x00002aaaae4ba63f in life_support_call () from /home/LOCAL/avj/build/vc21__90601_pyedg_improvements/vc/lib64/libboost_python37.so.1.69.0
#21 0x00002aaaae62b9c4 in _PyObject_FastCallDict (callable=0x2aaabeab87b0, args=<optimized out>, nargs=<optimized out>, kwargs=0x0) at Objects/call.c:125
#22 0x00002aaaae62c2f0 in object_vacall (callable=0x2aaabeab87b0, vargs=0x7ffffff54fd0) at Objects/call.c:1202
#23 0x00002aaaae62c3fd in PyObject_CallFunctionObjArgs (callable=0x9bf0) at Objects/call.c:1267
#24 0x00002aaaae6c1bf0 in PyObject_ClearWeakRefs (object=<optimized out>) at Objects/weakrefobject.c:872
#25 0x00002aaaae4b26f6 in instance_dealloc () from /home/LOCAL/avj/build/vc21__90601_pyedg_improvements/vc/lib64/libboost_python37.so.1.69.0
#26 0x00002aaaae67c3e0 in subtype_dealloc (self=0x2aaabeab9e90) at Objects/typeobject.c:1176
#27 0x00002aaaae4ba63f in life_support_call () from /home/LOCAL/avj/build/vc21__90601_pyedg_improvements/vc/lib64/libboost_python37.so.1.69.0
#28 0x00002aaaae62b9c4 in _PyObject_FastCallDict (callable=0x2aaabeab87d0, args=<optimized out>, nargs=<optimized out>, kwargs=0x0) at Objects/call.c:125
#29 0x00002aaaae62c2f0 in object_vacall (callable=0x2aaabeab87d0, vargs=0x7ffffff55260) at Objects/call.c:1202
#30 0x00002aaaae62c3fd in PyObject_CallFunctionObjArgs (callable=0x9bf0) at Objects/call.c:1267
#31 0x00002aaaae6c1bf0 in PyObject_ClearWeakRefs (object=<optimized out>) at Objects/weakrefobject.c:872
#32 0x00002aaaae4b26f6 in instance_dealloc () from /home/LOCAL/avj/build/vc21__90601_pyedg_improvements/vc/lib64/libboost_python37.so.1.69.0
#33 0x00002aaaae67c3e0 in subtype_dealloc (self=0x2aaabeab9ee0) at Objects/typeobject.c:1176
#34 0x00002aaaae4ba63f in life_support_call () from /home/LOCAL/avj/build/vc21__90601_pyedg_improvements/vc/lib64/libboost_python37.so.1.69.0
```

These are only the innermost 35 frames; the actual backtrace is 7375 frames deep and ends with:

```
#7358 0x00002aaaae4b26f6 in instance_dealloc () from /home/LOCAL/avj/build/vc21__90601_pyedg_improvements/vc/lib64/libboost_python37.so.1.69.0
#7359 0x00002aaaae67c3e0 in subtype_dealloc (self=0x2aaabeaefdf0) at Objects/typeobject.c:1176
#7360 0x00002aaaae6f2f46 in _PyEval_EvalFrameDefault (f=0x2aaabce48b30, throwflag=39920) at Python/ceval.c:1098
#7361 0x00002aaaae62a959 in function_code_fastcall (co=<optimized out>, args=0x7fffffffd088, nargs=1, globals=<optimized out>) at Objects/call.c:283
#7362 0x00002aaaae62ae44 in _PyFunction_FastCallDict (func=0x2aaabda07950, args=0x7fffffffd080, nargs=1, kwargs=0x0) at Objects/call.c:322
#7363 0x00002aaaae62bbea in _PyObject_Call_Prepend (callable=0x2aaabda07950, obj=0x2aaabea92590, args=0x2aaabb193050, kwargs=0x0) at Objects/call.c:908
#7364 0x00002aaaae62b9c4 in _PyObject_FastCallDict (callable=0x2aaabb253a50, args=<optimized out>, nargs=<optimized out>, kwargs=0x0) at Objects/call.c:125
#7365 0x00002aaaae62c677 in _PyObject_CallFunctionVa (callable=0x2aaabb253a50, format=<optimized out>, va=<optimized out>, is_size_t=<optimized out>) at Objects/call.c:956
#7366 0x00002aaaae62c93a in PyEval_CallFunction (callable=0x9bf0, format=0x9bf0 <Address 0x9bf0 out of bounds>, format@entry=0x2aaaaaedba92 "()") at Objects/call.c:998
#7367 0x00002aaaaae6ae16 in boost::python::call<boost::python::api::object> (callable=<optimized out>) at /home/BUILD64/lib/boost-1.69.0-py37/include/boost/python/call.hpp:56
#7368 0x00002aaaaae6ae5a in boost::python::api::object_operators<boost::python::api::proxy<boost::python::api::attribute_policies> >::operator() (this=<optimized out>) at /home/BUILD64/lib/boost-1.69.0-py37/include/boost/python/object_core.hpp:440
#7369 0x00002aaaabc8b287 in PyEDGInterface::py_backend (this=<optimized out>) at /home/Users/avj/vector/source/vc21__90601_pyedg_improvements/lib/libcommoncpp/src/PyEDGInterface.cpp:192
#7370 0x00002aaaabc8c136 in PyEDGInterface::backend () at /home/Users/avj/vector/source/vc21__90601_pyedg_improvements/lib/libcommoncpp/inc/PyEDGInterface.h:38
#7371 0x00000000004195b5 in back_end () at /home/Users/avj/vector/source/vc21__90601_pyedg_improvements/progs/pyedg/main.cpp:77
#7372 0x000000000050ab21 in cfe_main (argc=argc@entry=9, argv=argv@entry=0x7fffffffd6c8) at /home/Users/avj/vector/source/vc21__90601_pyedg_improvements/progs/edg/src/cfe.cpp:141
#7373 0x000000000050abda in edg_main (argc=argc@entry=9, argv=argv@entry=0x7fffffffd6c8) at /home/Users/avj/vector/source/vc21__90601_pyedg_improvements/progs/edg/src/cfe.cpp:202
#7374 0x0000000000419752 in main (argc=9, argv=0x7fffffffd6c8) at /home/Users/avj/vector/source/vc21__90601_pyedg_improvements/progs/pyedg/main.cpp:43
```

Here, `progs/pyedg/main.cpp` is our `main`, which uses an embedded Python interpreter (either 2.7.14 or 3.7.5).

The application actually terminates with the following printed to stderr:

```
Exception ignored in: <Boost.Python.life_support object at 0x2acf473c5d10>
RecursionError: maximum recursion depth exceeded while calling a Python object
Fatal Python error: Cannot recover from stack overflow.
```

The code that is running does not (itself) have any loops -- it simply walks a linked list (of length ~1400) returned via Boost Python. When moving to the next element of the list, the previous element should be "unreachable garbage" (indeed, inspecting gc.get_referrers/gc.get_referents gives 0).
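
For reference, the kind of check mentioned above can be sketched roughly as follows. This is only an illustration: `Node`, `holder` and `is_referred_to_by` are hypothetical names, with a plain Python class standing in for the Boost.Python-wrapped element; only `gc.get_referrers` and `gc.get_referents` are real APIs.

```python
import gc


class Node:
    """Hypothetical stand-in for a Boost.Python-wrapped list element."""

    def __init__(self, nxt=None):
        self.next = nxt


def is_referred_to_by(obj, container):
    """Return True if the GC reports `container` as directly referring to `obj`."""
    return any(r is container for r in gc.get_referrers(obj))


tail = Node()
head = Node(tail)
holder = [head]  # an explicit extra reference, e.g. a Python list

# get_referrers answers "who points at head?"
held = is_referred_to_by(head, holder)

# get_referents looks the other way: objects directly reachable from `holder`
reaches_head = any(r is head for r in gc.get_referents(holder))
```

Note that `gc.get_referrers` only reports GC-tracked containers, so an empty result does not by itself prove that no C-level reference exists.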

I've attached the whole back-trace to this issue, and it seems like, when recursing, the `current` argument to `PyObject_ClearWeakRefs` is different (i.e., it doesn't seem to be an infinite recursion, just a very _deep_ recursion when deallocating).

Some other observations:

   1) If I increase the recursion limit (using `sys.setrecursionlimit` set to a "suitably large" value), then the process completes successfully

   2) The value of `ulimit -s` makes no difference

   3) If I run the same code, with the same Boost Python bindings, except targeting Python 2.7.14 the process completes successfully
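
The workaround from the first observation can be sketched as a small context manager that raises the limit only around the affected code and then restores it. The name `recursion_limit` and the value used are illustrative assumptions; `sys.getrecursionlimit` and `sys.setrecursionlimit` are the real APIs. Note that this adjusts the interpreter's recursion-depth limit, not the size of the operating-system stack.

```python
import sys
from contextlib import contextmanager


@contextmanager
def recursion_limit(limit):
    """Temporarily set the interpreter recursion limit, then restore it."""
    previous = sys.getrecursionlimit()
    sys.setrecursionlimit(limit)
    try:
        yield previous
    finally:
        sys.setrecursionlimit(previous)


# Illustrative value only; a "suitably large" limit depends on the workload.
original = sys.getrecursionlimit()
with recursion_limit(original + 10000):
    raised = sys.getrecursionlimit()
restored = sys.getrecursionlimit()
```

Scoping the change this way keeps the higher limit from masking unrelated runaway recursion elsewhere in the program.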

Right now, I am not able to provide a simple reproducer, but I am wondering if this is a bug I've hit in Python 3.7.5 (maybe it is fixed by https://bugs.python.org/issue38006, which seems very similar) or if this is my "user code" that is doing something weird.

If this appears to be a new bug, I will do my utmost to create a reproducer for it, but if the cause is obvious without one, then that would be helpful (the reproducer is tied to proprietary code, so will be hard to extricate).

One thing I will try is to update our version of Python to 3.9.2 and see whether the issue is still there, after the fix for #38006.
msg387862 - Author: Andrew V. Jones (andrewvaughanj) Date: 2021-03-01 13:08
Here's some representative code that triggers the issue:

```
def loop():
    a_node = boost_python_library.get_linked_list()
    all_elems = []
    while a_node is not None:
        #
        # Uncomment the below to make the crash disappear
        #
        # all_elems.append(a_node)
        #
        a_node = a_node.next
```

The *really* interesting bit is that if we save what comes off the Boost.Python linked list into a Python-proper list (`all_elems` as above), then the crash goes away.
msg387864 - Author: Andrew V. Jones (andrewvaughanj) Date: 2021-03-01 13:33
Same logic, but this crashes:

```
def loop():
    a_node = boost_python_library.get_linked_list()
    temp = []
    while True:
        assert a_node is not None
        temp.append(a_node)
        prev = a_node   # <-- comment this out to make the crash go away
        a_node = a_node.next
        if not a_node:
            break
```
msg387904 - Author: Mark Shannon (Mark.Shannon) (Python committer) Date: 2021-03-02 10:24
Can you reproduce this without boost, or for 3.9 or 3.10a?

3.7 is in security-fix only mode.
https://www.python.org/dev/peps/pep-0537/
msg387914 - Author: Andrew V. Jones (andrewvaughanj) Date: 2021-03-02 11:19
> Can you reproduce this without boost
>

I have no way of triggering this without boost.

> or for 3.9 or 3.10a?
>

I'm waiting for confirmation from a member of my engineering team on this, but I believe the answer here is "no" and that this might not appear with 3.9.2.

Is it reasonable that this might have been fixed (either intentionally or by luck) between 3.7.5 and 3.9.2?
msg390852 - Author: Andrew V. Jones (andrewvaughanj) Date: 2021-04-12 14:16
For us, this issue was resolved by moving to 3.9.2.

I have closed it as it seems it was an "accidentally fixed" bug.
History
Date User Action Args
2021-04-12 14:16:38 andrewvaughanj set status: open -> closed; messages: + msg390852; stage: resolved
2021-03-02 11:19:17 andrewvaughanj set messages: + msg387914
2021-03-02 10:24:45 Mark.Shannon set messages: + msg387904
2021-03-01 22:17:33 iritkatriel set nosy: + Mark.Shannon
2021-03-01 13:33:50 andrewvaughanj set messages: + msg387864
2021-03-01 13:08:49 andrewvaughanj set messages: + msg387862
2021-03-01 10:47:12 andrewvaughanj create