classification
Title: C-level destructor in PySide2 breaks gen_send_ex, which assumes it's safe to call Py_DECREF with a live exception
Type: Stage: resolved
Components: Versions: Python 3.10, Python 3.9, Python 3.8, Python 3.7, Python 3.6, Python 3.5
process
Status: closed Resolution: not a bug
Dependencies: Superseder:
Assigned To: Nosy List: asvetlov, lemburg, njs, tim.peters, twouters, yselivanov
Priority: normal Keywords:

Created on 2020-05-27 03:56 by njs, last changed 2020-05-28 08:42 by njs. This issue is now closed.

Messages (4)
msg370046 - (view) Author: Nathaniel Smith (njs) * (Python committer) Date: 2020-05-27 03:56
Consider the following short program. demo() is a trivial async function that creates a QObject instance, connects a Python signal, and then exits. When we call `send(None)` on this object, we expect to get a StopIteration exception.

-----

from PySide2 import QtCore

class MyQObject(QtCore.QObject):
    sig = QtCore.Signal()

async def demo():
    myqobject = MyQObject()
    myqobject.sig.connect(lambda: None)
    return 1

coro = demo()
try:
    coro.send(None)
except StopIteration as exc:
    print(f"OK: got {exc!r}")
except SystemError as exc:
    print(f"WTF: got {exc!r}")

-----

Actual output (tested on 3.8.2, but I think the code is present on all versions):

-----
StopIteration: 1
WTF: got SystemError("<method 'send' of 'coroutine' objects> returned NULL without setting an error")
-----

So there are two weird things here: the StopIteration exception is being printed on the console for some reason, and then the actual `send` method is raising SystemError instead of StopIteration.

Here's what I think is happening:

In genobject.c:gen_send_ex, when the coroutine finishes, we call _PyGen_SetStopIterationValue to raise the StopIteration exception:

https://github.com/python/cpython/blob/404b23b85b17c84e022779f31fc89cb0ed0d37e8/Objects/genobject.c#L241

Then, after that, gen_send_ex clears the frame object and drops references to it:

https://github.com/python/cpython/blob/404b23b85b17c84e022779f31fc89cb0ed0d37e8/Objects/genobject.c#L266-L273

At this point, the reference count for `myqobject` drops to zero, so its destructor is invoked. And this destructor ends up clearing the current exception again. Here's a stack trace:

-----
#0  0x0000000000677eb7 in _PyErr_Fetch (p_traceback=0x7ffd9fda77d0, 
    p_value=0x7ffd9fda77d8, p_type=0x7ffd9fda77e0, tstate=0x2511280)
    at ../Python/errors.c:399
#1  _PyErr_PrintEx (tstate=0x2511280, set_sys_last_vars=1) at ../Python/pythonrun.c:670
#2  0x00007f1afb455967 in PySide::GlobalReceiverV2::qt_metacall(QMetaObject::Call, int, void**) ()
   from /home/njs/.user-python3.8/lib/python3.8/site-packages/PySide2/libpyside2.abi3.so.5.14
#3  0x00007f1afaf2f657 in void doActivate<false>(QObject*, int, void**) ()
   from /home/njs/.user-python3.8/lib/python3.8/site-packages/PySide2/Qt/lib/libQt5Core.so.5
#4  0x00007f1afaf2a37f in QObject::destroyed(QObject*) ()
   from /home/njs/.user-python3.8/lib/python3.8/site-packages/PySide2/Qt/lib/libQt5Core.so.5
#5  0x00007f1afaf2d742 in QObject::~QObject() ()
   from /home/njs/.user-python3.8/lib/python3.8/site-packages/PySide2/Qt/lib/libQt5Core.so.5
#6  0x00007f1afb852681 in QObjectWrapper::~QObjectWrapper() ()
   from /home/njs/.user-python3.8/lib/python3.8/site-packages/PySide2/QtCore.abi3.so
#7  0x00007f1afbf785bb in SbkDeallocWrapperCommon ()
   from /home/njs/.user-python3.8/lib/python3.8/site-packages/shiboken2/libshiboken2.abi3.so.5.14
#8  0x00000000005a4fbc in subtype_dealloc (self=<optimized out>)
    at ../Objects/typeobject.c:1289
#9  0x00000000005e8c08 in _Py_Dealloc (op=<optimized out>) at ../Objects/object.c:2215
#10 _Py_DECREF (filename=0x881795 "../Objects/frameobject.c", lineno=430, 
    op=<optimized out>) at ../Include/object.h:478
#11 frame_dealloc (f=Frame 0x7f1afc572dd0, for file qget-min.py, line 12, in demo ())
    at ../Objects/frameobject.c:430
#12 0x00000000004fdf30 in _Py_Dealloc (
    op=Frame 0x7f1afc572dd0, for file qget-min.py, line 12, in demo ())
    at ../Objects/object.c:2215
#13 _Py_DECREF (filename=<synthetic pointer>, lineno=279, 
    op=Frame 0x7f1afc572dd0, for file qget-min.py, line 12, in demo ())
    at ../Include/object.h:478
#14 gen_send_ex (gen=0x7f1afbd08440, arg=<optimized out>, exc=<optimized out>, 
    closing=<optimized out>) at ../Objects/genobject.c:279

------

We can read the source for PySide::GlobalReceiverV2::qt_metacall here: https://sources.debian.org/src/pyside2/5.13.2-3/sources/pyside2/libpyside/globalreceiverv2.cpp/?hl=310#L310

And we see that it (potentially) runs some arbitrary Python code, and then handles any exceptions by doing:

if (PyErr_Occurred()) {
    PyErr_Print();
}

This is intended to catch exceptions caused by the code it just executed, but in this case, gen_send_ex ends up invoking it with an exception already active, so PySide2 gets confused and clears the StopIteration.

-----------------------------------

OK so... what to do. I'm actually not 100% certain whether this is a CPython bug or a PySide2 bug.

In PySide2, it could be worked around by saving the exception state before executing that code, and then restoring it afterwards.

In gen_send_ex, it could be worked around by dropping the reference to the frame before setting the StopIteration exception.

In CPython in general, it could be worked around by not invoking deallocators with a live exception... I'm actually pretty surprised that this is even possible! It seems like having a live exception when you start executing arbitrary Python code would be bad. So maybe that's the real bug? Adding both "asyncio" and "memory management" interest groups to the nosy.
msg370052 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2020-05-27 07:27
On 27.05.2020 05:56, Nathaniel Smith wrote:
> In CPython in general, it could be worked around by not invoking deallocators with a live exception... I'm actually pretty surprised that this is even possible! It seems like having a live exception when you start executing arbitrary Python code would be bad. So maybe that's the real bug? Adding both "asyncio" and "memory management" interest groups to the nosy.

Exception handlers can execute arbitrary Python code, so it's not
surprising that objects get allocated, deallocated, etc.

What you're describing sounds more like a problem with the PySide2
code not being reentrant. Clearing exceptions always has to be done
with some care. It's normally only applied to replace the exception
with a more specific one, when the exception is expected and handled
in the C code, or when there is no way to report the exception back
up the stack.

Note: Even the PyErr_Print() can result in Python code being
executed and because it's likely that PySide2 objects are part
of the stack trace, even PySide2 methods may be called as a result.

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Experts (#1, May 27 2020)
>>> Python Projects, Coaching and Support ...    https://www.egenix.com/
>>> Python Product Development ...        https://consulting.egenix.com/
________________________________________________________________________

::: We implement business ideas - efficiently in both time and costs :::

   eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
    D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
           Registered at Amtsgericht Duesseldorf: HRB 46611
               https://www.egenix.com/company/contact/
                     https://www.malemburg.com/
msg370169 - (view) Author: Nathaniel Smith (njs) * (Python committer) Date: 2020-05-28 07:51
I don't think I understand what you mean by "reentrant"... we don't have any single piece of code that's calling back into itself. The question is about what the tp_dealloc contract is.

Digging into it more, it looks like the standard is for typeobject.c:slot_tp_finalize to save/restore exceptions when invoking Python-level __del__ methods, rather than it being the responsibility of the tp_dealloc or tp_finalizer caller. (And finding that code also answers my next question, which was going to be whether there are any other dance steps you have to do when re-entering CPython from a tp_dealloc!)

So I guess this is a PySide2 bug.
msg370178 - (view) Author: Nathaniel Smith (njs) * (Python committer) Date: 2020-05-28 08:42
Filed with PySide2: https://bugreports.qt.io/browse/PYSIDE-1313
History
Date User Action Args
2020-05-28 08:42:20njssetstatus: open -> closed
resolution: not a bug
messages: + msg370178

stage: resolved
2020-05-28 07:51:18njssetmessages: + msg370169
2020-05-27 07:27:20lemburgsetmessages: + msg370052
2020-05-27 03:56:31njscreate