classification
Title: Cython compiler run crashes CPython 3.1.1 and 3.2
Type: crash Stage: needs patch
Components: Interpreter Core Versions: Python 3.1, Python 3.2
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: Arfrever, amaury.forgeotdarc, benjamin.peterson, flox, pitrou, scoder, vstinner
Priority: high Keywords: patch

Created on 2009-10-20 09:21 by scoder, last changed 2010-03-07 17:11 by benjamin.peterson. This issue is now closed.

Files
File name Uploaded Description Edit
crash-patch.patch scoder, 2009-10-20 09:21 patch against Cython that shows the crash
yield_crasher.py amaury.forgeotdarc, 2010-02-15 18:11
Messages (20)
msg94268 - (view) Author: Stefan Behnel (scoder) * (Python committer) Date: 2009-10-20 09:21
Running the Cython compiler under Python 3.1.1 and 3.2 (SVN) corrupts
PyThreadState->exc_value by leaving a dead reference. Printing the value
then leads to a crash.

This bug is about plain Python code, no Cython built extension modules
involved.

Steps to reproduce the problem:

- get Cython from http://hg.cython.org/cython-devel (see the bz2/zip/gz
links on the left to get an archive without doing a checkout)

- apply the attached patch, which simply prints sys.exc_info() during
the compiler run at a place where it's known to be corrupted

- execute the following to run a single test in the test suite:

   python3.1 runtests.py --no-cpp --no-pyregr --no-doctest -vv \
                                  'compile\.first_assignment'

- expect a crash after printing "HERE1"

Before exc_info gets corrupted, we make heavy use of generators to
traverse the parse tree (see Cython/Compiler/TreePath.py), while being
inside of a recursive traversal of the tree already (see the
"VisitorTransform" class and its base class in
Cython/Compiler/Visitor.py). The code section that links the two is in
the class "TreeAssertVisitor" at the end of Cython/TestUtils.py, where
the patch applies. As exc_info doesn't get corrupted during the normal
recursive tree transformation traversals before that, the generator
usage in TreePath.py is likely related to the crash.

The crash was found and reproduced under two different Linux x86 systems.
msg94269 - (view) Author: Stefan Behnel (scoder) * (Python committer) Date: 2009-10-20 09:24
I should add that the crash doesn't necessarily happen during the first
test run, which also converts the Cython source to Py3 using 2to3.
However, once that's done, running the test a second time crashes reliably.
msg95590 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2009-11-21 23:58
Have you tried using 3.1.0 and even 3.0?
msg95594 - (view) Author: Stefan Behnel (scoder) * (Python committer) Date: 2009-11-22 04:35
I hadn't, but it looks like the 2to3-ed Cython also runs on 3.0 now, so
I tested that, but I failed to get the procedure below to crash for me.
And that's both in 3.0 *and* 3.1.1! :-/

But I can still provoke the crash in 3.0, 3.0.1, 3.1.1 and the latest
3.2 when running the test suite normally, so it's still there, and it's
been there for a while.

At least a debug build of the latest py3k SVN rev. 76441 (3.2a0) seems
to crash reliably for the latest Cython (cython-devel, rev.
76a814a1fc57) with the attached crash patch applied. Running the
following crashes after printing the "HERE1":

    python3.2 runtests.py --no-cpp --no-pyregr --no-doctest \
         --no-cpp --no-fork --no-refnanny -vv first_assignment

with the error:
"""
python3.2: Objects/unicodeobject.c:885: PyUnicodeUCS4_FromFormatV:
Assertion `obj' failed.
Aborted
"""
which occurs when it's trying to print() the dead exception. Again, no
Cython modules have been loaded up to this point, so this is still pure
Python code that crashes the interpreter.
msg95597 - (view) Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2009-11-22 12:59
I don't reproduce the problem on Windows. But the class NodeTypeWriter is 
not even used at all; did I miss something? The test builds a python 
extension and runs it, successfully it seems.
msg95599 - (view) Author: Stefan Behnel (scoder) * (Python committer) Date: 2009-11-22 14:21
The patch is supposed to apply near the end of the class
TreeAssertVisitor at the end of the file Cython/TestUtils.py, not in the
class NodeTypeWriter.

And the test doesn't run (or even import) the extension, it just builds it.
msg95603 - (view) Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2009-11-22 15:55
Sorry, my mistake. Now the prints are there, but the test run without 
error:

Running tests against Cython 0.12.rc1
Python 3.2a0 (py3k, Nov 22 2009, 12:04:23) [MSC v.1500 32 bit (Intel)]
HERE1
(None, None, None)
HERE2
HERE1
(None, None, None)
HERE2
HERE1
(None, None, None)
HERE2
building 'first_assignment' extension
[...More output...]
msg95608 - (view) Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2009-11-22 21:07
I reproduce the crash on Linux. Some debug prints showed that the failing 
exception object is "tp_clear'ed", because it belongs to a reference 
cycle. Now if it is cleared it should not be reachable...
msg98585 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2010-01-31 01:20
I'm unable to reproduce the crash. Can someone write a shorter code reproducing the issue? I downloaded Cython tip and applied the patch attached to this issue.

I tried py3k, py3k compiled in pydebug mode, python 3.1 branch: none crashed. Should I clear generated files before running a new test? If yes, how?
msg98608 - (view) Author: Stefan Behnel (scoder) * (Python committer) Date: 2010-01-31 07:39
Looks like this is one of those bugs that run away when you look too close...

I can get it to crash reliably for me with the latest py3k branch (and all Python 3.x release versions) when I run the test suite completely, so here's a new recipe. The intention is to keep CPython from importing Cython built modules to make sure it's not related to the C code that Cython generates. I thought it was when I hit this problem ages ago (1 or 2 years?), but I can now assure you, it's not. To reproduce, run:

    python3.2 runtests.py --no-cpp --no-pyregr --no-doctest \
          --no-fork --cython-only --no-refnanny -vv 'run\.'

The "--cython-only" switch makes sure the modules are not run through the C compiler, so no binary modules are built and only the pure Python code of the Cython compiler runs here. The output is a little badly formatted, but it basically prints the name of the modules it compiles and crashes after a while (also without the patch).

You don't have to do anything for cleanup, the test runner does that.

Writing a shorter code snippet is really not easy as there seem to be various things involved here, apparently including exception handling, recursion and generators.
msg98611 - (view) Author: Stefan Behnel (scoder) * (Python committer) Date: 2010-01-31 12:39
I'll add a couple of comments about the relevant parts of the code that appears to trigger the crash.

The code runs through the parse tree and applies transformations to it.

1) For node matching, we use a dispatcher (in Visitor.py) that looks up methods for node class names (or the nearest matching class in its MRO hierarchy), e.g. a plain "Node" type would be handled by a "visit_Node" method, whereas an ExprNode(Node) type would match "visit_ExprNode" if it exists and "visit_Node" otherwise. The handler then recursively calls back into the dispatch code to visit its children.

For performance reasons, we sometimes reuse existing methods for this through assignment, e.g. instead of reimplementing the default fallback "visit_Node" all over the place, we assign an existing superclass method to it as in "visit_Node = VisitorTransform.recurse_to_children". This may or may not have an impact on this problem (it shouldn't - normally).

So far, all of this appears to work perfectly fine also in Py3. It's the next step that seems to be required to trigger the crash.

2) To specify assertions in our test suite, we use an XPath-like language for node tests (poor man's Schematron, sort-of). The implementation in TreePath.py (based on "ElementPath" in ElementTree 1.3) makes heavy use of generators to traverse the tree. It is used by the TreeAssertVisitor in TestUtils.py, which is itself a recursive tree visitor (as described above).

So the situation is as follows. Cython traverses the parse tree recursively, several times in a pipeline, dispatching to methods that further recurse into the tree. One of the pipeline steps is the tree assertion visitor, which, on matching functions, applies the generator-based path language to the tree, that tries to find nodes by iterating over a chain of generators.

At some point in this process, somehow, sys.exc_info gets corrupted, and the doctest/unittest framework crashes the runtime with a segfault when trying to print an exception (raised unexpectedly by the "pure" test).

I hope this makes it clearer a) what happens and b) why it's not easy to cut this down into a simple test case. As the previous comments show, it's already hard enough to reproduce it reliably with a reduced test setup.

Please do not hesitate to contact me if you need any further information.
msg99357 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-02-15 10:27
It's still reproduceable, but without a short easy-to-study snippet it's difficult to do anything.
msg99360 - (view) Author: Stefan Behnel (scoder) * (Python committer) Date: 2010-02-15 11:46
I tried several times to debug it myself, but I don't understand the exception cleanup macros in ceval.c (UNWIND_EXCEPTION_HANDLER and friends, new in Py3). If someone can get me set up to debug them, I can give it another shot. I assume there are a couple of invariants involved in them? Is there any interaction with generators that I should know about?
msg99362 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-02-15 12:37
> I tried several times to debug it myself, but I don't understand the
> exception cleanup macros in ceval.c (UNWIND_EXCEPTION_HANDLER and
> friends, new in Py3). If someone can get me set up to debug them, I
> can give it another shot. I assume there are a couple of invariants
> involved in them? Is there any interaction with generators that I
> should know about?

The exception state (what sys.exc_info() gives you) in 3.x is lexically
scoped (which it wasn't in 2.x).
Which means that given:
    try:
        1/0
    except ZeroDivisionError:
        # A
        try:
            [][0]
        except IndexError:
            # B
            pass
        # C

The exception state in C will hold the ZeroDivisionError, not the
IndexError (it is the reverse in 2.x). For that, each try/except block
needs to save the previous exception state; it is saved on the frame's
stack. When the try/except block is left, UNWIND_EXC_HANDLER is used to
restore the previous exception state.

And, yes, there's an interaction with generators, because "yield" can be
invoked from an except block. In that case, we have to temporarily
restore the caller's exception state. See the SWAP_EXC_STATE() call in
YIELD_VALUE. Also, the generator's exception state is restored when
resuming it, which is done in line 1162 and following in ceval.c.
msg99366 - (view) Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2010-02-15 18:11
OK, here is a short test case with no dependencies.
I kept the name of the original functions in Cython/Compiler/TreePath.py.
The script crashes because the RuntimeError has been cleared as part of a reference cycle, all its attributes are NULL but it is still referenced in exc_info()...
msg99368 - (view) Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2010-02-15 19:28
Is it normal that even after the call to next(), the generator frame contains a reference to the RuntimeError?

Long explanation:
- When the function exits, there is a cycle [parent frame]-> ['it' variable]-> [generator frame]-> [RuntimeError]-> [parent frame].
- gc.collect() collects them all, destroys the generator, this causes GeneratorExit exception to be sent into the generator frame.
- This executes SWAP_EXC_STATE() in ceval.c ("/* We were in an except handler when we left, restore the exception state which was put aside */"), this *moves* the reference to the exception from the frame to the thread state.
- The exception is resurrected, but this does not prevent the gc from calling its tp_clear method! and setting exc_info() as a side-effect of gc.collect() is really weird...
msg99370 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-02-15 19:54
> Is it normal that even after the call to next(), the generator frame 
> contains a reference to the RuntimeError?

As long as it's executing the generator it's quite normal. The generator must be able to restore the caller's RuntimeError when interrupted by a yield. But after the yield, the generator normally doesn't retain a reference (thanks to SWAP_EXC_STATE() in YIELD_VALUE).

> - This executes SWAP_EXC_STATE() in ceval.c ("/* We were in an except handler when we left, restore the exception state which was put aside */"), this *moves* the reference to the exception from the frame to the thread state.

Actually, it (does/should) execute SAVE_EXC_STATE() instead, because the thread's exception state was empty when yielding.

The following patch seems to do the trick:

diff -r 4465a45b8876 Python/ceval.c
--- a/Python/ceval.c	Mon Feb 15 09:35:16 2010 +0100
+++ b/Python/ceval.c	Mon Feb 15 20:51:58 2010 +0100
@@ -1159,7 +1159,7 @@ PyEval_EvalFrameEx(PyFrameObject *f, int
 	assert(stack_pointer != NULL);
 	f->f_stacktop = NULL;	/* remains NULL unless yield suspends frame */
 
-	if (f->f_code->co_flags & CO_GENERATOR) {
+	if (!throwflag && (f->f_code->co_flags & CO_GENERATOR)) {
 		if (f->f_exc_type != NULL && f->f_exc_type != Py_None) {
 			/* We were in an except handler when we left,
 			   restore the exception state which was put aside
msg99372 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-02-15 21:05
Also, perhaps BaseException_str() should be more robust against a NULL args member.
msg99419 - (view) Author: Stefan Behnel (scoder) * (Python committer) Date: 2010-02-16 16:47
Antoine's patch totally works for me. Cython just ran its entire test suite on a patched 3.2 HEAD without crash.

Thanks everyone! Hope this will make it into 3.1.2.

Stefan
msg100593 - (view) Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2010-03-07 17:11
Fixed in r78766.
History
Date User Action Args
2010-03-07 17:11:07benjamin.petersonsetstatus: open -> closed
resolution: fixed
messages: + msg100593
2010-02-16 16:47:15scodersetmessages: + msg99419
2010-02-15 21:05:22pitrousetmessages: + msg99372
2010-02-15 20:17:01floxsetpriority: high
nosy: + flox

stage: needs patch
2010-02-15 19:54:38pitrousetmessages: + msg99370
2010-02-15 19:28:08amaury.forgeotdarcsetmessages: + msg99368
2010-02-15 18:11:18amaury.forgeotdarcsetfiles: + yield_crasher.py
nosy: + benjamin.peterson
messages: + msg99366

2010-02-15 12:37:40pitrousetmessages: + msg99362
2010-02-15 11:46:55scodersetmessages: + msg99360
2010-02-15 10:27:26pitrousetmessages: + msg99357
2010-01-31 12:39:04scodersetmessages: + msg98611
2010-01-31 07:39:07scodersetmessages: + msg98608
2010-01-31 01:20:46vstinnersetnosy: + vstinner
messages: + msg98585
2009-11-22 21:07:56amaury.forgeotdarcsetmessages: + msg95608
2009-11-22 15:55:42amaury.forgeotdarcsetmessages: + msg95603
2009-11-22 14:21:54scodersetmessages: + msg95599
2009-11-22 12:59:35amaury.forgeotdarcsetnosy: + amaury.forgeotdarc
messages: + msg95597
2009-11-22 04:35:29scodersetmessages: + msg95594
2009-11-21 23:58:41pitrousetnosy: + pitrou
messages: + msg95590
2009-11-21 19:38:44Arfreversetnosy: + Arfrever
2009-10-20 09:24:22scodersetmessages: + msg94269
2009-10-20 09:21:15scodercreate