classification
Title: segfault during shutdown attempting to log ResourceWarning
Type: crash Stage: resolved
Components: Interpreter Core Versions: Python 3.7, Python 3.6
process
Status: closed Resolution: out of date
Dependencies: Superseder:
Assigned To: Nosy List: brett.cannon, emptysquare, pitrou, serhiy.storchaka, vstinner, xdegaye
Priority: normal Keywords: patch

Created on 2014-11-18 21:26 by emptysquare, last changed 2017-10-26 15:54 by xdegaye. This issue is now closed.

Files
File name Uploaded Description Edit
dump.txt emptysquare, 2014-11-18 21:26 Core dump
runtimerror_singleton.py vstinner, 2014-11-19 16:32
warn.patch vstinner, 2014-11-19 16:32 review
runtimerror_singleton_2.py xdegaye, 2014-11-25 20:51
warn_2.patch xdegaye, 2014-11-25 20:51
warn_3.patch xdegaye, 2014-11-26 11:33
warn_4.patch xdegaye, 2014-11-26 13:57
warn_5.patch xdegaye, 2014-11-26 20:55
remove_singleton.patch xdegaye, 2014-11-29 13:44
runtimerror_singleton_3.py xdegaye, 2014-12-01 11:20
mymodule.c xdegaye, 2014-12-01 11:20
setup.py xdegaye, 2014-12-01 11:21
immutable-recursion-error-inst.diff serhiy.storchaka, 2017-06-11 15:24 review
Pull Requests
URL Status Linked Edit
PR 1981 closed xdegaye, 2017-06-07 11:37
Messages (26)
msg231345 - (view) Author: A. Jesse Jiryu Davis (emptysquare) * Date: 2014-11-18 21:26
Running a unittest suite for Motor, my MongoDB driver which uses PyMongo, greenlet, and Tornado. The suite commonly segfaults during interpreter shutdown. I've reproduced this crash with Python 3.3.5, 3.4.1, and 3.4.2. Python 2.6 and 2.7 do *not* crash. The Python interpreters are all built like:

./configure --prefix=/mnt/jenkins/languages/python/rX.Y.Z --enable-shared && make && make install

This is Amazon Linux AMI release 2014.09.

The unittest suite's final output is:

----------------------------------------------------------------------
Ran 15 tests in 265.947s

OK
Segmentation fault (core dumped)

Backtrace from a Python 3.4.2 coredump attached.
msg231347 - (view) Author: A. Jesse Jiryu Davis (emptysquare) * Date: 2014-11-18 21:37
The crash can ignore whether or not I specify "-Wignore" on the python command line. I was hoping to avoid the crash by short-circuiting the ResourceWarning code path, since the following line appears in the backtrace:

#5  PyErr_WarnFormat (category=<optimized out>, stack_level=stack_level@entry=1, format=format@entry=0x7f5f1ca8b377 "unclosed file %R") at Python/_warnings.c:813

But "-Wignore" has no effect.
msg231360 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-11-19 08:20
Looks as globals in setup_context() is NULL. I suppose it is PyThreadState_Get()->interp->sysdict. interp->sysdict is cleared in PyInterpreterState_Clear(). Other code is executed after setting interp->sysdict to NULL (clearing interp->sysdict content, interp->builtins, interp->builtins_copy and interp->importlib) and this potentially can emit warnings.
msg231389 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2014-11-19 16:32
It looks like the problem is that raising the PyExc_RecursionErrorInst singleton creates a traceback object which contains frames. The singleton keeps the frames alive longer than expected.

I tried to write a script to raise this singleton, but it looks like the local variables of the frames are not deleted, even if frames are deleted (by _PyExc_Fini).

You may try to finish my attached runtimerror_singleton.py script.
msg231390 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2014-11-19 16:32
Oh, I also wrote a draft of patch fixing the issue, but I was unable to reproduce the issue. See attached warn.patch (not tested).
msg231683 - (view) Author: Xavier de Gaye (xdegaye) * (Python triager) Date: 2014-11-25 20:51
The attached script raises the PyExc_RecursionErrorInst singleton and reproduces the issue.
The attached patch fixes the issue by ignoring the warning when clearing PyExc_RecursionErrorInst and clearing the frames associated with its traceback, in _PyExc_Fini().
msg231686 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2014-11-25 21:03
+    /* during Python finalization, warnings may be emited after interp->sysdict
+       is cleared: see issue #22898 */

I would prefer to see this comment in the else block.
msg231705 - (view) Author: Xavier de Gaye (xdegaye) * (Python triager) Date: 2014-11-26 11:33
> +    /* during Python finalization, warnings may be emited after interp->sysdict
> +       is cleared: see issue #22898 */
>
> I would prefer to see this comment in the else block.

Indeed.
New updated patch attached.
msg231706 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-11-26 12:08
Why recursion limit is restored? Couldn't the test be simpler without it?

%a should be used instead of '%s' (file path can contain backslashes).

And it would be more robust to open file in binary mode (may be even in non-buffered). It can contain non-ascii characters.

May be the test should be marked as CPython only.

To check that script is executed at all we should print something from it and than test the out. Otherwise syntax error in script will make all test passed.
msg231710 - (view) Author: Xavier de Gaye (xdegaye) * (Python triager) Date: 2014-11-26 13:57
> Why recursion limit is restored? Couldn't the test be simpler without it?

For the sake of explicitness, so that the interpreter will not raise a RuntimeError during finalization when checking for the recursion limit after g.throw(MyException) has raised PyExc_RecursionErrorInst.


> %a should be used instead of '%s' (file path can contain backslashes).
> And it would be more robust to open file in binary mode (may be even in non-buffered). It can contain non-ascii characters.
> May be the test should be marked as CPython only.
> To check that script is executed at all we should print something from it and than test the out. Otherwise syntax error in script will make all test passed.

Thanks.
New patch attached.

The reason why the PyExc_RecursionErrorInst singleton keeps the frames alive longer than expected is that the reference count of the PyExc_RecursionErrorInst static variable never reaches zero until _PyExc_Fini().  So decrementing the reference count of this exception after the traceback has been printed in PyErr_PrintEx() does not decrement the reference count of its traceback attribute (as it is the case with the other exceptions) and the traceback is not freed. The following patch to PyErr_PrintEx() does that.  With this new patch and without the changes made by warn_4.patch, the interpreter does not crash with the runtimerror_singleton_2.py reproducer and the ResourceWarning is now printed instead of being ignored as with the warn_4.patch:

diff --git a/Python/pythonrun.c b/Python/pythonrun.c
--- a/Python/pythonrun.c
+++ b/Python/pythonrun.c
@@ -1876,6 +1876,8 @@
         PyErr_Display(exception, v, tb);
     }
     Py_XDECREF(exception);
+    if (v == PyExc_RecursionErrorInst)
+        Py_CLEAR(((PyBaseExceptionObject *)v)->traceback);
     Py_XDECREF(v);
     Py_XDECREF(tb);
 }

If both patches were to be included, the test case in warn_4.patch would test the above patch and not the changes made in Python/_warnings.c.
msg231717 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-11-26 15:50
out can be b'Done.\r\n'. Use self.assertIn.

> If both patches were to be included, the test case in warn_4.patch would test the above patch and not the changes made in Python/_warnings.c.

You can test err for warning message.

The traceback should be cleared before decrementing the reference count. And only if Py_REFCNT(v) is 2.
msg231721 - (view) Author: A. Jesse Jiryu Davis (emptysquare) * Date: 2014-11-26 16:43
With warn_4.patch applied I can no longer reproduce my original segfault, looks like the fix works.
msg231729 - (view) Author: Xavier de Gaye (xdegaye) * (Python triager) Date: 2014-11-26 20:55
> out can be b'Done.\r\n'. Use self.assertIn.

Ok, new patch attached.


>> If both patches were to be included, the test case in warn_4.patch would test the above patch and not the changes made in Python/_warnings.c.
> You can test err for warning message.

In the case where PyExc_RecursionErrorInst would not leak frames, the code path followed by the test case would not run any of the changes made in _warnings.c.


> The traceback should be cleared before decrementing the reference count. And only if Py_REFCNT(v) is 2.

I believe that attempting to fix the frames leak by clearing the traceback implies the following changes:
    * The previous patch to PyErr_PrintEx()
    * v->context and v->cause should also be tested against equality with PyExc_RecursionErrorInst.
    * In PyErr_PrintEx() the variable v2 may also be PyExc_RecursionErrorInst.
    * The sames change should also be done when freeing the exception value at least:
        in PyErr_WriteUnraisable() called when an exception occurs during finalization
        in PyErr_Restore()
        [1] when sys.last_value is cleared in PyImport_Cleanup()
And that would miss the case [1] where sys.last_value is set to None by some python user code.

Note [1]:
Unless it is acceptable to clear the PyExc_RecursionErrorInst traceback even when sys.last_value has been set (then Py_REFCNT(v) is 3 instead of 2).

Not sure if this is worth the trouble.
msg231860 - (view) Author: Xavier de Gaye (xdegaye) * (Python triager) Date: 2014-11-29 13:44
Out of curiosity I have tried to figure out how to build another test case using the model provided by runtimerror_singleton.py. This cannot be done, and for the following reasons:

The infinite recursion of PyErr_NormalizeException() is supposed to occur as follows: when a RuntimeError caused by recursion is normalized, PyErr_NormalizeException() calls the RuntimeError class to instantiate the exception, the recursion limit is reached again, triggering a new RuntimeError that needs also to be normalized causing PyErr_NormalizeException() to recurse infinitely.

But the low/high water mark level heuristic of the anti-recursion protection mechanism described in a comment of ceval.h prevents this. Let's assume the infinite recursion is possible:
* At iteration 'n' of the infinite recursion, the instantiation of the RuntimeError exception fails because of recursion with a new RuntimeError and tstate->overflowed is true: PyErr_NormalizeException() recurses.
* At iteration 'n + 1', the instantiation of this new RuntimeError is successfull because the recursion level is not checked when tstate->overflowed is true: the recursion of PyErr_NormalizeException() terminates and infinite recursion is not possible.

This explains the paradox that, if you remove entirely the check against infinite recursion in PyErr_NormalizeException(), then the runtimerror_singleton_2.py reproducer does not crash and the ResourceWarning is printed even though the recursion limit has been reached.

The attached patch implements this fix, includes the previous changes in _warning.c, and moves the test case to test_exceptions.

History (for reference):
The PyExc_RecursionErrorInst singleton was added by svn revision 58032 [1] to fix the issue titled "a bunch of infinite C recursions" [2].
In parallel, changeset cd125fe83051 [3] added the 'overflowed' member to the thread state.
Interestingly changeset cd125fe83051 was committed before revision 58032, but the whole discussion on issue [2] took place well before this commit was done, and so the fact that the infinite recursion problem of PyErr_NormalizeException() was being fixed by changeset cd125fe83051 as a side effect, went unnoticed.

[1] http://svn.python.org/view?view=revision&revision=58032
[2] http://bugs.python.org/issue1202533
[3] https://hg.python.org/cpython/rev/cd125fe83051
msg231933 - (view) Author: Xavier de Gaye (xdegaye) * (Python triager) Date: 2014-12-01 11:20
When tstate->overflowed is already set to 1 before entering PyErr_NormalizeException() to normalize an exception, the following cases may occur:
  1) Normalizing a built-in exception => instantiation ok.
  2) Normalizing a python exception that fails with a built-in exception => next recursion of PyErr_NormalizeException() ok.
  3) Normalizing a python exception that fails with a python exception that fails with a python exception and so on infinitely...
     => PyObject_Call() never returns and the interpreter aborts with a fatal error when the high warter mark is exceeded, the infinite recursion is in PyObject_Call().
  4) Normalizing a python exception defined in an extension module and the instantiation returns NULL and sets the same exception:
      a) Without any patch, we get a segfault caused by another bug in PyErr_NormalizeException() at Py_DECREF(*val), just before setting val to PyExc_RecursionErrorInst.
         This is fixed by changing Py_DECREF(*val) to Py_XDECREF(*val).
         With the above fix, we get the same abort as the one caused by runtimerror_singleton_2.py, so this is another reproducer of the current issue.
      b) The test is ok with patch warn_5.patch, and the above fix.
      c) With patch remove_singleton.patch the interpreter aborts with a fatal error when the high warter mark is exceeded, the infinite recursion is in PyErr_NormalizeException().

Cases 3) and 4) can be tested with runtimerror_singleton_3.py (install mymodule with setup.py for all three test cases in 4).

remove_singleton.patch introduces a regression in case c), but IMHO the abort in case c) is consistent with the abort in case 3), they
are both related to a more general problem involving the low/high water mark heuristic and described by Antoine in [1].

[1] http://thread.gmane.org/gmane.comp.python.devel/97016
msg262356 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-03-24 15:20
I tried the following script on Python 3.5 and Python 3.6 and I failed to reproduce the bug:
---
import sys, traceback

class MyException(Exception):
    def __init__(self, *args):
        1/0

def gen():
    f = open(__file__, mode='rb', buffering=0)
    yield

g = gen()
next(g)
recursionlimit = sys.getrecursionlimit()
sys.setrecursionlimit(len(traceback.extract_stack())+3)
try:
    g.throw(MyException)
finally:
    sys.setrecursionlimit(recursionlimit)
    print('Done.')
---

Note: I had to add "+3" to the sys.setrecursionlimit() call, otherwise the limit is too low and you get a RecursionError (it's a recent bugfix, issue #25274).

Can somone else please confirm that the bug is fixed?
msg262372 - (view) Author: Xavier de Gaye (xdegaye) * (Python triager) Date: 2016-03-24 19:33
When tested with runtimerror_singleton_3.py (see msg 231933 above), the latest Python 3.6.0a0 (default:3eec7bcc14a4, Mar 24 2016, 20:16:19) still crashes:

$ python runtimerror_singleton_3.py
Importing mymodule.
Traceback (most recent call last):
  File "runtimerror_singleton_3.py", line 26, in <module>
    foo()
  File "runtimerror_singleton_3.py", line 23, in foo
    g.throw(MyException)    # Entering PyErr_NormalizeException()
  File "runtimerror_singleton_3.py", line 14, in gen
    yield
RecursionError: maximum recursion depth exceeded
Segmentation fault (core dumped)
msg262473 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-03-26 00:10
warn_5.patch: The patch cannot be reviewed on Rietveld :-( You must not use the git format for diff.

warn_5.patch: "if (globals == NULL) { (...) return 0; }"

It looks like filename is not initialized. I suggest to use:

   *filename = f->f_code->co_filename;

It looks like you have to add:

   if (PyUnicode_Check(*filename)) *filename = NULL;

To mimick the code below.
msg262494 - (view) Author: Xavier de Gaye (xdegaye) * (Python triager) Date: 2016-03-26 17:57
Victor,

With warn_5.patch *filename is not set when globals is NULL: setup_context() returns 0, and so do_warn() returns NULL without calling warn_explicit().

This is different from your initial warn.patch where setup_context() returns 1 in that case and an attempt is made to issue the warning.
msg294911 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-06-01 07:27
The issue #17852 is still alive and has a reference to this issue. It would be nice to rebase the latest patch on master and create a PR ;-)
msg295331 - (view) Author: Xavier de Gaye (xdegaye) * (Python triager) Date: 2017-06-07 11:49
With PR 1981, a ResourceWarning is printed when a RecursionError occurs while normalizing another exception and its traceback holds a reference to a non-closed file object.

For information, issue 5437 removed the MemoryError singleton for the same reasons as PR 1981 does.
msg295628 - (view) Author: Xavier de Gaye (xdegaye) * (Python triager) Date: 2017-06-10 10:08
Antoine asked in PR 1981:
> Did you verify the removed code here wasn't needed anymore?

Just checked that crasher infinite_rec_2.py (removed by 1e534b5) does not crash with PR 1981. The other crashers listed at 1e534b5 are not valid Python 3.7 code.  Does anyone know how to translate them into Python 3.7 ?

With PR 1981 infinite recursion does not occur in PyErr_NormalizeException() when the tstate->overflowed flag is false upon entering this function and:
    * either (obviously) the normalizing of this exception does not fail
    * or the normalizing of this exception fails with an exception whose normalization won't fail (for example a RecursionError).

Removing the PyExc_RecursionErrorInst singleton decreases the cases covered by the recursion checks because the test made upon using PyExc_RecursionErrorInst (in the 'finally' label of PyErr_NormalizeException()) has the side effect of adding another recursion check to the normal recursion machinery of _Py_CheckRecursiveCall().  Those are corner cases though, such as for example the following case that will abort instead now with PR 1981 [1]:
    * tstate->overflowed has been set to true outside PyErr_NormalizeException() and the corresponding RecursionError has been discarded
    * PyErr_NormalizeException() attempts normalizing a python exception that raises a python exception that raises ... (and so on indefinitely)
IMO it is ok to abort in such cases. As Brett wrote 9 years ago in https://mail.python.org/pipermail/python-dev/2008-August/082107.html:
    "I have always viewed the check as a bonus sanity check, but not something to heavily rely upon."
One can also note that this other recursion check added with the use of PyExc_RecursionErrorInst does not respect the tstate->overflowed flag so that it adds another level of complexity to the recursion machinery.

[1] But with PR 1981, a RecursionError is raised when replacing MyException in test_recursion_normalizing_exception() at Lib/test/test_exceptions.py with:

      class MyException(Exception):
          def __init__(self):
              raise MyException
msg295707 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-06-11 12:46
The fact that the traceback of PyExc_RecursionErrorInst causes an issue means that PyExc_RecursionErrorInst is used. We can't just remove PyExc_RecursionErrorInst since this can cause a stack overflow or, with merged PR 2035, an infinite loop.

Perhaps the solution of this issue is clearing __traceback__, __cause__ and __context__ attributes of PyExc_RecursionErrorInst as early as possible.
msg295719 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-06-11 15:24
The simplest solution -- make BaseException_set_tb(), BaseException_set_context() and BaseException_set_cause() no-ops for PyExc_RecursionErrorInst.
msg295940 - (view) Author: Xavier de Gaye (xdegaye) * (Python triager) Date: 2017-06-13 17:09
> We can't just remove PyExc_RecursionErrorInst since this can cause a stack overflow

Please give an example where a stack overflow occurs when PyExc_RecursionErrorInst has been removed.
msg305067 - (view) Author: Xavier de Gaye (xdegaye) * (Python triager) Date: 2017-10-26 15:54
Fixed by issue 30697.
History
Date User Action Args
2017-10-26 15:54:05xdegayesetstatus: open -> closed
versions: - Python 3.5
messages: + msg305067

resolution: out of date
stage: patch review -> resolved
2017-06-13 17:09:45xdegayesetmessages: + msg295940
2017-06-11 15:24:54serhiy.storchakasetfiles: + immutable-recursion-error-inst.diff

messages: + msg295719
2017-06-11 12:46:44serhiy.storchakasetmessages: + msg295707
2017-06-10 10:11:03xdegayesetnosy: + brett.cannon
2017-06-10 10:08:43xdegayesetmessages: + msg295628
2017-06-07 11:49:30xdegayesetmessages: + msg295331
versions: + Python 3.6, Python 3.7, - Python 3.4
2017-06-07 11:37:05xdegayesetpull_requests: + pull_request2049
2017-06-01 07:27:48vstinnersetmessages: + msg294911
2016-03-26 17:57:38xdegayesetmessages: + msg262494
2016-03-26 00:10:45vstinnersetmessages: + msg262473
2016-03-24 19:33:08xdegayesetmessages: + msg262372
2016-03-24 15:20:41vstinnersetmessages: + msg262356
2014-12-01 11:21:08xdegayesetfiles: + setup.py
2014-12-01 11:20:46xdegayesetfiles: + mymodule.c
2014-12-01 11:20:30xdegayesetfiles: + runtimerror_singleton_3.py

messages: + msg231933
2014-11-29 13:44:54xdegayesetfiles: + remove_singleton.patch

messages: + msg231860
2014-11-26 20:55:18xdegayesetfiles: + warn_5.patch

messages: + msg231729
2014-11-26 16:43:11emptysquaresetmessages: + msg231721
2014-11-26 15:50:51serhiy.storchakasetmessages: + msg231717
2014-11-26 13:57:53xdegayesetfiles: + warn_4.patch

messages: + msg231710
2014-11-26 12:08:19serhiy.storchakasetstage: patch review
messages: + msg231706
versions: + Python 3.5
2014-11-26 11:33:08xdegayesetfiles: + warn_3.patch

messages: + msg231705
2014-11-25 21:04:00vstinnersetmessages: + msg231686
2014-11-25 20:51:56xdegayesetfiles: + warn_2.patch
2014-11-25 20:51:44xdegayesetfiles: + runtimerror_singleton_2.py
nosy: + xdegaye
messages: + msg231683

2014-11-19 16:32:51vstinnersetfiles: + warn.patch
keywords: + patch
messages: + msg231390
2014-11-19 16:32:13vstinnersetfiles: + runtimerror_singleton.py

messages: + msg231389
2014-11-19 15:38:37vstinnersetnosy: + vstinner
2014-11-19 08:20:17serhiy.storchakasetnosy: + serhiy.storchaka, pitrou
messages: + msg231360
2014-11-18 21:37:30emptysquaresetmessages: + msg231347
2014-11-18 21:26:16emptysquarecreate