Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

segfault during shutdown attempting to log ResourceWarning #67087

Closed
ajdavis opened this issue Nov 18, 2014 · 26 comments
Closed

segfault during shutdown attempting to log ResourceWarning #67087

ajdavis opened this issue Nov 18, 2014 · 26 comments
Labels
3.7 (EOL) end of life interpreter-core (Objects, Python, Grammar, and Parser dirs) type-crash A hard crash of the interpreter, possibly with a core dump

Comments

@ajdavis
Copy link
Contributor

ajdavis commented Nov 18, 2014

BPO 22898
Nosy @brettcannon, @pitrou, @vstinner, @xdegaye, @serhiy-storchaka, @ajdavis
PRs
  • bpo-22898: Singleton RecursionError may hold a traceback in _PyExc_Fi… #1981
  • Files
  • dump.txt: Core dump
  • runtimerror_singleton.py
  • warn.patch
  • runtimerror_singleton_2.py
  • warn_2.patch
  • warn_3.patch
  • warn_4.patch
  • warn_5.patch
  • remove_singleton.patch
  • runtimerror_singleton_3.py
  • mymodule.c
  • setup.py
  • immutable-recursion-error-inst.diff
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2017-10-26.15:54:05.986>
    created_at = <Date 2014-11-18.21:26:16.533>
    labels = ['interpreter-core', '3.7', 'type-crash']
    title = 'segfault during shutdown attempting to log ResourceWarning'
    updated_at = <Date 2017-10-26.15:54:05.984>
    user = 'https://github.com/ajdavis'

    bugs.python.org fields:

    activity = <Date 2017-10-26.15:54:05.984>
    actor = 'xdegaye'
    assignee = 'none'
    closed = True
    closed_date = <Date 2017-10-26.15:54:05.986>
    closer = 'xdegaye'
    components = ['Interpreter Core']
    creation = <Date 2014-11-18.21:26:16.533>
    creator = 'emptysquare'
    dependencies = []
    files = ['37224', '37229', '37230', '37278', '37279', '37286', '37288', '37289', '37316', '37332', '37333', '37334', '46940']
    hgrepos = []
    issue_num = 22898
    keywords = ['patch']
    message_count = 26.0
    messages = ['231345', '231347', '231360', '231389', '231390', '231683', '231686', '231705', '231706', '231710', '231717', '231721', '231729', '231860', '231933', '262356', '262372', '262473', '262494', '294911', '295331', '295628', '295707', '295719', '295940', '305067']
    nosy_count = 6.0
    nosy_names = ['brett.cannon', 'pitrou', 'vstinner', 'xdegaye', 'serhiy.storchaka', 'emptysquare']
    pr_nums = ['1981']
    priority = 'normal'
    resolution = 'out of date'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'crash'
    url = 'https://bugs.python.org/issue22898'
    versions = ['Python 3.6', 'Python 3.7']

    @ajdavis
    Copy link
    Contributor Author

    ajdavis commented Nov 18, 2014

    Running a unittest suite for Motor, my MongoDB driver which uses PyMongo, greenlet, and Tornado. The suite commonly segfaults during interpreter shutdown. I've reproduced this crash with Python 3.3.5, 3.4.1, and 3.4.2. Python 2.6 and 2.7 do *not* crash. The Python interpreters are all built like:

    ./configure --prefix=/mnt/jenkins/languages/python/rX.Y.Z --enable-shared && make && make install

    This is Amazon Linux AMI release 2014.09.

    The unittest suite's final output is:

    ----------------------------------------------------------------------
    Ran 15 tests in 265.947s

    OK
    Segmentation fault (core dumped)

    Backtrace from a Python 3.4.2 coredump attached.

    @ajdavis ajdavis added interpreter-core (Objects, Python, Grammar, and Parser dirs) type-crash A hard crash of the interpreter, possibly with a core dump labels Nov 18, 2014
    @ajdavis
    Copy link
    Contributor Author

    ajdavis commented Nov 18, 2014

    The crash can ignore whether or not I specify "-Wignore" on the python command line. I was hoping to avoid the crash by short-circuiting the ResourceWarning code path, since the following line appears in the backtrace:

    #5 PyErr_WarnFormat (category=<optimized out>, stack_level=stack_level@entry=1, format=format@entry=0x7f5f1ca8b377 "unclosed file %R") at Python/_warnings.c:813

    But "-Wignore" has no effect.

    @serhiy-storchaka
    Copy link
    Member

    Looks as globals in setup_context() is NULL. I suppose it is PyThreadState_Get()->interp->sysdict. interp->sysdict is cleared in PyInterpreterState_Clear(). Other code is executed after setting interp->sysdict to NULL (clearing interp->sysdict content, interp->builtins, interp->builtins_copy and interp->importlib) and this potentially can emit warnings.

    @vstinner
    Copy link
    Member

    It looks like the problem is that raising the PyExc_RecursionErrorInst singleton creates a traceback object which contains frames. The singleton keeps the frames alive longer than expected.

    I tried to write a script to raise this singleton, but it looks like the local variables of the frames are not deleted, even if frames are deleted (by _PyExc_Fini).

    You may try to finish my attached runtimerror_singleton.py script.

    @vstinner
    Copy link
    Member

    Oh, I also wrote a draft of patch fixing the issue, but I was unable to reproduce the issue. See attached warn.patch (not tested).

    @xdegaye
    Copy link
    Mannequin

    xdegaye mannequin commented Nov 25, 2014

    The attached script raises the PyExc_RecursionErrorInst singleton and reproduces the issue.
    The attached patch fixes the issue by ignoring the warning when clearing PyExc_RecursionErrorInst and clearing the frames associated with its traceback, in _PyExc_Fini().

    @vstinner
    Copy link
    Member

    + /* during Python finalization, warnings may be emited after interp->sysdict
    + is cleared: see issue bpo-22898 */

    I would prefer to see this comment in the else block.

    @xdegaye
    Copy link
    Mannequin

    xdegaye mannequin commented Nov 26, 2014

    • /* during Python finalization, warnings may be emited after interp->sysdict
    •   is cleared: see issue bpo-22898 \*/
      

    I would prefer to see this comment in the else block.

    Indeed.
    New updated patch attached.

    @serhiy-storchaka
    Copy link
    Member

    Why recursion limit is restored? Couldn't the test be simpler without it?

    %a should be used instead of '%s' (file path can contain backslashes).

    And it would be more robust to open file in binary mode (may be even in non-buffered). It can contain non-ascii characters.

    May be the test should be marked as CPython only.

    To check that script is executed at all we should print something from it and than test the out. Otherwise syntax error in script will make all test passed.

    @xdegaye
    Copy link
    Mannequin

    xdegaye mannequin commented Nov 26, 2014

    Why recursion limit is restored? Couldn't the test be simpler without it?

    For the sake of explicitness, so that the interpreter will not raise a RuntimeError during finalization when checking for the recursion limit after g.throw(MyException) has raised PyExc_RecursionErrorInst.

    %a should be used instead of '%s' (file path can contain backslashes).
    And it would be more robust to open file in binary mode (may be even in non-buffered). It can contain non-ascii characters.
    May be the test should be marked as CPython only.
    To check that script is executed at all we should print something from it and than test the out. Otherwise syntax error in script will make all test passed.

    Thanks.
    New patch attached.

    The reason why the PyExc_RecursionErrorInst singleton keeps the frames alive longer than expected is that the reference count of the PyExc_RecursionErrorInst static variable never reaches zero until _PyExc_Fini(). So decrementing the reference count of this exception after the traceback has been printed in PyErr_PrintEx() does not decrement the reference count of its traceback attribute (as it is the case with the other exceptions) and the traceback is not freed. The following patch to PyErr_PrintEx() does that. With this new patch and without the changes made by warn_4.patch, the interpreter does not crash with the runtimerror_singleton_2.py reproducer and the ResourceWarning is now printed instead of being ignored as with the warn_4.patch:

    diff --git a/Python/pythonrun.c b/Python/pythonrun.c
    --- a/Python/pythonrun.c
    +++ b/Python/pythonrun.c
    @@ -1876,6 +1876,8 @@
             PyErr_Display(exception, v, tb);
         }
         Py_XDECREF(exception);
    +    if (v == PyExc_RecursionErrorInst)
    +        Py_CLEAR(((PyBaseExceptionObject *)v)->traceback);
         Py_XDECREF(v);
         Py_XDECREF(tb);
     }

    If both patches were to be included, the test case in warn_4.patch would test the above patch and not the changes made in Python/_warnings.c.

    @serhiy-storchaka
    Copy link
    Member

    out can be b'Done.\r\n'. Use self.assertIn.

    If both patches were to be included, the test case in warn_4.patch would test the above patch and not the changes made in Python/_warnings.c.

    You can test err for warning message.

    The traceback should be cleared before decrementing the reference count. And only if Py_REFCNT(v) is 2.

    @ajdavis
    Copy link
    Contributor Author

    ajdavis commented Nov 26, 2014

    With warn_4.patch applied I can no longer reproduce my original segfault, looks like the fix works.

    @xdegaye
    Copy link
    Mannequin

    xdegaye mannequin commented Nov 26, 2014

    out can be b'Done.\r\n'. Use self.assertIn.

    Ok, new patch attached.

    > If both patches were to be included, the test case in warn_4.patch would test the above patch and not the changes made in Python/_warnings.c.
    You can test err for warning message.

    In the case where PyExc_RecursionErrorInst would not leak frames, the code path followed by the test case would not run any of the changes made in _warnings.c.

    The traceback should be cleared before decrementing the reference count. And only if Py_REFCNT(v) is 2.

    I believe that attempting to fix the frames leak by clearing the traceback implies the following changes:
    * The previous patch to PyErr_PrintEx()
    * v->context and v->cause should also be tested against equality with PyExc_RecursionErrorInst.
    * In PyErr_PrintEx() the variable v2 may also be PyExc_RecursionErrorInst.
    * The sames change should also be done when freeing the exception value at least:
    in PyErr_WriteUnraisable() called when an exception occurs during finalization
    in PyErr_Restore()
    [1] when sys.last_value is cleared in PyImport_Cleanup()
    And that would miss the case [1] where sys.last_value is set to None by some python user code.

    Note [1]:
    Unless it is acceptable to clear the PyExc_RecursionErrorInst traceback even when sys.last_value has been set (then Py_REFCNT(v) is 3 instead of 2).

    Not sure if this is worth the trouble.

    @xdegaye
    Copy link
    Mannequin

    xdegaye mannequin commented Nov 29, 2014

    Out of curiosity I have tried to figure out how to build another test case using the model provided by runtimerror_singleton.py. This cannot be done, and for the following reasons:

    The infinite recursion of PyErr_NormalizeException() is supposed to occur as follows: when a RuntimeError caused by recursion is normalized, PyErr_NormalizeException() calls the RuntimeError class to instantiate the exception, the recursion limit is reached again, triggering a new RuntimeError that needs also to be normalized causing PyErr_NormalizeException() to recurse infinitely.

    But the low/high water mark level heuristic of the anti-recursion protection mechanism described in a comment of ceval.h prevents this. Let's assume the infinite recursion is possible:

    • At iteration 'n' of the infinite recursion, the instantiation of the RuntimeError exception fails because of recursion with a new RuntimeError and tstate->overflowed is true: PyErr_NormalizeException() recurses.
    • At iteration 'n + 1', the instantiation of this new RuntimeError is successfull because the recursion level is not checked when tstate->overflowed is true: the recursion of PyErr_NormalizeException() terminates and infinite recursion is not possible.

    This explains the paradox that, if you remove entirely the check against infinite recursion in PyErr_NormalizeException(), then the runtimerror_singleton_2.py reproducer does not crash and the ResourceWarning is printed even though the recursion limit has been reached.

    The attached patch implements this fix, includes the previous changes in _warning.c, and moves the test case to test_exceptions.

    History (for reference):
    The PyExc_RecursionErrorInst singleton was added by svn revision 58032 [1] to fix the issue titled "a bunch of infinite C recursions" [2].
    In parallel, changeset cd125fe83051 [3] added the 'overflowed' member to the thread state.
    Interestingly changeset cd125fe83051 was committed before revision 58032, but the whole discussion on issue [2] took place well before this commit was done, and so the fact that the infinite recursion problem of PyErr_NormalizeException() was being fixed by changeset cd125fe83051 as a side effect, went unnoticed.

    [1] http://svn.python.org/view?view=revision&revision=58032
    [2] http://bugs.python.org/issue1202533
    [3] https://hg.python.org/cpython/rev/cd125fe83051

    @xdegaye
    Copy link
    Mannequin

    xdegaye mannequin commented Dec 1, 2014

    When tstate->overflowed is already set to 1 before entering PyErr_NormalizeException() to normalize an exception, the following cases may occur:

    1. Normalizing a built-in exception => instantiation ok.
    2. Normalizing a python exception that fails with a built-in exception => next recursion of PyErr_NormalizeException() ok.
    3. Normalizing a python exception that fails with a python exception that fails with a python exception and so on infinitely...
      => PyObject_Call() never returns and the interpreter aborts with a fatal error when the high warter mark is exceeded, the infinite recursion is in PyObject_Call().
    4. Normalizing a python exception defined in an extension module and the instantiation returns NULL and sets the same exception:
      a) Without any patch, we get a segfault caused by another bug in PyErr_NormalizeException() at Py_DECREF(*val), just before setting val to PyExc_RecursionErrorInst.
      This is fixed by changing Py_DECREF(*val) to Py_XDECREF(*val).
      With the above fix, we get the same abort as the one caused by runtimerror_singleton_2.py, so this is another reproducer of the current issue.
      b) The test is ok with patch warn_5.patch, and the above fix.
      c) With patch remove_singleton.patch the interpreter aborts with a fatal error when the high warter mark is exceeded, the infinite recursion is in PyErr_NormalizeException().

    Cases 3) and 4) can be tested with runtimerror_singleton_3.py (install mymodule with setup.py for all three test cases in 4).

    remove_singleton.patch introduces a regression in case c), but IMHO the abort in case c) is consistent with the abort in case 3), they
    are both related to a more general problem involving the low/high water mark heuristic and described by Antoine in [1].

    [1] http://thread.gmane.org/gmane.comp.python.devel/97016

    @vstinner
    Copy link
    Member

    I tried the following script on Python 3.5 and Python 3.6 and I failed to reproduce the bug:
    ---

    import sys, traceback
    
    class MyException(Exception):
        def __init__(self, *args):
            1/0
    
    def gen():
        f = open(__file__, mode='rb', buffering=0)
        yield
    
    g = gen()
    next(g)
    recursionlimit = sys.getrecursionlimit()
    sys.setrecursionlimit(len(traceback.extract_stack())+3)
    try:
        g.throw(MyException)
    finally:
        sys.setrecursionlimit(recursionlimit)
        print('Done.')

    Note: I had to add "+3" to the sys.setrecursionlimit() call, otherwise the limit is too low and you get a RecursionError (it's a recent bugfix, issue bpo-25274).

    Can somone else please confirm that the bug is fixed?

    @xdegaye
    Copy link
    Mannequin

    xdegaye mannequin commented Mar 24, 2016

    When tested with runtimerror_singleton_3.py (see msg 231933 above), the latest Python 3.6.0a0 (default:3eec7bcc14a4, Mar 24 2016, 20:16:19) still crashes:

    $ python runtimerror_singleton_3.py
    Importing mymodule.
    Traceback (most recent call last):
      File "runtimerror_singleton_3.py", line 26, in <module>
        foo()
      File "runtimerror_singleton_3.py", line 23, in foo
        g.throw(MyException)    # Entering PyErr_NormalizeException()
      File "runtimerror_singleton_3.py", line 14, in gen
        yield
    RecursionError: maximum recursion depth exceeded
    Segmentation fault (core dumped)

    @vstinner
    Copy link
    Member

    warn_5.patch: The patch cannot be reviewed on Rietveld :-( You must not use the git format for diff.

    warn_5.patch: "if (globals == NULL) { (...) return 0; }"

    It looks like filename is not initialized. I suggest to use:

    *filename = f->f_code->co_filename;

    It looks like you have to add:

    if (PyUnicode_Check(*filename)) *filename = NULL;

    To mimick the code below.

    @xdegaye
    Copy link
    Mannequin

    xdegaye mannequin commented Mar 26, 2016

    Victor,

    With warn_5.patch *filename is not set when globals is NULL: setup_context() returns 0, and so do_warn() returns NULL without calling warn_explicit().

    This is different from your initial warn.patch where setup_context() returns 1 in that case and an attempt is made to issue the warning.

    @vstinner
    Copy link
    Member

    vstinner commented Jun 1, 2017

    The issue bpo-17852 is still alive and has a reference to this issue. It would be nice to rebase the latest patch on master and create a PR ;-)

    @xdegaye
    Copy link
    Mannequin

    xdegaye mannequin commented Jun 7, 2017

    With PR 1981, a ResourceWarning is printed when a RecursionError occurs while normalizing another exception and its traceback holds a reference to a non-closed file object.

    For information, bpo-5437 removed the MemoryError singleton for the same reasons as PR 1981 does.

    @xdegaye xdegaye mannequin added the 3.7 (EOL) end of life label Jun 7, 2017
    @xdegaye
    Copy link
    Mannequin

    xdegaye mannequin commented Jun 10, 2017

    Antoine asked in PR 1981:

    Did you verify the removed code here wasn't needed anymore?

    Just checked that crasher infinite_rec_2.py (removed by 1e534b5) does not crash with PR 1981. The other crashers listed at 1e534b5 are not valid Python 3.7 code. Does anyone know how to translate them into Python 3.7 ?

    With PR 1981 infinite recursion does not occur in PyErr_NormalizeException() when the tstate->overflowed flag is false upon entering this function and:
    * either (obviously) the normalizing of this exception does not fail
    * or the normalizing of this exception fails with an exception whose normalization won't fail (for example a RecursionError).

    Removing the PyExc_RecursionErrorInst singleton decreases the cases covered by the recursion checks because the test made upon using PyExc_RecursionErrorInst (in the 'finally' label of PyErr_NormalizeException()) has the side effect of adding another recursion check to the normal recursion machinery of _Py_CheckRecursiveCall(). Those are corner cases though, such as for example the following case that will abort instead now with PR 1981 [1]:
    * tstate->overflowed has been set to true outside PyErr_NormalizeException() and the corresponding RecursionError has been discarded
    * PyErr_NormalizeException() attempts normalizing a python exception that raises a python exception that raises ... (and so on indefinitely)
    IMO it is ok to abort in such cases. As Brett wrote 9 years ago in https://mail.python.org/pipermail/python-dev/2008-August/082107.html:
    "I have always viewed the check as a bonus sanity check, but not something to heavily rely upon."
    One can also note that this other recursion check added with the use of PyExc_RecursionErrorInst does not respect the tstate->overflowed flag so that it adds another level of complexity to the recursion machinery.

    [1] But with PR 1981, a RecursionError is raised when replacing MyException in test_recursion_normalizing_exception() at Lib/test/test_exceptions.py with:

          class MyException(Exception):
              def __init__(self):
                  raise MyException

    @serhiy-storchaka
    Copy link
    Member

    The fact that the traceback of PyExc_RecursionErrorInst causes an issue means that PyExc_RecursionErrorInst is used. We can't just remove PyExc_RecursionErrorInst since this can cause a stack overflow or, with merged PR 2035, an infinite loop.

    Perhaps the solution of this issue is clearing __traceback__, __cause__ and __context__ attributes of PyExc_RecursionErrorInst as early as possible.

    @serhiy-storchaka
    Copy link
    Member

    The simplest solution -- make BaseException_set_tb(), BaseException_set_context() and BaseException_set_cause() no-ops for PyExc_RecursionErrorInst.

    @xdegaye
    Copy link
    Mannequin

    xdegaye mannequin commented Jun 13, 2017

    We can't just remove PyExc_RecursionErrorInst since this can cause a stack overflow

    Please give an example where a stack overflow occurs when PyExc_RecursionErrorInst has been removed.

    @xdegaye
    Copy link
    Mannequin

    xdegaye mannequin commented Oct 26, 2017

    Fixed by bpo-30697.

    @xdegaye xdegaye mannequin closed this as completed Oct 26, 2017
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.7 (EOL) end of life interpreter-core (Objects, Python, Grammar, and Parser dirs) type-crash A hard crash of the interpreter, possibly with a core dump
    Projects
    None yet
    Development

    No branches or pull requests

    3 participants