Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Generator/coroutine 'throw' discards exc_info state, which is bad #73773

Closed
njsmith opened this issue Feb 17, 2017 · 30 comments
Closed

Generator/coroutine 'throw' discards exc_info state, which is bad #73773

njsmith opened this issue Feb 17, 2017 · 30 comments
Labels
3.9 only security fixes type-bug An unexpected behavior, bug, or error

Comments

@njsmith
Copy link
Contributor

njsmith commented Feb 17, 2017

BPO 29587
Nosy @gvanrossum, @vstinner, @ned-deily, @njsmith, @cjerdonek, @hynek, @vadmium, @1st1
PRs
  • bpo-29587: Enable implicit exception chaining with gen.throw() #19811
  • Revert "bpo-29587: Enable implicit exception chaining with gen.throw()" #19821
  • bpo-29587: Enable implicit exception chaining with gen.throw() #19823
  • bpo-29587: Add another test for the gen.throw() exception chain fix #19859
  • bpo-29587: Remove the exc_value NULL check in _gen_throw() #19877
  • bpo-29587: PyErr_SetObject() rejects exception=None #19902
  • bpo-29587: Enable exception chaining for gen.throw() with "yield from" #19858
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2020-05-02.01:26:31.061>
    created_at = <Date 2017-02-17.11:01:49.760>
    labels = ['type-bug', '3.9']
    title = "Generator/coroutine 'throw' discards exc_info state, which is bad"
    updated_at = <Date 2020-05-17.04:14:59.324>
    user = 'https://github.com/njsmith'

    bugs.python.org fields:

    activity = <Date 2020-05-17.04:14:59.324>
    actor = 'chris.jerdonek'
    assignee = 'none'
    closed = True
    closed_date = <Date 2020-05-02.01:26:31.061>
    closer = 'chris.jerdonek'
    components = []
    creation = <Date 2017-02-17.11:01:49.760>
    creator = 'njs'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 29587
    keywords = ['patch']
    message_count = 30.0
    messages = ['287987', '287992', '306100', '306158', '367747', '367773', '367774', '367776', '367783', '367784', '367786', '367787', '367788', '367789', '367792', '367827', '367833', '367905', '367907', '367908', '367924', '367926', '367954', '367956', '368013', '368053', '368164', '368617', '368806', '369093']
    nosy_count = 8.0
    nosy_names = ['gvanrossum', 'vstinner', 'ned.deily', 'njs', 'chris.jerdonek', 'hynek', 'martin.panter', 'yselivanov']
    pr_nums = ['19811', '19821', '19823', '19859', '19877', '19902', '19858']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue29587'
    versions = ['Python 3.9']

    @njsmith
    Copy link
    Contributor Author

    njsmith commented Feb 17, 2017

    Example 1:

    -----------

    def f():
        try:
            raise KeyError
        except Exception:
            yield
    
    gen = f()
    gen.send(None)
    gen.throw(ValueError)

    Output:
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "<stdin>", line 5, in f
    ValueError

    Expected output:

    Something involving the string "During handling of the above exception, another exception occurred", and a traceback for the original KeyError.

    Example 2:

    -----------

    import sys
    
    def f():
        try:
            raise KeyError
        except Exception:
            print(sys.exc_info())
            try:
                yield
            except Exception:
                pass
            print(sys.exc_info())
            raise
    
    gen = f()
    gen.send(None)
    gen.throw(ValueError)

    Output:

    (<class 'KeyError'>, KeyError(), <traceback object at 0x7f67ce3c3f88>)
    (None, None, None)
    Traceback (most recent call last):
      File "/tmp/foo.py", line 17, in <module>
        gen.throw(ValueError)
      File "/tmp/foo.py", line 13, in f
        raise
    RuntimeError: No active exception to reraise

    Expected output: certainly not that :-)

    This seems to happen because normally, generators save the current exc_info when yielding, and then restore it when re-entered. But, if we re-enter through 'throw' (throwflag is true), this is disabled:

    if (!throwflag && f->f_exc_type != NULL && f->f_exc_type != Py_None) {

    This check seems to have been added in ae5f2f4 as a fix for:

    https://bugs.python.org/issue7173

    which had to do with a nasty situation involving a generator object that was part of a reference cycle: the gc sometimes would free the objects stored in the generator's saved exc_info, and then try to clean up the generator by throwing in a GeneratorExit.

    AFAICT this situation shouldn't be possible anymore thanks to PEP-442, which makes it so that finalizers are run before any part of the cycle is freed. And in any case it certainly doesn't justify breaking code like the examples above.

    (Note: the examples use generators for simplicity, but of course the way I noticed this was that I had some async/await code where exceptions were mysteriously disappearing instead of showing up in __context__ and couldn't figure out why. It's likely that more people will run into this in the future as async/await becomes more widely used. As a workaround for now I'll probably modify my coroutine runner so that it never uses 'throw'.)

    @njsmith njsmith added the 3.7 (EOL) end of life label Feb 17, 2017
    @vadmium
    Copy link
    Member

    vadmium commented Feb 17, 2017

    The second example seems like the original complaint in bpo-25612. That spawned a separate bug related to the first situation, which was later closed: bpo-25683.

    @vadmium vadmium added the type-bug An unexpected behavior, bug, or error label Feb 17, 2017
    @gvanrossum
    Copy link
    Member

    Yury do you agree this is a bug? Do you have any ideas on how to fix it?

    @1st1
    Copy link
    Member

    1st1 commented Nov 13, 2017

    Guido,

    The second example ("No active exception to reraise") was an actual long open bug. It was recently fixed by Mark Shannon in [1].

    I think we should be able to fix the first case (missing __context__) although it's not critical. I'll look into it after the context PEP and few other open asyncio issues.

    [1] #1773

    @cjerdonek
    Copy link
    Member

    I made a naive attempt at addressing this issue here:
    #19811

    The code has diverged significantly since Nathaniel first linked to a specific line above. However, what I came up with is simple, and seems to work. But I could very well be missing some things.

    It would be great if this could be fixed in some way.

    @cjerdonek cjerdonek added 3.8 only security fixes 3.9 only security fixes labels Apr 30, 2020
    @gvanrossum
    Copy link
    Member

    New changeset 2514a63 by Chris Jerdonek in branch 'master':
    bpo-29587: Enable implicit exception chaining with gen.throw() (GH-19811)
    2514a63

    @gvanrossum
    Copy link
    Member

    Example 1 is now fixed too by Chris Jerdonek's PR 19811. Thanks Chris!

    @gvanrossum
    Copy link
    Member

    Oh, no! Buildbot failures. Help?!

    @vstinner
    Copy link
    Member

    New changeset 3c7f9db by Victor Stinner in branch 'master':
    Revert "bpo-29587: Enable implicit exception chaining with gen.throw() (GH-19811)" (bpo-19821)
    3c7f9db

    @gvanrossum gvanrossum reopened this Apr 30, 2020
    @vstinner
    Copy link
    Member

    bpo-29587: Enable implicit exception chaining with gen.throw() (GH-19811)
    2514a63

    Sorry, I had to revert this change since it broke like 41+ buildbots:
    https://pythondev.readthedocs.io/ci.html#revert-on-fail

    Almost all CIs passed on the PR (except of the Ubuntu job of Azure Pipelines). That's unusual. No idea why the bug occurs only on some platforms and why it wasn't seen before.

    The revert is an opportunity to get more time to investigate the issue, without having the pressure to have to fix the CI "ASAP".

    @vstinner
    Copy link
    Member

    Sorry, I had to revert this change since it broke like 41+ buildbots: (...)

    If someone wants to investigate, you can find examples of failed buildbot builds at:
    #19811

    @cjerdonek
    Copy link
    Member

    Oh, no! Buildbot failures. Help?!

    Yes, I'm sorry. There's something not so simple going on that I haven't yet reproduced locally. Will reply on the tracker.

    @ned-deily
    Copy link
    Member

    Whatever the resolution of this is, it seems to me that this sort of behavior change should not be introduced at this stage of 3.7.x's life. Whether it should go into 3.8.x should be Łukasz's call once the final change is in master and has stabilized.

    @ned-deily ned-deily removed 3.7 (EOL) end of life 3.8 only security fixes labels Apr 30, 2020
    @1st1
    Copy link
    Member

    1st1 commented Apr 30, 2020

    IMO this is a 3.9 fix.

    @gvanrossum
    Copy link
    Member

    Could it be running out of memory due to excessive chaining of tracebacks?

    (And yes, it's 3.9 only.)

    @cjerdonek
    Copy link
    Member

    Okay, I was able to get the tests passing on the other platforms.

    I'm still not sure why Mac and Fedora behaved differently with my original PR. I was able to come up with a simple script that isolated the behavior difference (posted in the PR comments). It's very strange. Maybe it signals an issue elsewhere in the Python code base.

    @cjerdonek
    Copy link
    Member

    By the way, I just posted a new issue about exception chaining getting suppressed, but this time in an asyncio setting with ensure_future():
    https://bugs.python.org/issue40466

    I first noticed this issue two or three years ago and raised it on the async-sig list. It was suggested there that this was a special case of the current issue. Having written code to fix the current issue, though, the ensure_future() issue still exists. So I filed a separate issue.

    @cjerdonek
    Copy link
    Member

    New changeset 0204726 by Chris Jerdonek in branch 'master':
    bpo-29587: Update gen.throw() to chain exceptions (bpo-19823)
    0204726

    @cjerdonek
    Copy link
    Member

    Okay, thanks everyone. I think I'll take a look at bpo-40466 next.

    @gvanrossum
    Copy link
    Member

    Wow, thanks all! FWIW I agree that ideally things should work with a NULL
    value...

    --Guido (mobile)

    @cjerdonek
    Copy link
    Member

    FYI, I just created a PR to add one more test to the prior fix, to test a slightly different aspect.

    Whereas the test in the first PR checks that exc.__context__ is set after gen.throw() is finished, the new test checks that exc.__context__ is available also from within the generator (while the generator is running). I'm adding this test partly because this behavior is different from the "awaiting on a task" case, which I'm working on next.

    Both of these tests were failing prior to the fix, which I confirmed.

    @cjerdonek
    Copy link
    Member

    FYI, I was able to finish addressing the other exception-chaining cases (the ones covered by the other issue) with a simple change to the fix for this PR.

    I moved the call to _PyErr_ChainExceptions() a bit deeper: into gen_send_ex() before the call to _PyEval_EvalFrame(), and behind an if (exc) check.

    So I think the tracebacks should improve a lot.

    @cjerdonek
    Copy link
    Member

    Okay, I was able to remove the NULL value check and get things working with a NULL exception value. I just posted a second follow-on PR that does this (which Guido approved - thanks, Guido!):
    #19877

    I wanted to tell you what I found since it raises some questions.

    To remove that check I had to add the following new check prior to calling _PyErr_ChainExceptions() with the exception info, as a replacement:
    gen->gi_exc_state.exc_type != Py_None
    Without doing this, code like the following would crash e.g. on Fedora 32 (this is the crash that was happening in the first version of my PR, reduced down):

        def g():
            try:
                raise KeyError
            except KeyError:
                pass
        try:
            yield
        except Exception:
            # Crash happens here e.g. on Fedora 32 but not Mac.
            raise RuntimeError
    
        gen = g()
        gen.send(None)
        gen.throw(ValueError)

    This raises two questions for me:

    First, I thought exc_type could only ever be NULL or an exception class. So I'm wondering if this points to a problem elsewhere in the code base.

    Second, I don't know why it would crash on Fedora but not Mac. (On Mac, you instead see the following exception chained beneath the ValueError:

    TypeError: print_exception(): Exception expected for value, NoneType found )

    @cjerdonek
    Copy link
    Member

    New changeset 21893fb by Chris Jerdonek in branch 'master':
    bpo-29587: allow chaining NULL exceptions in _gen_throw() (GH-19877)
    21893fb

    @cjerdonek
    Copy link
    Member

    Since none of you are subscribed to the other issue except for Nathaniel, I thought I'd let you know I also posted a PR to fix another issue he filed similar to this one (but this time about the stack trace being incorrect for gen.throw() in certain circumstances): https://bugs.python.org/issue29590

    That issue is the second of the two issues he cited back in 2017 about gen.throw() being broken:
    https://vorpus.org/~njs/misc/trio-language-summit-2017.pdf

    @vstinner
    Copy link
    Member

    vstinner commented May 4, 2020

    tl; dr I wrote PR 19902 to remove the "XXX" comment.

    --

    Commit 21893fb adds the following code:

        /* XXX It seems like we shouldn't have to check not equal to Py_None
           here because exc_type should only ever be a class.  But not including
           this check was causing crashes on certain tests e.g. on Fedora. */
        if (gen->gi_exc_state.exc_type && gen->gi_exc_state.exc_type != Py_None) { ... }

    I don't think that you should mention "Fedora" as a platform impacted by this issue: all platforms should be affected. It's just unclear why the issue was first seen on Fedora.

    gen_send_ex() copies tstate->exc_info to gen->gi_exc_state.

    tstate->exc_info->exc_type is set at least in the following 3 places in CPython code base:

    • ceval.c: POP_EXCEPT opcode: exc_info->exc_type = POP();
    • ceval.c: UNWIND_EXCEPT_HANDLER() macro: exc_info->exc_type = POP();
    • errors.c: PyErr_SetExcInfo()

    I saw in practice POP_EXCEPT and UNWIND_EXCEPT_HANDLER() setting exc_info->exc_type to Py_None (I didn't test PyErr_SetExcInfo()).

    _PyErr_GetTopmostException() also handles exc_info->exc_type == Py_None case:

    _PyErr_StackItem *
    _PyErr_GetTopmostException(PyThreadState *tstate)
    {
        _PyErr_StackItem *exc_info = tstate->exc_info;
        while ((exc_info->exc_type == NULL || exc_info->exc_type == Py_None) &&
               exc_info->previous_item != NULL)
        {
            exc_info = exc_info->previous_item;
        }
        return exc_info;
    }

    --

    So exc_type=None is not a bug, but it's done on purpose.

    If you don't want to get None in genobject.c, we should modify all places which set tstate->exc_info->exc_type. Problem: the structure currently exposed in the public C API (bpo-40429), and I wouldn't be surprised if Cython or greenlet modify tstate->exc_info directly.

    @vstinner
    Copy link
    Member

    vstinner commented May 5, 2020

    New changeset b0be6b3 by Victor Stinner in branch 'master':
    bpo-29587: _PyErr_ChainExceptions() checks exception (GH-19902)
    b0be6b3

    @cjerdonek
    Copy link
    Member

    Would someone be able to approve / take a look at this small PR addressing another aspect of this issue?
    #19858
    I'm hoping it can be part of 3.9, and the deadline is just a week away.

    It builds on the already merged PR to address an "Example 3" of this issue (the "yield from" case as opposed to the "yield" case, which the merged PR addressed):

    Example 3:

    def f():
        yield
    
    def g():
        try:
            raise KeyError('a')
        except Exception:
            yield from f()
    
    gen = g()
    gen.send(None)
    gen.throw(ValueError)

    Output without PR:

    Traceback (most recent call last):
      File "/.../test-example3.py", line 12, in <module>
        gen.throw(ValueError)
      File "/.../test-example3.py", line 8, in g
        yield from f()
      File "/.../test-example3.py", line 2, in f
        yield
    ValueError

    Output with PR:

    Traceback (most recent call last):
      File "/.../test-example3.py", line 6, in g
        raise KeyError('a')
    KeyError: 'a'
    
    During handling of the above exception, another exception occurred:
    
    Traceback (most recent call last):
      File "/.../test-example3.py", line 12, in <module>
        gen.throw(ValueError)
      File "/.../test-example3.py", line 8, in g
        yield from f()
      File "/.../test-example3.py", line 2, in f
        yield
    ValueError

    @cjerdonek
    Copy link
    Member

    New changeset 75cd8e4 by Chris Jerdonek in branch 'master':
    bpo-29587: Make gen.throw() chain exceptions with yield from (GH-19858)
    75cd8e4

    @cjerdonek
    Copy link
    Member

    New changeset d7184d3 by Chris Jerdonek in branch 'master':
    bpo-29587: Add another test for the gen.throw() fix. (GH-19859)
    d7184d3

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.9 only security fixes type-bug An unexpected behavior, bug, or error
    Projects
    None yet
    Development

    No branches or pull requests

    7 participants