
with statements are not ensuring that __exit__ is called if __enter__ succeeds #74174

Closed

njsmith opened this issue Apr 4, 2017 · 60 comments
Labels
3.9 (only security fixes) · 3.10 (only security fixes) · interpreter-core (Objects, Python, Grammar, and Parser dirs) · type-bug (An unexpected behavior, bug, or error)

Comments

@njsmith
Contributor

njsmith commented Apr 4, 2017

BPO 29988
Nosy @rhettinger, @gpshead, @ncoghlan, @nirs, @njsmith, @markshannon, @embray, @serhiy-storchaka, @jdemeyer, @1st1, @ryanhiebert, @xgid, @jack1142
PRs
  • Protect C-implemented context managers, like lock, from ctrl-C. Fixes bpo-29988. #1799
  • bpo-29988: Only check evalbreaker after calls and on backwards edges. #18334

Dependencies
  • bpo-31344: f_trace_opcodes frame attribute to switch to per-opcode tracing

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = None
    closed_at = <Date 2021-12-10.17:37:50.631>
    created_at = <Date 2017-04-04.23:42:57.982>
    labels = ['interpreter-core', 'type-bug', '3.9', '3.10']
    title = 'with statements are not ensuring that __exit__ is called if __enter__ succeeds'
    updated_at = <Date 2021-12-10.17:37:50.630>
    user = 'https://github.com/njsmith'

    bugs.python.org fields:

    activity = <Date 2021-12-10.17:37:50.630>
    actor = 'Mark.Shannon'
    assignee = 'none'
    closed = True
    closed_date = <Date 2021-12-10.17:37:50.631>
    closer = 'Mark.Shannon'
    components = ['Interpreter Core']
    creation = <Date 2017-04-04.23:42:57.982>
    creator = 'njs'
    dependencies = ['31344']
    files = []
    hgrepos = []
    issue_num = 29988
    keywords = ['patch']
    message_count = 60.0
    messages = ['291148', '291157', '294272', '294296', '294400', '294402', '294404', '294434', '297177', '297179', '297180', '297182', '297185', '297186', '297189', '297190', '301235', '301248', '301274', '301277', '301278', '301285', '301289', '301360', '301429', '301553', '301557', '301566', '301573', '301601', '301609', '301610', '301611', '301620', '301630', '301660', '301667', '301668', '301672', '301675', '301682', '301698', '301717', '301721', '301771', '301869', '312874', '321225', '321318', '350123', '350134', '352026', '352039', '352199', '352212', '352457', '353415', '372499', '386711', '389922']
    nosy_count = 15.0
    nosy_names = ['rhettinger', 'gregory.p.smith', 'ncoghlan', 'nirs', 'njs', 'Mark.Shannon', 'erik.bray', 'serhiy.storchaka', 'jdemeyer', 'yselivanov', 'deleted0524', 'ryanhiebert', 'xgdomingo', 'jack1142', 'Ami Fischman']
    pr_nums = ['1799', '18334']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue29988'
    versions = ['Python 3.9', 'Python 3.10']

    @njsmith
    Contributor Author

    njsmith commented Apr 4, 2017

    You might hope the interpreter would enforce the invariant that for 'with' and 'async with' blocks, either '__(a)enter__' and '__(a)exit__' are both called, or else neither of them is called. But it turns out that this is not true once KeyboardInterrupt gets involved – even if we write our (async) context manager in C, or otherwise guarantee that the actual '__(a)enter__' and '__(a)exit__' methods are immune to KeyboardInterrupt.

    The invariant is *almost* preserved for 'with' blocks: there's one instruction (SETUP_WITH) that atomically (wrt signals) calls '__enter__' and then enters the implicit 'try' block, and there's one instruction (WITH_CLEANUP_START) that atomically enters the implicit 'finally' block and then calls '__exit__'. But there's a gap between exiting the 'try' block and WITH_CLEANUP_START where a signal can arrive and cause us to exit without running the 'finally' block at all. In this disassembly, the POP_BLOCK at offset 7 is the end of the 'try' block; if a KeyboardInterrupt is raised between POP_BLOCK and WITH_CLEANUP_START, then it will propagate out without '__exit__' being run:

    In [2]: def f():
       ...:     with a:
       ...:         pass
       ...:

    In [3]: dis.dis(f)
      2           0 LOAD_GLOBAL              0 (a)
                  3 SETUP_WITH               5 (to 11)
                  6 POP_TOP

      3           7 POP_BLOCK
                  8 LOAD_CONST               0 (None)
            >>   11 WITH_CLEANUP_START
                 12 WITH_CLEANUP_FINISH
                 13 END_FINALLY
                 14 LOAD_CONST               0 (None)
                 17 RETURN_VALUE

    For async context managers, the race condition is substantially worse, because the 'await' dance is inlined into the bytecode:

    In [4]: async def f():
       ...:     async with a:
       ...:         pass
       ...:

    In [5]: dis.dis(f)
      2           0 LOAD_GLOBAL              0 (a)
                  3 BEFORE_ASYNC_WITH
                  4 GET_AWAITABLE
                  5 LOAD_CONST               0 (None)
                  8 YIELD_FROM
                  9 SETUP_ASYNC_WITH         5 (to 17)
                 12 POP_TOP

      3          13 POP_BLOCK
                 14 LOAD_CONST               0 (None)
            >>   17 WITH_CLEANUP_START
                 18 GET_AWAITABLE
                 19 LOAD_CONST               0 (None)
                 22 YIELD_FROM
                 23 WITH_CLEANUP_FINISH
                 24 END_FINALLY
                 25 LOAD_CONST               0 (None)
                 28 RETURN_VALUE

    Really the sequence from 3 BEFORE_ASYNC_WITH to 9 SETUP_ASYNC_WITH should be atomic wrt signal delivery, and from 13 POP_BLOCK to 22 YIELD_FROM likewise.

    This probably isn't the highest priority bug in practice, but I feel like it'd be nice if this kind of basic language invariant could be 100% guaranteed, not just 99% guaranteed :-). And the 'async with' race condition is plausible to hit in practice, because if I have an '__aenter__' that's otherwise protected from KeyboardInterrupt, then it can run for some time, and any control-C during that time will get noticed just before the WITH_CLEANUP_START, so e.g. 'async with lock: ...' might complete while still holding the lock.

    The traditional solution would be to define single "super-instructions" that do all of the work we want to be atomic. This would be pretty tricky here though, because WITH_CLEANUP_START is a jump target (so naively we'd need to jump into the "middle" of a hypothetical new super-instruction), and because the implementation of YIELD_FROM kind of assumes that it's a standalone instruction exposed directly in the bytecode. Probably there is some solution to these issues but some cleverness would be required.

    An alternative approach would be to keep the current bytecode, but somehow mark certain stretches of bytecode as bad places to run signal handlers. The eval loop's "check for signal handlers" code is run rarely, so we could afford to do relatively expensive things like check a lookaside table that says "no signal handlers when 13 < f_lasti <= 22". Or we could steal a bit in the opcode encoding or something.
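
    A minimal Python model of what such a lookaside table could look like (the names protected_ranges and signals_allowed are invented for illustration; the real check would live in ceval.c's signal-handling path):

    protected_ranges = {}  # code object -> list of (start, end) f_lasti ranges

    def signals_allowed(frame):
        # Consulted only on the rare "check for signal handlers" path, so a
        # linear scan over a small per-code-object table is affordable.
        for start, end in protected_ranges.get(frame.f_code, ()):
            if start <= frame.f_lasti < end:
                return False  # defer handlers until we leave the range
        return True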

    @njsmith njsmith added 3.7 (EOL) end of life interpreter-core (Objects, Python, Grammar, and Parser dirs) labels Apr 4, 2017
    @ncoghlan
    Contributor

    ncoghlan commented Apr 5, 2017

    My first thought would be to inject a wordcode instruction that's essentially an eval loop directive: "ATOMIC_UNTIL <offset>"

    ATOMIC_UNTIL would have at least the following properties:

    • checks for signals are skipped until the given offset is reached
    • checks to release the GIL are skipped until the given offset is reached

    It may also have the following defensive coding property:

    • only wordcode instructions on a pre-approved whitelist are permitted during atomic execution blocks (i.e. we'd only add them to the whitelist on an as needed basis, to minimise the risk of undesirable side effects, like being able to use wordcode manipulation to put an entire loop inside an atomic block)

    The last bit would likely have undesirable performance implications, so it would probably be reasonable to only enable the check for debug builds, rather than always enforcing it.
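
    A sketch of the directive's intended semantics, as Python pseudocode (ATOMIC_UNTIL is only a proposal; process_pending_calls and maybe_drop_gil below are stand-ins for the eval breaker's real work, not actual APIs):

    def process_pending_calls():  # stand-in for Py_MakePendingCalls
        pass

    def maybe_drop_gil():  # stand-in for the periodic GIL release check
        pass

    def eval_breaker_check(frame, atomic_until):
        # ATOMIC_UNTIL <offset> would set atomic_until; until the frame's
        # instruction pointer reaches that offset, both checks are skipped.
        if atomic_until is not None and frame.f_lasti < atomic_until:
            return
        process_pending_calls()
        maybe_drop_gil()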

    @deleted0524
    Mannequin

    deleted0524 mannequin commented May 23, 2017

    This is either a "won't fix" or an "impossible to fix" depending on your point of view.

    PATIENT:
    It hurts whenever I press ctrl-C
    DOCTOR:
    Then don't press ctrl-C

    The problem is that ctrl-C can provoke an interrupt at any point in the program, and thus break presumed invariants.
    try-finally has the same problem.

    From the observable side effects it is indistinguishable whether an interrupt occurs after the last instruction in an __enter__ function or after the first (side-effect-less) instruction after the __enter__.
    Likewise the last pre-exit and first in-exit instructions are effectively the same from the point of view of observable side effects, but are different from the point of view of the handling of exceptions.

    @njsmith
    Contributor Author

    njsmith commented May 23, 2017

    Right, fixing this bug alone can't make programs control-C safe in general. But there exist techniques to make __enter__/__exit__ methods safe WRT control-C, and if we fix this bug *and* apply those techniques then you can get some meaningful guarantees.

    For example, threading.Lock's __enter__ and __exit__ methods are written in C, so they are guaranteed to happen atomically WRT to control-C. Right now, it's possible for a 'with lock: ...' to raise KeyboardInterrupt with the lock still held, which is surprising and not what we would like. If this bug were fixed, then it would be guaranteed that the lock was always released.
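
    To make the failure mode concrete (illustration only: the vulnerable window is a few bytecodes wide, so it cannot be hit deterministically from pure Python):

    import threading

    lock = threading.Lock()
    try:
        with lock:
            pass  # a SIGINT landing between POP_BLOCK and WITH_CLEANUP_START...
    except KeyboardInterrupt:
        # ...can propagate to here with the lock still held, even though
        # Lock.__enter__ and Lock.__exit__ are C code and individually
        # atomic with respect to signals.
        assert not lock.locked()  # the guarantee this issue asks for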

    (And I originally discovered this while implementing control-C handling in trio, which is pure-Python but also has a way to protect __(a)enter__/__(a)exit__ methods against control-C. ...though that won't be 100% reliable until bpo-12857 or similar is fixed. See https://vorpus.org/blog/control-c-handling-in-python-and-trio/ for exhaustive details.)

    @markshannon
    Member

    If all you need is that

    with foo:
       pass

    guarantees that either both or neither of __enter__ and __exit__ are called, for C context managers, and only C context managers, then the fix is trivial.

    To protect Python code, you would need a custom context manager wrapper:

    with ProtectsAgainstInterrupt(user_ctx_mngr()):
        do_stuff()

    ProtectsAgainstInterrupt would need to be implemented in C and install a custom signal handler.
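
    For the shape of the idea, a pure-Python sketch (hedged: as noted above, a real version would need to be written in C to close the remaining races, signal.signal only works from the main thread, and deferring around __enter__ needs the same treatment plus cleanup-on-interrupt, omitted here):

    import signal

    class ProtectsAgainstInterrupt:
        def __init__(self, cm):
            self.cm = cm

        def __enter__(self):
            return self.cm.__enter__()

        def __exit__(self, *exc):
            pending = []
            # Hold back Ctrl-C while the wrapped __exit__ runs...
            old = signal.signal(signal.SIGINT, lambda *args: pending.append(args))
            try:
                return self.cm.__exit__(*exc)
            finally:
                signal.signal(signal.SIGINT, old)
                if pending:
                    raise KeyboardInterrupt  # ...then deliver it afterwards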

    @markshannon
    Member

    Nathaniel,

    Do you have any way to reliably test for this failure mode?

    @njsmith
    Contributor Author

    njsmith commented May 24, 2017

    > If all you need is that with foo: pass guarantees that either both or neither of __enter__ and __exit__ are called, for C context managers, and only C context managers, then the fix is trivial.

    It would be nice to have it for 'async with foo: pass' as well, which is a little less trivial because the 'await' dance is inlined into the bytecode (see the initial post in this bug), but basically yes.

    > Do you have any way to reliably test for this failure mode?

    Unfortunately no, I haven't implemented one. Let's see, though...

    The test I wrote for bpo-30039 demonstrates one way to trigger a signal on a precise bytecode, by writing some code in C that calls raise(SIGNALNUMBER), and then calling it immediately before the bytecode where we want the signal to be raised (simulating the situation where the signal happens to arrive while the C function is running -- note that raise() is convenient because unlike kill() it works cross-platform, even on Windows).

    This might be sufficient for testing the 'async with' version; it looks like an __await__ method or iterator implemented in C and calling raise() would deliver a signal at a point that should be protected but isn't.
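
    (As an aside: later CPython releases expose this trick directly from Python, so the helper no longer has to be written in C — signal.raise_signal() was added in 3.8 and wraps raise():)

    import signal
    signal.raise_signal(signal.SIGINT)  # raise(), not kill(): also works on Windows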

    The tricky case is plain 'with'. We can write something like:

    with lock:
        raise_signal()

    and this gives bytecode like:

      1           0 LOAD_NAME                0 (lock)
                  2 SETUP_WITH              12 (to 16)
                  4 POP_TOP

      2           6 LOAD_NAME                1 (raise_signal)
                  8 CALL_FUNCTION            0
                 10 POP_TOP
                 12 POP_BLOCK
                 14 LOAD_CONST               0 (None)
            >>   16 WITH_CLEANUP_START

    So the problem is that at offset 8 is where we can run arbitrary code, but the race condition is if a signal arrives between offsets 12 and 16.

    One possibility would be to set up a chain of Py_AddPendingCall handlers, something like:

    /* forward declarations so the chain compiles top-down */
    int chain2(void* _);
    int chain3(void* _);

    int chain1(void* _) {
        Py_AddPendingCall(chain2, 0);
        return 0;
    }
    int chain2(void* _) {
        Py_AddPendingCall(chain3, 0);
        return 0;
    }
    int chain3(void* _) {
        raise(SIGINT);
        return 0;
    }

    (or to reduce brittleness, maybe use the void* to hold an int controlling the length of the chain, which would make it easy to run tests with chains of length 1, 2, 3, ...)

    ......except consulting ceval.c I see that currently this won't work, because it looks like if you call Py_AddPendingCall from inside a pending call callback, then Py_MakePendingCalls will execute the newly added callback immediately after the first one returns. It could be made to work by having Py_MakePendingCalls do a first pass to check the current length of the queue, and then use that as the bound on how many calls it makes.

    @njsmith
    Contributor Author

    njsmith commented May 25, 2017

    On further thought, I think the way I'd write a test for this is:

    (1) add a testing primitive that waits for N instructions and then injects a SIGINT. Probably this would require tweaking the definition of Py_MakePendingCalls like I described in my previous comment, and then some code like:

    int sigint_timebomb(void* count_ptr) {
        intptr_t count = (intptr_t) count_ptr;
        if (count == 0)
            raise(SIGINT);
        else
            Py_AddPendingCall(sigint_timebomb, (void*) (count - 1));
        return 0;
    }

    (2) write a test like:

    reached_crashpad = False
    lock = threading.Lock()
    i = 0
    while not reached_crashpad:
        i += 1
        try:
            sigint_timebomb(i)
            with lock:
                1 + 1
            # this loop keeps executing instructions until any still-armed
            # timebombs go off
            while True:
                reached_crashpad = True
        except KeyboardInterrupt:
            # key invariant: signal can't leave the lock held
            assert not lock.locked()
        else:
            assert False  # can't happen

    The idea here is that this sets off SIGINTs at every possible place throughout the execution of the 'with lock', while checking that the lock is always released, and without needing to hard code any information at all about what bytecode the compiler generated.

    @jdemeyer
    Contributor

    Nice analysis. I always assumed that with was safe from such race conditions, but it isn't.

    @ncoghlan
    Contributor

    One testing possibility worth considering: perhaps it would make sense to have "start_try" and "start_finally" trace events, such that a trace function could be installed that injected signals at the worst possible time in the opcode evaluation?

    I haven't fully thought through the possible implications of that approach, but it seems like it would be less fragile than counting code offsets.
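
    (bpo-31344, filed as a dependency of this issue, later added exactly that per-opcode tracing capability. A sketch of an injecting trace function built on it — the target offset here is purely illustrative:)

    import sys

    def make_tracer(target_offset):
        def tracer(frame, event, arg):
            frame.f_trace_opcodes = True  # opt in to "opcode" events (3.7+)
            if event == "opcode" and frame.f_lasti == target_offset:
                raise KeyboardInterrupt  # simulate Ctrl-C at the worst spot
            return tracer  # keep tracing this frame
        return tracer

    sys.settrace(make_tracer(12))  # inject at offset 12, chosen for illustration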

    @ncoghlan
    Contributor

    Sorry, I misread Nathaniel's suggestion - the exhaustive search of all possible offsets already avoids the need to hardcode any code generation dependent details in the tests. So the trace function approach would just avoid the need for that search.

    @jdemeyer
    Contributor

    > Or we could steal a bit in the opcode encoding or something.

    That seems like a very reasonable and easy-to-implement solution. It would generalize this check:

    cpython/Python/ceval.c, lines 1067 to 1071 in e82cf86:

        if (_Py_OPCODE(*next_instr) == SETUP_FINALLY) {
            /* Make the last opcode before
               a try: finally: block uninterruptible. */
            goto fast_next_opcode;
        }

    (@nathaniel Smith: nice blog post!)

    @embray
    Contributor

    embray commented Jun 28, 2017

    > > Or we could steal a bit in the opcode encoding or something.

    > That seems like a very reasonable and easy-to-implement solution. It would generalize this check:

    cpython/Python/ceval.c, lines 1067 to 1071 in e82cf86:

        if (_Py_OPCODE(*next_instr) == SETUP_FINALLY) {
            /* Make the last opcode before
               a try: finally: block uninterruptible. */
            goto fast_next_opcode;
        }

    Interesting; I didn't notice that bit before. It seems like that does at least try to guarantee that a signal can't interrupt between:

    lock.acquire()
    try:
        ...

    which previously I assumed we couldn't make any guarantees about. But CPython at least does, or tries to. It would be good to generalize that.

    @jdemeyer
    Contributor

    > It seems like that does at least try to guarantee that a signal can't interrupt between:
    >
    >     lock.acquire()
    >     try:
    >         ...

    Actually, I think it's between the end of the try and the beginning of the finally (which is precisely the same place that breaks for a with statement).

    @embray
    Contributor

    embray commented Jun 28, 2017

    Actually I just finished reading njs's blog post, and as he points out, that special case for SETUP_FINALLY is basically broken. There are other cases where it doesn't really work either. For example if you have:

    if ...:
        do_something()
    else:
        do_something_else()
    try:
        ...
    finally:
        ...

    then (ignoring the issue about POP_TOP for a moment) the last instruction in *one* of these branches is followed immediately by SETUP_FINALLY, while in the other branch there's a JUMP_FORWARD, then the SETUP_FINALLY.

    All the more reason for a more generic solution to this that doesn't depend strictly on the next op locally in the byte code.

    @jdemeyer
    Contributor

    > Actually, I think it's between the end of the try and the beginning of the finally (which is precisely the same place that breaks for a with statement).

    Sorry, this is wrong and Erik was right. But the special case doesn't work anyway, so it doesn't really matter.

    @ncoghlan
    Contributor

    ncoghlan commented Sep 4, 2017

    Greg Smith & I are looking at this at the core dev sprint, and we think some variant of the "atomic until" idea should work, but there's a prerequisite change to the way "async with" works: the "GET_AWAITABLE" opcodes need to be avoided in this case, as they call __await__, and hence may run arbitrary Python code.

    We can't see any immediate barriers to moving those calls up into BEFORE_ASYNC_WITH, such that what ends up on the eval loop's stack is the already resolved iterable for use by YIELD_FROM, rather than combining GET_AWAITABLE+YIELD_FROM the way a normal await expression does.

    That would then give the preamble:

    BEFORE_ASYNC_WITH (resolves __aenter__ and __aexit__ to iterables)
    LOAD_CONST 0
    YIELD_FROM (need to skip signal processing here)
    SETUP_ASYNC_WITH
    

    And the postamble:

    POP_BLOCK (need to skip signal processing until YIELD_FROM)
    LOAD_CONST 0
    WITH_CLEANUP_START
    LOAD_CONST 0
    YIELD_FROM
    WITH_CLEANUP_FINISH
    

    We also agree that adding some kind of test injection hook (potentially limited to pydebug builds, depending on exactly how it works) is likely to be necessary to be able to test this.

    @ncoghlan ncoghlan added the type-bug An unexpected behavior, bug, or error label Sep 4, 2017
    @gpshead
    Member

    gpshead commented Sep 4, 2017

    Yury's in the room here and pointed out how Nick and I are wrong. :) [yay sprints!]

    async def __aexit__(self, *e):
      spamity_spam
      return Awaitable

    If we resolve the GET_AWAITABLE at BEFORE_ASYNC_WITH time, the spamity_spam within __aexit__ is executed before the with block instead of after as happens today. :/

    See also https://www.python.org/dev/peps/pep-0492/#asynchronous-context-managers-and-async-with

    Nick is actively working on making a test.
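
    To make the ordering concern concrete, a small illustration (CM and _noop are invented names; PEP 492 allows __aexit__ to be a plain method that returns an awaitable, and today its body runs only when the block exits):

    import asyncio

    async def _noop():
        pass

    class CM:
        async def __aenter__(self):
            print("enter")

        def __aexit__(self, *exc):  # plain method returning an awaitable
            print("side effect in __aexit__")  # runs when __aexit__ is *called*
            return _noop()

    async def f():
        async with CM():
            print("body")

    asyncio.run(f())  # prints: enter / body / side effect in __aexit__
    # Resolving the awaitable at BEFORE_ASYNC_WITH time would move the
    # __aexit__ side effect before "body".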

    @ncoghlan
    Contributor

    ncoghlan commented Sep 4, 2017

    https://github.com/ncoghlan/cpython/pull/2/files provides a test case that reliably reproduces the problem for both synchronous and asynchronous context managers.

    It's inspired by Nathaniel's proposal above, but relies on a modified version of sys.settrace that runs the trace function after every opcode, not just every time the line number changes or we jump backwards in the bytecode.

    bpo-31344 is a separate issue to add that underlying capability so we can write this test case.

    @njsmith
    Contributor Author

    njsmith commented Sep 4, 2017

    This would still provide value even if you have to do a GET_AWAITABLE in the protected region: the most common case is that __aenter__ is a coroutine function, which means that its __await__ is implemented in C and already protected against interrupts.

    Also, to make this fully useful we need some way to protect arbitrary callables against interrupts anyway (to handle the case where the signal arrives during the actual __enter__ or __exit__ body), so saying that we also need to be able to protect __await__ isn't a big deal.

    @ncoghlan
    Contributor

    ncoghlan commented Sep 4, 2017

    Right, if you look at the comments in the draft test case, we realised there are three things we currently need to protect:

    1. POP_BLOCK -> WITH_CLEANUP_START (synchronous CM)
    2. POP_BLOCK -> GET_AWAITABLE (asynchronous CM)
    3. GET_AWAITABLE -> YIELD_FROM (asynchronous CM)

    Now that I have a test case, I'm going to start looking at defining a new DEFER_PENDING_UNTIL opcode that skips pending call processing (and hence signals) until that particular opcode offset is reached.

    @gpshead
    Member

    gpshead commented Sep 4, 2017

    I do wonder if this is going to raise the urgency of https://bugs.python.org/issue16565 (Increase Py_AddPendingCall array size) or other refactoring of how pending calls are tracked.

    @njsmith
    Contributor Author

    njsmith commented Sep 5, 2017

    In bpo-30703 Antoine fixed signal handling so it doesn't use Py_AddPendingCall anymore. At this point the only time the interpreter itself uses Py_AddPendingCall is when there's an error writing to the signal_fd, which should never happen in normal usage.

    If there are third-party libraries making heavy use of Py_AddPendingCall then it could be an issue, but any such libraries should already be making sure they never have more than one Py_AddPendingCall pending at a time, because you already have to assume that the processing might be arbitrarily delayed.

    @ncoghlan
    Contributor

    ncoghlan commented Sep 5, 2017

    https://github.com/ncoghlan/cpython/pull/2/files now adds a DEFER_PENDING_UNTIL opcode that relies on the jump offset calculation machinery to decide how long pending call processing should be deferred.

    Right now, the test case is also emulating the "skip pending call" logic based on a frame attribute, but I'm going to try to replace that with a suitable call to _testcapi._pending_threadfunc, which would avoid the need to expose those details on the frame, and ensure that we're testing the logic that actually matters.

    However, the good news is that:

    1. the patch is complete enough that I think it demonstrates that this approach is potentially viable
    2. because it only suspends pending call processing in the *current* frame, it's safe to cross boundaries that may call arbitrary Python code (this turns out to be important for the async with case)

    @ncoghlan
    Contributor

    ncoghlan commented Sep 6, 2017

    I've updated the draft PR to actually raise the exception from a pending call hook, rather than emulating it in a trace hook.

    This was a bit trickier than I first expected, as it required moving the trace hook itself into C, as otherwise the "return" bytecode at the end of the trace function would trigger the just registered pending call (and the pending call processing isn't deferred in the trace function's frame).

    Right now, the revised tests are failing though - I'm assuming that's a sign that the previous fix wasn't actually sufficient, and it only looked sufficient due to the way the errors were being emulated.

    @ncoghlan
    Contributor

    ncoghlan commented Sep 7, 2017

    The updated PR fully resolves the synchronous CM case by switching to using threading.Lock as the test CM (as per Nathaniel's original suggestion). Without that, the pending exception was being processed as soon as __exit__() started running, so the test failed due to the lack of signal safety in the test CM itself.

    For the asynchronous case, Greg and I aren't seeing any way to resolve the situation without somehow making the pending call processing event loop aware - otherwise even if the current frame postpones dealing with the pending call, it will still get processed as soon as YIELD_FROM returns control to the asyncio event loop. The one thing we think may work is to provide a way for asyncio (et al) to configure a thread such that all pending calls are routed to the thread's event loop for handling, rather than being processed directly by the eval loop.

    @ncoghlan
    Contributor

    ncoghlan commented Sep 8, 2017

    In addition to adding a test case to ensure exceptions were handled correctly, I also added test cases for return, break, and continue, and all four new tests initially failed.

    Adding a special case for WITH_CLEANUP_START resolved those, but the tests need an adjustment to account for the fact that some of the opcode offsets are unreachable in the test case that covers the early return.

    @njsmith
    Contributor Author

    njsmith commented Sep 8, 2017

    > a special case for WITH_CLEANUP_START

    I guess this works for the sync case, but it seems kinda fragile, and obviously won't work for the async case. Do you think it's worth revisiting this idea from the OP?: "The eval loop's "check for signal handlers" code is run rarely, so we could afford to do relatively expensive things like check a lookaside table that says "no signal handlers when 13 < f_lasti <= 22"."

    (Of course it should be 13 <= f_lasti < 22, apparently I was writing in a mirror or something.)

    BTW, I looked at your branch, and figured out the problem with your async test. Asyncio isn't a problem (YIELD_FROM doesn't yield to the event loop, only an actual 'yield' statement does that); the problem was just that your test case's __aexit__ is written in Python, so the exceptions were delayed until the __aexit__ call, and then they were raised inside __aexit__ and the test couldn't detect that properly. I have a patch against 9b8891a that makes it pass. (It switches the pass condition to "either tracking_cm.enter_without_exit is False OR AsyncTrackingCM.__aexit__ is in the exception's traceback".) But of course it still has the problem with jumping past DEFER_PENDING_UNTIL. Do you want me to post the patch here, or in the other issue?

    @ncoghlan
    Contributor

    ncoghlan commented Sep 8, 2017

    Yeah, I was already thinking the lookaside table might be a better idea (similar to co_lnotab replacing the old SET_LINENO opcodes), especially since that would also provide a quick way of introspecting code objects to see if they included any try/finally constructs. The special case was just a quick check to see if the problem was what I thought it was.

    That's also a good point regarding __aexit__ (I figured out that was the problem in the sync case, but hadn't realised it also applied to the async case), so I'll restore those tests and opcode updates.

    I'm also wondering how far we might be able to get by adjusting the pending signal processing such that we always execute at least one line from a function before checking for pending signals (that should be enough to protect the test's __aexit__, and it would also be enough to get inside a with statement that guarded a function body against signals).

    @ncoghlan
    Contributor

    ncoghlan commented Sep 8, 2017

    I updated the PR at https://github.com/ncoghlan/cpython/pull/2/files with another attempt at reinstating the asynchronous CM test case that also tweaks the eval loop to ensure that the first few opcodes in a function are always executed before pending calls are checked.

    This version highlights why I think asyncio really needs work before the async behaviour can be tested robustly in the standard library: it *hangs* somewhere in _asyncio.TaskStepMethWrapper once pending call handling is deferred long enough for the __aexit__ method to actually finish.

    @njsmith
    Contributor Author

    njsmith commented Sep 8, 2017

    Here's the patch I mentioned: njsmith@62547dc

    It has a crude but effective technique for handling the hanging issue :-). Alternatively one could use the World's Stupidest Coroutine Runner:

    def run(coro):
        try:
            coro.send(None)
        except StopIteration:
            pass
        else:
            raise RuntimeError("yielded")

    > I'm also wondering how far we might be able to get by adjusting the pending signal processing such that we always execute at least one line from a function before checking for pending signals

    Huh, that's a neat idea. I think it's defeated by 'while True: pass', though :-(. (You have to write it on two lines, but it compiles down to a single instruction that jumps to itself.)
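
    (Easy to confirm with dis — on CPython of this era the loop compiles to a JUMP_ABSOLUTE that targets itself:)

    import dis

    def spin():
        while True:
            pass

    dis.dis(spin)  # the loop is a single self-targeting jump, so a
                   # "run one line first" rule never gets past it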

    @ncoghlan
    Contributor

    ncoghlan commented Sep 9, 2017

    After chatting to Facebook's Carl Shapiro here at the core dev sprint, I'm going to start exploring a hopefully more robust white-listing based approach where we check for pending calls in the following cases:

    • the instruction pointer either hasn't changed or has gone backwards and the instruction isn't a YIELD_FROM (the "return to the start of a loop" case)
    • the next instruction is RETURN_VALUE (the function postamble case)

    For now, I'm going to *skip* checking for additional pending calls when exiting a frame via exception, based on the theory that if we're already handling an exception, it's OK to wait until something catches it before checking for potential additional exception sources.
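
    A Python model of those two placement rules (illustrative only — the real test belongs in ceval.c's instruction dispatch):

    def should_check_pending(prev_offset, next_offset, next_opname):
        # Check on loop back-edges (instruction pointer unchanged or moved
        # backwards), except when resuming a YIELD_FROM...
        backwards = next_offset <= prev_offset and next_opname != "YIELD_FROM"
        # ...and in the function postamble, just before returning.
        returning = next_opname == "RETURN_VALUE"
        return backwards or returning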

    @ncoghlan
    Contributor

    I've pushed a variant that relies entirely on checking for instruction pointer changes, and it seems quite promising. However, a test_io failure introduced by this variant shows a new way for this version to *create* problems: because of the shift in the way signals are handled, the failing test switches from raising the exception as the last item inside the with statement to instead raising it as the first thing inside the __exit__ method of a pure-Python context manager.

    @ncoghlan
    Contributor

    With bpo-17611 merged (which moves stack unwinding to the compiler), I expect the exact details of this problem to have changed, but the general problem still exists: Ctrl-C may lead to __exit__ (or __aexit__) not being called even after __enter__ (or __aenter__) returns successfully, and this may happen even for context managers implemented with uninterruptible methods (e.g. in C in CPython without calling back into any Python code).

    @serhiy-storchaka
    Member

    The issue with synchronous 'with' can be solved by bpo-32949.

    See also bpo-34066 for the problem with interruption before calling __enter__.

    @ncoghlan
    Contributor

    ncoghlan commented Jul 9, 2018

    While Greg Smith and I both cringed at the idea when I first raised it, I'm becoming more and more convinced that the only way we're going to be able to make resource cleanup reliable is for with statements to have the ability to disable signal handling while __enter__ and __exit__ methods are running.

    When a with statement switches signal handling off in a particular execution context, there'd then need to be some absolute timing deadline for switching them back on, so resource acquisition or cleanup that got stuck in an infinite loop could still be interrupted eventually.

    If you combined that with the signal handling approach in https://github.com/ncoghlan/cpython/pull/2/files, then I think we'd have as solid a solution as CPython is likely to be able to provide.
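
    A rough pure-Python model of that policy (the proposal is for the interpreter itself; DeferredSigint and its deadline are invented here for illustration):

    import signal
    import time

    class DeferredSigint:
        def __init__(self, deadline=5.0):
            self.deadline = deadline
            self.pending = False

        def _handler(self, signum, frame):
            if time.monotonic() - self.start > self.deadline:
                raise KeyboardInterrupt  # cleanup is stuck: stop deferring
            self.pending = True  # otherwise remember the Ctrl-C for later

        def __enter__(self):
            self.start = time.monotonic()
            self.old = signal.signal(signal.SIGINT, self._handler)
            return self

        def __exit__(self, *exc):
            signal.signal(signal.SIGINT, self.old)
            if self.pending:
                raise KeyboardInterrupt  # deliver the deferred interrupt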

    @gpshead gpshead added the 3.8 only security fixes label Sep 25, 2018
    @gpshead gpshead added 3.9 only security fixes and removed 3.7 (EOL) end of life labels Aug 21, 2019
    @rhettinger
    Contributor

    Historically, in C, Python, and low-level interrupt programming, it was the responsibility of a person writing a signal handler to make minimal variable edits and not disrupt anything else in the ecosystem.

    Are we changing that rule? (I'm still at the cringing stage and possibly haven't evolved my views yet).

    @gpshead
    Member

    gpshead commented Aug 21, 2019

    The signal handler in this case is CPython's internal signal handling system thus any such onus falls on us...

    The great signal handling trick of "set a flag that the interpreter loop checks on occasion" lasted a long time, but our VM has since evolved to be much more complicated than every bytecode being a safe interrupt point.

    FYI - Most of the in-room conversation we tried to capture on this issue thread was at the core dev sprint a couple years ago (Sept 2017). It'll similarly take me a while to swap this context back in.

    @ncoghlan
    Contributor

    There is also the draft test case at ncoghlan#2

    That needs updating to account for the eval loop changes since I last worked on it, though. (I think that conflict may also be keeping the bots from running, even after I updated the issue title to use the modern convention)

    @rhettinger
    Contributor

    Is there a chance this will get resolved for 3.8? This is an important guarantee.

    @gpshead
    Member

    gpshead commented Sep 12, 2019

    While there is technically still time, it'd take people spending dedicated time on it before the 3.8 release. I won't be.

    Important? Sure. But not urgent; this bug is not new. We've lived with it for many releases already. I believe it has been around since 2.5 when with statements were introduced.

    @markshannon
    Member

    I expect to fix this as part of the general improvements I am making to the bytecode, interpreter, and compiler.
    So it should be fixed for 3.9, but not for 3.8.

    @ncoghlan
    Contributor

    It's also not unique to with statements - it applies to all finally clauses. The longstanding workaround when deterministic cleanup is absolutely critical has been to run the "real" application in a subthread, and devote the main thread to gracefully terminating the subthread when requested.

    When cleanup is critical, but doing it in a deterministic order is less so, __del__ methods are often used to fill the gap (although they too can be interrupted by a subsequent Ctrl-C).

    I also realized that allowing infinite loops in cleanup code to ignore Ctrl-C may actually be a tolerable outcome: in the worst case, users can still escalate to Ctrl-Break/kill -9/Force stop/etc and pull the entire OS process out from under the interpreter. It's not good, but may be worth it in order to better handle users pressing Ctrl-C multiple times.

    @gpshead
    Member

    gpshead commented Sep 27, 2019

    As a note on the general pattern, a user at work diagnosed a ^C problem in their code when running on 2.7 to be due to Queue.get's

    acquire()
    try:
      ...
    finally:
      release()

    pattern, with the KeyboardInterrupt triggering after acquire() but before the try is entered, so release() is never called.

    A try finally pattern that probably alleviates this by entering the try block first might look like:

    try:
        acquire()
        ...
    finally:
        try:
            release()
        except ThreadError:
            pass  # interrupted before acquire() succeeded.

    It'd be a shame if with-statement lock acquisition context managers need to be turned into that, but I _believe_ it'd be a viable workaround for the time being if this race is found to be biting anyone.
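
    (The same workaround packaged as a context manager — a sketch only, since the pure-Python version still has a sliver of a window of its own; it trades "lock leaked" for "release() may run without a matching acquire()", which the except clause absorbs:)

    from contextlib import contextmanager
    from threading import ThreadError

    @contextmanager
    def interrupt_tolerant(lock):
        try:
            lock.acquire()
            yield
        finally:
            try:
                lock.release()
            except (ThreadError, RuntimeError):
                pass  # interrupted before acquire() succeeded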

    @ncoghlan
    Contributor

    Belatedly clearing the issue assignment here - while I do still sometimes ponder this problem, I haven't been actively working on it since the 2017 core sprint where Greg & I made our last serious attempt at trying to improve the situation.

    Mark's PR at #18334 looks very promising to me, though - my main request was just to bring over the tests I wrote at the 2017 core dev sprints, and confirm that the revised eval breaker logic solves the issue.

    @ncoghlan ncoghlan removed their assignment Jun 28, 2020
    @gpshead gpshead added 3.10 only security fixes and removed 3.8 only security fixes labels Sep 10, 2020
    @nirs
    Mannequin

    nirs mannequin commented Feb 9, 2021

    Does #1799 solve this issue for synchronous with?

        with closing(this), closing(that):

    If it does, can we backport this fix to python 3.6?

    3.6 is used as the system Python for RHEL/CentOS 8, which will be used for at least 5 years or so.

    @markshannon
    Member

    The bytecode instruction set has changed a lot since 3.6, so I think a backport would be impractical.

    3.6 is in security fix only mode, so you'd need to take this up with Red Hat.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022