Author njs
Recipients ncoghlan, njs, yselivanov
Date 2017-04-04.23:42:57
SpamBayes Score -1.0
Marked as misclassified Yes
Message-id <1491349378.03.0.0671954460373.issue29988@psf.upfronthosting.co.za>
In-reply-to
Content
You might hope the interpreter would enforce the invariant that for 'with' and 'async with' blocks, either '__(a)enter__' and '__(a)exit__' are both called, or else neither of them is called. But it turns out that this is not true once KeyboardInterrupt gets involved – even if we write our (async) context manager in C, or otherwise guarantee that the actual '__(a)enter__' and '__(a)exit__' methods are immune to KeyboardInterrupt.

The invariant is *almost* preserved for 'with' blocks: there's one instruction (SETUP_WITH) that atomically (wrt signals) calls '__enter__' and then enters the implicit 'try' block, and there's one instruction (WITH_CLEANUP_START) that atomically enters the implicit 'finally' block and then calls '__exit__'. But there's a gap between exiting the 'try' block and WITH_CLEANUP_START where a signal can arrive and cause us to exit without running the 'finally' block at all. In this disassembly, the POP_BLOCK at offset 7 is the end of the 'try' block; if a KeyboardInterrupt is raised between POP_BLOCK and WITH_CLEANUP_START, then it will propagate out without '__exit__' being run:

In [2]: def f():
   ...:     with a:
   ...:         pass
   ...:     

In [3]: dis.dis(f)
  2           0 LOAD_GLOBAL              0 (a)
              3 SETUP_WITH               5 (to 11)
              6 POP_TOP

  3           7 POP_BLOCK
              8 LOAD_CONST               0 (None)
        >>   11 WITH_CLEANUP_START
             12 WITH_CLEANUP_FINISH
             13 END_FINALLY
             14 LOAD_CONST               0 (None)
             17 RETURN_VALUE

For async context managers, the race condition is substantially worse, because the 'await' dance is inlined into the bytecode:

In [4]: async def f():
   ...:     async with a:
   ...:         pass
   ...:     

In [5]: dis.dis(f)
  2           0 LOAD_GLOBAL              0 (a)
              3 BEFORE_ASYNC_WITH
              4 GET_AWAITABLE
              5 LOAD_CONST               0 (None)
              8 YIELD_FROM
              9 SETUP_ASYNC_WITH         5 (to 17)
             12 POP_TOP

  3          13 POP_BLOCK
             14 LOAD_CONST               0 (None)
        >>   17 WITH_CLEANUP_START
             18 GET_AWAITABLE
             19 LOAD_CONST               0 (None)
             22 YIELD_FROM
             23 WITH_CLEANUP_FINISH
             24 END_FINALLY
             25 LOAD_CONST               0 (None)
             28 RETURN_VALUE

Really the sequence from 3 BEFORE_ASYNC_WITH to 9 SETUP_ASYNC_WITH should be atomic wrt signal delivery, and from 13 POP_BLOCK to 22 YIELD_FROM likewise.

This probably isn't the highest priority bug in practice, but I feel like it'd be nice if this kind of basic language invariant could be 100% guaranteed, not just 99% guaranteed :-). And the 'async with' race condition is plausible to hit in practice, because if I have an '__aenter__' that's otherwise protected from KeyboardInterrupt, then it can run for some time, and any control-C during that time will get noticed just before the WITH_CLEANUP_START, so e.g. 'async with lock: ...' might complete while still holding the lock.

The traditional solution would be to define single "super-instructions" that do all of the work we want to be atomic. This would be pretty tricky here though, because WITH_CLEANUP_START is a jump target (so naively we'd need to jump into the "middle" of a hypothetical new super-instruction), and because the implementation of YIELD_FROM kind of assumes that it's a standalone instruction exposed directly in the bytecode. Probably there is some solution to these issues but some cleverness would be required.

A alternative approach would be to keep the current bytecode, but somehow mark certain stretches of bytecode as bad places to run signal handlers. The eval loop's "check for signal handlers" code is run rarely, so we could afford to do relatively expensive things like check a lookaside table that says "no signal handlers when 13 < f_lasti <= 22". Or we could steal a bit in the opcode encoding or something.
History
Date User Action Args
2017-04-04 23:42:58njssetrecipients: + njs, ncoghlan, yselivanov
2017-04-04 23:42:58njssetmessageid: <1491349378.03.0.0671954460373.issue29988@psf.upfronthosting.co.za>
2017-04-04 23:42:57njslinkissue29988 messages
2017-04-04 23:42:57njscreate