Title: Move unwinding of stack for "pseudo exceptions" from interpreter to compiler.
Type: enhancement Stage: patch review
Components: Interpreter Core Versions: Python 3.7
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: Demur Rumed, Mark.Shannon, benjamin.peterson, christian.heimes, mark.dickinson, nascheme, ncoghlan, pitrou, rhettinger, serhiy.storchaka, trent
Priority: normal Keywords: patch

Created on 2013-04-01 16:31 by Mark.Shannon, last changed 2017-12-07 23:04 by Mark.Shannon.

Files
b16527f84774.diff (Mark.Shannon, 2013-04-01 16:31)
unwinding.patch (pitrou, 2014-04-29 18:21)
perf_unwind_stack.txt (nascheme, 2017-11-06 22:41)
Pull Requests
PR 2827: open (pitrou, 2017-07-23 19:20)
PR 4682: open (nascheme, 2017-12-02 22:16)
Repositories containing patches
https://bitbucket.org/markshannon/cpython_simpler_bytecode
Messages (45)
msg185741 - (view) Author: Mark Shannon (Mark.Shannon) * Date: 2013-04-01 16:31
The handling of "pseudo exceptions" (return, break and continue) is currently done in the interpreter. This makes the interpreter loop more complex and slower than it needs to be. This change moves the handling of pseudo exceptions into the compiler.

The net effects of this patch are:

Simpler interpreter loop: no 'pseudo-exceptions', fewer bytecodes and some simplified bytecodes.

Eliminate the 'why_code' state variable in the interpreter. Execution is always in the 'normal' state except during explicit exception handling.

Small increase in size and complexity of compiler.

Speedup of 1.5% (Intel i7); this should be considered a happy side-effect rather than a motivation for the change.
msg185909 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-04-03 10:20
Simplifying the eval loop is a rather good thing. Could you post examples of bytecode disassembly before and after the patch?
msg185916 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2013-04-03 11:23
The change seems to slow down the pi float benchmark. Run this and hit
Ctrl-C after the first decimal result appears:

./python Modules/_decimal/tests/bench.py


I've run the benchmark seven times and the average for float is
something like 8-10% slower (64-bit Linux, Core 2 Duo). Decimal
is not affected.
msg185946 - (view) Author: Mark Shannon (Mark.Shannon) * Date: 2013-04-03 19:39
Stefan,
I tried running your pi_bench (increasing the number of iterations by 10x to reduce jitter). The run-to-run variation exceeded the difference between the two versions. Here are the fastest of 7 runs on my machine.
i3-2370M CPU @ 2.40GHz × 4, Linux 64-bit.

With patch:
1.154152s

Without patch:
1.157730s

I'm somewhat surprised by the slowdown on your machine. There should be no difference in the bytecodes executed or their implementation in the pi_float loop.
msg185950 - (view) Author: Mark Shannon (Mark.Shannon) * Date: 2013-04-03 20:25
Antoine,
Two bytecode examples.


For the following function:
>>> def f(x):
...     try:
...         return x
...     except:
...         pass

Without the patch:
  2           0 SETUP_EXCEPT             8 (to 11) 

  3           3 LOAD_FAST                0 (x) 
              6 RETURN_VALUE         
              7 POP_BLOCK            
              8 JUMP_FORWARD             8 (to 19) 

  4     >>   11 POP_TOP              
             12 POP_TOP              
             13 POP_TOP              

  5          14 POP_EXCEPT           
             15 JUMP_FORWARD             1 (to 19) 
             18 END_FINALLY          
        >>   19 LOAD_CONST               0 (None) 
             22 RETURN_VALUE         

With the patch:
  2           0 SETUP_EXCEPT             9 (to 12) 

  3           3 LOAD_FAST                0 (x) 
              6 POP_BLOCK            
              7 RETURN_VALUE         
              8 POP_BLOCK            
              9 JUMP_FORWARD             8 (to 20) 

  4     >>   12 POP_TOP              
             13 POP_TOP              
             14 POP_TOP              

  5          15 POP_EXCEPT           
             16 JUMP_FORWARD             1 (to 20) 
             19 RERAISE              
        >>   20 LOAD_CONST               0 (None) 
             23 RETURN_VALUE         

The difference is the 'POP_BLOCK' inserted at offset 7 *before* the 'RETURN_VALUE', so that the RETURN_VALUE bytecode does not have to do any unwinding of the block stack.
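(For reference, disassemblies like the ones above can be regenerated on any CPython build with the dis module; the exact opcodes vary from version to version, but the shape of the try/except is visible in all of them.)

```python
import dis

def f(x):
    try:
        return x
    except:
        pass

# Print the disassembly of f; opcode names differ across Python
# versions, but the compiled try/except structure is the same idea.
dis.dis(f)
```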

For loops the SETUP_LOOP and the matching POP_BLOCK are eliminated:

>>> def f(x):
...    for y in x:
...        g(x)

Without the patch:
  2           0 SETUP_LOOP              24 (to 27) 
              3 LOAD_FAST                0 (x) 
              6 GET_ITER             
        >>    7 FOR_ITER                16 (to 26) 
             10 STORE_FAST               1 (y) 

  3          13 LOAD_GLOBAL              0 (g) 
             16 LOAD_FAST                1 (y) 
             19 CALL_FUNCTION            1 (1 positional, 0 keyword pair) 
             22 POP_TOP              
             23 JUMP_ABSOLUTE            7 
        >>   26 POP_BLOCK            
        >>   27 LOAD_CONST               0 (None) 
             30 RETURN_VALUE         

With the patch:
  2           0 LOAD_FAST                0 (x) 
              3 GET_ITER             
        >>    4 FOR_ITER                16 (to 23) 
              7 STORE_FAST               1 (y) 

  3          10 LOAD_GLOBAL              0 (g) 
             13 LOAD_FAST                0 (x) 
             16 CALL_FUNCTION            1 (1 positional, 0 keyword pair) 
             19 POP_TOP              
             20 END_ITER                 4 
        >>   23 LOAD_CONST               0 (None) 
             26 RETURN_VALUE        

END_ITER is a synonym for JUMP_ABSOLUTE. It is needed so that frame.set_lineno() can identify loop blocks.
msg185951 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2013-04-03 20:32
I'm surprised, too, but after a couple of retries the results stay the
same. On an i7 there's also a difference, but not quite as large. I'm
using b16527f84774.diff, which applies cleanly apart from importlib.h
(which I just regenerated).

With an increased loop count the difference is also the same (10% on
the Core2 Duo).

Core2 Duo, 3.16GHz, Linux 64-bit
================================

patched:
--------
(0.10855+0.10901+0.1168+0.1119+0.1095+0.1090+0.1084+0.1130+0.1235+0.1183) / 10 = 0.113

unpatched:
----------
(0.0995+0.1049+0.1009+0.1021+0.0986+0.1005+0.0988+0.10171+0.0987+0.0992) / 10 = 0.10

(i7, 2.80GHz,  Linux 64-bit)
============================

patched:
--------
(0.1021+0.1078+0.1072+0.1058+0.1091+0.1067+0.1064+0.1071+0.1069+0.1067) / 10 = 0.107

unpatched:
----------
(0.1029+0.1033+0.1023+0.1029+0.1050+0.1020+0.1031+0.1024+0.10149+0.10247) / 10 = 0.103
msg185952 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2013-04-03 20:53
BTW, there's no general slowdown on the Core2 Duo: In the patched version,
the telco benchmark is consistently 4% faster.
msg185954 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-04-03 20:57
> BTW, there's no general slowdown on the Core2 Duo: In the patched version,
> the telco benchmark is consistently 4% faster.

It's customary for even innocent changes in ceval to produce small
changes in some benchmarks. It's only really important to consider the
global average.
msg185955 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2013-04-03 21:28
Antoine Pitrou <report@bugs.python.org> wrote:
> It's customary for even innocent changes in ceval to produce small
> changes in some benchmarks. It's only really important to consider the
> global average.

Yes, the float benchmark appears to be particularly sensitive. In the
3.3.0 release it also runs 8% slower than it does now (unpatched).

I'm trying to keep track of changes because in 2.7 the (float) benchmark runs
something like 25% faster than in 3.3.0.
msg217534 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2014-04-29 18:21
Here is an updated patch keeping up with recent changes in the source tree.
msg228816 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2014-10-08 20:53
> END_ITER is a synonym for JUMP_ABSOLUTE. It is needed so that frame.set_lineno() can identify loop blocks.

I was thinking about this. Isn't it enough to consider all backwards edges as ending a loop block?
msg228817 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2014-10-08 20:53
Ah, no, I guess "continue" would also create a backwards edge. Nevermind, sorry.
msg231250 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-11-16 15:06
I like the idea. I have not done a fault-finding review yet, but one thing is questionable to me.

Is it necessary to get rid of the END_FINALLY opcode?

Currently Python code

    try:
        stmt1
    finally:
        stmt2

is compiled to:

    SETUP_FINALLY L1
    // stmt1
    POP_BLOCK
    LOAD_CONST None
L1: // stmt2
    END_FINALLY

With the patch it is compiled to:

    SETUP_FINALLY L1
    // stmt1
    POP_BLOCK
    // stmt2
    JUMP_ABSOLUTE L2
L1: // stmt2
    RERAISE
L2:

The finally body is duplicated. In case of a large finally body this increases the size of the bytecode. Line numbers are duplicated in the disassembly listing, which is confusing. And I'm afraid that, due to the increased size of the bytecode, some relative jumps could become too long.

If it is absolutely necessary to remove the END_FINALLY opcode, I suggest generating the following bytecode:

    SETUP_FINALLY L1
    // stmt1
    POP_BLOCK
    LOAD_CONST None
L1: // stmt2
    POP_JUMP_IF_FALSE L2
    RERAISE
L2:

But it looks to me that a special END_FINALLY opcode which combines POP_JUMP_IF_FALSE/RERAISE would be better for performance and compiled code size.
msg258673 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-01-20 11:37
FYI The issue #26107 changed code.co_lnotab to support negative line number deltas.
msg267329 - (view) Author: Demur Rumed (Demur Rumed) * Date: 2016-06-04 22:05
Java duplicates finally bytecode too: http://cliffhacks.blogspot.ca/2008/02/java-6-tryfinally-compilation-without.html

Though they avoid the jsr instruction because it causes non-trivial bytecode validation. Still, they must have concluded that finally blocks are usually small enough for inlining to be acceptable.
msg267418 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-06-05 15:03
The patch no longer applies cleanly. Mark, can you please update your patch?
msg268456 - (view) Author: Neil Schemenauer (nascheme) * (Python committer) Date: 2016-06-13 16:46
This looks to be a good idea, and now that the bytecode has changed to 16-bit is a good time to merge it. The increase in complexity to compile.c is not bad, and reducing the complexity of the eval loop is worth it, IMHO.
msg298910 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2017-07-23 19:22
I updated the patch for git master. Some cleanup is in order.
msg298934 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2017-07-24 08:09
So, the approach of duplicating finally blocks tends to lead to a non-trivial bytecode increase.  There is even a potential combinatorial explosion with nested "try..finally" blocks:

def f():
    try:
        ...
    finally:
        try:
            ...
        finally:
            # etc.

Such a chain of N nested "finally"s will emit O(2**N) opcodes.
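The growth is easy to observe by generating such nested functions and comparing code object sizes. A rough sketch (absolute numbers depend on the Python version and on whether that version duplicates finally bodies):

```python
def nested_finally_src(depth):
    # Build the source of a function with `depth` nested try/finally
    # blocks, each finally body containing the next try/finally.
    lines = ["def f():"]
    for i in range(depth):
        pad = "    " * (i + 1)
        lines.append(pad + "try:")
        lines.append(pad + "    x = %d" % i)
        lines.append(pad + "finally:")
    lines.append("    " * (depth + 1) + "x = %d" % depth)
    return "\n".join(lines)

def code_size(depth):
    ns = {}
    exec(nested_finally_src(depth), ns)
    return len(ns["f"].__code__.co_code)

# Bytecode size grows with nesting depth.
print([code_size(n) for n in (1, 2, 4, 6)])
```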

I'm not sure it's a practical concern, but I find it detrimental to the readability of bytecode (which is a nice thing to have when doing low-level debugging or tweaking).  I think we can massage the PR to remove the "finally" block duplication while keeping the predictable stack effect.
msg298935 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2017-07-24 08:25
For example we could generate the following bytecode:

    SETUP_FINALLY L1
    // ... stmt1
    POP_BLOCK
    JUMP_FINALLY  L1
L1:
    // ... stmt2
    RERAISE
    JUMP_FORWARD  L2
L2:


JUMP_FINALLY would push 3 Nones on the stack.  RERAISE would raise only if non-None values are popped.
msg298943 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2017-07-24 09:21
Ok, I've pushed an update which removes the finally block duplication.
msg298949 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-07-24 09:59
Great! I tried to update the patch myself, but there were too many conflicts. Thank you very much Antoine for taking this!

Seems there are bugs in frame_setlineno() related to code duplications. And deduplicating the 'finally' blocks doesn't solve all of them. See comments on GitHub.

I don't like the new END_ITER instruction. This complicates (and slows down) the evaluation loop and the peepholer. This also adds a limitation on the peepholer (END_ITER can't be optimized out). We can use another way of determining the end of the 'for' block. The argument of the FOR_ITER instruction points to the end of the 'for' block. Just scan all addresses from 0 to max_addr and count all FOR_ITER instructions and their targets in the range between min_addr and max_addr.
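The scan being suggested is straightforward to prototype with today's dis module: FOR_ITER's jump argument resolves to the first offset past the loop, so each (offset, target) pair brackets a loop body. A sketch (the END_ITER proposal itself differs in detail):

```python
import dis

def for_loop_ranges(code):
    # Each FOR_ITER's jump target is the first instruction after the
    # loop, so (offset, target) brackets the loop body.
    return [(i.offset, i.argval)
            for i in dis.get_instructions(code)
            if i.opname == "FOR_ITER"]

def f(x):
    for y in x:
        g(y)   # 'g' is hypothetical; f is only disassembled, never called

print(for_loop_ranges(f.__code__))
```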
msg298960 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2017-07-24 11:47
> I don't like the new END_ITER instruction. This complicates (and slows down) the evaluation loop and the peepholer.

I don't think it slows down anything in the eval loop.  I agree it adds a bit of complexity.

> This also adds a limitation on the peepholer (END_ITER can't be optimized out).

It's an unconditional backedge, how could you optimize it?
msg298972 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2017-07-24 14:05
Ok, I've now gotten rid of END_ITER.
msg305684 - (view) Author: Neil Schemenauer (nascheme) * (Python committer) Date: 2017-11-06 22:41
The attached pyperformance report compares cpython:master to nascheme:unwind_stack.  If I've done the tests correctly, the unwind_stack version is slightly faster.
msg307477 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2017-12-03 00:39
I like this change, and think we should go ahead with it, but just wanted to note that I suspect it may make the "Badly timed signals may lead to __exit__ being skipped even after __enter__ succeeded" problem slightly easier to hit: https://bugs.python.org/issue29988

That's not a new problem though, and this change should make it easier to apply the conventional solutions (i.e. only checking for signals when execution jumps backwards within a function, as well as in function pre-or-post-ambles)
msg307478 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2017-12-03 00:49
At this point in the release cycle, it would be nice to get this PR applied soon so that there is more time see if there are any minor follow-on issues.
msg307479 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2017-12-03 00:50
While there are some good comments in line in the PR, I think it would be helpful if these changes were accompanied by some additions to https://devguide.python.org/compiler/ that explain how the various pieces of the solution work together to manage the execution stack.
msg307497 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2017-12-03 09:53
I would point out that Serhiy posted some comments in the followup PR which aren't addressed by Neil's followup PR.  They relate to tracing.
msg307548 - (view) Author: Mark Shannon (Mark.Shannon) * Date: 2017-12-04 11:46
I plan to resurrect my original design over the Christmas break.
The reason that I want to get back to the original design is its consistency and relative simplicity.

Duplicating the finally block for every exit from the try body might sound expensive, but it only increases the size of the bytecode by an estimated 0.4%.
 
Using statement counts as a proxy for bytecodes, I ran the code query https://lgtm.com/query/1505907426052/ over a range of OpenStack and other large projects. The size increase is in the range 0.26% to 0.44%.
(Note that the statement counts include dependencies)

I plan to start from 
https://github.com/python/cpython/pull/2827/commits/693b9398b5fd202fa5864f9cc76fa1bc7f84f62e
as that adheres to the original design, but cleans up the code.
Antoine, that is on your branch; are you OK with me appropriating it?
msg307549 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2017-12-04 11:50
Le 04/12/2017 à 12:46, Mark Shannon a écrit :
> 
> I plan to start from 
> https://github.com/python/cpython/pull/2827/commits/693b9398b5fd202fa5864f9cc76fa1bc7f84f62e
> as that adheres to the original design, but cleans up the code.
> Antoine that is on your branch, are you OK with me appropriating it?

Definitely.
msg307555 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-12-04 12:25
I started from Antoine's PR and worked on a different approach (https://github.com/serhiy-storchaka/cpython/pull/1) which doesn't duplicate the code for continue/break/return. Instead it uses a kind of subroutine. END_FINALLY expects one of three cases:

1. NULL (or None). Normal execution thread in try/finally. Continue from the instruction following END_FINALLY.

2. An integer. This is a return address. Continue from the specified address.

3. An exception (6 items). Raises the specified exception.

WITH_CLEANUP_FINISH behaves similarly.

The statements continue/break/return insert the instruction CALL_FINALLY, which pushes the address of the following instruction on the stack and jumps to the start of the finally (or with-cleanup) block. There can be several CALL_FINALLY instructions if several finally blocks need to be executed. After them, a jump instruction is inserted for continue and break, and RETURN_VALUE for return.
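As a toy model (not real CPython bytecode), the CALL_FINALLY/END_FINALLY control flow described above can be illustrated with an explicit return-address stack; all instruction names here are stand-ins for the proposal:

```python
# Minimal interpreter for a few made-up instructions, illustrating
# the subroutine-style finally call: CALL_FINALLY pushes the address
# of the next instruction and jumps; END_FINALLY pops and resumes.
def run(program):
    stack, pc, trace = [], 0, []
    while pc < len(program):
        op, arg = program[pc]
        if op == "CALL_FINALLY":
            stack.append(pc + 1)   # push the return address
            pc = arg               # jump to the finally block
        elif op == "END_FINALLY":
            pc = stack.pop()       # resume at the pushed address
        elif op == "JUMP":
            pc = arg
        else:                      # "TRACE" records a marker
            trace.append(arg)
            pc += 1
    return trace

prog = [
    ("TRACE", "try-body"),     # 0
    ("CALL_FINALLY", 4),       # 1: call the finally block at 4
    ("TRACE", "after"),        # 2: reached once the finally returns
    ("JUMP", 6),               # 3: skip over the finally block
    ("TRACE", "finally"),      # 4
    ("END_FINALLY", None),     # 5
]
print(run(prog))               # ['try-body', 'finally', 'after']
```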

Currently I'm trying to simplify the code.
msg307558 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2017-12-04 12:33
Le 04/12/2017 à 13:25, Serhiy Storchaka a écrit :
> 
> I started from Antoine's PR and worked on a different approach (https://github.com/serhiy-storchaka/cpython/pull/1) which doesn't duplicate the code for continue/break/return. Instead it uses a kind of subroutine. END_FINALLY expects one of three cases:
> 
> 1. NULL (or None). Normal execution thread in try/finally. Continue from the instruction following END_FINALLY.
> 
> 2. An integer. This is a return address. Continue from the specified address.
> 
> 3. An exception (6 items). Raises the specified exception.

The problem is that this makes the stack consumption of END_FINALLY variable.
How about always consuming 6 items, even when most of them are unused
padding?
msg307564 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-12-04 13:16
> The problem is that this makes the stack consumption of END_FINALLY variable.
> How about always consuming 6 items, even when most of them are unused
> padding?

Right. Though the value of the stack increment makes sense only for case 1, 
because in the other cases the next executed instruction is not the one 
following END_FINALLY. But we need to reserve stack space for case 3, and 
counting additional "virtual" items in PUSH_NO_EXCEPT and END_FINALLY will 
help to calculate the maximal stack consumption, even though at runtime they 
push/pop only one item. This is one more argument for the special-purpose 
opcode PUSH_NO_EXCEPT (maybe I'll rename it to START_FINALLY or 
BEGIN_FINALLY). Another reason: this will help to determine the bounds of 
finally blocks in the frame.f_lineno setter.
msg307570 - (view) Author: Mark Shannon (Mark.Shannon) * Date: 2017-12-04 15:21
Serhiy,

I assume that you plan to use something like the JVM's JSR/RET instruction pair. https://docs.oracle.com/javase/specs/jvms/se9/html/jvms-6.html
Is that right?

My reason for preferring the finally-block duplication approach is that it keeps the interpreter simpler and minimises the amount of work done when no exception is raised. As demonstrated, the increase in the static size of the bytecode is negligible.
msg307572 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2017-12-04 15:49
Le 04/12/2017 à 16:21, Mark Shannon a écrit :
> 
> My reason for preferring the finally-block duplication approach is that it keeps the interpreter simpler and minimises the amount of work done when no exception is raised. As demonstrated, the increase in the static size of the bytecode is negligible.

Whichever approach wins at the end, I would like it to reuse the tests I
wrote to check that no stack size explosion happens when combining or
juxtaposing several control flow constructs.
msg307577 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-12-04 16:56
Right, this is similar to how the JSR/RET pair was used for calling the 
finally block.

It seems the main drawback of duplicating the finally code is not the increase 
in code size (which can be exponential in degenerate cases), but the problem 
of determining the boundaries of the finally block.

The main purpose of this issue (getting rid of "pseudo exceptions" in the 
interpreter) will be achieved in any case.

This is the first step. In the second step I'm going to get rid of dynamic 
PyTryBlock and instructions like SETUP_EXCEPT and POP_BLOCK. I think all 
necessary information can be determined at compile time and saved in an 
auxiliary structure, similarly to how line numbers are saved in the packed 
co_lnotab array instead of being set by SET_LINENO instructions. Boundaries of 
finally blocks could be stored in the same structure. This will make 
try/except/finally virtually zero-overhead.
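The auxiliary structure described here can be pictured as a static table of (start, end, handler) offset ranges consulted only when an exception actually propagates; nothing is pushed or popped on the non-exceptional path. A sketch with hypothetical offsets (not CPython's actual data layout):

```python
# Hypothetical exception table: each entry means "if an exception is
# raised at an offset in [start, end), jump to handler".
EXC_TABLE = [
    (0, 10, 40),    # try body    -> except handler at offset 40
    (40, 48, 60),   # except body -> finally block at offset 60
]

def find_handler(offset, table=EXC_TABLE):
    # Linear scan for the innermost-matching range; a real
    # implementation could binary-search a sorted, packed table.
    for start, end, handler in table:
        if start <= offset < end:
            return handler
    return None     # no enclosing handler: propagate to the caller

print(find_handler(5), find_handler(42), find_handler(100))
```

CPython later adopted essentially this scheme for its zero-cost exceptions (the packed co_exceptiontable introduced in 3.11).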

In the third step we could generate duplicates of finally blocks. This will be 
easier if the other changes are already applied and tested.

Currently there is another problem with Antoine's PR (I didn't know about it 
when I made my review). There is a gap between calling the __enter__ method 
and the SETUP_FINALLY instruction. If an exception is raised in the gap, the 
__exit__ method will never be called. For example:

    a = []
    with CM() as a[0]: # IndexError
        ...

I still haven't fixed this problem. Maybe the simplest solution would be to 
restore the SETUP_WITH instruction, which calls __enter__ and creates the 
finally block prior to saving the result of __enter__.
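The example can be turned into a runnable check. In released CPython versions the with block is established before the as-target is bound, so __exit__ does run; the point above is that the PR's bytecode at the time lost that guarantee. A sketch:

```python
# Demonstrates the required ordering: __enter__ runs, binding the
# target raises IndexError, and the context manager is still exited.
calls = []

class CM:
    def __enter__(self):
        calls.append("enter")
        return 1

    def __exit__(self, *exc):
        calls.append("exit")
        return False        # do not swallow the exception

a = []
try:
    with CM() as a[0]:      # a[0] does not exist: IndexError
        pass
except IndexError:
    calls.append("IndexError")

print(calls)                # ['enter', 'exit', 'IndexError']
```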
msg307582 - (view) Author: Mark Shannon (Mark.Shannon) * Date: 2017-12-04 17:19
On 04/12/17 16:56, Serhiy Storchaka wrote:
> 
> Serhiy Storchaka <storchaka+cpython@gmail.com> added the comment:
> 
> Right, this is similar to how the JSR/RET pair was used for calling the
> finally block.
> 
> It seems the main drawback of duplicating the finally code is not the increase
> in code size (which can be exponential in degenerate cases), but the problem
> of determining the boundaries of the finally block.

What exactly is the problem?
Why do you need to determine the boundaries of finally blocks?

...

> 
> This is the first step. In the second step I'm going to get rid of dynamic
> PyTryBlock and instructions like SETUP_EXCEPT and POP_BLOCK. I think all
> necessary information can be determined at compile time and saved in an
> auxiliary structure, similarly as line numbers are saved in the packed
> co_lnotab array instead of be set by SET_LINENO instructions. Boundaries of
> finally block could be stored in the same structure. This will make try/except/
> finally virtually zero-overhead.

Fine, but it isn't necessary for this issue. The overhead of try/except 
is already quite low.

> 
> Currently there is other problem with Antoine's PR (I didn't know about them
> when made a review). There is a gap between calling the __enter__ method and
> the SETUP_FINALLY instruction. 

That wasn't a problem with the original patch.
msg307584 - (view) Author: Mark Shannon (Mark.Shannon) * Date: 2017-12-04 17:41
Actually looking back at the original patch, the gap between __enter__ and SETUP_FINALLY *was* in the original patch. My mistake. 

The fix would be to restore SETUP_WITH. The complexity we really want to get rid of is at the end of the with statement, not the start.
msg307585 - (view) Author: Neil Schemenauer (nascheme) * (Python committer) Date: 2017-12-04 17:53
After studying the patch and doing some reading, I prefer the finally-block duplication approach as well.  Java does it that way and it works for them.  It would be interesting to compile a large body of packages and see what the increase in bytecode size actually is with the duplication.  My gut feeling is that it will not be a big deal.

There is a bug in the PR regarding the finally bodies.  Exits from the finally body cause the whole fblock stack to unwind, not just the blocks enclosing the finally body.  Unwind looks at 'c' to get the fblock stack.  I think we need to allocate fblockinfo on the C stack and then use a back pointer to the enclosing block.  When you get into a finally body that creates its own fblockinfo (e.g. a try/except inside the finally), the current code doesn't work.

The fact that the whole test suite passes despite these issues tells me that what happens in finally bodies must be pretty simple in most code.

I spent a good part of Sunday trying to understand how the PR works.  It seems to me that the ceval/bytecode changes are pretty solid.  The compiler needs some bug fixes.  Further optimisations could be done at a later time.  I'm curious to see Serhiy's approach though.
msg307656 - (view) Author: Mark Shannon (Mark.Shannon) * Date: 2017-12-05 11:11
On 04/12/17 17:53, Neil Schemenauer wrote:

> There is a bug in the PR regarding the finally bodies.  Exits from the finally body cause the whole fblock stack to unwind, not just the blocks enclosing the finally body.  Unwind looks at 'c' to get the fblock stack.  I think we need to allocate fblockinfo on the C stack and then use a back pointer to the enclosing block.  When you get into a finally body that creates its own fblockinfo (e.g. a try/except inside the finally), the current code doesn't work.

I don't really follow you.
Could you provide a code example that will expose this error?
msg307671 - (view) Author: Neil Schemenauer (nascheme) * (Python committer) Date: 2017-12-05 17:21
There is some more explanation in the PR and sample code.

We unwind; if we hit a finally fblock, we emit the code of its body.  If there is another return statement inside that body, we unwind again.  That causes an infinite loop in the compiler.

The commit 0610860 is a fix, I think.  I have a cleaner version but haven't pushed it yet.  There are still some remaining bugs though, caused by "return" in the finally body.  The problem as I see it is that "unwind_exception" in ceval pushes an EXCEPT_HANDLER fblock if we are inside a SETUP_EXCEPT or SETUP_FINALLY block.  If there is a return in the finally body, the compiler doesn't know if it should POP_BLOCK or not.  At least, that is my understanding now.

I think the best way forward is to split this patch into multiple parts.  I have a simple patch that changes fblockinfo to be a singly linked list (like blockinfo).  Next, I think we could change the exception opcodes to have a fixed stack effect (I don't think that requires unwinding to be done by the compiler).  Rather than pushing three values for an exception, could we just build a tuple and push that?  Seems simpler to me vs having ROT_FOUR.  Finally, we can tackle the compiler-based unwind.
msg307680 - (view) Author: Mark Shannon (Mark.Shannon) * Date: 2017-12-05 20:16
0610860 doesn't include any tests. What is it fixing?
3794016 passes the same set of tests.

Do have an actual code example that produces erroneous behaviour?
msg307687 - (view) Author: Neil Schemenauer (nascheme) * (Python committer) Date: 2017-12-05 21:24
def func():
    try:
        try:
            raise RuntimeError
        except:
            return 1
    finally:
        return 3
func()
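(For reference, this snippet is legal Python and the return in the finally clause wins: it discards both the in-flight RuntimeError and the except clause's pending return 1.)

```python
def func():
    try:
        try:
            raise RuntimeError
        except:
            return 1
    finally:
        return 3    # replaces the pending 'return 1' and discards
                    # the RuntimeError

print(func())       # 3
```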


Sorry, I've been collecting a whole slew of code snippets that cause issues with the "unwind_stack" branch.  I haven't organized them and made unit tests out of them though.  Two packages where I found hairy try/except/finally code are "tox" and "gevent".
msg307827 - (view) Author: Mark Shannon (Mark.Shannon) * Date: 2017-12-07 23:04
Thanks, that example shows the essence of the problem. I understand you perfectly now.
History
Date  User  Action  Args
2017-12-07 23:04:31  Mark.Shannon  set  messages: + msg307827
2017-12-05 21:24:42  nascheme  set  messages: + msg307687
2017-12-05 20:16:25  Mark.Shannon  set  messages: + msg307680
2017-12-05 17:21:11  nascheme  set  messages: + msg307671
2017-12-05 11:11:58  Mark.Shannon  set  messages: + msg307656
2017-12-04 17:53:08  nascheme  set  messages: + msg307585
2017-12-04 17:41:52  Mark.Shannon  set  messages: + msg307584
2017-12-04 17:19:48  Mark.Shannon  set  messages: + msg307582
2017-12-04 16:56:25  serhiy.storchaka  set  messages: + msg307577
2017-12-04 15:49:17  pitrou  set  messages: + msg307572
2017-12-04 15:21:48  Mark.Shannon  set  messages: + msg307570
2017-12-04 13:16:14  serhiy.storchaka  set  messages: + msg307564
2017-12-04 12:33:36  pitrou  set  messages: + msg307558
2017-12-04 12:25:34  serhiy.storchaka  set  messages: + msg307555
2017-12-04 11:50:46  pitrou  set  messages: + msg307549
2017-12-04 11:46:34  Mark.Shannon  set  messages: + msg307548
2017-12-03 09:53:28  pitrou  set  messages: + msg307497
2017-12-03 00:50:37  ncoghlan  set  messages: + msg307479
2017-12-03 00:49:28  rhettinger  set  messages: + msg307478
2017-12-03 00:39:57  ncoghlan  set  nosy: + ncoghlan; messages: + msg307477
2017-12-02 22:16:49  nascheme  set  pull_requests: + pull_request4594
2017-11-06 22:41:04  nascheme  set  files: + perf_unwind_stack.txt; messages: + msg305684
2017-11-06 19:51:38  nascheme  set  nosy: nascheme, rhettinger, mark.dickinson, pitrou, christian.heimes, benjamin.peterson, trent, Mark.Shannon, serhiy.storchaka, Demur Rumed
2017-08-30 12:36:10  serhiy.storchaka  set  pull_requests: - pull_request3290
2017-08-30 12:32:26  serhiy.storchaka  set  pull_requests: + pull_request3290
2017-08-30 12:27:05  serhiy.storchaka  set  pull_requests: - pull_request3288
2017-08-30 12:18:41  serhiy.storchaka  set  pull_requests: + pull_request3288
2017-07-25 13:25:22  rhettinger  set  messages: - msg185785
2017-07-24 14:05:13  pitrou  set  messages: + msg298972
2017-07-24 11:47:57  pitrou  set  messages: + msg298960
2017-07-24 10:06:42  vstinner  set  nosy: - vstinner
2017-07-24 09:59:41  serhiy.storchaka  set  messages: + msg298949
2017-07-24 09:21:42  pitrou  set  messages: + msg298943
2017-07-24 08:25:51  pitrou  set  messages: + msg298935
2017-07-24 08:09:05  pitrou  set  messages: + msg298934
2017-07-23 19:49:42  serhiy.storchaka  set  stage: needs patch -> patch review
2017-07-23 19:22:39  pitrou  set  messages: + msg298910
2017-07-23 19:20:32  pitrou  set  pull_requests: + pull_request2879
2017-07-22 21:12:35  pitrou  set  versions: + Python 3.7, - Python 3.6
2016-06-13 16:46:19  nascheme  set  nosy: + nascheme; messages: + msg268456
2016-06-09 12:16:36  serhiy.storchaka  set  stage: patch review -> needs patch
2016-06-05 15:03:23  serhiy.storchaka  set  messages: + msg267418
2016-06-05 04:00:43  christian.heimes  set  nosy: + christian.heimes
2016-06-04 22:05:07  Demur Rumed  set  nosy: + Demur Rumed; messages: + msg267329
2016-06-04 21:16:35  serhiy.storchaka  set  versions: + Python 3.6, - Python 3.5
2016-01-20 11:37:31  vstinner  set  nosy: + vstinner; messages: + msg258673
2014-11-16 15:06:13  serhiy.storchaka  set  versions: + Python 3.5; messages: + msg231250; components: + Interpreter Core; type: enhancement; stage: patch review
2014-10-14 16:58:12  skrah  set  nosy: - skrah
2014-10-08 20:54:15  pitrou  set  nosy: + serhiy.storchaka
2014-10-08 20:53:51  pitrou  set  messages: + msg228817
2014-10-08 20:53:25  pitrou  set  messages: + msg228816
2014-04-29 18:21:23  pitrou  set  files: + unwinding.patch; messages: + msg217534
2013-04-03 21:46:09  trent  set  nosy: + trent
2013-04-03 21:28:58  skrah  set  messages: + msg185955
2013-04-03 20:58:02  mark.dickinson  set  nosy: + mark.dickinson
2013-04-03 20:57:22  pitrou  set  messages: + msg185954
2013-04-03 20:53:15  skrah  set  messages: + msg185952
2013-04-03 20:32:27  skrah  set  messages: + msg185951
2013-04-03 20:27:10  benjamin.peterson  set  nosy: + benjamin.peterson
2013-04-03 20:25:54  Mark.Shannon  set  messages: + msg185950
2013-04-03 19:39:33  Mark.Shannon  set  messages: + msg185946
2013-04-03 11:23:13  skrah  set  nosy: + skrah; messages: + msg185916
2013-04-03 10:20:13  pitrou  set  nosy: + pitrou; messages: + msg185909
2013-04-02 01:21:43  rhettinger  set  nosy: + rhettinger; messages: + msg185785
2013-04-01 16:32:02  Mark.Shannon  set  files: + b16527f84774.diff; keywords: + patch
2013-04-01 16:31:20  Mark.Shannon  create