This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author arnau
Recipients arnau, nascheme, pitrou
Date 2014-02-13.03:17:05
SpamBayes Score -1.0
Marked as misclassified Yes
Message-id <1392261429.58.0.605484301904.issue20613@psf.upfronthosting.co.za>
In-reply-to
Content
I know that compiler module is deprecated since 2.6, but I found a regression introduced in version 2.7 by the following cleanup commit (replacing pyassem.py modified in this commit by the one just before does not trigger this bug):

http://hg.python.org/releasing/2.7.6/rev/42faa8054c3d

Actually, there is nothing wrong when executing the generated bytecode, it's just that tracebacks and anything that display source code from the generated bytecode object gets sometimes completely wrong when there is if/elif/else because of compiler.pyassem.order_blocks().

This issue was found when noticing that tracebacks in Zope Python Script (using RestrictedPython which in turn uses compiler module) were displaying completely wrong line information and stepping with pdb into the source code was displaying wrong source code and line information too.

After investigating, I found out that the problem is actually in compiler module and wrote a small program triggering the bug (I know that some variables are undefined but it does not matter here):

if if1:
  if1_entered = 1

if if2:
  if subif1:
    subif1_entered = 1
  elif subif2:
    subif2_entered = 1

  subif_ended = 1

Using compiler module to generate the bytecode with the program attached returns two different bytecodes:

$ python2.7 compiler_wrong_ordering_block_test.py 
====================================
  1           0 LOAD_NAME                0 (if1)
              3 POP_JUMP_IF_FALSE       30

  2           6 LOAD_CONST               1 (1)
              9 STORE_NAME               1 (if1_entered)
             12 JUMP_FORWARD            15 (to 30)

  7     >>   15 LOAD_NAME                2 (subif2)
             18 POP_JUMP_IF_FALSE       51

  8          21 LOAD_CONST               1 (1)
             24 STORE_NAME               3 (subif2_entered)
             27 JUMP_FORWARD            21 (to 51)
        >>   30 LOAD_NAME                4 (if2)
             33 POP_JUMP_IF_FALSE       60
             36 LOAD_NAME                5 (subif1)
             39 POP_JUMP_IF_FALSE       15
             42 LOAD_CONST               1 (1)
             45 STORE_NAME               6 (subif1_entered)
             48 JUMP_FORWARD             0 (to 51)

 10     >>   51 LOAD_CONST               1 (1)
             54 STORE_NAME               7 (subif_ended)
             57 JUMP_FORWARD             0 (to 60)
        >>   60 LOAD_CONST               0 (None)
             63 RETURN_VALUE        
====================================
  1           0 LOAD_NAME                0 (if1)
              3 POP_JUMP_IF_FALSE       15

  2           6 LOAD_CONST               1 (1)
              9 STORE_NAME               1 (if1_entered)
             12 JUMP_FORWARD             0 (to 15)

  4     >>   15 LOAD_NAME                2 (if2)
             18 POP_JUMP_IF_FALSE       60

  5          21 LOAD_NAME                3 (subif1)
             24 POP_JUMP_IF_FALSE       36

  6          27 LOAD_CONST               1 (1)
             30 STORE_NAME               4 (subif1_entered)
             33 JUMP_FORWARD            15 (to 51)

  7     >>   36 LOAD_NAME                5 (subif2)
             39 POP_JUMP_IF_FALSE       51

  8          42 LOAD_CONST               1 (1)
             45 STORE_NAME               6 (subif2_entered)
             48 JUMP_FORWARD             0 (to 51)

 10     >>   51 LOAD_CONST               1 (1)
             54 STORE_NAME               7 (subif_ended)
             57 JUMP_FORWARD             0 (to 60)
        >>   60 LOAD_CONST               0 (None)
             63 RETURN_VALUE        

As you can see from the output above, the first generated bytecode gets all the line numbers (lnotab) messed up because 'subif2' block (and its children too) is emitted *before* its 'dominator' block, namely 'if2'.

Now, compiler.pyassem.order_blocks() arranges blocks to be emitted within a 'dominators' dict where the key is a block and the value a list of blocks which must be emitted *before* that one. However, for at least 'if/elif' blocks (created by compiler.pycodegen.CodeGenerator().visitIf()), the 'elif' block (subif2) does not have prev/next (where prev and next define respectively the parent and children blocks) sets to its parent (if2) as it is only set for the first 'if' (subif1) and furthermore for some reasons I don't know the code asserts that next list length must be equal to 1 anyway (anyone knows why BTW?)...

Thus, when creating 'dominators' dict using next/prev block attributes, the 'elif' block (subif2) ends up being considered kind of an orphan block because next/prev is not set and thus could be emitted anytime... Enabling debug mode in compiler.pyassem and adding a print call in order_blocks() to display 'dominators' dict confirms this (eg 'subif2' block (<block id=9>) dominators value list is empty):

=> Blocks:
        ('SET_LINENO', 0)
        ('SET_LINENO', 1)
        ('LOAD_NAME', 'if1')
        ('POP_JUMP_IF_FALSE', <block id=3>)
end <block id=0>
    next [<block id=4>]
    prev []
    [<block id=3>, <block id=4>]
<block id=4>
        ('SET_LINENO', 2)
        ('LOAD_CONST', 1)
        ('STORE_NAME', 'if1_entered')
        ('JUMP_FORWARD', <block id=2>)
end <block id=4>
    next []
    prev [<block id=0>]
    [<block id=2>]
<block id=3>
end <block id=3>
    next [<block id=2>]
    prev []
    [<block id=2>]
<block id=2>
        ('SET_LINENO', 4)
        ('LOAD_NAME', 'if2')
        ('POP_JUMP_IF_FALSE', <block id=6>)
end <block id=2>
    next [<block id=7>]
    prev [<block id=3>]
    [<block id=6>, <block id=7>]
end <block id=7>
    next [<block id=10>]
    prev [<block id=2>]
    [<block id=9>, <block id=10>]
<block id=10>
        ('SET_LINENO', 6)
        ('LOAD_CONST', 1)
        ('STORE_NAME', 'subif1_entered')
        ('JUMP_FORWARD', <block id=8>)
end <block id=10>
    next []
    prev [<block id=7>]
    [<block id=8>]
<block id=9>
        ('SET_LINENO', 7)
        ('LOAD_NAME', 'subif2')
        ('POP_JUMP_IF_FALSE', <block id=11>)
end <block id=9>
    next [<block id=12>]
    prev []
    [<block id=11>, <block id=12>]
<block id=12>
        ('SET_LINENO', 8)
        ('LOAD_CONST', 1)
        ('STORE_NAME', 'subif2_entered')
        ('JUMP_FORWARD', <block id=8>)
end <block id=12>
    next []
    prev [<block id=9>]
    [<block id=8>]
<block id=11>
end <block id=11>
    next [<block id=8>]
    prev []
    [<block id=8>]
<block id=8>
        ('SET_LINENO', 10)
        ('LOAD_CONST', 1)
        ('STORE_NAME', 'subif_ended')
        ('JUMP_FORWARD', <block id=5>)
end <block id=8>
    next []
    prev [<block id=11>]
    [<block id=5>]
<block id=6>
end <block id=6>
    next [<block id=5>]
    prev []
    [<block id=5>]
<block id=5>
        ('LOAD_CONST', None)
        ('RETURN_VALUE',)

=> Dominators:
{<block id=0>: set([]),
 <block id=2>: set([<block id=3>, <block id=4>]),
 <block id=3>: set([<block id=4>]),
 <block id=4>: set([<block id=0>]),
 <block id=5>: set([<block id=6>, <block id=8>]),
 <block id=6>: set([<block id=8>]),
 <block id=7>: set([<block id=2>]),
 <block id=8>: set([<block id=10>, <block id=11>, <block id=12>]),
 <block id=9>: set([]),
 <block id=10>: set([<block id=7>]),
 <block id=11>: set([<block id=10>, <block id=12>]),
 <block id=12>: set([<block id=9>])}

I have tried to write a patch for this issue but to be honest I could not come up with something nice, any idea about implementing a proepr fix is welcome?
History
Date User Action Args
2014-02-13 03:17:09arnausetrecipients: + arnau, nascheme, pitrou
2014-02-13 03:17:09arnausetmessageid: <1392261429.58.0.605484301904.issue20613@psf.upfronthosting.co.za>
2014-02-13 03:17:09arnaulinkissue20613 messages
2014-02-13 03:17:06arnaucreate