msg57141 - (view) |
Author: Paul Pogonyshev (_doublep) |
Date: 2007-11-05 22:09 |
This patch improves bytecode output, by removing unreachable code. It
doesn't target special cases, as now, but provides a generic implementation.
|
msg62953 - (view) |
Author: Alexander Belopolsky (belopolsky) *  |
Date: 2008-02-25 00:04 |
I am not sure what this patch would accomplish. I tried
$ cat t.py
def f():
return 1
1+2
from dis import dis
print dis(f)
Both with and without patch I get
$ ./python.exe -O t.py
2 0 LOAD_CONST 1 (1)
3 RETURN_VALUE
3 4 LOAD_CONST 1 (1)
7 LOAD_CONST 2 (2)
10 BINARY_ADD
11 POP_TOP
None
I am sure I am missing something, but it is hard to tell what without
any use cases provided.
|
msg62964 - (view) |
Author: Neal Norwitz (nnorwitz) *  |
Date: 2008-02-25 02:30 |
It would be great to see test cases with this change. That would help
answer Alexander's question too.
|
msg62992 - (view) |
Author: Alexander Belopolsky (belopolsky) *  |
Date: 2008-02-25 19:09 |
I figured that out:
>>> def g():
... raise RuntimeError
...
Before the patch:
>>> dis(g)
2 0 LOAD_GLOBAL 0 (RuntimeError)
3 RAISE_VARARGS 1
6 LOAD_CONST 0 (None)
9 RETURN_VALUE
After the patch:
>>> dis(g)
2 0 LOAD_GLOBAL 0 (RuntimeError)
3 RAISE_VARARGS 1
Looks reasonable to me. Paul, do you need help with unit tests?
|
msg62999 - (view) |
Author: Paul Pogonyshev (_doublep) |
Date: 2008-02-25 20:25 |
Yes, help with unit tests would be appreciated. Especially since it is
not supposed to fix anything, so I'm not sure what unit tests should be
like...
BTW, trying to understand why the patch didn't remove unreachable code
in your first example, I noticed this (both cases are with patch):
def f1():
return 1
x()
disassembly:
3 0 LOAD_CONST 1 (1)
3 RETURN_VALUE
4 4 LOAD_GLOBAL 0 (x)
7 CALL_FUNCTION 0
10 POP_TOP
def f2():
return 1
x()
return 2
disassembly:
3 0 LOAD_CONST 1 (1)
3 RETURN_VALUE
Looking a bit further, I noticed this:
if (codestr[codelen-1] != RETURN_VALUE)
goto exitUnchanged;
So, the patch really can't help with your first example, because peephol
optimizer just bails out before real action begins.
|
msg63000 - (view) |
Author: Paul Pogonyshev (_doublep) |
Date: 2008-02-25 20:31 |
Speaking of which, I propose that codestr[] array is made one byte
longer and RETURN_VALUE opcode wrote in that extra byte. It will be
removed by this patch anyway (if it is accepted), but we'll remove a way
to accidentally disable peephole optimizer.
I mean, it would cost almost nothing, yet will prevent making some
functions non-optimized. E.g. like this:
def foo(x):
if x >= 0:
return x * 2
raise ValueError
|
msg63004 - (view) |
Author: Benjamin Peterson (benjamin.peterson) *  |
Date: 2008-02-25 21:32 |
>Yes, help with unit tests would be appreciated. Especially since it is
>not supposed to fix anything, so I'm not sure what unit tests should be
>like...
Unit tests are just for bugfixes. They let us make sure Python is doing
what we want it to do in a given case. Your unit test will probably have
functions where you optimization should take effect and assert that it
does. For starters, take a look at Lib/test/test_peephole.py
|
msg63005 - (view) |
Author: Guido van Rossum (gvanrossum) *  |
Date: 2008-02-25 21:50 |
On Mon, Feb 25, 2008 at 1:32 PM, Benjamin Peterson
<report@bugs.python.org> wrote:
> Unit tests are just for bugfixes.
I presume you meant "are *not* just for bugfixes".
|
msg63006 - (view) |
Author: Benjamin Peterson (benjamin.peterson) *  |
Date: 2008-02-25 22:05 |
> I presume you meant "are *not* just for bugfixes".
(Sigh) Yes, of course.
|
msg63017 - (view) |
Author: Alexander Belopolsky (belopolsky) *  |
Date: 2008-02-25 23:37 |
Attached patch adds test_elim_unreachable() unit test. Last two
assertions should fail with unpatched python.
I am still trying to convince myself that the transformation are
correct.
> I propose that codestr[] array is made one byte
> longer and RETURN_VALUE opcode wrote in that extra byte.
I don't think that will be correct. Did you consider the following
comment?
/* Verify that RETURN_VALUE terminates the codestring. This
allows
the various transformation patterns to look ahead several
instructions without additional checks to make sure they are
not
looking beyond the end of the code string.
*/
if (codestr[codelen-1] != RETURN_VALUE)
goto exitUnchanged;
|
msg63019 - (view) |
Author: Paul Pogonyshev (_doublep) |
Date: 2008-02-26 00:00 |
Thanks for writing the test.
Yes, I did read the comment. As I understood it, RETURN_VALUE is needed
only so that various optimization can assume codestr[] cannot suddenly
end without one. E.g. if you match for BINARY_ADD, you can safely check
the next command: if BINARY_ADD matched, there is a _guaranteed_ next
command, not an out-of-array failure.
Such proposed fake RETURN_VALUE _must_ be unreachable, so it must not be
problematic at all. If it was reachable, real codestr[] would end in
reachable non-return command, which must not happen during compilation.
I dunno, maybe interpreter guards against execution point falling of
the bytecode string, but in any case this must not happen in
non-corrupted files generated by the bytecode compiler.
|
msg63020 - (view) |
Author: Alexander Belopolsky (belopolsky) *  |
Date: 2008-02-26 01:44 |
Paul,
You are right. I misunderstood that comment myself. I've tried the
attached modification to your patch (unreachable-code-1.diff) and it
passes all tests while fixing msg62953 problem:
$ cat t.py
def f():
return 1
1+2
from dis import dis
dis(f)
$ ./python.exe t.py
6 0 LOAD_CONST 0 (None)
3 RETURN_VALUE
|
msg63029 - (view) |
Author: Neal Norwitz (nnorwitz) *  |
Date: 2008-02-26 05:58 |
Can you add more tests? It seems that almost all the examples given in
this thread are good examples. Also something like:
while 1:
if cond: return 1
return 2
return 3
There are a bunch more cases that could be added for where the code
should be optimized or might be optimized wrong that.
The #if 0 code should be removed in the patch. Also, since
CONTINUE_LOOP is removed in this patch, it would be good to explicitly
add a test for that.
This patch looks good to me, but I'll let Raymond decide. Did all the
entire test suite run? Note, you must remove all the .pyc files to test
this patch or change the MAGIC number in Python/import.c.
|
msg63040 - (view) |
Author: Alexander Belopolsky (belopolsky) *  |
Date: 2008-02-26 13:10 |
I have added more tests as Neal suggested. See unreachable-code-with-
tests.diff for a combined patch. I've rerun all tests after removing
.pyc files and also with -O option (ensuring no existing .pyo files as
well). All works.
I've removed #if 0 block and changed "Verify" to "Ensure" in the
RETURN_VALUE comment. Paul, maybe you and add an explanation why the
added RETURN_VALUE will always be eliminated. I am not sure when python
produces code with no terminating RETURN_VALUE in the first place.
|
msg63043 - (view) |
Author: Paul Pogonyshev (_doublep) |
Date: 2008-02-26 14:29 |
Well, I cannot guarantee it will _always_ be eliminated, since I too
don't really know when generated bytecode can (currently) end in
non-return. However, if before it ended in non-return, that non-return
must have been unreachable, obviously (else code flow would fall off of
the valid bytecode string). So, there is pretty good chance that this
patch will remove both the unreachable code and the fake return byte.
In the worst case, we increase bytecode length by 1 byte, but we always
enable peephole optimizer, which can be expected to give a much more
significant gain.
As your unit tests suggest, the patch doesn't always remove unreachable
code, because some opcodes after which this can be done are already
handled differently. Also, there might in theory be some false block
boundaries (e.g. a jump from unreachable code to another piece of such
code).
BTW, I think that just removing #if 0'd block is not correct. I propose
to convert it to a comment then, just to explain that there are more
opcodes after which we can expect unreachable code and don't check only
because it'd give duplicate case labels. Maybe some goto magic can
handle them, don't have sources here.
|
msg63045 - (view) |
Author: Paul Pogonyshev (_doublep) |
Date: 2008-02-26 14:38 |
Actually, it is even better. Since you don't increment codelen, that
extra RETURN_VALUE is "virtual" in that it acts as a sentinel, but will
not appear in the resulting bytecode. You can test this by backing out
the patch and disassembling something. For this reason, if() before
writing the RETURN_VALUE is not needed.
|
msg63046 - (view) |
Author: Alexander Belopolsky (belopolsky) *  |
Date: 2008-02-26 15:25 |
> Since you don't increment codelen ..
Good catch!
> For this reason, if() before
> writing the RETURN_VALUE is not needed.
In this case it will be clearer to use STOP_CODE instead of RETURN_VALUE
as a sentinel.
|
msg68709 - (view) |
Author: Raymond Hettinger (rhettinger) *  |
Date: 2008-06-24 23:12 |
Recommend rejecting. Too much work for zero performance payoff.
|
msg68710 - (view) |
Author: Guido van Rossum (gvanrossum) *  |
Date: 2008-06-24 23:20 |
Rejecting. Too much risk as well, based on experience in the past with
supposedly harmless optimizations.
|
|
Date |
User |
Action |
Args |
2022-04-11 14:56:27 | admin | set | github: 45735 |
2008-06-24 23:20:57 | gvanrossum | set | status: open -> closed resolution: rejected messages:
+ msg68710 |
2008-06-24 23:12:06 | rhettinger | set | assignee: rhettinger -> messages:
+ msg68709 |
2008-02-26 15:25:19 | belopolsky | set | messages:
+ msg63046 |
2008-02-26 14:38:20 | _doublep | set | messages:
+ msg63045 |
2008-02-26 14:29:52 | _doublep | set | messages:
+ msg63043 |
2008-02-26 13:10:04 | belopolsky | set | files:
+ unreachable-code-with-tests.diff messages:
+ msg63040 |
2008-02-26 05:58:47 | nnorwitz | set | messages:
+ msg63029 |
2008-02-26 01:44:37 | belopolsky | set | files:
+ unreachable-code-1.diff messages:
+ msg63020 |
2008-02-26 00:00:07 | _doublep | set | messages:
+ msg63019 |
2008-02-25 23:37:28 | belopolsky | set | files:
+ test_peepholer.diff messages:
+ msg63017 |
2008-02-25 22:56:42 | rhettinger | set | assignee: rhettinger nosy:
+ rhettinger |
2008-02-25 22:05:48 | benjamin.peterson | set | messages:
+ msg63006 |
2008-02-25 21:50:15 | gvanrossum | set | messages:
+ msg63005 |
2008-02-25 21:32:07 | benjamin.peterson | set | nosy:
+ benjamin.peterson messages:
+ msg63004 |
2008-02-25 20:31:16 | _doublep | set | messages:
+ msg63000 |
2008-02-25 20:25:49 | _doublep | set | messages:
+ msg62999 |
2008-02-25 19:09:33 | belopolsky | set | messages:
+ msg62992 |
2008-02-25 02:30:30 | nnorwitz | set | nosy:
+ nnorwitz messages:
+ msg62964 |
2008-02-25 00:04:25 | belopolsky | set | nosy:
+ belopolsky messages:
+ msg62953 |
2007-11-06 17:40:44 | gvanrossum | set | nosy:
+ gvanrossum |
2007-11-06 04:50:44 | loewis | set | keywords:
+ patch |
2007-11-05 22:09:01 | _doublep | create | |