classification
Title: speedup for / while / if with better bytecode
Type: performance
Components: Interpreter Core Versions: Python 2.6
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: arigo, gregory.p.smith, jcea, jyasskin, lauromoura, nnorwitz, phsilva, pitrou, tzot
Priority: Keywords: patch

Created on 2008-03-22 22:25 by pitrou, last changed 2008-06-24 13:21 by tzot.

Files
File name Uploaded Description Edit Remove
loops3.patch pitrou, 2008-03-23 15:51
loops4.patch pitrou, 2008-03-23 23:02
loops5.patch pitrou, 2008-03-26 02:09
loops6.patch pitrou, 2008-03-26 23:09
loops7.patch pitrou, 2008-03-26 23:43
predict_loop.diff arigo, 2008-03-27 16:21
loops8.patch pitrou, 2008-04-30 21:46
Messages
msg64343 (view) Author: Antoine Pitrou (pitrou) Date: 2008-03-22 22:28
This is a preliminary patch to speedup for and while loops (it will also
be able to speedup list comps and gen comps, if I get to do it).
The patch works by putting the loop condition test at the end of loop,
which allows removing one JUMP_ABSOLUTE byte code out of the critical path.

For this two new opcodes are introduced:
- FOR_ITER2 is the same as FOR_ITER except that it does an /absolute/
jump if the iterator is /not/ exhausted (when other uses of FOR_ITER are
replaced with FOR_ITER2, we can of course restore the original naming)
- JUMP_ABS_IF_TRUE /pops/ the top of the stack and does an absolute jump
if its value is true

Some micro-benchmarks:

./python -m timeit "for x in xrange(10000): pass"
Before: 1000 loops, best of 3: 782 usec per loop
After: 1000 loops, best of 3: 412 usec per loop

./python -m timeit "x=100" "while x: x -= 1"
Before: 10000 loops, best of 3: 22.1 usec per loop
After: 100000 loops, best of 3: 16.6 usec per loop

./python Tools/pybench/pybench.py -t ForLoops
Before: 365ms per round
After: 234ms per round

Also, pystone gets 5% faster (from 43300 to 45800).

Now for the less shiny things:

1. I'm having problems modifying the pure Python compiler module. For
some reasons it seems to have stricter requirements than compile.c
itself does (!); I get some assertion error in
compiler.pyassem.FlowGraph.fixupOrderHonorNext as soon as a loop gets
non-trivial.

2. Line numbers probably need to be fixed. The lnotab format may even
have to be adapted in order to accomodate non-monotonically increasing
line numbers.

Is there some interest in this patch? If yes, I'd like to have your
input on the two bullet points above :)
msg64359 (view) Author: Antoine Pitrou (pitrou) Date: 2008-03-23 14:18
This new patch includes surgery to the compiler package (especially
flowgraph trickery) in order to make it work with the new opcodes. I
think my changes are sane but since the package seems basically
untested, unmaintained and almost unused, there may be some glitches.
However, test_compiler passes.

(test_dis will need to be updated for the new opcodes, not a big deal)
msg64364 (view) Author: Antoine Pitrou (pitrou) Date: 2008-03-23 14:44
Removed latest patch, it was half-baked.
msg64367 (view) Author: Antoine Pitrou (pitrou) Date: 2008-03-23 15:51
This new patch should be ok. The block ordering algorithm in
compiler.pyassem looks entirely clean now, to the extent that the
previous "fixup" hacks have been disabled.

Attaching loops3.py.
msg64383 (view) Author: Antoine Pitrou (pitrou) Date: 2008-03-23 23:02
loops4.patch adds a mechanism to avoid blocking signal catching in empty
loops (such as "for x in it: pass" or "while x: pass"). Much of the
speedup is still retained.

./python -m timeit "for x in xrange(10000): pass"
Before: 1000 loops, best of 3: 737 usec per loop
After: 1000 loops, best of 3: 438 usec per loop

./python -m timeit "x=100" "while x: x -= 1"
Before: 10000 loops, best of 3: 21.7 usec per loop
After: 100000 loops, best of 3: 16.6 usec per loop

./python Tools/pybench/pybench.py -t ForLoops
Before: 364ms per round
After: 242ms per round
msg64506 (view) Author: Antoine Pitrou (pitrou) Date: 2008-03-25 19:19
By the way, the compiler package fix has been isolated and cleaned up as
part of #2472. Maybe it would be nice to start with that one.
msg64537 (view) Author: Antoine Pitrou (pitrou) Date: 2008-03-26 02:09
This new patch also updates the code generation for list comprehensions.
Here are some micro-benchmarks:

./python -m timeit -s "l = range(100)" "[x for x in l]"
Before: 100000 loops, best of 3: 14.9 usec per loop
After: 100000 loops, best of 3: 12.5 usec per loop

./python -m timeit -s "l = range(100)" "[x for x in l if x]"
Before: 10000 loops, best of 3: 24.1 usec per loop
After: 100000 loops, best of 3: 18.8 usec per loop

./python -m timeit -s "l = range(100)" "[x for x in l if not x]"
Before: 100000 loops, best of 3: 15.5 usec per loop
After: 100000 loops, best of 3: 11.9 usec per loop

Please note that this patch is orthogonal with Neal's patch in #2183, so
the two combined should be quite interesting for the list comprehensions
bytecode.
msg64572 (view) Author: Antoine Pitrou (pitrou) Date: 2008-03-26 23:09
This new patch completes the bytecode modifications. For/while loops as
well as list comprehensions and generator expressions are a bit faster
now. Also, as a side effect I've introduced a speed improvement for "if"
statements and expressions...

Some micro-benchmarks (completing the ones already given above):

./python Tools/pybench/pybench.py -t IfThenElse
Before: 167ms per round
After: 136ms per round

./python -m timeit -s "y=range(100)" "sum(x for x in y)"
Before: 10000 loops, best of 3: 20.4 usec per loop
After: 100000 loops, best of 3: 17.9 usec per loop

./python -m timeit -s "y=range(100)" "sum(x for x in y if x)"
Before: 10000 loops, best of 3: 28.5 usec per loop
After: 10000 loops, best of 3: 23.3 usec per loop

./python -m timeit -s "y=range(100)" "sum(x for x in y if not x)"
Before: 100000 loops, best of 3: 16.4 usec per loop
After: 100000 loops, best of 3: 12.1 usec per loop

./python -m timeit -s "x,y,z=1,2,3" "x if y else z"
Before: 1000000 loops, best of 3: 0.218 usec per loop
After: 10000000 loops, best of 3: 0.159 usec per loop

A couple of tests seem to be failing in obscure ways in the test suite,
I'll try to examine them. Most of the test suite runs fine though.
msg64573 (view) Author: Antoine Pitrou (pitrou) Date: 2008-03-26 23:43
Ok, the fix for the bizarre failures was really simple. Now the only
failing tests are in test_trace (because it makes assumptions regarding
the bytecode that aren't true anymore, I'll have to adapt the tests).
msg64577 (view) Author: Neal Norwitz (nnorwitz) Date: 2008-03-27 05:04
Antoine, I hope to look at this patch eventually.  Unfortunately, there
are too many build/test problems that need to be resolved before the
next release.  If you can help out with those, I will be able to review
this patch sooner.
msg64599 (view) Author: Armin Rigo (arigo) Date: 2008-03-27 16:21
Can you see if this simpler patch also gives speed-ups?
(predict_loop.diff)
msg64603 (view) Author: Antoine Pitrou (pitrou) Date: 2008-03-27 16:56
Armin, your patch gives a speed-up for "for" loops and comprehensions,
although a bit less. Also, it doesn't speed up "while" loops and "if"
statements at all. For some reasons it also appears to make pystone a
bit slower. Here are some micro-benchmarks:

./python -m timeit "for x in xrange(10000): pass"
Before: 1000 loops, best of 3: 758 usec per loop
After: 1000 loops, best of 3: 483 usec per loop

./python -m timeit "x=100" "while x: x -= 1"
Before: 10000 loops, best of 3: 21.8 usec per loop
After: 10000 loops, best of 3: 21.6 usec per loop

./python -m timeit -s "l = range(100)" "[x for x in l]"
Before: 100000 loops, best of 3: 14.9 usec per loop
After: 100000 loops, best of 3: 13.3 usec per loop

./python -m timeit -s "l = range(100)" "[x for x in l if x]"
Before: 10000 loops, best of 3: 23.9 usec per loop
After: 10000 loops, best of 3: 22.3 usec per loop

./python -m timeit -s "l = range(100)" "[x for x in l if not x]"
Before: 100000 loops, best of 3: 15.8 usec per loop
After: 100000 loops, best of 3: 13.9 usec per loop

./python Tools/pybench/pybench.py -t IfThenElse
Before: 164ms per round
After: 166ms per round
msg66027 (view) Author: Antoine Pitrou (pitrou) Date: 2008-04-30 21:46
Finally I had to slightly change the lnotab format to have the right
tracing semantics: the change is that line number increments are now
signed bytes rather than unsigned.

Still, there is a small change in tracing behaviour (see test_trace.py):
the for statement is traced twice in the first loop iteration, that's
because the iterator object is retrieved (GET_ITER) at the beginning of
the loop while the next iterator value is fetched (FOR_ITER) at the end
of the loop. I don't believe this is very painful.

All in all, the whole test suite now passes fine. The performance
numbers are the same as before.
msg68687 (view) Author: Χρήστος Γεωργίου (Christos Georgiou) (tzot) Date: 2008-06-24 13:21
A pointer to previous (minor) research:

http://groups.google.com/group/comp.lang.python/browse_frm/thread/72505e3cb6d9cb1a/e486759f06ec4ee5

esp. after Terry Reedy's post
History
Date User Action Args
2008-06-24 13:21:16tzotsetnosy: + tzot
messages: + msg68687
2008-04-30 21:46:31pitrousetfiles: + loops8.patch
messages: + msg66027
title: speedup loops with better bytecode -> speedup for / while / if with better bytecode
2008-04-21 16:13:34jceasetnosy: + jcea
2008-03-28 04:22:33jyasskinsetnosy: + jyasskin
2008-03-27 20:32:36lauromourasetnosy: + lauromoura
2008-03-27 16:56:14pitrousetmessages: + msg64603
2008-03-27 16:21:48arigosetfiles: + predict_loop.diff
nosy: + arigo
messages: + msg64599
2008-03-27 07:03:19phsilvasetnosy: + phsilva
2008-03-27 05:04:32nnorwitzsetnosy: + nnorwitz
messages: + msg64577
2008-03-26 23:43:06pitrousetfiles: + loops7.patch
messages: + msg64573
2008-03-26 23:09:31pitrousetfiles: + loops6.patch
messages: + msg64572
2008-03-26 02:09:51pitrousetfiles: + loops5.patch
messages: + msg64537
2008-03-25 19:19:25pitrousetmessages: + msg64506
2008-03-25 18:21:20fdrakesetmessages: - msg64344
2008-03-25 18:21:12fdrakesetmessages: - msg64342
2008-03-25 17:59:21gregory.p.smithsetnosy: + gregory.p.smith
2008-03-23 23:02:41pitrousetfiles: + loops4.patch
messages: + msg64383
2008-03-23 22:07:37pitrousetfiles: - loops2.patch
2008-03-23 15:51:03pitrousetfiles: + loops3.patch
messages: + msg64367
2008-03-23 14:44:38pitrousetmessages: + msg64364
2008-03-23 14:44:14pitrousetfiles: - loops3.patch
2008-03-23 14:18:12pitrousetfiles: + loops3.patch
messages: + msg64359
2008-03-22 23:36:41pitrousetfiles: - loops.patch
2008-03-22 23:36:36pitrousetfiles: + loops2.patch
2008-03-22 22:45:04pitrousetmessages: + msg64344
2008-03-22 22:28:28pitrousetmessages: + msg64343
2008-03-22 22:25:04pitroucreate