Issue2459
Created on 2008-03-22 22:25 by pitrou, last changed 2008-06-24 13:21 by tzot.
| msg64343 (view) |
Author: Antoine Pitrou (pitrou) |
Date: 2008-03-22 22:28 |
|
This is a preliminary patch to speedup for and while loops (it will also
be able to speedup list comps and gen comps, if I get to do it).
The patch works by putting the loop condition test at the end of loop,
which allows removing one JUMP_ABSOLUTE byte code out of the critical path.
For this two new opcodes are introduced:
- FOR_ITER2 is the same as FOR_ITER except that it does an /absolute/
jump if the iterator is /not/ exhausted (when other uses of FOR_ITER are
replaced with FOR_ITER2, we can of course restore the original naming)
- JUMP_ABS_IF_TRUE /pops/ the top of the stack and does an absolute jump
if its value is true
Some micro-benchmarks:
./python -m timeit "for x in xrange(10000): pass"
Before: 1000 loops, best of 3: 782 usec per loop
After: 1000 loops, best of 3: 412 usec per loop
./python -m timeit "x=100" "while x: x -= 1"
Before: 10000 loops, best of 3: 22.1 usec per loop
After: 100000 loops, best of 3: 16.6 usec per loop
./python Tools/pybench/pybench.py -t ForLoops
Before: 365ms per round
After: 234ms per round
Also, pystone gets 5% faster (from 43300 to 45800).
Now for the less shiny things:
1. I'm having problems modifying the pure Python compiler module. For
some reasons it seems to have stricter requirements than compile.c
itself does (!); I get some assertion error in
compiler.pyassem.FlowGraph.fixupOrderHonorNext as soon as a loop gets
non-trivial.
2. Line numbers probably need to be fixed. The lnotab format may even
have to be adapted in order to accomodate non-monotonically increasing
line numbers.
Is there some interest in this patch? If yes, I'd like to have your
input on the two bullet points above :)
|
| msg64359 (view) |
Author: Antoine Pitrou (pitrou) |
Date: 2008-03-23 14:18 |
|
This new patch includes surgery to the compiler package (especially
flowgraph trickery) in order to make it work with the new opcodes. I
think my changes are sane but since the package seems basically
untested, unmaintained and almost unused, there may be some glitches.
However, test_compiler passes.
(test_dis will need to be updated for the new opcodes, not a big deal)
|
| msg64364 (view) |
Author: Antoine Pitrou (pitrou) |
Date: 2008-03-23 14:44 |
|
Removed latest patch, it was half-baked.
|
| msg64367 (view) |
Author: Antoine Pitrou (pitrou) |
Date: 2008-03-23 15:51 |
|
This new patch should be ok. The block ordering algorithm in
compiler.pyassem looks entirely clean now, to the extent that the
previous "fixup" hacks have been disabled.
Attaching loops3.py.
|
| msg64383 (view) |
Author: Antoine Pitrou (pitrou) |
Date: 2008-03-23 23:02 |
|
loops4.patch adds a mechanism to avoid blocking signal catching in empty
loops (such as "for x in it: pass" or "while x: pass"). Much of the
speedup is still retained.
./python -m timeit "for x in xrange(10000): pass"
Before: 1000 loops, best of 3: 737 usec per loop
After: 1000 loops, best of 3: 438 usec per loop
./python -m timeit "x=100" "while x: x -= 1"
Before: 10000 loops, best of 3: 21.7 usec per loop
After: 100000 loops, best of 3: 16.6 usec per loop
./python Tools/pybench/pybench.py -t ForLoops
Before: 364ms per round
After: 242ms per round
|
| msg64506 (view) |
Author: Antoine Pitrou (pitrou) |
Date: 2008-03-25 19:19 |
|
By the way, the compiler package fix has been isolated and cleaned up as
part of #2472. Maybe it would be nice to start with that one.
|
| msg64537 (view) |
Author: Antoine Pitrou (pitrou) |
Date: 2008-03-26 02:09 |
|
This new patch also updates the code generation for list comprehensions.
Here are some micro-benchmarks:
./python -m timeit -s "l = range(100)" "[x for x in l]"
Before: 100000 loops, best of 3: 14.9 usec per loop
After: 100000 loops, best of 3: 12.5 usec per loop
./python -m timeit -s "l = range(100)" "[x for x in l if x]"
Before: 10000 loops, best of 3: 24.1 usec per loop
After: 100000 loops, best of 3: 18.8 usec per loop
./python -m timeit -s "l = range(100)" "[x for x in l if not x]"
Before: 100000 loops, best of 3: 15.5 usec per loop
After: 100000 loops, best of 3: 11.9 usec per loop
Please note that this patch is orthogonal with Neal's patch in #2183, so
the two combined should be quite interesting for the list comprehensions
bytecode.
|
| msg64572 (view) |
Author: Antoine Pitrou (pitrou) |
Date: 2008-03-26 23:09 |
|
This new patch completes the bytecode modifications. For/while loops as
well as list comprehensions and generator expressions are a bit faster
now. Also, as a side effect I've introduced a speed improvement for "if"
statements and expressions...
Some micro-benchmarks (completing the ones already given above):
./python Tools/pybench/pybench.py -t IfThenElse
Before: 167ms per round
After: 136ms per round
./python -m timeit -s "y=range(100)" "sum(x for x in y)"
Before: 10000 loops, best of 3: 20.4 usec per loop
After: 100000 loops, best of 3: 17.9 usec per loop
./python -m timeit -s "y=range(100)" "sum(x for x in y if x)"
Before: 10000 loops, best of 3: 28.5 usec per loop
After: 10000 loops, best of 3: 23.3 usec per loop
./python -m timeit -s "y=range(100)" "sum(x for x in y if not x)"
Before: 100000 loops, best of 3: 16.4 usec per loop
After: 100000 loops, best of 3: 12.1 usec per loop
./python -m timeit -s "x,y,z=1,2,3" "x if y else z"
Before: 1000000 loops, best of 3: 0.218 usec per loop
After: 10000000 loops, best of 3: 0.159 usec per loop
A couple of tests seem to be failing in obscure ways in the test suite,
I'll try to examine them. Most of the test suite runs fine though.
|
| msg64573 (view) |
Author: Antoine Pitrou (pitrou) |
Date: 2008-03-26 23:43 |
|
Ok, the fix for the bizarre failures was really simple. Now the only
failing tests are in test_trace (because it makes assumptions regarding
the bytecode that aren't true anymore, I'll have to adapt the tests).
|
| msg64577 (view) |
Author: Neal Norwitz (nnorwitz) |
Date: 2008-03-27 05:04 |
|
Antoine, I hope to look at this patch eventually. Unfortunately, there
are too many build/test problems that need to be resolved before the
next release. If you can help out with those, I will be able to review
this patch sooner.
|
| msg64599 (view) |
Author: Armin Rigo (arigo) |
Date: 2008-03-27 16:21 |
|
Can you see if this simpler patch also gives speed-ups?
(predict_loop.diff)
|
| msg64603 (view) |
Author: Antoine Pitrou (pitrou) |
Date: 2008-03-27 16:56 |
|
Armin, your patch gives a speed-up for "for" loops and comprehensions,
although a bit less. Also, it doesn't speed up "while" loops and "if"
statements at all. For some reasons it also appears to make pystone a
bit slower. Here are some micro-benchmarks:
./python -m timeit "for x in xrange(10000): pass"
Before: 1000 loops, best of 3: 758 usec per loop
After: 1000 loops, best of 3: 483 usec per loop
./python -m timeit "x=100" "while x: x -= 1"
Before: 10000 loops, best of 3: 21.8 usec per loop
After: 10000 loops, best of 3: 21.6 usec per loop
./python -m timeit -s "l = range(100)" "[x for x in l]"
Before: 100000 loops, best of 3: 14.9 usec per loop
After: 100000 loops, best of 3: 13.3 usec per loop
./python -m timeit -s "l = range(100)" "[x for x in l if x]"
Before: 10000 loops, best of 3: 23.9 usec per loop
After: 10000 loops, best of 3: 22.3 usec per loop
./python -m timeit -s "l = range(100)" "[x for x in l if not x]"
Before: 100000 loops, best of 3: 15.8 usec per loop
After: 100000 loops, best of 3: 13.9 usec per loop
./python Tools/pybench/pybench.py -t IfThenElse
Before: 164ms per round
After: 166ms per round
|
| msg66027 (view) |
Author: Antoine Pitrou (pitrou) |
Date: 2008-04-30 21:46 |
|
Finally I had to slightly change the lnotab format to have the right
tracing semantics: the change is that line number increments are now
signed bytes rather than unsigned.
Still, there is a small change in tracing behaviour (see test_trace.py):
the for statement is traced twice in the first loop iteration, that's
because the iterator object is retrieved (GET_ITER) at the beginning of
the loop while the next iterator value is fetched (FOR_ITER) at the end
of the loop. I don't believe this is very painful.
All in all, the whole test suite now passes fine. The performance
numbers are the same as before.
|
| msg68687 (view) |
Author: Χρήστος Γεωργίου (Christos Georgiou) (tzot) |
Date: 2008-06-24 13:21 |
|
A pointer to previous (minor) research:
http://groups.google.com/group/comp.lang.python/browse_frm/thread/72505e3cb6d9cb1a/e486759f06ec4ee5
esp. after Terry Reedy's post
|
|
| Date |
User |
Action |
Args |
| 2008-06-24 13:21:16 | tzot | set | nosy:
+ tzot messages:
+ msg68687 |
| 2008-04-30 21:46:31 | pitrou | set | files:
+ loops8.patch messages:
+ msg66027 title: speedup loops with better bytecode -> speedup for / while / if with better bytecode |
| 2008-04-21 16:13:34 | jcea | set | nosy:
+ jcea |
| 2008-03-28 04:22:33 | jyasskin | set | nosy:
+ jyasskin |
| 2008-03-27 20:32:36 | lauromoura | set | nosy:
+ lauromoura |
| 2008-03-27 16:56:14 | pitrou | set | messages:
+ msg64603 |
| 2008-03-27 16:21:48 | arigo | set | files:
+ predict_loop.diff nosy:
+ arigo messages:
+ msg64599 |
| 2008-03-27 07:03:19 | phsilva | set | nosy:
+ phsilva |
| 2008-03-27 05:04:32 | nnorwitz | set | nosy:
+ nnorwitz messages:
+ msg64577 |
| 2008-03-26 23:43:06 | pitrou | set | files:
+ loops7.patch messages:
+ msg64573 |
| 2008-03-26 23:09:31 | pitrou | set | files:
+ loops6.patch messages:
+ msg64572 |
| 2008-03-26 02:09:51 | pitrou | set | files:
+ loops5.patch messages:
+ msg64537 |
| 2008-03-25 19:19:25 | pitrou | set | messages:
+ msg64506 |
| 2008-03-25 18:21:20 | fdrake | set | messages:
- msg64344 |
| 2008-03-25 18:21:12 | fdrake | set | messages:
- msg64342 |
| 2008-03-25 17:59:21 | gregory.p.smith | set | nosy:
+ gregory.p.smith |
| 2008-03-23 23:02:41 | pitrou | set | files:
+ loops4.patch messages:
+ msg64383 |
| 2008-03-23 22:07:37 | pitrou | set | files:
- loops2.patch |
| 2008-03-23 15:51:03 | pitrou | set | files:
+ loops3.patch messages:
+ msg64367 |
| 2008-03-23 14:44:38 | pitrou | set | messages:
+ msg64364 |
| 2008-03-23 14:44:14 | pitrou | set | files:
- loops3.patch |
| 2008-03-23 14:18:12 | pitrou | set | files:
+ loops3.patch messages:
+ msg64359 |
| 2008-03-22 23:36:41 | pitrou | set | files:
- loops.patch |
| 2008-03-22 23:36:36 | pitrou | set | files:
+ loops2.patch |
| 2008-03-22 22:45:04 | pitrou | set | messages:
+ msg64344 |
| 2008-03-22 22:28:28 | pitrou | set | messages:
+ msg64343 |
| 2008-03-22 22:25:04 | pitrou | create | |
|