Issue 4715: optimize bytecode for conditional branches

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/48965

classification

Title:	optimize bytecode for conditional branches
Type:	performance	Stage:	patch review
Components:	Interpreter Core	Versions:	Python 3.1, Python 2.7

process

Status:	closed	Resolution:	fixed
Dependencies:		Superseder:
Assigned To:		Nosy List:	blaisorblade, collinwinter, esam, jcea, jyasskin, pitrou, rhettinger
Priority:	normal	Keywords:	patch

Created on 2008-12-21 22:58 by pitrou, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name	Uploaded	Description	Edit
condbranches-plus.patch	pitrou, 2008-12-22 01:49	Aggregate patch for POP_JUMP_IF_TRUE, POP_JUMP_IF_FALSE, POP_OR_JUMP, JUMP_OR_POP
pybench.txt	pitrou, 2009-01-13 17:27
condbranches.patch	pitrou, 2009-01-13 23:35	Original patch for POP_JUMP_IF_TRUE, POP_JUMP_IF_FALSE
condbranches-plus2.patch	jyasskin, 2009-02-25 00:51	Updated patch with POP_JUMP_IF_TRUE, POP_JUMP_IF_FALSE, JUMP_IF_TRUE_OR_POP, JUMP_IF_FALSE_OR_POP
trunk-opt-cond-jump.patch	jyasskin, 2009-02-28 01:41	Backport to trunk

Messages (21)
msg78159 - (view)	Author: Antoine Pitrou (pitrou) *	Date: 2008-12-21 22:58
This patch optimizes the bytecode for conditional branches. For example, the list comprehension "[x for x in l if not x]" produced the following bytecode: 1 0 BUILD_LIST 0 3 LOAD_FAST 0 (.0) >> 6 FOR_ITER 23 (to 32) 9 STORE_FAST 1 (x) 12 LOAD_FAST 1 (x) 15 JUMP_IF_TRUE 10 (to 28) 18 POP_TOP 19 LOAD_FAST 1 (x) 22 LIST_APPEND 2 25 JUMP_ABSOLUTE 6 >> 28 POP_TOP 29 JUMP_ABSOLUTE 6 >> 32 RETURN_VALUE but after the patch it produces the following bytecode: 1 0 BUILD_LIST 0 3 LOAD_FAST 0 (.0) >> 6 FOR_ITER 18 (to 27) 9 STORE_FAST 1 (x) 12 LOAD_FAST 1 (x) 15 POP_JUMP_IF_TRUE 6 18 LOAD_FAST 1 (x) 21 LIST_APPEND 2 24 JUMP_ABSOLUTE 6 >> 27 RETURN_VALUE Notice that not only the code is shorter, but the conditional jump (POP_JUMP_IF_TRUE) jumps right to the start of the loop instead of going through the JUMP_ABSOLUTE at the end. "continue" statements are helped similarly. Furthermore, the old jump opcodes (JUMP_IF_FALSE, JUMP_IF_TRUE) could profitably be replaced by two new opcodes: - one which jumps if true and pops otherwise - one which jumps if false and pops otherwise
msg78165 - (view)	Author: Antoine Pitrou (pitrou) *	Date: 2008-12-22 01:49
Here is an optional patch which provides the two opcodes I was talking about (I've called them POP_OR_JUMP and JUMP_OR_POP). Together with a bit of work on the peepholer they make the bytecode expression of boolean calculations very concise. A somewhat contrived example: "f = lambda x,y,z,v: (z if x else y) or v" Without the patch: 1 0 LOAD_FAST 0 (x) 3 JUMP_IF_FALSE 7 (to 13) 6 POP_TOP 7 LOAD_FAST 2 (z) 10 JUMP_FORWARD 4 (to 17) >> 13 POP_TOP 14 LOAD_FAST 1 (y) >> 17 JUMP_IF_TRUE 4 (to 24) 20 POP_TOP 21 LOAD_FAST 3 (v) >> 24 RETURN_VALUE With the patch: 1 0 LOAD_FAST 0 (x) 3 POP_JUMP_IF_FALSE 12 6 LOAD_FAST 2 (z) 9 JUMP_FORWARD 3 (to 15) >> 12 LOAD_FAST 1 (y) >> 15 JUMP_OR_POP 21 18 LOAD_FAST 3 (v) >> 21 RETURN_VALUE
msg79747 - (view)	Author: Jeffrey Yasskin (jyasskin) *	Date: 2009-01-13 15:46
Those look nice, although I need to look at the patches in more detail. What speedup do they give you?
msg79750 - (view)	Author: Antoine Pitrou (pitrou) *	Date: 2009-01-13 17:27
pybench runtimes (attached) are almost the same. The big win is on list comprehensions with an "if" clause.
msg79790 - (view)	Author: Collin Winter (collinwinter) *	Date: 2009-01-13 23:20
I've backported condbranches-plus.patch to trunk, and I'm getting these results: PyBench: 1.84-2.21% faster 2to3: 3.83% faster Spitfire: 6.13-6.23% faster PyBench was run with -w=1; 2to3 is translating its entire source directory five times; Spitfire is measured by the "Spitfire -O4" line from tests/perf/bigtable.py, run 15 times. This is on a Core 2 system. My AMD system (otherwise identical) shows somewhat less improvement, but still an improvement. I've haven't tested condbranches.patch vs condbranches-plus.patch; what difference are you seeing, Antoine? I like these changes. Folding POP into JUMP_IF_{TRUE,FALSE} should have been done years ago (I think I proposed it and Raymond shot me down). I'm also in favor of making POP_JUMP_IF_* absolute jumps. This patch mostly looks good, though you still need to change Doc/ library/dis.rst and the pure-Python compiler package. I'd get someone else to look over the patch, too, though.
msg79794 - (view)	Author: Antoine Pitrou (pitrou) *	Date: 2009-01-13 23:25
Hello, > I've backported condbranches-plus.patch to trunk, and I'm getting these > results: Thanks! > PyBench: 1.84-2.21% faster > 2to3: 3.83% faster > Spitfire: 6.13-6.23% faster What is Spitfire? > I've haven't tested condbranches.patch vs condbranches-plus.patch; what > difference are you seeing, Antoine? condbranches.patch is the earlier version (without POP_OR_JUMP and JUMP_OR_POP), it can be ignored. > This patch mostly looks good, though you still need to change Doc/ > library/dis.rst and the pure-Python compiler package. Well, the pure-Python compiler package doesn't exist in py3k, for which I've initially made the patch. (and its state in 2.x is very sorry...)
msg79796 - (view)	Author: Collin Winter (collinwinter) *	Date: 2009-01-13 23:29
On Tue, Jan 13, 2009 at 3:25 PM, Antoine Pitrou <report@bugs.python.org> wrote: > > Antoine Pitrou <pitrou@free.fr> added the comment: > > Hello, > >> I've backported condbranches-plus.patch to trunk, and I'm getting these >> results: > > Thanks! > >> PyBench: 1.84-2.21% faster >> 2to3: 3.83% faster >> Spitfire: 6.13-6.23% faster > > What is Spitfire? http://code.google.com/p/spitfire/. It's a template system designed for performance that I have some experience with. >> I've haven't tested condbranches.patch vs condbranches-plus.patch; what >> difference are you seeing, Antoine? > > condbranches.patch is the earlier version (without POP_OR_JUMP and > JUMP_OR_POP), it can be ignored. I was mostly curious whether the POP_OR_JUMP and JUMP_OR_POP opcodes had a noticeable performance impact, ie, do they make things fast enough to warrant their inclusion over the old JUMP_IF_FALSE implementations. >> This patch mostly looks good, though you still need to change Doc/ >> library/dis.rst and the pure-Python compiler package. > > Well, the pure-Python compiler package doesn't exist in py3k, for which > I've initially made the patch. > (and its state in 2.x is very sorry...) I'll update the compiler package in 2.x and post my patch once I have it working. There are also some test failures in 2.x that I'll take care of.
msg79799 - (view)	Author: Antoine Pitrou (pitrou) *	Date: 2009-01-13 23:35
Sorry, hadn't seen your message before removing the file. Here it is again.
msg79806 - (view)	Author: Antoine Pitrou (pitrou) *	Date: 2009-01-13 23:46
> http://code.google.com/p/spitfire/. It's a template system designed > for performance that I have some experience with. 6% faster on a template system simply by optimizing conditional jumps is quite neat. (the spitfire homepage itself is rather intriguing, they claim to be faster than plain string joining...) > I was mostly curious whether the POP_OR_JUMP and JUMP_OR_POP opcodes > had a noticeable performance impact, ie, do they make things fast > enough to warrant their inclusion over the old JUMP_IF_FALSE > implementations. Well, I don't remember really, although this is only a few weeks old. POP_OR_JUMP / JUMP_OR_POP will be used when the "and" and "or" keywords are involved. That's basically the only situation where you don't want to pop the top of stack after a conditional jump.
msg79823 - (view)	Author: Jeffrey Yasskin (jyasskin) *	Date: 2009-01-14 02:48
Looking through the patch... I don't really like the names for JUMP_OR_POP and POP_OR_JUMP. (They don't really indicate to me that the choice depends on the truth of the top of the stack.) How about JUMP_IF_FALSE_OR_POP and JUMP_IF_TRUE_OR_POP (or s/OR/ELSE/)? If the old opcodes weren't called JUMP_IF_XXX, I'd suggest the always-popping variants just use those names. Many other opcodes implicitly consume their arguments so these could too. But since this would be confusing with both the old meanings and the other new pair, your names are probably better. The comments in opcode.h and opcode.py are inconsistent. I now get a warning that PRED_POP_TOP is defined but not used, so you should remove the PREDICTED macro for it. I wonder if BINARY_AND and BINARY_OR should get predictions ... not for this patch. I would probably put the POP_JUMP_IF_XXX definitions next to the JUMP_OR_POP definitions to emphasize their similarity. You missed a comment referring to JUMP_IF_FALSE before compiler_try_except(). POP_JUMP_IF_TRUE is only used in one place: assertions. I wonder if anyone would cry if we compiled assertions to UNARY_NOT; POP_JUMP_IF_FALSE instead... Anyway, also not for this patch. In compiler_comprehension_generator, "compiler_use_next_block(c, skip);" is now always followed by "compiler_use_next_block(c, if_cleanup);". Should you remove the use(skip) call? In peephole.c, s/JUMP_SIGN/JUMPS_ON_TRUE/ ? test_peepholer fails for me, which makes sense because it still refers to JUMP_IF_XXX. Whoo, the peephole.c changes are complex. I'll do those in a second round.
msg79824 - (view)	Author: Raymond Hettinger (rhettinger) *	Date: 2009-01-14 02:55
FWIW, the goal is to replace the opcode peephole optimizer with a more powerful equivalent working on the AST just prior to code generation.
msg79833 - (view)	Author: Jeffrey Yasskin (jyasskin) *	Date: 2009-01-14 05:50
In peephole.c: _optimize isn't a great label name, but I don't have a great replacement. Maybe reoptimize_current_index? Your change to the "LOAD_CONST trueconst JUMP_IF_FALSE xx POP_TOP" optimization doesn't preserve the optimization to: def f(): return 1 and a I suspect we don't care since "0 or a" wasn't optimized. I wonder what the "POP_TOP JUMP_FORWARD 1 POP_TOP" was ever for. Why did compiler_comprehension_generator() emit it in the first place? After "if (JUMP_SIGN(j) == JUMP_SIGN(opcode)) {", it might be nice to have a comment like, "/* The second jump will always be taken if the first was. */" and similarly for the else branch with an explanation why the POP should become unconditional. Otherwise looks good.
msg79848 - (view)	Author: Antoine Pitrou (pitrou) *	Date: 2009-01-14 11:17
> Your change to the "LOAD_CONST trueconst JUMP_IF_FALSE xx POP_TOP" > optimization doesn't preserve the optimization to: > def f(): > return 1 and a > I suspect we don't care since "0 or a" wasn't optimized. Yes, this optimization seems meant for "while 1" and "while True" mainly (which my patch preserves, but I might add a comment). > I wonder what the "POP_TOP JUMP_FORWARD 1 POP_TOP" was ever for. Why did > compiler_comprehension_generator() emit it in the first place? I'm as clueless as you...
msg79850 - (view)	Author: Antoine Pitrou (pitrou) *	Date: 2009-01-14 11:25
Le mercredi 14 janvier 2009 à 02:48 +0000, Jeffrey Yasskin a écrit : > Looking through the patch... > > I don't really like the names for JUMP_OR_POP and POP_OR_JUMP. I don't like them either, they're the only relatively short ones I could come up with. I'll change them to your proposal (JUMP_IF_{TRUE,FALSE}_OR_POP). > I wonder if BINARY_AND and BINARY_OR should get predictions ... not for > this patch. With the patch, I don't think they need predictions anymore. > POP_JUMP_IF_TRUE is only used in one place: assertions. I wonder if > anyone would cry if we compiled assertions to UNARY_NOT; > POP_JUMP_IF_FALSE instead... No, POP_JUMP_IF_TRUE is also used when optimizing the sequence "UNARY_NOT; POP_JUMP_IF_FALSE" (think "if not x: ..."). > In compiler_comprehension_generator, "compiler_use_next_block(c, skip);" > is now always followed by "compiler_use_next_block(c, if_cleanup);". > Should you remove the use(skip) call? I'll look at this. > In peephole.c, s/JUMP_SIGN/JUMPS_ON_TRUE/ ? Great, another stupid name disappears :)
msg82687 - (view)	Author: Jeffrey Yasskin (jyasskin) *	Date: 2009-02-25 00:51
I've updated Antoine's patch to incorporate my comments. This interacts with issue 2459, but I haven't yet looked at its patch to figure out how. As a first cut, I'll propose committing this patch, backporting it to trunk, syncing it into the issue 2459 patch, and then committing that patch. Let me know if that's crazy. http://codereview.appspot.com/20064
msg82689 - (view)	Author: Antoine Pitrou (pitrou) *	Date: 2009-02-25 01:52
Le mercredi 25 février 2009 à 00:51 +0000, Jeffrey Yasskin a écrit : > I've updated Antoine's patch to incorporate my comments. This interacts > with issue 2459, but I haven't yet looked at its patch to figure out > how. As a first cut, I'll propose committing this patch, backporting it > to trunk, syncing it into the issue 2459 patch, and then committing that > patch. Let me know if that's crazy. Doesn't sound crazier than doing it in another order :-)) There'll be a bit of work to reconcile both patches anyway (and also to adapt the 2459 in order to apply cleanly against current trunk). 2459 defines its own JUMP_ABS_IF_TRUE / JUMP_ABS_IF_FALSE (which are the same as this patch's POP_JUMP_IF_TRUE / POP_JUMP_IF_FALSE), but it also keeps the old relative non-popping conditional jumps, which as this issue shows is sub-optimal. Thank you for taking this, Jeffrey, my own priority right now being the io-c branch.
msg82694 - (view)	Author: Jeffrey Yasskin (jyasskin) *	Date: 2009-02-25 02:26
Committed as r69961. I'll post the backport to trunk here at least a day before I commit it.
msg82736 - (view)	Author: Jeffrey Yasskin (jyasskin) *	Date: 2009-02-26 00:49
Oh, and no problem with picking up the patches. Thanks for writing them in the first place. Here's the backport to trunk. I particularly enjoyed the bit in pyassem.py where I had to teach the pure-Python compiler that you can get to a block without going through a relative jump or a fallthrough. The patch is also #2 at http://codereview.appspot.com/20064. I'll post performance numbers as soon as I've measured them.
msg82742 - (view)	Author: Jeffrey Yasskin (jyasskin) *	Date: 2009-02-26 05:54
The numbers are: Intel Core 2, gcc-4.3, 32-bit 2to3: 25.24 -> 24.89: 1.38% faster Django: Min: 0.618 -> 0.607: 1.90% faster Avg: 0.621 -> 0.615: 1.04% faster PyBench: Min: 5324 -> 5280: 0.83% faster Avg: 5456 -> 5386: 1.30% faster Pickle: Min: 1.424 -> 1.376: 3.48% faster Avg: 1.427 -> 1.378: 3.55% faster Spitfire: Min: 0.701 -> 0.718: 2.32% slower Avg: 0.710 -> 0.721: 1.47% slower Unpickle: Min: 0.667 -> 0.651: 2.33% faster Avg: 0.668 -> 0.652: 2.38% faster Intel Core 2, gcc-4.3, 64-bit 2to3: 22.40 -> 22.59: 0.81% slower Django: Min: 0.575 -> 0.565: 1.74% faster Avg: 0.577 -> 0.567: 1.76% faster PyBench: Min: 4332 -> 4433: 2.28% slower Avg: 4393 -> 4519: 2.79% slower Pickle: Min: 1.177 -> 1.204: 2.25% slower Avg: 1.180 -> 1.205: 2.14% slower Spitfire: Min: 0.622 -> 0.629: 1.22% slower Avg: 0.623 -> 0.631: 1.26% slower Unpickle: Min: 0.576 -> 0.563: 2.25% faster Avg: 0.596 -> 0.564: 5.55% faster On my MacBook, gcc-4.0, 32-bit: 2to3: 29.82 -> 29.39: 1.46% faster Django: Min: 0.727 -> 0.720: 0.98% faster Avg: 0.746 -> 0.736: 1.45% faster PyBench: Min: 6303 -> 6432: 2.01% slower Avg: 6471 -> 6563: 1.40% slower Pickle: Min: 1.564 -> 1.564: 0.00% faster Avg: 1.609 -> 1.592: 1.07% faster Spitfire: Min: 0.902 -> 0.909: 0.78% slower Avg: 0.924 -> 0.920: 0.41% faster Unpickle: Min: 0.784 -> 0.763: 2.73% faster Avg: 0.794 -> 0.776: 2.26% faster The performance isn't as good as I'd like, especially on 64-bits. I suspect the difference from the py3k branch is that trunk doesn't have Antoine's dispatch patch, and POP_TOP is predicted after JUMP_IF_{TRUE,FALSE}, which means without computed-goto-dispatch, this patch usually only saves a predictable if(). The skipped JUMP_ABSOLUTEs may not happen enough in my benchmarks to matter much. On the other hand, "./python.exe -m timeit -s 'x=range(500)' '[y+3 for y in x if y%5 <2]'" shows the following differences on my MacBook For py3k: Min: 196.000 -> 172.000: 13.95% faster Avg: 200.000 -> 178.600: 11.98% faster Significant (t=5.339997, a=0.95) For trunk: Min: 108.000 -> 88.200: 22.45% faster Avg: 114.571 -> 97.571: 17.42% faster Significant (t=5.518236, a=0.95) That list comprehension definitely takes advantage of skipping the JUMP_ABSOLUTE.
msg82888 - (view)	Author: Jeffrey Yasskin (jyasskin) *	Date: 2009-02-28 01:41
Collin made some comments at http://codereview.appspot.com/20094. Here's a new patch that fixes them. I plan to commit it over the weekend and then start on issue 2459.
msg82987 - (view)	Author: Jeffrey Yasskin (jyasskin) *	Date: 2009-03-01 20:50
Backported as r70071. I also fixed a couple things I missed in the py3k branch in r70076. Thanks all!

History
Date	User	Action	Args
2022-04-11 14:56:43	admin	set	github: 48965
2009-10-17 04:33:01	esam	set	nosy: + esam
2009-03-01 20:50:45	jyasskin	set	status: open -> closed resolution: fixed messages: + msg82987
2009-02-28 01:41:41	jyasskin	set	files: - trunk-opt-cond-jump.patch
2009-02-28 01:41:31	jyasskin	set	files: + trunk-opt-cond-jump.patch messages: + msg82888
2009-02-26 05:54:46	jyasskin	set	messages: + msg82742
2009-02-26 00:49:23	jyasskin	set	files: + trunk-opt-cond-jump.patch messages: + msg82736
2009-02-25 02:26:50	jyasskin	set	type: resource usage -> performance messages: + msg82694 versions: + Python 2.7
2009-02-25 01:52:35	pitrou	set	messages: + msg82689
2009-02-25 00:51:21	jyasskin	set	files: + condbranches-plus2.patch messages: + msg82687
2009-02-16 17:14:38	jcea	set	nosy: + jcea
2009-01-14 11:33:00	blaisorblade	set	nosy: + blaisorblade
2009-01-14 11:25:39	pitrou	set	messages: + msg79850
2009-01-14 11:17:44	pitrou	set	messages: + msg79848
2009-01-14 05:50:18	jyasskin	set	messages: + msg79833
2009-01-14 02:55:37	rhettinger	set	nosy: + rhettinger messages: + msg79824
2009-01-14 02:48:37	jyasskin	set	messages: + msg79823
2009-01-13 23:46:53	pitrou	set	messages: + msg79806
2009-01-13 23:35:25	pitrou	set	files: + condbranches.patch messages: + msg79799
2009-01-13 23:30:36	pitrou	set	files: - condbranches.patch
2009-01-13 23:29:08	collinwinter	set	messages: + msg79796
2009-01-13 23:25:27	pitrou	set	messages: + msg79794
2009-01-13 23:20:12	collinwinter	set	messages: + msg79790
2009-01-13 17:27:18	pitrou	set	files: + pybench.txt messages: + msg79750
2009-01-13 15:46:54	jyasskin	set	nosy: + collinwinter, jyasskin messages: + msg79747
2008-12-22 01:49:51	pitrou	set	files: + condbranches-plus.patch messages: + msg78165
2008-12-21 22:58:05	pitrou	create