Title: Reserve COMPARE_OP for rich comparisons
Type: performance Stage:
Components: Interpreter Core Versions: Python 3.2
Status: closed Resolution: rejected
Dependencies: Superseder:
Assigned To: rhettinger Nosy List: belopolsky, benjamin.peterson, jyasskin, mark.dickinson, pitrou, rhettinger, serprex, stutzbach
Priority: normal Keywords: patch

Created on 2010-07-04 19:53 by serprex, last changed 2010-09-02 00:23 by rhettinger. This issue is now closed.

File name Uploaded Description Edit
cmpoppatch.diff serprex, 2010-07-04 19:53
cmpoppatch2.diff serprex, 2010-07-04 20:26 Without whitespace amendments
cmpoprotdup.diff serprex, 2010-07-05 01:02
cmpoprotdup2.diff serprex, 2010-07-05 01:26
cmpoprotdupalluny.diff serprex, 2010-07-05 14:07
cmpoprotdupalluny2.diff serprex, 2010-07-05 22:47
cmpoprotdupalluny3.diff serprex, 2010-07-08 18:10
pybench.txt pitrou, 2010-07-08 18:31
Messages (13)
msg109259 - (view) Author: Demur Rumed (serprex) Date: 2010-07-04 19:53
Currently, COMPARE_OP is burdened by a needless, and unorthogonal, extra layer of indirection. I've modified it to only handle the rich comparison case, moving the other five cases to BINARY_IN, BINARY_NOT_IN, BINARY_IS, BINARY_IS_NOT, and BINARY_EXC_MATCH

To consider is inlining the POP_JUMP_IF_FALSE POP_TOP which always follow BINARY_EXC_MATCH
msg109261 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2010-07-04 20:09
Any chance you could regenerate the patch without all the unnecessary whitespace changes?  It's kinda hard to read right now.
msg109279 - (view) Author: Demur Rumed (serprex) Date: 2010-07-05 01:02
I've attached the original patch without whitespace, and am also including modifications to this portion of the interpreter which remove ROT_FOUR, DUP_TOPX while adding ROT_THREE_TWO, DUP_TOP_TWO, DUP_ROT_THREE. I've seen a 5% speed increase with timeit.Timer("a[0]+=0","a=[0]")

Also modified the optimizer to reject code with a size over 32767 instead of 32760. That's most likely a useless modification which may be preferred ignored
msg109282 - (view) Author: Demur Rumed (serprex) Date: 2010-07-05 01:26
Fixed regression in ROT_THREE_TWO and a missed DUP_TOP2 -> DUP_TOP_TWO. Now there are only the odd segfaults
msg109319 - (view) Author: Demur Rumed (serprex) Date: 2010-07-05 14:07
I've fixed the segfaulting in IN and NOT_IN, though the code requires cleaning due to the obtusely cargo culted nature of those opcodes

I'm also including an incrementation which completely removes COMPARE_OP, using the opcode to determine the comparison type. As a result, I found it safe to remove the PyCmp_ enum, though cpickle seems to disagree. It's also lacking performance wise
msg109360 - (view) Author: Demur Rumed (serprex) Date: 2010-07-05 22:47
It seems the lack of benefits from replacing COMPARE_OP with a set of aliased bytecodes was repairable through TARGET_WITH_IMPL. Also fixed was the lack of amendments to dis
msg109514 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2010-07-08 05:56
Be sure to consider the impact on the peephole optimizer.  Also consider that adding volume inside the ceval.c loop may have unintended slowing effects on various compilers -- supposedly, this has been a serious issue in that past and there has been concern about growing the body of the loop.

I presume that the thrust of the efforts is directed at eliminating the an unpredictable switch branch inside the COMPARE_OP opcode.  If so, there's not much fat to be shed here.  The cost of the opcode switch is small relative the time it takes to dispatch to an object's rich comparison methods.
msg109538 - (view) Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2010-07-08 13:14
I would like any such change to show what it improves through benchmarks.
msg109553 - (view) Author: Demur Rumed (serprex) Date: 2010-07-08 16:25
The seperation of COMPARE_OP into multiple opcodes shouldn't affect cache size since the opcodes are aliased. Spreading out the switch statement shouldn't cause issue from flattening, since GCC would inline the single use of cmp_outcome. Thus the only bloat is the wrapper overhead, which I believe the C code is relatively large with respect to the macros. As for DUP_TOP_TWO, that's smaller than DUP_TOPX

It may be suitable to benchmark on other systems besides my own due to the subversive nature of caches; I've seen improvement in any code that uses comparisons. To consider, however, is that ceval.o dropped from 268592B to 267448B. This is reflected in that the interpreter's executable dropped from 6721842B to 6719866B

Raymond, what do you mean with respect to the peepholer? I believe my changes there have shown it to be a simplification of the code
msg109557 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-07-08 17:33
The patch fails compiling with --with-computed-gotos:

Python/ceval.c: In function ‘PyEval_EvalFrameEx’:
Python/opcode_targets.h:101: erreur: label ‘TARGET_DUP_TOPX’ used but not defined
make: *** [Python/ceval.o] Erreur 1
msg109561 - (view) Author: Demur Rumed (serprex) Date: 2010-07-08 18:10
Replaced TARGET_DUP_TOPX with _unknown_opcode

I recompiled with --with-computed-gotos, and the results remain promising, though the interpreter instead grows from 6756023B to 6765167B
msg109564 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-07-08 18:31
Here, your patch actually show a slowdown on pybench with computed gotos. Results attached.
msg115345 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2010-09-02 00:23
I'm marking this one as rejected.  The timings have shown mixed results (some favorable, some not).  In general, there is a bias against expanding the number of opcodes because 1) there aren't that many codes available, 2) it grows the size of the switch-case (which has been problematic from some compilers on various systems), and because it complicates other optimization efforts such as the peephole optimizer or in other tools that depend on bytecode (such as the Unladen Swallow LLVM project, Psyco, or ByteCodeHacks). 

Also, the potential improvement is very small because it removes only one indirection out of long chain of events involved in a comparison.  Most of the time is consumed in abstract dispatch to a concrete comparison method and it that method itself.
Date User Action Args
2010-09-02 00:23:45rhettingersetstatus: open -> closed
resolution: rejected
messages: + msg115345
2010-09-01 21:37:13stutzbachsetnosy: + stutzbach
2010-07-08 18:31:58pitrousetfiles: + pybench.txt

messages: + msg109564
2010-07-08 18:10:14serprexsetfiles: + cmpoprotdupalluny3.diff

messages: + msg109561
2010-07-08 17:33:05pitrousetmessages: + msg109557
2010-07-08 16:25:40serprexsetmessages: + msg109553
2010-07-08 13:24:32pitrousetnosy: + pitrou, jyasskin
2010-07-08 13:14:51benjamin.petersonsetnosy: + benjamin.peterson
messages: + msg109538
2010-07-08 05:56:16rhettingersetmessages: + msg109514
2010-07-05 22:47:45serprexsetfiles: + cmpoprotdupalluny2.diff

messages: + msg109360
versions: - Python 3.3
2010-07-05 14:07:39serprexsetfiles: + cmpoprotdupalluny.diff

messages: + msg109319
2010-07-05 11:50:09eric.araujosetmessages: - msg109280
2010-07-05 02:20:12rhettingersetassignee: rhettinger

nosy: + rhettinger
2010-07-05 01:26:57serprexsetfiles: + cmpoprotdup2.diff

messages: + msg109282
2010-07-05 01:03:37serprexsetfiles: - cmpoprotdup.diff
2010-07-05 01:03:09serprexsetfiles: + cmpoprotdup.diff

messages: + msg109280
2010-07-05 01:03:00serprexsetfiles: + cmpoprotdup.diff

messages: + msg109279
2010-07-04 20:26:06serprexsetfiles: + cmpoppatch2.diff
2010-07-04 20:13:38belopolskysetnosy: + belopolsky
2010-07-04 20:09:55mark.dickinsonsetnosy: + mark.dickinson
messages: + msg109261
2010-07-04 19:54:18serprexsettitle: Reserve COMPARE_OP for RichComparisons -> Reserve COMPARE_OP for rich comparisons
2010-07-04 19:53:59serprexcreate