Issue 4896: Faster why variable manipulation in ceval.c

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/49146

classification

Title:	Faster why variable manipulation in ceval.c
Type:	performance	Stage:	resolved
Components:		Versions:	Python 3.5

process

Status:	closed	Resolution:	out of date
Dependencies:		Superseder:
Assigned To:		Nosy List:	BreamoreBoy, ajaksu2, blaisorblade, collinwinter, jyasskin, pitrou, rhettinger, serhiy.storchaka, vstinner
Priority:	low	Keywords:

Created on 2009-01-09 14:31 by skip.montanaro, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name	Uploaded	Description
unnamed	skip.montanaro, 2009-01-09 14:31
gcc_4.2.4_linux_ia32_bench.txt	ajaksu2, 2009-01-09 17:53	pybench results
unpatched.txt	BreamoreBoy, 2015-03-15 17:09	pybench test output

Messages (16)
msg79470 - (view)	Author: Skip Montanaro (skip.montanaro) *	Date: 2009-01-09 14:31
The why_code enum in ceval.c has values which form a bit set. Comparison of the why variable against multiple values is going to be faster using bitwise operations instead of logical ones. For example, instead of

    why == WHY_RETURN || why == WHY_CONTINUE

the equivalent bitwise expression is

    why & (WHY_RETURN | WHY_CONTINUE)

which has fewer operations (one vs three treating the rhs of & as a constant). This is already done in one place. The attached patch converts all other expressions of the first form.

Also, there are some further manipulations of why in the loop after the fast_block_end. The loop can only be entered if why != WHY_NOT. In the loop when it is set to WHY_NOT, the loop breaks. There is thus no reason to test its value in the while expression. Further, instead of just breaking from the loop and then checking the why != WHY_NOT again, just jump past that check by adding a why_not_here label.

The attached patch implements these changes (against the py3k branch). All tests pass on my Mac except test_cmd_line (which has been failing for awhile).

Skip
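The two comparison forms described above can be sketched in C as follows. This is a minimal illustration, not the attached patch: the enum values are assumed for the sketch (each is a distinct single bit, as the message describes) and should be checked against the real why_code enum in ceval.c, and the function names are hypothetical. The equivalence only holds while why always carries exactly one of these bits.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative bit-set enum. The numeric values are assumptions made
   for this sketch; only the "one bit per value" property matters. */
enum why_code {
    WHY_NOT       = 0x0001,
    WHY_EXCEPTION = 0x0002,
    WHY_RETURN    = 0x0008,
    WHY_BREAK     = 0x0010,
    WHY_CONTINUE  = 0x0020,
    WHY_YIELD     = 0x0040
};

/* Logical form: two equality tests joined by a short-circuit OR. */
bool is_return_or_continue_logical(enum why_code why)
{
    return why == WHY_RETURN || why == WHY_CONTINUE;
}

/* Bitwise form: a single AND against a compile-time constant mask. */
bool is_return_or_continue_bitwise(enum why_code why)
{
    return (why & (WHY_RETURN | WHY_CONTINUE)) != 0;
}

/* Hypothetical self-check: both forms agree for every single-bit value. */
void check_forms_agree(void)
{
    const enum why_code all[] = { WHY_NOT, WHY_EXCEPTION, WHY_RETURN,
                                  WHY_BREAK, WHY_CONTINUE, WHY_YIELD };
    for (unsigned i = 0; i < sizeof all / sizeof all[0]; i++)
        assert(is_return_or_continue_logical(all[i]) ==
               is_return_or_continue_bitwise(all[i]));
}
```

Whether the bitwise form is actually faster depends on the compiler and target, which is what the benchmark reports in the rest of this thread try to establish.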
msg79482 - (view)	Author: Daniel Diniz (ajaksu2) *	Date: 2009-01-09 17:53
Neat, gives a 10% speedup on a Celeron M with gcc 4.2.
msg79483 - (view)	Author: Antoine Pitrou (pitrou) *	Date: 2009-01-09 18:31
I get a 3% speedup on x86-64 with gcc 4.3.2. The label "why_not_here" should be renamed to something more meaningful IMO. Or you could just kill the label and use "continue" instead.
msg79484 - (view)	Author: Skip Montanaro (skip.montanaro) *	Date: 2009-01-09 18:40
    Antoine> The label "why_not_here" should be renamed to something more
    Antoine> meaningful IMO. Or you could just kill the label and use
    Antoine> "continue" instead.

I thought "why_not_here" was meaningful. "Here" is where you go when why == WHY_NOT.

I don't think continue will work. The goto is coming out of an inner loop. If you continue from there you just continue the inner loop. I replaced a break with a goto.

Skip
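The point about continue versus goto in a nested loop can be shown with a small C sketch. The loops and function names here are hypothetical and much simpler than the block-unwinding loop in ceval.c; they only show which statement skips the code that follows the inner loop.

```c
#include <assert.h>

/* "continue" inside the inner loop only advances the inner loop, so the
   tail code after the inner loop still runs on every outer iteration. */
int tail_runs_with_continue(int outer_n)
{
    int tail_runs = 0;
    for (int i = 0; i < outer_n; i++) {
        for (int j = 0; j < 3; j++) {
            if (j == 0)
                continue;          /* next j, not next i */
        }
        tail_runs++;               /* reached every time */
    }
    return tail_runs;
}

/* "break" leaves the inner loop but then falls into the tail code. */
int tail_runs_with_break(int outer_n)
{
    int tail_runs = 0;
    for (int i = 0; i < outer_n; i++) {
        for (int j = 0; j < 3; j++) {
            if (j == 0)
                break;             /* exits inner loop only */
        }
        tail_runs++;               /* still reached */
    }
    return tail_runs;
}

/* "goto" leaves the inner loop and jumps past the tail code. */
int tail_runs_with_goto(int outer_n)
{
    int tail_runs = 0;
    for (int i = 0; i < outer_n; i++) {
        for (int j = 0; j < 3; j++) {
            if (j == 0)
                goto next_outer;   /* skips the tail below */
        }
        tail_runs++;               /* not reached in this sketch */
next_outer:
        ;                          /* a label needs a statement after it */
    }
    return tail_runs;
}

/* Hypothetical self-check of the three behaviours. */
void check_nested_loop_exits(void)
{
    assert(tail_runs_with_continue(4) == 4);
    assert(tail_runs_with_break(4) == 4);
    assert(tail_runs_with_goto(4) == 0);
}
```

In this sketch the tail code stands in for the second why != WHY_NOT check that the patch jumps over.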
msg79486 - (view)	Author: Antoine Pitrou (pitrou) *	Date: 2009-01-09 18:55
> I thought "why_not_here" was meaningful.

I don't know, when I see "goto why_not_here" it looks like a joke to me :)

> I don't think continue will work. The goto is coming out of an
> inner loop. If you continue from there you just continue the inner loop.

Oops, sorry for misguided advice.
msg79501 - (view)	Author: Skip Montanaro (skip.montanaro) *	Date: 2009-01-09 20:24
    >> I thought "why_not_here" was meaningful.

    Antoine> I don't know, when I see "goto why_not_here" it looks like a
    Antoine> joke to me :)

Well, I think the enum name WHY_NOT is kind of a joke itself, but it's been that way for so long I see no reason to change it. I'll add a comment to the label which describes the intent in plain(er) English.

Skip
msg79598 - (view)	Author: Skip Montanaro (skip.montanaro) *	Date: 2009-01-11 16:00
Pystone results:

apply why patch

py3k% rm $TMPDIR/.[coi] ; make python.exe && rm -f /tmp/trash ; ./python.exe Lib/test/pystone.py
rm: /tmp/.[coi]: No such file or directory
make: `python.exe' is up to date.
Pystone(1.1) time for 50000 passes = 1.08154
This machine benchmarks at 46230.2 pystones/second

py3k% rm $TMPDIR/.[coi] ; make python.exe && rm -f /tmp/trash ; ./python.exe Lib/test/pystone.py
rm: /tmp/.[coi]: No such file or directory
make: `python.exe' is up to date.
Pystone(1.1) time for 50000 passes = 1.08365
This machine benchmarks at 46140.4 pystones/second

py3k% rm $TMPDIR/.[coi] ; make python.exe && rm -f /tmp/trash ; ./python.exe Lib/test/pystone.py
rm: /tmp/.[coi]: No such file or directory
make: `python.exe' is up to date.
Pystone(1.1) time for 50000 passes = 1.08598
This machine benchmarks at 46041.4 pystones/second

remove patch

py3k% rm $TMPDIR/.[coi] ; make python.exe && rm -f /tmp/trash ; ./python.exe Lib/test/pystone.py
rm: /tmp/.[coi]: No such file or directory
make: `python.exe' is up to date.
Pystone(1.1) time for 50000 passes = 1.10093
This machine benchmarks at 45416.3 pystones/second

py3k% rm $TMPDIR/.[coi] ; make python.exe && rm -f /tmp/trash ; ./python.exe Lib/test/pystone.py
rm: /tmp/.[coi]: No such file or directory
make: `python.exe' is up to date.
Pystone(1.1) time for 50000 passes = 1.10458
This machine benchmarks at 45266.1 pystones/second

py3k% rm $TMPDIR/.[coi] ; make python.exe && rm -f /tmp/trash ; ./python.exe Lib/test/pystone.py
rm: /tmp/.[coi]: No such file or directory
make: `python.exe' is up to date.
Pystone(1.1) time for 50000 passes = 1.10333
This machine benchmarks at 45317.5 pystones/second
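For reference, the six pystone rates above can be summarised with a few lines of C. This is only a sketch: it assumes the first three runs are the patched build and the last three the unpatched one, as the "apply why patch" / "remove patch" markers suggest, and the function names are hypothetical. With three runs per side it gives a rough picture, not a significance test.

```c
#include <assert.h>

/* Rates copied from the runs above, in pystones/second. */
const double patched[]   = { 46230.2, 46140.4, 46041.4 };
const double unpatched[] = { 45416.3, 45266.1, 45317.5 };

/* Arithmetic mean of n samples. */
double mean_of(const double *v, int n)
{
    double sum = 0.0;
    for (int i = 0; i < n; i++)
        sum += v[i];
    return sum / n;
}

/* Smallest of n samples. */
double min_of(const double *v, int n)
{
    double m = v[0];
    for (int i = 1; i < n; i++)
        if (v[i] < m)
            m = v[i];
    return m;
}

/* Largest of n samples. */
double max_of(const double *v, int n)
{
    double m = v[0];
    for (int i = 1; i < n; i++)
        if (v[i] > m)
            m = v[i];
    return m;
}

/* Relative difference of the two means, in percent. */
double speedup_percent(void)
{
    double mp = mean_of(patched, 3);
    double mu = mean_of(unpatched, 3);
    return (mp - mu) / mu * 100.0;
}

/* Nonzero when the slowest patched run still beats the fastest
   unpatched run, i.e. the two sample ranges do not overlap. */
int ranges_separated(void)
{
    return min_of(patched, 3) > max_of(unpatched, 3);
}
```

On these numbers the patched mean comes out a little under 2% higher, and the two ranges do not overlap.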
msg79599 - (view)	Author: Skip Montanaro (skip.montanaro) *	Date: 2009-01-11 16:05
pybench comparison...

% ./python.exe Tools/pybench/pybench.py -s stock.out -c why.out
-------------------------------------------------------------------------------
PYBENCH 2.0
-------------------------------------------------------------------------------
* using CPython 3.1a0 (py3k:68444M, Jan 11 2009, 10:02:04) [GCC 4.0.1 (Apple Inc. build 5490)]
* disabled garbage collection
* system check interval set to maximum: 2147483647
* using timer: time.time

-------------------------------------------------------------------------------
Benchmark: stock.out
-------------------------------------------------------------------------------

    Rounds: 10
    Warp:   10
    Timer:  time.time

    Machine Details:
       Platform ID:    Darwin-9.6.0-i386-32bit
       Processor:      i386

    Python:
       Implementation: CPython
       Executable:     /Users/skip/src/python/py3k/python.exe
       Version:        3.1.0
       Compiler:       GCC 4.0.1 (Apple Inc. build 5490)
       Bits:           32bit
       Build:          Jan 11 2009 09:57:51 (#py3k:68444)
       Unicode:        UCS2

-------------------------------------------------------------------------------
Comparing with: why.out
-------------------------------------------------------------------------------

    Rounds: 10
    Warp:   10
    Timer:  time.time

    Machine Details:
       Platform ID:    Darwin-9.6.0-i386-32bit
       Processor:      i386

    Python:
       Implementation: CPython
       Executable:     /Users/skip/src/python/py3k/python.exe
       Version:        3.1.0
       Compiler:       GCC 4.0.1 (Apple Inc. build 5490)
       Bits:           32bit
       Build:          Jan 11 2009 10:02:04 (#py3k:68444M)
       Unicode:        UCS2

Test                            minimum run-time        average run-time
                                this   other    diff    this   other    diff
-------------------------------------------------------------------------------
BuiltinFunctionCalls:          110ms   106ms   +4.1%   111ms   107ms   +3.7%
BuiltinMethodLookup:            78ms    79ms   -0.7%    80ms    80ms   -0.4%
CompareFloats:                 105ms   113ms   -7.3%   106ms   114ms   -6.8%
CompareFloatsIntegers:         161ms   146ms  +10.8%   163ms   151ms   +7.7%
CompareIntegers:               145ms   151ms   -3.8%   147ms   152ms   -3.1%
CompareInternedStrings:        109ms   134ms  -18.9%   110ms   135ms  -18.9%
CompareLongs:                   86ms    90ms   -4.5%    86ms    91ms   -4.8%
CompareStrings:                 89ms    88ms   +0.4%    91ms    89ms   +2.3%
ComplexPythonFunctionCalls:    146ms   141ms   +3.1%   151ms   143ms   +5.9%
ConcatStrings:                 124ms   123ms   +0.8%   126ms   129ms   -2.0%
CreateInstances:               146ms   144ms   +1.9%   148ms   146ms   +1.2%
CreateNewInstances:            110ms   109ms   +1.2%   111ms   110ms   +0.6%
CreateStringsWithConcat:       171ms   158ms   +8.4%   175ms   165ms   +5.8%
DictCreation:                   85ms    80ms   +6.5%    86ms    81ms   +6.8%
DictWithFloatKeys:              98ms    99ms   -1.0%    99ms   100ms   -1.0%
DictWithIntegerKeys:            94ms    97ms   -3.4%    94ms   103ms   -8.6%
DictWithStringKeys:             84ms    88ms   -5.0%    84ms    92ms   -8.3%
ForLoops:                       85ms    82ms   +3.0%    85ms    83ms   +3.1%
IfThenElse:                    116ms   132ms  -12.1%   117ms   133ms  -12.5%
ListSlicing:                   119ms   118ms   +0.1%   124ms   124ms   -0.1%
NestedForLoops:                141ms   129ms   +9.1%   142ms   132ms   +7.8%
NestedListComprehensions:      164ms   159ms   +3.4%   166ms   160ms   +3.8%
NormalClassAttribute:          215ms   218ms   -1.2%   217ms   220ms   -1.2%
NormalInstanceAttribute:       111ms   115ms   -3.0%   113ms   117ms   -3.7%
PythonFunctionCalls:           104ms   123ms  -15.5%   104ms   124ms  -15.6%
PythonMethodCalls:             144ms   144ms   -0.3%   148ms   149ms   -0.6%
Recursion:                     180ms   195ms   -7.8%   191ms   198ms   -3.7%
SecondImport:                  106ms   107ms   -1.1%   107ms   108ms   -1.1%
SecondPackageImport:           118ms   117ms   +1.3%   119ms   118ms   +1.1%
SecondSubmoduleImport:         154ms   156ms   -1.2%   156ms   157ms   -0.6%
SimpleComplexArithmetic:        87ms    89ms   -1.9%    88ms    95ms   -7.4%
SimpleDictManipulation:        180ms   176ms   +2.5%   183ms   178ms   +2.6%
SimpleFloatArithmetic:          92ms    99ms   -7.3%    93ms   102ms   -9.0%
SimpleIntFloatArithmetic:      127ms   127ms   -0.4%   128ms   129ms   -0.8%
SimpleIntegerArithmetic:       127ms   136ms   -6.7%   128ms   138ms   -7.6%
SimpleListComprehensions:      128ms   129ms   -0.4%   131ms   130ms   +0.2%
SimpleListManipulation:         95ms   108ms  -12.5%    97ms   110ms  -11.6%
SimpleLongArithmetic:           85ms    86ms   -2.0%    85ms    89ms   -4.4%
SmallLists:                    154ms   153ms   +0.8%   157ms   156ms   +0.8%
SmallTuples:                   159ms   157ms   +1.3%   161ms   159ms   +1.1%
SpecialClassAttribute:         357ms   341ms   +4.8%   360ms   345ms   +4.3%
SpecialInstanceAttribute:      110ms   113ms   -2.5%   111ms   115ms   -3.4%
StringMappings:                293ms   312ms   -6.3%   298ms   315ms   -5.3%
StringPredicates:              122ms   126ms   -2.6%   125ms   127ms   -1.9%
StringSlicing:                 216ms   215ms   +0.5%   222ms   221ms   +0.8%
TryExcept:                      76ms    71ms   +7.0%    77ms    71ms   +7.5%
TryFinally:                     84ms    86ms   -1.2%    85ms    86ms   -1.4%
TryRaiseExcept:                 58ms    60ms   -3.4%    59ms    61ms   -3.6%
TupleSlicing:                  174ms   158ms  +10.3%   181ms   162ms  +11.4%
WithFinally:                   133ms   142ms   -6.2%   138ms   143ms   -3.8%
WithRaiseExcept:               153ms   149ms   +2.6%   154ms   151ms   +2.0%
-------------------------------------------------------------------------------
Totals:                       6708ms  6772ms   -0.9%  6820ms  6897ms   -1.1%

(this=stock.out, other=why.out)
msg79989 - (view)	Author: Collin Winter (collinwinter) *	Date: 2009-01-16 23:47
Another data point: I've tested this patch applied to trunk on Core 2 Duo and Opteron 8214 HE machines using both gcc 4.0.3 and 4.3.1, and I'm seeing mixed results. Pybench with warp 1 is between ~1.5% slower and ~1% faster, depending on gcc version (fairly consistent across machines). 2to3 and two template systems I've tested are between 2.5% slower and 2% faster depending on workload and gcc version.
msg79990 - (view)	Author: Paolo 'Blaisorblade' Giarrusso (blaisorblade)	Date: 2009-01-17 00:16
Given a 10% speedup on some systems, and statistically insignificant changes on other systems, I would still apply the patch, if only because the bitmask part makes more sense. I'm not sure about the goto part, but still, it does straighten the code.

Anyway, simply call the label "why_is_WHY_NOT", "why_set_to_WHY_NOT" or something like that. Verbosity on a use-once label used with goto should be encouraged - we're not Java programmers, but we need to pay for using goto by increasing readability in other ways.

@collinwinter: since the differences you report are so low (similar to the statistical noise I get on my machine), I would expect that you're just getting statistical noise instead of different results depending on the GCC version, unless you performed statistical hypothesis testing (confidence intervals and related stuff). And if I had done the needed tens/hundreds of repetitions for hypothesis testing, I'd state that clearly, so I suppose you didn't, and that's fully acceptable since the result is likely to be statistically insignificant anyway.
msg220684 - (view)	Author: Mark Lawrence (BreamoreBoy) *	Date: 2014-06-15 22:12
I'm guessing that a patch to ceval.c that's this old wouldn't apply cleanly now. I'll rework it but only if the changes are highly likely to be accepted. Given the mixed results previously reported this is not guaranteed. Opinions please.
msg220690 - (view)	Author: Raymond Hettinger (rhettinger) *	Date: 2014-06-16 02:33
Before evaluating this further, the timings should be updated for the current 3.5 code and using the various compilers for the different OSes. Also, it would be nice to run Antoine's suite of benchmarks.
msg237299 - (view)	Author: Mark Lawrence (BreamoreBoy) *	Date: 2015-03-05 22:44
Where do we find "Antoine's suite of benchmarks"?
msg237303 - (view)	Author: STINNER Victor (vstinner) *	Date: 2015-03-05 23:05
> Where do we find "Antoine's suite of benchmarks"?

https://hg.python.org/benchmarks
msg238154 - (view)	Author: Mark Lawrence (BreamoreBoy) *	Date: 2015-03-15 17:09
I've finally remembered to attach the test output I got a week ago. If you want me to run Antoine's test suite with any specific parameters please feel free to ask.
msg318540 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) *	Date: 2018-06-03 12:37
This issue is outdated since moving unwinding of stack for "pseudo exceptions" from interpreter to compiler in issue17611.

History
Date	User	Action	Args
2022-04-11 14:56:43	admin	set	github: 49146
2018-06-03 12:37:35	serhiy.storchaka	set	status: open -> closed resolution: out of date messages: + msg318540 stage: resolved
2015-03-15 17:09:45	BreamoreBoy	set	files: + unpatched.txt messages: + msg238154
2015-03-05 23:05:15	vstinner	set	messages: + msg237303
2015-03-05 22:44:23	BreamoreBoy	set	nosy: + serhiy.storchaka messages: + msg237299
2014-06-16 02:33:58	rhettinger	set	priority: normal -> low versions: - Python 3.4 nosy: + rhettinger, vstinner messages: + msg220690
2014-06-15 22:12:24	BreamoreBoy	set	versions: + Python 3.4, Python 3.5, - Python 3.0, Python 3.1 nosy: + BreamoreBoy messages: + msg220684 type: performance
2010-05-20 20:35:11	skip.montanaro	set	nosy: - skip.montanaro
2009-01-17 00:16:44	blaisorblade	set	messages: + msg79990
2009-01-16 23:47:54	collinwinter	set	messages: + msg79989
2009-01-16 17:15:26	jyasskin	set	nosy: + collinwinter, jyasskin
2009-01-13 05:49:03	blaisorblade	set	nosy: + blaisorblade
2009-01-11 16:05:31	skip.montanaro	set	messages: + msg79599
2009-01-11 16:00:12	skip.montanaro	set	messages: + msg79598
2009-01-09 20:24:47	skip.montanaro	set	messages: + msg79501
2009-01-09 18:55:57	pitrou	set	messages: + msg79486
2009-01-09 18:40:10	skip.montanaro	set	messages: + msg79484
2009-01-09 18:31:21	pitrou	set	nosy: + pitrou messages: + msg79483
2009-01-09 17:53:26	ajaksu2	set	files: + gcc_4.2.4_linux_ia32_bench.txt nosy: + ajaksu2 messages: + msg79482 versions: + Python 3.0, Python 3.1
2009-01-09 14:31:52	skip.montanaro	create