Issue 6033: LOOKUP_METHOD and CALL_METHOD optimization

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/50283

classification

Title:	LOOKUP_METHOD and CALL_METHOD optimization
Type:	performance	Stage:	resolved
Components:	Interpreter Core	Versions:	Python 3.2

process

Status:	closed	Resolution:
Dependencies:		Superseder:	Speedup method calls 1.2x View: 26110
Assigned To:		Nosy List:	benjamin.peterson, collinwinter, jyasskin, pitrou, r.david.murray, rnk
Priority:	low	Keywords:	patch

Created on 2009-05-16 01:43 by benjamin.peterson, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name	Uploaded	Description	Edit
call_method.patch	benjamin.peterson, 2009-05-16 01:43

Messages (10)
msg87850 - (view)	Author: Benjamin Peterson (benjamin.peterson) *	Date: 2009-05-16 01:43
This is an optimization ported from PyPy. [1] It tries to prevent bound methods from being created by using the stack as a cache. I couldn't apply this to builtin methods because those use a method-wrapper descriptor. The results were not very impressive. However, I'm attaching the patch to see if anyone else wants to look at it. [1] http://codespeak.net/pypy/dist/pypy/doc/interpreter-optimizations.html#id1 Test minimum run-time average run-time this other diff this other diff ------------------------------------------------------------------------------- BuiltinFunctionCalls: 342ms 340ms +0.6% 378ms 361ms +4.7% BuiltinMethodLookup: 315ms 308ms +2.3% 333ms 319ms +4.5% CompareFloats: 250ms 251ms -0.3% 257ms 258ms -0.7% CompareFloatsIntegers: 266ms 265ms +0.4% 273ms 273ms -0.2% CompareIntegers: 233ms 232ms +0.4% 238ms 238ms -0.2% CompareInternedStrings: 279ms 275ms +1.3% 285ms 284ms +0.3% CompareLongs: 225ms 223ms +1.1% 231ms 229ms +0.8% CompareStrings: 238ms 235ms +1.3% 244ms 243ms +0.6% CompareUnicode: 243ms 246ms -0.9% 252ms 252ms -0.1% ComplexPythonFunctionCalls: 307ms 301ms +2.0% 315ms 309ms +1.8% ConcatStrings: 372ms 366ms +1.6% 376ms 385ms -2.1% ConcatUnicode: 260ms 259ms +0.4% 266ms 269ms -0.9% CreateInstances: 351ms 336ms +4.7% 365ms 346ms +5.5% CreateNewInstances: 265ms 256ms +3.6% 281ms 264ms +6.5% CreateStringsWithConcat: 290ms 289ms +0.1% 304ms 301ms +1.2% CreateUnicodeWithConcat: 220ms 219ms +0.8% 227ms 223ms +1.8% DictCreation: 201ms 200ms +0.4% 204ms 206ms -1.0% DictWithFloatKeys: 400ms 418ms -4.4% 410ms 424ms -3.4% DictWithIntegerKeys: 298ms 294ms +1.2% 306ms 304ms +0.6% DictWithStringKeys: 260ms 264ms -1.5% 270ms 275ms -2.1% ForLoops: 224ms 223ms +0.2% 232ms 232ms +0.2% IfThenElse: 160ms 160ms +0.0% 168ms 182ms -8.0% ListSlicing: 293ms 292ms +0.5% 302ms 306ms -1.2% NestedForLoops: 301ms 300ms +0.2% 305ms 308ms -0.8% NestedListComprehensions: 323ms 328ms -1.7% 331ms 335ms -1.3% NormalClassAttribute: 313ms 314ms -0.1% 323ms 330ms -2.1% NormalInstanceAttribute: 284ms 283ms +0.4% 289ms 288ms +0.4% PythonFunctionCalls: 259ms 278ms -6.7% 274ms 289ms -5.3% PythonMethodCalls: 358ms 357ms +0.3% 371ms 365ms +1.6% Recursion: 389ms 398ms -2.1% 395ms 407ms -2.9% SecondImport: 335ms 319ms +5.2% 346ms 380ms -8.9% SecondPackageImport: 338ms 326ms +3.6% 350ms 337ms +4.0% SecondSubmoduleImport: 413ms 403ms +2.5% 426ms 411ms +3.6% SimpleComplexArithmetic: 341ms 345ms -1.2% 351ms 355ms -1.1% SimpleDictManipulation: 288ms 298ms -3.7% 293ms 303ms -3.1% SimpleFloatArithmetic: 272ms 275ms -1.1% 279ms 286ms -2.7% SimpleIntFloatArithmetic: 211ms 204ms +3.3% 216ms 215ms +0.5% SimpleIntegerArithmetic: 207ms 203ms +1.7% 214ms 213ms +0.5% SimpleListComprehensions: 275ms 273ms +0.6% 281ms 281ms -0.1% SimpleListManipulation: 224ms 229ms -2.5% 234ms 241ms -2.9% SimpleLongArithmetic: 252ms 253ms -0.6% 263ms 266ms -1.0% SmallLists: 290ms 301ms -3.8% 299ms 311ms -3.9% SmallTuples: 254ms 253ms +0.3% 261ms 266ms -1.9% SpecialClassAttribute: 311ms 309ms +0.7% 320ms 321ms -0.2% SpecialInstanceAttribute: 358ms 358ms +0.1% 370ms 371ms -0.4% StringMappings: 817ms 833ms -1.9% 823ms 852ms -3.4% StringPredicates: 488ms 538ms -9.2% 495ms 547ms -9.4% StringSlicing: 295ms 296ms -0.2% 306ms 323ms -5.5% TryExcept: 282ms 280ms +1.0% 291ms 288ms +1.0% TryFinally: 290ms 255ms +14.1% 300ms 263ms +14.1% TryRaiseExcept: 261ms 256ms +1.7% 271ms 263ms +3.1% TupleSlicing: 281ms 270ms +4.3% 289ms 277ms +4.3% UnicodeMappings: 329ms 337ms -2.4% 337ms 348ms -3.0% UnicodePredicates: 295ms 329ms -10.4% 303ms 338ms -10.5% UnicodeProperties: 272ms 307ms -11.4% 284ms 313ms -9.3% UnicodeSlicing: 258ms 261ms -1.2% 266ms 275ms -3.2% WithFinally: 384ms 398ms -3.6% 394ms 408ms -3.5% WithRaiseExcept: 320ms 294ms +8.9% 336ms 304ms +10.6% -------------------------------------------------------------------------------
msg87870 - (view)	Author: Antoine Pitrou (pitrou) *	Date: 2009-05-16 09:13
Can you give results of the "richards" benchmark?
msg87889 - (view)	Author: Benjamin Peterson (benjamin.peterson) *	Date: 2009-05-16 12:49
With the patch: Richards benchmark (Python) starting... [<function entry_point at 0x63b430>] finished. Total time for 10 iterations: 8.49 secs Average time per iteration: 848.90 ms Without: Richards benchmark (Python) starting... [<function entry_point at 0x637530>] finished. Total time for 10 iterations: 10.46 secs Average time per iteration: 1045.88 ms
msg87896 - (view)	Author: Antoine Pitrou (pitrou) *	Date: 2009-05-16 14:25
Similar results here. With the patch: Richards benchmark (Python) starting... [<function entry_point at 0x7fc15b3f5848>] finished. Total time for 4 iterations: 1.78 secs Average time per iteration: 443.90 ms Without the patch: Richards benchmark (Python) starting... [<function entry_point at 0x7f8b9fb20848>] finished. Total time for 4 iterations: 2.02 secs Average time per iteration: 503.79 ms
msg95707 - (view)	Author: Reid Kleckner (rnk)	Date: 2009-11-25 02:21
One thing I was wondering about the current patch is what about objects that have attributes that shadow methods? For example: class C(object): def foo(self): return 1 c = c() print c.foo() c.foo = lambda: 2 print c.foo() Shouldn't the above print 1 and 2? With the current patch, it seems that you might still print 1. There's also the possible performance drawback where you're loading builtin C methods, so the optimization fails, but you end up calling _PyType_Lookup twice. :( I'm doing the same optimization for unladen swallow, and these were some of the things I ran into. I think I'm going to write a PyObject_GetMethod that tries to get a method without binding it, but if it can't for any reason, it behaves just like PyObject_GetAttr and sets a status code.
msg95710 - (view)	Author: Benjamin Peterson (benjamin.peterson) *	Date: 2009-11-25 03:27
Yes, my patch introduces that regression you mention. PyPy solves this by having the instances dictionary keep track of shadowing of the type dictionary. Not easy for CPython... I wish you luck on your patch!
msg110253 - (view)	Author: Reid Kleckner (rnk)	Date: 2010-07-14 04:56
I have an patch for unladen-swallow out for review here: http://codereview.appspot.com/160063/show It resolves the correctness issues I mentioned previously by emitting guards if necessary. If the type is predictable and uses slots, then we don't need to check the instance dict. It gives a 5% speedup on the unpickle benchmark. Presumably the other benchmarks do not do as many method calls.
msg110272 - (view)	Author: Antoine Pitrou (pitrou) *	Date: 2010-07-14 12:08
First, are these benchmark results jitted or non-jitted? Right now, non-jitted results are a stronger motivation for inclusion in main CPython, IMHO. Second, 2.7 won't receive any features / performance improvements anymore. It would be nice to have 3.2 (non-jitted) benchmark results. Third, if removing intermediate allocations is the kind of optimizations a JIT will do anyway, does it make sense or not to make these optimizations explicit at the bytecode level? (this is really an open question, not a rhetorical one)
msg110298 - (view)	Author: Reid Kleckner (rnk)	Date: 2010-07-14 16:45
Sorry, I was just posting it so Benjamin could see what this bought us. I'm not pushing to get this in CPython. The results are for JITed code. I forget what the interpreted results are. I think they are good for the microbenchmarks, but not as good for the macro.
msg168170 - (view)	Author: R. David Murray (r.david.murray) *	Date: 2012-08-14 03:39
Benjamin confirms a regression in his patch, and the other patch was not intended for CPython. So I'm closing this issue.

History
Date	User	Action	Args
2022-04-11 14:56:48	admin	set	github: 50283
2016-01-14 17:55:24	yselivanov	set	superseder: Speedup method calls 1.2x
2012-08-14 03:39:29	r.david.murray	set	status: open -> closed nosy: + r.david.murray messages: + msg168170 stage: patch review -> resolved
2010-07-14 16:45:22	rnk	set	messages: + msg110298
2010-07-14 12:11:47	pitrou	set	stage: patch review versions: - Python 2.7
2010-07-14 12:08:10	pitrou	set	nosy: + collinwinter, jyasskin messages: + msg110272
2010-07-14 04:56:43	rnk	set	messages: + msg110253
2009-11-25 03:27:40	benjamin.peterson	set	messages: + msg95710
2009-11-25 02:21:16	rnk	set	messages: + msg95707
2009-11-24 17:15:04	rnk	set	nosy: + rnk
2009-05-16 14:25:05	pitrou	set	messages: + msg87896
2009-05-16 12:49:25	benjamin.peterson	set	messages: + msg87889
2009-05-16 09:13:54	pitrou	set	nosy: + pitrou messages: + msg87870
2009-05-16 01:43:58	benjamin.peterson	create