
LOOKUP_METHOD and CALL_METHOD optimization #50283

Closed
benjaminp opened this issue May 16, 2009 · 10 comments
Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs) performance Performance or resource usage

Comments

@benjaminp
Contributor

BPO 6033
Nosy @pitrou, @benjaminp, @bitdancer, @rnk
Superseder
  • bpo-26110: Speedup method calls 1.2x
Files
  • call_method.patch
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = None
    closed_at = <Date 2012-08-14.03:39:29.384>
    created_at = <Date 2009-05-16.01:43:58.126>
    labels = ['interpreter-core', 'performance']
    title = 'LOOKUP_METHOD and CALL_METHOD optimization'
    updated_at = <Date 2016-01-14.17:55:24.378>
    user = 'https://github.com/benjaminp'

    bugs.python.org fields:

    activity = <Date 2016-01-14.17:55:24.378>
    actor = 'yselivanov'
    assignee = 'none'
    closed = True
    closed_date = <Date 2012-08-14.03:39:29.384>
    closer = 'r.david.murray'
    components = ['Interpreter Core']
    creation = <Date 2009-05-16.01:43:58.126>
    creator = 'benjamin.peterson'
    dependencies = []
    files = ['13992']
    hgrepos = []
    issue_num = 6033
    keywords = ['patch']
    message_count = 10.0
    messages = ['87850', '87870', '87889', '87896', '95707', '95710', '110253', '110272', '110298', '168170']
    nosy_count = 6.0
    nosy_names = ['collinwinter', 'pitrou', 'jyasskin', 'benjamin.peterson', 'r.david.murray', 'rnk']
    pr_nums = []
    priority = 'low'
    resolution = None
    stage = 'resolved'
    status = 'closed'
    superseder = '26110'
    type = 'performance'
    url = 'https://bugs.python.org/issue6033'
    versions = ['Python 3.2']

    @benjaminp
    Contributor Author

    This is an optimization ported from PyPy. [1] It tries to prevent bound
    methods from being created by using the stack as a cache. I couldn't
    apply this to builtin methods because those use a method-wrapper
    descriptor. The results were not very impressive. However, I'm attaching
    the patch to see if anyone else wants to look at it.

    [1]
    http://codespeak.net/pypy/dist/pypy/doc/interpreter-optimizations.html#id1
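
A rough pure-Python sketch of the idea (illustration only, not the patch itself):

class C:
    def foo(self):
        return 1

c = C()

bound = c.foo                  # descriptor protocol: C.__dict__['foo'].__get__(c, C)
print(bound())                 # 1 -- the bound object exists only to be called once

# LOOKUP_METHOD/CALL_METHOD keep the raw function and the receiver on the
# value stack and call them together, roughly equivalent to:
print(C.__dict__['foo'](c))    # 1 -- no intermediate bound method is allocated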

Test                        minimum run-time          average run-time
                             this   other     diff    this   other     diff
-------------------------------------------------------------------------------
BuiltinFunctionCalls:       342ms   340ms    +0.6%   378ms   361ms    +4.7%
BuiltinMethodLookup:        315ms   308ms    +2.3%   333ms   319ms    +4.5%
CompareFloats:              250ms   251ms    -0.3%   257ms   258ms    -0.7%
CompareFloatsIntegers:      266ms   265ms    +0.4%   273ms   273ms    -0.2%
CompareIntegers:            233ms   232ms    +0.4%   238ms   238ms    -0.2%
CompareInternedStrings:     279ms   275ms    +1.3%   285ms   284ms    +0.3%
CompareLongs:               225ms   223ms    +1.1%   231ms   229ms    +0.8%
CompareStrings:             238ms   235ms    +1.3%   244ms   243ms    +0.6%
CompareUnicode:             243ms   246ms    -0.9%   252ms   252ms    -0.1%
ComplexPythonFunctionCalls: 307ms   301ms    +2.0%   315ms   309ms    +1.8%
ConcatStrings:              372ms   366ms    +1.6%   376ms   385ms    -2.1%
ConcatUnicode:              260ms   259ms    +0.4%   266ms   269ms    -0.9%
CreateInstances:            351ms   336ms    +4.7%   365ms   346ms    +5.5%
CreateNewInstances:         265ms   256ms    +3.6%   281ms   264ms    +6.5%
CreateStringsWithConcat:    290ms   289ms    +0.1%   304ms   301ms    +1.2%
CreateUnicodeWithConcat:    220ms   219ms    +0.8%   227ms   223ms    +1.8%
DictCreation:               201ms   200ms    +0.4%   204ms   206ms    -1.0%
DictWithFloatKeys:          400ms   418ms    -4.4%   410ms   424ms    -3.4%
DictWithIntegerKeys:        298ms   294ms    +1.2%   306ms   304ms    +0.6%
DictWithStringKeys:         260ms   264ms    -1.5%   270ms   275ms    -2.1%
ForLoops:                   224ms   223ms    +0.2%   232ms   232ms    +0.2%
IfThenElse:                 160ms   160ms    +0.0%   168ms   182ms    -8.0%
ListSlicing:                293ms   292ms    +0.5%   302ms   306ms    -1.2%
NestedForLoops:             301ms   300ms    +0.2%   305ms   308ms    -0.8%
NestedListComprehensions:   323ms   328ms    -1.7%   331ms   335ms    -1.3%
NormalClassAttribute:       313ms   314ms    -0.1%   323ms   330ms    -2.1%
NormalInstanceAttribute:    284ms   283ms    +0.4%   289ms   288ms    +0.4%
PythonFunctionCalls:        259ms   278ms    -6.7%   274ms   289ms    -5.3%
PythonMethodCalls:          358ms   357ms    +0.3%   371ms   365ms    +1.6%
Recursion:                  389ms   398ms    -2.1%   395ms   407ms    -2.9%
SecondImport:               335ms   319ms    +5.2%   346ms   380ms    -8.9%
SecondPackageImport:        338ms   326ms    +3.6%   350ms   337ms    +4.0%
SecondSubmoduleImport:      413ms   403ms    +2.5%   426ms   411ms    +3.6%
SimpleComplexArithmetic:    341ms   345ms    -1.2%   351ms   355ms    -1.1%
SimpleDictManipulation:     288ms   298ms    -3.7%   293ms   303ms    -3.1%
SimpleFloatArithmetic:      272ms   275ms    -1.1%   279ms   286ms    -2.7%
SimpleIntFloatArithmetic:   211ms   204ms    +3.3%   216ms   215ms    +0.5%
SimpleIntegerArithmetic:    207ms   203ms    +1.7%   214ms   213ms    +0.5%
SimpleListComprehensions:   275ms   273ms    +0.6%   281ms   281ms    -0.1%
SimpleListManipulation:     224ms   229ms    -2.5%   234ms   241ms    -2.9%
SimpleLongArithmetic:       252ms   253ms    -0.6%   263ms   266ms    -1.0%
SmallLists:                 290ms   301ms    -3.8%   299ms   311ms    -3.9%
SmallTuples:                254ms   253ms    +0.3%   261ms   266ms    -1.9%
SpecialClassAttribute:      311ms   309ms    +0.7%   320ms   321ms    -0.2%
SpecialInstanceAttribute:   358ms   358ms    +0.1%   370ms   371ms    -0.4%
StringMappings:             817ms   833ms    -1.9%   823ms   852ms    -3.4%
StringPredicates:           488ms   538ms    -9.2%   495ms   547ms    -9.4%
StringSlicing:              295ms   296ms    -0.2%   306ms   323ms    -5.5%
TryExcept:                  282ms   280ms    +1.0%   291ms   288ms    +1.0%
TryFinally:                 290ms   255ms   +14.1%   300ms   263ms   +14.1%
TryRaiseExcept:             261ms   256ms    +1.7%   271ms   263ms    +3.1%
TupleSlicing:               281ms   270ms    +4.3%   289ms   277ms    +4.3%
UnicodeMappings:            329ms   337ms    -2.4%   337ms   348ms    -3.0%
UnicodePredicates:          295ms   329ms   -10.4%   303ms   338ms   -10.5%
UnicodeProperties:          272ms   307ms   -11.4%   284ms   313ms    -9.3%
UnicodeSlicing:             258ms   261ms    -1.2%   266ms   275ms    -3.2%
WithFinally:                384ms   398ms    -3.6%   394ms   408ms    -3.5%
WithRaiseExcept:            320ms   294ms    +8.9%   336ms   304ms   +10.6%
-------------------------------------------------------------------------------

    @benjaminp benjaminp added interpreter-core (Objects, Python, Grammar, and Parser dirs) performance Performance or resource usage labels May 16, 2009
    @pitrou
    Member

    pitrou commented May 16, 2009

    Can you give results of the "richards" benchmark?

    @benjaminp
    Contributor Author

    With the patch:

    Richards benchmark (Python) starting... [<function entry_point at 0x63b430>]
    finished.
    Total time for 10 iterations: 8.49 secs
    Average time per iteration: 848.90 ms

    Without:

    Richards benchmark (Python) starting... [<function entry_point at 0x637530>]
    finished.
    Total time for 10 iterations: 10.46 secs
    Average time per iteration: 1045.88 ms

    @pitrou
    Member

    pitrou commented May 16, 2009

    Similar results here.
    With the patch:

Richards benchmark (Python) starting... [<function entry_point at 0x7fc15b3f5848>]
    finished.
    Total time for 4 iterations: 1.78 secs
    Average time per iteration: 443.90 ms

    Without the patch:

Richards benchmark (Python) starting... [<function entry_point at 0x7f8b9fb20848>]
    finished.
    Total time for 4 iterations: 2.02 secs
    Average time per iteration: 503.79 ms

    @rnk
    Mannequin

    rnk mannequin commented Nov 25, 2009

    One thing I was wondering about the current patch is what about objects
    that have attributes that shadow methods? For example:

class C(object):
    def foo(self):
        return 1
c = C()
print c.foo()
c.foo = lambda: 2
print c.foo()

Shouldn't the above print 1 and then 2? With the current patch, it seems
the second call might still print 1.

There's also a possible performance drawback when loading builtin C
methods: the optimization fails, and you end up calling _PyType_Lookup
twice. :(

    I'm doing the same optimization for unladen swallow, and these were some
    of the things I ran into. I think I'm going to write a
    PyObject_GetMethod that tries to get a method without binding it, but if
    it can't for any reason, it behaves just like PyObject_GetAttr and sets
    a status code.
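
Roughly, the interface I have in mind, modeled in Python (the real thing
would be a C-level API; the names here are hypothetical):

import types

def get_method(obj, name):
    """Return (callable, needs_self). needs_self=True means the caller must
    pass obj explicitly; False means we fell back to ordinary lookup."""
    for klass in type(obj).__mro__:
        if name in klass.__dict__:
            attr = klass.__dict__[name]
            # Only a plain function that is not shadowed in the instance dict
            # can safely skip binding; anything else (classmethod, property,
            # builtin method-wrapper, shadowed attribute) takes the slow path.
            if (isinstance(attr, types.FunctionType)
                    and name not in getattr(obj, '__dict__', {})):
                return attr, True
            break
    return getattr(obj, name), False

# Caller side, mirroring what CALL_METHOD would do:
#     func, needs_self = get_method(c, 'foo')
#     result = func(c) if needs_self else func()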

    @benjaminp
    Contributor Author

Yes, my patch introduces the regression you mention. PyPy solves this
by having the instance dictionary track whether it shadows anything in
the type dictionary. Not easy for CPython... I wish you luck on your patch!

    @rnk
    Mannequin

    rnk mannequin commented Jul 14, 2010

I have a patch for unladen-swallow out for review here:
    http://codereview.appspot.com/160063/show

    It resolves the correctness issues I mentioned previously by emitting guards if necessary. If the type is predictable and uses slots, then we don't need to check the instance dict.

    It gives a 5% speedup on the unpickle benchmark. Presumably the other benchmarks do not do as many method calls.
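
In rough Python terms, the guard looks something like this (a sketch of the
idea only, not the actual generated code):

class C:
    def foo(self):
        return 1

# The JIT predicts the receiver's type at a call site and caches the
# unbound function, then guards the fast path at run time.
cached_type, cached_func = C, C.__dict__['foo']

def call_foo_specialized(obj):
    # Guard: exact predicted type, and no instance attribute shadowing 'foo'
    # (the shadow check can be dropped when the type uses __slots__).
    if type(obj) is cached_type and 'foo' not in getattr(obj, '__dict__', {}):
        return cached_func(obj)    # fast path: no bound method is created
    return obj.foo()               # bail out to the generic lookup and call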

    @pitrou
    Member

    pitrou commented Jul 14, 2010

    First, are these benchmark results jitted or non-jitted? Right now, non-jitted results are a stronger motivation for inclusion in main CPython, IMHO.

    Second, 2.7 won't receive any features / performance improvements anymore. It would be nice to have 3.2 (non-jitted) benchmark results.

Third, if removing intermediate allocations is the kind of optimization a JIT will do anyway, does it make sense to make these optimizations explicit at the bytecode level? (This is really an open question, not a rhetorical one.)

    @rnk
    Mannequin

    rnk mannequin commented Jul 14, 2010

    Sorry, I was just posting it so Benjamin could see what this bought us. I'm not pushing to get this in CPython.

    The results are for JITed code. I forget what the interpreted results are. I think they are good for the microbenchmarks, but not as good for the macro.

    @bitdancer
    Member

    Benjamin confirms a regression in his patch, and the other patch was not intended for CPython. So I'm closing this issue.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022