This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: LOOKUP_METHOD and CALL_METHOD optimization
Type: performance Stage: resolved
Components: Interpreter Core Versions: Python 3.2
process
Status: closed Resolution:
Dependencies: Superseder: Speedup method calls 1.2x
View: 26110
Assigned To: Nosy List: benjamin.peterson, collinwinter, jyasskin, pitrou, r.david.murray, rnk
Priority: low Keywords: patch

Created on 2009-05-16 01:43 by benjamin.peterson, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
call_method.patch benjamin.peterson, 2009-05-16 01:43
Messages (10)
msg87850 - (view) Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2009-05-16 01:43
This is an optimization ported from PyPy. [1] It tries to prevent bound
methods from being created by using the stack as a cache. I couldn't
apply this to builtin methods because those use a method-wrapper
descriptor. The results were not very impressive. However, I'm attaching
the patch to see if anyone else wants to look at it.

[1]
http://codespeak.net/pypy/dist/pypy/doc/interpreter-optimizations.html#id1


Test                             minimum run-time        average  run-time
                                 this    other   diff    this    other 
 diff
-------------------------------------------------------------------------------
          BuiltinFunctionCalls:   342ms   340ms   +0.6%   378ms   361ms
  +4.7%
           BuiltinMethodLookup:   315ms   308ms   +2.3%   333ms   319ms
  +4.5%
                 CompareFloats:   250ms   251ms   -0.3%   257ms   258ms
  -0.7%
         CompareFloatsIntegers:   266ms   265ms   +0.4%   273ms   273ms
  -0.2%
               CompareIntegers:   233ms   232ms   +0.4%   238ms   238ms
  -0.2%
        CompareInternedStrings:   279ms   275ms   +1.3%   285ms   284ms
  +0.3%
                  CompareLongs:   225ms   223ms   +1.1%   231ms   229ms
  +0.8%
                CompareStrings:   238ms   235ms   +1.3%   244ms   243ms
  +0.6%
                CompareUnicode:   243ms   246ms   -0.9%   252ms   252ms
  -0.1%
    ComplexPythonFunctionCalls:   307ms   301ms   +2.0%   315ms   309ms
  +1.8%
                 ConcatStrings:   372ms   366ms   +1.6%   376ms   385ms
  -2.1%
                 ConcatUnicode:   260ms   259ms   +0.4%   266ms   269ms
  -0.9%
               CreateInstances:   351ms   336ms   +4.7%   365ms   346ms
  +5.5%
            CreateNewInstances:   265ms   256ms   +3.6%   281ms   264ms
  +6.5%
       CreateStringsWithConcat:   290ms   289ms   +0.1%   304ms   301ms
  +1.2%
       CreateUnicodeWithConcat:   220ms   219ms   +0.8%   227ms   223ms
  +1.8%
                  DictCreation:   201ms   200ms   +0.4%   204ms   206ms
  -1.0%
             DictWithFloatKeys:   400ms   418ms   -4.4%   410ms   424ms
  -3.4%
           DictWithIntegerKeys:   298ms   294ms   +1.2%   306ms   304ms
  +0.6%
            DictWithStringKeys:   260ms   264ms   -1.5%   270ms   275ms
  -2.1%
                      ForLoops:   224ms   223ms   +0.2%   232ms   232ms
  +0.2%
                    IfThenElse:   160ms   160ms   +0.0%   168ms   182ms
  -8.0%
                   ListSlicing:   293ms   292ms   +0.5%   302ms   306ms
  -1.2%
                NestedForLoops:   301ms   300ms   +0.2%   305ms   308ms
  -0.8%
      NestedListComprehensions:   323ms   328ms   -1.7%   331ms   335ms
  -1.3%
          NormalClassAttribute:   313ms   314ms   -0.1%   323ms   330ms
  -2.1%
       NormalInstanceAttribute:   284ms   283ms   +0.4%   289ms   288ms
  +0.4%
           PythonFunctionCalls:   259ms   278ms   -6.7%   274ms   289ms
  -5.3%
             PythonMethodCalls:   358ms   357ms   +0.3%   371ms   365ms
  +1.6%
                     Recursion:   389ms   398ms   -2.1%   395ms   407ms
  -2.9%
                  SecondImport:   335ms   319ms   +5.2%   346ms   380ms
  -8.9%
           SecondPackageImport:   338ms   326ms   +3.6%   350ms   337ms
  +4.0%
         SecondSubmoduleImport:   413ms   403ms   +2.5%   426ms   411ms
  +3.6%
       SimpleComplexArithmetic:   341ms   345ms   -1.2%   351ms   355ms
  -1.1%
        SimpleDictManipulation:   288ms   298ms   -3.7%   293ms   303ms
  -3.1%
         SimpleFloatArithmetic:   272ms   275ms   -1.1%   279ms   286ms
  -2.7%
      SimpleIntFloatArithmetic:   211ms   204ms   +3.3%   216ms   215ms
  +0.5%
       SimpleIntegerArithmetic:   207ms   203ms   +1.7%   214ms   213ms
  +0.5%
      SimpleListComprehensions:   275ms   273ms   +0.6%   281ms   281ms
  -0.1%
        SimpleListManipulation:   224ms   229ms   -2.5%   234ms   241ms
  -2.9%
          SimpleLongArithmetic:   252ms   253ms   -0.6%   263ms   266ms
  -1.0%
                    SmallLists:   290ms   301ms   -3.8%   299ms   311ms
  -3.9%
                   SmallTuples:   254ms   253ms   +0.3%   261ms   266ms
  -1.9%
         SpecialClassAttribute:   311ms   309ms   +0.7%   320ms   321ms
  -0.2%
      SpecialInstanceAttribute:   358ms   358ms   +0.1%   370ms   371ms
  -0.4%
                StringMappings:   817ms   833ms   -1.9%   823ms   852ms
  -3.4%
              StringPredicates:   488ms   538ms   -9.2%   495ms   547ms
  -9.4%
                 StringSlicing:   295ms   296ms   -0.2%   306ms   323ms
  -5.5%
                     TryExcept:   282ms   280ms   +1.0%   291ms   288ms
  +1.0%
                    TryFinally:   290ms   255ms  +14.1%   300ms   263ms
 +14.1%
                TryRaiseExcept:   261ms   256ms   +1.7%   271ms   263ms
  +3.1%
                  TupleSlicing:   281ms   270ms   +4.3%   289ms   277ms
  +4.3%
               UnicodeMappings:   329ms   337ms   -2.4%   337ms   348ms
  -3.0%
             UnicodePredicates:   295ms   329ms  -10.4%   303ms   338ms
 -10.5%
             UnicodeProperties:   272ms   307ms  -11.4%   284ms   313ms
  -9.3%
                UnicodeSlicing:   258ms   261ms   -1.2%   266ms   275ms
  -3.2%
                   WithFinally:   384ms   398ms   -3.6%   394ms   408ms
  -3.5%
               WithRaiseExcept:   320ms   294ms   +8.9%   336ms   304ms
 +10.6%
-------------------------------------------------------------------------------
msg87870 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2009-05-16 09:13
Can you give results of the "richards" benchmark?
msg87889 - (view) Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2009-05-16 12:49
With the patch:

Richards benchmark (Python) starting... [<function entry_point at 0x63b430>]
finished.
Total time for 10 iterations: 8.49 secs
Average time per iteration: 848.90 ms

Without:

Richards benchmark (Python) starting... [<function entry_point at 0x637530>]
finished.
Total time for 10 iterations: 10.46 secs
Average time per iteration: 1045.88 ms
msg87896 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2009-05-16 14:25
Similar results here.
With the patch:

Richards benchmark (Python) starting... [<function entry_point at
0x7fc15b3f5848>]
finished.
Total time for 4 iterations: 1.78 secs
Average time per iteration: 443.90 ms

Without the patch:

Richards benchmark (Python) starting... [<function entry_point at
0x7f8b9fb20848>]
finished.
Total time for 4 iterations: 2.02 secs
Average time per iteration: 503.79 ms
msg95707 - (view) Author: Reid Kleckner (rnk) (Python committer) Date: 2009-11-25 02:21
One thing I was wondering about the current patch is what about objects
that have attributes that shadow methods?  For example:

class C(object):
    def foo(self):
        return 1
c = c()
print c.foo()
c.foo = lambda: 2
print c.foo()

Shouldn't the above print 1 and 2?  With the current patch, it seems
that you might still print 1.

There's also the possible performance drawback where you're loading
builtin C methods, so the optimization fails, but you end up calling
_PyType_Lookup twice.  :(

I'm doing the same optimization for unladen swallow, and these were some
of the things I ran into.  I think I'm going to write a
PyObject_GetMethod that tries to get a method without binding it, but if
it can't for any reason, it behaves just like PyObject_GetAttr and sets
a status code.
msg95710 - (view) Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2009-11-25 03:27
Yes, my patch introduces that regression you mention. PyPy solves this
by having the instances dictionary keep track of shadowing of the type
dictionary. Not easy for CPython... I wish you luck on your patch!
msg110253 - (view) Author: Reid Kleckner (rnk) (Python committer) Date: 2010-07-14 04:56
I have an patch for unladen-swallow out for review here:
http://codereview.appspot.com/160063/show

It resolves the correctness issues I mentioned previously by emitting guards if necessary.  If the type is predictable and uses slots, then we don't need to check the instance dict.

It gives a 5% speedup on the unpickle benchmark.  Presumably the other benchmarks do not do as many method calls.
msg110272 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-07-14 12:08
First, are these benchmark results jitted or non-jitted? Right now, non-jitted results are a stronger motivation for inclusion in main CPython, IMHO. 

Second, 2.7 won't receive any features / performance improvements anymore. It would be nice to have 3.2 (non-jitted) benchmark results.

Third, if removing intermediate allocations is the kind of optimizations a JIT will do anyway, does it make sense or not to make these optimizations explicit at the bytecode level? (this is really an open question, not a rhetorical one)
msg110298 - (view) Author: Reid Kleckner (rnk) (Python committer) Date: 2010-07-14 16:45
Sorry, I was just posting it so Benjamin could see what this bought us.  I'm not pushing to get this in CPython.

The results are for JITed code.  I forget what the interpreted results are.  I think they are good for the microbenchmarks, but not as good for the macro.
msg168170 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2012-08-14 03:39
Benjamin confirms a regression in his patch, and the other patch was not intended for CPython.  So I'm closing this issue.
History
Date User Action Args
2022-04-11 14:56:48adminsetgithub: 50283
2016-01-14 17:55:24yselivanovsetsuperseder: Speedup method calls 1.2x
2012-08-14 03:39:29r.david.murraysetstatus: open -> closed

nosy: + r.david.murray
messages: + msg168170

stage: patch review -> resolved
2010-07-14 16:45:22rnksetmessages: + msg110298
2010-07-14 12:11:47pitrousetstage: patch review
versions: - Python 2.7
2010-07-14 12:08:10pitrousetnosy: + collinwinter, jyasskin
messages: + msg110272
2010-07-14 04:56:43rnksetmessages: + msg110253
2009-11-25 03:27:40benjamin.petersonsetmessages: + msg95710
2009-11-25 02:21:16rnksetmessages: + msg95707
2009-11-24 17:15:04rnksetnosy: + rnk
2009-05-16 14:25:05pitrousetmessages: + msg87896
2009-05-16 12:49:25benjamin.petersonsetmessages: + msg87889
2009-05-16 09:13:54pitrousetnosy: + pitrou
messages: + msg87870
2009-05-16 01:43:58benjamin.petersoncreate