
LOOKUP_METHOD and CALL_METHOD optimization #50283

Closed
benjaminp opened this issue May 16, 2009 · 10 comments
Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs) performance Performance or resource usage

Comments

@benjaminp
Contributor

BPO 6033
Nosy @pitrou, @benjaminp, @bitdancer, @rnk
Superseder
  • bpo-26110: Speedup method calls 1.2x
Files
  • call_method.patch
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = None
    closed_at = <Date 2012-08-14.03:39:29.384>
    created_at = <Date 2009-05-16.01:43:58.126>
    labels = ['interpreter-core', 'performance']
    title = 'LOOKUP_METHOD and CALL_METHOD optimization'
    updated_at = <Date 2016-01-14.17:55:24.378>
    user = 'https://github.com/benjaminp'

    bugs.python.org fields:

    activity = <Date 2016-01-14.17:55:24.378>
    actor = 'yselivanov'
    assignee = 'none'
    closed = True
    closed_date = <Date 2012-08-14.03:39:29.384>
    closer = 'r.david.murray'
    components = ['Interpreter Core']
    creation = <Date 2009-05-16.01:43:58.126>
    creator = 'benjamin.peterson'
    dependencies = []
    files = ['13992']
    hgrepos = []
    issue_num = 6033
    keywords = ['patch']
    message_count = 10.0
    messages = ['87850', '87870', '87889', '87896', '95707', '95710', '110253', '110272', '110298', '168170']
    nosy_count = 6.0
    nosy_names = ['collinwinter', 'pitrou', 'jyasskin', 'benjamin.peterson', 'r.david.murray', 'rnk']
    pr_nums = []
    priority = 'low'
    resolution = None
    stage = 'resolved'
    status = 'closed'
    superseder = '26110'
    type = 'performance'
    url = 'https://bugs.python.org/issue6033'
    versions = ['Python 3.2']

    @benjaminp
    Contributor Author

    This is an optimization ported from PyPy. [1] It tries to prevent bound
    methods from being created by using the stack as a cache. I couldn't
    apply this to builtin methods because those use a method-wrapper
    descriptor. The results were not very impressive. However, I'm attaching
    the patch to see if anyone else wants to look at it.

    [1]
    http://codespeak.net/pypy/dist/pypy/doc/interpreter-optimizations.html#id1
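
A rough pure-Python sketch of the idea (illustration only, not the patch itself):

class C:
    def foo(self):
        return 1

c = C()

bound = c.foo                  # descriptor protocol: C.__dict__['foo'].__get__(c, C)
print(bound())                 # 1 -- the bound object exists only to be called once

# LOOKUP_METHOD/CALL_METHOD keep the raw function and the receiver on the
# value stack and call them together, roughly equivalent to:
print(C.__dict__['foo'](c))    # 1 -- no intermediate bound method is allocated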

Test                        minimum run-time          average run-time
                             this   other     diff    this   other     diff
-------------------------------------------------------------------------------
BuiltinFunctionCalls:       342ms   340ms    +0.6%   378ms   361ms    +4.7%
BuiltinMethodLookup:        315ms   308ms    +2.3%   333ms   319ms    +4.5%
CompareFloats:              250ms   251ms    -0.3%   257ms   258ms    -0.7%
CompareFloatsIntegers:      266ms   265ms    +0.4%   273ms   273ms    -0.2%
CompareIntegers:            233ms   232ms    +0.4%   238ms   238ms    -0.2%
CompareInternedStrings:     279ms   275ms    +1.3%   285ms   284ms    +0.3%
CompareLongs:               225ms   223ms    +1.1%   231ms   229ms    +0.8%
CompareStrings:             238ms   235ms    +1.3%   244ms   243ms    +0.6%
CompareUnicode:             243ms   246ms    -0.9%   252ms   252ms    -0.1%
ComplexPythonFunctionCalls: 307ms   301ms    +2.0%   315ms   309ms    +1.8%
ConcatStrings:              372ms   366ms    +1.6%   376ms   385ms    -2.1%
ConcatUnicode:              260ms   259ms    +0.4%   266ms   269ms    -0.9%
CreateInstances:            351ms   336ms    +4.7%   365ms   346ms    +5.5%
CreateNewInstances:         265ms   256ms    +3.6%   281ms   264ms    +6.5%
CreateStringsWithConcat:    290ms   289ms    +0.1%   304ms   301ms    +1.2%
CreateUnicodeWithConcat:    220ms   219ms    +0.8%   227ms   223ms    +1.8%
DictCreation:               201ms   200ms    +0.4%   204ms   206ms    -1.0%
DictWithFloatKeys:          400ms   418ms    -4.4%   410ms   424ms    -3.4%
DictWithIntegerKeys:        298ms   294ms    +1.2%   306ms   304ms    +0.6%
DictWithStringKeys:         260ms   264ms    -1.5%   270ms   275ms    -2.1%
ForLoops:                   224ms   223ms    +0.2%   232ms   232ms    +0.2%
IfThenElse:                 160ms   160ms    +0.0%   168ms   182ms    -8.0%
ListSlicing:                293ms   292ms    +0.5%   302ms   306ms    -1.2%
NestedForLoops:             301ms   300ms    +0.2%   305ms   308ms    -0.8%
NestedListComprehensions:   323ms   328ms    -1.7%   331ms   335ms    -1.3%
NormalClassAttribute:       313ms   314ms    -0.1%   323ms   330ms    -2.1%
NormalInstanceAttribute:    284ms   283ms    +0.4%   289ms   288ms    +0.4%
PythonFunctionCalls:        259ms   278ms    -6.7%   274ms   289ms    -5.3%
PythonMethodCalls:          358ms   357ms    +0.3%   371ms   365ms    +1.6%
Recursion:                  389ms   398ms    -2.1%   395ms   407ms    -2.9%
SecondImport:               335ms   319ms    +5.2%   346ms   380ms    -8.9%
SecondPackageImport:        338ms   326ms    +3.6%   350ms   337ms    +4.0%
SecondSubmoduleImport:      413ms   403ms    +2.5%   426ms   411ms    +3.6%
SimpleComplexArithmetic:    341ms   345ms    -1.2%   351ms   355ms    -1.1%
SimpleDictManipulation:     288ms   298ms    -3.7%   293ms   303ms    -3.1%
SimpleFloatArithmetic:      272ms   275ms    -1.1%   279ms   286ms    -2.7%
SimpleIntFloatArithmetic:   211ms   204ms    +3.3%   216ms   215ms    +0.5%
SimpleIntegerArithmetic:    207ms   203ms    +1.7%   214ms   213ms    +0.5%
SimpleListComprehensions:   275ms   273ms    +0.6%   281ms   281ms    -0.1%
SimpleListManipulation:     224ms   229ms    -2.5%   234ms   241ms    -2.9%
SimpleLongArithmetic:       252ms   253ms    -0.6%   263ms   266ms    -1.0%
SmallLists:                 290ms   301ms    -3.8%   299ms   311ms    -3.9%
SmallTuples:                254ms   253ms    +0.3%   261ms   266ms    -1.9%
SpecialClassAttribute:      311ms   309ms    +0.7%   320ms   321ms    -0.2%
SpecialInstanceAttribute:   358ms   358ms    +0.1%   370ms   371ms    -0.4%
StringMappings:             817ms   833ms    -1.9%   823ms   852ms    -3.4%
StringPredicates:           488ms   538ms    -9.2%   495ms   547ms    -9.4%
StringSlicing:              295ms   296ms    -0.2%   306ms   323ms    -5.5%
TryExcept:                  282ms   280ms    +1.0%   291ms   288ms    +1.0%
TryFinally:                 290ms   255ms   +14.1%   300ms   263ms   +14.1%
TryRaiseExcept:             261ms   256ms    +1.7%   271ms   263ms    +3.1%
TupleSlicing:               281ms   270ms    +4.3%   289ms   277ms    +4.3%
UnicodeMappings:            329ms   337ms    -2.4%   337ms   348ms    -3.0%
UnicodePredicates:          295ms   329ms   -10.4%   303ms   338ms   -10.5%
UnicodeProperties:          272ms   307ms   -11.4%   284ms   313ms    -9.3%
UnicodeSlicing:             258ms   261ms    -1.2%   266ms   275ms    -3.2%
WithFinally:                384ms   398ms    -3.6%   394ms   408ms    -3.5%
WithRaiseExcept:            320ms   294ms    +8.9%   336ms   304ms   +10.6%
-------------------------------------------------------------------------------

    @benjaminp benjaminp added interpreter-core (Objects, Python, Grammar, and Parser dirs) performance Performance or resource usage labels May 16, 2009
    @pitrou
    Member

    pitrou commented May 16, 2009

    Can you give results of the "richards" benchmark?

    @benjaminp
    Contributor Author

    With the patch:

    Richards benchmark (Python) starting... [<function entry_point at 0x63b430>]
    finished.
    Total time for 10 iterations: 8.49 secs
    Average time per iteration: 848.90 ms

    Without:

    Richards benchmark (Python) starting... [<function entry_point at 0x637530>]
    finished.
    Total time for 10 iterations: 10.46 secs
    Average time per iteration: 1045.88 ms

    @pitrou
    Member

    pitrou commented May 16, 2009

    Similar results here.
    With the patch:

Richards benchmark (Python) starting... [<function entry_point at 0x7fc15b3f5848>]
    finished.
    Total time for 4 iterations: 1.78 secs
    Average time per iteration: 443.90 ms

    Without the patch:

Richards benchmark (Python) starting... [<function entry_point at 0x7f8b9fb20848>]
    finished.
    Total time for 4 iterations: 2.02 secs
    Average time per iteration: 503.79 ms

    @rnk
    Mannequin

    rnk mannequin commented Nov 25, 2009

    One thing I was wondering about the current patch is what about objects
    that have attributes that shadow methods? For example:

class C(object):
    def foo(self):
        return 1
c = C()
print c.foo()
c.foo = lambda: 2
print c.foo()

Shouldn't the above print 1 and then 2? With the current patch, it seems
the second call might still print 1.

There's also a possible performance drawback when loading builtin C
methods: the optimization fails, and you end up calling _PyType_Lookup
twice. :(

    I'm doing the same optimization for unladen swallow, and these were some
    of the things I ran into. I think I'm going to write a
    PyObject_GetMethod that tries to get a method without binding it, but if
    it can't for any reason, it behaves just like PyObject_GetAttr and sets
    a status code.
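
Roughly, the interface I have in mind, modeled in Python (the real thing
would be a C-level API; the names here are hypothetical):

import types

def get_method(obj, name):
    """Return (callable, needs_self). needs_self=True means the caller must
    pass obj explicitly; False means we fell back to ordinary lookup."""
    for klass in type(obj).__mro__:
        if name in klass.__dict__:
            attr = klass.__dict__[name]
            # Only a plain function that is not shadowed in the instance dict
            # can safely skip binding; anything else (classmethod, property,
            # builtin method-wrapper, shadowed attribute) takes the slow path.
            if (isinstance(attr, types.FunctionType)
                    and name not in getattr(obj, '__dict__', {})):
                return attr, True
            break
    return getattr(obj, name), False

# Caller side, mirroring what CALL_METHOD would do:
#     func, needs_self = get_method(c, 'foo')
#     result = func(c) if needs_self else func()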

    @benjaminp
    Contributor Author

Yes, my patch introduces the regression you mention. PyPy solves this
by having the instance dictionary track whether it shadows anything in
the type dictionary. Not easy for CPython... I wish you luck on your patch!

    @rnk
    Mannequin

    rnk mannequin commented Jul 14, 2010

I have a patch for unladen-swallow out for review here:
    http://codereview.appspot.com/160063/show

    It resolves the correctness issues I mentioned previously by emitting guards if necessary. If the type is predictable and uses slots, then we don't need to check the instance dict.

    It gives a 5% speedup on the unpickle benchmark. Presumably the other benchmarks do not do as many method calls.
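
In rough Python terms, the guard looks something like this (a sketch of the
idea only, not the actual generated code):

class C:
    def foo(self):
        return 1

# The JIT predicts the receiver's type at a call site and caches the
# unbound function, then guards the fast path at run time.
cached_type, cached_func = C, C.__dict__['foo']

def call_foo_specialized(obj):
    # Guard: exact predicted type, and no instance attribute shadowing 'foo'
    # (the shadow check can be dropped when the type uses __slots__).
    if type(obj) is cached_type and 'foo' not in getattr(obj, '__dict__', {}):
        return cached_func(obj)    # fast path: no bound method is created
    return obj.foo()               # bail out to the generic lookup and call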

    @pitrou
    Member

    pitrou commented Jul 14, 2010

    First, are these benchmark results jitted or non-jitted? Right now, non-jitted results are a stronger motivation for inclusion in main CPython, IMHO.

    Second, 2.7 won't receive any features / performance improvements anymore. It would be nice to have 3.2 (non-jitted) benchmark results.

Third, if removing intermediate allocations is the kind of optimization a JIT will do anyway, does it make sense to make these optimizations explicit at the bytecode level? (This is really an open question, not a rhetorical one.)

    @rnk
    Mannequin

    rnk mannequin commented Jul 14, 2010

    Sorry, I was just posting it so Benjamin could see what this bought us. I'm not pushing to get this in CPython.

    The results are for JITed code. I forget what the interpreted results are. I think they are good for the microbenchmarks, but not as good for the macro.

    @bitdancer
    Member

    Benjamin confirms a regression in his patch, and the other patch was not intended for CPython. So I'm closing this issue.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022