Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

remove free_list for bound method objects #81521

Closed
methane opened this issue Jun 19, 2019 · 14 comments
Closed

remove free_list for bound method objects #81521

methane opened this issue Jun 19, 2019 · 14 comments
Labels
3.9 only security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) performance Performance or resource usage

Comments

@methane
Copy link
Member

methane commented Jun 19, 2019

BPO 37340
Nosy @rhettinger, @vstinner, @methane, @ericsnowcurrently, @jdemeyer, @MojoVampire
PRs
  • bpo-37340: remove free_list for bound method objects #14232
  • bpo-37340: Remove PyMethod_ClearFreeList() and PyCFunction_ClearFreeList() #17284
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = None
    created_at = <Date 2019-06-19.12:54:21.485>
    labels = ['interpreter-core', '3.9', 'performance']
    title = 'remove free_list for bound method objects'
    updated_at = <Date 2020-04-28.21:28:25.171>
    user = 'https://github.com/methane'

    bugs.python.org fields:

    activity = <Date 2020-04-28.21:28:25.171>
    actor = 'vstinner'
    assignee = 'none'
    closed = False
    closed_date = None
    closer = None
    components = ['Interpreter Core']
    creation = <Date 2019-06-19.12:54:21.485>
    creator = 'methane'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 37340
    keywords = ['patch']
    message_count = 14.0
    messages = ['346040', '346041', '346042', '346043', '346044', '346045', '346057', '346094', '348425', '348465', '357065', '367557', '367563', '367565']
    nosy_count = 6.0
    nosy_names = ['rhettinger', 'vstinner', 'methane', 'eric.snow', 'jdemeyer', 'josh.r']
    pr_nums = ['14232', '17284']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'open'
    superseder = None
    type = 'performance'
    url = 'https://bugs.python.org/issue37340'
    versions = ['Python 3.9']

    @methane
    Copy link
    Member Author

    methane commented Jun 19, 2019

    LOAD_METHOD avoids temporary bound method object.
    PyObject_CallMethodObjArgs now use same optimization.

    Now I think there is not enough performance benefit from free_list.
    When free_list is not used often enough, it may bother obmalloc reuse memory pool.

    This is performance diff of removing free_list (with LTO, without PGO, patched=removed free_list):

    $ ./python -m pyperf compare_to master.json patched.json -G --min-speed=1
    Slower (19):
    - sqlite_synth: 4.03 us +- 0.10 us -> 4.20 us +- 0.08 us: 1.04x slower (+4%)
    - genshi_text: 41.2 ms +- 0.4 ms -> 42.6 ms +- 0.4 ms: 1.03x slower (+3%)
    - scimark_sparse_mat_mult: 6.29 ms +- 0.03 ms -> 6.50 ms +- 0.50 ms: 1.03x slower (+3%)
    - mako: 26.5 ms +- 0.1 ms -> 27.4 ms +- 0.3 ms: 1.03x slower (+3%)
    - html5lib: 130 ms +- 4 ms -> 134 ms +- 5 ms: 1.03x slower (+3%)
    - genshi_xml: 83.4 ms +- 1.1 ms -> 85.6 ms +- 1.2 ms: 1.03x slower (+3%)
    - pickle: 15.1 us +- 0.5 us -> 15.5 us +- 0.5 us: 1.03x slower (+3%)
    - float: 161 ms +- 1 ms -> 165 ms +- 1 ms: 1.02x slower (+2%)
    - logging_simple: 13.9 us +- 0.2 us -> 14.2 us +- 0.2 us: 1.02x slower (+2%)
    - xml_etree_process: 108 ms +- 1 ms -> 110 ms +- 1 ms: 1.02x slower (+2%)
    - pathlib: 28.0 ms +- 0.2 ms -> 28.5 ms +- 0.3 ms: 1.02x slower (+2%)
    - pickle_pure_python: 703 us +- 8 us -> 715 us +- 7 us: 1.02x slower (+2%)
    - sympy_expand: 553 ms +- 5 ms -> 563 ms +- 12 ms: 1.02x slower (+2%)
    - xml_etree_generate: 136 ms +- 2 ms -> 138 ms +- 1 ms: 1.02x slower (+2%)
    - logging_format: 15.3 us +- 0.2 us -> 15.5 us +- 0.2 us: 1.01x slower (+1%)
    - json_dumps: 17.4 ms +- 0.1 ms -> 17.7 ms +- 0.2 ms: 1.01x slower (+1%)
    - logging_silent: 266 ns +- 5 ns -> 269 ns +- 9 ns: 1.01x slower (+1%)
    - django_template: 163 ms +- 1 ms -> 165 ms +- 2 ms: 1.01x slower (+1%)
    - sympy_sum: 219 ms +- 2 ms -> 222 ms +- 2 ms: 1.01x slower (+1%)
    
    Faster (6):
    - regex_effbot: 4.51 ms +- 0.04 ms -> 4.44 ms +- 0.03 ms: 1.02x faster (-2%)
    - pickle_list: 5.21 us +- 0.04 us -> 5.13 us +- 0.04 us: 1.01x faster (-1%)
    - crypto_pyaes: 164 ms +- 1 ms -> 162 ms +- 1 ms: 1.01x faster (-1%)
    - xml_etree_parse: 202 ms +- 7 ms -> 200 ms +- 3 ms: 1.01x faster (-1%)
    - scimark_sor: 287 ms +- 6 ms -> 284 ms +- 6 ms: 1.01x faster (-1%)
    - raytrace: 758 ms +- 26 ms -> 750 ms +- 11 ms: 1.01x faster (-1%)
    
    Benchmark hidden because not significant (35)
    

    I think free_list is useful only when several benchmarks in pyperformance shows more than 5% speedup.
    The benefit is smaller than my threshold. I will run pyperformance again after bpo-37337 is merged.

    FWIW, In case of sqlite_synth, I think performance difference came from here:

    stepmethod = PyObject_GetAttrString(*aggregate_instance, "step");

    If performance of user-defined aggregate feature is really important, we can optimize it further.

    @methane methane added 3.9 only security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) performance Performance or resource usage labels Jun 19, 2019
    @methane
    Copy link
    Member Author

    methane commented Jun 19, 2019

    PyCFunction has similar free_list.

    @jdemeyer
    Copy link
    Contributor

    I will run pyperformance again after bpo-37337 is merged.

    Good idea. bpo-37337 removes many calls of _PyObject_CallMethodId() which does NOT use the LOAD_METHOD optimization.

    @methane
    Copy link
    Member Author

    methane commented Jun 19, 2019

    #58440 uses only one free object instead of at most 256 free list.

    @jdemeyer
    Copy link
    Contributor

    #58440 uses only one free object instead of at most 256 free list.

    That sounds like a compromise which is worse than either extreme: it's worse than no free list because you still have the overhead of dealing with the one object. And it's worse than a free list of 256 because it won't for nested calls.

    @jdemeyer
    Copy link
    Contributor

    Are the benchmark results that you posted above for removing the free list completely or for the one-element free list as in PR 14232?

    @methane
    Copy link
    Member Author

    methane commented Jun 19, 2019

    Sorry, I just pushed another idea before I leave my office.
    I will benchmark both ideas after bpo-37337 is finished.

    @methane
    Copy link
    Member Author

    methane commented Jun 20, 2019

    Are the benchmark results that you posted above for removing the free list completely or for the one-element free list as in PR 14232?

    Yes. I see +3% overhead in several benchmarks and I want to see how one free obj save them.

    it's worse than no free list because you still have the overhead of dealing with the one object.

    Yes. But it still helps some common simple cases.
    (e.g. PyObject_CallMethod() (not CallMethodObjArg))

    And it's worse than a free list of 256 because it won't for nested calls.

    256 free list may bother 256 pools are reused. But one free obj keeps only one pool. And we can remove the overhead of linked list too.

    Of course, I prefer a simple code. But let's wait for benchmark result.

    @methane
    Copy link
    Member Author

    methane commented Jul 25, 2019

    Latest benchmark:

    $ pyperf compare_to master.json no-freelist.json -G --min-speed=2
    Slower (2):
    - unpickle_list: 4.19 us +- 0.02 us -> 4.28 us +- 0.03 us: 1.02x slower (+2%)
    - pathlib: 23.2 ms +- 0.2 ms -> 23.7 ms +- 0.3 ms: 1.02x slower (+2%)
    
    Faster (2):
    - nbody: 149 ms +- 2 ms -> 141 ms +- 2 ms: 1.06x faster (-6%)
    - logging_simple: 10.1 us +- 0.2 us -> 9.86 us +- 0.18 us: 1.02x faster (-2%)
    
    Benchmark hidden because not significant (56)
    

    I decided to just remove the free_list.

    @methane
    Copy link
    Member Author

    methane commented Jul 26, 2019

    New changeset 3e54b57 by Inada Naoki in branch 'master':
    bpo-37340: remove free_list for bound method objects (GH-14232)
    3e54b57

    @methane methane closed this as completed Jul 26, 2019
    @vstinner
    Copy link
    Member

    New changeset 4dedd0f by Victor Stinner in branch 'master':
    bpo-37340: Remove PyMethod_ClearFreeList() and PyCFunction_ClearFreeList() (GH-17284)
    4dedd0f

    @rhettinger
    Copy link
    Contributor

    This caused a performance regression (70%) for a fundamental operation. See bpo-37340.

    Sometimes, bound methods are used directly and not through LOAD_METHOD:

         sorted(data, key=str.upper)

    @rhettinger rhettinger reopened this Apr 28, 2020
    @vstinner
    Copy link
    Member

    This caused a performance regression (70%) for a fundamental operation. See bpo-37340.

    Which issue? This is bpo-37340.

    @vstinner
    Copy link
    Member

    sorted(data, key=str.upper)

    Oh, I guess that it's bpo-39117.

    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.9 only security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) performance Performance or resource usage
    Projects
    None yet
    Development

    No branches or pull requests

    4 participants