Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Inline cache for slots #87093

Closed
gvanrossum opened this issue Jan 14, 2021 · 15 comments
Closed

Inline cache for slots #87093

gvanrossum opened this issue Jan 14, 2021 · 15 comments
Labels
3.10 only security fixes performance Performance or resource usage

Comments

@gvanrossum
Copy link
Member

BPO 42927
Nosy @gvanrossum, @rhettinger, @methane, @serhiy-storchaka, @1st1, @pablogsal, @isidentical
PRs
  • bpo-42927: Inline cache for slots #24216
  • bpo-42927: Update the What's new entry for LOAD_ATTR optimizations #24383
  • Files
  • result2.zip
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2021-01-30.02:03:05.447>
    created_at = <Date 2021-01-14.04:45:09.122>
    labels = ['3.10', 'performance']
    title = 'Inline cache for slots'
    updated_at = <Date 2021-01-31.22:55:58.106>
    user = 'https://github.com/gvanrossum'

    bugs.python.org fields:

    activity = <Date 2021-01-31.22:55:58.106>
    actor = 'pablogsal'
    assignee = 'none'
    closed = True
    closed_date = <Date 2021-01-30.02:03:05.447>
    closer = 'gvanrossum'
    components = []
    creation = <Date 2021-01-14.04:45:09.122>
    creator = 'gvanrossum'
    dependencies = []
    files = ['49744']
    hgrepos = []
    issue_num = 42927
    keywords = ['patch']
    message_count = 15.0
    messages = ['385063', '385064', '385066', '385076', '385077', '385078', '385079', '385080', '385081', '385091', '385095', '385103', '385124', '385965', '386046']
    nosy_count = 7.0
    nosy_names = ['gvanrossum', 'rhettinger', 'methane', 'serhiy.storchaka', 'yselivanov', 'pablogsal', 'BTaskaya']
    pr_nums = ['24216', '24383']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'commit review'
    status = 'closed'
    superseder = None
    type = 'performance'
    url = 'https://bugs.python.org/issue42927'
    versions = ['Python 3.10']

    @gvanrossum
    Copy link
    Member Author

    I've been thinking about Python performance improvements, and I played around with an inline cache enhancement that supports slots. The results on a very simple benchmark look promising (30% speedup) but I'm terrible with our benchmarking tools, and this should be considered *very* carefully before we go ahead with it, since it could potentially pessimize the inline cache for standard attributes.

    I'll attach a PR in a minute.

    @gvanrossum gvanrossum added 3.10 only security fixes performance Performance or resource usage labels Jan 14, 2021
    @serhiy-storchaka
    Copy link
    Member

    Slot means so many different things in Python. Here it's about data descriptors created when you set __slots__ in the class definition.

    It is amazing that so large speed up can be achieved for such base operation.

    @pablogsal
    Copy link
    Member

    I will try to run today the pyperformance test suite to see if there is any impact on standard attribute access

    @pablogsal
    Copy link
    Member

    These are the results:

    venv ❯ python -m pyperf compare_to json_old/* -G --min-speed=2 --table
    +---------------------+--------------------------------------+------------------------------------------+
    | Benchmark | 2021-01-14_09-40-master-14cfa325c298 | 2021-01-14_10-39-slotscache-9123a84ad8d4 |
    +=====================+======================================+==========================================+
    | float | 148 ms | 133 ms: 1.11x faster |
    +---------------------+--------------------------------------+------------------------------------------+
    | pidigits | 265 ms | 252 ms: 1.05x faster |
    +---------------------+--------------------------------------+------------------------------------------+
    | scimark_monte_carlo | 130 ms | 126 ms: 1.03x faster |
    +---------------------+--------------------------------------+------------------------------------------+
    | spectral_norm | 168 ms | 164 ms: 1.03x faster |
    +---------------------+--------------------------------------+------------------------------------------+
    | logging_silent | 221 ns | 216 ns: 1.02x faster |
    +---------------------+--------------------------------------+------------------------------------------+
    | sympy_expand | 598 ms | 585 ms: 1.02x faster |
    +---------------------+--------------------------------------+------------------------------------------+
    | regex_v8 | 31.9 ms | 32.6 ms: 1.02x slower |
    +---------------------+--------------------------------------+------------------------------------------+
    | deltablue | 8.93 ms | 9.16 ms: 1.03x slower |
    +---------------------+--------------------------------------+------------------------------------------+
    | unpickle | 17.0 us | 17.7 us: 1.04x slower |
    +---------------------+--------------------------------------+------------------------------------------+
    | Geometric mean | (ref) | 1.00x faster |
    +---------------------+--------------------------------------+------------------------------------------+

    Benchmark hidden because not significant (50): sympy_sum, sqlalchemy_imperative, sympy_str, sympy_integrate, dulwich_log, scimark_fft, genshi_text, tornado_http, regex_dna, sqlalchemy_declarative, mako, meteor_contest, unpickle_pure_python, xml_etree_parse, genshi_xml, scimark_sor, sqlite_synth, pickle_pure_python, nbody, pickle_dict, pyflate, regex_effbot, xml_etree_process, logging_simple, python_startup, 2to3, fannkuch, python_startup_no_site, raytrace, go, hexiom, scimark_lu, json_dumps, richards, logging_format, xml_etree_generate, chaos, telco, pickle_list, unpack_sequence, regex_compile, django_template, json_loads, crypto_pyaes, xml_etree_iterparse, unpickle_list, nqueens, chameleon, scimark_sparse_mat_mult, pickle
    Ignored benchmarks (1) of json_old/2021-01-14_09-40-master-14cfa325c298.json.gz: pathlib

    I uploaded the pyperf result data to bpo as well

    @pablogsal
    Copy link
    Member

    So it seems that everything is in the noise range except the "float" benchmark that is 1.11x faster

    @pablogsal
    Copy link
    Member

    Some microbenchmarks:

    CURRENT MASTER:

    Variable and attribute read access:
    4.5 ns read_local
    5.8 ns read_nonlocal
    7.8 ns read_global
    7.8 ns read_builtin
    21.2 ns read_classvar_from_class
    19.1 ns read_classvar_from_instance
    16.4 ns read_instancevar
    23.6 ns read_instancevar_slots
    20.2 ns read_namedtuple
    40.5 ns read_boundmethod

    PR 24216

    Variable and attribute read access:
    4.5 ns read_local
    5.8 ns read_nonlocal
    7.8 ns read_global
    8.2 ns read_builtin
    21.4 ns read_classvar_from_class
    19.0 ns read_classvar_from_instance
    16.6 ns read_instancevar
    10.2 ns read_instancevar_slots
    20.5 ns read_namedtuple
    42.9 ns read_boundmethod

    @1st1
    Copy link
    Member

    1st1 commented Jan 14, 2021

    So it seems that everything is in the noise range except the "float" benchmark that is 1.11x faster

    Yeah, this is why. https://github.com/python/pyperformance/blob/master/pyperformance/benchmarks/bm_float.py#L12

    This is a great result, IMO. I'm +1 to merge this.

    @1st1
    Copy link
    Member

    1st1 commented Jan 14, 2021

    Some microbenchmarks:

    Can you add a new one read_slots?

    @pablogsal
    Copy link
    Member

    This is a great result, IMO. I'm +1 to merge this.

    Same here. It would be good if Inada-san could confirm the benchmarks.

    Can you add a new one read_slots?

    That should be "read_instancevar_slots". Or you refer to some other check?

    @rhettinger
    Copy link
    Contributor

    +1 Thanks for this.

    @gvanrossum
    Copy link
    Member Author

    Thanks for all the positive feedback! What is the next step?

    @pablogsal
    Copy link
    Member

    What is the next step?

    I would say just finishing the PR (making sure that we do not miss some arcane edge case) and updating the what's new for 3.10 :)

    @gvanrossum
    Copy link
    Member Author

    I created a benchmark for this, see python/pyperformance#86.

    Next I'll add a blurb and then it's off to reviewers.

    @gvanrossum
    Copy link
    Member Author

    New changeset 5c5a938 by Guido van Rossum in branch 'master':
    bpo-42927: Inline cache for attributes defined with '__slots__' (bpo-24216)
    5c5a938

    @pablogsal
    Copy link
    Member

    New changeset a776da9 by Pablo Galindo in branch 'master':
    bpo-42927: Update the What's new entry for LOAD_ATTR optimizations (GH-24383)
    a776da9

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.10 only security fixes performance Performance or resource usage
    Projects
    None yet
    Development

    No branches or pull requests

    5 participants