Add opcode cache for LOAD_ATTR #86259

pablogsal · 2020-10-20T03:18:36Z

BPO	42093
Nosy	@gvanrossum, @vstinner, @1st1, @gvanrossum, @pablogsal
PRs	bpo-42093: Add opcode cache for LOAD_ATTR #22803 bpo-42093: Tweak the what's new message about the new LOAD_ATTR opcode cache #24070 bpo-42093: Cleanup _PyDict_GetItemHint() #24582

^{Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.}

Show more details

GitHub fields:

assignee = 'https://github.com/pablogsal'
closed_at = <Date 2020-10-20.05:27:36.232>
created_at = <Date 2020-10-20.03:18:36.160>
labels = ['interpreter-core', '3.10']
title = 'Add opcode cache for LOAD_ATTR'
updated_at = <Date 2021-02-21.11:02:18.003>
user = 'https://github.com/pablogsal'

bugs.python.org fields:

activity = <Date 2021-02-21.11:02:18.003>
actor = 'vstinner'
assignee = 'pablogsal'
closed = True
closed_date = <Date 2020-10-20.05:27:36.232>
closer = 'pablogsal'
components = ['Interpreter Core']
creation = <Date 2020-10-20.03:18:36.160>
creator = 'pablogsal'
dependencies = []
files = []
hgrepos = []
issue_num = 42093
keywords = ['patch']
message_count = 9.0
messages = ['379083', '379087', '383859', '384087', '384254', '384354', '384364', '384366', '387451']
nosy_count = 5.0
nosy_names = ['gvanrossum', 'vstinner', 'yselivanov', 'Guido.van.Rossum', 'pablogsal']
pr_nums = ['22803', '24070', '24582']
priority = 'normal'
resolution = 'fixed'
stage = 'resolved'
status = 'closed'
superseder = None
type = None
url = 'https://bugs.python.org/issue42093'
versions = ['Python 3.10']

pablogsal · 2020-10-20T03:18:36Z

From the creators of "opcode cache for LOAD_GLOBAL" (https://bugs.python.org/issue26219) now it's time for "opcode cache for LOAD_ATTR: the revenge". This issue/PR builds on top of Yury's original patch in the same way https://bugs.python.org/issue26219 did for LOAD_GLOBAL.

These are the benchmark results for the pyperformance test suite with PGO/LTO/CPU ISOLATION in a tuned system with pyperf:

+-------------------------+--------------------------------------+-------------------------------------+
| Benchmark | 2020-10-20_01-18-master-de73d432bb29 | 2020-10-20_02-28-cache-68f931f6938a |
+=========================+======================================+=====================================+
| go | 407 ms | 349 ms: 1.17x faster (-14%) |
+-------------------------+--------------------------------------+-------------------------------------+
| raytrace | 822 ms | 730 ms: 1.13x faster (-11%) |
+-------------------------+--------------------------------------+-------------------------------------+
| unpickle_pure_python | 497 us | 447 us: 1.11x faster (-10%) |
+-------------------------+--------------------------------------+-------------------------------------+
| scimark_sor | 311 ms | 280 ms: 1.11x faster (-10%) |
+-------------------------+--------------------------------------+-------------------------------------+
| hexiom | 15.4 ms | 14.0 ms: 1.10x faster (-9%) |
+-------------------------+--------------------------------------+-------------------------------------+
| logging_silent | 302 ns | 276 ns: 1.10x faster (-9%) |
+-------------------------+--------------------------------------+-------------------------------------+
| chaos | 176 ms | 163 ms: 1.08x faster (-7%) |
+-------------------------+--------------------------------------+-------------------------------------+
| pyflate | 1.01 sec | 948 ms: 1.06x faster (-6%) |
+-------------------------+--------------------------------------+-------------------------------------+
| scimark_lu | 246 ms | 232 ms: 1.06x faster (-6%) |
+-------------------------+--------------------------------------+-------------------------------------+
| pickle_pure_python | 712 us | 674 us: 1.06x faster (-5%) |
+-------------------------+--------------------------------------+-------------------------------------+
| regex_effbot | 4.49 ms | 4.26 ms: 1.05x faster (-5%) |
+-------------------------+--------------------------------------+-------------------------------------+
| scimark_monte_carlo | 160 ms | 153 ms: 1.05x faster (-5%) |
+-------------------------+--------------------------------------+-------------------------------------+
| richards | 120 ms | 115 ms: 1.05x faster (-4%) |
+-------------------------+--------------------------------------+-------------------------------------+
| 2to3 | 458 ms | 442 ms: 1.04x faster (-4%) |
+-------------------------+--------------------------------------+-------------------------------------+
| regex_v8 | 33.7 ms | 32.5 ms: 1.04x faster (-3%) |
+-------------------------+--------------------------------------+-------------------------------------+
| scimark_sparse_mat_mult | 7.16 ms | 6.93 ms: 1.03x faster (-3%) |
+-------------------------+--------------------------------------+-------------------------------------+
| deltablue | 12.1 ms | 11.7 ms: 1.03x faster (-3%) |
+-------------------------+--------------------------------------+-------------------------------------+
| regex_dna | 268 ms | 261 ms: 1.03x faster (-3%) |
+-------------------------+--------------------------------------+-------------------------------------+
| meteor_contest | 152 ms | 148 ms: 1.03x faster (-3%) |
+-------------------------+--------------------------------------+-------------------------------------+
| genshi_xml | 89.0 ms | 87.1 ms: 1.02x faster (-2%) |
+-------------------------+--------------------------------------+-------------------------------------+
| logging_simple | 12.8 us | 12.5 us: 1.02x faster (-2%) |
+-------------------------+--------------------------------------+-------------------------------------+
| genshi_text | 42.4 ms | 41.5 ms: 1.02x faster (-2%) |
+-------------------------+--------------------------------------+-------------------------------------+
| nbody | 215 ms | 211 ms: 1.02x faster (-2%) |
+-------------------------+--------------------------------------+-------------------------------------+

Not significant (35): chameleon; django_template; dulwich_log; fannkuch; float; json_dumps; json_loads; logging_format; mako; nqueens; pathlib; pickle; pickle_dict; pickle_list; pidigits; python_startup; python_startup_no_site; regex_compile; scimark_fft; spectral_norm; sqlalchemy_declarative; sqlalchemy_imperative; sqlite_synth; sympy_expand; sympy_sum; sympy_str; telco; tornado_http; unpack_sequence; unpickle; unpickle_list; xml_etree_parse; xml_etree_iterparse; xml_etree_generate; xml_etree_process; sympy_integrate

pablogsal · 2020-10-20T05:22:52Z

New changeset 109826c by Pablo Galindo in branch 'master':
bpo-42093: Add opcode cache for LOAD_ATTR (GH-22803)
109826c

gvanrossum · 2020-12-27T20:04:36Z

Wow, this is amazing. I just found that this is now faster than slots. Should we mention that in What's New?

(Of course there's an optimization possible for slots as well, but it would require complicating the cache struct. Maybe in 3.11. :-)

pablogsal · 2020-12-30T20:34:08Z

Wow, this is amazing. I just found that this is now faster than slots.

Not only that: is even faster than the highly-tuned namedtuple access descriptors that used to be the faster access to an attribute:

3.9 results
-----------
17.6 ns read_classvar_from_class
16.3 ns read_classvar_from_instance
23.2 ns read_instancevar
19.7 ns read_instancevar_slots
17.9 ns read_namedtuple
39.2 ns read_boundmethod

Now this is the faster way to get an attribute:

3.10 results
------------
17.9 ns read_classvar_from_class
16.9 ns read_classvar_from_instance
14.1 ns read_instancevar
20.0 ns read_instancevar_slots
18.0 ns read_namedtuple
40.7 ns read_boundmethod

Should we mention that in What's New?

Good idea!. I will prepare a PR complementing the current paragraph.

pablogsal · 2021-01-03T04:38:12Z

New changeset 9e8fe19 by Pablo Galindo in branch 'master':
bpo-42093: Tweak the what's new message about the new LOAD_ATTR opcode cache (GH-24070)
9e8fe19

gvanrossum · 2021-01-04T23:05:56Z

Thanks! Do you have any plans for further inline caches? I was wondering if we could reverse the situation for slots again by adding slots support to the LOAD_ATTR opcode inline cache...

pablogsal · 2021-01-05T02:19:54Z

Thanks! Do you have any plans for further inline caches?

Yeah, we are experimenting with some ideas here: https://bugs.python.org/issue42115.

I was wondering if we could reverse the situation for slots again by adding slots support to the LOAD_ATTR opcode inline cache...

I think we can do it as long as we can detect easily if a given descriptor is immutable. The problem of mutability is this code:

class Descriptor:
    pass

class C:
    def __init__(self):
        self.x = 1
    x = Descriptor()

def f(o):
    return o.x

o = C()
for i in range(10000):
    assert f(o) == 1

Descriptor.__get__ = lambda self, instance, value: 2
Descriptor.__set__ = lambda *args: None

print(f(o))

In this case, if we do not skip the cache for mutable descriptors, the code will not reflect the new result (2 instead of 1). __slots__ are immutable descriptors so we should be good as long as we can detect them.

gvanrossum · 2021-01-05T03:57:20Z

Hm, I was thinking to recognize the specific type of descriptor used by slots and cache only that. Though we would still have to consider updates to C.__dict__ (that's handled by looking at the dict version right?).

vstinner · 2021-02-21T11:02:18Z

New changeset d5fc998 by Victor Stinner in branch 'master':
bpo-42093: Cleanup _PyDict_GetItemHint() (GH-24582)
d5fc998

pablogsal added the 3.10 only security fixes label Oct 20, 2020

pablogsal self-assigned this Oct 20, 2020

pablogsal added the interpreter-core (Objects, Python, Grammar, and Parser dirs) label Oct 20, 2020

pablogsal closed this as completed Oct 20, 2020

ezio-melotti transferred this issue from another repository Apr 10, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add opcode cache for LOAD_ATTR #86259

Add opcode cache for LOAD_ATTR #86259

pablogsal commented Oct 20, 2020

pablogsal commented Oct 20, 2020

pablogsal commented Oct 20, 2020

gvanrossum commented Dec 27, 2020

pablogsal commented Dec 30, 2020

pablogsal commented Jan 3, 2021

gvanrossum commented Jan 4, 2021

pablogsal commented Jan 5, 2021

gvanrossum commented Jan 5, 2021

vstinner commented Feb 21, 2021

Add opcode cache for LOAD_ATTR #86259

Add opcode cache for LOAD_ATTR #86259

Comments

pablogsal commented Oct 20, 2020

pablogsal commented Oct 20, 2020

pablogsal commented Oct 20, 2020

gvanrossum commented Dec 27, 2020

pablogsal commented Dec 30, 2020

pablogsal commented Jan 3, 2021

gvanrossum commented Jan 4, 2021

pablogsal commented Jan 5, 2021

gvanrossum commented Jan 5, 2021

vstinner commented Feb 21, 2021