classification
Title: Inline cache for slots
Type: performance Stage: commit review
Components: Versions: Python 3.10
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: BTaskaya, gvanrossum, methane, pablogsal, rhettinger, serhiy.storchaka, yselivanov
Priority: normal Keywords: patch

Created on 2021-01-14 04:45 by gvanrossum, last changed 2021-01-31 22:55 by pablogsal. This issue is now closed.

Files
File name Uploaded Description Edit
result2.zip pablogsal, 2021-01-14 19:17
Pull Requests
URL Status Linked Edit
PR 24216 merged gvanrossum, 2021-01-14 04:54
PR 24383 merged pablogsal, 2021-01-30 02:08
Messages (15)
msg385063 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2021-01-14 04:45
I've been thinking about Python performance improvements, and I played around with an inline cache enhancement that supports slots. The results on a very simple benchmark look promising (30% speedup) but I'm terrible with our benchmarking tools, and this should be considered *very* carefully before we go ahead with it, since it could potentially pessimize the inline cache for standard attributes.

I'll attach a PR in a minute.
msg385064 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2021-01-14 09:14
Slot means so many different things in Python. Here it's about data descriptors created when you set __slots__ in the class definition.

It is amazing that so large speed up can be achieved for such base operation.
msg385066 - (view) Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2021-01-14 10:32
I will try to run today the pyperformance test suite to see if there is any impact on standard attribute access
msg385076 - (view) Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2021-01-14 19:17
These are the results:

venv ❯ python -m pyperf compare_to json_old/* -G --min-speed=2 --table
+---------------------+--------------------------------------+------------------------------------------+
| Benchmark           | 2021-01-14_09-40-master-14cfa325c298 | 2021-01-14_10-39-slotscache-9123a84ad8d4 |
+=====================+======================================+==========================================+
| float               | 148 ms                               | 133 ms: 1.11x faster                     |
+---------------------+--------------------------------------+------------------------------------------+
| pidigits            | 265 ms                               | 252 ms: 1.05x faster                     |
+---------------------+--------------------------------------+------------------------------------------+
| scimark_monte_carlo | 130 ms                               | 126 ms: 1.03x faster                     |
+---------------------+--------------------------------------+------------------------------------------+
| spectral_norm       | 168 ms                               | 164 ms: 1.03x faster                     |
+---------------------+--------------------------------------+------------------------------------------+
| logging_silent      | 221 ns                               | 216 ns: 1.02x faster                     |
+---------------------+--------------------------------------+------------------------------------------+
| sympy_expand        | 598 ms                               | 585 ms: 1.02x faster                     |
+---------------------+--------------------------------------+------------------------------------------+
| regex_v8            | 31.9 ms                              | 32.6 ms: 1.02x slower                    |
+---------------------+--------------------------------------+------------------------------------------+
| deltablue           | 8.93 ms                              | 9.16 ms: 1.03x slower                    |
+---------------------+--------------------------------------+------------------------------------------+
| unpickle            | 17.0 us                              | 17.7 us: 1.04x slower                    |
+---------------------+--------------------------------------+------------------------------------------+
| Geometric mean      | (ref)                                | 1.00x faster                             |
+---------------------+--------------------------------------+------------------------------------------+

Benchmark hidden because not significant (50): sympy_sum, sqlalchemy_imperative, sympy_str, sympy_integrate, dulwich_log, scimark_fft, genshi_text, tornado_http, regex_dna, sqlalchemy_declarative, mako, meteor_contest, unpickle_pure_python, xml_etree_parse, genshi_xml, scimark_sor, sqlite_synth, pickle_pure_python, nbody, pickle_dict, pyflate, regex_effbot, xml_etree_process, logging_simple, python_startup, 2to3, fannkuch, python_startup_no_site, raytrace, go, hexiom, scimark_lu, json_dumps, richards, logging_format, xml_etree_generate, chaos, telco, pickle_list, unpack_sequence, regex_compile, django_template, json_loads, crypto_pyaes, xml_etree_iterparse, unpickle_list, nqueens, chameleon, scimark_sparse_mat_mult, pickle
Ignored benchmarks (1) of json_old/2021-01-14_09-40-master-14cfa325c298.json.gz: pathlib


I uploaded the pyperf result data to bpo as well
msg385077 - (view) Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2021-01-14 19:17
So it seems that everything is in the noise range except the "float" benchmark that is 1.11x faster
msg385078 - (view) Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2021-01-14 19:20
Some microbenchmarks:

CURRENT MASTER:

Variable and attribute read access:
   4.5 ns       read_local
   5.8 ns       read_nonlocal
   7.8 ns       read_global
   7.8 ns       read_builtin
  21.2 ns       read_classvar_from_class
  19.1 ns       read_classvar_from_instance
  16.4 ns       read_instancevar
  23.6 ns       read_instancevar_slots
  20.2 ns       read_namedtuple
  40.5 ns       read_boundmethod

PR 24216

Variable and attribute read access:
   4.5 ns       read_local
   5.8 ns       read_nonlocal
   7.8 ns       read_global
   8.2 ns       read_builtin
  21.4 ns       read_classvar_from_class
  19.0 ns       read_classvar_from_instance
  16.6 ns       read_instancevar
  10.2 ns       read_instancevar_slots
  20.5 ns       read_namedtuple
  42.9 ns       read_boundmethod
msg385079 - (view) Author: Yury Selivanov (yselivanov) * (Python committer) Date: 2021-01-14 19:21
> So it seems that everything is in the noise range except the "float" benchmark that is 1.11x faster

Yeah, this is why. https://github.com/python/pyperformance/blob/master/pyperformance/benchmarks/bm_float.py#L12

This is a great result, IMO. I'm +1 to merge this.
msg385080 - (view) Author: Yury Selivanov (yselivanov) * (Python committer) Date: 2021-01-14 19:21
> Some microbenchmarks:


Can you add a new one `read_slots`?
msg385081 - (view) Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2021-01-14 19:24
> This is a great result, IMO. I'm +1 to merge this.

Same here. It would be good if Inada-san could confirm the benchmarks.

> Can you add a new one `read_slots`?

That should be "read_instancevar_slots". Or you refer to some other check?
msg385091 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2021-01-15 00:59
+1 Thanks for this.
msg385095 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2021-01-15 02:30
Thanks for all the positive feedback! What is the next step?
msg385103 - (view) Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2021-01-15 09:20
> What is the next step?

I would say just finishing the PR (making sure that we do not miss some arcane edge case) and updating the what's new for 3.10 :)
msg385124 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2021-01-15 19:07
I created a benchmark for this, see https://github.com/python/pyperformance/pull/86.

Next I'll add a blurb and then it's off to reviewers.
msg385965 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2021-01-30 02:02
New changeset 5c5a938573ce665f00e362c7766912d9b3f3b44e by Guido van Rossum in branch 'master':
bpo-42927: Inline cache for attributes defined with '__slots__' (#24216)
https://github.com/python/cpython/commit/5c5a938573ce665f00e362c7766912d9b3f3b44e
msg386046 - (view) Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2021-01-31 22:55
New changeset a776da90b8f2a1342f4f9bfd23a62cea9a0497c6 by Pablo Galindo in branch 'master':
bpo-42927: Update the What's new entry for LOAD_ATTR optimizations (GH-24383)
https://github.com/python/cpython/commit/a776da90b8f2a1342f4f9bfd23a62cea9a0497c6
History
Date User Action Args
2021-01-31 22:55:58pablogsalsetmessages: + msg386046
2021-01-30 02:08:32pablogsalsetpull_requests: + pull_request23199
2021-01-30 02:03:05gvanrossumsetstatus: open -> closed
resolution: fixed
stage: patch review -> commit review
2021-01-30 02:02:32gvanrossumsetmessages: + msg385965
2021-01-15 19:07:18gvanrossumsetmessages: + msg385124
2021-01-15 09:20:07pablogsalsetmessages: + msg385103
2021-01-15 02:30:31gvanrossumsetmessages: + msg385095
2021-01-15 00:59:28rhettingersetnosy: + rhettinger
messages: + msg385091
2021-01-14 20:33:58BTaskayasetnosy: + BTaskaya
2021-01-14 19:24:12pablogsalsetnosy: + methane
messages: + msg385081
2021-01-14 19:21:48yselivanovsetmessages: + msg385080
2021-01-14 19:21:18yselivanovsetmessages: + msg385079
2021-01-14 19:20:43pablogsalsetmessages: + msg385078
2021-01-14 19:17:59pablogsalsetmessages: + msg385077
2021-01-14 19:17:28pablogsalsetfiles: + result2.zip

messages: + msg385076
2021-01-14 10:32:47pablogsalsetmessages: + msg385066
2021-01-14 09:14:16serhiy.storchakasetnosy: + serhiy.storchaka
messages: + msg385064
2021-01-14 04:54:48gvanrossumsetkeywords: + patch
stage: test needed -> patch review
pull_requests: + pull_request23043
2021-01-14 04:45:09gvanrossumcreate