classification
Title: Investigate slow writes to class variables
Type: performance Stage: resolved
Components: Interpreter Core Versions: Python 3.8
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: Yasser Alshalaan, nascheme, pablogsal, rhettinger, scoder
Priority: low Keywords: patch

Created on 2019-02-16 20:15 by rhettinger, last changed 2019-02-20 17:29 by rhettinger. This issue is now closed.

Files
File name Uploaded Description Edit
perf-screenshot.png nascheme, 2019-02-19 18:35
Pull Requests
URL Status Linked Edit
PR 11907 merged scoder, 2019-02-17 14:31
Messages (11)
msg335714 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2019-02-16 20:15
Benchmark show what writes to class variables are anomalously slow.

    class A(object):
        pass

    A.x = 1   # This write is 3 to 5 times slower than other writes.

FWIW, the same operation for old-style classes in Python 2.7 was several times faster.

We should investigate to understand why the writes are so slow.  There might be a good reason or there might be an opportunity for optimization.

-------------------------------------------------
$ python3.8 Tools/scripts/var_access_benchmark.py
Variable and attribute read access:
   4.3 ns	read_local
   4.6 ns	read_nonlocal
  14.5 ns	read_global
  19.0 ns	read_builtin
  18.4 ns	read_classvar_from_class
  16.2 ns	read_classvar_from_instance
  24.7 ns	read_instancevar
  19.7 ns	read_instancevar_slots
  19.5 ns	read_namedtuple
  26.4 ns	read_boundmethod

Variable and attribute write access:
   4.4 ns	write_local
   5.1 ns	write_nonlocal
  18.2 ns	write_global
 103.9 ns	write_classvar             <== Outlier
  35.4 ns	write_instancevar
  25.6 ns	write_instancevar_slots
msg335733 - (view) Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2019-02-16 23:34
It seems 50% of the overhead (50ns) is due to two reasons:

- 30-40% is due to the call to update_slot(type, name) after the item is set in the class dictionary.

- 70-60% is due to all the extra work from _PyObject_GenericSetAttrWithDict until it ends calling  _PyObjectDict_SetItem with the slot. If in the code of typeobject.c:type_setattro you change:

res = _PyObject_GenericSetAttrWithDict((PyObject *)type, name, value, NULL);

by 

PyObject* dictptr = _PyObject_GetDictPtr(type);
res = _PyObjectDict_SetItem(type, dictptr, name, value);

and delete the update_slot(type, name) call afterwards, the times are reduced to 50ns.
msg335747 - (view) Author: Yasser Alshalaan (Yasser Alshalaan) * Date: 2019-02-17 02:53
can you include your python 2.7 runs? for me it looks similar
msg335763 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2019-02-17 08:50
> can you include your python 2.7 runs? for me it looks similar

It will give similar results unless you switch to old-style classes (edit out the inheritance from object).

    class A:
        pass

    A.x = 1

---------------------------------------
$ python2.7 var_access_benchmark.py
Variable and attribute read access:
   6.7 ns	read_local
  10.9 ns	read_global
  18.9 ns	read_builtin
  24.4 ns	read_classvar_from_class
  30.2 ns	read_classvar_from_instance
  22.7 ns	read_instancevar
  25.5 ns	read_instancevar_slots
  99.3 ns	read_namedtuple
  40.9 ns	read_boundmethod

Variable and attribute write access:
   8.2 ns	write_local
  18.7 ns	write_global
  32.9 ns	write_classvar         <== Not formerly an outlier
  32.5 ns	write_instancevar
  31.2 ns	write_instancevar_slots
msg335776 - (view) Author: Stefan Behnel (scoder) * Date: 2019-02-17 14:36
It turns out that "update_slot()" is always called, even when we are not updating a slot name (which is always a special dunder-name). The linear search for names in "update_slots()" is a huge waste of time here, and short-circuiting out of it when the name does not start with "_" cuts the overall update time by 50%. I pushed a PR.

Another improvement would be a sub-linear algorithm for searching the slot name, but that's a bigger change.
msg335794 - (view) Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2019-02-18 00:59
This are the timings that I am measuring with PR 11907:

Variable and attribute read access:
   5.7 ns       read_local
   5.9 ns       read_nonlocal
  16.2 ns       read_global
  24.5 ns       read_builtin
  20.9 ns       read_classvar_from_class
  20.0 ns       read_classvar_from_instance
  29.7 ns       read_instancevar
  24.7 ns       read_instancevar_slots
  22.9 ns       read_namedtuple
  36.8 ns       read_boundmethod

Variable and attribute write access:
   6.9 ns       write_local
   6.9 ns       write_nonlocal
  26.7 ns       write_global
  65.4 ns       write_classvar <----- Down from 120.6 ns
  49.5 ns       write_instancevar
  34.5 ns       write_instancevar_slots
msg335795 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2019-02-18 01:32
Wow, I didn't expect to get an immediate win this of this magnitude :-)
msg335992 - (view) Author: Neil Schemenauer (nascheme) * (Python committer) Date: 2019-02-19 18:25
Some profiling using 'perf'.  This is for cpython 63fa1cfece4912110ce3a0ff11fb3ade3ff5e756.

  children      self
  [...]
+   97.27%     0.00%  run_mod (inlined)
+   88.53%     6.33%  PyObject_SetAttr
+   79.34%     6.80%  type_setattro
+   43.92%    43.92%  update_slot
+   26.87%     5.84%  _PyObject_GenericSetAttrWithDict
+   14.62%     6.52%  insertdict
+    8.80%     8.80%  lookdict_unicode_nodummy
+    6.57%     0.00%  _Py_DECREF (inlined)
+    5.19%     5.19%  PyUnicode_InternInPlace
+    3.57%     3.57%  _PyObjectDict_SetItem
+    3.38%     3.38%  _PyType_Lookup
+    1.71%     0.00%  _Py_INCREF (inlined)
+    1.42%     0.00%  _Py_XDECREF (inlined)
  [...]

After applying PR 11907:

  children      self
  [...]
+   94.76%     0.00%  run_mod (inlined)
+   75.22%     6.77%  PyObject_SetAttr
+   64.65%    13.08%  type_setattro
+   47.51%    11.96%  _PyObject_GenericSetAttrWithDict
+   22.99%    13.95%  insertdict
+   10.10%    10.10%  lookdict_unicode_nodummy
+    9.47%     9.47%  PyUnicode_InternInPlace
+    7.10%     7.10%  _PyObjectDict_SetItem
+    7.02%     7.02%  _PyType_Lookup
+    6.52%     0.00%  _Py_DECREF (inlined)
+    3.17%     0.00%  _Py_INCREF (inlined)
+    3.16%     0.00%  _Py_XDECREF (inlined)
+    2.59%     0.00%  PyDict_SetItem (inlined)
+    1.50%     0.00%  is_dunder_name (inlined)
  [...]

The PyUnicode_InternInPlace() can mostly be eliminated by testing PyUnicode_CHECK_INTERNED() first (we already have called PyUnicode_Check() on it).  That only gives a 7% speedup on my machine though.  The is_dunder_name() is a much bigger optimization.
msg335995 - (view) Author: Neil Schemenauer (nascheme) * (Python committer) Date: 2019-02-19 18:35
BTW, 'perf report [...]' has a really neat annotated assembly view.  Scroll to the function you are interested in and press 'a'.  Press 't' to toggle the time units (left side numbers).  I'm attaching a screenshot of the disassembly of the lookdict function.  The time units are sample accounts.  Each count is a point where the profiler woke up on that specific instruction.
msg336031 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2019-02-20 00:45
Thanks Neil.  The tooling does indeed look nice.

I added your PyUnicode_InternInPlace() suggestion to the PR.

At this point, the PR looks ready-to-go unless any of you think we've missed some low-hanging fruit.
msg336129 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2019-02-20 17:29
New changeset d8b9e1fc2e45d2bc3f4a9737c375f2adb8a8c7de by Raymond Hettinger (Stefan Behnel) in branch 'master':
bpo-36012: Avoid linear slot search for non-dunder methods (GH-11907)
https://github.com/python/cpython/commit/d8b9e1fc2e45d2bc3f4a9737c375f2adb8a8c7de
History
Date User Action Args
2019-02-20 17:29:55rhettingersetstatus: open -> closed
resolution: fixed
stage: patch review -> resolved
2019-02-20 17:29:26rhettingersetmessages: + msg336129
2019-02-20 00:45:18rhettingersetmessages: + msg336031
2019-02-19 18:35:45naschemesetfiles: + perf-screenshot.png

messages: + msg335995
2019-02-19 18:25:57naschemesetmessages: + msg335992
2019-02-19 11:27:22vstinnersetnosy: - vstinner
2019-02-18 01:32:29rhettingersetmessages: + msg335795
2019-02-18 00:59:21pablogsalsetmessages: + msg335794
2019-02-17 14:36:15scodersetnosy: + scoder
messages: + msg335776
2019-02-17 14:31:39scodersetkeywords: + patch
stage: patch review
pull_requests: + pull_request11932
2019-02-17 08:50:24rhettingersetmessages: + msg335763
2019-02-17 02:53:16Yasser Alshalaansetnosy: + Yasser Alshalaan
messages: + msg335747
2019-02-16 23:34:44pablogsalsetmessages: + msg335733
2019-02-16 20:15:18rhettingercreate