classification
Title: remove free_list for bound method objects
Type: performance Stage: resolved
Components: Interpreter Core Versions: Python 3.9
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: inada.naoki, jdemeyer, josh.r
Priority: normal Keywords: patch

Created on 2019-06-19 12:54 by inada.naoki, last changed 2019-07-26 07:12 by inada.naoki. This issue is now closed.

Pull Requests
URL Status Linked Edit
PR 14232 merged inada.naoki, 2019-06-19 13:14
Messages (10)
msg346040 - (view) Author: Inada Naoki (inada.naoki) * (Python committer) Date: 2019-06-19 12:54
LOAD_METHOD avoids temporary bound method object.
PyObject_CallMethodObjArgs now use same optimization.

Now I think there is not enough performance benefit from free_list.
When free_list is not used often enough, it may bother obmalloc reuse memory pool.

This is performance diff of removing free_list (with LTO, without PGO, patched=removed free_list):

```
$ ./python -m pyperf compare_to master.json patched.json -G --min-speed=1
Slower (19):
- sqlite_synth: 4.03 us +- 0.10 us -> 4.20 us +- 0.08 us: 1.04x slower (+4%)
- genshi_text: 41.2 ms +- 0.4 ms -> 42.6 ms +- 0.4 ms: 1.03x slower (+3%)
- scimark_sparse_mat_mult: 6.29 ms +- 0.03 ms -> 6.50 ms +- 0.50 ms: 1.03x slower (+3%)
- mako: 26.5 ms +- 0.1 ms -> 27.4 ms +- 0.3 ms: 1.03x slower (+3%)
- html5lib: 130 ms +- 4 ms -> 134 ms +- 5 ms: 1.03x slower (+3%)
- genshi_xml: 83.4 ms +- 1.1 ms -> 85.6 ms +- 1.2 ms: 1.03x slower (+3%)
- pickle: 15.1 us +- 0.5 us -> 15.5 us +- 0.5 us: 1.03x slower (+3%)
- float: 161 ms +- 1 ms -> 165 ms +- 1 ms: 1.02x slower (+2%)
- logging_simple: 13.9 us +- 0.2 us -> 14.2 us +- 0.2 us: 1.02x slower (+2%)
- xml_etree_process: 108 ms +- 1 ms -> 110 ms +- 1 ms: 1.02x slower (+2%)
- pathlib: 28.0 ms +- 0.2 ms -> 28.5 ms +- 0.3 ms: 1.02x slower (+2%)
- pickle_pure_python: 703 us +- 8 us -> 715 us +- 7 us: 1.02x slower (+2%)
- sympy_expand: 553 ms +- 5 ms -> 563 ms +- 12 ms: 1.02x slower (+2%)
- xml_etree_generate: 136 ms +- 2 ms -> 138 ms +- 1 ms: 1.02x slower (+2%)
- logging_format: 15.3 us +- 0.2 us -> 15.5 us +- 0.2 us: 1.01x slower (+1%)
- json_dumps: 17.4 ms +- 0.1 ms -> 17.7 ms +- 0.2 ms: 1.01x slower (+1%)
- logging_silent: 266 ns +- 5 ns -> 269 ns +- 9 ns: 1.01x slower (+1%)
- django_template: 163 ms +- 1 ms -> 165 ms +- 2 ms: 1.01x slower (+1%)
- sympy_sum: 219 ms +- 2 ms -> 222 ms +- 2 ms: 1.01x slower (+1%)

Faster (6):
- regex_effbot: 4.51 ms +- 0.04 ms -> 4.44 ms +- 0.03 ms: 1.02x faster (-2%)
- pickle_list: 5.21 us +- 0.04 us -> 5.13 us +- 0.04 us: 1.01x faster (-1%)
- crypto_pyaes: 164 ms +- 1 ms -> 162 ms +- 1 ms: 1.01x faster (-1%)
- xml_etree_parse: 202 ms +- 7 ms -> 200 ms +- 3 ms: 1.01x faster (-1%)
- scimark_sor: 287 ms +- 6 ms -> 284 ms +- 6 ms: 1.01x faster (-1%)
- raytrace: 758 ms +- 26 ms -> 750 ms +- 11 ms: 1.01x faster (-1%)

Benchmark hidden because not significant (35)
```

I think free_list is useful only when several benchmarks in pyperformance shows more than 5% speedup.
The benefit is smaller than my threshold.  I will run pyperformance again after bpo-37337 is merged.

FWIW, In case of sqlite_synth, I think performance difference came from here:
https://github.com/python/cpython/blob/015000165373f8db263ef5bc682f02d74e5782ac/Modules/_sqlite/connection.c#L662
If performance of user-defined aggregate feature is really important, we can optimize it further.
msg346041 - (view) Author: Inada Naoki (inada.naoki) * (Python committer) Date: 2019-06-19 12:56
PyCFunction has similar free_list.
msg346042 - (view) Author: Jeroen Demeyer (jdemeyer) * Date: 2019-06-19 13:11
> I will run pyperformance again after bpo-37337 is merged.

Good idea. bpo-37337 removes many calls of _PyObject_CallMethodId() which does NOT use the LOAD_METHOD optimization.
msg346043 - (view) Author: Inada Naoki (inada.naoki) * (Python committer) Date: 2019-06-19 13:15
GH-14232 uses only one free object instead of at most 256 free list.
msg346044 - (view) Author: Jeroen Demeyer (jdemeyer) * Date: 2019-06-19 13:17
> GH-14232 uses only one free object instead of at most 256 free list.

That sounds like a compromise which is worse than either extreme: it's worse than no free list because you still have the overhead of dealing with the one object. And it's worse than a free list of 256 because it won't for nested calls.
msg346045 - (view) Author: Jeroen Demeyer (jdemeyer) * Date: 2019-06-19 13:35
Are the benchmark results that you posted above for removing the free list completely or for the one-element free list as in PR 14232?
msg346057 - (view) Author: Inada Naoki (inada.naoki) * (Python committer) Date: 2019-06-19 15:11
Sorry, I just pushed another idea before I leave my office.
I will benchmark both ideas after bpo-37337 is finished.
msg346094 - (view) Author: Inada Naoki (inada.naoki) * (Python committer) Date: 2019-06-20 04:15
> Are the benchmark results that you posted above for removing the free list completely or for the one-element free list as in PR 14232?

Yes.  I see +3% overhead in several benchmarks and I want to see how one free obj save them.

> it's worse than no free list because you still have the overhead of dealing with the one object.

Yes.  But it still helps some common simple cases.
(e.g. PyObject_CallMethod() (not CallMethodObjArg))

> And it's worse than a free list of 256 because it won't for nested calls.

256 free list may bother 256 pools are reused.  But one free obj keeps only one pool.  And we can remove the overhead of linked list too.

Of course, I prefer a simple code.  But let's wait for benchmark result.
msg348425 - (view) Author: Inada Naoki (inada.naoki) * (Python committer) Date: 2019-07-25 03:02
Latest benchmark:

```
$ pyperf compare_to master.json no-freelist.json -G --min-speed=2
Slower (2):
- unpickle_list: 4.19 us +- 0.02 us -> 4.28 us +- 0.03 us: 1.02x slower (+2%)
- pathlib: 23.2 ms +- 0.2 ms -> 23.7 ms +- 0.3 ms: 1.02x slower (+2%)

Faster (2):
- nbody: 149 ms +- 2 ms -> 141 ms +- 2 ms: 1.06x faster (-6%)
- logging_simple: 10.1 us +- 0.2 us -> 9.86 us +- 0.18 us: 1.02x faster (-2%)

Benchmark hidden because not significant (56)
```

I decided to just remove the free_list.
msg348465 - (view) Author: Inada Naoki (inada.naoki) * (Python committer) Date: 2019-07-26 06:05
New changeset 3e54b575313c64f541e98216ed079fafed01ff5d by Inada Naoki in branch 'master':
bpo-37340: remove free_list for bound method objects (GH-14232)
https://github.com/python/cpython/commit/3e54b575313c64f541e98216ed079fafed01ff5d
History
Date User Action Args
2019-07-26 07:12:05inada.naokisetstatus: open -> closed
resolution: fixed
stage: resolved
2019-07-26 06:05:58inada.naokisetmessages: + msg348465
2019-07-25 03:02:41inada.naokisetmessages: + msg348425
2019-06-20 19:16:36josh.rsetnosy: + josh.r
2019-06-20 04:15:02inada.naokisetmessages: + msg346094
2019-06-19 15:11:22inada.naokisetmessages: + msg346057
2019-06-19 13:35:52jdemeyersetmessages: + msg346045
2019-06-19 13:17:35jdemeyersetmessages: + msg346044
2019-06-19 13:15:14inada.naokisetmessages: + msg346043
stage: patch review -> (no value)
2019-06-19 13:14:11inada.naokisetkeywords: + patch
stage: patch review
pull_requests: + pull_request14068
2019-06-19 13:11:26jdemeyersetmessages: + msg346042
2019-06-19 12:56:44inada.naokisetmessages: + msg346041
2019-06-19 12:54:21inada.naokicreate