This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: remove free_list for bound method objects
Type: performance Stage: resolved
Components: Interpreter Core Versions: Python 3.9
process
Status: open Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: eric.snow, jdemeyer, josh.r, methane, rhettinger, vstinner
Priority: normal Keywords: patch

Created on 2019-06-19 12:54 by methane, last changed 2022-04-11 14:59 by admin.

Pull Requests
URL Status Linked Edit
PR 14232 merged methane, 2019-06-19 13:14
PR 17284 merged vstinner, 2019-11-20 10:09
Messages (14)
msg346040 - (view) Author: Inada Naoki (methane) * (Python committer) Date: 2019-06-19 12:54
LOAD_METHOD avoids temporary bound method object.
PyObject_CallMethodObjArgs now use same optimization.

Now I think there is not enough performance benefit from free_list.
When free_list is not used often enough, it may bother obmalloc reuse memory pool.

This is performance diff of removing free_list (with LTO, without PGO, patched=removed free_list):

```
$ ./python -m pyperf compare_to master.json patched.json -G --min-speed=1
Slower (19):
- sqlite_synth: 4.03 us +- 0.10 us -> 4.20 us +- 0.08 us: 1.04x slower (+4%)
- genshi_text: 41.2 ms +- 0.4 ms -> 42.6 ms +- 0.4 ms: 1.03x slower (+3%)
- scimark_sparse_mat_mult: 6.29 ms +- 0.03 ms -> 6.50 ms +- 0.50 ms: 1.03x slower (+3%)
- mako: 26.5 ms +- 0.1 ms -> 27.4 ms +- 0.3 ms: 1.03x slower (+3%)
- html5lib: 130 ms +- 4 ms -> 134 ms +- 5 ms: 1.03x slower (+3%)
- genshi_xml: 83.4 ms +- 1.1 ms -> 85.6 ms +- 1.2 ms: 1.03x slower (+3%)
- pickle: 15.1 us +- 0.5 us -> 15.5 us +- 0.5 us: 1.03x slower (+3%)
- float: 161 ms +- 1 ms -> 165 ms +- 1 ms: 1.02x slower (+2%)
- logging_simple: 13.9 us +- 0.2 us -> 14.2 us +- 0.2 us: 1.02x slower (+2%)
- xml_etree_process: 108 ms +- 1 ms -> 110 ms +- 1 ms: 1.02x slower (+2%)
- pathlib: 28.0 ms +- 0.2 ms -> 28.5 ms +- 0.3 ms: 1.02x slower (+2%)
- pickle_pure_python: 703 us +- 8 us -> 715 us +- 7 us: 1.02x slower (+2%)
- sympy_expand: 553 ms +- 5 ms -> 563 ms +- 12 ms: 1.02x slower (+2%)
- xml_etree_generate: 136 ms +- 2 ms -> 138 ms +- 1 ms: 1.02x slower (+2%)
- logging_format: 15.3 us +- 0.2 us -> 15.5 us +- 0.2 us: 1.01x slower (+1%)
- json_dumps: 17.4 ms +- 0.1 ms -> 17.7 ms +- 0.2 ms: 1.01x slower (+1%)
- logging_silent: 266 ns +- 5 ns -> 269 ns +- 9 ns: 1.01x slower (+1%)
- django_template: 163 ms +- 1 ms -> 165 ms +- 2 ms: 1.01x slower (+1%)
- sympy_sum: 219 ms +- 2 ms -> 222 ms +- 2 ms: 1.01x slower (+1%)

Faster (6):
- regex_effbot: 4.51 ms +- 0.04 ms -> 4.44 ms +- 0.03 ms: 1.02x faster (-2%)
- pickle_list: 5.21 us +- 0.04 us -> 5.13 us +- 0.04 us: 1.01x faster (-1%)
- crypto_pyaes: 164 ms +- 1 ms -> 162 ms +- 1 ms: 1.01x faster (-1%)
- xml_etree_parse: 202 ms +- 7 ms -> 200 ms +- 3 ms: 1.01x faster (-1%)
- scimark_sor: 287 ms +- 6 ms -> 284 ms +- 6 ms: 1.01x faster (-1%)
- raytrace: 758 ms +- 26 ms -> 750 ms +- 11 ms: 1.01x faster (-1%)

Benchmark hidden because not significant (35)
```

I think free_list is useful only when several benchmarks in pyperformance shows more than 5% speedup.
The benefit is smaller than my threshold.  I will run pyperformance again after bpo-37337 is merged.

FWIW, In case of sqlite_synth, I think performance difference came from here:
https://github.com/python/cpython/blob/015000165373f8db263ef5bc682f02d74e5782ac/Modules/_sqlite/connection.c#L662
If performance of user-defined aggregate feature is really important, we can optimize it further.
msg346041 - (view) Author: Inada Naoki (methane) * (Python committer) Date: 2019-06-19 12:56
PyCFunction has similar free_list.
msg346042 - (view) Author: Jeroen Demeyer (jdemeyer) * (Python triager) Date: 2019-06-19 13:11
> I will run pyperformance again after bpo-37337 is merged.

Good idea. bpo-37337 removes many calls of _PyObject_CallMethodId() which does NOT use the LOAD_METHOD optimization.
msg346043 - (view) Author: Inada Naoki (methane) * (Python committer) Date: 2019-06-19 13:15
GH-14232 uses only one free object instead of at most 256 free list.
msg346044 - (view) Author: Jeroen Demeyer (jdemeyer) * (Python triager) Date: 2019-06-19 13:17
> GH-14232 uses only one free object instead of at most 256 free list.

That sounds like a compromise which is worse than either extreme: it's worse than no free list because you still have the overhead of dealing with the one object. And it's worse than a free list of 256 because it won't for nested calls.
msg346045 - (view) Author: Jeroen Demeyer (jdemeyer) * (Python triager) Date: 2019-06-19 13:35
Are the benchmark results that you posted above for removing the free list completely or for the one-element free list as in PR 14232?
msg346057 - (view) Author: Inada Naoki (methane) * (Python committer) Date: 2019-06-19 15:11
Sorry, I just pushed another idea before I leave my office.
I will benchmark both ideas after bpo-37337 is finished.
msg346094 - (view) Author: Inada Naoki (methane) * (Python committer) Date: 2019-06-20 04:15
> Are the benchmark results that you posted above for removing the free list completely or for the one-element free list as in PR 14232?

Yes.  I see +3% overhead in several benchmarks and I want to see how one free obj save them.

> it's worse than no free list because you still have the overhead of dealing with the one object.

Yes.  But it still helps some common simple cases.
(e.g. PyObject_CallMethod() (not CallMethodObjArg))

> And it's worse than a free list of 256 because it won't for nested calls.

256 free list may bother 256 pools are reused.  But one free obj keeps only one pool.  And we can remove the overhead of linked list too.

Of course, I prefer a simple code.  But let's wait for benchmark result.
msg348425 - (view) Author: Inada Naoki (methane) * (Python committer) Date: 2019-07-25 03:02
Latest benchmark:

```
$ pyperf compare_to master.json no-freelist.json -G --min-speed=2
Slower (2):
- unpickle_list: 4.19 us +- 0.02 us -> 4.28 us +- 0.03 us: 1.02x slower (+2%)
- pathlib: 23.2 ms +- 0.2 ms -> 23.7 ms +- 0.3 ms: 1.02x slower (+2%)

Faster (2):
- nbody: 149 ms +- 2 ms -> 141 ms +- 2 ms: 1.06x faster (-6%)
- logging_simple: 10.1 us +- 0.2 us -> 9.86 us +- 0.18 us: 1.02x faster (-2%)

Benchmark hidden because not significant (56)
```

I decided to just remove the free_list.
msg348465 - (view) Author: Inada Naoki (methane) * (Python committer) Date: 2019-07-26 06:05
New changeset 3e54b575313c64f541e98216ed079fafed01ff5d by Inada Naoki in branch 'master':
bpo-37340: remove free_list for bound method objects (GH-14232)
https://github.com/python/cpython/commit/3e54b575313c64f541e98216ed079fafed01ff5d
msg357065 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-11-20 11:59
New changeset 4dedd0f0ddc5a983a57bf0105eb34f948a91d2c4 by Victor Stinner in branch 'master':
bpo-37340: Remove PyMethod_ClearFreeList() and PyCFunction_ClearFreeList() (GH-17284)
https://github.com/python/cpython/commit/4dedd0f0ddc5a983a57bf0105eb34f948a91d2c4
msg367557 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2020-04-28 19:21
This caused a performance regression (70%) for a fundamental operation.  See issue 37340.

Sometimes, bound methods are used directly and not through LOAD_METHOD:

     sorted(data, key=str.upper)
msg367563 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-04-28 21:20
> This caused a performance regression (70%) for a fundamental operation.  See issue 37340.

Which issue? This is bpo-37340.
msg367565 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-04-28 21:28
> sorted(data, key=str.upper)

Oh, I guess that it's bpo-39117.
History
Date User Action Args
2022-04-11 14:59:16adminsetgithub: 81521
2020-04-28 21:28:25vstinnersetmessages: + msg367565
2020-04-28 21:20:39vstinnersetmessages: + msg367563
2020-04-28 19:21:48rhettingersetstatus: closed -> open
nosy: + rhettinger
messages: + msg367557

2019-12-23 22:20:52eric.snowsetnosy: + eric.snow
2019-11-20 11:59:17vstinnersetnosy: + vstinner
messages: + msg357065
2019-11-20 10:09:18vstinnersetpull_requests: + pull_request16777
2019-07-26 07:12:05methanesetstatus: open -> closed
resolution: fixed
stage: resolved
2019-07-26 06:05:58methanesetmessages: + msg348465
2019-07-25 03:02:41methanesetmessages: + msg348425
2019-06-20 19:16:36josh.rsetnosy: + josh.r
2019-06-20 04:15:02methanesetmessages: + msg346094
2019-06-19 15:11:22methanesetmessages: + msg346057
2019-06-19 13:35:52jdemeyersetmessages: + msg346045
2019-06-19 13:17:35jdemeyersetmessages: + msg346044
2019-06-19 13:15:14methanesetmessages: + msg346043
stage: patch review -> (no value)
2019-06-19 13:14:11methanesetkeywords: + patch
stage: patch review
pull_requests: + pull_request14068
2019-06-19 13:11:26jdemeyersetmessages: + msg346042
2019-06-19 12:56:44methanesetmessages: + msg346041
2019-06-19 12:54:21methanecreate