classification
Title: TODO list when PEP 573 "Module State Access from C Extension Methods" will be implemented
Type: enhancement Stage: resolved
Components: Library (Lib) Versions: Python 3.10
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: corona10, erlendaasland, miss-islington, pablogsal, petr.viktorin, phsilva, rhettinger, serhiy.storchaka, shihai1991, vstinner
Priority: high Keywords: patch

Created on 2020-04-01 15:08 by vstinner, last changed 2021-04-14 08:24 by erlendaasland.

Files
File name Uploaded Description Edit
patch.diff erlendaasland, 2021-04-14 08:24 Move get_functools_state_by_type() to lru_cache_new()
Pull Requests
URL Status Linked Edit
PR 19177 merged corona10, 2020-04-01 15:20
PR 20055 closed shihai1991, 2020-05-12 15:18
PR 23405 merged shihai1991, 2020-11-19 15:22
PR 24006 merged vstinner, 2020-12-30 01:01
PR 25393 open vstinner, 2021-04-13 20:49
Messages (20)
msg365482 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-04-01 15:08
In bpo-1635741, many C extension modules are converted to PEP 489 multiphase initialization and/or modified to get a module state. Problem: the module state cannot be accessed in some functions, and so some of these changes had to workaround the fact that PEP 573 "Module State Access from C Extension Methods" is not implemented yet.

This issue tracks C extension modules which should be modified once PEP 573 will be implemented:

* _functools: Py_CLEAR(kwd_mark); is commented in _functools_free()
  See commit eacc07439591c97f69ab4a3d17391b009cd78ae2

* _abc: abc_invalidation_counter is shared by all module instances. abc_data_new() requires access to abc_invalidation_counter but it doesn't have access to the module state. abc_invalidation_counter should be moved to the module state.
msg365486 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-04-01 15:19
Another minor issue: "assert(PyScanner_Check(self));" and "assert(PyEncoder_Check(self));" were removed by commit 33f15a16d40cb8010a8c758952cbf88d7912ee2d when _json module got a module state.

scanner_traverse(), scanner_clear(), encoder_traverse() and encoder_clear() cannot get the module state of a module using PEP 489 multiphase initialization.

Similar issue in scanner_call() and encoder_call().

I'm not sure that the PEP 573 provide a solution for this exact issue, since the intent of the assertion is to check the type of the argument. The PEP 573 gives access to the module state through types. But here we are not sure that the object type is the expected type... :-)

Again, it's a minor issue, it can be ignored. That's why I accepted to merge the commit. Technically, I don't see how we could get the wrong type in practice.
msg368852 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-05-14 17:03
I consider that all things that could be done have already been done, so I close the issue. Thanks for Hai and Dong-hee who helped ;-)

> * _functools: Py_CLEAR(kwd_mark); is commented in _functools_free()

Sadly, PEP 573 implementation is not complete enough to use it in _functools: see PR 20055.

> * _abc: abc_invalidation_counter

Fixed in bpo-40566 by:

commit 77c614624b6bf2145bef69830d0f499d8b55ec0c
Author: Dong-hee Na <donghee.na92@gmail.com>
Date:   Sat May 9 17:31:40 2020 +0900

    bpo-40566: Apply PEP 573 to abc module (GH-20005)


> scanner_traverse(), scanner_clear(), encoder_traverse() and encoder_clear()

tp_clear slot cannot get the defining type (PEP 573 may be extended later to allow that).

Using PyType_GetModule(Py_TYPE(self)) to access types stored in the module state and then compare Py_TYPE(self) to these types... looks badly broken :-) If we get the wrong type, PyType_GetModule(Py_TYPE(self)) will return the wrong module and so we cannot get the json state from the wrong module... It's a chicken-and-egg issue :-)

I think that it's not worth it to attempt to add back these assertions. It's very unlikely to get the wrong types in practice.
msg380668 - (view) Author: Petr Viktorin (petr.viktorin) * (Python committer) Date: 2020-11-10 14:01
It should be possible to get them back using _PyType_GetModuleByDef.
msg380669 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-11-10 14:02
Oh nice, in this case, I reopen my issue :-)
msg383993 - (view) Author: miss-islington (miss-islington) Date: 2020-12-29 12:45
New changeset dd39123970892733c317f235808638ae5c0ccf04 by Hai Shi in branch 'master':
bpo-40137: Convert _functools module to use PyType_FromModuleAndSpec. (GH-23405)
https://github.com/python/cpython/commit/dd39123970892733c317f235808638ae5c0ccf04
msg384036 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-12-30 00:56
The following commit introduces a reference leak:

commit dd39123970892733c317f235808638ae5c0ccf04
Author: Hai Shi <shihai1992@gmail.com>
Date:   Tue Dec 29 20:45:07 2020 +0800

    bpo-40137: Convert _functools module to use PyType_FromModuleAndSpec. (GH-23405)

 Lib/test/test_functools.py                         |   3 +-
 .../2020-11-19-23-12-57.bpo-40137.bihl9O.rst       |   1 +
 Modules/_functoolsmodule.c                         | 475 +++++++++++----------
 3 files changed, 255 insertions(+), 224 deletions(-)
 create mode 100644 Misc/NEWS.d/next/Core and Builtins/2020-11-19-23-12-57.bpo-40137.bihl9O.rst


Leaks:

test_interpreters leaked [2038, 2034, 2038] references, sum=6110
test_threading leaked [452, 452, 452] references, sum=1356
test_capi leaked [452, 452, 452] references, sum=1356
test__xxsubinterpreters leaked [6332, 6328, 6332] references, sum=18992
test_ast leaked [226, 226, 226] references, sum=678

https://buildbot.python.org/all/#/builders/129/builds/144
msg384037 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-12-30 01:03
I wrote PR 24006 to fix the leak.
msg384043 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-12-30 01:24
New changeset ba0e49a4648e727d1a16b3ce479499eb39f22311 by Victor Stinner in branch 'master':
bpo-40137: Fix refleak in _functools_exec() (GH-24006)
https://github.com/python/cpython/commit/ba0e49a4648e727d1a16b3ce479499eb39f22311
msg384048 - (view) Author: hai shi (shihai1991) * (Python triager) Date: 2020-12-30 03:57
> I wrote PR 24006 to fix the leak.
Oh, thanks for your fix, victor.
msg384049 - (view) Author: hai shi (shihai1991) * (Python triager) Date: 2020-12-30 04:30
After PR 23405 and PR 24006 merged, I think this bpo can be closed.
msg384056 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-12-30 11:30
> _abc: abc_invalidation_counter is shared by all module instances.

For the record, this one has been fixed by:

commit 77c614624b6bf2145bef69830d0f499d8b55ec0c
Author: Dong-hee Na <donghee.na92@gmail.com>
Date:   Sat May 9 17:31:40 2020 +0900

    bpo-40566: Apply PEP 573 to abc module (GH-20005)
msg390756 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2021-04-11 00:55
State lookups and indirections were added to most performance sensitive code part of the

    _functools_state *state;

    state = get_functools_state_by_type(Py_TYPE(self));
    if (state == NULL) {
        return NULL;
    }
    key = lru_cache_make_key(state, args, kwds, self->typed);

I think it should be removed for Python3.10.  Right now, it is of zero value to users (actually, negative value).
msg390758 - (view) Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2021-04-11 01:20
Here are some benchmarks of using lru_cache in 3.9 and 3.10 (PGO/LTO/CPU isol):

❯ ./python -m pyperf timeit "from functools import lru_cache; f = lru_cache(lambda: 42)" "f()" --compare-to ../3.9/python
/home/pablogsal/github/3.9/python: ..................... 2.60 us +- 0.05 us
/home/pablogsal/github/cpython/python: ..................... 2.74 us +- 0.06 us

Mean +- std dev: [/home/pablogsal/github/3.9/python] 2.60 us +- 0.05 us -> [/home/pablogsal/github/cpython/python] 2.74 us +- 0.06 us: 1.06x slower

Given that lru_cache is normally used to seek speed, this is a bit unfortunate :(
msg390759 - (view) Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2021-04-11 01:22
As a Release Manager, I am not very happy with a 6% reduction in speed, especially in these functions. Can someone take a look to see if we can get the performance closer to Python 3.9?
msg390768 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2021-04-11 05:01
Serhiy, do you think we could substitute the "self" instance of functools._lru_cache_wrapper object for the kw_mark?  It already compares on identity, so it should be just a fast as the instance of object().  AFAICT, kwd_mark doesn't have to be unique across instances.  

To avoid creating a GC only circular reference, we could skip the INCREF, leaving the object weakly referencing itself. 

In bounded_lru_cache_wrapper(), the other state reference reference is in creating a new lru_list_elem.  There are several things we can do about this.  1) move the expensive get_functools_state_by_type() inside the code block that creates new links, so that it won't affect a full cache.  2) create a per-instance variable that references the per-type lru_list_elem.  This has the downside of making each lru_cache instance slightly larger.  3) prepopulate the lru_cache with maxsize links.  This simplifies the code and makes it slightly faster but will waste spaces for use cases that never actually fill-up the cache lazily, also it slows down start-up time.

Ideally, these PRs should be reverted for Python 3.10.  AFAICT, no user of functools will benefit in any way from these changes.  Presumably, that is why there are no tests.  It is all cost and no benefit.

BTW, the 6% timing is likely understated because the loop is executed entirely in L1 cache or the CPU's recent write cache rather than real world conditions where there is fierce competition of the limited 32kb L1 data cache.  Just looking at the total number of sequentially dependent memory accesses, I expect that the actual cost is more than a third.
msg390769 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2021-04-11 05:08
FWIW, I've only looked at the lru_cache() code.  Someone should go through all the PyModule_GetState calls to see if they are on a critical path.  AFAICT a huge number of these changes were applied without regard to whether or not they occurred on an existing fast path.

This is a considerable number of clock cycles for a type lookup that used to be almost cost free.  Whether it is high or low impact greatly depends on whether the surrounding code was doing a lot of work or very little work.  In thin functions, the incremental costs will be higher.
msg391004 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-04-13 20:49
> ./python -m pyperf timeit "from functools import lru_cache; f = lru_cache(lambda: 42)" "f()" --compare-to ../3.9/python
> /home/pablogsal/github/3.9/python: ..................... 2.60 us +- 0.05 us

You misused pyperf timeit: the two statements are run at each iteration of the benchmark.

I rerun the benchmark on Linux with PGO+LTO and CPU isolation:

Mean +- std dev: [py39] 37.5 ns +- 1.0 ns -> [master] 43.2 ns +- 0.7 ns: 1.15x slower

I understand that adding get_functools_state_by_type() has a cost of +5.7 ns on the performance of functions decorated with @lru_cache.

I used the commands:

./configure --enable-optimizations --with-lto
make
./python -m venv env
env/bin/python -m pip install pyperf
./env/bin/python -m pyperf timeit -s "from functools import lru_cache; f = lru_cache(lambda: 42)" "f()" -o master.json -v --duplicate=4096
msg391005 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-04-13 20:53
I wrote PR 25393 to micro-optimize _PyType_GetModuleByDef(). It reduces the overhead from +5.7 ns to +3.4 ns (-2.3 ns).

Well, I didn't check in depth if _PyType_GetModuleByDef() is the only change in functools.lru_cache() from Python 3.9 to master.

Compare master to Python 3.9:

Mean +- std dev: [py39] 37.5 ns +- 1.0 ns -> [master] 43.2 ns +- 0.7 ns: 1.15x slower

Compare PR 25393 to Python 3.9:

Mean +- std dev: [py39] 37.5 ns +- 1.0 ns -> [inline] 40.9 ns +- 1.0 ns: 1.09x slower

Compare PR 25393 to master:

Mean +- std dev: [master] 43.2 ns +- 0.7 ns -> [inline] 40.9 ns +- 1.0 ns: 1.05x faster
msg391045 - (view) Author: Erlend Egeberg Aasland (erlendaasland) * Date: 2021-04-14 08:24
Raymond:
> 1) move the expensive get_functools_state_by_type() inside the code block that creates new links, so that it won't affect a full cache.

This should be as easy as adding a _functools_state pointer to struct lru_cache_object, init that pointer in lru_cache_new(), and then fetch it from self in infinite_lru_cache_wrapper() and bounded_lru_cache_wrapper(), no? Diff attached.
History
Date User Action Args
2021-04-14 08:24:46erlendaaslandsetfiles: + patch.diff
nosy: + erlendaasland
messages: + msg391045

2021-04-13 20:53:46vstinnersetmessages: + msg391005
2021-04-13 20:49:54vstinnersetmessages: + msg391004
stage: patch review -> resolved
2021-04-13 20:49:22vstinnersetstage: resolved -> patch review
pull_requests: + pull_request24126
2021-04-11 05:08:39rhettingersetmessages: + msg390769
2021-04-11 05:01:51rhettingersetmessages: + msg390768
2021-04-11 01:22:34pablogsalsetmessages: + msg390759
2021-04-11 01:20:14pablogsalsetmessages: + msg390758
2021-04-11 00:55:03rhettingersetstatus: closed -> open
priority: normal -> high

nosy: + pablogsal, serhiy.storchaka, rhettinger
messages: + msg390756

resolution: fixed ->
2020-12-30 11:30:57vstinnersetmessages: + msg384056
2020-12-30 04:30:55shihai1991setstatus: open -> closed
resolution: fixed
messages: + msg384049

stage: patch review -> resolved
2020-12-30 03:57:05shihai1991setmessages: + msg384048
2020-12-30 01:24:54vstinnersetmessages: + msg384043
2020-12-30 01:03:37vstinnersetmessages: + msg384037
2020-12-30 01:01:51vstinnersetpull_requests: + pull_request22848
2020-12-30 00:56:35vstinnersetmessages: + msg384036
2020-12-29 12:45:14miss-islingtonsetnosy: + miss-islington
messages: + msg383993
2020-11-19 15:22:29shihai1991setstage: resolved -> patch review
pull_requests: + pull_request22298
2020-11-10 16:51:49shihai1991setversions: + Python 3.10, - Python 3.9
2020-11-10 14:02:29vstinnersetstatus: closed -> open
resolution: fixed -> (no value)
messages: + msg380669
2020-11-10 14:01:42petr.viktorinsetnosy: + petr.viktorin
messages: + msg380668
2020-05-14 17:03:45vstinnersetstatus: open -> closed
resolution: fixed
messages: + msg368852

stage: patch review -> resolved
2020-05-12 15:18:42shihai1991setpull_requests: + pull_request19364
2020-04-01 16:17:55shihai1991setnosy: + shihai1991
2020-04-01 15:20:42corona10setkeywords: + patch
nosy: + corona10

pull_requests: + pull_request18636
stage: patch review
2020-04-01 15:19:56vstinnersetmessages: + msg365486
2020-04-01 15:14:45phsilvasetnosy: + phsilva
2020-04-01 15:08:23vstinnercreate