classification
Title: Convert static types to heap types: use PyType_FromSpec()
Type: Stage: patch review
Components: C API, Subinterpreters Versions: Python 3.9
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: corona10, eric.snow, erlendaasland, pablogsal, phsilva, shihai1991, skrah, vstinner
Priority: normal Keywords: patch

Created on 2020-03-26 16:04 by corona10, last changed 2020-09-16 10:06 by vstinner.

Files
File name Uploaded Description Edit
bench_isinstance_check.py corona10, 2020-03-29 02:06
bench_subclass_check.py corona10, 2020-03-29 02:06
bench_isinstance_check.py corona10, 2020-03-29 14:40
Pull Requests
URL Status Linked Edit
PR 19177 merged corona10, 2020-03-26 16:14
PR 19202 merged corona10, 2020-03-28 05:31
PR 19341 closed shihai1991, 2020-04-03 17:38
PR 19344 merged shihai1991, 2020-04-03 17:49
PR 19438 merged shihai1991, 2020-04-08 17:00
PR 20960 merged corona10, 2020-06-18 10:05
PR 20974 open corona10, 2020-06-19 03:11
PR 21953 closed corona10, 2020-08-25 13:44
PR 21954 merged corona10, 2020-08-25 14:01
Messages (28)
msg365087 - (view) Author: Dong-hee Na (corona10) * (Python committer) Date: 2020-03-26 16:04
Some of modules is not using PyType_FromSpec.
We need to convert them.

This changes can bring
- allow to destroy types at exit!
- allow subinterpreters to have their own "isolated" typ
msg365088 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-03-26 16:08
> We need to convert them.

Let me elaborate. Static types have multiple issues:

* Their lifetime is not well defined.
* It is not obvious when they are ready to be used.
* They are not destroyed at exit.
* They are incompatible with subinterpreters: each interpreter should have its own copy of a type, rather than static types are shared by all interpreters which cause problems with reference counting (require GIL or atomic operation).
* They are causing issues with stable ABI (limited C API): PEP 384.
msg365117 - (view) Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2020-03-26 22:55
Wouldn't having less static types slow down startup time?
msg365125 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-03-27 01:15
Pablo:
> Wouldn't having less static types slow down startup time?

That's possible, even if I only expect a minor or non significant overhead.

But before starting talking about performances, we should focus on the correctness. Static types are causing issues with subinterpreters and the Python finalization.
msg365142 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-03-27 11:00
New changeset 33f15a16d40cb8010a8c758952cbf88d7912ee2d by Dong-hee Na in branch 'master':
bpo-40077: Convert _json module to use PyType_FromSpec() (GH-19177)
https://github.com/python/cpython/commit/33f15a16d40cb8010a8c758952cbf88d7912ee2d
msg365145 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2020-03-27 11:46
> Wouldn't having less static types slow down startup time?

Yes, and not only startup time:

https://bugs.python.org/issue15722
msg365149 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-03-27 13:58
In the _json module, PyModule_GetState() (called by get_json_state()) is only used by the garbage collector (traverse function) and to unload the module (clear and free functions). It's not used in "hot code" (let me consider that the GC is not part of the usual "hot code" :-)).

But maybe we should measure the overhead of future changes if PyModule_GetState() starts to be used in "hot code" by running a few microbenchmarks.

--

Stefan Krah:
> Yes, and not only startup time:
> https://bugs.python.org/issue15722

Aha, that's interesting. I didn't know that it could have an impact on runtime performance as well.

_decimal_pep3121-384_v1.patch attached to bpo-15722 seems to use:

#define _decimal_state_global ((_decimal_state *)PyModule_GetState(PyState_FindModule(&_decimal_module)))

whereas the commit 33f15a16d40cb8010a8c758952cbf88d7912ee2d only uses:

static inline _jsonmodulestate*
get_json_state(PyObject *module)
{
    void *state = PyModule_GetState(module);
    assert(state != NULL);
    return (_jsonmodulestate *)state;
}

Maybe PyState_FindModule() adds a little overhead, even if this function is simple: in short, it gets the i-th item of a list (from PyInterpreterState.modules_by_index).

PyModule_GetState() function is really simple: it only reads PyModuleObject.md_state attribute.

Or maybe _decimal_state_global was used in "hot code".

If PyState_FindModule() or PyModule_GetState() is the source of the overhead, maybe we could try to optimise these functions, or pass directly the module state to inner functions.

--

PyState_FindModule() doesn't work for a module using PyModuleDef_Init(). The PEP 573 is going to give access to the module state in functions which didn't access to it previsouly.

The commit 33f15a16d40cb8010a8c758952cbf88d7912ee2d removed a few assertions checking that "self" has the expected type. It wasn't possible to get the module state to get the types, because PEP 573 is not implemented yet and PyState_FindModule() doesn't work in _json (which uses PyModuleDef_Init()). I decided that it's ok to remove these assertions: it should not be possible to call these functions with another type in practice.

--

In his PR 19177, Dong-hee started by replacing direct access to PyTypeObject fields, like replacing:

    type->free(self);

with:

    freefunc free_func = PyType_GetSlot(tp, Py_tp_free);
    free_func(self);

I asked him to revert these changes.

I'm interested to experiment a few C extension modules of the stdlib using the limited C API (PEP 384, stable ABI), but I don't think that it should done while converting modules to PyType_FromSpec().

We should separate the two changes. And I would prefer to first see the overhead of PyType_FromSpec(), and discuss the advantages and drawbacks.
msg365154 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2020-03-27 14:48
> Or maybe _decimal_state_global was used in "hot code".

Yes, _decimal has problems here that most modules don't have.
Modules like atexit are obviously fine. :)

I just posted it as an example why one should be a bit cautious.


> The PEP 573 is going to give access to the module state in functions which didn't access to it previously.

That could help, we'll see.
msg365205 - (view) Author: Dong-hee Na (corona10) * (Python committer) Date: 2020-03-28 05:54
> And I would prefer to first see the overhead of PyType_FromSpec(), and discuss the advantages and drawbacks.

Should we stop the work until the overhead is measured?
msg365220 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-03-28 17:13
> Should we stop the work until the overhead is measured?

Can you try to measure the _abc._abc_instancecheck() and _abc._abc_subclasscheck() functions performance before/after your change? Functions like register() are rarely call, so I don't care much of their performance (I expect a small overhead or no overhead).
msg365241 - (view) Author: Dong-hee Na (corona10) * (Python committer) Date: 2020-03-29 02:05
It shown 1% slower for performance.

+--------------------------+-----------------+----------------------------+
| Benchmark                | master-subclass | bpo-40077-subclass         |
+==========================+=================+============================+
| bench _abc_subclasscheck | 295 ns          | 300 ns: 1.01x slower (+1%) |
+--------------------------+-----------------+----------------------------+

+--------------------------+-------------------+----------------------------+
| Benchmark                | master-isinstance | bpo-40077-isinstance       |
+==========================+===================+============================+
| bench _abc_instancecheck | 229 ns            | 232 ns: 1.01x slower (+1%) |
+--------------------------+-------------------+----------------------------+
msg365242 - (view) Author: Dong-hee Na (corona10) * (Python committer) Date: 2020-03-29 02:07
> Can you try to measure the _abc._abc_instancecheck() and _abc._abc_subclasscheck()

I 've submitted the benchmark :)
msg365255 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-03-29 14:17
IMO 1.01x slower on a *microbenchmark* is not significant so it's ok. Thanks for checking.

You pass a *type to isinstance() in bench_isinstance_check.py. You should pass *an instance* instead. Like (complete the (...) ;-)):

runner.timeit(name="bench _abc_instancecheck",
              stmt="isinstance(obj, T)",
              setup = """from abc import ABC (...) obj = (1, 2, 3)""")

Would you mind to fix your microbenchmark and re-run it?
msg365257 - (view) Author: Dong-hee Na (corona10) * (Python committer) Date: 2020-03-29 14:40
> You pass a *type to isinstance() in bench_isinstance_check.py.

Thanks for the catch my mistake.

The result is showing:
Not significant (1): bench _abc_instancecheck
msg365317 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-03-30 14:35
New changeset 53e4c91725083975598350877e2ed8e2d0194114 by Dong-hee Na in branch 'master':
bpo-40077: Convert _abc module to use PyType_FromSpec() (GH-19202)
https://github.com/python/cpython/commit/53e4c91725083975598350877e2ed8e2d0194114
msg365558 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-04-02 01:56
> bpo-40077: Convert _abc module to use PyType_FromSpec() (GH-19202)

This change introduced a reference leak: bpo-40149 "test_threading leaked [38, 38, 38] references, sum=114".
msg365773 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-04-04 19:24
New changeset b709302f3125622986bd458dfb2954fda5e8366d by Hai Shi in branch 'master':
bpo-40077: Fix potential refleaks of _json: traverse memo (GH-19344)
https://github.com/python/cpython/commit/b709302f3125622986bd458dfb2954fda5e8366d
msg366024 - (view) Author: Eric Snow (eric.snow) * (Python committer) Date: 2020-04-08 22:39
> Wouldn't having less static types slow down startup time?

FWIW, I've been considering an approach where the main interpreter
keeps using static types but subinterpreters use heap types.  If it
isn't too much effort (or too hacky) then it might be a sufficient
solution for now.
msg366061 - (view) Author: Dong-hee Na (corona10) * (Python committer) Date: 2020-04-09 15:10
New changeset dcb04d9c6dd6f31449ade6765fa4d26a305e7381 by Hai Shi in branch 'master':
bpo-40077: Remove redundant cast in json module (GH-19438)
https://github.com/python/cpython/commit/dcb04d9c6dd6f31449ade6765fa4d26a305e7381
msg368669 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-05-11 22:23
See also bpo-40601: [C API] Hide static types from the limited C API.
msg371120 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-06-09 16:11
I tried to finalize static types in Py_Finalize(), but it didn't work:

* https://bugs.python.org/issue1635741#msg371119
* https://github.com/python/cpython/pull/20763
msg371813 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-06-18 14:06
PR 20960 (_bz2 module) triggers an interesting question. The effect of converting a static type to a heap type on pickle, especially for protocols 0 and 1.

pickle.dumps(o, 0) calls object.__reduce__(o) which calls copyreg._reduce_ex(o, 0).

copyreg._reduce_ex() behaves differently on heap types:

    ...
    for base in cls.__mro__:
        if hasattr(base, '__flags__') and not base.__flags__ & _HEAPTYPE:
            break
    else:
        base = object # not really reachable
    ...

There are 3 things which impacts serialization/deserialization:

- Py_TPFLAGS_HEAPTYPE flag in the type flags
- Is __new__() overriden in the type?
- Py_TPFLAGS_BASETYPE flag in the type flags

Examples:

* In Python 3.7, _random.Random() cannot be serialized because it's a static type (it doesn't have (Py_TPFLAGS_HEAPTYPE)
* In master, _random.Random() cannot be deserialized because _random.Random() has __new__() method (it's not object.__new__())
msg371890 - (view) Author: Dong-hee Na (corona10) * (Python committer) Date: 2020-06-19 15:56
New changeset ec689187957cc80af56b9a63251bbc295bafd781 by Dong-hee Na in branch 'master':
bpo-40077: Convert _bz2 module to use PyType_FromSpec (GH-20960)
https://github.com/python/cpython/commit/ec689187957cc80af56b9a63251bbc295bafd781
msg372073 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-06-22 09:33
Search also for issues with "384" in their title (34 open issues):

https://bugs.python.org/issue?%40search_text=&ignore=file%3Acontent&title=384&%40columns=title&id=&%40columns=id&stage=&creation=&creator=&activity=&%40columns=activity&%40sort=activity&actor=&nosy=&type=&components=&versions=&dependencies=&assignee=&keywords=&priority=&status=1&%40columns=status&resolution=&nosy_count=&message_count=&%40group=&%40pagesize=50&%40startwith=0&%40sortdir=on&%40queryname=&%40old-queryname=&%40action=search

Search for issues with pep3121 keyword (40 open issues):

https://bugs.python.org/issue?%40search_text=&ignore=file%3Acontent&title=&%40columns=title&id=&%40columns=id&stage=&creation=&creator=&activity=&%40columns=activity&%40sort=activity&actor=&nosy=&type=&components=&versions=&dependencies=&assignee=&keywords=13&priority=&status=1&%40columns=status&resolution=&nosy_count=&message_count=&%40group=&%40pagesize=50&%40startwith=0&%40sortdir=on&%40queryname=&%40old-queryname=&%40action=search
msg372074 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-06-22 09:33
For example, see bpo-15849 for the xx module.
msg372076 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-06-22 09:35
See meta bpo-15787 "PEP 3121, 384 Refactoring" which tracks all these issues as dependencies.
msg372326 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-06-25 09:38
The Tools/scripts/abitype.py script can help to port C extensions to PyType_FromSpec().
msg375943 - (view) Author: Dong-hee Na (corona10) * (Python committer) Date: 2020-08-26 17:22
New changeset 31967fd8d0220af84e2698172d1378bffc8cd851 by Dong-hee Na in branch 'master':
bpo-40077: Convert _operator to use PyType_FromSpec (GH-21954)
https://github.com/python/cpython/commit/31967fd8d0220af84e2698172d1378bffc8cd851
History
Date User Action Args
2020-09-16 10:06:57vstinnersettitle: Convert static types to PyType_FromSpec() -> Convert static types to heap types: use PyType_FromSpec()
2020-08-26 17:22:37corona10setmessages: + msg375943
2020-08-25 14:01:06corona10setpull_requests: + pull_request21063
2020-08-25 13:44:36corona10setpull_requests: + pull_request21062
2020-06-25 09:38:33vstinnersetmessages: + msg372326
2020-06-22 09:35:00vstinnersetmessages: + msg372076
2020-06-22 09:33:27vstinnersetmessages: + msg372074
2020-06-22 09:33:12vstinnersetmessages: + msg372073
2020-06-19 15:56:17corona10setmessages: + msg371890
2020-06-19 03:11:11corona10setpull_requests: + pull_request20151
2020-06-18 14:06:36vstinnersetmessages: + msg371813
2020-06-18 10:05:34corona10setpull_requests: + pull_request20139
2020-06-09 18:59:43erlendaaslandsetnosy: + erlendaasland
2020-06-09 16:11:56vstinnersetmessages: + msg371120
2020-06-05 09:32:54vstinnersetcomponents: + Subinterpreters
2020-05-11 22:23:59vstinnersetmessages: + msg368669
2020-04-09 15:10:36corona10setmessages: + msg366061
2020-04-08 22:39:48eric.snowsetnosy: + eric.snow
messages: + msg366024
2020-04-08 17:00:41shihai1991setpull_requests: + pull_request18792
2020-04-04 19:24:23vstinnersetmessages: + msg365773
2020-04-03 17:49:13shihai1991setpull_requests: + pull_request18708
2020-04-03 17:38:26shihai1991setpull_requests: + pull_request18705
2020-04-02 01:56:23vstinnersetmessages: + msg365558
2020-03-30 14:52:13shihai1991setnosy: + shihai1991
2020-03-30 14:35:44vstinnersetmessages: + msg365317
2020-03-29 14:40:48corona10setfiles: + bench_isinstance_check.py

messages: + msg365257
2020-03-29 14:17:02vstinnersetmessages: + msg365255
2020-03-29 02:07:21corona10setmessages: + msg365242
2020-03-29 02:06:47corona10setfiles: + bench_subclass_check.py
2020-03-29 02:06:38corona10setfiles: + bench_isinstance_check.py
2020-03-29 02:05:45corona10setmessages: + msg365241
2020-03-28 17:13:18vstinnersetmessages: + msg365220
2020-03-28 05:54:13corona10setmessages: + msg365205
2020-03-28 05:31:04corona10setpull_requests: + pull_request18565
2020-03-27 14:48:11skrahsetmessages: + msg365154
2020-03-27 13:58:10vstinnersetmessages: + msg365149
2020-03-27 11:46:53skrahsetnosy: + skrah
messages: + msg365145
2020-03-27 11:00:12vstinnersetmessages: + msg365142
2020-03-27 03:17:47corona10setcomponents: + C API
2020-03-27 03:17:41corona10setversions: + Python 3.9
2020-03-27 01:19:29phsilvasetnosy: + phsilva
2020-03-27 01:15:29vstinnersetmessages: + msg365125
2020-03-26 22:55:59pablogsalsetmessages: + msg365117
2020-03-26 22:55:53pablogsalsetmessages: - msg365116
2020-03-26 22:55:46pablogsalsetnosy: + pablogsal
messages: + msg365116
2020-03-26 16:14:23corona10setkeywords: + patch
stage: patch review
pull_requests: + pull_request18537
2020-03-26 16:08:39vstinnersetmessages: + msg365088
2020-03-26 16:04:44corona10create