This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: PyType_IsSubtype is doing excessive work in the common case
Type: performance Stage: resolved
Components: Interpreter Core Versions: Python 3.11
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: Dennis Sweeney, arhadthedev, itamaro, kj, serhiy.storchaka
Priority: normal Keywords: patch

Created on 2021-11-03 01:12 by itamaro, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Pull Requests
URL Status Linked Edit
PR 29380 closed itamaro, 2021-11-03 01:17
PR 29392 merged itamaro, 2021-11-03 17:34
Messages (11)
msg405575 - (view) Author: Itamar Ostricher (itamaro) * Date: 2021-11-03 01:12
Based on real world profiling data we collected, a vast amount of `PyType_IsSubtype` calls are coming from `type_call`, when it decides whether `__init__` should run or not.

In the common case, the arguments to this call are identical, but the implementation still walks the MRO.

By returning early for identical types, the common case can be optimized with a non-trivial performance gain.
msg405577 - (view) Author: Dennis Sweeney (Dennis Sweeney) * (Python committer) Date: 2021-11-03 02:14
Interesting. It seems like several call sites already check the equality case:

---- setobject.h: ----
#define PyFrozenSet_Check(ob) \
    (Py_IS_TYPE(ob, &PyFrozenSet_Type) || \
      PyType_IsSubtype(Py_TYPE(ob), &PyFrozenSet_Type))

---- object.h: ----
static inline int _PyObject_TypeCheck(PyObject *ob, PyTypeObject *type) {
    return Py_IS_TYPE(ob, type) || PyType_IsSubtype(Py_TYPE(ob), type);
}
#define PyObject_TypeCheck(ob, type) _PyObject_TypeCheck(_PyObject_CAST(ob), type)


Perhaps it would be better to use PyObject_TypeCheck where applicable (such as in the type_call function you mention, and in abstract.c's binary_op1()).
msg405581 - (view) Author: Itamar Ostricher (itamaro) * Date: 2021-11-03 03:43
Dennis, you mean something like this? https://github.com/itamaro/cpython/commit/92d46b260cf6ccce1a47003f539294530138e488

sure, that would preempt the `PyType_IsSubtype` calls coming from these specific callsites, but the early return in `PyType_IsSubtype` would cover those as well as any other path, wouldn't it?
msg405586 - (view) Author: Oleg Iarygin (arhadthedev) * Date: 2021-11-03 08:15
Dennis, can PyFrozenSet_Check and _PyObject_TypeCheck get rid of Py_IS_TYPE invocation then, so PyType_IsSubtype becomes the only source of truth here?
msg405589 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2021-11-03 09:01
The PyObject_TypeCheck() macro is used in performance critical cases. If it is not the case the PyType_IsSubtype() function can be used. Adding yet check in PyType_IsSubtype() will slow down more performance sensitive cases in which PyObject_TypeCheck() is used and speed up less performance sensitive cases. So there is no reason to add it.
msg405596 - (view) Author: Ken Jin (kj) * (Python committer) Date: 2021-11-03 09:39
> Dennis, you mean something like this? https://github.com/itamaro/cpython/commit/92d46b260cf6ccce1a47003f539294530138e488

Not Dennis, but these changes looks good. As Serhiy mentioned, we can replace the hot callsites with PyObject_TypeCheck without slowing down PyType_IsSubtype.
msg405633 - (view) Author: Itamar Ostricher (itamaro) * Date: 2021-11-03 17:50
thanks for the feedback & discussion, I submitted a new PR
msg405642 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2021-11-03 20:20
Creating a new type takes microseconds, and using PyObject_TypeCheck() instead of PyType_IsSubtype() can save nanoseconds. So, while this replacement looks harmless, its effect can hardly be measured even in microbenchmarks.
msg405654 - (view) Author: Itamar Ostricher (itamaro) * Date: 2021-11-03 22:42
thanks for the feedback Serhiy!

repeating my response from the PR here as well (not sure what's the proper etiquette.. :-) )

note that this code path is not for creating new types (which is slow as you say), but for creating new instances (which is quite fast).

I ran some synthetic benchmarks with this change and without it (on an opt build with PGO and LTO), and the gain is measurable:

build from main (commit acc89db9233abf4d903af9a7595a2ed7478fe7d3 which is the base commit for this PR):

```
>for _ in {1..10}; do ./main-opt/python -m timeit -n 10000 -s 'class Spam: pass' -- 'ten_k_spams = [Spam() for _ in range(10000)]'; done
10000 loops, best of 5: 896 usec per loop
10000 loops, best of 5: 887 usec per loop
10000 loops, best of 5: 857 usec per loop
10000 loops, best of 5: 838 usec per loop
10000 loops, best of 5: 847 usec per loop
10000 loops, best of 5: 863 usec per loop
10000 loops, best of 5: 845 usec per loop
10000 loops, best of 5: 902 usec per loop
10000 loops, best of 5: 890 usec per loop
10000 loops, best of 5: 875 usec per loop
```

build with this change applied (commit 2362bf67e8acee49c6f97ea754d59dfd8982e07c):

```
>for _ in {1..10}; do ./test-opt/python -m timeit -n 10000 -s 'class Spam: pass' -- 'ten_k_spams = [Spam() for _ in range(10000)]'; done
10000 loops, best of 5: 833 usec per loop
10000 loops, best of 5: 885 usec per loop
10000 loops, best of 5: 845 usec per loop
10000 loops, best of 5: 838 usec per loop
10000 loops, best of 5: 833 usec per loop
10000 loops, best of 5: 827 usec per loop
10000 loops, best of 5: 858 usec per loop
10000 loops, best of 5: 811 usec per loop
10000 loops, best of 5: 843 usec per loop
10000 loops, best of 5: 845 usec per loop
```

also worth noting that the previous approach (GH-29380) was implemented on the Instagram Server production workload in Meta/Facebook/Instagram (choose your favorite) and resulted measurable perf gain of around 0.2%
msg405674 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2021-11-04 10:40
New changeset 2c045bd5673d56c3fdde7536da9df1c7d6f270f0 by Itamar Ostricher in branch 'main':
bpo-45697: Use PyObject_TypeCheck in type_call (GH-29392)
https://github.com/python/cpython/commit/2c045bd5673d56c3fdde7536da9df1c7d6f270f0
msg405675 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2021-11-04 10:41
Thank you for your contribution Itamar.
History
Date User Action Args
2022-04-11 14:59:52adminsetgithub: 89860
2021-11-04 10:41:48serhiy.storchakasetstatus: open -> closed
resolution: fixed
messages: + msg405675

stage: patch review -> resolved
2021-11-04 10:40:21serhiy.storchakasetmessages: + msg405674
2021-11-03 22:42:17itamarosetmessages: + msg405654
2021-11-03 20:20:59serhiy.storchakasetmessages: + msg405642
2021-11-03 17:50:49itamarosetmessages: + msg405633
2021-11-03 17:34:43itamarosetpull_requests: + pull_request27650
2021-11-03 09:39:56kjsetnosy: + kj
messages: + msg405596
2021-11-03 09:01:16serhiy.storchakasetnosy: + serhiy.storchaka
messages: + msg405589
2021-11-03 08:15:33arhadthedevsetnosy: + arhadthedev
messages: + msg405586
2021-11-03 03:43:04itamarosetmessages: + msg405581
2021-11-03 02:14:12Dennis Sweeneysetnosy: + Dennis Sweeney
messages: + msg405577
2021-11-03 01:17:36itamarosetkeywords: + patch
stage: patch review
pull_requests: + pull_request27638
2021-11-03 01:12:41itamarocreate