classification
Title: Cleanup object.h header
Type: Stage: resolved
Components: C API Versions: Python 3.9
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: Jim.Jewett, lukasz.langa, petr.viktorin, pitrou, rhettinger, serhiy.storchaka, skrah, vstinner
Priority: Keywords: patch

Created on 2020-02-03 15:42 by vstinner, last changed 2020-07-11 17:52 by Jim.Jewett. This issue is now closed.

Files
File name Uploaded Description Edit
Screen Shot 2020-07-07 at 10.40.52 AM.png rhettinger, 2020-07-07 17:53 Disassembly of math.dist() in 3.8 vs 3.9
Pull Requests
URL Status Linked Edit
PR 18331 merged vstinner, 2020-02-03 15:45
PR 18332 merged vstinner, 2020-02-03 16:35
PR 18346 merged vstinner, 2020-02-04 14:45
PR 18362 merged vstinner, 2020-02-05 11:02
PR 18363 merged vstinner, 2020-02-05 11:48
PR 18364 merged vstinner, 2020-02-05 12:15
PR 18366 merged vstinner, 2020-02-05 13:35
PR 18368 merged vstinner, 2020-02-05 16:49
PR 18378 merged vstinner, 2020-02-06 16:01
PR 18384 closed vstinner, 2020-02-06 22:09
Messages (28)
msg361305 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-02-03 15:42
In bpo-39489, I removed the COUNT_ALLOCS special build. The object.h header can now be cleaned up to simplify the code.
msg361307 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-02-03 16:28
New changeset 4b524161a0f9d50d782e739a3708434ffd4e94a5 by Victor Stinner in branch 'master':
bpo-39542: Move object.h debug functions to internal C API (GH-18331)
https://github.com/python/cpython/commit/4b524161a0f9d50d782e739a3708434ffd4e94a5
msg361311 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-02-03 16:55
New changeset 49932fec62c616ec88da52642339d83ae719e924 by Victor Stinner in branch 'master':
bpo-39542: Simplify _Py_NewReference() (GH-18332)
https://github.com/python/cpython/commit/49932fec62c616ec88da52642339d83ae719e924
msg361384 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-02-05 00:11
New changeset 40e547dfbb9052ca0c667b242f6825ed1c23c195 by Victor Stinner in branch 'master':
bpo-39542: Make _Py_NewReference() opaque in C API (GH-18346)
https://github.com/python/cpython/commit/40e547dfbb9052ca0c667b242f6825ed1c23c195
msg361422 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-02-05 11:23
New changeset 0fa4f43db086ac3459811cca4ec5201ffbee694a by Victor Stinner in branch 'master':
bpo-39542: Exclude trashcan from the limited C API (GH-18362)
https://github.com/python/cpython/commit/0fa4f43db086ac3459811cca4ec5201ffbee694a
msg361425 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-02-05 12:12
New changeset f58bd7c1693fe041f7296a5778d0a11287895648 by Victor Stinner in branch 'master':
bpo-39542: Make PyObject_INIT() opaque in limited C API (GH-18363)
https://github.com/python/cpython/commit/f58bd7c1693fe041f7296a5778d0a11287895648
msg361429 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-02-05 13:24
New changeset 509dd90f4684e40af3105dd3e754fa4b9c1530c1 by Victor Stinner in branch 'master':
bpo-39542: Convert PyType_Check() to static inline function (GH-18364)
https://github.com/python/cpython/commit/509dd90f4684e40af3105dd3e754fa4b9c1530c1
msg361431 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-02-05 14:10
New changeset 0e4e735d06967145b49fd00693627f3624991dbc by Victor Stinner in branch 'master':
bpo-39542: Define PyTypeObject earlier in object.h (GH-18366)
https://github.com/python/cpython/commit/0e4e735d06967145b49fd00693627f3624991dbc
msg361432 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-02-05 14:11
Ok, object.h looks much better now! I close the issue.
msg361442 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-02-05 17:24
New changeset 58f4e1a6ee4c6ea82f3f5075d9d9d344ce6b8a56 by Victor Stinner in branch 'master':
bpo-39542: Declare _Py_AddToAllObjects() in pycore_object.h (GH-18368)
https://github.com/python/cpython/commit/58f4e1a6ee4c6ea82f3f5075d9d9d344ce6b8a56
msg361505 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-02-06 21:41
New changeset 2844336e6bb0dc932d710be5f4d8c126d0768d03 by Victor Stinner in branch 'master':
bpo-39542: Document limited C API changes (GH-18378)
https://github.com/python/cpython/commit/2844336e6bb0dc932d710be5f4d8c126d0768d03
msg372962 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2020-07-03 21:40
PyTuple_Check() got slower across the board.  This is problematic because the principal use case for PyTuple_Check() is as a guard for various fast paths.

The direct cause of the degradation is that the inlining of PyType_Check() didn't go so well — commit 509dd90f4684e40af3105dd3e754fa4b9c1530c1. 

There are two issues.  First, we lost the fast path for an exact type match that was present in the 3.8 version of PyType_Check().  Second, functions like PyType_Check() cannot be fully inlined if they call other non-inlined functions like PyType_GetFlags().

The original unreviewed commit doesn't revert cleanly because subsequent changes were made.  Instead, I suggest:

* restore the short-cut for an exact type match, and
* convert PyType_GetFlags() to a macro or an inlined function

That would fix the performance regression while still treating type objects as opaque.

------- Incomplete inlines ----------------------------------------------
------- line 639 in object.h --------------------------------------------

static inline int
PyType_HasFeature(PyTypeObject *type, unsigned long feature) {
    return ((PyType_GetFlags(type) & feature) != 0);
}
               ^---- Non-static function cannot be inlined

#define PyType_FastSubclass(type, flag) PyType_HasFeature(type, flag)

static inline int _PyType_Check(PyObject *op) {
    return PyType_FastSubclass(Py_TYPE(op), Py_TPFLAGS_TYPE_SUBCLASS);
}
#define PyType_Check(op) _PyType_Check(_PyObject_CAST(op))


------- Old Type Check Code ---------------------------------------------
------- line 646 in object.h --------------------------------------------

#define PyObject_TypeCheck(ob, tp) \
    (Py_TYPE(ob) == (tp) || PyType_IsSubtype(Py_TYPE(ob), (tp)))
                 ^--- Fast path for exact match is now gone


------- Non-static function cannot be inlined --------------------------
------- line 2339 in typeobject.c  -------------------------------------

unsigned long
PyType_GetFlags(PyTypeObject *type)
{
    return type->tp_flags;
}
msg372983 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2020-07-04 08:41
Christian Heimes has expressed an interest (quite rudely) in unreviewed commits elsewhere.  I wonder if that applies to everyone regardless of employer.
msg372984 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2020-07-04 09:02
Ah, I see that the subject has already been answered in the negative:

https://discuss.python.org/t/make-code-review-mandatory/4174

He must have forgotten.
msg372993 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2020-07-04 16:46
Let's just focus on getting it fixed before 3.9 goes out.
msg372994 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2020-07-04 17:56
Stefan, stop it! I feel offended and harassed by you. Do not ever talk to me in public, private, or even insinuate to reference to my person in any public conversation.
msg372995 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2020-07-04 18:39
I have no opinion about this commit (I did not benchmark anything,
but I wouldn't be surprised if Victor did).

I do have the opinion that unreviewed commits occur quite frequently
and nearly every committer has made some (or a lot), including the
committers who mention them.

They are not inherently bad, so I hope we can focus on the outcome.
msg373028 - (view) Author: Łukasz Langa (lukasz.langa) * (Python committer) Date: 2020-07-05 11:45
Good catch, Raymond. I'll mark this as a release blocker to ensure I don't miss it. The last beta is scheduled for July 20th. Is this enough time for you? I would very much like to avoid refactors of any sort during the release candidate stage.
msg373230 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-07-07 16:19
Raymond:
> PyTuple_Check() got slower across the board.  This is problematic because the principal use case for PyTuple_Check() is as a guard for various fast paths.

Raymond gave more context in this email:
https://mail.python.org/archives/list/python-dev@python.org/message/Q3YHYIKNUQH34FDEJRSLUP2MTYELFWY3/

As far as I know, only macOS is impacted by this performance regression (119 nsec => 152 nsec on a microbenchmark). On Windows and Linux, Python is built with LTO and PGO optimizations.

For example, the Python package provided by Fedora 32 is built with GCC 10 using LTO and PGO. Using GCC 10 and LTO, the PyTuple_Check() code is inlined even if PyType_GetFlags() is function call in PyType_HasFeature().

I didn't know that on macOS, LTO was not used. I created bpo-41181 to request to enable LTO on the macOS installer builder, but Ned Deily rejected the issue (read the issue for the rationale).

I dislike comparing performances when LTO and PGO optimizations are not used. LTO enables tons of optimizations and PGO makes benchmark results more reproducible.

One idea is to stop running benchmarks on macOS, and at least only run benchmarks on binaries built with LTO and PGO.

--

Raymond:
> The direct cause of the degradation is that the inlining of PyType_Check() didn't go so well — commit 509dd90f4684e40af3105dd3e754fa4b9c1530c1.

I'm not sure the performance changed with this commit. IMHO it's more the commit 45ec5b99aefa54552947049086e87ec01bc2fc9a: bpo-40170 and GH-19378.

Background behind this change: see PEP 620 for the overall plan, see bpo-39573 and bpo-40170 to make PyObject and PyTypeObject structures opaque. The idea is to slowly move third party extension modules towards the stable ABI (PEP 384). Internally, CPython continues to access directly structure members.


> The original unreviewed commit (...)

GH-19378 was reviewed and approved by another core dev.

The impact of performance of these changes was taken in account, and that's why it was decided to add a new internal _PyType_HasFeature() function.

It wasn't noticed that PyTuple_Check() calls the public PyType_HasFeature() instead of the internal _PyType_HasFeature(). As I wrote, using LTO, the code is inlined anyway, so it's the same thing.
msg373231 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-07-07 16:28
Stefan Krah:
> Christian Heimes has expressed an interest (quite rudely) in unreviewed commits elsewhere.  I wonder if that applies to everyone regardless of employer.

I don't understand the relationship between this issue, Christian Heimes and making reviews mandatory or not.

As I wrote, the commit 45ec5b99aefa54552947049086e87ec01bc2fc9a was reviewed and approved by another core dev.

Stefan, please stop personal attack and stay at the technical level.
msg373234 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2020-07-07 17:53
Here are two timings for math.dist().  They were run with the production macOS builds from python.org:

$ python3.8 -m timeit -r 11 -s 'from math import dist' -s 'p=(1.1, 2.2); q=(1.7, 2.3)' 'dist(p, q)'
5000000 loops, best of 11: 58.4 nsec per loop

$ python3.9 -m timeit -r 11 -s 'from math import dist' -s 'p=(1.1, 2.2); q=(1.7, 2.3)' 'dist(p, q)'
5000000 loops, best of 11: 66.9 nsec per loop

The attached screen shot shows that the only change between the two versions is that the subclass check is inlined and fast in 3.8, but is an external function call in 3.9.

---- 3.8 subclass check -----------
    movq    8(%r12), %rax
    movl    $0, 32(%rsp)
    testb   $4, 171(%rax)
    je  L779

---- 3.9 subclass check -----------

    movq    8(%r12), %rdi
    call    _PyType_GetFlags
    movl    $0, 32(%rsp)
    testl   $67108864, %eax
    je  L856

The C code for math.dist() is unchanged between 3.8 and 3.9.  Both use PyTuple_Check().

This isn't unique.  Every single PyTuple_Check() is the similarly affected (approx. 225 occurrences).  Presumably, this affects other type checks as well.
msg373235 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2020-07-07 17:55
Serhiy, do you have any insights to offer?
msg373236 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2020-07-07 18:20
A brief note for Victor that *nothing* was directed against him. On
the contrary, msg372995 was supporting him in case the commit had
actually been unreviewed, which it apparently wasn't. Sorry for
jumping on the bandwagon there.


> PEP 620 for the overall plan.

This one I have some trouble with.  It cites PyPy as a bottleneck, but
IIRC Armin Rigo was on record saying that such changes would not help
PyPy, and he would come up with a counter proposal. Has this changed?

It is also a bit unusual that the PEP is in draft status but a lot
of things are already committed.
msg373237 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2020-07-07 18:21
s/PyPy as a bottleneck/the C-API as a bottle neck for PyPy/
msg373239 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2020-07-07 18:37
Victor, is there any reason PyType_GetFlags() can't be converted to a macro or an inlined function?  That seems like a simple and robust fix that won't get in the way of anything else you're doing.

We shouldn't disregard macOS timings. It is an important platform —dominant in data science, dominant among most of my clients, and dominant among the core developers at the last sprint.

None of the inlining should depend on LTO and PGO.  That is fragile and will create hard-to-explain variations between builds (even consecutive builds on the same platform).  Third-party extensions, SO and DLL files won't benefit from that.  Also, it makes it difficult to individually optimize and analyze a function without a costly rebuild at every step.
msg373282 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2020-07-08 07:11
I do not know the purpose of this issue. The new code does not look cleaner to me, but maybe it is only for me.

But performance regressions should be fixed. We cannot rely on compiler-specific global optimization, which is hard to test and does not work for dynamic linkage (third-party extensions and applications with built-in Python). Even "inline" is just a hint to the compiler and does not guarantee inlining the code.

Raymond's suggestions look reasonable to me.
msg373288 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-07-08 08:30
Since the PEP 620 is still a draft, I created PR 21390 to fix the performance issue.

As I wrote, I didn't expect any impact on performances since I added _PyType_HasFeature() which access directly the structure member. But I was wrong. 

This change is not strictly required by my work for now, so I just revert it until the PEP is accepted.

If I reapply the change later, I will take care of ensuring that it's properly optimized in CPython internals (access the member, don't call a function), especially on macOS.

If you want to continue the discussion, please discuss on bpo-40170, since this issue is unrelated.


Stefan Krah:
> This one I have some trouble with. It cites PyPy as a bottleneck,
> but IIRC Armin Rigo was on record saying that such changes would
> not help PyPy, and he would come up with a counter proposal.
> Has this changed?

Raymond:
> Victor, is there any reason PyType_GetFlags() can't be converted to a macro or an inlined function?

Serhiy Storchaka:
> I do not know the purpose of this issue. The new code does not look cleaner to me, but maybe it is only for me.

I explained the rationale in bpo-40170 and PEP 620.

It's not a "cleanup change". Serhiy: we are discussing the commit 45ec5b99aefa54552947049086e87ec01bc2fc9a of bpo-40170.

This discussion is happening in the wrong bpo which seems to confuse people. So I close again this issue which is fixed.
msg373530 - (view) Author: Jim Jewett (Jim.Jewett) (Python triager) Date: 2020-07-11 17:52
Raymond, did you replace the screenshot with a later one showing that things are fixed now?  The timestamp suggests it went up at the same time as your comment, but what I see in the .png file is that the two are identical other than addresses.
History
Date User Action Args
2020-07-11 17:52:28Jim.Jewettsetnosy: + Jim.Jewett
messages: + msg373530
2020-07-08 08:30:06vstinnersetstatus: open -> closed
priority: release blocker ->
resolution: fixed
messages: + msg373288
2020-07-08 07:11:26serhiy.storchakasetmessages: + msg373282
2020-07-07 18:37:19rhettingersetmessages: + msg373239
2020-07-07 18:21:46skrahsetmessages: + msg373237
2020-07-07 18:20:11skrahsetmessages: + msg373236
2020-07-07 17:55:30rhettingersetnosy: + serhiy.storchaka
messages: + msg373235
2020-07-07 17:53:54rhettingersetfiles: + Screen Shot 2020-07-07 at 10.40.52 AM.png

messages: + msg373234
2020-07-07 16:28:52vstinnersetmessages: + msg373231
2020-07-07 16:19:24vstinnersetmessages: + msg373230
2020-07-05 11:45:39lukasz.langasetpriority: high -> release blocker
resolution: fixed -> (no value)
messages: + msg373028
2020-07-04 18:39:17skrahsetmessages: + msg372995
2020-07-04 17:56:09christian.heimessetnosy: - christian.heimes
2020-07-04 17:56:01christian.heimessetmessages: + msg372994
2020-07-04 16:46:18rhettingersetmessages: + msg372993
2020-07-04 09:02:33skrahsetmessages: + msg372984
2020-07-04 08:41:20skrahsetnosy: + skrah, christian.heimes
messages: + msg372983
2020-07-03 21:40:06rhettingersetstatus: closed -> open
priority: normal -> high

nosy: + petr.viktorin, rhettinger, lukasz.langa, pitrou
messages: + msg372962
2020-02-06 22:09:45vstinnersetpull_requests: + pull_request17758
2020-02-06 21:41:47vstinnersetmessages: + msg361505
2020-02-06 16:01:25vstinnersetpull_requests: + pull_request17755
2020-02-05 17:24:41vstinnersetmessages: + msg361442
2020-02-05 16:49:46vstinnersetpull_requests: + pull_request17742
2020-02-05 14:11:11vstinnersetstatus: open -> closed
resolution: fixed
messages: + msg361432

stage: patch review -> resolved
2020-02-05 14:10:56vstinnersetmessages: + msg361431
2020-02-05 13:35:40vstinnersetpull_requests: + pull_request17741
2020-02-05 13:24:46vstinnersetmessages: + msg361429
2020-02-05 12:15:57vstinnersetpull_requests: + pull_request17739
2020-02-05 12:12:26vstinnersetmessages: + msg361425
2020-02-05 11:48:41vstinnersetpull_requests: + pull_request17738
2020-02-05 11:23:32vstinnersetmessages: + msg361422
2020-02-05 11:02:20vstinnersetpull_requests: + pull_request17737
2020-02-05 00:11:14vstinnersetmessages: + msg361384
2020-02-04 14:45:29vstinnersetpull_requests: + pull_request17720
2020-02-03 16:55:08vstinnersetmessages: + msg361311
2020-02-03 16:35:03vstinnersetpull_requests: + pull_request17705
2020-02-03 16:28:38vstinnersetmessages: + msg361307
2020-02-03 15:45:50vstinnersetkeywords: + patch
stage: patch review
pull_requests: + pull_request17704
2020-02-03 15:42:01vstinnercreate