classification
Title: Move PyThreadState into Include/internal/pycore_pystate.h
Type: behavior Stage:
Components: Interpreter Core Versions: Python 3.8
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: eric.snow, ncoghlan, scoder, vstinner
Priority: normal Keywords:

Created on 2019-02-09 02:00 by eric.snow, last changed 2019-02-18 20:33 by eric.snow.

Messages (7)
msg335122 - (view) Author: Eric Snow (eric.snow) * (Python committer) Date: 2019-02-09 02:00
(also see issue #35886)

In November Victor created the Include/cpython directory and moved a decent amount of public (but not limited) API there.  This included the PyThreadState struct.  I'd like to move it into the "internal" headers since it is somewhat coupled to the internal runtime implementation.  The alternative is extra complexity.  I ran into this while working on another issue.

There are 2 problems, however, with PyThreadState which make a move potentially challenging.  In fact, they could be (but probably aren't) cause for moving it back to Include/pystate.h.

1. the docs say: "The only public data member is PyInterpreterState *interp, which points to this thread’s interpreter state."
2. the struct has 6 fields that are exposed (likely unintentionally) in "stable" header files via the following macros:

  # Include/object.h
  Py_TRASHCAN_SAFE_BEGIN
  Py_TRASHCAN_SAFE_END
  Include.ceval.h
  Py_EnterRecursiveCall
  Py_LeaveRecursiveCall
  _Py_MakeRecCheck
  Py_ALLOW_RECURSION
  Py_END_ALLOW_RECURSION

I'm not sure how that factors into the stable ABI (PyThreadState wasn't ever guarded by Py_LIMITED_API).  However, keep in mind that Victor already moved PyThreadState out of the "stable" headers in November.

I'm fine with sorting out the situation with the macros if that's okay to do.  Likewise the promised field ("interp") should be solvable one way or another (e.g. remove it or use a private struct).

...or we could do nothing or (if the change in Novemver is a problem) move it back to Include/pystate.h.
msg335465 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-02-13 17:06
Same ratione than for PyInterpreterState: https://bugs.python.org/issue35886#msg335464

Except that I expect that a few more projects rely on PyThreadState fields. Maybe not. It's hard to guess :-(

I mean that I'm ok-ish with the change but it should be carefully prepared and announced.
msg335677 - (view) Author: Stefan Behnel (scoder) * (Python committer) Date: 2019-02-16 09:48
From Cython's point of view, the important fields in PyThreadState are the tracing/profiling and exception related ones. We're not using anything else. Users can explicitly opt out of the access to the exception fields by defining a C macro, at the expense of a bit of performance. I doubt that anyone is really doing that, though, because, why would they?

I'm not sure if we could avoid direct field access for profiling and tracing at all. You can look at the code, it's a whole bunch of macros that mimic what CPython does in ceval:

https://github.com/cython/cython/blob/master/Cython/Utility/Profile.c

I should note that both features are not enabled by default. Users have to explicitly enable profiling support via a directive, and even doubly opt in to tracing by enabling a compiler directive and a C macro. So production code will often not rely on these fields, but developers would want to use them and some might keep profiling support enabled also in their production code. My guess is that tracing is most often used for test coverage analysis.
msg335678 - (view) Author: Stefan Behnel (scoder) * (Python committer) Date: 2019-02-16 10:00
Oh, and I forgot the new trashcan support. Cython will also start to use that in its next release, so that adds the trashcan related attributes to the list.

https://github.com/cython/cython/pull/2842/files
msg335859 - (view) Author: Eric Snow (eric.snow) * (Python committer) Date: 2019-02-18 19:31
@Stefan, is it a problem for Cython if the relevant fields are exposed via C-API functions rather than directly on PyThreadState?
msg335864 - (view) Author: Stefan Behnel (scoder) * (Python committer) Date: 2019-02-18 20:17
Well … yes.

The exception fields are performance critical, and we try hard to make them visible to the C compiler so that swapping around exception state eats up as little CPU time as possible.

You could argue that profiling and tracing are less critical, but any nanosecond that is avoided while not tracing a function adds up to making the rest of the program faster, so I'd argue that that's performance critical, too. Profiling definitely is, because it should have as little impact on the code profile as possible. There is a huge difference between having the CPU pre-fetch a pointer and looking at the value, compared to calling into a C function and guessing what the result might be.

The trashcan is only used during deallocation, so … well, I guess it could be replaced by a different API, but that's a bit tricky due to the bracket nature of the current macros.

I also just noticed that "Py_EnterRecursiveCall" and "Py_LeaveRecursiveCall" are on your list. We use them in our inlined call helper functions, which mostly duplicate CPython functionality. Looking at these macros now, I find it a bit annoying that they call "PyThreadState_GET()" directly, rather than accepting one as input. Looking up the current thread-state is a non-local, atomic operation that can be surprisingly costly, and I've invested quite some work into reducing these lookups in Cython. Although it's probably not too bad around a call into an external function…

So, yeah, we do care about the thread state being readily available. :)

Could you explain what benefit you are expecting from hiding the thread state?
msg335865 - (view) Author: Eric Snow (eric.snow) * (Python committer) Date: 2019-02-18 20:33
Thanks for clarifying. :)

On Mon, Feb 18, 2019 at 1:17 PM Stefan Behnel <report@bugs.python.org> wrote:
> The exception fields are performance critical, and we try hard to make them visible to the C compiler so that swapping around exception state eats up as little CPU time as possible.
>
> You could argue that profiling and tracing are less critical, but any nanosecond that is avoided while not tracing a function adds up to making the rest of the program faster, so I'd argue that that's performance critical, too. Profiling definitely is, because it should have as little impact on the code profile as possible. There is a huge difference between having the CPU pre-fetch a pointer and looking at the value, compared to calling into a C function and guessing what the result might be.

Yeah, I had a feeling that was the case. :)  There might be decent
approaches to avoiding a performance hit on this, but I'm not sure
it's worth the extra complexity.  And I'm not terribly motivated to
make the effort personally. :)

> The trashcan is only used during deallocation, so … well, I guess it could be replaced by a different API, but that's a bit tricky due to the bracket nature of the current macros.

Yep, definitely tricky.

> I also just noticed that "Py_EnterRecursiveCall" and "Py_LeaveRecursiveCall" are on your list. We use them in our inlined call helper functions, which mostly duplicate CPython functionality. Looking at these macros now, I find it a bit annoying that they call "PyThreadState_GET()" directly, rather than accepting one as input. Looking up the current thread-state is a non-local, atomic operation that can be surprisingly costly, and I've invested quite some work into reducing these lookups in Cython. Although it's probably not too bad around a call into an external function…

Hmm, perhaps that's something we could address separately?

> Could you explain what benefit you are expecting from hiding the thread state?

This isn't so much about the thread state specifically or about
necessarily hiding *all* the thread state.  Rather it's about a
general effort to reduce the amount of private API we are exposing
publicly.  In this case I was looking at PyInterpreterState (#35886)
and opening a similar issue at the same time for PyThreadState made
sense.

Given your response it's clear that usage of PyThreadState doesn't
match the documentation very well.  That would be worth addressing one
way or another, so I'm glad we're having this conversation. :)
History
Date User Action Args
2019-02-18 20:33:23eric.snowsetmessages: + msg335865
2019-02-18 20:17:00scodersetmessages: + msg335864
2019-02-18 19:31:38eric.snowsetmessages: + msg335859
2019-02-16 10:00:46scodersetmessages: + msg335678
2019-02-16 09:48:09scodersetnosy: + scoder
messages: + msg335677
2019-02-13 17:06:53vstinnersetmessages: + msg335465
2019-02-09 02:00:19eric.snowcreate