Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

The DISPATCH() macro is not as efficient as it could be (move PyThreadState.use_tracing) #87926

Closed
markshannon opened this issue Apr 7, 2021 · 48 comments
Labels
3.10 only security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) pending The issue will be closed if no feedback is provided performance Performance or resource usage topic-C-API

Comments

@markshannon
Copy link
Member

BPO 43760
Nosy @gvanrossum, @rhettinger, @scoder, @vstinner, @encukou, @ambv, @markshannon, @hroncok, @pablogsal, @miss-islington, @erlend-aasland, @Xtrem532
PRs
  • bpo-43760: Streamline dispatch sequence for machines without computed gotos. #25244
  • bpo-43760: Speed up check for tracing in interpreter dispatch #25276
  • [3.10] bpo-43760: Ensure that older Cython generated code compiles under 3.10. #28474
  • [3.10] bpo-43760: Ensure that older Cython generated code compiles under 3.10 #28498
  • bpo-43760: Document PyThreadState.use_tracing removal #28527
  • [3.10] bpo-43760: Document PyThreadState.use_tracing removal (GH-28527) #28529
  • bpo-43760: Add PyThreadState_EnterTracing() #28542
  • bpo-43760: Check for tracing using 'bitwise or' instead of branch in dispatch. #28723
  • bpo-43760: Rename _PyThreadState_DisableTracing() #29032
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = None
    created_at = <Date 2021-04-07.09:35:49.770>
    labels = ['interpreter-core', 'expert-C-API', '3.10', 'performance']
    title = 'The DISPATCH() macro is not as efficient as it could be (move PyThreadState.use_tracing)'
    updated_at = <Date 2021-11-08.17:16:37.213>
    user = 'https://github.com/markshannon'

    bugs.python.org fields:

    activity = <Date 2021-11-08.17:16:37.213>
    actor = 'vstinner'
    assignee = 'none'
    closed = False
    closed_date = None
    closer = None
    components = ['Interpreter Core', 'C API']
    creation = <Date 2021-04-07.09:35:49.770>
    creator = 'Mark.Shannon'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 43760
    keywords = ['patch']
    message_count = 46.0
    messages = ['390410', '390522', '390951', '393379', '393384', '393385', '393389', '393390', '393401', '393403', '393404', '393407', '393410', '393425', '393459', '393466', '393666', '402150', '402162', '402216', '402225', '402231', '402234', '402235', '402358', '402360', '402362', '402415', '402416', '402426', '402432', '402434', '402481', '402496', '402525', '402594', '402603', '403166', '403217', '404020', '404024', '404025', '404173', '404197', '404600', '405968']
    nosy_count = 13.0
    nosy_names = ['gvanrossum', 'rhettinger', 'jpe', 'scoder', 'vstinner', 'petr.viktorin', 'lukasz.langa', 'Mark.Shannon', 'hroncok', 'pablogsal', 'miss-islington', 'erlendaasland', 'Xtrem532']
    pr_nums = ['25244', '25276', '28474', '28498', '28527', '28529', '28542', '28723', '29032']
    priority = None
    resolution = None
    stage = 'patch review'
    status = 'open'
    superseder = None
    type = 'performance'
    url = 'https://bugs.python.org/issue43760'
    versions = ['Python 3.10']

    @markshannon
    Copy link
    Member Author

    The DISPATCH() macro has two failings.

    1. Its check for tracing involves too much pointer chaser.

    2. The logic assumes that computed-gotos is the "fast path" which makes switch dispatch, and therefore Python on Windows unnecessarily slow.

    @markshannon
    Copy link
    Member Author

    New changeset 28d28e0 by Mark Shannon in branch 'master':
    bpo-43760: Streamline dispatch sequence for machines without computed gotos. (GH-25244)
    28d28e0

    @markshannon
    Copy link
    Member Author

    New changeset 9e7b207 by Mark Shannon in branch 'master':
    bpo-43760: Speed up check for tracing in interpreter dispatch (bpo-25276)
    9e7b207

    @hroncok
    Copy link
    Mannequin

    hroncok mannequin commented May 10, 2021

    I am afraid the "Speed up check for tracing in interpreter dispatch" brought some backwards incompatible changes:

    yappi/_yappi.c:1261:9: error: ‘PyThreadState’ {aka ‘struct _ts’} has no member named ‘use_tracing’; did you mean ‘tracing’?
    1261 | ts->use_tracing = 1;
    | ^~~~~~~~~~~
    | tracing

    This is not mentioned in https://docs.python.org/3.10/whatsnew/3.10.html and I haven't noticed the use_tracing member being deprecated. I am confused. Should this happened?

    @hroncok
    Copy link
    Mannequin

    hroncok mannequin commented May 10, 2021

    @markshannon
    Copy link
    Member Author

    At yappi/_yappi.c:1261 sets an undocumented field on a CPython internal data structure.

    What did you believe that was supposed to do? use_tracing is not documented anywhere.

    We could add the field back and ignore it, but I doubt that would help you much.

    @markshannon
    Copy link
    Member Author

    If there is no C-API function that supports your needs, feel free to suggest one.

    @hroncok
    Copy link
    Mannequin

    hroncok mannequin commented May 10, 2021

    Disclaimer: I have not written the code nor do I understand what is trying to achieve. I merely collect the data and report the problems to the package maintainers.

    It just seems to me that a non-underscored (and hence public) member variable on a non-underscored (and hence public) structure should not suddenly go missing. Although, I am not familiar with the rules that define what part of the API falls under https://www.python.org/dev/peps/pep-0497/

    @hroncok
    Copy link
    Mannequin

    hroncok mannequin commented May 10, 2021

    scikit-learn: https://bugzilla.redhat.com/show_bug.cgi?id=1958976

    gcc: sklearn/cluster/_k_means_fast.c
    In file included from /usr/lib64/python3.10/site-packages/numpy/core/include/numpy/ndarraytypes.h:1944,
    from /usr/lib64/python3.10/site-packages/numpy/core/include/numpy/ndarrayobject.h:12,
    from /usr/lib64/python3.10/site-packages/numpy/core/include/numpy/arrayobject.h:4,
    from sklearn/cluster/_k_means_fast.c:635:
    /usr/lib64/python3.10/site-packages/numpy/core/include/numpy/npy_1_7_deprecated_api.h:17:2: warning: #warning "Using deprecated NumPy API, disable it with " "#define NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION" [-Wcpp]
    17 | #warning "Using deprecated NumPy API, disable it with " \
    | ^~~~~~~
    sklearn/cluster/_k_means_fast.c: In function ‘__Pyx_call_return_trace_func’:
    sklearn/cluster/_k_means_fast.c:1596:15: error: ‘PyThreadState’ {aka ‘struct _ts’} has no member named ‘use_tracing’; did you mean ‘tracing’?
    1596 | tstate->use_tracing = 0;
    | ^~~~~~~~~~~
    | tracing
    sklearn/cluster/_k_means_fast.c:1602:15: error: ‘PyThreadState’ {aka ‘struct _ts’} has no member named ‘use_tracing’; did you mean ‘tracing’?
    1602 | tstate->use_tracing = 1;
    | ^~~~~~~~~~~
    | tracing

    The usage comes from https://github.com/cython/cython/blob/master/Cython/Utility/Profile.c

    @encukou
    Copy link
    Member

    encukou commented May 10, 2021

    PEP-0497 is rejected; the active one is PEP-387, which says "backwards incompatibility" means preexisting code ceases to comparatively function after a change.
    So, this does look like a backwards-incompatible change.

    Unfortunately, not all of the C API is documented, so unless it's explicitly marked private, people will use it :(

    @markshannon
    Copy link
    Member Author

    But what does "use it" mean?
    What does setting tstate->use_tracing = 1 do?
    There is no documented behavior, so how do we know what assumptions people are making about what happens when they set some field to 1?

    As I said, we could keep the field and ignore it, but that seems worse.

    @rhettinger
    Copy link
    Contributor

    I don't think the PEP meant to restrict individual struct member such as this. For example, we were able to switch from byte code to word code without violating the intended rules. Consider asking Brett and Benjamin for clarification. I would think that if a new function were introduced to provide a reliable way to determine whether tracing was enabled, that would suffice for external packages to have a minimally disruptive migration path.

    @vstinner
    Copy link
    Member

    I understand that some projects manually call the profile and/or trace functions, and temporarily set use_tracing 0 while calling these functions.

    Some projects restore use_tracing to the correct value (compute the efficient value), some projects simply set use_tracing to 1.

    I see 3 use cases:

    • disable tracing temporarily (set use_tracing to 0)
    • reenable tracing (compute use_tracing to the correct value)
    • check if tracing is used (get use_tracing)

    We can add 3 functions:

    • PyThreadState_DisableTracing()
    • PyThreadState_EnableTracing()
    • PyThreadState_GetTracing()

    PyThreadState_EnableTracing(tstate) would do something like:

    tstate->cframe->use_tracing = (tstate->c_tracefunc || tstate->c_profilefunc);
    

    If we added these functions, I can then add an implementation for Python 3.9 and older to my https://github.com/pythoncapi/pythoncapi_compat project for backward compatibility.

    The problem is that some projects also increase temporarily ts->tracing. Since I would like to make PyThreadState opaque, I would prefer to hide this access behind a function call as well. Maybe we need an API to call profile and/or trace functions?

    --

    According to the bugzilla compiler errors:

    greenlet: https://bugzilla.redhat.com/show_bug.cgi?id=1957784

    It has already been fixed:

    It uses:

    • "tstate->use_tracing = 0;"
    • "tstate->use_tracing = (tstate->tracing <= 0 && (...)"

    dipy: https://bugzilla.redhat.com/show_bug.cgi?id=1958203

    It uses:

    • "tstate->use_tracing = 0;"
    • "tstate->use_tracing = 1;"
    • "tstate->use_tracing = (tstate->c_profilefunc || (...)"
    • "return tstate->use_tracing && retval;"
    • "if (tstate->use_tracing) {"

    yappi: https://bugzilla.redhat.com/show_bug.cgi?id=1958896

    • "ts->use_tracing = 1;"
    • "ts->use_tracing = 0;"

    smartcols: https://bugzilla.redhat.com/show_bug.cgi?id=1958938

    It uses "tstate->use_tracing = 0;".

    scikit-learn: https://bugzilla.redhat.com/show_bug.cgi?id=1958976

    It uses:

    • "tstate->use_tracing = 0;"
    • "tstate->use_tracing = 1;"

    The usage comes from https://github.com/cython/cython/blob/master/Cython/Utility/Profile.c

    Simplified code:

    --------------

    static int __Pyx_TraceSetupAndCall(...)
    {
        ...
        tstate->tracing++;
        tstate->use_tracing = 0;
    if (tstate->c_tracefunc)
        retval = tstate->c_tracefunc(tstate->c_traceobj, *frame, PyTrace_CALL, NULL) == 0;
    if (retval && tstate->c_profilefunc)
        retval = tstate->c_profilefunc(tstate->c_profileobj, *frame, PyTrace_CALL, NULL) == 0;
    
    tstate-\>use_tracing = (tstate-\>c_profilefunc ||
                           (CYTHON_TRACE && tstate-\>c_tracefunc));
    tstate-\>tracing--;
    ...
    

    }

      int __Pyx_use_tracing = 0;
    
      #define __Pyx_TraceCall(funcname, srcfile, firstlineno, nogil, goto_error)             \
      if (nogil) {                                                                           \
          if (CYTHON_TRACE_NOGIL) {                                                          \
              PyThreadState *tstate;                                                         \
              PyGILState_STATE state = PyGILState_Ensure();                                  \
              tstate = __Pyx_PyThreadState_Current;                                          \
              if (unlikely(tstate->use_tracing) && !tstate->tracing &&                       \
                      (tstate->c_profilefunc || (CYTHON_TRACE && tstate->c_tracefunc))) {    \
                  __Pyx_use_tracing = __Pyx_TraceSetupAndCall(&$frame_code_cname, &$frame_cname, tstate, funcname, srcfile, firstlineno);  \
              }                                                                              \
              PyGILState_Release(state);                                                     \
              if (unlikely(__Pyx_use_tracing < 0)) goto_error;                               \
          }                                                                                  \
      } else {                                                                               \
          PyThreadState* tstate = PyThreadState_GET();                                       \
          if (unlikely(tstate->use_tracing) && !tstate->tracing &&                           \
                  (tstate->c_profilefunc || (CYTHON_TRACE && tstate->c_tracefunc))) {        \
              __Pyx_use_tracing = __Pyx_TraceSetupAndCall(&$frame_code_cname, &$frame_cname, tstate, funcname, srcfile, firstlineno);  \
              if (unlikely(__Pyx_use_tracing < 0)) goto_error;                               \
          }                                                                                  \
      }

    @vstinner vstinner added the 3.10 only security fixes label May 10, 2021
    @vstinner vstinner changed the title The DISPATCH() macro is not as efficient as it could be. The DISPATCH() macro is not as efficient as it could be (move PyThreadState.use_tracing) May 10, 2021
    @vstinner vstinner added interpreter-core (Objects, Python, Grammar, and Parser dirs) topic-C-API labels May 10, 2021
    @rhettinger
    Copy link
    Contributor

    +1 for Victor's suggestions. It provides a reasonable way forward without locking in eval-loop implementation details that weren't intended to be public and frozen in time.

    @hroncok
    Copy link
    Mannequin

    hroncok mannequin commented May 11, 2021

    A Cython issue report: cython/cython#4153

    @scoder
    Copy link
    Contributor

    scoder commented May 11, 2021

    For the same reason that motivated this ticket, I think the functions should be inline functions. They should also take the current thread-state as argument, because that's probably known on the caller side already.

    I guess a macro would be fine, too. :)

    Cython previously used "use_tracing" directly because it needs to implement the exact same tracing/profiling behaviour as CPython, regardless of who called a Cython implemented function (Cython or CPython).

    Naming nit: Get/Is/UsesTracing?

    Also, given that a common use case seems to be "make sure tracing is disabled, do something, enable tracing if it was enabled", I think DisableTracing() should return the previous state.

    @scoder
    Copy link
    Contributor

    scoder commented May 14, 2021

    I just noticed that new C-API functions are probably useless for Cython since I think it will have to maintain the CFrame stack, so not to enable "use_tracing" for the (Python) caller but the current (Cython) function. This then means that we own The current CFrame as well as its "use_tracing" field and don't need any help from CPython in order to change the state.

    I'm not sure if this is any different for other users of the "use_tracing" field.

    @scoder scoder added the performance Performance or resource usage label May 14, 2021
    @vstinner
    Copy link
    Member

    The commit 28d28e0 caused a performance regression on Windows which is currently blocking the Python 3.10.0 final release: bpo-45116.

    Moroever, this issue introduced a incompatible C API change which is not documented in What's New in Python 3.10, and it doesn't provide any solution for projects broken by these changes. So far, the following projects are known to be broken by the change:

    • Cython
    • greenlet (fixed)
    • dipy
    • yappi
    • smartcols

    Would it be possible to:

    • Bare minimum: document the change in What's in Python 3.10?

    • Provide a solution to broken project? If possible, solution working on old and new Python versions. Maybe a compatibility functions can added to https://github.com/pythoncapi/pythoncapi_compat if needed.

    • Maybe revert the change in Python 3.10 since a full solution may require additional work.

    By the way, I'm also disappointed that nothing was done to enhance the situation for 4 months (since the first known projects were reported here in May).

    I raise the priority to release blocker to make more people aware of the situation.

    @hroncok
    Copy link
    Mannequin

    hroncok mannequin commented Sep 19, 2021

    Also Numba is broken: https://bugzilla.redhat.com/show_bug.cgi?id=2005686

    @vstinner
    Copy link
    Member

    A Cython issue report: cython/cython#4153

    Cython 0.29.24 released at July 13, 2021 with a fix (2 commits):

    The Cython master branch was fixed as well: see cython/cython#4153

    @markshannon
    Copy link
    Member Author

    IMO those failures are bugs in the projects listed not in CPython.

    Relying on the exact meaning, or even the existence of an undocumented field of a C struct is not, nor ever has been, safe.
    The user of the field is assuming a meaning that is not known to the developers of the library/application, so there is no way for them to preserve that meaning.
    This is not specific to CPython, but applies to any C API.

    The code in the examples given above are using tstate->use_tracing assuming that its meaning is to determine whether tracing is turned on or not.
    However, it has no such meaning. It is an internal performance optimization to avoid checking both tstate->c_tracefunc and tstate->c_profilefunc.
    It is tstate->tracing that determines whether tracing is active or not.

    I propose adding back tstate->use_tracing as a copy of tstate->cframe->us_tracing. Any writes to tstate->use_tracing will be ignored, but any code that also sets tstate->tracing, as the Cython code does, will work correctly. Any code that reads tstate->use_tracing will work correctly.

    I'm minded to prefix all the names of all fields in all C structs that happen to be included by Python.h with "if_you_use_this_then_we_will_break_your_code_in_some_way_that_will_ruin_your_reputation_as_a_library_developer__maybe_not_tomorrow_maybe_not_next_year_but_someday"
    Although that might prove tricky with a 80 character line limit :)

    My attempts to avoid this happening again next year, and the year after that, and...
    https://bugs.python.org/issue45247
    cython/cython#4382

    @scoder
    Copy link
    Contributor

    scoder commented Sep 20, 2021

    The code in the examples given above are using tstate->use_tracing assuming that its meaning is to determine whether tracing is turned on or not.

    No, actually not. It is using the field in the same way as CPython, simply because most of this code was originally copied from CPython, and we also copied the optimisation of avoiding to check the other fields (for the obvious reason of being an optimisation).

    I propose adding back tstate->use_tracing as a copy of tstate->cframe->us_tracing.

    Cython 0.29.24 has already been adapted to this change and will use the new field in CPython 3.10b1+.

    Any code that reads tstate->use_tracing will work correctly.

    Any code that reads and /writes/ the field would probably also continue to work correctly, which is what older Cython versions did.

    if_you_use_this_then_we_will_break_your_code_in_some_way_that_will_ruin_your_reputation_as_a_library_developer…

    The thing is, new APIs can only be added to new CPython releases. Supporting features in older CPython versions (currently 2.7+) means that we always *have to* use the existing fields, and can only switch to new APIs by duplicating code based on a PY_VERSION_HEX preprocessor check. Even if a new low-latency profiling API was added in CPython 3.11, we'd have to wait until there is at least an alpha release that has it before enabling this code switch.

    And if the new API proves to be slower, we may end up keeping the old code around and adding a C compile-time configuration option for users to enable (or disable) its use. Cython has lots of those these days, mostly to support the different C-API capabilities of different Python implementations, e.g. to take advantage of the PyLong or PyUnicode internals if available, and use generic C-API calls if not.

    @jpe
    Copy link
    Mannequin

    jpe mannequin commented Sep 20, 2021

    Is adding the field back an option at this point? It would mean that extensions compiled against the release candidates may not be binary compatible with the final release

    My take is that use_tracing is an implementation and version dependent field, and that binary compatibility will be maintained for a specific release (e.g. 3.10) but that there's no assurance that it will be there in the next release -- though these things tend not to change. I also regard generated cython code as only being valid for the releases that a specific cython version supports.

    Code and API's change slowly, but eventually they do change.

    @pablogsal
    Copy link
    Member

    Also, just to clarify, I also opened PR 28498 to discuss the possibility of going ahead, I still don't want to move on without consensus.

    @pablogsal
    Copy link
    Member

    Also, I personally thing there is absolutely no guarantee that Cython code generated for 3.9 should work for 3.10 and the thread state is a private structure that has undocumented fields and is not part of the stable API nor the limited API so, tstate->tracing disappearing is totally withing the guarantees between Python versions.

    @pablogsal
    Copy link
    Member

    I discussed this particular instance with the Steering Council and the conclusion was that this field (use_tracing) is considered an implementation detail and therefore its removal it's justified so we won't be restoring it.

    I'm therefore closing PR28498

    Notice that this decision only affects this particular issue and should not be generalized to other fields or structures. We will try to determine and open a discusion in the future about what is considered public/private in these ambiguous cases and what can users expect regarding stability and backwards compatibility.

    @pablogsal
    Copy link
    Member

    I'm removing the release blocker as per above, feel free to close of there is nothing else to discuss or act on here.

    @encukou
    Copy link
    Member

    encukou commented Sep 22, 2021

    The ABI is not broken, the only thing that this PR change is the size of the struct. All the offsets to the members are the same and therefore will be valid in any compiled code.

    I'll just note that a change in struct size does technically break ABI, since *arrays* of PyThreadState will break.

    So the size shouldn't be changed in RCs or point releases. (But since it's not part of stable ABI, it was OK to change it for 3.10.)

    We will try to determine and open a discussion in the future about what is considered public/private in these ambiguous cases and what can users expect regarding stability and backwards compatibility.

    Please keep me in the loop; I'm working on summarizing my understanding of this (in a form that can be added to the docs if approved).

    @pablogsal
    Copy link
    Member

    I'll just note that a change in struct size does technically break ABI, since *arrays* of PyThreadState will break.

    Not that matters now because we are not proceeding but just to clarify why I deemed this acceptable: arrays of PyThreadState is extremelly unlikely in extensions because we pass it by Pointer and is always manipulated by pointer. To place it in an array you either need to create one or copy one into an array, which I cannot see what would be the point because the fields are mainly pointers that would become useless as the interpreter will not update anything

    @pablogsal
    Copy link
    Member

    Also, I checked the DWARF tree of all existing wheels for 3.10 on PyPI (there aren't many) and none had anything that uses the size of the struct.

    @vstinner
    Copy link
    Member

    I created PR 28527 to document PyThreadState.use_tracing removal and explain how to port existing code to Python 3.10.

    @vstinner
    Copy link
    Member

    New changeset f4ccb79 by Victor Stinner in branch 'main':
    bpo-43760: Document PyThreadState.use_tracing removal (GH-28527)
    f4ccb79

    @ambv
    Copy link
    Contributor

    ambv commented Sep 23, 2021

    New changeset 5557689 by Miss Islington (bot) in branch '3.10':
    bpo-43760: Document PyThreadState.use_tracing removal (GH-28527) (GH-28529)
    5557689

    @vstinner
    Copy link
    Member

    Analysis use use_tracing usage in 3rd part code.

    I see two main ways to add C API functions covering these use cases:

    • Provide high-level functions like "call a trace function" (disable tracing, call trace function, reenable tracing, increment/decrement tstate->tracing)
    • Provide low-level functions just to control use_tracing: make PyThreadState structure opaque, but stil make the assumption that it is possible to disable temporarily tracing and profiling (in practice, it's implemented as use_tracing=0).

    (*) greenlet

    greenlet disables temporarily tracing in g_calltrace(), and then restore it, to call a "tracing" function:
    ---
    tstate->tracing++;
    TSTATE_USE_TRACING(tstate) = 0;
    retval = PyObject_CallFunction(tracefunc, "O(OO)", event, origin, target);
    tstate->tracing--;
    TSTATE_USE_TRACING(tstate) =
    (tstate->tracing <= 0 &&
    ((tstate->c_tracefunc != NULL) || (tstate->c_profilefunc != NULL)));
    ---

    It also saves and then restores use_tracing value:
    ---

    ts__g_switchstack_use_tracing = tstate->cframe->use_tracing;
    (...)
    tstate->cframe->use_tracing = ts__g_switchstack_use_tracing;

    => it can use PyThreadState_IsTracing(), PyThreadState_DisableTracing() and PyThreadState_ResetTracing().

    These functions don't handle "tstate->tracing++;" and "tstate->tracing--;" which is also used by greenlet.

    greenlet also saves and restores tstate->cframe:
    https://github.com/python-greenlet/greenlet/blob/master/src/greenlet/greenlet.c

    (*) dipy

    Code generated by Cython.

    (*) smartcols

    Code generated by Cython.

    (*) yappi

    yappi is Python profiler.

    yappi sets use_tracing to 1 when it sets its profile function: "ts->c_profilefunc = _yapp_callback;".

    It sets use_tracing to 0 when it clears the profile function: "ts->c_profilefunc = NULL;". That's wrong, it ignores the trace function.

    PyEval_SetProfile() cannot be used because yappi works on a PyThreadState (ts).

    Code: https://github.com/sumerc/yappi/blob/master/yappi/_yappi.c

    It can use PyThreadState_DisableTracing() and PyThreadState_ResetTracing(). Maybe a PyThreadState_SetProfile(tstate, func) function would fit better yappi's use case.

    (*) Cython

    Cython defines 2 compatibility functions:

    • __Pyx_IsTracing(tstate, check_tracing, check_funcs): it can check c_profilefunc and c_tracefunc
    • __Pyx_SetTracing(tstate, enable)

    Code: https://github.com/cython/cython/blob/0.29.x/Cython/Utility/Profile.c

    The code is quite complicated. In short, it checks if tracing and/or profiling is enabled. If it's enabled, it disables temporarily tracing (use_tracing=0) while calling trace and profile functions.

    => it requires PyThreadState_IsTracing(), PyThreadState_DisableTracing() and PyThreadState_ResetTracing().

    @gvanrossum
    Copy link
    Member

    Ah, I think the docs need to be clarified a bit. Here's what I was missing:

    The key thing to know here is that there are three state variables; c_tracefunc, c_profilefunc (on the thread state), and use_tracing (on the C frame).

    Normally use_tracing is initialized to false if both functions are NULL, and true otherwise (if at least one of the functions is set).

    Disabling means setting use_tracing to false regardless. Resetting means setting use_tracing to the value computed above.

    There's also a fourth variable, tstate->tracing, which indicates whether a tracing function is active (i.e., it has been called and hasn't exited yet). This can be incremented and decremented. But none of the proposed APIs affect it.

    Would it be reasonable to just put these APIs in pythoncapi_compat, instead of in the stdlib? (It would be yet one more selling point for people to start using that. :-)

    @pablogsal
    Copy link
    Member

    New changeset 78184fa by Pablo Galindo (Miss Islington (bot)) in branch '3.10':
    bpo-43760: Document PyThreadState.use_tracing removal (GH-28527) (GH-28529)
    78184fa

    @markshannon
    Copy link
    Member Author

    New changeset bd627eb by Mark Shannon in branch 'main':
    bpo-43760: Check for tracing using 'bitwise or' instead of branch in dispatch. (GH-28723)
    bd627eb

    @vstinner
    Copy link
    Member

    New changeset 547d26a by Victor Stinner in branch 'main':
    bpo-43760: Add PyThreadState_EnterTracing() (GH-28542)
    547d26a

    @vstinner
    Copy link
    Member

    bpo-43760: Add PyThreadState_EnterTracing() (GH-28542)

    I created changes to use it:

    @vstinner
    Copy link
    Member

    PyThreadState.cframe.use_tracing format changed again: set value set to 0 or 255.
    bd627eb

    @vstinner
    Copy link
    Member

    @vstinner
    Copy link
    Member

    New changeset 034f607 by Victor Stinner in branch 'main':
    bpo-43760: Rename _PyThreadState_DisableTracing() (GH-29032)
    034f607

    @vstinner
    Copy link
    Member

    I created #29121 to add PyThreadState_SetProfile() and PyThreadState_SetTrace() functions.

    @vstinner
    Copy link
    Member

    vstinner commented Nov 8, 2021

    greenlet now uses PyThreadState_EnterTracing() and PyThreadState_LeaveTracing() rather than accessing directly use_tracing:

    python-greenlet/greenlet@9b49da5

    On Python 3.10, it implements these functions with:
    ---

    // bpo-43760 added PyThreadState_EnterTracing() to Python 3.11.0a2
    #if PY_VERSION_HEX < 0x030B00A2 && !defined(PYPY_VERSION)
    static inline void PyThreadState_EnterTracing(PyThreadState *tstate)
    {
        tstate->tracing++;
    #if PY_VERSION_HEX >= 0x030A00A1
        tstate->cframe->use_tracing = 0;
    #else
        tstate->use_tracing = 0;
    #endif
    }
    #endif
    
    // bpo-43760 added PyThreadState_LeaveTracing() to Python 3.11.0a2
    #if PY_VERSION_HEX < 0x030B00A2 && !defined(PYPY_VERSION)
    static inline void PyThreadState_LeaveTracing(PyThreadState *tstate)
    {
        tstate->tracing--;
        int use_tracing = (tstate->c_tracefunc != NULL
                           || tstate->c_profilefunc != NULL);
    #if PY_VERSION_HEX >= 0x030A00A1
        tstate->cframe->use_tracing = use_tracing;
    #else
        tstate->use_tracing = use_tracing;
    #endif
    }
    #endif

    This code was copied from my https://github.com/pythoncapi/pythoncapi_compat project. (I wrote the greenlet change.)

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    @iritkatriel
    Copy link
    Member

    Is there anything left to do here?

    @iritkatriel iritkatriel added the pending The issue will be closed if no feedback is provided label Sep 13, 2022
    @vstinner
    Copy link
    Member

    IMO we are done, and I close the issue. See the issue #84128 for making PyThreadState opaque.

    The DISPATCH() macro is not as efficient as it could be

    This part was fixed early. Following comments were more about PyThreadState incompatible changes, how to migrate existing C extensions to Python 3.11 and how to design a new API which no longer access directly PyThreadState changes. In fact, that's already the topic of the issue #84128 that I created in 2020 and so we can continue the discussion there.

    The main change related to PyThreadState was the use_tracing member which was moved. I added PyThreadState_EnterTracing() and PyThreadState_LeaveTracing() functions to Python 3.11 for that and so projects already use it.

    I created #29121 to add PyThreadState_SetProfile() and PyThreadState_SetTrace() functions.

    I abandoned this PR. @pablogsal added PyEval_SetProfileAllThreads() and PyEval_SetTraceAllThreads() functions to Python 3.12 (commit e34c82a) which should fit the use case, with a different design.

    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.10 only security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) pending The issue will be closed if no feedback is provided performance Performance or resource usage topic-C-API
    Projects
    None yet
    Development

    No branches or pull requests

    9 participants