classification
Title: [C API] Add getter functions for PyFrameObject and maybe move PyFrameObject to the internal C API
Type: Stage: patch review
Components: C API Versions: Python 3.9
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: Mark.Shannon, scoder, vstinner
Priority: normal Keywords: patch

Created on 2020-04-28 13:17 by vstinner, last changed 2020-05-08 06:51 by scoder.

Pull Requests
URL Status Linked Edit
PR 19755 merged vstinner, 2020-04-28 14:07
PR 19756 merged vstinner, 2020-04-28 14:45
PR 19757 merged vstinner, 2020-04-28 15:27
PR 19764 closed vstinner, 2020-04-28 17:27
PR 19765 merged vstinner, 2020-04-28 17:33
Messages (9)
msg367527 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-04-28 13:17
Similarly to bpo-39573 (make PyObject opaque) and bpo-39947 (make PyThreadState opaque), I propose to add getter functions to access PyFrameObject members without exposing the PyFrameObject structure in the C API.

The first step is to identify common usage of the PyFrameObject structure inside CPython code base to add getter functions, and maybe a few setter functions as well.

frameobject.h is not part of Python.h, but it's part of the public C API. The long term plan is to move PyFrameObject structure to the internal C API to hide implementation details from the public C API.
msg367532 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-04-28 14:32
New changeset 7c59d7c9860cdbaf4a9c26c9142aebd3259d046e by Victor Stinner in branch 'master':
bpo-40421: Add pyframe.h header file (GH-19755)
https://github.com/python/cpython/commit/7c59d7c9860cdbaf4a9c26c9142aebd3259d046e
msg367535 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-04-28 15:07
New changeset b8f704d2190125a7750b50cd9b67267b9c20fd43 by Victor Stinner in branch 'master':
bpo-40421: Add Include/cpython/code.h header file (GH-19756)
https://github.com/python/cpython/commit/b8f704d2190125a7750b50cd9b67267b9c20fd43
msg367545 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-04-28 17:01
New changeset a42ca74fa30227e2f89a619332557cf093a937d5 by Victor Stinner in branch 'master':
bpo-40421: Add PyFrame_GetCode() function (GH-19757)
https://github.com/python/cpython/commit/a42ca74fa30227e2f89a619332557cf093a937d5
msg367550 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-04-28 17:45
I looked how Cython uses PyFrameObject:

* read f_lasti
* read/write f_back
* write f_lineno
* read f_localsplus
* read/write f_trace

Details:

* Cython/Debugger/libpython.py: code using the Python API of gdb to read PyFrameObject.f_lasti. It it used to compute the line number of a frame. The python_step() function puts a watch point on "f->f_lasti".
* Cython/Utility/Coroutine.c: set PyFrameObject.f_back using "f->f_back = tstate->frame;" and "Py_CLEAR(f->f_back);".
* Cython/Utility/ModuleSetupCode.c, __Pyx_PyFrame_SetLineNumber(): set PyFrameObject.f_lineno member. The limited C API flavor of this function does nothing, since this member cannot be set in the limited C API.
* Cython/Utility/ObjectHandling.c, __Pyx_PyFrame_GetLocalsplus(): complex implementation to access PyFrameObject.f_localsplus:

  // Initialised by module init code.
  static size_t __pyx_pyframe_localsplus_offset = 0;

  #include "frameobject.h"
  // This is the long runtime version of
  //     #define __Pyx_PyFrame_GetLocalsplus(frame)  ((frame)->f_localsplus)
  // offsetof(PyFrameObject, f_localsplus) differs between regular C-Python and Stackless Python.
  // Therefore the offset is computed at run time from PyFrame_type.tp_basicsize. That is feasible,
  // because f_localsplus is the last field of PyFrameObject (checked by Py_BUILD_ASSERT_EXPR below).
  #define __Pxy_PyFrame_Initialize_Offsets()  \
    ((void)__Pyx_BUILD_ASSERT_EXPR(sizeof(PyFrameObject) == offsetof(PyFrameObject, f_localsplus) + Py_MEMBER_SIZE(PyFrameObject, f_localsplus)), \
     (void)(__pyx_pyframe_localsplus_offset = ((size_t)PyFrame_Type.tp_basicsize) - Py_MEMBER_SIZE(PyFrameObject, f_localsplus)))
  #define __Pyx_PyFrame_GetLocalsplus(frame)  \
    (assert(__pyx_pyframe_localsplus_offset), (PyObject **)(((char *)(frame)) + __pyx_pyframe_localsplus_offset))

* Cython/Utility/Profile.c, __Pyx_TraceLine(): read PyFrameObject.f_trace.

* Cython/Utility/Profile.c, __Pyx_TraceSetupAndCall(): set PyFrameObject.f_trace.
msg367604 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-04-29 01:28
New changeset 703647732359200c54f1d2e695cc3a06b9a96c9a by Victor Stinner in branch 'master':
bpo-40421: Add PyFrame_GetBack() function (GH-19765)
https://github.com/python/cpython/commit/703647732359200c54f1d2e695cc3a06b9a96c9a
msg368342 - (view) Author: Mark Shannon (Mark.Shannon) * (Python committer) Date: 2020-05-07 14:13
"maybe a few setter functions as well"
Please don't add any setter functions, that would a major change to the VM and would need a PEP.

Also, could you remove PyFrame_GetLastInstr(PyFrameObject *frame)?
The only purpose of `f_lasti` is to get the line number and that can be done directly via `PyFrame_GetLineNumber(PyFrameObject *frame)`

Perhaps Stefan can tell us why Cython needs to access the internals of the frame object.
msg368350 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-05-07 15:13
> Please don't add any setter functions, that would a major change to the VM and would need a PEP.

There is no such "major change". PyFrameObject structure was fully exposed in the public C API since the earliest Python version.

I don't see how adding setter is a major change, since it's already possible to directly modify *any* field of PyFrameObject without any kind of protection.

My plan is to have two milestones:

A) Make the structure opaque, so it's not longer possible to directly access it.

B) Deprecate or remove a few getter or setter functions, or move them to the internal C API.


I don't think that moving directly to step (B) is a realistic migration plan. So far, nobody analyzed all C extensions on PyPI to see how PyFrameObject is accessed.

I prefer to move slowly, step by step.

Breaking C extensions which currently *modify* directly PyFrameObject on purpose doesn't seem like a reasonable option to me.

--

I'm trying to distinguish functions which should be "safe"/"portable" from the ones which may be too "CPython specific":

* I added Include/pyframe.h which is included by Include/Python.h for "safe"/"portable" functions:

  * PyFrame_GetLineNumber()
  * PyFrame_GetCode()

* All other functions are added to Include/cpython/frameobject.h which is excluded from the limited C API:

  * PyFrame_GetBack()

Note: Compared to accessing directly PyFrameObject.f_code, PyFrame_GetCode() also avoids the issue of borrowed references since it returns a strong reference.

PyFrame_GetBack() looks specific to the current implementation of CPython. Another implementation might decide to not chain frames. Or maybe don't provide an easy way to traverse the chain of frames. Or at least, it might be a different API than PyFrame_GetBack().

--

> Also, could you remove PyFrame_GetLastInstr(PyFrameObject *frame)?

I didn't add it :-) It's the pending PR 19764.

I didn't merge it since it's unclear to me if it's a good idea to directly expose it or not. Cython uses it, but Cython also abuses CPython internals in general, usually for best performances :-)

*Maybe* one compromise would be to add a private _PyFrame_GetLastInstr() to ease migration to step (A) (replace direct access to the structure with function calls).
msg368424 - (view) Author: Stefan Behnel (scoder) * (Python committer) Date: 2020-05-08 06:51
Adding to the list above:

"f_lasti" is used in "libpython.py", which is an almost exact copy of the one in CPython's "Tools/gdb/" for debugging Python code in gdb. If that implementation changes, we can probably adapt, especially since it uses runtime generated code. I don't find a C function for this necessary at this point, given that CPython will also need to access its own internals here in some way.

"f_localsplus" is used in the FASTCALL implementation and seems specific to Py3.6/7(?) which lacks the vectorcall protocol. So I guess it won't impact 3.8+ anymore. (The reason why its access is so complex is that StacklessPython has a slightly different frame struct, but we need to be binary(!) compatible with both.)

"f_back" is used in the coroutine implementation for correct exception traceback building, and the usage is pretty much copied from CPython. I doubt that there is much use for setting it outside of "code that tries to reimplement CPython behaviour", but Cython has to do exactly that here.

"f_lineno" is also used for traceback generation, because Cython creates frames when an exception occurs and needs to tell it what the current code line is. It's only used on newly created frames, although I can't guarantee that that will never change. Being able to read and set it seems reasonable.

"f_trace" is obviously used for tracing/profiling, and there isn't currently a good C-API for that, besides talking to frame struct fields (which is quite efficient when it comes to performance).
History
Date User Action Args
2020-05-08 06:51:12scodersetmessages: + msg368424
2020-05-07 15:13:33vstinnersetmessages: + msg368350
2020-05-07 14:13:15Mark.Shannonsetnosy: + Mark.Shannon, scoder
messages: + msg368342
2020-04-29 01:28:50vstinnersetmessages: + msg367604
2020-04-28 17:45:57vstinnersetmessages: + msg367550
2020-04-28 17:33:15vstinnersetpull_requests: + pull_request19086
2020-04-28 17:27:58vstinnersetpull_requests: + pull_request19085
2020-04-28 17:01:39vstinnersetmessages: + msg367545
2020-04-28 15:27:56vstinnersetpull_requests: + pull_request19079
2020-04-28 15:07:16vstinnersetmessages: + msg367535
2020-04-28 14:45:35vstinnersetpull_requests: + pull_request19077
2020-04-28 14:32:52vstinnersetmessages: + msg367532
2020-04-28 14:07:09vstinnersetkeywords: + patch
stage: patch review
pull_requests: + pull_request19076
2020-04-28 13:17:50vstinnercreate