Title: Limited API support for Py_buffer
Type: enhancement Stage: patch review
Components: C API Versions: Python 3.11
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: christian.heimes, erlendaasland, petr.viktorin, pitrou, serhiy.storchaka, skrah, vstinner
Priority: normal Keywords: patch

Created on 2021-10-13 09:55 by christian.heimes, last changed 2021-10-21 14:47 by erlendaasland.

Pull Requests
URL Status Linked Edit
PR 29035 open christian.heimes, 2021-10-18 15:57
Messages (15)
msg403813 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2021-10-13 09:55
Currently all APIs related to Py_buffer are excluded from the limited API. It's neither possible to use Py_buffer from a C extension with limited API nor is it possible to define heap types with buffer support using the stable ABI.

The lack of Py_buffer support prevents prominent projects like NumPy or Pillow to use the limited API and produce abi3 binary wheel. To be fair it's not the only reason why these projects have not adopted the stable abi3 yet. Still Py_buffer is a necessary but not sufficient condition. Stable abi3 support would enable NumPy stack to build binary wheels that work with any Python version >= 3.11, < 4.0.

The limited API excludes any C API that references Py_buffer:

- 8 PyBuffer_*() functions
- 21 PyBUF_* constants
- PyMemoryView_FromBuffer()
- PyObject_GetBuffer
- Py_bf_getbuffer / Py_bf_releasebuffer type slots for PyBufferProcs

It should not be terribly complicated to add Py_buffer to the stable API. All it takes are an opaque struct definition of Py_buffer, an allocate function, a free function and a bunch of getters and setters. The hard part is to figure out which getters and setters are needed and how many struct members must be exposed by getters and setters. I recommend to get feedback from NumPy, Pillow, and Cython devs first.


typedef struct bufferinfo Py_buffer;

/* allocate a new Py_buffer object on the heap and initialize all members to NULL / 0 */
    Py_buffer *view = PyMem_Calloc(1, sizeof(Py_buffer));
    if (view == NULL) {
    return view;

/* convenience function */
PyBuffer_NewEx(PyObject *obj, void *buf,  Py_ssize_t len, Py_ssize_t itemsize,
               int readonly, int ndim, char *format, Py_ssize_t *shape, Py_ssize_t *strides,
               Py_ssize_t *suboffsets, void *internal)

/* release and free buffer */
PyBuffer_Free(Py_buffer *view)
    if (view != NULL) {
msg403820 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-10-13 10:28
Py_buffer.shape requires a Py_ssize_t* pointer. It's not convenient. For example, the array module uses:

static int
array_buffer_getbuf(arrayobject *self, Py_buffer *view, int flags)
    if ((flags & PyBUF_ND)==PyBUF_ND) {
        view->shape = &((PyVarObject*)self)->ob_size;
    return 0;

This code is not compatible with a fully opaque PyObject structure:
msg403822 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2021-10-13 11:16
shape is a pointer to array of Py_ssize_t of size ndim. array and memoryview do a trick to avoid memory allocation, but _testbuffer.ndarray allocates it dynamically in the heap. We can add a small static buffer in Py_buffer to avoid additional memory allocation in common cases.
msg403825 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2021-10-13 11:49
IIRC shape, strides, and suboffsets are all arrays of ndims length.

We could optimize allocation if we would require users to specify the value of ndims and don't allow them to change the value afterwards. PyBuffer_New(int ndims) then would allocate view of size sizeof(Py_buffer) + (3 * ndims * sizeof(Py_ssize_t *)). This would give us sufficient space to memcpy() shape, strides, and suboffsets arguments into memory after the Py_buffer struct.
msg403828 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2021-10-13 12:11
ndim is not known before calling PyObject_GetBuffer(), so we will need a new API which combines PyObject_GetBuffer() and PyBuffer_New().
msg404201 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2021-10-18 17:18
CC Antoine for his expertise of the buffer protocol

Opaque Py_Buffer and PyObject structs will require a different approach and prevent some optimizations. The consumer will have to malloc() a Py_buffer struct on the heap. In non-trivial cases the producer (exporter) may have to malloc() another blob and store it in Py_buffer.internal [1]. I'm not particularly worried about the performance of malloc here.

msg404274 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2021-10-19 08:23
Py_buffer is often used for handling arguments if the function supports bytes, bytearray and other bytes-like objects. For example bytes.partition(). Any additional memory allocation would add significant overhead here. bytes.join() creates Py_buffer for every item, it would be a deoptimization if it would need to allocate them all separately.

We should allow to allocate Py_buffer on stack. Currently it has too complex structure and we cannot guarantee its stability (although there were no changes for years). I propose to split Py_buffer on transparent and opaque parts and standardize the transparent structure. It should include: obj, buf, len, possible flags (to distinguish read-only from writeable) and a pointer to opaque data. For bytes, bytearray, BytesIO, mmap and most other classes the pointer to opaque data is NULL. For array and memoryview objects the opaque data could be embedded into the object.
msg404294 - (view) Author: Petr Viktorin (petr.viktorin) * (Python committer) Date: 2021-10-19 13:09
> I recommend to get feedback from NumPy, Pillow, and Cython devs first.

Could you split this into two PRs: one to add the new API, and another to  add things to the limited set?

There's no rush to add it to the limited API, esp. since it can't be tested with the intended consumers yet. And it's a hassle to remove mistakes from the limited API, even in alphas/betas.

To try this out I suggest making the struct opaque when something like `_PyBuffer_IS_OPAQUE` is #defined. Then you can let the compiler tell you what needs to change in those projects :)
msg404377 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-10-19 23:18
Maybe a PEP is needed to collect usages of the Py_buffer API and check if the ABI is future proof. A PEP may help to discuss with other projects which currently consume this API.

I suggest to start with the smallest possible API and then slowly extend it. It's too easy to make mistakes :-( Once it's added to the stable ABI, it will be really hard to change it.

For example, PyBuffer.format is a "char*", but who owns the string? For a stable ABI, I would suggest to duplicate the string.

For shape, stripes and suboffsets arrays, I would also suggest to allocate these arrays on the heap people to ensure that it cannot be modified from the outside.

In your PR, PyBuffer_GetLayout() gives indirectly access to the internal Py_buffer structure members and allows to modify them. One way is to avoid this issue is to return a *copy* of these arrays.

I would prefer to require to call "Set" functions to modify a Py_buffer to ensure that a buffer always remains consistency.

> PyBuffer_NewEx(PyObject *obj, void *buf,  Py_ssize_t len, Py_ssize_t itemsize, int readonly, int ndim, char *format, Py_ssize_t *shape, Py_ssize_t *strides, Py_ssize_t *suboffsets, void *internal)

This API looks like PyCode_New() which was broken *often* so it looks like a bad pattern for a stable ABI.

Maybe PyBuffer_New() + many Set() functions would be more future proof.

But I don't know which Py_buffer members are mandatory to have a "valid" buffer.

What if tomorrow we add new members. Will it be possible to initalize them to a reasonable default value?
msg404400 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2021-10-20 08:19
All memory is owned by the exporter object. The exporter (aka producer) is the Python type that implements Py_bf_getbuffer and Py_bf_releasebuffer. In majority of cases the exporter doesn't have to set shape, strides, and suboffsets. They are used in special cases, e.g. multidimensional arrays with custom layout. For example they can be used to convert TIFF images from strides big endian format to a NumPy array in little endian format.

It's up to the exporter's Py_bf_getbuffer to decide how it fills shape, strides, and suboffsets, too. For example an exporter could allocate format, shape, strides, and suboffsets on the heap and assign pointers in its getbuffer function and store a hint in the ``internal`` field of Py_buffer. Its releasebuffer function then checks ``internal`` field and performs de-allocations. We must not copy fields. This would break the API.

It would be a bad idea to return copies in PyBuffer_GetLayout(). Consumers have to get the layout every time they access a specific item in the buffer in order to calculate the offset. I'd rather define the arguments as "const". The documentation already states that e.g. "The shape array is read-only for the consumer.".

It is highly unlikely that we will ever have to extend the Py_buffer interface. It is already extremely versatile and can encode complex formats. You can even express the layout of a TIFF image of float32 CMYK in planar configuration (one array of 32bit floats cyan, followed by an array of magenta, then an array of yellow, and finally an array of contrast).

PS: I have removed PyBuffer_NewEx() function. It did not make sense any sense.
msg404402 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2021-10-20 08:30
A consumer will use the APIs:

Py_buffer *view;
int ndim;
const char *format;
const Py_ssize_t *shape, *strides, *suboffsets;
void *buf;

view = PyBuffer_New();
PyObject_GetBuffer(obj, view, flags);
ndim = PyBuffer_GetLayout(&format, &shape, &strides, &suboffsets);
buf = PyBuffer_GetPointer(view, [...]);
PyBuffer_Free(view); // also calls PyBuffer_Release()

The API functions PyBuffer_FillInfo(), PyBuffer_FillInfoEx(), and PyBuffer_GetInternal() are for exporters (producers)-only. The exporter uses the PyBuffer_FillInfo*() in its Py_bf_getbuffer function to fill the view. It may use PyBuffer_GetInternal() in its Py_bf_releasebuffer function to access the internal field and to release additional resources.
msg404403 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2021-10-20 08:44
I do not like requirement to allocate Py_buffer on the heap. It adds an overhead. Common case in CPython code is:

Py_buffer view;
void *buf;
Py_ssize_t len;

PyObject_GetBuffer(obj, &view, PyBUF_SIMPLE);
buf = view.buf;
len = view.len;
// no other fields are used

And I want to keep it as simple and efficient as it can be.
msg404404 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2021-10-20 08:47
CPython internals can still use allocation on the stack. Only stable ABI extensions have to use allocation on the heap.
msg404409 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2021-10-20 09:42
That would be an unfair advantage. If we want people to use the limited API we should not make it much slower than the non-limited API.
msg404418 - (view) Author: Petr Viktorin (petr.viktorin) * (Python committer) Date: 2021-10-20 10:31
No. Limited API is generally not as performant as the non-limited one. It is even documented this way:

We should not make it *much* slower, but code that can take advantage of implementation details can always be faster. Max speed should not be a concern in the limited API.
Date User Action Args
2021-10-21 14:47:35erlendaaslandsetnosy: + erlendaasland
2021-10-20 10:31:07petr.viktorinsetmessages: + msg404418
2021-10-20 09:42:35serhiy.storchakasetmessages: + msg404409
2021-10-20 08:47:54christian.heimessetmessages: + msg404404
2021-10-20 08:44:09serhiy.storchakasetmessages: + msg404403
2021-10-20 08:30:16christian.heimessetmessages: + msg404402
2021-10-20 08:19:21christian.heimessetmessages: + msg404400
2021-10-19 23:18:17vstinnersetmessages: + msg404377
2021-10-19 13:09:37petr.viktorinsetmessages: + msg404294
2021-10-19 08:23:16serhiy.storchakasetnosy: + skrah
messages: + msg404274
2021-10-18 17:18:41christian.heimessetnosy: + pitrou
messages: + msg404201
2021-10-18 15:57:29christian.heimessetkeywords: + patch
stage: patch review
pull_requests: + pull_request27306
2021-10-13 12:11:25serhiy.storchakasetmessages: + msg403828
2021-10-13 11:49:14christian.heimessetmessages: + msg403825
2021-10-13 11:16:02serhiy.storchakasetnosy: + serhiy.storchaka
messages: + msg403822
2021-10-13 10:28:48vstinnersetmessages: + msg403820
2021-10-13 09:55:13christian.heimescreate