Title: Meaning and clarification of PyBuffer_Release()
Type: behavior Stage: resolved
Components: C API Versions: Python 3.9, Python 3.8, Python 3.7, Python 3.6, Python 3.5, Python 2.7
Status: closed Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: Eric Wieser, mark.dickinson, seberg
Priority: normal Keywords:

Created on 2020-01-27 22:14 by seberg, last changed 2020-04-22 17:52 by mark.dickinson. This issue is now closed.

Messages (4)
msg360809 - (view) Author: Sebastian Berg (seberg) * Date: 2020-01-27 22:14
The current documentation of ``PyBuffer_Release()`` and the PEP is a bit fuzzy about what the function can and cannot do.

When an object exposes the buffer interface, I believe it should always return a `view` (in NumPy speak) of its own data, i.e. the data exposed by the object is also owned by it directly.

On the other hand the buffer view _itself_ has fields such as `strides`, etc. which may need allocating.

In other words, I think `PyBuffer_Release()` should be documented to deallocate/invalidate the `Py_buffer`. But, it *must not* invalidate the actual memory it points to. If I copy all information out of the `Py_buffer` and then free it, the copy must still be valid.

I think this is the intention, but it is not spelled out clear enough, it is also the reason for the behaviour of the "#s", etc. keyword argument parsers failing due to the code:

    if (pb != NULL && pb->bf_releasebuffer != NULL) {
        *errmsg = "read-only bytes-like object";
        return -1;

which in turn currently means NumPy decides to _not_ implement bf_releasebuffer at all (leading to very ugly work arounds).

I am happy to make a PR, if we can get to a point where everyone is absolutely certain that the above interpretation was always correct, we could clean up a lot of code inside NumPy as well!
msg360814 - (view) Author: Sebastian Berg (seberg) * Date: 2020-01-27 22:28
Hmmm, it seems I had missed this chunk of PEP 3118 before:

> Exporters will need to define a bf_releasebuffer function if they can re-allocate their memory, strides, shape, suboffsets, or format variables which they might share through the struct bufferinfo.

Which reads like the opposite of what I would like to see, unfortunately. Which I guess means that the parsing issue should be addressed differently. Opinions still welcome, though.
msg360818 - (view) Author: Sebastian Berg (seberg) * Date: 2020-01-27 22:39
I went through Python, `array` seems to not break the logic. pickling has a comment which specifically wants to run into the argument parsing corner case above (I am not sure that it is really important). However, `Modules/_testbuffer.c` (which is just test) actually does allocate new memory for the buffer!

I can be convinced that this is how it is supposed to be, but right now I am still not quite. The main idea is efficient no-copy data sharing. It is always possible to create a `to_memoryview()` method if a realloc was desired/necessary...
msg367032 - (view) Author: Sebastian Berg (seberg) * Date: 2020-04-22 17:40
Ok, I will just close it. It is painfully clear that e.g. `mmap` uses it this way to prohibit closing, and also `memoryview` has all the machinery necessary to do counting of how many exports, etc. exists.

I admit, this still rubs me the wrong way, and I think it means that we may need to check `bf_releasebuffer != NULL` also in NumPy.

We still have the issue of not being able to use `releasebuffer` easily in NumPy. But there is nothing to be done, unless we can consider limiting the `"#s"`, etc. type argparsing, which is likely not an option.
Date User Action Args
2020-04-22 17:52:40mark.dickinsonsetnosy: + mark.dickinson
2020-04-22 17:40:06sebergsetstatus: open -> closed

messages: + msg367032
stage: resolved
2020-01-27 23:00:47Eric Wiesersetnosy: + Eric Wieser
2020-01-27 22:39:18sebergsetmessages: + msg360818
2020-01-27 22:28:22sebergsetmessages: + msg360814
2020-01-27 22:14:19sebergcreate