This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: PyMemoryView_FromBuffer() behavior change (possible regression)
Type: behavior Stage: patch review
Components: Interpreter Core Versions: Python 3.3, Python 3.4
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: skrah Nosy List: Alexander.Belopolsky, belopolsky, docs@python, jcea, ncoghlan, pitrou, skrah
Priority: normal Keywords: patch

Created on 2012-08-30 09:53 by skrah, last changed 2022-04-11 14:57 by admin.

Files
File name Uploaded Description Edit
_testbuffer.diff belopolsky, 2012-08-30 17:25 review
test_indirect.py belopolsky, 2012-08-30 17:30
issue15821.diff skrah, 2012-08-31 18:34 review
Messages (15)
msg169443 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2012-08-30 09:53
People are using PyMemoryView_FromBuffer() to create and return
permanent memoryviews from buffers that are allocated on the stack.
See:

http://stackoverflow.com/questions/8123121/how-to-get-back-a-valid-object-from-python-in-c-while-this-object-has-been-con


The docs should perhaps warn against this.
msg169452 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2012-08-30 13:39
This may be a bigger problem (grep for image_surface_get_data):

http://lists.cairographics.org/archives/cairo/2011-December/022563.html


The previous semantics for PyMemoryView_FromBuffer(view) were:

  1) If non-NULL, steal the view.obj reference with automatic
     decrement in PyBuffer_Release().

  2) Copy shape and strides to view.smalltable.



The new semantics for PyMemoryView_FromBuffer(view) are:

  1) If non-NULL, treat view.obj as a borrowed reference.

  2) Rely on the fact that shape and strides won't disappear.


2) means that returning PyMemoryView_FromBuffer(view) from
a view allocated on the stack is no longer safe. In all instances
people could migrate to PyMemoryView_FromMemory(), which is
both safe and more convenient, but code may be broken.

PyManaged_Buffer is designed to expect buffers from exporters,
where it is guaranteed that the buffers won't disappear,so it
won't be easy to keep backwards compatibility.
msg169460 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2012-08-30 14:36
Having been bitten by an indirect buffer bug in 2.7, I decided to write some tests for 3.3.  I added an objview() function to _testbuffer module that creates an indirect view for nested tuples.  I have not written unit tests yet, so I'll attach a demo script test_indirect.py.

I hope this test case will be helpful in figuring out the right semantics for PyMemoryView_FromBuffer().  My feeling is that memoryview should be able to store shape/strides/suboffsets arrays in its own structure.  I also think the only way to achieve that is to make it a PyVarObject and allocate up to sizeof(Py_ssize_t)*ndim*3 bytes after the "view" substructure to store copies of shape/strides/suboffsets arrays.
msg169464 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2012-08-30 14:57
PyMemoryViewObject already is a PyVarObject with its own shape, strides
and suboffsets. It is the PyManagedBuffer object that directly communicates
with exporters.

The decision to store *exactly* the buffer that is obtained from the
exporter was made in #10181.

Your test case seems to pass here. :) Was it supposed to crash?
msg169466 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2012-08-30 15:07
> Your test case seems to pass here. :) Was it supposed to crash?

No, I worked real hard to make it pass. :-)  I think it would crash 2.7 and 3.2, but I have not tried.  I also suspect it leaks memory.

Do you think this is something that we should include in the test suit.  If so, I will probably open a separate issue and transform test_indirect.py into a patch for test_buffer.
msg169467 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2012-08-30 15:15
> PyMemoryViewObject already is a PyVarObject with its own shape,
> strides and suboffsets.

You are right.  I was mistakenly looking at 3.2.3 sources. It looks like there are a lot of undocumented changes here.
msg169471 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2012-08-30 17:22
On second thought, it's impossible to crash a memoryview generated
by PyMemoryView_FromBuffer(info) even when info->shape etc. point
to stack addresses.

The reason is this: All new memoryviews are created through the mbuf_add*
API. In the first call to mbuf_add_view(mbuf, NULL) the stack addresses
are still valid, and the private shape, strides and suboffsets of the
first memoryview that is registered with the managed buffer are
initialized correctly.

Now PyMemoryView_FromBuffer() returns and the managed buffer itself
contains invalid stack addresses. But these are never used again!

Chained memoryviews are generated from existing memoryviews, and
they are registered with the managed buffer by calling
mbuf_add_view(mbuf, existing_view->view).

mbuf_add_incomplete_view(mbuf, NULL, ndim) does not access shape, strides
and suboffsets.

If info->format points to a stack address, it would crash both the
old and new implementations. I'd regard such a use of info->format
silly.

I've never considered this use of PyMemoryView_FromBuffer(info) legitimate,
but it works in the new implementation. If we officially endorse this
pattern, then info->format should probably also be copied.

So the topic is reduced to:

  1) Previous: If non-NULL, steal the view.obj reference with automatic
               decrement in PyBuffer_Release().
     New:      If non-NULL, treat view.obj as a borrowed reference.

  2) Copy info->format?
msg169472 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2012-08-30 17:30
I've updated test_indirect.py to print all memoryview fields and cleaned up the _testbuffer patch a little.  Note this code that is needed to prevent a memory leak:

+    /* PyMemoryView_FromBuffer ignores info.obj.  Add it explicitely. */
+    if (view != NULL)
+      ((PyMemoryViewObject *)view)->view.obj = info.obj;

Shouldn't  PyMemoryView_FromBuffer() copy non-NULL obj from Py_buffer?
msg169473 - (view) Author: Alexander Belopolsky (Alexander.Belopolsky) Date: 2012-08-30 17:38
On Thu, Aug 30, 2012 at 1:22 PM, Stefan Krah <report@bugs.python.org> wrote:
> So the topic is reduced to:
>
>   1) Previous: If non-NULL, steal the view.obj reference with automatic
>                decrement in PyBuffer_Release().
>      New:      If non-NULL, treat view.obj as a borrowed reference.
>
>   2) Copy info->format?

+1 to both ideas.
msg169574 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2012-08-31 18:34
Alexander, your test case is brilliant and could eventually go into
_testbuffer.c, but I'd prefer a simpler test case for now. :)

Here's a patch that restores the info.obj cleanup facility.

I intentionally did not add copying the format in order to keep the
diff small (in case this can go into rc2). Crashing on stack addresses
for format is not a regression.


Apparently PyMemoryView_FromBuffer() is used for low level tinkering and
automatic cleanup, so I now think that it's better not to break backwards
compatibility, i.e. leave the value of info.obj untouched.

Otherwise people using the cleanup facility will have reference leaks.


However, I don't like the interface at all: Passing a pointer to a struct
and then relying on the fact that *some* values are copied (shape, strides
and suboffsets) but not others (format, buf) is counterintuitive.


With the ManagedBuffer, we could now write a new function that returns
a memoryview with much nicer cleanup facilities. But that doesn't help
here since we're stuck with this function for 3.3.0.


What do you think? Should this go into 3.3.0? The changes to memoryobject.c
are minimal, the rest are tests and docs.
msg169575 - (view) Author: Alexander Belopolsky (Alexander.Belopolsky) Date: 2012-08-31 18:53
On Fri, Aug 31, 2012 at 2:34 PM, Stefan Krah <report@bugs.python.org> wrote:
> With the ManagedBuffer, we could now write a new function that returns
> a memoryview with much nicer cleanup facilities. But that doesn't help
> here since we're stuck with this function for 3.3.0.
>
>
> What do you think? Should this go into 3.3.0?

I am still getting up to speed with all the changes that went in since
3.2.  I'll review your patch over the weekend.  Meanwhile, I think the
goal should be that after PyMemoryview_FromBuffer(info) is called, it
should be OK to discard info by calling PyBuffer_Release() and if info
is not allocated on the stack, Py_DECREF(info).  I think we are almost
there achieving that goal with possible exception of dynamically or
stack-allocated fmt strings.
msg169576 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2012-08-31 19:12
Alexander Belopolsky <report@bugs.python.org> wrote:
> I am still getting up to speed with all the changes that went in since
> 3.2.  I'll review your patch over the weekend.  Meanwhile, I think the
> goal should be that after PyMemoryview_FromBuffer(info) is called, it
> should be OK to discard info by calling PyBuffer_Release()

Now I'm puzzled: I thought your goal was to preserve the implicit cleanup
from 3.2, i.e. PyBuffer_Release() is called when the managed buffer is
deallocated.

Without the patch it's OK to call PyBuffer_Release(info) after
PyMemoryview_FromBuffer(info). With the patch you can't call
PyBuffer_Release(info), since it's done automatically already.
msg169582 - (view) Author: Alexander Belopolsky (Alexander.Belopolsky) Date: 2012-08-31 20:40
On Fri, Aug 31, 2012 at 3:12 PM, Stefan Krah <report@bugs.python.org> wrote:
> Now I'm puzzled: I thought your goal was to preserve the implicit cleanup
> from 3.2, i.e. PyBuffer_Release() is called when the managed buffer is
> deallocated.
>

The issue that I raised in msg169472 above was that
PyMemoryView_FromBuffer() would not copy .obj from Py_buffer structure
to the memoryview.  A related issue is that it looks like
PyObject_GetBuffer() often does not fill .obj either.   I would expect
that PyObject_GetBuffer() would always store a new reference in .obj
to assure that the .buf pointer remains valid until PyBuffer_Release()
is called explicitly.  (I am ignoring the issue of mutable objects
such as lists for the moment.)   PyMemoryView_FromBuffer() in turn
should store an additional reference in its own private copy of
Py_buffer structure.  After PyMemoryView_FromBuffer() returns a
well-behaved program should call PyBuffer_Release() releasing the
first reference and the second reference should be released in
memoryview destructor.  Am I missing something?
msg169583 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2012-08-31 21:06
Here is what I think the test case should look like (untested):


static PyObject *
memoryview_from_buffer_cleanup(PyObject *self, PyObject *noargs)
{
    PyObject *b, *view = NULL;
    Py_buffer info;
    Py_ssize_t shape[3] = {2, 2, 3};
    Py_ssize_t strides[3] = {6, 3, 1};
    const char *cp = "abcdefghijkl";

    b = PyBytes_FromString(cp);
    if (b == NULL)
        return NULL;

    if (PyObject_GetBuffer(b, &info, PyBUF_FULL_RO) < 0)
        goto done;
    /* reshape */
    info.ndim = 3;
    info.shape = shape;
    info.strides = strides;

    view = PyMemoryView_FromBuffer(&info);
    /* release resources allocated for Py_buffer struct 
       before it goes out of scope */
    PyBuffer_Release(&info);
 done:
    Py_DECREF(b);
    return view;
}
msg169613 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2012-09-01 07:33
In general, the number of calls to PyObject_GetBuffer() must be equal to
the number of calls to PyBuffer_Release(), otherwise bad things will happen
in the same way as with malloc()/free().


Now, in my test case I omitted the call to PyObject_GetBuffer() *with*
the internal knowledge that in the case of the bytes object it does
not matter: The releasebufferproc is NULL, so PyBuffer_Release()
is just a way of decrementing the reference count of the bytes object.


No matter how you turn it, if info.obj != NULL you need some internal
knowledge what the obj's releasebufferproc will do. For example, I
suspect that many existing releasebufferprocs do not account for
the fact that a consumer may change shape, strides and suboffsets.

So you need to know the exporter really well.


My strategy here is this: I want to keep backwards compatibility
for PyMemoryView_FromBuffer(), which I think should be deprecated
at some point.


A proper function that returns a memoryview with automatic cleanup
would not involve getbuffer/releasebuffer at all and could look like
this:

Flags:
------
    PyManagedBuffer_FreeBuf
    PyManagedBuffer_FreeFormat
    PyManagedBuffer_FreeArrays  // shape, strides, suboffsets

PyMemoryView_FromBufferWithCleanup(info, flags)
{
    /* info should never have been obtained by a call to PyObject_GetBuffer().
       Hence it should never be released. */
    assert(info.obj == NULL);

    /* buf, format, shape, strides and suboffsets MUST stay valid
       for the lifetime of the returned memoryview. Usually they
       will be dynamically allocated using PyMem_Malloc(). This is
       no problem, since there is automatic cleanup in mbuf_dealloc(). */

    mbuf = mbuf_alloc();
    mbuf->master = *info;
    mbuf->flags |= flags;

    ...

    return new memoryview
}

mbuf_dealloc(self)
{
    if (self->flags&PyManagedBuffer_FreeBuf)
        PyMem_Free(self->master.buf);
    ...
}

This is much more flexible than the existing function and does not suffer
from any abuse of getbuffer/releasebuffer.
History
Date User Action Args
2022-04-11 14:57:35adminsetgithub: 60025
2012-09-28 15:17:24jceasetnosy: + jcea
2012-09-01 07:33:41skrahsetmessages: + msg169613
2012-08-31 21:06:09belopolskysetmessages: + msg169583
2012-08-31 20:40:45Alexander.Belopolskysetmessages: + msg169582
2012-08-31 19:12:11skrahsetmessages: + msg169576
2012-08-31 18:53:58Alexander.Belopolskysetmessages: + msg169575
2012-08-31 18:34:42skrahsetfiles: + issue15821.diff

assignee: docs@python -> skrah
components: + Interpreter Core, - Documentation
title: Improve docs for PyMemoryView_FromBuffer() -> PyMemoryView_FromBuffer() behavior change (possible regression)
type: enhancement -> behavior
messages: + msg169574
stage: patch review
2012-08-30 17:39:45belopolskysetfiles: - test_indirect.py
2012-08-30 17:39:36belopolskysetfiles: - _testbuffer.diff
2012-08-30 17:38:21Alexander.Belopolskysetnosy: + Alexander.Belopolsky

messages: + msg169473
title: Improve docs for PyMemoryView_FromBuffer() -> Improve docs for PyMemoryView_FromBuffer()
2012-08-30 17:30:02belopolskysetfiles: + test_indirect.py

messages: + msg169472
2012-08-30 17:25:55belopolskysetfiles: + _testbuffer.diff
2012-08-30 17:23:11skrahsetpriority: high -> normal
2012-08-30 17:22:36skrahsetmessages: + msg169471
2012-08-30 15:15:47belopolskysetmessages: + msg169467
2012-08-30 15:07:26belopolskysetmessages: + msg169466
2012-08-30 14:57:20skrahsetmessages: + msg169464
2012-08-30 14:36:35belopolskysetfiles: + test_indirect.py
2012-08-30 14:36:12belopolskysetfiles: + _testbuffer.diff
keywords: + patch
messages: + msg169460
2012-08-30 13:39:04skrahsetpriority: normal -> high

messages: + msg169452
2012-08-30 09:53:08skrahcreate