Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PyMemoryView_FromBuffer() behavior change (possible regression) #60025

Open
skrah mannequin opened this issue Aug 30, 2012 · 15 comments
Open

PyMemoryView_FromBuffer() behavior change (possible regression) #60025

skrah mannequin opened this issue Aug 30, 2012 · 15 comments
Assignees
Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs) type-bug An unexpected behavior, bug, or error

Comments

@skrah
Copy link
Mannequin

skrah mannequin commented Aug 30, 2012

BPO 15821
Nosy @jcea, @ncoghlan, @abalkin, @pitrou, @skrah
Files
  • _testbuffer.diff
  • test_indirect.py
  • issue15821.diff
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/skrah'
    closed_at = None
    created_at = <Date 2012-08-30.09:53:08.324>
    labels = ['interpreter-core', 'type-bug']
    title = 'PyMemoryView_FromBuffer() behavior change (possible regression)'
    updated_at = <Date 2012-09-28.15:17:24.951>
    user = 'https://github.com/skrah'

    bugs.python.org fields:

    activity = <Date 2012-09-28.15:17:24.951>
    actor = 'jcea'
    assignee = 'skrah'
    closed = False
    closed_date = None
    closer = None
    components = ['Interpreter Core']
    creation = <Date 2012-08-30.09:53:08.324>
    creator = 'skrah'
    dependencies = []
    files = ['27061', '27062', '27076']
    hgrepos = []
    issue_num = 15821
    keywords = ['patch']
    message_count = 15.0
    messages = ['169443', '169452', '169460', '169464', '169466', '169467', '169471', '169472', '169473', '169574', '169575', '169576', '169582', '169583', '169613']
    nosy_count = 7.0
    nosy_names = ['jcea', 'ncoghlan', 'belopolsky', 'pitrou', 'skrah', 'Alexander.Belopolsky', 'docs@python']
    pr_nums = []
    priority = 'normal'
    resolution = None
    stage = 'patch review'
    status = 'open'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue15821'
    versions = ['Python 3.3', 'Python 3.4']

    @skrah
    Copy link
    Mannequin Author

    skrah mannequin commented Aug 30, 2012

    People are using PyMemoryView_FromBuffer() to create and return
    permanent memoryviews from buffers that are allocated on the stack.
    See:

    http://stackoverflow.com/questions/8123121/how-to-get-back-a-valid-object-from-python-in-c-while-this-object-has-been-con

    The docs should perhaps warn against this.

    @skrah skrah mannequin assigned docspython Aug 30, 2012
    @skrah skrah mannequin added type-feature A feature request or enhancement docs Documentation in the Doc dir labels Aug 30, 2012
    @skrah
    Copy link
    Mannequin Author

    skrah mannequin commented Aug 30, 2012

    This may be a bigger problem (grep for image_surface_get_data):

    http://lists.cairographics.org/archives/cairo/2011-December/022563.html

    The previous semantics for PyMemoryView_FromBuffer(view) were:

    1. If non-NULL, steal the view.obj reference with automatic
      decrement in PyBuffer_Release().

    2. Copy shape and strides to view.smalltable.

    The new semantics for PyMemoryView_FromBuffer(view) are:

    1. If non-NULL, treat view.obj as a borrowed reference.

    2. Rely on the fact that shape and strides won't disappear.

    3. means that returning PyMemoryView_FromBuffer(view) from
      a view allocated on the stack is no longer safe. In all instances
      people could migrate to PyMemoryView_FromMemory(), which is
      both safe and more convenient, but code may be broken.

    PyManaged_Buffer is designed to expect buffers from exporters,
    where it is guaranteed that the buffers won't disappear,so it
    won't be easy to keep backwards compatibility.

    @abalkin
    Copy link
    Member

    abalkin commented Aug 30, 2012

    Having been bitten by an indirect buffer bug in 2.7, I decided to write some tests for 3.3. I added an objview() function to _testbuffer module that creates an indirect view for nested tuples. I have not written unit tests yet, so I'll attach a demo script test_indirect.py.

    I hope this test case will be helpful in figuring out the right semantics for PyMemoryView_FromBuffer(). My feeling is that memoryview should be able to store shape/strides/suboffsets arrays in its own structure. I also think the only way to achieve that is to make it a PyVarObject and allocate up to sizeof(Py_ssize_t)*ndim*3 bytes after the "view" substructure to store copies of shape/strides/suboffsets arrays.

    @skrah
    Copy link
    Mannequin Author

    skrah mannequin commented Aug 30, 2012

    PyMemoryViewObject already is a PyVarObject with its own shape, strides
    and suboffsets. It is the PyManagedBuffer object that directly communicates
    with exporters.

    The decision to store *exactly* the buffer that is obtained from the
    exporter was made in bpo-10181.

    Your test case seems to pass here. :) Was it supposed to crash?

    @abalkin
    Copy link
    Member

    abalkin commented Aug 30, 2012

    Your test case seems to pass here. :) Was it supposed to crash?

    No, I worked real hard to make it pass. :-) I think it would crash 2.7 and 3.2, but I have not tried. I also suspect it leaks memory.

    Do you think this is something that we should include in the test suit. If so, I will probably open a separate issue and transform test_indirect.py into a patch for test_buffer.

    @abalkin
    Copy link
    Member

    abalkin commented Aug 30, 2012

    PyMemoryViewObject already is a PyVarObject with its own shape,
    strides and suboffsets.

    You are right. I was mistakenly looking at 3.2.3 sources. It looks like there are a lot of undocumented changes here.

    @skrah
    Copy link
    Mannequin Author

    skrah mannequin commented Aug 30, 2012

    On second thought, it's impossible to crash a memoryview generated
    by PyMemoryView_FromBuffer(info) even when info->shape etc. point
    to stack addresses.

    The reason is this: All new memoryviews are created through the mbuf_add*
    API. In the first call to mbuf_add_view(mbuf, NULL) the stack addresses
    are still valid, and the private shape, strides and suboffsets of the
    first memoryview that is registered with the managed buffer are
    initialized correctly.

    Now PyMemoryView_FromBuffer() returns and the managed buffer itself
    contains invalid stack addresses. But these are never used again!

    Chained memoryviews are generated from existing memoryviews, and
    they are registered with the managed buffer by calling
    mbuf_add_view(mbuf, existing_view->view).

    mbuf_add_incomplete_view(mbuf, NULL, ndim) does not access shape, strides
    and suboffsets.

    If info->format points to a stack address, it would crash both the
    old and new implementations. I'd regard such a use of info->format
    silly.

    I've never considered this use of PyMemoryView_FromBuffer(info) legitimate,
    but it works in the new implementation. If we officially endorse this
    pattern, then info->format should probably also be copied.

    So the topic is reduced to:

    1. Previous: If non-NULL, steal the view.obj reference with automatic
      decrement in PyBuffer_Release().
      New: If non-NULL, treat view.obj as a borrowed reference.

    2. Copy info->format?

    @abalkin
    Copy link
    Member

    abalkin commented Aug 30, 2012

    I've updated test_indirect.py to print all memoryview fields and cleaned up the _testbuffer patch a little. Note this code that is needed to prevent a memory leak:

    + /* PyMemoryView_FromBuffer ignores info.obj. Add it explicitely. */
    + if (view != NULL)
    + ((PyMemoryViewObject *)view)->view.obj = info.obj;

    Shouldn't PyMemoryView_FromBuffer() copy non-NULL obj from Py_buffer?

    @AlexanderBelopolsky
    Copy link
    Mannequin

    AlexanderBelopolsky mannequin commented Aug 30, 2012

    On Thu, Aug 30, 2012 at 1:22 PM, Stefan Krah <report@bugs.python.org> wrote:

    So the topic is reduced to:

    1. Previous: If non-NULL, steal the view.obj reference with automatic
      decrement in PyBuffer_Release().
      New: If non-NULL, treat view.obj as a borrowed reference.

    2. Copy info->format?

    +1 to both ideas.

    @AlexanderBelopolsky AlexanderBelopolsky mannequin changed the title Improve docs for PyMemoryView_FromBuffer() Improve docs for PyMemoryView_FromBuffer() Aug 30, 2012
    @skrah
    Copy link
    Mannequin Author

    skrah mannequin commented Aug 31, 2012

    Alexander, your test case is brilliant and could eventually go into
    _testbuffer.c, but I'd prefer a simpler test case for now. :)

    Here's a patch that restores the info.obj cleanup facility.

    I intentionally did not add copying the format in order to keep the
    diff small (in case this can go into rc2). Crashing on stack addresses
    for format is not a regression.

    Apparently PyMemoryView_FromBuffer() is used for low level tinkering and
    automatic cleanup, so I now think that it's better not to break backwards
    compatibility, i.e. leave the value of info.obj untouched.

    Otherwise people using the cleanup facility will have reference leaks.

    However, I don't like the interface at all: Passing a pointer to a struct
    and then relying on the fact that *some* values are copied (shape, strides
    and suboffsets) but not others (format, buf) is counterintuitive.

    With the ManagedBuffer, we could now write a new function that returns
    a memoryview with much nicer cleanup facilities. But that doesn't help
    here since we're stuck with this function for 3.3.0.

    What do you think? Should this go into 3.3.0? The changes to memoryobject.c
    are minimal, the rest are tests and docs.

    @skrah skrah mannequin added interpreter-core (Objects, Python, Grammar, and Parser dirs) and removed docs Documentation in the Doc dir labels Aug 31, 2012
    @skrah skrah mannequin changed the title Improve docs for PyMemoryView_FromBuffer() PyMemoryView_FromBuffer() behavior change (possible regression) Aug 31, 2012
    @skrah skrah mannequin assigned skrah and unassigned docspython Aug 31, 2012
    @skrah skrah mannequin added type-bug An unexpected behavior, bug, or error and removed type-feature A feature request or enhancement labels Aug 31, 2012
    @AlexanderBelopolsky
    Copy link
    Mannequin

    AlexanderBelopolsky mannequin commented Aug 31, 2012

    On Fri, Aug 31, 2012 at 2:34 PM, Stefan Krah <report@bugs.python.org> wrote:

    With the ManagedBuffer, we could now write a new function that returns
    a memoryview with much nicer cleanup facilities. But that doesn't help
    here since we're stuck with this function for 3.3.0.

    What do you think? Should this go into 3.3.0?

    I am still getting up to speed with all the changes that went in since
    3.2. I'll review your patch over the weekend. Meanwhile, I think the
    goal should be that after PyMemoryview_FromBuffer(info) is called, it
    should be OK to discard info by calling PyBuffer_Release() and if info
    is not allocated on the stack, Py_DECREF(info). I think we are almost
    there achieving that goal with possible exception of dynamically or
    stack-allocated fmt strings.

    @skrah
    Copy link
    Mannequin Author

    skrah mannequin commented Aug 31, 2012

    Alexander Belopolsky <report@bugs.python.org> wrote:

    I am still getting up to speed with all the changes that went in since
    3.2. I'll review your patch over the weekend. Meanwhile, I think the
    goal should be that after PyMemoryview_FromBuffer(info) is called, it
    should be OK to discard info by calling PyBuffer_Release()

    Now I'm puzzled: I thought your goal was to preserve the implicit cleanup
    from 3.2, i.e. PyBuffer_Release() is called when the managed buffer is
    deallocated.

    Without the patch it's OK to call PyBuffer_Release(info) after
    PyMemoryview_FromBuffer(info). With the patch you can't call
    PyBuffer_Release(info), since it's done automatically already.

    @AlexanderBelopolsky
    Copy link
    Mannequin

    AlexanderBelopolsky mannequin commented Aug 31, 2012

    On Fri, Aug 31, 2012 at 3:12 PM, Stefan Krah <report@bugs.python.org> wrote:

    Now I'm puzzled: I thought your goal was to preserve the implicit cleanup
    from 3.2, i.e. PyBuffer_Release() is called when the managed buffer is
    deallocated.

    The issue that I raised in msg169472 above was that
    PyMemoryView_FromBuffer() would not copy .obj from Py_buffer structure
    to the memoryview. A related issue is that it looks like
    PyObject_GetBuffer() often does not fill .obj either. I would expect
    that PyObject_GetBuffer() would always store a new reference in .obj
    to assure that the .buf pointer remains valid until PyBuffer_Release()
    is called explicitly. (I am ignoring the issue of mutable objects
    such as lists for the moment.) PyMemoryView_FromBuffer() in turn
    should store an additional reference in its own private copy of
    Py_buffer structure. After PyMemoryView_FromBuffer() returns a
    well-behaved program should call PyBuffer_Release() releasing the
    first reference and the second reference should be released in
    memoryview destructor. Am I missing something?

    @abalkin
    Copy link
    Member

    abalkin commented Aug 31, 2012

    Here is what I think the test case should look like (untested):

    static PyObject *
    memoryview_from_buffer_cleanup(PyObject *self, PyObject *noargs)
    {
        PyObject *b, *view = NULL;
        Py_buffer info;
        Py_ssize_t shape[3] = {2, 2, 3};
        Py_ssize_t strides[3] = {6, 3, 1};
        const char *cp = "abcdefghijkl";
        b = PyBytes_FromString(cp);
        if (b == NULL)
            return NULL;
    if (PyObject_GetBuffer(b, &info, PyBUF_FULL_RO) < 0)
        goto done;
    /* reshape */
    info.ndim = 3;
    info.shape = shape;
    info.strides = strides;
    
        view = PyMemoryView_FromBuffer(&info);
        /* release resources allocated for Py_buffer struct 
           before it goes out of scope */
        PyBuffer_Release(&info);
     done:
        Py_DECREF(b);
        return view;
    }

    @skrah
    Copy link
    Mannequin Author

    skrah mannequin commented Sep 1, 2012

    In general, the number of calls to PyObject_GetBuffer() must be equal to
    the number of calls to PyBuffer_Release(), otherwise bad things will happen
    in the same way as with malloc()/free().

    Now, in my test case I omitted the call to PyObject_GetBuffer() *with*
    the internal knowledge that in the case of the bytes object it does
    not matter: The releasebufferproc is NULL, so PyBuffer_Release()
    is just a way of decrementing the reference count of the bytes object.

    No matter how you turn it, if info.obj != NULL you need some internal
    knowledge what the obj's releasebufferproc will do. For example, I
    suspect that many existing releasebufferprocs do not account for
    the fact that a consumer may change shape, strides and suboffsets.

    So you need to know the exporter really well.

    My strategy here is this: I want to keep backwards compatibility
    for PyMemoryView_FromBuffer(), which I think should be deprecated
    at some point.

    A proper function that returns a memoryview with automatic cleanup
    would not involve getbuffer/releasebuffer at all and could look like
    this:

    Flags:
    ------
    PyManagedBuffer_FreeBuf
    PyManagedBuffer_FreeFormat
    PyManagedBuffer_FreeArrays // shape, strides, suboffsets

    PyMemoryView_FromBufferWithCleanup(info, flags)
    {
        /* info should never have been obtained by a call to PyObject_GetBuffer().
           Hence it should never be released. */
        assert(info.obj == NULL);
    /* buf, format, shape, strides and suboffsets MUST stay valid
       for the lifetime of the returned memoryview. Usually they
       will be dynamically allocated using PyMem_Malloc(). This is
       no problem, since there is automatic cleanup in mbuf_dealloc(). */
    
        mbuf = mbuf_alloc();
        mbuf->master = *info;
        mbuf->flags |= flags;
    ...
    
        return new memoryview
    }
    mbuf_dealloc(self)
    {
        if (self->flags&PyManagedBuffer_FreeBuf)
            PyMem_Free(self->master.buf);
        ...
    }

    This is much more flexible than the existing function and does not suffer
    from any abuse of getbuffer/releasebuffer.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    interpreter-core (Objects, Python, Grammar, and Parser dirs) type-bug An unexpected behavior, bug, or error
    Projects
    None yet
    Development

    No branches or pull requests

    1 participant