Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Problems with Py_buffer management in memoryobject.c (and elsewhere?) #54390

Closed
rupole mannequin opened this issue Oct 23, 2010 · 147 comments
Closed

Problems with Py_buffer management in memoryobject.c (and elsewhere?) #54390

rupole mannequin opened this issue Oct 23, 2010 · 147 comments
Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs) type-crash A hard crash of the interpreter, possibly with a core dump

Comments

@rupole
Copy link
Mannequin

rupole mannequin commented Oct 23, 2010

BPO 10181
Nosy @loewis, @pfmoore, @mdickinson, @ncoghlan, @pitrou, @kristjanvalur, @vstinner, @pv, @skrah, @akheron
Files
  • pybuffer.diff
  • buffer-interface-clarify.patch: Clarifying the behavior of the buffer interface (for discussion, probably shouldn't be applied as-is)
  • bbe70ca4e0e5.diff: PyManagedBuffer implementation
  • 718801740bde.diff: PyManagedBuffer implementation Rename README to README.rst and enhance formatting #2
  • 78fb1181787a.diff: Add Modules/_testbuffer.c and Lib/test/test_buffer.py
  • 70a8ccd53ade.diff: Fixed suboffsets representation in _testbuffer.c
  • 51cedae5acfc.diff: PyManagedBuffer, native format, complete 1D slicing, casts
  • 4492afe05a07.diff: Handle objects with an index() method.
  • 8dd9f0ea4876.diff
  • a5c4a96dc981.diff
  • issue10181-news.diff
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2012-03-01.08:32:50.822>
    created_at = <Date 2010-10-23.18:22:37.367>
    labels = ['interpreter-core', 'type-crash']
    title = 'Problems with Py_buffer management in memoryobject.c (and elsewhere?)'
    updated_at = <Date 2012-04-09.08:15:56.593>
    user = 'https://bugs.python.org/rupole'

    bugs.python.org fields:

    activity = <Date 2012-04-09.08:15:56.593>
    actor = 'joshtriplett'
    assignee = 'none'
    closed = True
    closed_date = <Date 2012-03-01.08:32:50.822>
    closer = 'skrah'
    components = ['Interpreter Core']
    creation = <Date 2010-10-23.18:22:37.367>
    creator = 'rupole'
    dependencies = []
    files = ['20282', '20756', '22564', '22577', '22873', '22909', '23119', '23185', '24323', '24381', '24629']
    hgrepos = ['36']
    issue_num = 10181
    keywords = ['patch']
    message_count = 147.0
    messages = ['119460', '120267', '120268', '120270', '120280', '120281', '120297', '120318', '120319', '120321', '120322', '125296', '125297', '125327', '125328', '125332', '125418', '125424', '125431', '125433', '125462', '125499', '125502', '125507', '125531', '125582', '125618', '125630', '125631', '125635', '125640', '125641', '128475', '128477', '128491', '128495', '128499', '128503', '128505', '128506', '128508', '128514', '128516', '128517', '128518', '128525', '128526', '128527', '128528', '128537', '128538', '128539', '128540', '128542', '128544', '128546', '128551', '128878', '139162', '139172', '139173', '139256', '139257', '139266', '139314', '139316', '139317', '139320', '139323', '139325', '139341', '139747', '139748', '139751', '139758', '139760', '139762', '139764', '139766', '139767', '139775', '139826', '139834', '139846', '139856', '139859', '139861', '139865', '139866', '139868', '139870', '139871', '139872', '139873', '139874', '139875', '139877', '139878', '139883', '139919', '139920', '139921', '139932', '139933', '139935', '139936', '141858', '142191', '143727', '143730', '144234', '151360', '151445', '151447', '151537', '151538', '152086', '152090', '152091', '152124', '152145', '152161', '152173', '152174', '152175', '152188', '152196', '152197', '152206', '152208', '152223', '152245', '152404', '153870', '154138', '154141', '154240', '154323', '154327', '154616', '154651', '154665', '154680', '154764', '154937', '157834', '157835']
    nosy_count = 15.0
    nosy_names = ['loewis', 'teoliphant', 'paul.moore', 'mark.dickinson', 'ncoghlan', 'rupole', 'pitrou', 'kristjan.jonsson', 'vstinner', 'joshtriplett', 'pv', 'skrah', 'python-dev', 'jcon', 'petri.lehtinen']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'crash'
    url = 'https://bugs.python.org/issue10181'
    versions = ['Python 3.3']

    @rupole
    Copy link
    Mannequin Author

    rupole mannequin commented Oct 23, 2010

    There are a number of places in memoryobject.c where get_shape0 is used without the return value being checked. If it fails, this leads to hanging exceptions and crashes.

    @rupole rupole mannequin added interpreter-core (Objects, Python, Grammar, and Parser dirs) type-crash A hard crash of the interpreter, possibly with a core dump labels Oct 23, 2010
    @pitrou
    Copy link
    Member

    pitrou commented Nov 2, 2010

    Yes, there are probably many holes like this. I've done what I could to make the simple cases (builtin types) to work, but the spec is rotten from the start. Blame the numpy people, sorry.

    @ncoghlan
    Copy link
    Contributor

    ncoghlan commented Nov 2, 2010

    As far as I know, PEP-3118 serves its purpose of allowing extension modules like numpy and PIL to share data without needing to copy it around all the time. It's just that memoryview wasn't really part of that purpose (since all the affected third party libraries have their own objects for looking at memory), and it shows :P

    Our near term goal should be to get memoryview in a good place for handling 1D contiguous data and throw exceptions (rather than crashing) for everything else. I think Antoine has achieved the former, but it sounds like there is still some work to do on the latter.

    @pitrou
    Copy link
    Member

    pitrou commented Nov 2, 2010

    As far as I know, PEP-3118 serves its purpose of allowing extension
    modules like numpy and PIL to share data without needing to copy it
    around all the time. It's just that memoryview wasn't really part of
    that purpose (since all the affected third party libraries have their
    own objects for looking at memory), and it shows :P

    The deeper issue is that there are no ownership or lifetime rules for
    dynamically allocated fields in Py_buffer (such as strides and shape),
    which itself isn't a PyObject. For very simple objects (such as bytes or
    bytearray) it isn't really a problem, but it is in more complicated
    situations; and there's no possible solution without a clear spec..

    (and I'm not even talking of the issue of making slices of memoryviews,
    since the buffer API itself doesn't handle slicing, which means the
    memoryview object has to modify its inner Py_buffer...)

    @ncoghlan
    Copy link
    Contributor

    ncoghlan commented Nov 2, 2010

    Read the "Releasing the buffer" section in PEP-3118 again. Unless I'm misunderstanding you completely, the rules you're asking for are already in place: those fields are entirely the responsibility of the exporting object, and it needs to ensure they remain valid until the buffer is released.

    Now, it may be that we haven't *implemented* this correctly or consistently in the builtin objects, in which case we should do something about it. But the rules are there.

    @pitrou
    Copy link
    Member

    pitrou commented Nov 2, 2010

    Read the "Releasing the buffer" section in PEP-3118 again. Unless I'm
    misunderstanding you completely, the rules you're asking for are
    already in place: those fields are entirely the responsibility of the
    exporting object, and it needs to ensure they remain valid until the
    buffer is released.

    In practice, though, we copy Py_buffer structs around and there's no way
    for the original object to know about that. That's the fundamental
    difference with a PyObject where you would only increase a refcount
    instead of copying the structure's contents.

    @rupole
    Copy link
    Mannequin Author

    rupole mannequin commented Nov 3, 2010

    The culprit wrt copying Py_buffer structs seems mainly to be dup_buffer, which is called in memory_getbuf. This seems unnecessary in the case where there's an underlying object and it has returned the view thru its own tp_as_buffer. The underlying object at that point is solely responsible for releasing the buffer, so memory_getbuf shouldn't mess with it at all. In the case where there is no underlying object (mainly thru PyMemoryView_FromBuffer), it probably should allocate any memory in the view in such a way that it can be freed in memory_releasebuf when the view->obj is NULL.

    @rupole
    Copy link
    Mannequin Author

    rupole mannequin commented Nov 3, 2010

    While on the subject, the docs for PyMemoryView_FromBuffer state that the resulting memoryview takes ownership of the Py_buffer struct and is responsible for freeing any associated memory. It does not do so, which is not surprising. The absence of a standard for allocation makes it absolutely impossible for it to do so safely.

    @ncoghlan
    Copy link
    Contributor

    ncoghlan commented Nov 3, 2010

    Hmm, if we're ever creating a new copy of a Py_buffer without calling GetBuffer again on the original object, then *that* is a bug (exactly comparable to copying a PyObject pointer without incrementing the reference count - it's OK if you can guarantee the lifecycle of your borrowed pointer is shorter than the lifecycle of the original, but disallowed otherwise).

    This still doesn't sound like a problem with the spec to me, it sounds like a problem with the implementation strategy that was originally used when integrating the spec into the interpeter core (which I readily agree received far less attention than the spec itself did).

    It really sounds like we should be creating a _Py_ManagedBuffer internal object, with each instance containing a PyObject* and a Py_buffer instance. We don't need to alter PEP-3118 to do that - such a change is entirely consumer side, so the objects providing the buffers don't need to care how the lifecycle of the Py_buffer struct is managed.

    When we need another reference to the buffer, we can then just increment the refcount of the _Py_ManagedBuffer instance rather than having to call GetBuffer again on the original object.

    The contents of Py_buffer objects that have been passed to GetBuffer really shouldn't be getting modified at all - for those cases, we should maintain *two* Py_buffer structs, one (unmodified) with the original results of the GetBuffer call, and a second owned by the caller (for manipulation).

    @ncoghlan ncoghlan changed the title get_shape0 in memoryobject.c not checked for error return Problems with Py_buffer management in memoryobject.c (and elsewhere?) Nov 3, 2010
    @ncoghlan
    Copy link
    Contributor

    ncoghlan commented Nov 3, 2010

    It doesn't help that neither PEP-3118 nor the Py_buffer docs mention the "obj" member of the Py_buffer struct that refers back to the original object providing the buffer - that's fairly fundamental to understanding how PyBuffer_Release and PyMemoryView_FromBuffer can work even in theory.

    (Given that, an additional _PyManagedBuffer object shouldn't be necessary - MemoryView just needs to call ReleaseBuffer and GetBuffer at the appropriate times)

    @ncoghlan
    Copy link
    Contributor

    ncoghlan commented Nov 3, 2010

    So the overall to-do list here is sounding like:

    1. Document "Py_buffer->obj" appropriately in both the C API documentation and in PEP-3118

    2. Ensure GetBuffer/ReleaseBuffer are used as the moral equivalent of INCREF/DECREF for Py_buffer objects.

    3. Check builtin types correctly manage the lifecycle of all elements of their exported Py_buffer structs

    4. Check return values and throw exceptions as noted in the original post

    @mdickinson
    Copy link
    Member

    It might be instructive to look at how NumPy itself manages sharing of ndarray data and ownership of the corresponding structs. I meant to find time to look at this over the break.

    @mdickinson
    Copy link
    Member

    BTW, I agree that it looks as though significant changes might be needed. I wonder whether it would make sense to exclude the Py_buffer struct fro m the Stable ABI PEP for now.

    @ncoghlan
    Copy link
    Contributor

    ncoghlan commented Jan 4, 2011

    Agreed. I'll put something on python-dev about that, so MvL sees it.

    @ncoghlan
    Copy link
    Contributor

    ncoghlan commented Jan 4, 2011

    More direct - added MvL to nosy list.

    Martin, we would like to exclude Py_buffer from the stable ABI for Python 3.2, until we have a chance to thrash out the missing pieces of the documentation for 3.3. I *think* it is a documentation problem, but until we're certain, it seems safer to leave it out.

    @ncoghlan
    Copy link
    Contributor

    ncoghlan commented Jan 4, 2011

    From Lenard Lindstrom in bpo-10001 (closed as dupe of this issue):

    The ~Py_buffer.obj field is undocumented. Yet memoryview, that acts as a wrapper, includes the field in gc traversal. Also, if the field is not initialized by bf_getbuffer its value can be indeterminate. For memoryview the gc can crash (see attached C demo program).

    @ncoghlan
    Copy link
    Contributor

    ncoghlan commented Jan 5, 2011

    Added Travis to nosy list - even if he doesn't have time to finish this off himself, hopefully he can point us in the right direction.

    @mdickinson
    Copy link
    Member

    Nick, it sounds as though you have an idea of how you think things should be working here---perhaps you can help me out. I'd really like to understand what the implementation should look like from the POV of a 3rd party module that defines some object exporting the buffer interface.

    Here's a specific scenario I'd like to understand: module foo defines a type Foo that implements the buffer protocol. For simplicity, suppose it's exporting 1-dim buffers. When I do:

    >> from foo import Foo
    >> foo_object = Foo()
    >> m = memoryview(foo_object)
    >> n = m[::2] # take a slice of m
    >> del m # delete the objects, in whichever order.
    >> del n

    what's the sequence of getbuffer and releasebuffer calls that foo_object should expect to see?

    Q1. Does foo get 2 requests to getbuffer (and 2 to releasebuffer), or just one each? I'm assuming at least that getbuffer and releasebuffer calls should be paired.

    Q2. For each pair of getbuffer/releasebuffer calls, should the 'view' parameter passed into releasebuffer be identical to that provided to getbuffer? Or is it acceptable for the actual Py_buffer* pointers to be distinct, but the pointed-to Py_buffers to be exact copies. (The existence of the smalltable field complicates the notion of an exact copy a little bit, but I think that's a detail that can be ignored for these questions.)

    @ncoghlan
    Copy link
    Contributor

    ncoghlan commented Jan 5, 2011

    PEP-3118 makes it clear that the underlying object should see *two* pairs of calls to the buffer methods:
    http://www.python.org/dev/peps/pep-3118/#the-py-buffer-struct

    Even if we ignore the undocumented "obj" field, the target object needs to ensure the exported buffer remains valid as long as any memory views are referencing it. The only way to do that is to treat GetBuffer/ReleaseBuffer as the moral equivalent of INCREF/DECREF.

    However, I believe the current memoryview implementation does the wrong thing and only calls them once, and then duplicates the Py_buffer structures without ever going back to the original objects (that opinion was based on a quick scan of the code a while back, but it would fit with the uncomplimentary sentiments Antoine has expressed in trying to get all this to exhibit some kind of consistency)

    For point 2, it must be the same pointer. When the PEP says "the same", I agree it could be taken as ambiguous, but the later reference to the exporter managing a linked-list of exported views makes it clear that identity is what matters.

    As far as I can see, some of things in the PEP were found to be a PITA in practice (such as every consumer of the API having to implement the equivalent of the "base" attribute in the original memoryview design), so Travis changed them. Unfortunately, those changes never made it back into the protocol documentation, leading to the current confusion.

    @pitrou
    Copy link
    Member

    pitrou commented Jan 5, 2011

    However, I believe the current memoryview implementation does the
    wrong thing and only calls them once, and then duplicates the
    Py_buffer structures without ever going back to the original objects
    (that opinion was based on a quick scan of the code a while back, but
    it would fit with the uncomplimentary sentiments Antoine has expressed
    in trying to get all this to exhibit some kind of consistency)

    Actually, and unless I made a mistake, it does call them twice.
    (and does the slicing by hand afterwards, which explains part of the
    hilarity with smalltable and friends :-))

    For point 2, it must be the same pointer. When the PEP says "the
    same", I agree it could be taken as ambiguous, but the later reference
    to the exporter managing a linked-list of exported views makes it
    clear that identity is what matters.

    The common idiom (including in code not written by me :-)) is currently
    to use Py_buffer variables allocated on the C stack.

    Also, we have the C API function PyMemoryView_FromBuffer() which
    basically mandates that Py_buffer structs can be copied around. And it's
    a very useful function since it allows to create a memoryview from a
    chunk of anonymous memory.

    @loewis
    Copy link
    Mannequin

    loewis mannequin commented Jan 5, 2011

    Martin, we would like to exclude Py_buffer from the stable ABI for
    Python 3.2, until we have a chance to thrash out the missing pieces
    of the documentation for 3.3. I *think* it is a documentation
    problem, but until we're certain, it seems safer to leave it out.

    Fine with me. Attached is a patch; it would be good if someone could
    confirm that this exactly comprises the API that should be hidden.

    @ncoghlan
    Copy link
    Contributor

    ncoghlan commented Jan 6, 2011

    It's OK if the Py_buffer is on the stack - it's just a unique identifier for the exporter to use as a key, not something the exporter controls the lifecycle of (the latter is true only for the pointers *inside* the struct, such as buf, shape, strides, etc).

    PyMemoryView_FromBuffer should be calling PyObject_Getbuffer on the view->obj member (it's one of the things that embedding the reference allows, just as it allowed removal of the separate obj argument from the PyObject_ReleaseBuffer signature). That way the source object knows there is now a *second* Py_buffer struct kicking around, and can decide whether to re-use the same internal pointers or create new ones.

    @ncoghlan
    Copy link
    Contributor

    ncoghlan commented Jan 6, 2011

    Martin's patch for the PEP-384 adjustments looks good to me.

    A note in the PEP that we've deliberately excluded this on a temporary basis due to the implementation/documentation mismatch and expect to have it back in for 3.3 might be helpful, too.

    @pitrou
    Copy link
    Member

    pitrou commented Jan 6, 2011

    PyMemoryView_FromBuffer should be calling PyObject_Getbuffer on the
    view->obj member (it's one of the things that embedding the reference
    allows, just as it allowed removal of the separate obj argument from
    the PyObject_ReleaseBuffer signature).

    One use case is anonymous memory, i.e. view->obj == NULL.

    @mdickinson
    Copy link
    Member

    [Nick]

    For point 2, it must be the same pointer. When the PEP says "the same", > I agree it could be taken as ambiguous, but the later reference to the
    exporter managing a linked-list of exported views makes it clear that
    identity is what matters.

    Okay, thanks. Next point of confusion: does that mean that in the example I gave, the object 'n' (the memoryview slice) needs to know about *two* distinct Py_buffer structs---one to pass back to releasebuffer and one to store its own information? It seems to me that these can't be the same, since (a) the Py_buffer for the slice will have different shape and stride from that returned by getbuffer, (b) what's passed back to releasebuffer should be identical to what came from getbuffer, so we can't just modify the shape and stride information in place.

    @loewis
    Copy link
    Mannequin

    loewis mannequin commented Jan 6, 2011

    Removal of the buffer interface from the ABI has been committed as r87805, r87806, r87805, r87808, and r87809.

    @skrah
    Copy link
    Mannequin

    skrah mannequin commented Jan 27, 2012

    I've added some comments on Rietveld. This time, I pasted the email
    addresses manually into the CC field, apparently without success
    (I didn't receive mail).

    Regarding the use of _testbuffer in the docs: I agree that it's strange,
    on the other hand it illustrates several examples much better (length of
    a multi-dimensional view, result of cast to ND, etc).

    So I'm not sure what to do. Of course I'll take out the examples if you
    really don't like it.

    @ncoghlan
    Copy link
    Contributor

    Radical suggestion: make it public as collections.simple_ndarray?

    (prefixing with simple_ to be explicit that this is not even *close* to being the all-singing, all-dancing NumPy.ndarray)

    @skrah
    Copy link
    Mannequin

    skrah mannequin commented Jan 28, 2012

    I've been busy fixing the memoryview.h issues first. Could you take a
    look at:

    http://hg.python.org/features/pep-3118/file/f020716704d4/Include/memoryobject.h

    Changes:

    a) Make all functions and the two buffer access macros part
    of the limited API again.

    b) Make the managed buffer API definitely private.

    c) Make PyBUF_MAX_NDIM=64 part of the official buffer API.

    I think it might be OK to defer the decision about Py_MEMORYVIEW_C etc.,
    since the comment already says "... Don't access their fields directly.".

    @skrah
    Copy link
    Mannequin

    skrah mannequin commented Jan 28, 2012

    Nick Coghlan <report@bugs.python.org> wrote:

    Radical suggestion: make it public as collections.simple_ndarray?

    Heh, that's a nice one. :)

    While I think that the code in _testbuffer.c is sound and very well tested,
    the API is low-level and optimized for testing all sorts of quirks.

    Examples: _testbuffer.ndarray has the questionable capability of changing
    views while buffers are exported. This is to test the pathological case
    that came up in the discussion.

    Then, similar to NumPy's ndarray, _testbuffer.ndarray constructs arrays
    from a flat list. This is a bit clumsy but quite suitable for testing.

    NumPy's ndarray is also the low level API. I think most people use
    the high level array:

    >>> a = array([[1,2], [3,4]], dtype="L")
    >>> a
    array([[1, 2],
           [3, 4]], dtype=uint64)

    So it would take some effort to polish the API.

    Meanwhile, to eliminate the use of _testbuffer in the documentation, I think
    the following might work:

    I've just added complete support for displaying multi-dimensional lists and
    multi-dimensional comparisons to memoryobject.c in my private repo.

    The code is quite succinct and follows exactly the same pattern as
    copy_base/copy_rec.

    Then, in the documentation we can use cast() and memoryview.tolist(),
    but add a comment that cast() is just used for demonstration purposes.

    @pitrou
    Copy link
    Member

    pitrou commented Jan 28, 2012

    Nick Coghlan <report@bugs.python.org> wrote:
    > Radical suggestion: make it public as collections.simple_ndarray?

    Heh, that's a nice one. :)

    Well, the question is: does it have a point? Is this ndarray useful
    outside of any third-party infrastructure such as the APIs offered by
    numpy?

    You might answer that we already have array.array but ISTM that
    array.array is quite useless in general :)

    @skrah
    Copy link
    Mannequin

    skrah mannequin commented Jan 28, 2012

    Antoine Pitrou <report@bugs.python.org> wrote:

    > Nick Coghlan <report@bugs.python.org> wrote:
    > > Radical suggestion: make it public as collections.simple_ndarray?
    >
    > Heh, that's a nice one. :)

    Well, the question is: does it have a point? Is this ndarray useful
    outside of any third-party infrastructure such as the APIs offered by
    numpy?

    I think it would be mainly educational. It's easier to understand
    the whole (multi-dimensional) point of PEP-3118 when there's a
    concrete object that you can play with. Of course serious users
    would go straight to NumPy.

    The other issue is that's it's slightly strange to have a fully
    featured memoryview with no object in the stdlib that can utilize
    all features (at least the testing side is now complete).

    Anyway, right now I don't know myself if I'm for or against it. :)

    @pitrou
    Copy link
    Member

    pitrou commented Jan 28, 2012

    a) Make all functions and the two buffer access macros part
    of the limited API again.

    Hmm, I don't think buffer access macros should be part of the limited
    API. If they are truely important (which I doubt), we should have
    equivalent functions for them.

    I think it might be OK to defer the decision about Py_MEMORYVIEW_C etc.,
    since the comment already says "... Don't access their fields directly.".

    My question is whether there is any point in making these flags. Does
    3rd-party code want to manipulate memoryview internals, instead of
    querying the Py_buffer?

    (and of course the memoryview object is becoming more and more like
    another Py_buffer :-))

    @skrah
    Copy link
    Mannequin

    skrah mannequin commented Jan 28, 2012

    Antoine Pitrou <report@bugs.python.org> wrote:

    > a) Make all functions and the two buffer access macros part
    > of the limited API again.

    Hmm, I don't think buffer access macros should be part of the limited
    API. If they are truely important (which I doubt), we should have
    equivalent functions for them.

    I thought the whole Py_buffer API was only temporarily removed from the
    limited API until the issues were sorted out:

    http://bugs.python.org/issue10181#msg125462

    For instance, here the macros are not excluded:

    http://hg.python.org/cpython/file/3292cc222d5c/Include/memoryobject.h

    Since the issues seem resolved in general, I thought it was time to
    restore the previous state. [I just noticed that I didn't revert all
    of Martin's changes, so xxlimited currently does not build in the
    pep-3118 repo.]

    As for the two macros specifically, I know Pauli was using them:

    http://bugs.python.org/issue10181#msg139775

    > I think it might be OK to defer the decision about Py_MEMORYVIEW_C etc.,
    > since the comment already says "... Don't access their fields directly.".

    My question is whether there is any point in making these flags. Does
    3rd-party code want to manipulate memoryview internals, instead of
    querying the Py_buffer?

    The flags are primarily for the memoryview itself to avoid repeated calls
    to PyBuffer_IsContiguous(). So when 3rd-party uses PyMemoryView_GET_BUFFER
    to get the view and also needs to determine the contiguity type, that
    code could also use the flags.

    Pauli: If you are still following the issue, do you think having access
    to contiguity flags is useful for 3rd-party code or not?

    @pitrou
    Copy link
    Member

    pitrou commented Jan 28, 2012

    Antoine Pitrou <report@bugs.python.org> wrote:
    > > a) Make all functions and the two buffer access macros part
    > > of the limited API again.
    >
    > Hmm, I don't think buffer access macros should be part of the limited
    > API. If they are truely important (which I doubt), we should have
    > equivalent functions for them.

    I thought the whole Py_buffer API was only temporarily removed from the
    limited API until the issues were sorted out:

    http://bugs.python.org/issue10181#msg125462

    I'm talking about the memoryview access macros. It's like
    PyList_GET_SIZE, it isn't part of the limited API and you have to use
    PyList_GetSize instead.

    The limited API promises binary (ABI) compatibility accross releases.
    It's a very strong promise to make and we must be careful not to put too
    much stuff in the limited API. I don't think the rule for inclusion is
    "we should put everything useful in the limited API".

    For instance, here the macros are not excluded:

    http://hg.python.org/cpython/file/3292cc222d5c/Include/memoryobject.h

    It might be a mistake, then.

    As for the two macros specifically, I know Pauli was using them:

    http://bugs.python.org/issue10181#msg139775

    Well, Pauli might use them, but it just means his code isn't compatible
    with the limited API, then.

    The flags are primarily for the memoryview itself to avoid repeated calls
    to PyBuffer_IsContiguous(). So when 3rd-party uses PyMemoryView_GET_BUFFER
    to get the view and also needs to determine the contiguity type, that
    code could also use the flags.

    But why would 3rd-party code use PyMemoryView_GET_BUFFER instead of, for
    example, PyObject_GetBuffer? You seldom have APIs which *expect* a
    memoryview, I think. Instead, they would expect a buffer-compatible
    object.

    @ncoghlan
    Copy link
    Contributor

    I'm with Antoine here - we want to be *very* conservative with what we expose through the limited API. Due to the ABI compatibility promise, anything exposed that way is very hard to change.

    Keeping things out of the limited API isn't really an issue - it just means that anyone that wants to use them needs to rebuild their extensions for each new version of the C API (just as they do now).

    Addings things to the stable ABI later is easy though - that's why we had Martin take all the Py_buffer related stuff out of the initial version of the limited API.

    @ncoghlan
    Copy link
    Contributor

    Also, +1 to replacing _testbuffer with .cast() to demonstrate the multi-dimensional support. Without an actual multi-dimensional array object in the standard library, demonstrating those aspects is always going to be a little tricky. However, it's worth doing so that the likes of NumPy and PIL don't need to explain those interactions.

    @skrah
    Copy link
    Mannequin

    skrah mannequin commented Jan 29, 2012

    Antoine Pitrou <report@bugs.python.org> wrote:

    > I thought the whole Py_buffer API was only temporarily removed from the
    > limited API until the issues were sorted out:
    >
    > http://bugs.python.org/issue10181#msg125462

    I'm talking about the memoryview access macros. It's like
    PyList_GET_SIZE, it isn't part of the limited API and you have to use
    PyList_GetSize instead.

    You're right: I presumed that the macros were excluded temporarily
    when in fact that had already happened in 62b61abd02b8.

    > The flags are primarily for the memoryview itself to avoid repeated calls
    > to PyBuffer_IsContiguous(). So when 3rd-party uses PyMemoryView_GET_BUFFER
    > to get the view and also needs to determine the contiguity type, that
    > code could also use the flags.

    But why would 3rd-party code use PyMemoryView_GET_BUFFER instead of, for
    example, PyObject_GetBuffer? You seldom have APIs which *expect* a
    memoryview, I think. Instead, they would expect a buffer-compatible
    object.

    That's a good question. It looks to me that the macro was present as
    PyMemoryView() initially. You renamed it in bpo-3560 (with Guido's approval),
    and later PyMemoryView_GET_BUFFER appeared in the docs.

    I think 3rd-party code uses the macros mainly because they are
    present and, in the case of PyMemoryView_GET_BUFFER, documented.
    In most situations PyObject_GetBuffer() could be used indeed.

    Most use cases I can think of would also involve having access to
    the managed buffer API. As an example, here's a technique that is
    similar to what goes on in PyMemoryView_GetContiguous():

    Suppose you have an initialized bytes object that you want to
    wrap as a multi-dimensional exporter. Then:

    • Base the memoryview on the bytes object and keep exactly one
      reference to it.

    • Adjust the shape, strides etc. to get the structure you want.

    • Return the view: You now have a fully compliant exporter that
      only needs a single Py_DECREF() to disappear and do all cleanup.

    Of course this could also be exposed as a function, e.g.:

       /* stealing a reference to bytes */
       PyMemoryView_FromBytesAndInfo(PyObject *bytes, Py_buffer *info);

    So let's make the flags private. What do you prefer?

    1. Leave them in memoryview.h, but with a leading underscore:

      _Py_MEMORYVIEW_C
      [...]

    2. Move them to memoryobject.c, with a leading underscore.

    3. Move them to memoryobject.c, without a leading underscore (I find
      this more readable).

    4. Move them to memoryobject.c as MV_C, MV_FORTRAN, etc.

    Also, I'll add a note to the docs that PyMemoryView_GET_BUFFER can
    probably be avoided by using PyObject_GetBuffer() directly.

    @pitrou
    Copy link
    Member

    pitrou commented Jan 29, 2012

    Of course this could also be exposed as a function, e.g.:

    /* stealing a reference to bytes */
    PyMemoryView_FromBytesAndInfo(PyObject *bytes, Py_buffer *info);

    I think we should minimize the number of reference-stealing functions.

    So let's make the flags private. What do you prefer?

    I don't really mind, whatever you think is best :)

    @skrah
    Copy link
    Mannequin

    skrah mannequin commented Jan 31, 2012

    I've uploaded a new patch that should address the remaining issues:

    o In the documentation _testbuffer has been replaced by
    m.cast() + the now multi-dimensional m.tolist().

    o I restored the state of the limited API. If we want
    to include Py_buffer again, I think this should be done
    in a separate patch.

    o Flags of the memoryview object are private.

    Additionally, because NumPy allows non-aligned array accesses,
    I changed the packing functions to use memcpy for multi-byte types.

    On x86/amd64 gcc is smart enough to produce almost exactly the same
    asm output as before, with a slowdown of 0-1%, depending on the
    benchmark.

    On other platforms the situation might be worse, but I don't have
    access to real hardware where alignment actually matters.

    @ncoghlan
    Copy link
    Contributor

    Latest version looks good to me - I vote for landing it whenever you're ready :)

    @skrah
    Copy link
    Mannequin

    skrah mannequin commented Feb 24, 2012

    Thanks, Nick. I'll try to get it done this weekend.

    I've uploaded Misc/NEWS and Doc/whatsnew/3.3.rst (my apologies to Antoine
    for plagiarizing the first sentence, I found it hard to come up with a
    better version).

    I wasn't sure whether to put the "whatsnew" entry into the section
    "Other Language Changes" or into the PEP section. I chose the latter,
    since many new features have been added that are part of the PEP.

    @ncoghlan
    Copy link
    Contributor

    PEP section makes sense - I plan to mark PEP-3118 as Final once you commit this (or you can do that yourself, for that matter).

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Feb 25, 2012

    New changeset 3f9b3b6f7ff0 by Stefan Krah in branch 'default':

    @skrah
    Copy link
    Mannequin

    skrah mannequin commented Feb 26, 2012

    Nick Coghlan <report@bugs.python.org> wrote:

    PEP section makes sense - I plan to mark PEP-3118 as Final once you commit
    this (or you can do that yourself, for that matter).

    Array features are complete except for multi-dimensional indexing and slicing.
    I think it would be nice to add those in a separate issue; it's not a lot of
    additional code and it doesn't interfere with the current code.

    The overall array handling scheme is sound. Life would be a bit easier
    if contiguity flags were a mandatory part of the Py_buffer structure
    that the exporter has to fill in.

    Then there is an open issue (bpo-3132) for expanding the struct module syntax,
    where the wording in some sections of the PEP led to a bit of head-scratching. :)

    In another issue (bpo-13072) the question came up whether the proposed 'u' and 'w'
    formats still make sense after PEP-393 (I think they do, they should map to
    UCS-2 and UCS-4).

    We need to decide what to do about 2.7 and 3.2. It's pretty difficult by
    now to separate the bug fixes from the features. I could follow the example
    of CPU manufacturers: start with the whole feature set and disable as much
    as possible.

    Another problem for 2.7 and 3.2 is that the 'B' format would still need to
    accept bytes instead of ints. Or can we change that as a bug fix?

    @ncoghlan
    Copy link
    Contributor

    Aw, I guess I was too optimistic in thinking this was the last gap before we could declare it Final...

    +1 on creating a new feature request for NumPy style multi-dimensional indexing and slicing support on memoryview (I'm assuming that's what you meant).

    As far as your last question goes, I'm not sure we *can* salvage this in 2.7/3.2. We do have the option of throwing our hands up in the air and saying "Sorry, we couldn't figure out how to fix it without completely rewriting it. Take a look at 3.3 if you want to see how it's *supposed* to work."

    I hesitate to suggest this (since I'm not volunteering to do the work) but perhaps it would be worth packaging up the 3.3. memoryview and publishing it on PyPI for the benefit of 3.2 and 2.7?

    May also be worth bringing up on python-dev to see if anyone else has any bright ideas. Myself, I'm not seeing any real choices beyond "won't fix", "backport 3.3 version and remove/disable the new features", "backport 3.3 version, new features and all" and "publish a full featured version on PyPI".

    @skrah
    Copy link
    Mannequin

    skrah mannequin commented Feb 29, 2012

    I'm busy adding the C-API changes to the docs. Regarding the stable ABI:

    The general mood was to *keep* the removal of the buffer interface
    for some time, did I get that right?

    In that case this removal (especially of the Py_buffer struct) should
    be documented also for 3.2.

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Feb 29, 2012

    New changeset 9d3c7dd55c7f by Stefan Krah in branch 'default':
    Issue bpo-10181: Add warning that structure layouts in memoryobject.h and
    http://hg.python.org/cpython/rev/9d3c7dd55c7f

    @ncoghlan
    Copy link
    Contributor

    For 3.2, there's no removal to document - we asked MvL to drop the buffer APIs from PEP-384, so they've never been part of the stable ABI.

    From the PEP: "The buffer interface (type Py_buffer, type slots bf_getbuffer and bf_releasebuffer, etc) has been omitted from the ABI, since the stability of the Py_buffer structure is not clear at this time. Inclusion in the ABI can be considered in future releases."

    I agree we shouldn't be hasty in making even the updated buffer interface implementation officially part of the stable ABI. Better to leave it out for 3.3, get some real world feedback on the latest version, then commit to stability for 3.4+.

    @skrah
    Copy link
    Mannequin

    skrah mannequin commented Mar 1, 2012

    From the PEP: "The buffer interface (type Py_buffer, type slots bf_getbuffer
    and bf_releasebuffer, etc) has been omitted from the ABI, since the stability
    of the Py_buffer structure is not clear at this time. Inclusion in the ABI
    can be considered in future releases."

    Great, that's exactly what I was looking for. - As far as I can see the issue
    is finished, so I'm closing it. Thanks again everyone for all the help!

    @skrah skrah mannequin closed this as completed Mar 1, 2012
    @ncoghlan
    Copy link
    Contributor

    ncoghlan commented Mar 2, 2012

    It occurs to me that one thing that *could* be backported to early versions are some of the documentation improvements on how to correctly handle the lifecycle of fields in Py_buffer. That gets messy though because memoryview doesn't behave nicely in those versions, so we'd be violating our own guidelines.

    Perhaps the relevant sections should be updated with a reference saying that the semantics have been cleaned up in 3.3, and if in doubt, try to follow the 3.3 documentation?

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Mar 5, 2012

    New changeset 373f6cdc6d08 by Stefan Krah in branch 'default':
    Issue bpo-10181: The decision was to raise a buffer error in memory_exit()
    http://hg.python.org/cpython/rev/373f6cdc6d08

    @joshtriplett
    Copy link
    Mannequin

    joshtriplett mannequin commented Apr 9, 2012

    I currently use Python 2.7, and I'd like to make use of memoryview. Specifically, I work on BITS (http://biosbits.org/), which runs Python in ring 0 as part of GRUB, and I'd like to use memoryview to give Python access to data in physical memory. I ran into several of the problems documented here when trying to do so. I'd really love to see a backport of this fixed version into 2.7.

    @joshtriplett
    Copy link
    Mannequin

    joshtriplett mannequin commented Apr 9, 2012

    I currently use Python 2.7, and I'd like to make use of memoryview. Specifically, I work on BITS (http://biosbits.org/), which runs Python in ring 0 as part of GRUB, and I'd like to use memoryview to give Python access to data in physical memory. I ran into several of the problems documented here when trying to do so. I'd really love to see a backport of this fixed version into 2.7.

    More specifically, the primary functionality that I'd like to use exists in 3.3 as PyMemoryView_FromMemory. I've tried to approximate that function using the available API in 2.7, but that led me here.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    interpreter-core (Objects, Python, Grammar, and Parser dirs) type-crash A hard crash of the interpreter, possibly with a core dump
    Projects
    None yet
    Development

    No branches or pull requests

    3 participants