Problems with Py_buffer management in memoryobject.c (and elsewhere?) #54390

rupole · 2010-10-23T18:22:37Z

BPO	10181
Nosy	@loewis, @pfmoore, @mdickinson, @ncoghlan, @pitrou, @kristjanvalur, @vstinner, @pv, @skrah, @akheron
Files	pybuffer.diff buffer-interface-clarify.patch: Clarifying the behavior of the buffer interface (for discussion, probably shouldn't be applied as-is) bbe70ca4e0e5.diff: PyManagedBuffer implementation 718801740bde.diff: PyManagedBuffer implementation Rename README to README.rst and enhance formatting #2 78fb1181787a.diff: Add Modules/_testbuffer.c and Lib/test/test_buffer.py 70a8ccd53ade.diff: Fixed suboffsets representation in _testbuffer.c 51cedae5acfc.diff: PyManagedBuffer, native format, complete 1D slicing, casts 4492afe05a07.diff: Handle objects with an index() method. 8dd9f0ea4876.diff a5c4a96dc981.diff issue10181-news.diff

^{Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.}

Show more details

GitHub fields:

assignee = None
closed_at = <Date 2012-03-01.08:32:50.822>
created_at = <Date 2010-10-23.18:22:37.367>
labels = ['interpreter-core', 'type-crash']
title = 'Problems with Py_buffer management in memoryobject.c (and elsewhere?)'
updated_at = <Date 2012-04-09.08:15:56.593>
user = 'https://bugs.python.org/rupole'

bugs.python.org fields:

activity = <Date 2012-04-09.08:15:56.593>
actor = 'joshtriplett'
assignee = 'none'
closed = True
closed_date = <Date 2012-03-01.08:32:50.822>
closer = 'skrah'
components = ['Interpreter Core']
creation = <Date 2010-10-23.18:22:37.367>
creator = 'rupole'
dependencies = []
files = ['20282', '20756', '22564', '22577', '22873', '22909', '23119', '23185', '24323', '24381', '24629']
hgrepos = ['36']
issue_num = 10181
keywords = ['patch']
message_count = 147.0
messages = ['119460', '120267', '120268', '120270', '120280', '120281', '120297', '120318', '120319', '120321', '120322', '125296', '125297', '125327', '125328', '125332', '125418', '125424', '125431', '125433', '125462', '125499', '125502', '125507', '125531', '125582', '125618', '125630', '125631', '125635', '125640', '125641', '128475', '128477', '128491', '128495', '128499', '128503', '128505', '128506', '128508', '128514', '128516', '128517', '128518', '128525', '128526', '128527', '128528', '128537', '128538', '128539', '128540', '128542', '128544', '128546', '128551', '128878', '139162', '139172', '139173', '139256', '139257', '139266', '139314', '139316', '139317', '139320', '139323', '139325', '139341', '139747', '139748', '139751', '139758', '139760', '139762', '139764', '139766', '139767', '139775', '139826', '139834', '139846', '139856', '139859', '139861', '139865', '139866', '139868', '139870', '139871', '139872', '139873', '139874', '139875', '139877', '139878', '139883', '139919', '139920', '139921', '139932', '139933', '139935', '139936', '141858', '142191', '143727', '143730', '144234', '151360', '151445', '151447', '151537', '151538', '152086', '152090', '152091', '152124', '152145', '152161', '152173', '152174', '152175', '152188', '152196', '152197', '152206', '152208', '152223', '152245', '152404', '153870', '154138', '154141', '154240', '154323', '154327', '154616', '154651', '154665', '154680', '154764', '154937', '157834', '157835']
nosy_count = 15.0
nosy_names = ['loewis', 'teoliphant', 'paul.moore', 'mark.dickinson', 'ncoghlan', 'rupole', 'pitrou', 'kristjan.jonsson', 'vstinner', 'joshtriplett', 'pv', 'skrah', 'python-dev', 'jcon', 'petri.lehtinen']
pr_nums = []
priority = 'normal'
resolution = 'fixed'
stage = 'resolved'
status = 'closed'
superseder = None
type = 'crash'
url = 'https://bugs.python.org/issue10181'
versions = ['Python 3.3']

rupole · 2010-10-23T18:22:37Z

There are a number of places in memoryobject.c where get_shape0 is used without the return value being checked. If it fails, this leads to hanging exceptions and crashes.

pitrou · 2010-11-02T22:16:19Z

Yes, there are probably many holes like this. I've done what I could to make the simple cases (builtin types) to work, but the spec is rotten from the start. Blame the numpy people, sorry.

ncoghlan · 2010-11-02T22:30:01Z

As far as I know, PEP-3118 serves its purpose of allowing extension modules like numpy and PIL to share data without needing to copy it around all the time. It's just that memoryview wasn't really part of that purpose (since all the affected third party libraries have their own objects for looking at memory), and it shows :P

Our near term goal should be to get memoryview in a good place for handling 1D contiguous data and throw exceptions (rather than crashing) for everything else. I think Antoine has achieved the former, but it sounds like there is still some work to do on the latter.

pitrou · 2010-11-02T22:34:57Z

As far as I know, PEP-3118 serves its purpose of allowing extension
modules like numpy and PIL to share data without needing to copy it
around all the time. It's just that memoryview wasn't really part of
that purpose (since all the affected third party libraries have their
own objects for looking at memory), and it shows :P

The deeper issue is that there are no ownership or lifetime rules for
dynamically allocated fields in Py_buffer (such as strides and shape),
which itself isn't a PyObject. For very simple objects (such as bytes or
bytearray) it isn't really a problem, but it is in more complicated
situations; and there's no possible solution without a clear spec..

(and I'm not even talking of the issue of making slices of memoryviews,
since the buffer API itself doesn't handle slicing, which means the
memoryview object has to modify its inner Py_buffer...)

ncoghlan · 2010-11-02T22:57:06Z

Read the "Releasing the buffer" section in PEP-3118 again. Unless I'm misunderstanding you completely, the rules you're asking for are already in place: those fields are entirely the responsibility of the exporting object, and it needs to ensure they remain valid until the buffer is released.

Now, it may be that we haven't *implemented* this correctly or consistently in the builtin objects, in which case we should do something about it. But the rules are there.

pitrou · 2010-11-02T23:01:12Z

Read the "Releasing the buffer" section in PEP-3118 again. Unless I'm
misunderstanding you completely, the rules you're asking for are
already in place: those fields are entirely the responsibility of the
exporting object, and it needs to ensure they remain valid until the
buffer is released.

In practice, though, we copy Py_buffer structs around and there's no way
for the original object to know about that. That's the fundamental
difference with a PyObject where you would only increase a refcount
instead of copying the structure's contents.

rupole · 2010-11-03T01:17:30Z

The culprit wrt copying Py_buffer structs seems mainly to be dup_buffer, which is called in memory_getbuf. This seems unnecessary in the case where there's an underlying object and it has returned the view thru its own tp_as_buffer. The underlying object at that point is solely responsible for releasing the buffer, so memory_getbuf shouldn't mess with it at all. In the case where there is no underlying object (mainly thru PyMemoryView_FromBuffer), it probably should allocate any memory in the view in such a way that it can be freed in memory_releasebuf when the view->obj is NULL.

rupole · 2010-11-03T11:59:53Z

While on the subject, the docs for PyMemoryView_FromBuffer state that the resulting memoryview takes ownership of the Py_buffer struct and is responsible for freeing any associated memory. It does not do so, which is not surprising. The absence of a standard for allocation makes it absolutely impossible for it to do so safely.

ncoghlan · 2010-11-03T12:19:52Z

Hmm, if we're ever creating a new copy of a Py_buffer without calling GetBuffer again on the original object, then *that* is a bug (exactly comparable to copying a PyObject pointer without incrementing the reference count - it's OK if you can guarantee the lifecycle of your borrowed pointer is shorter than the lifecycle of the original, but disallowed otherwise).

This still doesn't sound like a problem with the spec to me, it sounds like a problem with the implementation strategy that was originally used when integrating the spec into the interpeter core (which I readily agree received far less attention than the spec itself did).

It really sounds like we should be creating a _Py_ManagedBuffer internal object, with each instance containing a PyObject* and a Py_buffer instance. We don't need to alter PEP-3118 to do that - such a change is entirely consumer side, so the objects providing the buffers don't need to care how the lifecycle of the Py_buffer struct is managed.

When we need another reference to the buffer, we can then just increment the refcount of the _Py_ManagedBuffer instance rather than having to call GetBuffer again on the original object.

The contents of Py_buffer objects that have been passed to GetBuffer really shouldn't be getting modified at all - for those cases, we should maintain *two* Py_buffer structs, one (unmodified) with the original results of the GetBuffer call, and a second owned by the caller (for manipulation).

ncoghlan · 2010-11-03T12:51:51Z

It doesn't help that neither PEP-3118 nor the Py_buffer docs mention the "obj" member of the Py_buffer struct that refers back to the original object providing the buffer - that's fairly fundamental to understanding how PyBuffer_Release and PyMemoryView_FromBuffer can work even in theory.

(Given that, an additional _PyManagedBuffer object shouldn't be necessary - MemoryView just needs to call ReleaseBuffer and GetBuffer at the appropriate times)

ncoghlan · 2010-11-03T13:00:32Z

So the overall to-do list here is sounding like:

Document "Py_buffer->obj" appropriately in both the C API documentation and in PEP-3118
Ensure GetBuffer/ReleaseBuffer are used as the moral equivalent of INCREF/DECREF for Py_buffer objects.
Check builtin types correctly manage the lifecycle of all elements of their exported Py_buffer structs
Check return values and throw exceptions as noted in the original post

mdickinson · 2011-01-04T08:18:56Z

It might be instructive to look at how NumPy itself manages sharing of ndarray data and ownership of the corresponding structs. I meant to find time to look at this over the break.

mdickinson · 2011-01-04T08:22:38Z

BTW, I agree that it looks as though significant changes might be needed. I wonder whether it would make sense to exclude the Py_buffer struct fro m the Stable ABI PEP for now.

ncoghlan · 2011-01-04T14:49:16Z

Agreed. I'll put something on python-dev about that, so MvL sees it.

ncoghlan · 2011-01-04T14:54:41Z

More direct - added MvL to nosy list.

Martin, we would like to exclude Py_buffer from the stable ABI for Python 3.2, until we have a chance to thrash out the missing pieces of the documentation for 3.3. I *think* it is a documentation problem, but until we're certain, it seems safer to leave it out.

ncoghlan · 2011-01-04T14:58:15Z

From Lenard Lindstrom in bpo-10001 (closed as dupe of this issue):

The ~Py_buffer.obj field is undocumented. Yet memoryview, that acts as a wrapper, includes the field in gc traversal. Also, if the field is not initialized by bf_getbuffer its value can be indeterminate. For memoryview the gc can crash (see attached C demo program).

ncoghlan · 2011-01-05T12:25:06Z

Added Travis to nosy list - even if he doesn't have time to finish this off himself, hopefully he can point us in the right direction.

mdickinson · 2011-01-05T13:51:10Z

Nick, it sounds as though you have an idea of how you think things should be working here---perhaps you can help me out. I'd really like to understand what the implementation should look like from the POV of a 3rd party module that defines some object exporting the buffer interface.

Here's a specific scenario I'd like to understand: module foo defines a type Foo that implements the buffer protocol. For simplicity, suppose it's exporting 1-dim buffers. When I do:

>> from foo import Foo
>> foo_object = Foo()
>> m = memoryview(foo_object)
>> n = m[::2] # take a slice of m
>> del m # delete the objects, in whichever order.
>> del n

what's the sequence of getbuffer and releasebuffer calls that foo_object should expect to see?

Q1. Does foo get 2 requests to getbuffer (and 2 to releasebuffer), or just one each? I'm assuming at least that getbuffer and releasebuffer calls should be paired.

Q2. For each pair of getbuffer/releasebuffer calls, should the 'view' parameter passed into releasebuffer be identical to that provided to getbuffer? Or is it acceptable for the actual Py_buffer* pointers to be distinct, but the pointed-to Py_buffers to be exact copies. (The existence of the smalltable field complicates the notion of an exact copy a little bit, but I think that's a detail that can be ignored for these questions.)

ncoghlan · 2011-01-05T16:30:52Z

PEP-3118 makes it clear that the underlying object should see *two* pairs of calls to the buffer methods:
http://www.python.org/dev/peps/pep-3118/#the-py-buffer-struct

Even if we ignore the undocumented "obj" field, the target object needs to ensure the exported buffer remains valid as long as any memory views are referencing it. The only way to do that is to treat GetBuffer/ReleaseBuffer as the moral equivalent of INCREF/DECREF.

However, I believe the current memoryview implementation does the wrong thing and only calls them once, and then duplicates the Py_buffer structures without ever going back to the original objects (that opinion was based on a quick scan of the code a while back, but it would fit with the uncomplimentary sentiments Antoine has expressed in trying to get all this to exhibit some kind of consistency)

For point 2, it must be the same pointer. When the PEP says "the same", I agree it could be taken as ambiguous, but the later reference to the exporter managing a linked-list of exported views makes it clear that identity is what matters.

As far as I can see, some of things in the PEP were found to be a PITA in practice (such as every consumer of the API having to implement the equivalent of the "base" attribute in the original memoryview design), so Travis changed them. Unfortunately, those changes never made it back into the protocol documentation, leading to the current confusion.

pitrou · 2011-01-05T16:52:47Z

However, I believe the current memoryview implementation does the
wrong thing and only calls them once, and then duplicates the
Py_buffer structures without ever going back to the original objects
(that opinion was based on a quick scan of the code a while back, but
it would fit with the uncomplimentary sentiments Antoine has expressed
in trying to get all this to exhibit some kind of consistency)

Actually, and unless I made a mistake, it does call them twice.
(and does the slicing by hand afterwards, which explains part of the
hilarity with smalltable and friends :-))

For point 2, it must be the same pointer. When the PEP says "the
same", I agree it could be taken as ambiguous, but the later reference
to the exporter managing a linked-list of exported views makes it
clear that identity is what matters.

The common idiom (including in code not written by me :-)) is currently
to use Py_buffer variables allocated on the C stack.

Also, we have the C API function PyMemoryView_FromBuffer() which
basically mandates that Py_buffer structs can be copied around. And it's
a very useful function since it allows to create a memoryview from a
chunk of anonymous memory.

loewis · 2011-01-05T19:40:42Z

Martin, we would like to exclude Py_buffer from the stable ABI for
Python 3.2, until we have a chance to thrash out the missing pieces
of the documentation for 3.3. I *think* it is a documentation
problem, but until we're certain, it seems safer to leave it out.

Fine with me. Attached is a patch; it would be good if someone could
confirm that this exactly comprises the API that should be hidden.

ncoghlan · 2011-01-06T02:09:53Z

It's OK if the Py_buffer is on the stack - it's just a unique identifier for the exporter to use as a key, not something the exporter controls the lifecycle of (the latter is true only for the pointers *inside* the struct, such as buf, shape, strides, etc).

PyMemoryView_FromBuffer should be calling PyObject_Getbuffer on the view->obj member (it's one of the things that embedding the reference allows, just as it allowed removal of the separate obj argument from the PyObject_ReleaseBuffer signature). That way the source object knows there is now a *second* Py_buffer struct kicking around, and can decide whether to re-use the same internal pointers or create new ones.

ncoghlan · 2011-01-06T02:15:39Z

Martin's patch for the PEP-384 adjustments looks good to me.

A note in the PEP that we've deliberately excluded this on a temporary basis due to the implementation/documentation mismatch and expect to have it back in for 3.3 might be helpful, too.

pitrou · 2011-01-06T06:14:01Z

PyMemoryView_FromBuffer should be calling PyObject_Getbuffer on the
view->obj member (it's one of the things that embedding the reference
allows, just as it allowed removal of the separate obj argument from
the PyObject_ReleaseBuffer signature).

One use case is anonymous memory, i.e. view->obj == NULL.

mdickinson · 2011-01-06T09:20:27Z

[Nick]

For point 2, it must be the same pointer. When the PEP says "the same", > I agree it could be taken as ambiguous, but the later reference to the
exporter managing a linked-list of exported views makes it clear that
identity is what matters.

Okay, thanks. Next point of confusion: does that mean that in the example I gave, the object 'n' (the memoryview slice) needs to know about *two* distinct Py_buffer structs---one to pass back to releasebuffer and one to store its own information? It seems to me that these can't be the same, since (a) the Py_buffer for the slice will have different shape and stride from that returned by getbuffer, (b) what's passed back to releasebuffer should be identical to what came from getbuffer, so we can't just modify the shape and stride information in place.

loewis · 2011-01-06T19:35:25Z

Removal of the buffer interface from the ABI has been committed as r87805, r87806, r87805, r87808, and r87809.

skrah · 2012-01-27T21:42:26Z

I've added some comments on Rietveld. This time, I pasted the email
addresses manually into the CC field, apparently without success
(I didn't receive mail).

Regarding the use of _testbuffer in the docs: I agree that it's strange,
on the other hand it illustrates several examples much better (length of
a multi-dimensional view, result of cast to ND, etc).

So I'm not sure what to do. Of course I'll take out the examples if you
really don't like it.

ncoghlan · 2012-01-28T00:56:32Z

Radical suggestion: make it public as collections.simple_ndarray?

(prefixing with simple_ to be explicit that this is not even *close* to being the all-singing, all-dancing NumPy.ndarray)

skrah · 2012-01-28T14:13:15Z

I've been busy fixing the memoryview.h issues first. Could you take a
look at:

http://hg.python.org/features/pep-3118/file/f020716704d4/Include/memoryobject.h

Changes:

a) Make all functions and the two buffer access macros part
of the limited API again.

b) Make the managed buffer API definitely private.

c) Make PyBUF_MAX_NDIM=64 part of the official buffer API.

I think it might be OK to defer the decision about Py_MEMORYVIEW_C etc.,
since the comment already says "... Don't access their fields directly.".

skrah · 2012-01-28T17:19:17Z

Nick Coghlan <report@bugs.python.org> wrote:

Radical suggestion: make it public as collections.simple_ndarray?

Heh, that's a nice one. :)

While I think that the code in _testbuffer.c is sound and very well tested,
the API is low-level and optimized for testing all sorts of quirks.

Examples: _testbuffer.ndarray has the questionable capability of changing
views while buffers are exported. This is to test the pathological case
that came up in the discussion.

Then, similar to NumPy's ndarray, _testbuffer.ndarray constructs arrays
from a flat list. This is a bit clumsy but quite suitable for testing.

NumPy's ndarray is also the low level API. I think most people use
the high level array:

>>> a = array([[1,2], [3,4]], dtype="L")
>>> a
array([[1, 2],
       [3, 4]], dtype=uint64)

So it would take some effort to polish the API.

Meanwhile, to eliminate the use of _testbuffer in the documentation, I think
the following might work:

I've just added complete support for displaying multi-dimensional lists and
multi-dimensional comparisons to memoryobject.c in my private repo.

The code is quite succinct and follows exactly the same pattern as
copy_base/copy_rec.

Then, in the documentation we can use cast() and memoryview.tolist(),
but add a comment that cast() is just used for demonstration purposes.

pitrou · 2012-01-28T17:29:12Z

Nick Coghlan <report@bugs.python.org> wrote:
> Radical suggestion: make it public as collections.simple_ndarray?

Heh, that's a nice one. :)

Well, the question is: does it have a point? Is this ndarray useful
outside of any third-party infrastructure such as the APIs offered by
numpy?

You might answer that we already have array.array but ISTM that
array.array is quite useless in general :)

skrah · 2012-01-28T17:51:37Z

Antoine Pitrou <report@bugs.python.org> wrote:

> Nick Coghlan <report@bugs.python.org> wrote:
> > Radical suggestion: make it public as collections.simple_ndarray?
>
> Heh, that's a nice one. :)

Well, the question is: does it have a point? Is this ndarray useful
outside of any third-party infrastructure such as the APIs offered by
numpy?

I think it would be mainly educational. It's easier to understand
the whole (multi-dimensional) point of PEP-3118 when there's a
concrete object that you can play with. Of course serious users
would go straight to NumPy.

The other issue is that's it's slightly strange to have a fully
featured memoryview with no object in the stdlib that can utilize
all features (at least the testing side is now complete).

Anyway, right now I don't know myself if I'm for or against it. :)

pitrou · 2012-01-28T20:59:27Z

a) Make all functions and the two buffer access macros part
of the limited API again.

Hmm, I don't think buffer access macros should be part of the limited
API. If they are truely important (which I doubt), we should have
equivalent functions for them.

I think it might be OK to defer the decision about Py_MEMORYVIEW_C etc.,
since the comment already says "... Don't access their fields directly.".

My question is whether there is any point in making these flags. Does
3rd-party code want to manipulate memoryview internals, instead of
querying the Py_buffer?

(and of course the memoryview object is becoming more and more like
another Py_buffer :-))

skrah · 2012-01-28T22:44:50Z

Antoine Pitrou <report@bugs.python.org> wrote:

> a) Make all functions and the two buffer access macros part
> of the limited API again.

Hmm, I don't think buffer access macros should be part of the limited
API. If they are truely important (which I doubt), we should have
equivalent functions for them.

I thought the whole Py_buffer API was only temporarily removed from the
limited API until the issues were sorted out:

http://bugs.python.org/issue10181#msg125462

For instance, here the macros are not excluded:

http://hg.python.org/cpython/file/3292cc222d5c/Include/memoryobject.h

Since the issues seem resolved in general, I thought it was time to
restore the previous state. [I just noticed that I didn't revert all
of Martin's changes, so xxlimited currently does not build in the
pep-3118 repo.]

As for the two macros specifically, I know Pauli was using them:

http://bugs.python.org/issue10181#msg139775

> I think it might be OK to defer the decision about Py_MEMORYVIEW_C etc.,
> since the comment already says "... Don't access their fields directly.".

My question is whether there is any point in making these flags. Does
3rd-party code want to manipulate memoryview internals, instead of
querying the Py_buffer?

The flags are primarily for the memoryview itself to avoid repeated calls
to PyBuffer_IsContiguous(). So when 3rd-party uses PyMemoryView_GET_BUFFER
to get the view and also needs to determine the contiguity type, that
code could also use the flags.

Pauli: If you are still following the issue, do you think having access
to contiguity flags is useful for 3rd-party code or not?

pitrou · 2012-01-28T23:09:36Z

Antoine Pitrou <report@bugs.python.org> wrote:
> > a) Make all functions and the two buffer access macros part
> > of the limited API again.
>
> Hmm, I don't think buffer access macros should be part of the limited
> API. If they are truely important (which I doubt), we should have
> equivalent functions for them.

I thought the whole Py_buffer API was only temporarily removed from the
limited API until the issues were sorted out:

http://bugs.python.org/issue10181#msg125462

I'm talking about the memoryview access macros. It's like
PyList_GET_SIZE, it isn't part of the limited API and you have to use
PyList_GetSize instead.

The limited API promises binary (ABI) compatibility accross releases.
It's a very strong promise to make and we must be careful not to put too
much stuff in the limited API. I don't think the rule for inclusion is
"we should put everything useful in the limited API".

For instance, here the macros are not excluded:

http://hg.python.org/cpython/file/3292cc222d5c/Include/memoryobject.h

It might be a mistake, then.

As for the two macros specifically, I know Pauli was using them:

http://bugs.python.org/issue10181#msg139775

Well, Pauli might use them, but it just means his code isn't compatible
with the limited API, then.

The flags are primarily for the memoryview itself to avoid repeated calls
to PyBuffer_IsContiguous(). So when 3rd-party uses PyMemoryView_GET_BUFFER
to get the view and also needs to determine the contiguity type, that
code could also use the flags.

But why would 3rd-party code use PyMemoryView_GET_BUFFER instead of, for
example, PyObject_GetBuffer? You seldom have APIs which *expect* a
memoryview, I think. Instead, they would expect a buffer-compatible
object.

ncoghlan · 2012-01-29T01:09:52Z

I'm with Antoine here - we want to be *very* conservative with what we expose through the limited API. Due to the ABI compatibility promise, anything exposed that way is very hard to change.

Keeping things out of the limited API isn't really an issue - it just means that anyone that wants to use them needs to rebuild their extensions for each new version of the C API (just as they do now).

Addings things to the stable ABI later is easy though - that's why we had Martin take all the Py_buffer related stuff out of the initial version of the limited API.

ncoghlan · 2012-01-29T02:24:36Z

Also, +1 to replacing _testbuffer with .cast() to demonstrate the multi-dimensional support. Without an actual multi-dimensional array object in the standard library, demonstrating those aspects is always going to be a little tricky. However, it's worth doing so that the likes of NumPy and PIL don't need to explain those interactions.

skrah · 2012-01-29T12:40:42Z

Antoine Pitrou <report@bugs.python.org> wrote:

> I thought the whole Py_buffer API was only temporarily removed from the
> limited API until the issues were sorted out:
>
> http://bugs.python.org/issue10181#msg125462

I'm talking about the memoryview access macros. It's like
PyList_GET_SIZE, it isn't part of the limited API and you have to use
PyList_GetSize instead.

You're right: I presumed that the macros were excluded temporarily
when in fact that had already happened in 62b61abd02b8.

> The flags are primarily for the memoryview itself to avoid repeated calls
> to PyBuffer_IsContiguous(). So when 3rd-party uses PyMemoryView_GET_BUFFER
> to get the view and also needs to determine the contiguity type, that
> code could also use the flags.

But why would 3rd-party code use PyMemoryView_GET_BUFFER instead of, for
example, PyObject_GetBuffer? You seldom have APIs which *expect* a
memoryview, I think. Instead, they would expect a buffer-compatible
object.

That's a good question. It looks to me that the macro was present as
PyMemoryView() initially. You renamed it in bpo-3560 (with Guido's approval),
and later PyMemoryView_GET_BUFFER appeared in the docs.

I think 3rd-party code uses the macros mainly because they are
present and, in the case of PyMemoryView_GET_BUFFER, documented.
In most situations PyObject_GetBuffer() could be used indeed.

Most use cases I can think of would also involve having access to
the managed buffer API. As an example, here's a technique that is
similar to what goes on in PyMemoryView_GetContiguous():

Suppose you have an initialized bytes object that you want to
wrap as a multi-dimensional exporter. Then:

Base the memoryview on the bytes object and keep exactly one
reference to it.
Adjust the shape, strides etc. to get the structure you want.
Return the view: You now have a fully compliant exporter that
only needs a single Py_DECREF() to disappear and do all cleanup.

Of course this could also be exposed as a function, e.g.:

   /* stealing a reference to bytes */
   PyMemoryView_FromBytesAndInfo(PyObject *bytes, Py_buffer *info);

So let's make the flags private. What do you prefer?

Leave them in memoryview.h, but with a leading underscore:

_Py_MEMORYVIEW_C
[...]
Move them to memoryobject.c, with a leading underscore.
Move them to memoryobject.c, without a leading underscore (I find
this more readable).
Move them to memoryobject.c as MV_C, MV_FORTRAN, etc.

Also, I'll add a note to the docs that PyMemoryView_GET_BUFFER can
probably be avoided by using PyObject_GetBuffer() directly.

pitrou · 2012-01-29T17:19:06Z

Of course this could also be exposed as a function, e.g.:

/* stealing a reference to bytes */
PyMemoryView_FromBytesAndInfo(PyObject *bytes, Py_buffer *info);

I think we should minimize the number of reference-stealing functions.

So let's make the flags private. What do you prefer?

I don't really mind, whatever you think is best :)

skrah · 2012-01-31T20:34:29Z

I've uploaded a new patch that should address the remaining issues:

o In the documentation _testbuffer has been replaced by
m.cast() + the now multi-dimensional m.tolist().

o I restored the state of the limited API. If we want
to include Py_buffer again, I think this should be done
in a separate patch.

o Flags of the memoryview object are private.

Additionally, because NumPy allows non-aligned array accesses,
I changed the packing functions to use memcpy for multi-byte types.

On x86/amd64 gcc is smart enough to produce almost exactly the same
asm output as before, with a slowdown of 0-1%, depending on the
benchmark.

On other platforms the situation might be worse, but I don't have
access to real hardware where alignment actually matters.

ncoghlan · 2012-02-21T13:16:02Z

Latest version looks good to me - I vote for landing it whenever you're ready :)

skrah · 2012-02-24T15:16:19Z

Thanks, Nick. I'll try to get it done this weekend.

I've uploaded Misc/NEWS and Doc/whatsnew/3.3.rst (my apologies to Antoine
for plagiarizing the first sentence, I found it hard to come up with a
better version).

I wasn't sure whether to put the "whatsnew" entry into the section
"Other Language Changes" or into the PEP section. I chose the latter,
since many new features have been added that are part of the PEP.

ncoghlan · 2012-02-24T17:25:45Z

PEP section makes sense - I plan to mark PEP-3118 as Final once you commit this (or you can do that yourself, for that matter).

python-dev · 2012-02-25T11:25:30Z

New changeset 3f9b3b6f7ff0 by Stefan Krah in branch 'default':

Issue bpo-10181: New memoryview implementation fixes multiple ownership
http://hg.python.org/cpython/rev/3f9b3b6f7ff0

skrah · 2012-02-26T10:30:33Z

Nick Coghlan <report@bugs.python.org> wrote:

PEP section makes sense - I plan to mark PEP-3118 as Final once you commit
this (or you can do that yourself, for that matter).

Array features are complete except for multi-dimensional indexing and slicing.
I think it would be nice to add those in a separate issue; it's not a lot of
additional code and it doesn't interfere with the current code.

The overall array handling scheme is sound. Life would be a bit easier
if contiguity flags were a mandatory part of the Py_buffer structure
that the exporter has to fill in.

Then there is an open issue (bpo-3132) for expanding the struct module syntax,
where the wording in some sections of the PEP led to a bit of head-scratching. :)

In another issue (bpo-13072) the question came up whether the proposed 'u' and 'w'
formats still make sense after PEP-393 (I think they do, they should map to
UCS-2 and UCS-4).

We need to decide what to do about 2.7 and 3.2. It's pretty difficult by
now to separate the bug fixes from the features. I could follow the example
of CPU manufacturers: start with the whole feature set and disable as much
as possible.

Another problem for 2.7 and 3.2 is that the 'B' format would still need to
accept bytes instead of ints. Or can we change that as a bug fix?

ncoghlan · 2012-02-26T11:34:31Z

Aw, I guess I was too optimistic in thinking this was the last gap before we could declare it Final...

+1 on creating a new feature request for NumPy style multi-dimensional indexing and slicing support on memoryview (I'm assuming that's what you meant).

As far as your last question goes, I'm not sure we *can* salvage this in 2.7/3.2. We do have the option of throwing our hands up in the air and saying "Sorry, we couldn't figure out how to fix it without completely rewriting it. Take a look at 3.3 if you want to see how it's *supposed* to work."

I hesitate to suggest this (since I'm not volunteering to do the work) but perhaps it would be worth packaging up the 3.3. memoryview and publishing it on PyPI for the benefit of 3.2 and 2.7?

May also be worth bringing up on python-dev to see if anyone else has any bright ideas. Myself, I'm not seeing any real choices beyond "won't fix", "backport 3.3 version and remove/disable the new features", "backport 3.3 version, new features and all" and "publish a full featured version on PyPI".

skrah · 2012-02-29T11:33:51Z

I'm busy adding the C-API changes to the docs. Regarding the stable ABI:

The general mood was to *keep* the removal of the buffer interface
for some time, did I get that right?

In that case this removal (especially of the Py_buffer struct) should
be documented also for 3.2.

python-dev · 2012-02-29T16:54:30Z

New changeset 9d3c7dd55c7f by Stefan Krah in branch 'default':
Issue bpo-10181: Add warning that structure layouts in memoryobject.h and
http://hg.python.org/cpython/rev/9d3c7dd55c7f

ncoghlan · 2012-02-29T21:18:37Z

For 3.2, there's no removal to document - we asked MvL to drop the buffer APIs from PEP-384, so they've never been part of the stable ABI.

From the PEP: "The buffer interface (type Py_buffer, type slots bf_getbuffer and bf_releasebuffer, etc) has been omitted from the ABI, since the stability of the Py_buffer structure is not clear at this time. Inclusion in the ABI can be considered in future releases."

I agree we shouldn't be hasty in making even the updated buffer interface implementation officially part of the stable ABI. Better to leave it out for 3.3, get some real world feedback on the latest version, then commit to stability for 3.4+.

skrah · 2012-03-01T08:32:50Z

From the PEP: "The buffer interface (type Py_buffer, type slots bf_getbuffer
and bf_releasebuffer, etc) has been omitted from the ABI, since the stability
of the Py_buffer structure is not clear at this time. Inclusion in the ABI
can be considered in future releases."

Great, that's exactly what I was looking for. - As far as I can see the issue
is finished, so I'm closing it. Thanks again everyone for all the help!

ncoghlan · 2012-03-02T13:14:40Z

It occurs to me that one thing that *could* be backported to early versions are some of the documentation improvements on how to correctly handle the lifecycle of fields in Py_buffer. That gets messy though because memoryview doesn't behave nicely in those versions, so we'd be violating our own guidelines.

Perhaps the relevant sections should be updated with a reference saying that the semantics have been cleaned up in 3.3, and if in doubt, try to follow the 3.3 documentation?

python-dev · 2012-03-05T09:51:06Z

New changeset 373f6cdc6d08 by Stefan Krah in branch 'default':
Issue bpo-10181: The decision was to raise a buffer error in memory_exit()
http://hg.python.org/cpython/rev/373f6cdc6d08

joshtriplett · 2012-04-09T08:08:31Z

I currently use Python 2.7, and I'd like to make use of memoryview. Specifically, I work on BITS (http://biosbits.org/), which runs Python in ring 0 as part of GRUB, and I'd like to use memoryview to give Python access to data in physical memory. I ran into several of the problems documented here when trying to do so. I'd really love to see a backport of this fixed version into 2.7.

joshtriplett · 2012-04-09T08:15:57Z

I currently use Python 2.7, and I'd like to make use of memoryview. Specifically, I work on BITS (http://biosbits.org/), which runs Python in ring 0 as part of GRUB, and I'd like to use memoryview to give Python access to data in physical memory. I ran into several of the problems documented here when trying to do so. I'd really love to see a backport of this fixed version into 2.7.

More specifically, the primary functionality that I'd like to use exists in 3.3 as PyMemoryView_FromMemory. I've tried to approximate that function using the available API in 2.7, but that led me here.

rupole mannequin added interpreter-core (Objects, Python, Grammar, and Parser dirs) type-crash A hard crash of the interpreter, possibly with a core dump labels Oct 23, 2010

ncoghlan changed the title ~~get_shape0 in memoryobject.c not checked for error return~~ Problems with Py_buffer management in memoryobject.c (and elsewhere?) Nov 3, 2010

pitrou assigned mdickinson Jan 4, 2011

skrah mannequin closed this as completed Mar 1, 2012

ezio-melotti transferred this issue from another repository Apr 10, 2022

Problems with Py_buffer management in memoryobject.c (and elsewhere?) #54390

Problems with Py_buffer management in memoryobject.c (and elsewhere?) #54390

Comments

rupole mannequin commented Oct 23, 2010

rupole mannequin commented Oct 23, 2010

pitrou commented Nov 2, 2010

ncoghlan commented Nov 2, 2010

pitrou commented Nov 2, 2010

ncoghlan commented Nov 2, 2010

pitrou commented Nov 2, 2010

rupole mannequin commented Nov 3, 2010

rupole mannequin commented Nov 3, 2010

ncoghlan commented Nov 3, 2010

ncoghlan commented Nov 3, 2010

ncoghlan commented Nov 3, 2010

mdickinson commented Jan 4, 2011

mdickinson commented Jan 4, 2011

ncoghlan commented Jan 4, 2011

ncoghlan commented Jan 4, 2011

ncoghlan commented Jan 4, 2011

ncoghlan commented Jan 5, 2011

mdickinson commented Jan 5, 2011

ncoghlan commented Jan 5, 2011

pitrou commented Jan 5, 2011

loewis mannequin commented Jan 5, 2011

ncoghlan commented Jan 6, 2011

ncoghlan commented Jan 6, 2011

pitrou commented Jan 6, 2011

mdickinson commented Jan 6, 2011

loewis mannequin commented Jan 6, 2011

skrah mannequin commented Jan 27, 2012

ncoghlan commented Jan 28, 2012

skrah mannequin commented Jan 28, 2012

skrah mannequin commented Jan 28, 2012

pitrou commented Jan 28, 2012

skrah mannequin commented Jan 28, 2012

pitrou commented Jan 28, 2012

skrah mannequin commented Jan 28, 2012

pitrou commented Jan 28, 2012

ncoghlan commented Jan 29, 2012

ncoghlan commented Jan 29, 2012

skrah mannequin commented Jan 29, 2012

pitrou commented Jan 29, 2012

skrah mannequin commented Jan 31, 2012

ncoghlan commented Feb 21, 2012

skrah mannequin commented Feb 24, 2012

ncoghlan commented Feb 24, 2012

python-dev mannequin commented Feb 25, 2012

skrah mannequin commented Feb 26, 2012

ncoghlan commented Feb 26, 2012

skrah mannequin commented Feb 29, 2012

python-dev mannequin commented Feb 29, 2012

ncoghlan commented Feb 29, 2012

skrah mannequin commented Mar 1, 2012

ncoghlan commented Mar 2, 2012

python-dev mannequin commented Mar 5, 2012

joshtriplett mannequin commented Apr 9, 2012

joshtriplett mannequin commented Apr 9, 2012