classification
Title: Problems with Py_buffer management in memoryobject.c (and elsewhere?)
Type: crash Stage: resolved
Components: Interpreter Core Versions: Python 3.3
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: haypo, jcon, joshtriplett, kristjan.jonsson, loewis, mark.dickinson, ncoghlan, petri.lehtinen, pitrou, pmoore, pv, python-dev, rupole, skrah, teoliphant
Priority: normal Keywords: patch

Created on 2010-10-23 18:22 by rupole, last changed 2012-04-09 08:15 by joshtriplett. This issue is now closed.

Files
File name Uploaded Description Edit
pybuffer.diff loewis, 2011-01-05 19:40 review
buffer-interface-clarify.patch pv, 2011-02-13 19:08 Clarifying the behavior of the buffer interface (for discussion, probably shouldn't be applied as-is)
bbe70ca4e0e5.diff skrah, 2011-07-04 10:29 PyManagedBuffer implementation review
718801740bde.diff skrah, 2011-07-05 10:47 PyManagedBuffer implementation #2 review
78fb1181787a.diff skrah, 2011-08-10 11:43 Add Modules/_testbuffer.c and Lib/test/test_buffer.py review
70a8ccd53ade.diff skrah, 2011-08-16 13:20 Fixed suboffsets representation in _testbuffer.c review
51cedae5acfc.diff skrah, 2011-09-08 14:46 PyManagedBuffer, native format, complete 1D slicing, casts review
4492afe05a07.diff skrah, 2011-09-18 09:16 Handle objects with an __index__() method. review
8dd9f0ea4876.diff skrah, 2012-01-25 17:43 review
a5c4a96dc981.diff skrah, 2012-01-31 20:32 review
issue10181-news.diff skrah, 2012-02-24 15:16 review
Repositories containing patches
http://hg.python.org/features/pep-3118#memoryview
Messages (147)
msg119460 - (view) Author: Roger Upole (rupole) Date: 2010-10-23 18:22
There are a number of places in memoryobject.c where get_shape0 is used without the return value being checked.  If it fails, this leads to hanging exceptions and crashes.
msg120267 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-11-02 22:16
Yes, there are probably many holes like this. I've done what I could to make the simple cases (builtin types) to work, but the spec is rotten from the start. Blame the numpy people, sorry.
msg120268 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2010-11-02 22:30
As far as I know, PEP 3118 serves its purpose of allowing extension modules like numpy and PIL to share data without needing to copy it around all the time. It's just that memoryview wasn't really part of that purpose (since all the affected third party libraries have their own objects for looking at memory), and it shows :P

Our near term goal should be to get memoryview in a good place for handling 1D contiguous data and throw exceptions (rather than crashing) for everything else. I think Antoine has achieved the former, but it sounds like there is still some work to do on the latter.
msg120270 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-11-02 22:34
> As far as I know, PEP 3118 serves its purpose of allowing extension
> modules like numpy and PIL to share data without needing to copy it
> around all the time. It's just that memoryview wasn't really part of
> that purpose (since all the affected third party libraries have their
> own objects for looking at memory), and it shows :P

The deeper issue is that there are no ownership or lifetime rules for
dynamically allocated fields in Py_buffer (such as strides and shape),
which itself isn't a PyObject. For very simple objects (such as bytes or
bytearray) it isn't really a problem, but it is in more complicated
situations; and there's no possible solution without a clear spec..

(and I'm not even talking of the issue of making slices of memoryviews,
since the buffer API itself doesn't handle slicing, which means the
memoryview object has to modify its inner Py_buffer...)
msg120280 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2010-11-02 22:57
Read the "Releasing the buffer" section in PEP 3118 again. Unless I'm misunderstanding you completely, the rules you're asking for are already in place: those fields are entirely the responsibility of the exporting object, and it needs to ensure they remain valid until the buffer is released.

Now, it may be that we haven't *implemented* this correctly or consistently in the builtin objects, in which case we should do something about it. But the rules are there.
msg120281 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-11-02 23:01
> Read the "Releasing the buffer" section in PEP 3118 again. Unless I'm
> misunderstanding you completely, the rules you're asking for are
> already in place: those fields are entirely the responsibility of the
> exporting object, and it needs to ensure they remain valid until the
> buffer is released.

In practice, though, we copy Py_buffer structs around and there's no way
for the original object to know about that. That's the fundamental
difference with a PyObject where you would only increase a refcount
instead of copying the structure's contents.
msg120297 - (view) Author: Roger Upole (rupole) Date: 2010-11-03 01:17
The culprit wrt copying Py_buffer structs seems mainly to be dup_buffer, which is called in memory_getbuf.  This seems unnecessary in the case where there's an underlying object and it has returned the view thru its own tp_as_buffer.  The underlying object at that point is solely responsible for releasing the buffer, so memory_getbuf shouldn't mess with it at all.  In the case where there is no underlying object (mainly thru PyMemoryView_FromBuffer), it probably should allocate any memory in the view in such a way that it can be freed in memory_releasebuf when the view->obj is NULL.
msg120318 - (view) Author: Roger Upole (rupole) Date: 2010-11-03 11:59
While on the subject, the docs for PyMemoryView_FromBuffer state that the resulting memoryview takes ownership of the Py_buffer struct and is responsible for freeing any associated memory.  It does not do so, which is not surprising.  The absence of a standard for allocation makes it absolutely impossible for it to do so safely.
msg120319 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2010-11-03 12:19
Hmm, if we're ever creating a new copy of a Py_buffer without calling GetBuffer again on the original object, then *that* is a bug (exactly comparable to copying a PyObject pointer without incrementing the reference count - it's OK if you can guarantee the lifecycle of your borrowed pointer is shorter than the lifecycle of the original, but disallowed otherwise).

This still doesn't sound like a problem with the spec to me, it sounds like a problem with the implementation strategy that was originally used when integrating the spec into the interpeter core (which I readily agree received far less attention than the spec itself did).

It really sounds like we should be creating a _Py_ManagedBuffer internal object, with each instance containing a PyObject* and a Py_buffer instance. We don't need to alter PEP 3118 to do that - such a change is entirely consumer side, so the objects providing the buffers don't need to care how the lifecycle of the Py_buffer struct is managed.

When we need another reference to the buffer, we can then just increment the refcount of the _Py_ManagedBuffer instance rather than having to call GetBuffer again on the original object.

The contents of Py_buffer objects that have been passed to GetBuffer really shouldn't be getting modified at all - for those cases, we should maintain *two* Py_buffer structs, one (unmodified) with the original results of the GetBuffer call, and a second owned by the caller (for manipulation).
msg120321 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2010-11-03 12:51
It doesn't help that neither PEP 3118 nor the Py_buffer docs mention the "obj" member of the Py_buffer struct that refers back to the original object providing the buffer - that's fairly fundamental to understanding how PyBuffer_Release and PyMemoryView_FromBuffer can work even in theory.

(Given that, an additional _PyManagedBuffer object shouldn't be necessary - MemoryView just needs to call ReleaseBuffer and GetBuffer at the appropriate times)
msg120322 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2010-11-03 13:00
So the overall to-do list here is sounding like:

1. Document "Py_buffer->obj" appropriately in both the C API documentation and in PEP 3118

2. Ensure GetBuffer/ReleaseBuffer are used as the moral equivalent of INCREF/DECREF for Py_buffer objects.

3. Check builtin types correctly manage the lifecycle of all elements of their exported Py_buffer structs

4. Check return values and throw exceptions as noted in the original post
msg125296 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2011-01-04 08:18
It might be instructive to look at how NumPy itself manages sharing of ndarray data and ownership of the corresponding structs.  I meant to find time to look at this over the break.
msg125297 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2011-01-04 08:22
BTW, I agree that it looks as though significant changes might be needed.  I wonder whether it would make sense to exclude the Py_buffer struct fro m the Stable ABI PEP for now.
msg125327 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2011-01-04 14:49
Agreed. I'll put something on python-dev about that, so MvL sees it.
msg125328 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2011-01-04 14:54
More direct - added MvL to nosy list.

Martin, we would like to exclude Py_buffer from the stable ABI for Python 3.2, until we have a chance to thrash out the missing pieces of the documentation for 3.3. I *think* it is a documentation problem, but until we're certain, it seems safer to leave it out.
msg125332 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2011-01-04 14:58
From Lenard Lindstrom in issue 10001 (closed as dupe of this issue):

The ~Py_buffer.obj field is undocumented. Yet memoryview, that acts as a wrapper, includes the field in gc traversal. Also, if the field is not initialized by bf_getbuffer its value can be indeterminate. For memoryview the gc can crash (see attached C demo program).
msg125418 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2011-01-05 12:25
Added Travis to nosy list - even if he doesn't have time to finish this off himself, hopefully he can point us in the right direction.
msg125424 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2011-01-05 13:51
Nick, it sounds as though you have an idea of how you think things should be working here---perhaps you can help me out.  I'd really like to understand what the implementation should look like from the POV of a 3rd party module that defines some object exporting the buffer interface.

Here's a specific scenario I'd like to understand:  module foo defines a type Foo that implements the buffer protocol.  For simplicity, suppose it's exporting 1-dim buffers. When I do:

>>> from foo import Foo
>>> foo_object = Foo()
>>> m = memoryview(foo_object)
>>> n = m[::2]  # take a slice of m
>>> del m       # delete the objects, in whichever order.
>>> del n

what's the sequence of getbuffer and releasebuffer calls that foo_object should expect to see?

Q1. Does foo get 2 requests to getbuffer (and 2 to releasebuffer), or just one each?  I'm assuming at least that getbuffer and releasebuffer calls should be paired.

Q2. For each pair of getbuffer/releasebuffer calls, should the 'view' parameter passed into releasebuffer be identical to that provided to getbuffer?  Or is it acceptable for the actual Py_buffer* pointers to be distinct, but the pointed-to Py_buffers to be exact copies.  (The existence of the smalltable field complicates the notion of an exact copy a little bit, but I think that's a detail that can be ignored for these questions.)
msg125431 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2011-01-05 16:30
PEP 3118 makes it clear that the underlying object should see *two* pairs of calls to the buffer methods:
http://www.python.org/dev/peps/pep-3118/#the-py-buffer-struct

Even if we ignore the undocumented "obj" field, the target object needs to ensure the exported buffer remains valid as long as any memory views are referencing it. The only way to do that is to treat GetBuffer/ReleaseBuffer as the moral equivalent of INCREF/DECREF.

However, I believe the current memoryview implementation does the wrong thing and only calls them once, and then duplicates the Py_buffer structures without ever going back to the original objects (that opinion was based on a quick scan of the code a while back, but it would fit with the uncomplimentary sentiments Antoine has expressed in trying to get all this to exhibit some kind of consistency)

For point 2, it must be the same pointer. When the PEP says "the same", I agree it could be taken as ambiguous, but the later reference to the exporter managing a linked-list of exported views makes it clear that identity is what matters.

As far as I can see, some of things in the PEP were found to be a PITA in practice (such as every consumer of the API having to implement the equivalent of the "base" attribute in the original memoryview design), so Travis changed them. Unfortunately, those changes never made it back into the protocol documentation, leading to the current confusion.
msg125433 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-01-05 16:52
> However, I believe the current memoryview implementation does the
> wrong thing and only calls them once, and then duplicates the
> Py_buffer structures without ever going back to the original objects
> (that opinion was based on a quick scan of the code a while back, but
> it would fit with the uncomplimentary sentiments Antoine has expressed
> in trying to get all this to exhibit some kind of consistency)

Actually, and unless I made a mistake, it does call them twice.
(and does the slicing by hand afterwards, which explains part of the
hilarity with smalltable and friends :-))

> For point 2, it must be the same pointer. When the PEP says "the
> same", I agree it could be taken as ambiguous, but the later reference
> to the exporter managing a linked-list of exported views makes it
> clear that identity is what matters.

The common idiom (including in code not written by me :-)) is currently
to use Py_buffer variables allocated on the C stack.

Also, we have the C API function PyMemoryView_FromBuffer() which
basically mandates that Py_buffer structs can be copied around. And it's
a very useful function since it allows to create a memoryview from a
chunk of anonymous memory.
msg125462 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2011-01-05 19:40
> Martin, we would like to exclude Py_buffer from the stable ABI for
> Python 3.2, until we have a chance to thrash out the missing pieces
> of the documentation for 3.3. I *think* it is a documentation
> problem, but until we're certain, it seems safer to leave it out.

Fine with me. Attached is a patch; it would be good if someone could
confirm that this exactly comprises the API that should be hidden.
msg125499 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2011-01-06 02:09
It's OK if the Py_buffer is on the stack - it's just a unique identifier for the exporter to use as a key, not something the exporter controls the lifecycle of (the latter is true only for the pointers *inside* the struct, such as buf, shape, strides, etc).

PyMemoryView_FromBuffer should be calling PyObject_Getbuffer on the view->obj member (it's one of the things that embedding the reference allows, just as it allowed removal of the separate obj argument from the PyObject_ReleaseBuffer signature). That way the source object knows there is now a *second* Py_buffer struct kicking around, and can decide whether to re-use the same internal pointers or create new ones.
msg125502 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2011-01-06 02:15
Martin's patch for the PEP 384 adjustments looks good to me.

A note in the PEP that we've deliberately excluded this on a temporary basis due to the implementation/documentation mismatch and expect to have it back in for 3.3 might be helpful, too.
msg125507 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-01-06 06:14
> PyMemoryView_FromBuffer should be calling PyObject_Getbuffer on the
> view->obj member (it's one of the things that embedding the reference
> allows, just as it allowed removal of the separate obj argument from
> the PyObject_ReleaseBuffer signature).

One use case is anonymous memory, i.e. view->obj == NULL.
msg125531 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2011-01-06 09:20
[Nick]
> For point 2, it must be the same pointer. When the PEP says "the same", > I agree it could be taken as ambiguous, but the later reference to the 
> exporter managing a linked-list of exported views makes it clear that 
> identity is what matters.

Okay, thanks.  Next point of confusion:  does that mean that in the example I gave, the object 'n' (the memoryview slice) needs to know about *two* distinct Py_buffer structs---one to pass back to releasebuffer and one to store its own information?  It seems to me that these can't be the same, since (a) the Py_buffer for the slice will have different shape and stride from that returned by getbuffer, (b) what's passed back to releasebuffer should be identical to what came from getbuffer, so we can't just modify the shape and stride information in place.
msg125582 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2011-01-06 19:35
Removal of the buffer interface from the ABI has been committed as r87805, r87806, r87805, r87808, and r87809.
msg125618 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2011-01-07 03:59
@Antoine: I actually had a closer look at the way dup_buffer is used, and I think it is currently legitimate (including for the anonymous memory case). It just isn't documented very well. For the "FromBuffer" case, it assumes the calling code already invoked GetBuffer as necessary when initialising the view, and for the various internal uses the dup_buffer call is immediately preceded by a GetBuffer call when view->obj is not NULL.

@Mark: I don't think that follows. PEP 3118 says that exporting objects own that memory, but it doesn't say that clients have to treat the structures as read-only. The only thing clients are obligated to do is pass the same Py_buffer address to ReleaseBuffer that was passed to GetBuffer. If the exporter actually needs to release buffer specific resources, then it should maintain an internal data structure keyed off the Py_buffer address.

The one thing clients really shouldn't mess with is the obj field (since ReleaseBuffer needs that to find the correct release method to invoke), but the PEP is obviously silent on that matter since it doesn't even acknowledge the existence of that field.
msg125630 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2011-01-07 09:05
[Nick]
> @Mark: I don't think that follows. [...]
> If the exporter actually needs to release buffer specific 
> resources, then it should maintain an internal data structure keyed off 
> the Py_buffer address.

Ah, okay.  So that would make issue 9990 just a documentation problem, right?
msg125631 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2011-01-07 09:10
> an internal data structure keyed off 
> the Py_buffer address.

If we're using the Py_buffer address coming into getbuffer as a key, then we probably shouldn't be using a stack address, since it would be difficult to guarantee uniqueness that way.
msg125635 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2011-01-07 09:34
It only needs to be unique for the lifetime of the buffer reference - for a Py_buffer struct on the stack, by the time the relevant C stack frame goes away, ReleaseBuffer should already have been called.
msg125640 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2011-01-07 10:41
> by the time the relevant C stack frame goes away, ReleaseBuffer should 
> already have been called.

Hmm. I'm not sure I understand how/when that would happen.  Looking at the current py3k code, in Objects/memoryobject.c at line 92, we have:

PyObject *
PyMemoryView_FromObject(PyObject *base)
{
    PyMemoryViewObject *mview;
    Py_buffer view;

    if (!PyObject_CheckBuffer(base)) {
        PyErr_SetString(PyExc_TypeError,
            "cannot make memory view because object does "
            "not have the buffer interface");
        return NULL;
    }

    if (PyObject_GetBuffer(base, &view, PyBUF_FULL_RO) < 0)
        return NULL;

    mview = (PyMemoryViewObject *)PyMemoryView_FromBuffer(&view);
    if (mview == NULL) {
        PyBuffer_Release(&view);
        return NULL;
    }

    return (PyObject *)mview;
}

So here 'view' is being allocated on the stack, and its address passed to PyObject_GetBuffer;  PyBuffer_Release isn't called (except when an error happens) before the function exits and the stack frame becomes invalid.

Sorry for the odd questions;  it's clear to me that I'm misunderstanding something fundamental, but I'm struggling to figure out what.
msg125641 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2011-01-07 11:22
Ah, sorry - no, I misunderstood the question. I think that example actually *is* a bug in the memoryview implementation. The GetBuffer call should use the persistent address (&mview->view) that will be used in the ReleaseBuffer call, as per Antoine's patch on issue 9990.
msg128475 - (view) Author: Pauli Virtanen (pv) Date: 2011-02-13 03:47
I spent today some time to rewrite `memoryobject.c`, and cleaning up the Py_buffer handling in it. (I wrote also the Numpy PEP 3118 implementation, so this was straightforward to do.)

The end result is here: https://bitbucket.org/pv/cpython-stuff/changesets   (bookmark bug/memoryview)

Some points of note:

- Clarification of ownership conventions for Py_buffer in docs:

  http://bitbucket.org/pv/cpython-stuff/changeset/805971191369

  The new implementation is written to respect the invariants
  mentioned there.

- Rewritten memoryobject:

  http://bitbucket.org/pv/cpython-stuff/src/817edc63ce4d/Objects/memoryobject.c

I didn't yet heavily test this (eg. against Numpy), but at least the Python test suite passes. There are also some minor corners that need to be polished. Also some new tests would be needed, as I slightly extended the slicing capabilities of memoryview.

Only one nasty point turned up: the existence of PyMemoryView_FromBuffer. This function breaks the ownership rules: the pointers in Py_buffer structure can point inside the structure --- and the structure can reside e.g. on the stack. Deep copying Py_buffer cannot in general be done, because this can mess up whatever bf_releasebuffer tries to do. The workaround here was to do a deep copy *only* for those pointers that point inside the struct --- although this violates the spirit of the ownership model, it should be essentially always safe to do, and I believe it is the correct solution (at least if PyMemoryView_FromBuffer is to be retained).

    ***

Comments are welcome.
msg128477 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-02-13 04:23
Hello,

> I spent today some time to rewrite `memoryobject.c`, and cleaning up
> the Py_buffer handling in it. (I wrote also the Numpy PEP 3118
> implementation, so this was straightforward to do.)

Thanks for trying this.

> - Rewritten memoryobject:
> 
> http://bitbucket.org/pv/cpython-stuff/src/817edc63ce4d/Objects/memoryobject.c

Some worrying things here:

* memoryview_getbuffer() doesn't call the original object's getbuffer.
  This means that if I do:
        m = memoryview(some_object)
        n = memoryview(m)
        m.release()
  n ends up holding a buffer to some_object's memory, but some_object 
  doesn't know about it and can free the pointer at any time.

* same for _get_sub_buffer_index() and _get_sub_buffer_slice0().
  Actually, the whole concept of non-owner memoryviews seems flawed.

* the fact that slicing increments the parent memoryview's refcount 
  means that a loop like:
     while len(m):
       # do something
       m = m[1:]
  will end up keeping N chained memoryviews on the heap. Currently only
  the last memoryview remains alive.

Some other things:

* why do you accept the ellipsis when indexing? what is it supposed to 
  mean?

* please don't use #warning. Just put the TODO in a /* ... */.

* please no "#if PY_VERSION_HEX >= 0x03000000". Just make your source
  py3k-compatible and we'll take care of backporting it to 2.x if your
  patch is accepted ;)
msg128491 - (view) Author: Pauli Virtanen (pv) Date: 2011-02-13 13:20
Hi,

Please focus on the constraints of the consumer not mucking with the content of `Py_buffer`, and calling `bf_releasebuffer` only with data obtained from `bf_getbuffer` and only one. If we agree on this, then how to exactly to do the implementation is just a detail.

The problem in the current way is that the structure sent to `bf_releasebuffer` does not contain the same data as what was filled in by `bf_getbuffer`, and since the contents are dup-ed, `bf_releasebuffer` is called multiple times with the same data. So, `bf_releasebuffer` cannot rely on (i) the data in Py_buffer being what `bf_getbuffer` put there, and (ii) getting the same Py_buffer data only once.

So, `bf_releasebuffer` cannot be used to release any resources allocated in `bf_getbuffer`. For example, Numpy's PEP 3118 implementation does not define a `bf_releasebuffer` although it needs to allocate resources in each call to `bf_getbuffer`. Instead, it has to keep a track of allocated memory, and deallocate all of them on `array_dealloc`. This is a PITA, and one that could be avoided by a small change in the PEP implementation and documentation in Python.

> Some worrying things here:
> 
> * memoryview_getbuffer() doesn't call the original object's getbuffer.
>   This means that if I do:
>         m = memoryview(some_object)
>         n = memoryview(m)
>         m.release()
>   n ends up holding a buffer to some_object's memory, but some_object 
>   doesn't know about it and can free the pointer at any time.

Good point. There are two possible solutions to this:

- Keep a count of how many buffers memoryview() has "exported", 
  and do not allow memoryview.release() if some are active.

  In a sense, this would be more in line with the PEP:
  the PyMemoryViewObject would here act as an ordinary object
  exporting some block of memory, and not do any magic tricks.
  It would guarantee that the buffers it has "exported" stay valid.

- Add additional fields to `PyMemoryViewObject` for storing
  new `strides`, `format`, and `shape` that override the stuff
  in Py_buffer.

  This would allow for calling `PyObject_GetBuffer` for a second time.

Both would be reasonably easy to do.

    ***

Calling PyObject_GetBuffer to get a new hold of a buffer needs some precautions, though. For example:

    >>> mem = memoryview(some_object)
    # do some stuff
    >>> mem2 = memoryview(some_object)
    >>> assert mem.format == mem2.format  # not guaranteed

so there is some extra work `memoryview_getbuffer` could do on top of calling PyObject_GetBuffer.

> * same for _get_sub_buffer_index() and _get_sub_buffer_slice0().
>  Actually, the whole concept of non-owner memoryviews seems flawed.

If the "parent" memoryview keeps its the memory valid as long as such sub-memoryviews exist, such concerns go away.

> * the fact that slicing increments the parent memoryview's refcount 
>  means that a loop like:
>     while len(m):
>       # do something
>       m = m[1:]
>  will end up keeping N chained memoryviews on the heap. Currently only
>  the last memoryview remains alive.

This has an easy solution: if the parent is a non-owner memoryview, take a reference to its obj. This then keeps only buffer-owning view on the stack.

> Some other things:
>
> * why do you accept the ellipsis when indexing? what is it supposed to 
>   mean?

Same meaning as in Numpy. a[...] == a

> * please don't use #warning. Just put the TODO in a /* ... */.

Sure.

> * please no "#if PY_VERSION_HEX >= 0x03000000". Just make your source
>   py3k-compatible and we'll take care of backporting it to 2.x if your
>  patch is accepted ;)

Those are just so that the backporting would be easy. It's not on the stage of a serious patch yet :)
msg128495 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-02-13 14:19
> The problem in the current way is that the structure sent to
> `bf_releasebuffer` does not contain the same data as what was filled
> in by `bf_getbuffer`, and since the contents are dup-ed,
> `bf_releasebuffer` is called multiple times with the same data.

Hmm, there's a misunderstanding. bf_releasebuffer is called exactly once
for each call to bf_getbuffer. Of course, bf_getbuffer can be called
several times!

>  So, `bf_releasebuffer` cannot rely on (i) the data in Py_buffer being
> what `bf_getbuffer` put there,

Well, why should it rely on that?

> So, `bf_releasebuffer` cannot be used to release any resources
> allocated in `bf_getbuffer`.

AFAICT, it can. That's what the "internal" pointer is for:

        This is for use internally by the exporting object. For example,
        this might be re-cast as an integer by the exporter and used to
        store flags about whether or not the shape, strides, and
        suboffsets arrays must be freed when the buffer is released. The
        consumer should never alter this value.

(http://docs.python.org/dev/c-api/buffer.html#Py_buffer.internal)

> > Some worrying things here:
> > 
> > * memoryview_getbuffer() doesn't call the original object's getbuffer.
> >   This means that if I do:
> >         m = memoryview(some_object)
> >         n = memoryview(m)
> >         m.release()
> >   n ends up holding a buffer to some_object's memory, but some_object 
> >   doesn't know about it and can free the pointer at any time.
> 
> Good point. There are two possible solutions to this:
> 
> - Keep a count of how many buffers memoryview() has "exported", 
>   and do not allow memoryview.release() if some are active.

Where would that count be stored?

>   In a sense, this would be more in line with the PEP:
>   the PyMemoryViewObject would here act as an ordinary object
>   exporting some block of memory, and not do any magic tricks.

Well, that sounds wrong to me. The memoryview doesn't export anything;
the original object does.

>   It would guarantee that the buffers it has "exported" stay valid.

How would it, since it doesn't know the original object's semantics?

> - Add additional fields to `PyMemoryViewObject` for storing
>   new `strides`, `format`, and `shape` that override the stuff
>   in Py_buffer.
> 
>   This would allow for calling `PyObject_GetBuffer` for a second time.

Sounds better to me :)

> Calling PyObject_GetBuffer to get a new hold of a buffer needs some precautions, though. For example:
> 
>     >>> mem = memoryview(some_object)
>     # do some stuff
>     >>> mem2 = memoryview(some_object)
>     >>> assert mem.format == mem2.format  # not guaranteed

Well, if the original object decides to change the format between two
calls, then memoryview() should simply reflect that.

> > * same for _get_sub_buffer_index() and _get_sub_buffer_slice0().
> >  Actually, the whole concept of non-owner memoryviews seems flawed.
> 
> If the "parent" memoryview keeps its the memory valid as long as such
> sub-memoryviews exist, such concerns go away.

So release() wouldn't do what it claims to do? That sounds misleading.

> > Some other things:
> >
> > * why do you accept the ellipsis when indexing? what is it supposed to 
> >   mean?
> 
> Same meaning as in Numpy. a[...] == a

Yes, but why we would do that in core Python? a[...] might have a
meaning in NumPy, but it has none in core Python currently.
I'm not against it, but it would warrant discussion on python-dev.
msg128499 - (view) Author: Pauli Virtanen (pv) Date: 2011-02-13 14:35
> Hmm, there's a misunderstanding. bf_releasebuffer is called exactly
> once for each call to bf_getbuffer.

Wrong: http://bugs.python.org/issue7433

static int
memory_getbuf(PyMemoryViewObject *self, Py_buffer *view, int flags)
{
    int res = 0;
    CHECK_RELEASED_INT(self);
    if (self->view.obj != NULL)
        res = PyObject_GetBuffer(self->view.obj, view, flags);
    if (view)
        dup_buffer(view, &self->view);
    return res;
}

After this, PyBuffer_Release will be called twice: once on the data in *view, by whoever acquired the buffer from memoryview, and once on self->view, by memory_dealloc. Both with the same bit-by-bit content of the Py_buffer structure.

Because there are two Py_buffer structures here, setting view.obj to NULL in PyBuffer_Release does not guarantee correct calls to bf_releasebuffer.

Note that the view.internal pointer is also clobbered above.

> >  So, `bf_releasebuffer` cannot rely on (i) the data in Py_buffer
> > being what `bf_getbuffer` put there,
> 
> Well, why should it rely on that?

Because that makes implementing the exporter much easier. Also, writing an implementation for MemoryViewObject does not require clobbering the structure, and I doubt it helps much.

> > So, `bf_releasebuffer` cannot be used to release any resources
> > allocated in `bf_getbuffer`.
>
> AFAICT, it can. That's what the "internal" pointer is for.

Sure, guaranteeing that view->internal pointer is not toyed with would also be enough.

But the documentation should spell out very explicitly what the bf_releasebuffer call can rely on.
msg128503 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-02-13 15:02
> > Hmm, there's a misunderstanding. bf_releasebuffer is called exactly
> > once for each call to bf_getbuffer.
> 
> Wrong: http://bugs.python.org/issue7433

This is a different issue.

> static int
> memory_getbuf(PyMemoryViewObject *self, Py_buffer *view, int flags)
> {
>     int res = 0;
>     CHECK_RELEASED_INT(self);
>     if (self->view.obj != NULL)
>         res = PyObject_GetBuffer(self->view.obj, view, flags);
>     if (view)
>         dup_buffer(view, &self->view);
>     return res;
> }
> 
> After this, PyBuffer_Release will be called twice: once on the data in
> *view, by whoever acquired the buffer from memoryview
> , and once on self->view, by memory_dealloc.

PyObject_GetBuffer() is called twice too: once when creating the
memoryview, once when calling memory_getbuf.
So again, bf_getbuffer is called the same number of times as
bf_releasebuffer.

> Note that the view.internal pointer is also clobbered above.

Are you sure? memoryobject.c doesn't touch that pointer at all.

> > > So, `bf_releasebuffer` cannot be used to release any resources
> > > allocated in `bf_getbuffer`.
> >
> > AFAICT, it can. That's what the "internal" pointer is for.
> 
> Sure, guaranteeing that view->internal pointer is not toyed with would
> also be enough.
> 
> But the documentation should spell out very explicitly what the
> bf_releasebuffer call can rely on.

Yes.
msg128505 - (view) Author: Pauli Virtanen (pv) Date: 2011-02-13 15:09
[clip]
> This is a different issue.

It is the issue relevant for this discussion. As written in my comment: "So, `bf_releasebuffer` cannot rely on (i) the data in Py_buffer being what `bf_getbuffer` put there, and (ii) getting the same Py_buffer data only once."

> PyObject_GetBuffer() is called twice too: once when creating the
> memoryview, once when calling memory_getbuf.
> So again, bf_getbuffer is called the same number of times as
> bf_releasebuffer.

Yes, it is called twice, but with exactly the same data in Py_buffer.
So I would rather say that bf_releasebuffer is called twice on the Py_buffer returned by the first Getbuffer, and zero times for the buffer returned by the second one.

> > Note that the view.internal pointer is also clobbered above.
> 
> Are you sure? memoryobject.c doesn't touch that pointer at all.

dup_buffer does *dst = *src, which overwrites the view.internal pointer obtained from one GetBuffer call with a pointer obtained from a previous one.
msg128506 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-02-13 15:11
> dup_buffer does *dst = *src, which overwrites the view.internal
> pointer obtained from one GetBuffer call with a pointer obtained from
> a previous one.

Ah, ok, then it deserves fixing.
msg128508 - (view) Author: Pauli Virtanen (pv) Date: 2011-02-13 15:34
Ok, good, that diversion was then avoided :)

    ***

So, am I on the right track that there would not be many objections to clarifying the docs of Py_buffer spec by stating:

- For each buffer yielded by `bf_getbuffer`, `bf_releasebuffer`
  is called exactly once.

  Each `bf_releasebuffer` call is guaranteed to get the same
  view->internal pointer as filled in previously by the
  corresponding `bf_getbuffer`.

  All other fields in `Py_buffer` may be modified by the consumer,
  and `bf_releasebuffer` may not assume they contain valid data.

- The exporter of the buffer must ensure that apart from the contents
  of the memory pointed to by `buf`, the contents of the returned
  Py_buffer (format, strides, shape, etc.) remain unchanged.

If the latter is not true, it will lead to no end of trouble in e.g. multithreaded programs.

What about the more strict requirement:

- `bf_releasebuffer` is guaranteed that all fields except `obj`
  are not modified?

This could simplify implementation of exporters, and I suspect that MemoryViewObject is a special case, as most consumers probably would not want to make such modifications.

I can try preparing an updated patch based on some combination of the above, taking into account your comments.

BTW, should I take this discussion to Python-dev? So far, I kept it here, as this bug report seemed to be about general issues in the current implementation and spec.
msg128514 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-02-13 16:04
> - For each buffer yielded by `bf_getbuffer`, `bf_releasebuffer`
>   is called exactly once.
> 
>   Each `bf_releasebuffer` call is guaranteed to get the same
>   view->internal pointer as filled in previously by the
>   corresponding `bf_getbuffer`.
> 
>   All other fields in `Py_buffer` may be modified by the consumer,
>   and `bf_releasebuffer` may not assume they contain valid data.
> 
> - The exporter of the buffer must ensure that apart from the contents
>   of the memory pointed to by `buf`, the contents of the returned
>   Py_buffer (format, strides, shape, etc.) remain unchanged.

Yes! Can you write a patch that does these docs changes and also fixes
the internal pointer issue?

> What about the more strict requirement:
> 
> - `bf_releasebuffer` is guaranteed that all fields except `obj`
>   are not modified?

Well, if you manage to change the memoryview implementation for that,
then fine. But I don't know how you plan to manage ownership of fields
such as strides and friends.

> BTW, should I take this discussion to Python-dev? So far, I kept it
> here, as this bug report seemed to be about general issues in the
> current implementation and spec.

I don't think you need to. Nick and Mark follow this issue (and so does
Travis if he's still alive ;)).
msg128516 - (view) Author: Pauli Virtanen (pv) Date: 2011-02-13 19:08
Ok, here's a patch with some suggested documentation changes + the minimal changes in memoryview to make bf_releasebuffer behave as advertised. Probably shouldn't be applied as-is, though.

Problems remain. Suppose `bf_getbuffer` for memoryview is implemented so that it returns a view exported by the original object (and not the memoryview). Consider an example:

    >>> obj = PictureSet()       # exposes a buffer, say, to "pic_1"
    >>> a = memoryview(obj)      # points to pic_1

Here, `a` grabs a lock to pic_1.

    >>> obj.choose_picture(2)    # switches the exposed buffer to pic_2
    >>> b = memoryview(obj)      # points to pic_2
    >>> c = memoryview(a)

Here, PyObject_GetBuffer in bf_getbuffer for `a` grabs a lock to pic_2 and passes it on to `c`. The spec specifies no ways to get additional locks on pic_1.

    >>> a.release()

Now lock on pic_1 is released, and the buffer may be freed (also, if the handling of shape/stride arrays in memoryview is as it is now, also those are invalidated). One of the two is now true: `memoryview(a)` does not always describe the same area of memory as `a`, OR, `c` ends up with a lock on the wrong area of memory and the program hits a SEGV. 

[clip]
> >   In a sense, this would be more in line with the PEP:
> >   the PyMemoryViewObject would here act as an ordinary object
> >   exporting some block of memory, and not do any magic tricks.
> 
> Well, that sounds wrong to me. The memoryview doesn't export 
> anything; the original object does.

Having the memoryview "own" the exported buffer would be a simple solution to the above issue.

The alternative would be to adjust the spec so that the above type of behavior is forbidden (any buffers and the shape/stride arrays ever exported must remain valid until the last one is released). Not sure if it makes sense to do this if the only gain is a simplification in the implementation of memoryview.

> > It would guarantee that the buffers it has "exported" stay valid.
>
> How would it, since it doesn't know the original object's semantics?

The original object must guarantee that the buffers remain valid until released, as specified in the PEP. Of course, this requires that memoryview counts how many buffers it has exported, and allows release() only if the count is zero.
msg128517 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-02-13 19:31
> Problems remain. Suppose `bf_getbuffer` for memoryview is implemented
> so that it returns a view exported by the original object (and not the
> memoryview). Consider an example:
> 
>     >>> obj = PictureSet()       # exposes a buffer, say, to "pic_1"
>     >>> a = memoryview(obj)      # points to pic_1
> 
> Here, `a` grabs a lock to pic_1.
> 
>     >>> obj.choose_picture(2)    # switches the exposed buffer to
> pic_2

That's a very bad API to begin with. Anyone thinking it is reasonable to
have the concept of a "current element" in a container should write PHP
code instead. If you have individual pictures, please use individual
objects.

>     >>> b = memoryview(obj)      # points to pic_2
>     >>> c = memoryview(a)
> 
> Here, PyObject_GetBuffer in bf_getbuffer for `a` grabs a lock to pic_2
> and passes it on to `c`

So what? c will represent pic_2 anyway, so the behaviour is right. Awful
certainly for the user, but that's because PictureSet implements a
braindead API.

> , and the buffer may be freed (also, if the handling of shape/stride
> arrays in memoryview is as it is now, also those are invalidated). One
> of the two is now true: `memoryview(a)` does not always describe the
> same area of memory as `a`, OR, `c` ends up with a lock on the wrong
> area of memory and the program hits a SEGV. 

This all points to completely broken logic in the imaginary PictureSet
class. If someone doesn't get their semantics right, it is not the
buffer API's problem. bf_getbuffer and bf_releasebuffer must implement
symmetric operations; perhaps that's not explicitly worded, but it seems
quite obvious to me.

> The alternative would be to adjust the spec so that the above type of
> behavior is forbidden (any buffers and the shape/stride arrays ever
> exported must remain valid until the last one is released).

Why would you adjust the spec? This just stems from common sense.
If you provide a pointer to a consuming API, you have to ensure the
pointer remains valid as long as the consuming API is using that
pointer. Actually, that's *exactly* why there's a bf_releasebuffer
method at all.
msg128518 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-02-13 19:42
That said:

> Having the memoryview "own" the exported buffer would be a simple
> solution to the above issue.

If you can implement that without exhibing the issues we discussed above (e.g. O(N) memory consumption when doing repetitive slicing), then why not.

Also, thanks for the patch. Please refrain from spurious formatting changes, though.
msg128525 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2011-02-13 22:01
I'm still not comfortable with a convention that relies on *clients* of the PEP 3118 API not mucking with the internals of the Py_buffer struct.

I'm *much* happier with the rule based on malloc/free semantics where the *pointer* passed to PyObject_GetBuffer must match a single later call to PyObject_ReleaseBuffer. This is also a convention that should be quite familiar to any C/C++ developers using the API directly.

Note my comment earlier, pointing out that under those rules, dup_buffer actually does the right thing given the way it is used in the current implementation, but there's one current bug (issue 9990) where the wrong address is passed to the release buffer call - get_buffer is called with a stack address, the contents are copied to a heap address, then the heap address is used in the release_buffer call.
msg128526 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2011-02-13 22:12
As far as the question of re-exporting the underlying view or not goes, I agree having "memoryview(a)" potentially refer to different underlying memory from "a" itself (because the source object has changed since the first view was exported) is a recipe for confusion. It is also important for ensuring memoryview slicing works correctly.

From a didactic point of view, this would also have the benefit of providing a canonical example of managing exported Py_buffer structs in the CPython source code.

The "repeated slicing" issue is definitely a concern, though. If we declare memoryview objects officially immutable, then we can avoid that by shortcutting the chaining with a typecheck on the referenced object (i.e. call GetBuffer on the source object directly if it is a memoryview, but on the memoryview we're copying otherwise)
msg128527 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2011-02-13 22:20
I should note that the idea of using a Py_buffer's "internal" field as the unique ID rather than the Py_buffer's address does have some merit, but I still overall prefer a design that places fewer constraints on API consumers.
msg128528 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-02-13 22:33
> I'm still not comfortable with a convention that relies on *clients*
> of the PEP 3118 API not mucking with the internals of the Py_buffer
> struct.

Which clients? Those who export the buffer, or those who consume it?

> I'm *much* happier with the rule based on malloc/free semantics where
> the *pointer* passed to PyObject_GetBuffer must match a single later
> call to PyObject_ReleaseBuffer.

Agreed that Py_buffer should have been a PyObject from the start, but
the PEP chose differently.

We now have backwards compatibility constraints. What do we do with
PyMemoryView_FromBuffer? Also, there's probably some code out there that
likes to copy Py_buffers around.

> As far as the question of re-exporting the underlying view or not
> goes, I agree having "memoryview(a)" potentially refer to different
> underlying memory from "a" itself (because the source object has
> changed since the first view was exported) is a recipe for confusion.

If an object changes its buffer while it's exported somewhere, it will
always result in confusion for the user, regardless of how the
memoryview object is implemented. All normal uses of the buffer API
assume that the buffer's memory doesn't change while it's being accessed
by its consumer (what would it mean to SHA1-hash or zlib-compress a
changing piece of memory?).
So I don't know why the memoryview object *in particular* should care
about this.
msg128537 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2011-02-14 11:11
On Mon, Feb 14, 2011 at 8:33 AM, Antoine Pitrou <report@bugs.python.org> wrote:
>> I'm still not comfortable with a convention that relies on *clients*
>> of the PEP 3118 API not mucking with the internals of the Py_buffer
>> struct.
>
> Which clients? Those who export the buffer, or those who consume it?

Consumers. (I'll try to stick to provider/consumer terminology, as
that's clearer in this context

>> I'm *much* happier with the rule based on malloc/free semantics where
>> the *pointer* passed to PyObject_GetBuffer must match a single later
>> call to PyObject_ReleaseBuffer.
>
> Agreed that Py_buffer should have been a PyObject from the start, but
> the PEP chose differently.

malloc/free modelled semantics have *nothing* to do with Py_buffer
being a full PyObject in its own right. All I mean is that whatever
pointer you call ReleaseBuffer with should be the one you passed to
GetBuffer, and the only thing tp_releasebuffer implementations should
rely on is the address of that pointer rather than the struct
contents. However, from what Pauli has said, we may want to go with
the alternative approach of saying the struct address is irrelevant,
and only the content matter, using the "internal" field to
disambiguate different exported buffers. I believe either will work,
and either places additional constraints on buffer API consumers that
aren't currently clearly documented.

> We now have backwards compatibility constraints. What do we do with
> PyMemoryView_FromBuffer? Also, there's probably some code out there that
> likes to copy Py_buffers around.

Such code is likely to be broken regardless of how we clarify the
semantics, in the same way that our own dup_buffer is currently broken
under either set of semantics (i.e. calling ReleaseBuffer with a
different address in one case, clobbering the "internal" field in
other cases). We will probably need to expose an official Py_buffer
copying function that gets all the subtle details right so that
extension authors can more easily avoid making the same mistakes.

>> As far as the question of re-exporting the underlying view or not
>> goes, I agree having "memoryview(a)" potentially refer to different
>> underlying memory from "a" itself (because the source object has
>> changed since the first view was exported) is a recipe for confusion.
>
> If an object changes its buffer while it's exported somewhere, it will
> always result in confusion for the user, regardless of how the
> memoryview object is implemented. All normal uses of the buffer API
> assume that the buffer's memory doesn't change while it's being accessed
> by its consumer (what would it mean to SHA1-hash or zlib-compress a
> changing piece of memory?).
> So I don't know why the memoryview object *in particular* should care
> about this.

I'm not talking about an exported view changing its details (that's
explicitly disallowed by the PEP), I'm talking about the fact that
sequential calls to PyObject_GetBuffer are permitted to return
different answers. That's the point Pauli's PictureSet example
illustrated - even though the toy example uses a somewhat clumsy API,
it's perfectly legitimate according to the documentation, and it shows
that the current memoryview implementation may behave strangely when
you copy or slice a view of a mutable object, even though the view
itself is guaranteed to remain valid.

Consider the following:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
BufferError: Existing exports of data: object cannot be re-sized

Now suppose that instead of disallowing the resize, bytearray (or a
similar object) permitted it by allocating a new memory buffer, while
keeping a reference to the old buffer around until the memoryview
releases it (an approach that is perfectly legitimate according to the
PEP). In that case, our current "use the source object" approach to
memoryview copying and slicing will backfire badly, since copies and
slices will be working off the *new* (empty) state of the object,
while the original memoryview will still be looking at the old
populated state. I think Pauli's right, we need to make memoryview
re-exporting significantly smarter in order to cope correctly with
mutable objects.
msg128538 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2011-02-14 11:29
Another idea we may want to revisit is the PyManagedBuffer concept, which would be a true PyObject that existed solely to simplify sharing of Py_buffer structs.

If memoryview used such an object internally, then copying and slicing would be quite simple - just INCREF the managed buffer instance without letting the original source object know anything was going on.
msg128539 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2011-02-14 11:41
OK, to summarise that revisit into a coherent proposal:

1. Add "PyManagedBuffer" as an even lower level wrapper around Py_buffer. The only thing this would support is doing GetBuffer on construction and ReleaseBuffer when destroyed (or when explicitly asked to do so), as well as access to the information in the Py_buffer struct.

2. Adjust PyMemoryView to use that object to access the source object that implements the PEP 3118 API.

3. When copying or slicing memoryview objects, the new PyMemoryView instance acquires a reference to the existing PyManagedBuffer instance instead of creating a new one.

4. If memoryview.release() attempts to explicitly release the buffer, but there are additional references to the PyManagedBuffer object, the request will fail with a BufferError.

5. dup_buffer is deleted (with extreme prejudice)

6. 3rd party code is encouraged in the C API documentation to never ever copy Py_buffer structs around and to use PyManagedBuffer instead.
msg128540 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2011-02-14 12:02
On further reflection, I realised point 4 in that suggestion isn't reliable in a formal sense. Consider:

with memoryview(a) as m:
    m[:]

The slice won't reference the buffer again, but isn't guaranteed by the language definition to have been garbage collected at the time m.release() is called by the with statement.

Altering release() to simply decrement the reference count of the managed buffer would defeat the whole point of having that method, so it may be necessary to allow early release with outstanding references and then include a "still alive" check in the code that allows access to the buffer details (similar to the way weak references work).
msg128542 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-02-14 13:03
> >> I'm *much* happier with the rule based on malloc/free semantics where
> >> the *pointer* passed to PyObject_GetBuffer must match a single later
> >> call to PyObject_ReleaseBuffer.
> >
> > Agreed that Py_buffer should have been a PyObject from the start, but
> > the PEP chose differently.
> 
> malloc/free modelled semantics have *nothing* to do with Py_buffer
> being a full PyObject in its own right.

Well, the moment you're talking about having a struct of heterogeneous
fields whose address should remain identical, why not use the one
internal abstraction that is designed exactly for that, aka PyObject.
Sure you can devise an other ownership model, but what's the point?
I'm talking about avoiding NIH and an additional maintenance burden due
to avoidable cruft.

> > We now have backwards compatibility constraints. What do we do with
> > PyMemoryView_FromBuffer? Also, there's probably some code out there that
> > likes to copy Py_buffers around.
> 
> Such code is likely to be broken regardless of how we clarify the
> semantics,

Well, PyMemoryView_FromBuffer is used in the stdlib right now (in the IO
lib).

> I think Pauli's right, we need to make memoryview
> re-exporting significantly smarter in order to cope correctly with
> mutable objects.

Ok.

> 1. Add "PyManagedBuffer" as an even lower level wrapper around
> Py_buffer. The only thing this would support is doing GetBuffer on
> construction and ReleaseBuffer when destroyed (or when explicitly
> asked to do so), as well as access to the information in the Py_buffer
> struct.
> 
> 2. Adjust PyMemoryView to use that object to access the source object
> that implements the PEP 3118 API.

Ouch. Why not fold all that directly into memoryview? A third
abstraction for support of what is supposed to be a simple API sounds 
too much.
(see, if Py_buffer was a PyObject, we'd have only one abstraction)
msg128544 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2011-02-14 13:49
The managed buffer needs to be a separate object so that multiple memoryview objects can share it. Essentially, we would end up with two APIs - the raw Py_buffer API and the higher level PyManagedBuffer one.

C extensions would have the choice of using the low level API (most likely the case for existing extensions) or making life easier for themselves by using the higher level one. It is, at least from the consumer side, almost exactly what you suggested in the first place: a PEP 3118 style API where the main interface object is a true PyObject instance. Providers will still need to deal with correctly populating the passed in Py_buffer instances, but that can be handled easily enough by storing exported managed buffers in a Python list and storing indices in the Py_buffer internal field.

PyMemoryView itself isn't appropriate as that higher level API, since it does a lot more than just provide an easy way to share a Py_buffer struct (specifically, thanks to slicing, a memoryview instance may not expose the entire underlying buffer). Managed buffers would be much simpler, simply providing read-only access to the fields of the managed Py_buffer struct.

Depending on how we end up defining the required consumer semantics, PyMemoryView_FromBuffer may or may not be fully salvageable. If we find dup_buffer semantics that are universally correct, then we may be able to keep it in its current form by building those into an alternate ManagedBuffer constructor, otherwise we may need to restrict it to cases where the obj field is NULL.
msg128546 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2011-02-14 14:03
If it helps, one way to think of it is that post-redesign, the PyManagedBuffer would be the "real" API object, with Py_buffer merely a way for data exporters to provide the information needed to initialise the managed buffer properly. (That actually fairly accurately tracks the origin of the Py_buffer struct - to avoid huge numbers of output variables in the GetBuffer API calls)

Is such a solution ideal? No, but it would be significantly better than what we have now. If the PEP had been implemented as written, we would also have been much better off, but given that we failed to pick up the discrepancies between spec and implementation at the time, we're now stuck with trying to improve the situation without breaking backwards compatibility.

Since I now agree with your diagnosis that the heart of the problem is the fact that Py_buffer contains lots of pointers and thus is difficult to copy safely, wrapping it in a PyObject (without adding any additional behaviour) in order to minimise copying seems like the most obvious solution.
msg128551 - (view) Author: Pauli Virtanen (pv) Date: 2011-02-14 15:13
Nick's plan of action above seems mostly OK to me.

Though, it might be simpler to collapse the PyManagedBuffer object to PyMemoryView, even if this requires memoryviews to serve two purposes.

[Nick:]
> I'm still not comfortable with a convention that relies on *clients*
> of the PEP 3118 API not mucking with the internals of the Py_buffer
> struct.

Some points against: (i) Having to look up keys by memory address from an additional PyDict is more work for the exporter than just passing around some PyMem_Malloc'd information in `internal`. (ii) There is already an "obj" field in the structure that the consumers are supposed to not mess with. (iii) The exporter cannot use the `internal` field for anything if bf_releasebuffer cannot rely on it being intact.

If the recommended consumer API is changed so that Py_buffer mainly sits inside a PyObject, it becomes more clear that Py_buffer is read-only for the consumer (-- which is what I believe the PEP intended, but did not write down).

[Nick:]
> Altering release() to simply decrement the reference count of the 
> managed buffer would defeat the whole point of having that method, so
> it may be necessary to allow early release with outstanding references
> and then include a "still alive" check in the code that allows access
> to the buffer details (similar to the way weak references work).

Early release does not seem possible if the buffer does not come from the original object:

    lst = []
    with memoryview(a) as m:
        b = numpy.array(m)
        lst.append(b)

Now, m.__exit__ cannot release the buffer, since `b` holds a buffer-interface lock to `m`. `b` is 3rd party code, and does not know anything about MemoryViews.

Some alternatives: (i) require that bf_getbuffer always gives a new lock on all exported buffers, if there are multiple, (ii) allow memoryview.__exit__ to fail silently, (iii) drop the context management.

I guess (i) would be a sane choice -- I don't see many use cases for the same object exporting multiple different buffers. It's not needed by Numpy, and I suspect there is no existing 3rd party code that relies on this (because it doesn't work with the current implementation of memoryview :)
msg128878 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2011-02-19 19:14
Sorry, I'm unassigning myself from this;  I'm still following the issue, but just don't forsee having time to work on it at all.
msg139162 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2011-06-26 10:30
[Nick]
> Another idea we may want to revisit is the PyManagedBuffer concept,
> which would be a true PyObject that existed solely to simplify sharing
> of Py_buffer structs.

> If memoryview used such an object internally, then copying and slicing
> would be quite simple - just INCREF the managed buffer instance without
> letting the original source object know anything was going on.


I think this is the nicest solution since memoryview would then always 
have a proper base object. Do I understand correctly that PyManagedBuffer
should only handle 1-dimensional objects?


There is an additional point about slicing and sub-views:

I think slicing (esp. multidimensional slicing) would be greatly simplified
if we added a requirement for the *exporting* object to provide a sliced
view. (The same applies to sub-views, also see source comments below [1]).

For example, an exporting object could provide a sliced view by adding 
a getslicedbufferproc to PyBufferProcs:

int PyObject_GetSlicedBuffer(PyObject *obj, Py_buffer *view, int flags, PyObject *key);


By "sliced view" I mean that the exporting object changes buf, shape and 
strides. There are several advantages:
 
  o The invariant that all allocated memory in the buffer belongs
    to the exporting object remains intact.

  o The responsibility for creating correct multidimensional sliced views
    is shifted to the implementor of the exporting object.





[1] memoryobject.c: 
 
      /* XXX There should be an API to create a subbuffer */
      /* XXX:  This needs to be fixed so it actually returns a sub-view */
msg139172 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2011-06-26 12:31
The idea of PyManagedBuffer is for it to be an almost completely passive object that *just* acts as a refcounted wrapper around the Py_buffer structure, so it doesn't care about the actual contents. The only supplemental functionality I think it should provide is to disallow explicitly releasing the buffer while the reference count is greater than 1. I'm OK with my example cited above being unreliable. The correct way to write such code would then be:

  with memoryview(obj) as m:
    with m[:] as m2:
      ...

I think separating the concerns this way, letting PyManagedBuffer worry about the lifecycle issues of the underlying buffer reference, while PyMemoryView deals with the *interpretation* of the buffer description (such as by providing useful slicing functionality) will make the whole arrangement easier to handle. When a memoryview is sliced, it would create a new memoryview that has a reference to the same PyManagedBuffer object, but different internal state that affects how that buffer is accessed. This is better than requiring that every implementor of the buffer API worry about the slicing logic - we can do it right in memoryview and then implementers of producer objects don't have to worry about it.

Currently, however, memoryview gets tied up in knots since it is trying to do everything itself in a way that makes it unclear what is going on. The semantics of copying the Py_buffer struct or of accessing the PEP 3118 API on the underlying object when slicing or copying views are demonstrably broken. If we try to shoehorn reference counting semantics into the current object model, we would end up with two distinct modes of operation for memoryview:

  Direct: the view is directly accessing an underlying object via the PEP 3118 API
  Indirect: the view has a reference to another memoryview object that it is using as a data source

That's complicated - hard to implement in the first place and hard to follow when reading the code. Adding the PyManagedBuffer object makes the object model more complex, but simplifies the runtime semantics: every memoryview instance will access a PyManagedBuffer object which takes care of the underlying PEP 3118 details. Direct use of the PEP 3118 consumer API in 3rd party code will also be strongly discouraged, with PyManagedBuffer promoted as the preferred alternative (producers, of course, will still need to provide the raw Py_buffer data that PyManagedBuffer exposes).

At the Python level, I don't think it is necessary to expose a new object, so we can stick with Antoine's preferred model where memoryview is the only public API. My proposed new PyManagedBuffer object would just be about making life easier at the C level.
msg139173 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2011-06-26 12:35
I should also note that if memoryview ends up growing enough state to cope correctly with numpy-style multi-dimensional slicing, I'm actually OK with that. You could get a long way just using the array module to allocate a large chunk of memory and then a suitably enhanced memoryview to manipulate it as a multidimensional array.

That's a future concern, though - for now, the key task is to migrate to reliable, reference based semantics for Py_buffer management.
msg139256 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2011-06-27 10:32
Nick, you know a lot about this issue and I'm probably missing many things
here. I misunderstood your concept of PyManagedBuffer, so my previous
posting might have been hard to understand.

I'd appreciate if you (or anyone in this thread) could comment if the
following would work *in theory*, even if you are against an additional
getslicedbufferproc:


As I understand, there are two major issues that complicate the code:

  1) the copying in PyMemoryView_FromBuffer()
  2) slicing


To address 1), I wanted to create new memoryview objects exclusively
from proper base objects that implement the buffer protocol. So the
plan was to create small wrapper object inside PyMemoryView_FromBuffer()
that handles enough of the buffer protocol to be usable inside the stdlib
for one-dimensional objects. The actual memoryview would then be created
by calling PyMemoryView_FromObject() on that wrapper.

[PyMemoryView_FromObject() would then obviously not call PyMemoryView_FromBuffer(),
but would create the view directly.]


To address 2), buffers would *always* have to be filled in by the original
exporting object, hence the proposal to add a getslicedbufferproc.

Then memoryview would always have a proper base object and could always call
getbuffer()/INCREF(base) and releasebuffer()/DECREF(base). I thought this would
make the code much cleaner.


> Direct: the view is directly accessing an underlying object via the PEP 3118 API
> Indirect: the view has a reference to another memoryview object that it is using
>           as a data source

Is there still a difference if only the original base object manages buffers
and they are never copied?


> This is better than requiring that every implementor of the buffer API
> worry about the slicing logic - we can do it right in memoryview and then
> implementers of producer objects don't have to worry about it.

I'm not sure, but my reasoning was that e.g. in numpy the slicing logic
is already in place. Then again, I don't know if it is a legitimate use
of buf, shapes and strides to implement slicing.

According to this mail, slicing information was supposed to be
part of the memoryview struct:

http://mail.python.org/pipermail/python-dev/2007-April/072584.html
msg139257 - (view) Author: Pauli Virtanen (pv) Date: 2011-06-27 11:28
skrah writes:
> I think slicing (esp. multidimensional slicing) would be greatly
> simplified if we added a requirement for the *exporting* object
> to provide a sliced view. (The same applies to sub-views, also
> see source comments below [1]).
>
> For example, an exporting object could provide a sliced view by adding 
> a getslicedbufferproc to PyBufferProcs:
>
> int PyObject_GetSlicedBuffer(PyObject *obj, Py_buffer *view, 
>                              int flags, PyObject *key);

The same thing can be done via

    PyObject_GetBuffer(obj, &view, flags);
    PyBuffer_Slice(&view, &sliced_view, flags, key);

given an implementation of PyBuffer_Slice. The logic in PyBuffer_Slice does not depend on where the buffer comes from, and every buffer can be sliced.

As far as I see, the advantage of `getslicedbufferproc` would be to make the implementation of PyMemoryView simpler, but not much else. In my view, having each exporter implement the same logic by itself would only be an unnecessary burden.

>  o The invariant that all allocated memory in the buffer belongs
>    to the exporting object remains intact.

Numpy arrays do not have this invariant, and they happily re-export memory owned by someone else. This is one root of problems here: the PEP implicitly assumes that re-exporting buffers (e.g. memoryview's implementation of `getbuffer`) is done in the way Numpy does it. Because of this, there is no mechanism for "incrementing the refcount" of an existing buffer export. Maintaining the above invariant then unavoidably leads to strange behavior in corner cases (which probably are very rare, as mentioned above), and as happened here, make the implementation messy and lead to bugs.

The invariant *is* required for guaranteeing that `memoryview.release()` always succeeds. Such a method probably wasn't foreseen in the PEP (and I did not remember that it existed in my first patch), as Numpy arrays don't have any equivalent. The alternatives here are (i) do as Numpy does and give up the invariant and allow `.release()` to fail in some cases, or (ii) document the corner cases in the interface spec and try to detect them and fail if they occur. Which of these is chosen probably does not matter much in practice, but having PyManagedBuffer will make implementing either choice easier.
msg139266 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2011-06-27 13:17
I'll try to do a summary of the conversation so far, since it's quite long and hard to follow.

The basic issue is that memoryview needs to support copying and slicing that creates a new memoryview object. The major problem with that is that the PEP 3118 semantics as implemented operate in such a way that neither copying the Py_buffer struct *nor* requesting a new copy of the struct from the underlying object will do the right thing in all cases. (According to the PEP *as written* copying probably should have been OK, but the implementation doesn't match the PEP in several important respects such that copying is definitely wrong in the absence of tight control of the lifecycles of copies relative to the original).

Therefore, we either need to redesign the buffer export from memoryview to use daisy chaining (such that in "m = memoryview(obj); m2 = m[:]; m3 = m2[:]" m3 references m2 which references m which in turn references obj) or else we need to introduce an internal reference counted object (PyManagedBuffer) which allows a single view of an underlying object to be safely shared amongst multiple clients (such that m, m2 and m3 would all reference the same managed buffer instance which holds the reference to obj). My preference is strongly for the latter approach as it prevents unbounded and wasteful daisy chaining while also providing a clean, easy to use interface that will make it easier for 3rd parties to write PEP 3118 API consumers (by using PyManagedBuffer instead of the raw Py_buffer struct).

Once that basic lifecycle problem for the underlying buffers is dealt with then we can start worrying about other problems like exporting Py_buffer objects from memoryview instances correctly. The lifecycle problem is unrelated to the details of the buffer *contents* though - it's entirely about the fact that clients can't safely copy all those pointers (as some may refer to addresses inside the struct) and asking the original object for a fresh copy is permitted to return a different answer each time.

The actual *slicing* code in memoryview isn't too bad - it just needs to use dedicated storage rather than messing with the contents of the Py_buffer struct it received from the underlying object. Probably the easiest way to handle that is by having the PyManagedBuffer reference be in *addition* to the current Py_buffer struct in the internal state - then the latter can be used to record the effects of the slicing, if any. Because we know the original Py_buffer struct is guaranteed to remain alive and unmodified, we don't need to worry about fiddling with any copied pointers - we can just leave them pointing into the original structure.

When accessed via the PEP 3118 API, memoryview objects would then export that modified Py_buffer struct rather than the original one (so daisychaining would be possible, but we wouldn't make it easy to do from pure Python code, as both the memoryview constructor and slicing would give each new memoryview object a reference to the original managed buffer and just update the internal view details as appropriate.

Here's the current MemoryView definition:

typedef struct {
    PyObject_HEAD
    Py_buffer view;
} PyMemoryViewObject;

The TL;DR version of the above is that I would like to see it become:

typedef struct {
    PyObject_HEAD
    PyManagedBuffer source_data; // shared read-only Py_buffer access
    Py_buffer view;  // shape, strides, etc potentially modified
} PyMemoryViewObject;

Once the internal Py_buffer had been initialised, the memoryview code actually wouldn't *use* the source data reference all that much (aside from eventually releasing the buffer, it wouldn't use it at all). Instead, that reference would be retained solely to control the lifecycle of the original Py_buffer object relative to the modified copies in the various memoryview instances.

Does all that make my perspective any clearer?
msg139314 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2011-06-27 16:47
Pauli Virtanen <report@bugs.python.org> wrote:
> skrah writes:
> > For example, an exporting object could provide a sliced view by adding 
> > a getslicedbufferproc to PyBufferProcs:
> >
> > int PyObject_GetSlicedBuffer(PyObject *obj, Py_buffer *view, 
> >                              int flags, PyObject *key);
> 
> The same thing can be done via
> 
>     PyObject_GetBuffer(obj, &view, flags);
>     PyBuffer_Slice(&view, &sliced_view, flags, key);
> 
> given an implementation of PyBuffer_Slice. The logic in PyBuffer_Slice does
> not depend on where the buffer comes from, and every buffer can be sliced.

Ok, that sounds good. I came across a comment that base objects can change
their memory layout:

http://mail.python.org/pipermail/python-dev/2007-April/072606.html

Is that something that must be taken care of?

> >  o The invariant that all allocated memory in the buffer belongs
> >    to the exporting object remains intact.
> 
> Numpy arrays do not have this invariant, and they happily re-export memory
> owned by someone else.

I'm not sure if we use the same terminology. By "exporting object" I meant
the original base object and this is the invariant I wanted:

   m1 = memoryview(base) # directly from base
   m2 = memoryview(m1)   # redirects getbuffer/releasebuffer to base
   m3 = memoryview(m2)   # redirects getbuffer/releasebuffer to base
   s1 = m3[1::2, 1::2]   # redirects getslicedbuffer/releasebuffer to base

That's also what you mean by "re-exporting", right?
msg139316 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2011-06-27 17:17
Nick Coghlan <report@bugs.python.org> wrote:
[Snip liberally]
> The lifecycle problem is unrelated to the details of the buffer *contents*
  though - it's entirely about the fact that clients can't safely copy all
  those pointers (as some may refer to addresses inside the struct) and asking
  the original object for a fresh copy is permitted to return a different answer
  each time.

> The actual *slicing* code in memoryview isn't too bad

I promise that I'll keep quiet about the getslicedbufferproc from now on, since 
there isn't much enthusiasm. :)

The reason I kept mentioning it was that I thought it would eliminate the
need to copy anything at all. All buffers would come from a single, memory
owning base object.

> Does all that make my perspective any clearer?

Yes, thank you. The tricky part is to understand why always redirecting
getbuffer/releasebuffer to the underlying *original base object* is not
sufficient, but as I understood Pauli's last posting that is due to the
addition of the release() method.
msg139317 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-06-27 17:21
Le Mon, 27 Jun 2011 13:17:57 +0000,
Nick Coghlan <report@bugs.python.org> a écrit :
> 
> The TL;DR version of the above is that I would like to see it become:
> 
> typedef struct {
>     PyObject_HEAD
>     PyManagedBuffer source_data; // shared read-only Py_buffer access
>     Py_buffer view;  // shape, strides, etc potentially modified
> } PyMemoryViewObject;

Looks ok to me.
(well, acceptable anyway)
msg139320 - (view) Author: Lenard Lindstrom (kermode) Date: 2011-06-27 18:14
Using Python reference counting and the garbage collector to control 
when PyBuffer_Release is called has problems. First, it assumes the 
CPython interpreter will always use reference counting. Second, 
PyBuffer_Release may never get called if circular references exist. 
Third, an exception could keep a memoryview object alive in a traceback, 
delaying the PyBuffer_Release call. Finally, it does not play well with 
the memoryview __enter__, __exit__, and release methods. It makes Python 
level context management pointless; instead, just del the memoryview 
instance and hope the garbage collector cooperates. For the old buffer 
protocol and the Numpy array interface, resources have to be released in 
an object's destructor at garbage collection time. There is no other 
choice. If that also becomes the case for the new buffer protocol, then 
there is little incentive to migrate to it.
msg139323 - (view) Author: Pauli Virtanen (pv) Date: 2011-06-27 18:36
Lenard Lindstrom writes:
> Using Python reference counting and the garbage collector to control 
> when PyBuffer_Release is called has problems.

That's not what's being suggested. The refcounting discussed here is an implementation detail internal to memoryview, and does not affect the possibility to implement `release()` and context management reliably.
msg139325 - (view) Author: Lenard Lindstrom (kermode) Date: 2011-06-27 20:03
It would if the proposed PyManagedBuffer only releases the Py_buffer 
struct - calls PyBuffer_Release - when its Python reference count goes 
to zero. So a separate reference count will be maintained instead.
msg139341 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2011-06-28 00:58
memoryview.release() will raise an exception if you call it when the underlying PyManagedBuffer instance has a refcount > 1. So the explicit memory management still works, and if you mess it up by leaving dangling references around you'll run the risk of getting an exception (CPython will mostly cleanly things up for you due to the refcounting, but we're well used to CPython users getting away with that kind of bug and being surprised when they port to a fully garbage collected implementation like PyPy, IronPython or Jython).

It isn't pretty, but it's the most practical way to cope with the lifecycle issues. The other alternative is to obediently force release of the buffer and then throw exceptions on subsequent *access*, the way we do with files. I'm less in favour of that approach, since it results in additional runtime overhead as you have to keep checking the state validity and I think it generates the exception in the wrong place - the exception should be triggered where the implicit assertion that "all other references are gone and this object should be released right now" has been violated, not at some later random location where the object happens to be used. (I'm also swayed by the backwards compatibility argument: we can easily move from "refcount must be 1" enforcement to the laissez-faire approach used by files. You can't go in the other direction without the potential to break working code)

The reason redirecting all requests to the underlying object doesn't work is because repeated calls to getbuffer on mutable objects are allowed to return *different* answers. Assume we have a mutable array type that allows changes while memory is exported by keeping the old buffer around until all references are released. Then we want the following behaviour:

  a = mutable_byte_array(100) # 100 byte array
  m1 = memoryview(a) # View of original 100 byte array
  a.resize(0) # Array is now zero length, but old buffer is still valid
  m2 = m1[:]
  m3 = memoryview(m1)
  # Both m2 and m3 should provide access to the *original* 100 byte
  # array, not to the now empty *current* array
  # Having len(m2) and len(m3) differ from len(m1) because the
  # original source array had changed would be incredibly confusing
  # When m1, m2 and m3 all go away then the original array will be
  # released

The builtin bytearray avoids this issue by simply disallowing mutation while a buffer is exported, but memoryview needs to cope with arbitrary third party objects which may behave like the hypothetical mutable_byte_array().
msg139747 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2011-07-04 10:28
In order to have a basis for discussion, I've set up a repo at

    http://hg.python.org/features/pep-3118#memoryview

with an implementation of PyManagedBuffer. The whole test suite
passes, also with refleak counting and Valgrind.


Review is most welcome. If you think that this is roughly what
PyManagedBuffer is supposed to look like, I can add some tests
and documentation.


A couple of remarks:

  o I have used refcounting assumptions that appear to be valid
    for the Python codebase, but that need to be documented.

  o The docs state that if an underlying object supports writable
    views, the memoryview will be readable and writable. I use this
    in PyManagedBuffer_FromObject(). In fact there are tests that
    write to the original exporting object.

  o Releasing a view raises if more than one view is active.

  o get_shape0() is used in memory_length(). This does not seem
    correct if ndim > 1.

  o memory_getbuf() is still not a real getbufferproc, since it
    disregards almost all flags.

  o PyMemoryView_GetContiguous() needs to create a view with an
    underlying PyManagedBuffer. If the obj parameter is already
    a memoryview, the existing PyManagedBuffer has to be used,
    so I just do a check if the requested buffertype agrees with
    view->readonly.

    It would be possible to complicate the code further by having
    a PyMemoryView_FromObjectAndFlags().

  o In the non-contiguous case, PyMemoryView_GetContiguous() creates
    a new buffer, but the returned view does not use that buffer.
msg139748 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2011-07-04 10:42
Nick Coghlan <report@bugs.python.org> wrote:
> The reason redirecting all requests to the underlying object doesn't work
> is because repeated calls to getbuffer on mutable objects are allowed to
> return *different* answers. Assume we have a mutable array type that allows
> changes while memory is exported by keeping the old buffer around until all
> references are released. Then we want the following behaviour:
> 
> The builtin bytearray avoids this issue by simply disallowing mutation while
> a buffer is exported, but memoryview needs to cope with arbitrary third party
> objects which may behave like the hypothetical mutable_byte_array().

I find this behavior quite sane. Would it not be an option to make this
a requirement in the PEP? As far as I can see, Numpy also disallows
mutations on exported buffers.
msg139751 - (view) Author: Pauli Virtanen (pv) Date: 2011-07-04 10:59
@skrah:

Yes, Numpy exposes only a single buffer per object. Making this a requirement in the PEP would probably be a sane change, as there is probably little real-world need to allow it behave otherwise.

Comment on the patch: it seems you do not track the re-export count in memory_getbuf:

    a = memoryview(obj)
    b = numpy.asarray(a)
    a.release()
    b[0] = 123 # <-- BOOM: the buffer was already released

Could be fixed by Py_INCREF(self->mbuf) in getbuffer and DECREF in releasebuffer. In this design, the only choice is to make the `release()` call to fail. (I had some code for n-dim slicing etc. in my first patch that could be useful to have too; I'll see if I find time to dig them out here.)
msg139758 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2011-07-04 12:21
Pauli Virtanen <report@bugs.python.org> wrote:
> Comment on the patch: it seems you do not track the re-export count in memory_getbuf:
> 
>     a = memoryview(obj)
>     b = numpy.asarray(a)
>     a.release()
>     b[0] = 123 # <-- BOOM: the buffer was already released

Could you give the exact sequence of events (including the creation of obj)?

For me this works:

Python 3.3a0 (memoryview:bbe70ca4e0e5+, Jul  4 2011, 13:55:55) 
[GCC 4.4.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy
>>> obj = bytearray(b'123456789')
>>> a = memoryview(obj)
>>> b = numpy.asarray(a)
>>> a.release()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
BufferError: several memoryviews are active
>>> b[0] = 123
>>> b
array([123,  50,  51,  52,  53,  54,  55,  56,  57], dtype=uint8)
>>> del a
>>> b[0] = 224
>>> b
array([224,  50,  51,  52,  53,  54,  55,  56,  57], dtype=uint8)

> (I had some code for n-dim slicing etc. in my first patch that could be
>  useful to have too; I'll see if I find time to dig them out here.)

That would be nice.
msg139760 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2011-07-04 12:27
It might be worth postponing the multi-dimensional support to a second patch, though. If we can get the buffer lifecycle solid first then that provides a better foundation for any further development.
msg139762 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2011-07-04 12:30
As far as the rule of disallowing shape changes while a buffer is exported, I actually think that's a more sane approach as well. However, I've been burned enough times by going "nobody would be insane enough to rely on that, would they?" that I'm seriously reluctant to impose additional backwards incompatible rules on a published spec when it isn't *too* difficult to adjust the infrastructure to better handle the existing complexity.
msg139764 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2011-07-04 13:00
[I agree that multi-dimensional support should not be part of this
 patch. I was thinking about creating a separate branch for that.]

Nick Coghlan <report@bugs.python.org> wrote:
> As far as the rule of disallowing shape changes while a buffer is exported, 
> I actually think that's a more sane approach as well. However, I've been 
> burned enough times by going "nobody would be insane enough to rely on that, 
> would they?" that I'm seriously reluctant to impose additional backwards 
> incompatible rules on a published spec when it isn't *too* difficult to 
> adjust the infrastructure to better handle the existing complexity.

Yes, I understand. However, Pauli said earlier:

"I don't see many use cases for the same object exporting multiple 
 different buffers. It's not needed by Numpy, and I suspect there is 
 no existing 3rd party code that relies on this (because it doesn't 
 work with the current implementation of memoryview :)"


That's why I thought that no one could be using this feature at present.


The main problem that have with PyManagedBuffer is that the capabilities
of the original underlying buffer (flags) *at the time of the export* 
aren't stored anywhere.

This makes it hard to respond to specific queries in memory_getbuf()
and PyMemoryObject_GetContiguous(). You can't query the original
object again since it might have changed.

I suppose that an existing memoryview could recompute its own
capabilities by using PyBuffer_IsContiguous() and friends. It
would be much easier though if the original flags were stored
in the buffer by PyObject_GetBuffer().
msg139766 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2011-07-04 13:26
Nice work with the patch Stefan - I've added a few review comments (including a suggestion on how to deal with the GetContiguous problem).

One idea that review did prompt is that if we aren't going back to the original object for fresh buffer requests, perhaps we should expose the underlying object as a read-only property of the memoryview objects. That way if the original view is unsuitable (e.g. read-only when a writable buffer is needed) it should be straightforward to go back to the underlying object to request something different.

Another question Stefan raised is whether or not PyMemoryView_FromObject should accept a "flags" argument that it passes on to the underlying "getbuffer" call. I'm inclined to say yes - ideally I'd like to *only* expose PyMemoryView and PyManagedBuffer (rather than GetBuffer/ReleaseBuffer) for the limited API, which means accepting the flags argument for the two higher level interfaces.
msg139767 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2011-07-04 13:29
Is there anything stopping us just storing the flags on PyManagedBuffer? It's OK if the construction API requires the flag information in addition to the Py_buffer struct.
msg139775 - (view) Author: Pauli Virtanen (pv) Date: 2011-07-04 14:36
@skrah:

Ahh, this always happens when I don't run it :)  But my point stands -- the reason why it does not crash with Numpy is that Numpy calls PyMemoryView_FromObject to obtain a new memoryview and then uses PyMemoryView_GET_BUFFER. Along this code path the refcount of self->mbuf gets properly incremented, as it's explicitly done in PyMemoryView_FromObject. However, if you have a custom object Foo which just calls `PyObject_GetBuffer`, and then do the same sequence

    a = memoryview(whatever)
    b = Foo(a)  # --> only grabs the buffer with PyObject_GetBuffer
    a.release() # --> will invalidate the buffer
    b.mogrify_buffer_contents()  # --> boom

Here, the buffer is released too early, as the refcount of `self->mbuf` is not incremented during the `PyObject_GetBuffer` call.

> Is there anything stopping us just storing the flags on
> PyManagedBuffer?

Slicing memoryviews can invalidate the contiguity flags, and no-strides flags, so some checks are still probably needed in `memory_getbuf`.

> It's OK if the construction API requires the flag
> information in addition to the Py_buffer struct. 

This would be useful, yes.
msg139826 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2011-07-05 02:43
On Tue, Jul 5, 2011 at 12:36 AM, Pauli Virtanen <report@bugs.python.org> wrote:
> Slicing memoryviews can invalidate the contiguity flags, and no-strides flags, so some checks are still probably needed in `memory_getbuf`.

That makes sense, so just as memoryview has its own shape and strides
data, it will also need its own flags data that *starts* as a copy of
that in the original buffer, but may be subsequently modified.
msg139834 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2011-07-05 08:39
Nick, Pauli, thanks for all the comments. I'm busy implementing the easy changes; then it'll be easier to deal with the flags issues.


Pauli: 
 
Does numpy use the (undocumented) smalltable array in the Py_buffer
structure? We would like to drop it.
msg139846 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2011-07-05 10:49
I've uploaded a revised version that addresses several suggestions. I think 
we have agreement on those now:

  - Officially ditch smalltable.

  - Comment static storage fields inside PyMemoryViewObject.

  - Improve refcounting in PyMemoryView_FromBuffer()/PyMemoryView_FromObject().

  - Increment mbuf refcount in memory_getbuf().

  - Create separate sections for managedbuffer and memoryview.


Still open:

  - Update documentation.

  - Should PyManagedBuffer be private to this file? Do we need mbuf_new()?

  - Add test to _testcapimodule.c. I wrote a small test for the problematic
    case in PyMemoryView_GetContiguous(), and it indeed returns an unaltered
    view. I suggest that we leave the NotImplementedError for now and handle
    that in a separate issue.

  - Flag handling.
msg139856 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2011-07-05 11:44
I'm slightly confused about the implication chain in the flags. PyBUF_STRIDES 
seem to allow for discontiguous arrays, yet STRIDES -> ND -> C_CONTIGUOUS.



                                 PyBUF_FULL[_RO]
                                       |
    PyBUF_INDIRECT -------------- PyBUF_FORMAT ----------[PyBUF_WRITABLE]         
          |
    PyBUF_STRIDES (This would be used when the consumer can handle strided, discontiguous arrays ...)
          |
      PyBUF_ND  <-> PyBUF_CONTIG (why?) 
          |
  PyBUF_C_CONTIGUOUS (... but the implication chain leads us to a contiguous buffer)
msg139859 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-07-05 11:49
> I'm slightly confused about the implication chain in the flags. PyBUF_STRIDES 
> seem to allow for discontiguous arrays, yet STRIDES -> ND -> C_CONTIGUOUS.

To be honest I have never understood anything about these flags, and I
doubt anyone without a numpy background would.
msg139861 - (view) Author: Pauli Virtanen (pv) Date: 2011-07-05 12:11
The flags don't seem to be meant to describe the properties of the buffer, only what the exporter is required to fill in. STRIDES does not imply necessarily discontinuous, only that the `strides` field is present. The C_/F_/ANY_CONTIGUOUS flags imply that the memory layout of an n-dim array is C/Fortran/either contiguous. Why these flags imply STRIDES is probably to make the result unambiguous, and because typically when dealing with n-d arrays you usually need to know the strides anyway. `NULL` `strides` implies C-contiguous, so the CONTIG flag does not imply STRIDES (no idea why it's different from PyBUF_C_CONTIGUOUS).
msg139865 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2011-07-05 12:47
It took me a bit of thinking, but I figured out why the "contiguous" flags imply STRIDES. A quick recap of all the flags:

WRITABLE -> error if can't support write access

FORMAT -> request format info in Py_buffer struct. Should never error, but report unsigned bytes if not requested

ND -> requests shape info in Py_buffer struct. Report 1 dimensional if not requested. Error if data is not C contiguous (as STRIDES is required to handle any non-C contiguous cases).

STRIDES -> requests shape and stride info. Error if correct buffer access requires stride support and this flag is not passed.

C_CONTIGUOUS/F_CONTIGUOUS/ANY_CONTIGUOUS -> variants that also request shape and stride info but are limited to handling C contiguous memory, Fortran contiguous memory or either.

INDIRECT -> requests shape and suboffset info. Error if correct buffer access requires suboffset support and this flag is not passed.

So, to address the specific confusion, the basic "STRIDES" request just says "give me the strides info" and I can deal with whatever you give me. The "CONTIGUOUS" variants say "give me the strides info, but I can only cope with certain layouts, so error if you can't provide them". "ND" is a way to say "I can copy with multiple dimensions, but only the C version without using strides info"

Suppose we have a 3x4 array of unsigned bytes (i.e. 12 bytes of data). In C format, the strides info would be [4, 1] (buf[0][0] and buf[0][1] are adjacent in memory, while buf[0][0] and buf[1][0] are 4 bytes apart). In FORTRAN format that layout is different, so the strides info would be [1, 3] (and now buf[0][0] and buf[1][0] are adjacent while buf[0][0] and buf[0][1] are 3 bytes apart).

The difference between ND and C_CONTIGUOUS is that the latter asks for both the shape and strides fields in the Py_buffer object to be populated while the former only requests shape information.
msg139866 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2011-07-05 12:49
At least, that's the explanation based on the PEP - not sure where "CONTIG" as an alias for ND (N-dimensional) comes from. But then, smalltable was an undocumented novelty, too :)
msg139868 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2011-07-05 12:59
To address the "should PyManagedBuffer be public?" question: yes, I think so.

Given the amount of grief the raw PEP 3118 API has caused the memoryview implementation, I expect the easier lifecycle management provided by the PyObject based API may also help 3rd parties.
msg139870 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2011-07-05 13:13
Regarding the Reitveld cc field: I tend not to add anyone to that and instead post comments to the tracker item to say that I've finished a review in Reitveld. If people want to see details they can go look at the review itself (or remove themselves from the bug nosy list if they have genuinely lost interest).
msg139871 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-07-05 13:42
> Regarding the Reitveld cc field: I tend not to add anyone to that and
> instead post comments to the tracker item to say that I've finished a
> review in Reitveld. If people want to see details they can go look at
> the review itself (or remove themselves from the bug nosy list if they
> have genuinely lost interest).

Be aware the Rietveld integration is buggy: for example, I got no
notification of the current reviews. So it's better to post a message
mentioning the review anyway.
msg139872 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2011-07-05 14:02
I don't think that's a bug, it's a missing feature in the integration (there's a request on the metatracker to add automatic notifications of new reviews on the bug itself).

I did mention the review above but it would have been easy to miss amongst the other comments.
msg139873 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-07-05 14:09
> I don't think that's a bug, it's a missing feature in the integration
> (there's a request on the metatracker to add automatic notifications
> of new reviews on the bug itself).

It is a bug, actually. People on the nosy list are also on the Rietveld
cc list, but in the wrong form. See
http://psf.upfronthosting.co.za/roundup/meta/issue382
msg139874 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-07-05 14:09
(and so, for the record, I've added my own small review :))
msg139875 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2011-07-05 14:37
Moving this discussion out of the review comments:

Antoine is wanting to make release() nondeterministic by having the underlying buffer only released when all views using it either have release() called or are no longer referenced.

I contend that release() needs to mean "release the underlying memory *right now*" or it is completely pointless. The "I don't want to care about lifecycle issues" approach is already handled quite adequately by the ordinary refcounting semantics.

If ensuring that all references have been eliminated before release() is called is too much work for a user then the answer is simple: don't call release() and let the refcounting do the work.
msg139877 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-07-05 14:55
> Antoine is wanting to make release() nondeterministic by having the
> underlying buffer only released when all views using it either have
> release() called or are no longer referenced.

That's not "nondeterministic" if everyone calls release(). Less so than
garbage collection anyway.

> I contend that release() needs to mean "release the underlying memory
> *right now*" or it is completely pointless. The "I don't want to care
> about lifecycle issues" approach is already handled quite adequately
> by the ordinary refcounting semantics.

Well, if you assume refcounting and no reference cycles, then release()
is AFAICT already useless. See issue9757 for the argument we had with
Guido ;)

My issue is that until now sliced memoryviews are independent objects
and are not affected by the releasing of the original memoryview. With
this patch, they are, and that's why I'm advocating for a subtler
approach (which would really mirror the current slicing semantics, and
wouldn't break compatibility ;)).

release() is supposed to mean "you can dispose of this memoryview", not
"you can dispose of any underlying memory area, even if there's some
sharing that I as an user don't know anything about" (*). By making
release() affect "related" memoryviews we are exposing an internal
implementation detail (the PyManagedBuffer sharing) as part of the API.

(*) for something similar, if you close() a file-like object obtained
through socket.makefile(), it doesn't close the underlying fd until all
other file-like objects are closed too
msg139878 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2011-07-05 15:33
Antoine Pitrou <report@bugs.python.org> wrote:
> My issue is that until now sliced memoryviews are independent objects
> and are not affected by the releasing of the original memoryview. With
> this patch, they are, and that's why I'm advocating for a subtler
> approach (which would really mirror the current slicing semantics, and
> wouldn't break compatibility ;)).

I wrote a comment on rietveld (which failed to get mailed again). My plan
is to make the sliced views more independent by copying shape, strides,
and suboffsets unconditionally on construction.

Then it should always be possible to delete views independently.

With respect to releasing, the views are of course still dependent.

> release() is supposed to mean "you can dispose of this memoryview", not
> "you can dispose of any underlying memory area, even if there's some
> sharing that I as an user don't know anything about" (*). By making
> release() affect "related" memoryviews we are exposing an internal
> implementation detail (the PyManagedBuffer sharing) as part of the API.

I thought the rationale for the release() method was to allow sequences like:

b = bytearray()
m1 = memoryview(b)
m1.release() -> must call releasebuffer instantly.
b.resize(10) -> this might fail otherwise if the garbage collection is too slow.

So I think releasebuffer must be called on the original base object,
and only the ManagedBuffer can do that.
msg139883 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-07-05 15:46
> I thought the rationale for the release() method was to allow sequences like:
> 
> b = bytearray()
> m1 = memoryview(b)
> m1.release() -> must call releasebuffer instantly.
> b.resize(10) -> this might fail otherwise if the garbage collection is too slow.

Well, that would still work with my proposal.
Now consider:

def some_library_function(byteslike):
    with memoryview(byteslike) as m2:
        # do something with m2

with memoryview(some_object) as m1:
    some_library_function(m1)
    ...
    print(m1[0])

That m1 becomes unusable after m2 is released in the library function is
completely counter-intuitive, and will make memoryviews a pain to use in
real life.
msg139919 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2011-07-06 04:07
If someone is calling release() on all of their views (including slices) than they won't have any problems. The only way they can get into trouble is if they have a slice or copy that they *aren't* explicitly releasing, and in that case they *already* have a problem and we're just picking it up sooner (although, in the case of CPython, refcounting will likely take care of it, just as it does for similar problems with files).

This is why I suggested that we should start exposing the source object at the Python level as an attribute of memoryview objects. Then people can decide for themselves if they want a "view-of-a-view" by slicing/copying the memoryview directly or a new, independent view by going back to the original object.
msg139920 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2011-07-06 04:16
Oops, Antoine's right, the release() semantics in the patch are broken, albeit for the precisely opposite reasons: that example will actually blow up with BufferError inside some_library_function().

I withdraw my objection - Antoine's right that release() on a memoryview object has to just mean "drop the reference to the ManagedBuffer instance". Calling release() when there are actual *exported* buffers outstanding should still trigger BufferError though (slices and copies don't count in that case, as they'll have their own independent references to the ManagedBuffer instance).
msg139921 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2011-07-06 04:21
Sorry, I mischaracterised Antoine's suggestion in my last comment. It's more in the nature of ManagedBuffer exposing two APIs:

1. request_release(): tells ManagedBuffer "if I have the last reference, release the buffer now".

2. release(): immediately releases the buffer, or triggers SystemError if there are additional references to the buffer. 

memoryview.release() would call the first method before dropping the reference to the buffer.
msg139932 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2011-07-06 12:39
Antoine is right, this needs to be fixed. I think that for *practical*
purposes, the existing release() method already behaves like a tryrelease()
method:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
BufferError: Existing exports of data: object cannot be re-sized
>>>

So while m1.release() in fact *does* release a buffer, the desired
effect (freeing up 'b' for subsequent operations) only happens
after also calling m2.release(). This applies to Python and NumPy
objects.

With this amount of complexity, it might be a good idea to separate
refcounting and exports, like so:

typedef struct {
    PyObject_HEAD
    int released;
    Py_ssize_t exports; /* total number of exports */
    Py_buffer master;
} PyManagedBufferObject;

typedef struct {
    PyObject_HEAD
    PyManagedBufferObject *mbuf;
    Py_ssize_t shape[Py_MEMORYVIEW_MAXSTATIC];
    Py_ssize_t strides[Py_MEMORYVIEW_MAXSTATIC];
    Py_ssize_t suboffsets[Py_MEMORYVIEW_MAXSTATIC];
    int released; /* for clarity in the code */
    Py_ssize_t exports; /* buffer exports */
    Py_buffer view;
} PyMemoryViewObject;

Then it is possible for a MemoryView to call mbuf_release() from
memory_release() if self->exports == 0 and --self->mbuf->exports == 0.

So, if I understand Antoine correctly, in following graph m1, m2 and
m4 could mark themselves as 'released' and decrement self->mbuf->exports.
This could happen in any order.

What should happen with m2.release()? Raising would be odd here as well,
since Pauli stated earlier that some people might actually construct
new objects from an exported buffer.

m2 could mark itself as 'release_pending', and henceforth only accept
releasebuffer requests from b1, b2 and x1. Once those are all in, it
releases itself properly and sends the message to mbuf.

'release_pending' could be expressed as (self->released && self->exports > 0).

                        ManagedBuffer (mbuf)
                             |
             -----------------------------------
             |                                 |
             m1 (private arrays)              m3 (private arrays)
             |                                 |
             m2                               m4 (private arrays)
             |
     -----------------
     |       |       |
     |      b1      b2 (regular temporary buffer exports)
     |
     x1 (new object constructed from a buffer export)

Antoine, was this roughly your suggestion?
msg139933 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2011-07-06 12:42
[The first part of the message again, this time via the web interface.]

Antoine is right, this needs to be fixed. I think that for *practical*
purposes, the existing release() method already behaves like a tryrelease()
method:


>>> b = bytearray(b'123456789')
>>> m1 = memoryview(b)
>>> m2 = memoryview(m1)
>>> m1.release()
>>> b.append(1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
BufferError: Existing exports of data: object cannot be re-sized
>>>


So while m1.release() in fact *does* release a buffer, the desired
effect (freeing up 'b' for subsequent operations) only happens
after also calling m2.release(). This applies to Python and NumPy
objects.
msg139935 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-07-06 13:01
Le mercredi 06 juillet 2011 à 12:39 +0000, Stefan Krah a écrit :
> Antoine, was this roughly your suggestion?

I think so (!), but I also agree with Nick that raising BufferError when
calling release() on a memoryview with exports is a reasonable
alternative.
msg139936 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2011-07-06 13:16
Yeah, the reason my originally proposed semantics were wrong is because copying (or slicing) a memoryview object and then explicitly releasing that object would always fail through no fault of that code. That would be broken and the only way to fix it is to allow the release() call but not actually call the ReleaseBuffer API until all the buffer references are gone.

Exports are different - whenever you copy or slice a memoryview, you get a new memoryview object with the export count set to zero, so it is reasonable to require that anyone that wants to explicitly release the memoryview's buffer reference make sure that there aren't any exports hanging around. Keep in mind that memoryview copies and slices don't increase the export count, as those will refer directly back to the original managed buffer.

+1 on tracking buffer exports explicitly rather than relying solely on playing games with refcounts though - while technically redundant in the ManagedBuffer case, I agree it will make the code far more self-explanatory.
msg141858 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2011-08-10 11:46
I thought it might be productive to switch to documentation/test driven
development for PEP-3118 in general. So I updated the documentation,
trying to spell out the responsibilities of both exporter and consumer
as clearly as possible.

In order to have a PEP-3118 reference implementation, I wrote
Modules/_testbuffer.c and Lib/test/test_buffer.py. The test module
contains an ndarray object (independent from NumPy's ndarray) with
these features:

  o Full base object capabilities, including responding to flag
    specific requests.

  o Full re-exporter capability: The object obtains a buffer from
    another exporter and poses as a base object.

  o Optional capability to change layout while buffers are exported.

  o Full support for arbitrary format strings using the struct
    module.

  o Fortran style arrays.

  o Arbitrary multidimensional structures, including offsets and
    negative strides.

  o Support for converting arrays to suboffset representations.

  o Support for multidimensional indexing, slicing and tolist().

  o Optional support for testing against NumPy.


In memoryobject.c I only fixed the buffer release issues that came up.
Before proceeding with allocating private arrays for each memoryview,
it would be great to have agreement on the revised documentation and a
couple of other issues.

Documentation
-------------

I'll highlight some changes here:

  1) view->obj: Must be a new reference to the provider if the buffer
     is obtained via getbuffer(), otherwise NULL. In case of failure
     the field MUST be set to NULL (was: undefined).

     So, logically this should be seen the same way as returning
     a new reference from a standard Python function.

  2) view->buf: This can (and _does_ for Numpy arrays) point to
     any location in the underlying memory block. If a consumer
     doesn't request one of the simple or contiguous buffers,
     all bets are off.

  3) view->len: Make it clear that the PEP defines it as
     product(shape) * itemsize.

  4) Ownership for shape, strides, internal: read-only for the
     consumer.

  5) Flag explanations: The new section puts emphasis on which fields
     a consumer can expect in response to a buffer request, starting
     with the fields that will always be set.

  6) Rule for writable/read-only requests. Only raise if the
     buffer is read-only and the request is 'writable'. This
     seems to be the most practical solution.

  7) Add NumPy-style and PIL-style sections to make clear what
     a consumer must be able to handle if complicated structures
     are requested (thanks Pauli Virtanen for clarifying what
     valid NumPy arrays may look like).


What I would like to spell out clearly:

  8) A PEP-3118 compliant provider MUST always be able to respond
     to a PyBUF_FULL_RO request (i.e. fill in shape AND strides).
     ctypes doesn't currently do that, but could be fixed easily.

     This is easy to implement for the exporter (see how
     PyBuffer_FillInfo() does it in a few lines).


Memoryview
----------

  8) Add PyMemoryView_FromBytes(). This could potentially replace
     PyMemoryView_FromBuffer().

  9) I've come to think that memoryviews should only be created
     from PyBUF_FULL_RO requests (this also automatically succeeds
     if an object is writable, see above).

     One reason is that we don't constantly have to check for
     the presence of shape and strides and format (unless ndim = 0).

     Currently it is possible to create any kind of view via
     PyMemoryView_FromBuffer(). It would be possible to deprecate
     it or make it a rule that the buffer argument must have full
     information.

  10) NumPy has a limit of ndim = 64. It would be possible to
      use that limit as well and make shape, strides and
      suboffsets static arrays of size 64. Perhaps this is a bit
      wasteful, it depends on how many views are typically
      created.

  11) test_buffer.py contains an example (see: test_memoryview_release())
      that Antoine's test case will not work if a re-exporter is
      involved. Should we leave that or fix it, too?



My apologies for the long list, but it'll be easier to proceed if
some rules are written in stone. :)
msg142191 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2011-08-16 13:25
70a8ccd53ade fixes conversion of NumPy-style arrays to a suboffset
representation. The previous representation did not work for
slicing non-contiguous arrays with backward strides.

Also, I've added long comments that hopefully explain how slicing
with arbitrary strides and suboffsets is supposed to work.
msg143727 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2011-09-08 14:53
Here's a completely restructured memoryview implementation that I believe
is quite robust. Both memoryobject.c (the parts I worked on, which is 91%)
and _testbuffer.c have 100% code coverage, including all error conditions [1].

memoryview is tested against the struct module (via _testbuffer's ndarray),
array.array and bytearray. To deal with the case explosions inherent to the
specification test_buffer.py resorts to brute force in some cases (most
notably in testing getbuffer() flags).

PyMemoryViewObject is now a PyVarObject with private arrays. Unless ndim = 0,
shape and strides are always present after initialization.

Memoryview now has support for all native single-value struct module
specifiers. I abandoned the idea of using the struct module for
packing/unpacking. New benchmarks (See #10227) indicate that _testbuffer's
tolist() is over 1000x slower that the optimized versions of memoryview.

The cast function from #5231 is completely implemented.


Review would be much appreciated. Perhaps it would be possible to
do a mini PEP-3118 sprint? Usually I'm never on IRC, but I could
dust off irssi for the occasion.


These are the main changes in detail:
-------------------------------------

o Restructure memoryobject.c into sections.

o Use struct hack in PyMemoryViewObject for the dynamic arrays.

o Rework initialization.

o Add a couple of invariants: A new memoryview will always have complete
  shape/strides/format information, unless ndim = 0.

o Add buffer flags: A new memoryview will always have flag information
  that determines whether it is a scalar (ndim = 0), C/Fortran contiguous,
  NumPY or PIL style, released or active. This eliminates the need for
  expensive re-computations during a getbuffer() request.

o Add support for all native struct module formats in indexing, assigning
  and tolist().

o Add memoryview.cast(format=x, shape=y), where x is a native format specifier
  and y is any multidimensional shape.

o Add PEP-3118 compliant getbuffer() method.

o Slice assignments now support non-contiguous 1-D slices.

o Comparisons now support non-contiguous 1-D buffers.

o Representation of empty shape etc. is now an empty tuple (O.K.
  with Mark Dickinson and Travis Oliphant).


[1] 100% coverage requires a patch for systematically triggering failures
    of Python API functions.
msg143730 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2011-09-08 15:05
> is over 1000x slower that the optimized versions of memoryview.

Oh dear, scratch that. Lets make that 15x. ;)
msg144234 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2011-09-18 09:22
Revision 4492afe05a07 allows memoryview to handle objects with an
__index__() method. This is for compatibility with the struct module
(See also #8300).
msg151360 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2012-01-16 13:23
Finally reviewed Stefan's latest patch. It mostly looks great to me. Aside from a few minor cosmetic quibbles, the only substantive change I would like is to expose the Py_buffer len field as "memoryview.size" instead of "memoryview.len" (to reduce confusion with "len(memoryview)" and to encourage a conceptual link with sys.getsizeof()).

As noted in the review, I also think fixing #12384 should be fairly straightforward and it would be nice to have that working when the change is applied.

(Kristjan: I added you to the nosy list, since you commented on another issue that you had been poking around in the memoryview internals and noticed a few dodgy things. Stefan's work on this issue is aimed at comprehensively addressing those problems).
msg151445 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2012-01-17 11:41
Thanks Nick.  Looking through this discussion it looks as though you encountered all the confusing bits that I ran into (dup_buffer, ownership issues, reference counting and so forth.) Good work.
msg151447 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2012-01-17 11:53
Thanks for the comments. Most of them should be easy to fix.

Nick Coghlan <report@bugs.python.org> wrote:
> [...] expose the Py_buffer len field as "memoryview.size" instead of
> "memoryview.len" (to reduce confusion with "len(memoryview)" and
> to encourage a conceptual link with sys.getsizeof()).

I agree with this. The best thing is probably to have both versions
work in 3.3 and remove memoryview.len in 3.4 (or later).

> As noted in the review, I also think fixing #12384 should be fairly
> straightforward and it would be nice to have that working when the
> change is applied.

tobytes() currently calls PyBuffer_ToContiguous(..., 'C') from
Objects/abstract.c. Since the function repeatedly calls
PyBuffer_GetPointer(), it has this comment:

    /* XXX : This is not going to be the fastest code in the world
             several optimizations are possible.
     */

So if the fix for tobytes() should go in, I'd like to use the
copy_buffer() function from _testbuffer.c. PyBuffer_ToContiguous()
would need to be fixed separately.

test_buffer.py already asserts that for each result, result.tolist()
and result.tobytes() match the results obtained from PyBuffer_GetPointer(indices)
(See: test_buffer.py:779), so I'm not worried about using the same
implementation in the tests as in the production code.
msg151537 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2012-01-18 13:34
Revisiting memoryview.size: I foresee problems for NumPy users, since array.size
has a different meaning there:

>>> x = array([[1,2,3], [4,5,6]], dtype='q')
>>> x.shape
(2, 3)
>>> x.itemsize
8
>>> len(x)
2
>>> x.size
6
>>> x.nbytes
48

So here we have:

x.nbytes == product(shape) * itemsize == Py_buffer.len  == (virtual!) byte length
x.size   == product(shape) == number of elements


My suggestion is to use memoryview.nbytes as well. memoryview.size would have
the additional problem that Py_buffer.len is always the byte size of the logical 
structure (e.g. after slicing) and not necessarily the byte size of the physical
memory area.
msg151538 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2012-01-18 14:40
"nbytes" sounds reasonable to me, given the unfortunate ambiguity of both size and len.

As far as #12834 goes, I'm happy to go along with whatever you think is best. You've spent a lot more time down in the guts of the implementation than I have, and the approach you describe seems to make sense :)
msg152086 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2012-01-27 12:06
Aside from some minor comments that I included in my review, the latest patch gets a +1 from me.
msg152090 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2012-01-27 12:41
Antoine's review picked up on several issues I missed or glossed over - I actually agree with his point about making most of the new APIs private rather than public.

With regards to exposing _testbuffer in the documentation of memoryview's hash support, perhaps it would be better to use a 1D bytes object + memoryview.cast() to get an officially supported multi-dimensional view of a region of memory?
msg152091 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-01-27 12:42
> With regards to exposing _testbuffer in the documentation of
> memoryview's hash support, perhaps it would be better to use a 1D
> bytes object + memoryview.cast() to get an officially supported
> multi-dimensional view of a region of memory?

By the way, I didn't point them, but there are other places in the
memoryview doc where _testbuffer is mentioned.
msg152124 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2012-01-27 21:42
I've added some comments on Rietveld. This time, I pasted the email
addresses manually into the CC field, apparently without success
(I didn't receive mail).

Regarding the use of _testbuffer in the docs: I agree that it's strange,
on the other hand it illustrates several examples much better (length of
a multi-dimensional view, result of cast to ND, etc).


So I'm not sure what to do. Of course I'll take out the examples if you
really don't like it.
msg152145 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2012-01-28 00:56
Radical suggestion: make it public as collections.simple_ndarray?

(prefixing with simple_ to be explicit that this is not even *close* to being the all-singing, all-dancing NumPy.ndarray)
msg152161 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2012-01-28 14:13
I've been busy fixing the memoryview.h issues first. Could you take a
look at:

http://hg.python.org/features/pep-3118/file/f020716704d4/Include/memoryobject.h

Changes:

   a) Make all functions and the two buffer access macros part
      of the limited API again.

   b) Make the managed buffer API definitely private.

   c) Make PyBUF_MAX_NDIM=64 part of the official buffer API.


I think it might be OK to defer the decision about Py_MEMORYVIEW_C etc.,
since the comment already says "... Don't access their fields directly.".
msg152173 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2012-01-28 17:19
Nick Coghlan <report@bugs.python.org> wrote:
> Radical suggestion: make it public as collections.simple_ndarray?

Heh, that's a nice one. :)


While I think that the code in _testbuffer.c is sound and very well tested,
the API is low-level and optimized for testing all sorts of quirks.

Examples: _testbuffer.ndarray has the questionable capability of changing
views while buffers are exported. This is to test the pathological case
that came up in the discussion.

Then, similar to NumPy's ndarray, _testbuffer.ndarray constructs arrays
from a flat list. This is a bit clumsy but quite suitable for testing.

NumPy's ndarray is also the low level API. I think most people use
the high level array:

>>> a = array([[1,2], [3,4]], dtype="L")
>>> a
array([[1, 2],
       [3, 4]], dtype=uint64)

So it would take some effort to polish the API.


Meanwhile, to eliminate the use of _testbuffer in the documentation, I think
the following might work:

I've just added complete support for displaying multi-dimensional lists and
multi-dimensional comparisons to memoryobject.c in my private repo.

The code is quite succinct and follows exactly the same pattern as
copy_base/copy_rec.


Then, in the documentation we can use cast() and memoryview.tolist(),
but add a comment that cast() is just used for demonstration purposes.
msg152174 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-01-28 17:29
> Nick Coghlan <report@bugs.python.org> wrote:
> > Radical suggestion: make it public as collections.simple_ndarray?
> 
> Heh, that's a nice one. :)

Well, the question is: does it have a point? Is this ndarray useful
outside of any third-party infrastructure such as the APIs offered by
numpy?

You might answer that we already have array.array but ISTM that
array.array is quite useless in general :)
msg152175 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2012-01-28 17:51
Antoine Pitrou <report@bugs.python.org> wrote:
> > Nick Coghlan <report@bugs.python.org> wrote:
> > > Radical suggestion: make it public as collections.simple_ndarray?
> > 
> > Heh, that's a nice one. :)
> 
> Well, the question is: does it have a point? Is this ndarray useful
> outside of any third-party infrastructure such as the APIs offered by
> numpy?

I think it would be mainly educational. It's easier to understand
the whole (multi-dimensional) point of PEP-3118 when there's a
concrete object that you can play with. Of course serious users
would go straight to NumPy.

The other issue is that's it's slightly strange to have a fully
featured memoryview with no object in the stdlib that can utilize
all features (at least the testing side is now complete).

Anyway, right now I don't know myself if I'm for or against it. :)
msg152188 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-01-28 20:59
>    a) Make all functions and the two buffer access macros part
>       of the limited API again.

Hmm, I don't think buffer access macros should be part of the limited
API. If they are truely important (which I doubt), we should have
equivalent functions for them.

> I think it might be OK to defer the decision about Py_MEMORYVIEW_C etc.,
> since the comment already says "... Don't access their fields directly.".

My question is whether there is any point in making these flags. Does
3rd-party code want to manipulate memoryview internals, instead of
querying the Py_buffer?

(and of course the memoryview object is becoming more and more like
another Py_buffer :-))
msg152196 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2012-01-28 22:44
Antoine Pitrou <report@bugs.python.org> wrote:
> >    a) Make all functions and the two buffer access macros part
> >       of the limited API again.
> 
> Hmm, I don't think buffer access macros should be part of the limited
> API. If they are truely important (which I doubt), we should have
> equivalent functions for them.

I thought the whole Py_buffer API was only temporarily removed from the
limited API until the issues were sorted out:

http://bugs.python.org/issue10181#msg125462

For instance, here the macros are not excluded:

http://hg.python.org/cpython/file/3292cc222d5c/Include/memoryobject.h

Since the issues seem resolved in general, I thought it was time to
restore the previous state. [I just noticed that I didn't revert all
of Martin's changes, so xxlimited currently does not build in the
pep-3118 repo.]

As for the two macros specifically, I know Pauli was using them:

http://bugs.python.org/issue10181#msg139775

> > I think it might be OK to defer the decision about Py_MEMORYVIEW_C etc.,
> > since the comment already says "... Don't access their fields directly.".
> 
> My question is whether there is any point in making these flags. Does
> 3rd-party code want to manipulate memoryview internals, instead of
> querying the Py_buffer?

The flags are primarily for the memoryview itself to avoid repeated calls
to PyBuffer_IsContiguous(). So when 3rd-party uses PyMemoryView_GET_BUFFER
to get the view and also needs to determine the contiguity type, that
code could also use the flags.

Pauli: If you are still following the issue, do you think having access
to contiguity flags is useful for 3rd-party code or not?
msg152197 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-01-28 23:09
> Antoine Pitrou <report@bugs.python.org> wrote:
> > >    a) Make all functions and the two buffer access macros part
> > >       of the limited API again.
> > 
> > Hmm, I don't think buffer access macros should be part of the limited
> > API. If they are truely important (which I doubt), we should have
> > equivalent functions for them.
> 
> I thought the whole Py_buffer API was only temporarily removed from the
> limited API until the issues were sorted out:
> 
> http://bugs.python.org/issue10181#msg125462

I'm talking about the memoryview access macros. It's like
PyList_GET_SIZE, it isn't part of the limited API and you have to use
PyList_GetSize instead.

The limited API promises binary (ABI) compatibility accross releases.
It's a very strong promise to make and we must be careful not to put too
much stuff in the limited API. I don't think the rule for inclusion is
"we should put everything useful in the limited API".

> For instance, here the macros are not excluded:
> 
> http://hg.python.org/cpython/file/3292cc222d5c/Include/memoryobject.h

It might be a mistake, then.

> As for the two macros specifically, I know Pauli was using them:
> 
> http://bugs.python.org/issue10181#msg139775

Well, Pauli might use them, but it just means his code isn't compatible
with the limited API, then.

> The flags are primarily for the memoryview itself to avoid repeated calls
> to PyBuffer_IsContiguous(). So when 3rd-party uses PyMemoryView_GET_BUFFER
> to get the view and also needs to determine the contiguity type, that
> code could also use the flags.

But why would 3rd-party code use PyMemoryView_GET_BUFFER instead of, for
example, PyObject_GetBuffer? You seldom have APIs which *expect* a
memoryview, I think. Instead, they would expect a buffer-compatible
object.
msg152206 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2012-01-29 01:09
I'm with Antoine here - we want to be *very* conservative with what we expose through the limited API. Due to the ABI compatibility promise, anything exposed that way is very hard to change.

Keeping things out of the limited API isn't really an issue - it just means that anyone that wants to use them needs to rebuild their extensions for each new version of the C API (just as they do now).

Addings things to the stable ABI later is easy though - that's why we had Martin take all the Py_buffer related stuff out of the initial version of the limited API.
msg152208 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2012-01-29 02:24
Also, +1 to replacing _testbuffer with .cast() to demonstrate the multi-dimensional support. Without an actual multi-dimensional array object in the standard library, demonstrating those aspects is always going to be a little tricky. However, it's worth doing so that the likes of NumPy and PIL don't need to explain those interactions.
msg152223 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2012-01-29 12:40
Antoine Pitrou <report@bugs.python.org> wrote:
> > I thought the whole Py_buffer API was only temporarily removed from the
> > limited API until the issues were sorted out:
> > 
> > http://bugs.python.org/issue10181#msg125462
> 
> I'm talking about the memoryview access macros. It's like
> PyList_GET_SIZE, it isn't part of the limited API and you have to use
> PyList_GetSize instead.

You're right: I presumed that the macros were excluded temporarily
when in fact that had already happened in 62b61abd02b8.

> > The flags are primarily for the memoryview itself to avoid repeated calls
> > to PyBuffer_IsContiguous(). So when 3rd-party uses PyMemoryView_GET_BUFFER
> > to get the view and also needs to determine the contiguity type, that
> > code could also use the flags.
> 
> But why would 3rd-party code use PyMemoryView_GET_BUFFER instead of, for
> example, PyObject_GetBuffer? You seldom have APIs which *expect* a
> memoryview, I think. Instead, they would expect a buffer-compatible
> object.

That's a good question. It looks to me that the macro was present as
PyMemoryView() initially. You renamed it in #3560 (with Guido's approval),
and later PyMemoryView_GET_BUFFER appeared in the docs.

I think 3rd-party code uses the macros mainly because they are
present and, in the case of PyMemoryView_GET_BUFFER, documented.
In most situations PyObject_GetBuffer() could be used indeed.

Most use cases I can think of would also involve having access to
the managed buffer API. As an example, here's a technique that is
similar to what goes on in PyMemoryView_GetContiguous():

Suppose you have an initialized bytes object that you want to
wrap as a multi-dimensional exporter. Then:

   - Base the memoryview on the bytes object and keep exactly one
     reference to it.

   - Adjust the shape, strides etc. to get the structure you want.

   - Return the view: You now have a fully compliant exporter that
     only needs a single Py_DECREF() to disappear and do all cleanup.

Of course this could also be exposed as a function, e.g.:

   /* stealing a reference to bytes */
   PyMemoryView_FromBytesAndInfo(PyObject *bytes, Py_buffer *info);

So let's make the flags private. What do you prefer?

  1) Leave them in memoryview.h, but with a leading underscore:

       _Py_MEMORYVIEW_C
       [...]

  2) Move them to memoryobject.c, with a leading underscore.

  3) Move them to memoryobject.c, without a leading underscore (I find
     this more readable).

  4) Move them to memoryobject.c as MV_C, MV_FORTRAN, etc.

Also, I'll add a note to the docs that PyMemoryView_GET_BUFFER can
probably be avoided by using PyObject_GetBuffer() directly.
msg152245 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-01-29 17:19
> Of course this could also be exposed as a function, e.g.:
> 
>    /* stealing a reference to bytes */
>    PyMemoryView_FromBytesAndInfo(PyObject *bytes, Py_buffer *info);

I think we should minimize the number of reference-stealing functions.

> So let's make the flags private. What do you prefer?

I don't really mind, whatever you think is best :)
msg152404 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2012-01-31 20:34
I've uploaded a new patch that should address the remaining issues:

   o In the documentation _testbuffer has been replaced by
     m.cast() + the now multi-dimensional m.tolist().

   o I restored the state of the limited API. If we want
     to include Py_buffer again, I think this should be done
     in a separate patch.

   o Flags of the memoryview object are private.


Additionally, because NumPy allows non-aligned array accesses,
I changed the packing functions to use memcpy for multi-byte types.

On x86/amd64 gcc is smart enough to produce almost exactly the same
asm output as before, with a slowdown of 0-1%, depending on the
benchmark.

On other platforms the situation might be worse, but I don't have
access to real hardware where alignment actually matters.
msg153870 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2012-02-21 13:16
Latest version looks good to me - I vote for landing it whenever you're ready :)
msg154138 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2012-02-24 15:16
Thanks, Nick. I'll try to get it done this weekend.

I've uploaded Misc/NEWS and Doc/whatsnew/3.3.rst (my apologies to Antoine 
for plagiarizing the first sentence, I found it hard to come up with a 
better version).

I wasn't sure whether to put the "whatsnew" entry into the section
"Other Language Changes" or into the PEP section. I chose the latter,
since many new features have been added that are part of the PEP.
msg154141 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2012-02-24 17:25
PEP section makes sense - I plan to mark PEP 3118 as Final once you commit this (or you can do that yourself, for that matter).
msg154240 - (view) Author: Roundup Robot (python-dev) Date: 2012-02-25 11:25
New changeset 3f9b3b6f7ff0 by Stefan Krah in branch 'default':
- Issue #10181: New memoryview implementation fixes multiple ownership
http://hg.python.org/cpython/rev/3f9b3b6f7ff0
msg154323 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2012-02-26 10:30
Nick Coghlan <report@bugs.python.org> wrote:
> PEP section makes sense - I plan to mark PEP 3118 as Final once you commit
> this (or you can do that yourself, for that matter).

Array features are complete except for multi-dimensional indexing and slicing.
I think it would be nice to add those in a separate issue; it's not a lot of
additional code and it doesn't interfere with the current code.

The overall array handling scheme is sound. Life would be a bit easier
if contiguity flags were a mandatory part of the Py_buffer structure
that the exporter has to fill in.

Then there is an open issue (#3132) for expanding the struct module syntax,
where the wording in some sections of the PEP led to a bit of head-scratching. :)

In another issue (#13072) the question came up whether the proposed 'u' and 'w'
formats still make sense after PEP-393 (I think they do, they should map to
UCS-2 and UCS-4).

We need to decide what to do about 2.7 and 3.2. It's pretty difficult by
now to separate the bug fixes from the features. I could follow the example
of CPU manufacturers: start with the whole feature set and disable as much
as possible.

Another problem for 2.7 and 3.2 is that the 'B' format would still need to
accept bytes instead of ints. Or can we change that as a bug fix?
msg154327 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2012-02-26 11:34
Aw, I guess I was too optimistic in thinking this was the last gap before we could declare it Final...

+1 on creating a new feature request for NumPy style multi-dimensional indexing and slicing support on memoryview (I'm assuming that's what you meant).

As far as your last question goes, I'm not sure we *can* salvage this in 2.7/3.2. We do have the option of throwing our hands up in the air and saying "Sorry, we couldn't figure out how to fix it without completely rewriting it. Take a look at 3.3 if you want to see how it's *supposed* to work."

I hesitate to suggest this (since I'm not volunteering to do the work) but perhaps it would be worth packaging up the 3.3. memoryview and publishing it on PyPI for the benefit of 3.2 and 2.7?

May also be worth bringing up on python-dev to see if anyone else has any bright ideas. Myself, I'm not seeing any real choices beyond "won't fix", "backport 3.3 version and remove/disable the new features", "backport 3.3 version, new features and all" and "publish a full featured version on PyPI".
msg154616 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2012-02-29 11:33
I'm busy adding the C-API changes to the docs. Regarding the stable ABI:

The general mood was to *keep* the removal of the buffer interface
for some time, did I get that right?

In that case this removal (especially of the Py_buffer struct) should
be documented also for 3.2.
msg154651 - (view) Author: Roundup Robot (python-dev) Date: 2012-02-29 16:54
New changeset 9d3c7dd55c7f by Stefan Krah in branch 'default':
Issue #10181: Add warning that structure layouts in memoryobject.h and
http://hg.python.org/cpython/rev/9d3c7dd55c7f
msg154665 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2012-02-29 21:18
For 3.2, there's no removal to document - we asked MvL to drop the buffer APIs from PEP 384, so they've never been part of the stable ABI.

From the PEP: "The buffer interface (type Py_buffer, type slots bf_getbuffer and bf_releasebuffer, etc) has been omitted from the ABI, since the stability of the Py_buffer structure is not clear at this time. Inclusion in the ABI can be considered in future releases."

I agree we shouldn't be hasty in making even the updated buffer interface implementation officially part of the stable ABI. Better to leave it out for 3.3, get some real world feedback on the latest version, then commit to stability for 3.4+.
msg154680 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2012-03-01 08:32
> From the PEP: "The buffer interface (type Py_buffer, type slots bf_getbuffer
> and bf_releasebuffer, etc) has been omitted from the ABI, since the stability
> of the Py_buffer structure is not clear at this time. Inclusion in the ABI
> can be considered in future releases."

Great, that's exactly what I was looking for. - As far as I can see the issue
is finished, so I'm closing it. Thanks again everyone for all the help!
msg154764 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2012-03-02 13:14
It occurs to me that one thing that *could* be backported to early versions are some of the documentation improvements on how to correctly handle the lifecycle of fields in Py_buffer. That gets messy though because memoryview doesn't behave nicely in those versions, so we'd be violating our own guidelines.

Perhaps the relevant sections should be updated with a reference saying that the semantics have been cleaned up in 3.3, and if in doubt, try to follow the 3.3 documentation?
msg154937 - (view) Author: Roundup Robot (python-dev) Date: 2012-03-05 09:51
New changeset 373f6cdc6d08 by Stefan Krah in branch 'default':
Issue #10181: The decision was to raise a buffer error in memory_exit()
http://hg.python.org/cpython/rev/373f6cdc6d08
msg157834 - (view) Author: Josh Triplett (joshtriplett) Date: 2012-04-09 08:08
I currently use Python 2.7, and I'd like to make use of memoryview.  Specifically, I work on BITS (http://biosbits.org/), which runs Python in ring 0 as part of GRUB, and I'd like to use memoryview to give Python access to data in physical memory.  I ran into several of the problems documented here when trying to do so.  I'd really love to see a backport of this fixed version into 2.7.
msg157835 - (view) Author: Josh Triplett (joshtriplett) Date: 2012-04-09 08:15
> I currently use Python 2.7, and I'd like to make use of memoryview.  Specifically, I work on BITS (http://biosbits.org/), which runs Python in ring 0 as part of GRUB, and I'd like to use memoryview to give Python access to data in physical memory.  I ran into several of the problems documented here when trying to do so.  I'd really love to see a backport of this fixed version into 2.7.

More specifically, the primary functionality that I'd like to use exists in 3.3 as PyMemoryView_FromMemory.  I've tried to approximate that function using the available API in 2.7, but that led me here.
History
Date User Action Args
2012-04-09 08:15:56joshtriplettsetmessages: + msg157835
2012-04-09 08:08:31joshtriplettsetnosy: + joshtriplett
messages: + msg157834
2012-03-05 09:51:05python-devsetmessages: + msg154937
2012-03-02 13:14:40ncoghlansetmessages: + msg154764
2012-03-01 08:32:50skrahsetstatus: open -> closed
resolution: fixed
messages: + msg154680

stage: resolved
2012-02-29 21:18:37ncoghlansetmessages: + msg154665
2012-02-29 16:54:30python-devsetmessages: + msg154651
2012-02-29 11:33:51skrahsetnosy: + loewis
messages: + msg154616
2012-02-26 11:34:31ncoghlansetmessages: + msg154327
2012-02-26 10:42:23skrahlinkissue2394 superseder
2012-02-26 10:30:34skrahsetmessages: + msg154323
2012-02-25 11:25:30python-devsetnosy: + python-dev
messages: + msg154240
2012-02-24 17:25:45ncoghlansetmessages: + msg154141
2012-02-24 15:16:20skrahsetfiles: + issue10181-news.diff

messages: + msg154138
2012-02-21 13:16:01ncoghlansetmessages: + msg153870
2012-02-13 15:39:53skrahunlinkissue10227 dependencies
2012-02-09 22:42:30skrahlinkissue9990 superseder
2012-02-09 22:42:30skrahunlinkissue9990 dependencies
2012-02-09 22:33:54skrahlinkissue8305 superseder
2012-02-09 22:33:54skrahunlinkissue8305 dependencies
2012-02-09 22:31:39skrahlinkissue7433 superseder
2012-02-09 22:31:39skrahunlinkissue7433 dependencies
2012-01-31 20:34:30skrahsetmessages: + msg152404
2012-01-31 20:33:08skrahsetfiles: + a5c4a96dc981.diff
2012-01-29 20:59:16skrahlinkissue5231 superseder
2012-01-29 20:59:16skrahunlinkissue5231 dependencies
2012-01-29 17:19:05pitrousetmessages: + msg152245
2012-01-29 12:40:42skrahsetmessages: + msg152223
2012-01-29 02:24:36ncoghlansetmessages: + msg152208
2012-01-29 01:09:51ncoghlansetmessages: + msg152206
2012-01-28 23:09:36pitrousetmessages: + msg152197
2012-01-28 22:44:49skrahsetmessages: + msg152196
2012-01-28 20:59:27pitrousetmessages: + msg152188
2012-01-28 17:51:37skrahsetmessages: + msg152175
2012-01-28 17:29:12pitrousetmessages: + msg152174
2012-01-28 17:19:17skrahsetmessages: + msg152173
2012-01-28 14:13:15skrahsetmessages: + msg152161
2012-01-28 00:56:31ncoghlansetmessages: + msg152145
2012-01-27 21:42:25skrahsetmessages: + msg152124
2012-01-27 12:42:39pitrousetmessages: + msg152091
2012-01-27 12:41:03ncoghlansetmessages: + msg152090
2012-01-27 12:06:48ncoghlansetmessages: + msg152086
2012-01-25 18:00:18skrahunlinkissue12834 dependencies
2012-01-25 17:44:16skrahunlinkissue13411 dependencies
2012-01-25 17:43:55skrahsetfiles: + 8dd9f0ea4876.diff
2012-01-18 14:41:48ncoghlansetmessages: - msg151539
2012-01-18 14:41:24ncoghlansetmessages: + msg151539
2012-01-18 14:40:40ncoghlansetmessages: + msg151538
2012-01-18 13:34:54skrahsetmessages: + msg151537
2012-01-17 12:58:03skrahlinkissue13411 dependencies
2012-01-17 11:53:19skrahsetmessages: + msg151447
2012-01-17 11:41:55kristjan.jonssonsetmessages: + msg151445
2012-01-16 13:23:45ncoghlansetnosy: + kristjan.jonsson

messages: + msg151360
versions: + Python 3.3, - Python 3.2
2012-01-16 13:09:27ncoghlanlinkissue12834 dependencies
2012-01-16 10:29:30ncoghlanlinkissue13797 dependencies
2011-11-13 11:56:08pmooresetnosy: + pmoore
2011-09-18 09:22:30skrahsetmessages: + msg144234
2011-09-18 09:17:06skrahsetfiles: + 4492afe05a07.diff
2011-09-08 15:05:06skrahsetmessages: + msg143730
2011-09-08 14:57:25skrahlinkissue10227 dependencies
2011-09-08 14:56:21skrahlinkissue5231 dependencies
2011-09-08 14:53:24skrahsetmessages: + msg143727
2011-09-08 14:47:15skrahsetfiles: + 51cedae5acfc.diff
2011-08-16 13:25:53skrahsetmessages: + msg142191
2011-08-16 13:20:32skrahsetfiles: + 70a8ccd53ade.diff
2011-08-10 12:43:05skrahlinkissue7433 dependencies
2011-08-10 12:39:21skrahlinkissue8305 dependencies
2011-08-10 12:31:52skrahlinkissue9990 dependencies
2011-08-10 11:46:42skrahsetmessages: + msg141858
2011-08-10 11:43:36skrahsetfiles: + 78fb1181787a.diff
2011-08-10 03:19:45kermodesetnosy: - kermode
2011-07-06 13:16:43ncoghlansetmessages: + msg139936
2011-07-06 13:01:03pitrousetmessages: + msg139935
2011-07-06 12:42:22skrahsetmessages: + msg139933
2011-07-06 12:39:20skrahsetmessages: + msg139932
2011-07-06 04:21:36ncoghlansetmessages: + msg139921
2011-07-06 04:16:49ncoghlansetmessages: + msg139920
2011-07-06 04:07:31ncoghlansetmessages: + msg139919
2011-07-05 15:46:40pitrousetmessages: + msg139883
2011-07-05 15:33:19skrahsetmessages: + msg139878
2011-07-05 14:55:51pitrousetmessages: + msg139877
2011-07-05 14:37:02ncoghlansetmessages: + msg139875
2011-07-05 14:09:46pitrousetmessages: + msg139874
2011-07-05 14:09:25pitrousetmessages: + msg139873
2011-07-05 14:02:55ncoghlansetmessages: + msg139872
2011-07-05 13:42:07pitrousetmessages: + msg139871
2011-07-05 13:13:27ncoghlansetmessages: + msg139870
2011-07-05 12:59:01ncoghlansetmessages: + msg139868
2011-07-05 12:49:17ncoghlansetmessages: + msg139866
2011-07-05 12:47:13ncoghlansetmessages: + msg139865
2011-07-05 12:11:51pvsetmessages: + msg139861
2011-07-05 11:49:04pitrousetmessages: + msg139859
2011-07-05 11:44:53skrahsetmessages: + msg139856
2011-07-05 10:49:41skrahsetmessages: + msg139846
2011-07-05 10:47:50skrahsetfiles: + 718801740bde.diff
2011-07-05 08:39:58skrahsetmessages: + msg139834
2011-07-05 02:43:09ncoghlansetmessages: + msg139826
2011-07-04 14:36:56pvsetmessages: + msg139775
2011-07-04 13:29:22ncoghlansetmessages: + msg139767
2011-07-04 13:26:46ncoghlansetmessages: + msg139766
2011-07-04 13:00:53skrahsetmessages: + msg139764
2011-07-04 12:30:50ncoghlansetmessages: + msg139762
2011-07-04 12:27:27ncoghlansetmessages: + msg139760
2011-07-04 12:21:09skrahsetmessages: + msg139758
2011-07-04 10:59:06pvsetmessages: + msg139751
2011-07-04 10:42:28skrahsetmessages: + msg139748
2011-07-04 10:36:14hayposetnosy: + haypo
2011-07-04 10:29:41skrahsetfiles: + bbe70ca4e0e5.diff
2011-07-04 10:28:58skrahsethgrepos: + hgrepo36
messages: + msg139747
2011-06-28 00:58:24ncoghlansetmessages: + msg139341
2011-06-27 20:03:59kermodesetmessages: + msg139325
2011-06-27 18:36:27pvsetmessages: + msg139323
2011-06-27 18:14:36kermodesetmessages: + msg139320
2011-06-27 17:21:58pitrousetmessages: + msg139317
2011-06-27 17:17:07skrahsetmessages: + msg139316
2011-06-27 16:47:35skrahsetmessages: + msg139314
2011-06-27 13:17:56ncoghlansetmessages: + msg139266
2011-06-27 11:28:13pvsetmessages: + msg139257
2011-06-27 10:32:49skrahsetmessages: + msg139256
2011-06-26 17:58:05petri.lehtinensetnosy: + petri.lehtinen
2011-06-26 12:35:47ncoghlansetmessages: + msg139173
2011-06-26 12:31:34ncoghlansetmessages: + msg139172
2011-06-26 10:30:20skrahsetnosy: + skrah
messages: + msg139162
2011-06-20 18:58:35jconsetnosy: + jcon
2011-02-19 19:14:26mark.dickinsonsetassignee: mark.dickinson ->
messages: + msg128878
nosy: teoliphant, mark.dickinson, ncoghlan, rupole, kermode, pitrou, pv
2011-02-14 21:15:10loewissetnosy: - loewis
2011-02-14 15:13:21pvsetnosy: loewis, teoliphant, mark.dickinson, ncoghlan, rupole, kermode, pitrou, pv
messages: + msg128551
2011-02-14 14:03:10ncoghlansetnosy: loewis, teoliphant, mark.dickinson, ncoghlan, rupole, kermode, pitrou, pv
messages: + msg128546
2011-02-14 13:49:02ncoghlansetnosy: loewis, teoliphant, mark.dickinson, ncoghlan, rupole, kermode, pitrou, pv
messages: + msg128544
2011-02-14 13:03:50pitrousetnosy: loewis, teoliphant, mark.dickinson, ncoghlan, rupole, kermode, pitrou, pv
messages: + msg128542
2011-02-14 12:02:23ncoghlansetnosy: loewis, teoliphant, mark.dickinson, ncoghlan, rupole, kermode, pitrou, pv
messages: + msg128540
2011-02-14 11:41:42ncoghlansetnosy: loewis, teoliphant, mark.dickinson, ncoghlan, rupole, kermode, pitrou, pv
messages: + msg128539
2011-02-14 11:29:13ncoghlansetnosy: loewis, teoliphant, mark.dickinson, ncoghlan, rupole, kermode, pitrou, pv
messages: + msg128538
2011-02-14 11:11:53ncoghlansetnosy: loewis, teoliphant, mark.dickinson, ncoghlan, rupole, kermode, pitrou, pv
messages: + msg128537
2011-02-13 22:33:44pitrousetnosy: loewis, teoliphant, mark.dickinson, ncoghlan, rupole, kermode, pitrou, pv
messages: + msg128528
2011-02-13 22:20:19ncoghlansetnosy: loewis, teoliphant, mark.dickinson, ncoghlan, rupole, kermode, pitrou, pv
messages: + msg128527
2011-02-13 22:12:42ncoghlansetnosy: loewis, teoliphant, mark.dickinson, ncoghlan, rupole, kermode, pitrou, pv
messages: + msg128526
2011-02-13 22:01:05ncoghlansetnosy: loewis, teoliphant, mark.dickinson, ncoghlan, rupole, kermode, pitrou, pv
messages: + msg128525
2011-02-13 19:42:17pitrousetnosy: loewis, teoliphant, mark.dickinson, ncoghlan, rupole, kermode, pitrou, pv
messages: + msg128518
2011-02-13 19:31:31pitrousetnosy: loewis, teoliphant, mark.dickinson, ncoghlan, rupole, kermode, pitrou, pv
messages: + msg128517
2011-02-13 19:08:43pvsetfiles: + buffer-interface-clarify.patch
nosy: loewis, teoliphant, mark.dickinson, ncoghlan, rupole, kermode, pitrou, pv
messages: + msg128516
2011-02-13 16:04:23pitrousetnosy: loewis, teoliphant, mark.dickinson, ncoghlan, rupole, kermode, pitrou, pv
messages: + msg128514
2011-02-13 15:34:17pvsetnosy: loewis, teoliphant, mark.dickinson, ncoghlan, rupole, kermode, pitrou, pv
messages: + msg128508
2011-02-13 15:11:38pitrousetnosy: loewis, teoliphant, mark.dickinson, ncoghlan, rupole, kermode, pitrou, pv
messages: + msg128506
2011-02-13 15:09:37pvsetnosy: loewis, teoliphant, mark.dickinson, ncoghlan, rupole, kermode, pitrou, pv
messages: + msg128505
2011-02-13 15:02:14pitrousetnosy: loewis, teoliphant, mark.dickinson, ncoghlan, rupole, kermode, pitrou, pv
messages: + msg128503
2011-02-13 14:35:10pvsetnosy: loewis, teoliphant, mark.dickinson, ncoghlan, rupole, kermode, pitrou, pv
messages: + msg128499
2011-02-13 14:19:13pitrousetnosy: loewis, teoliphant, mark.dickinson, ncoghlan, rupole, kermode, pitrou, pv
messages: + msg128495
2011-02-13 13:20:55pvsetnosy: loewis, teoliphant, mark.dickinson, ncoghlan, rupole, kermode, pitrou, pv
messages: + msg128491
2011-02-13 04:23:03pitrousetnosy: loewis, teoliphant, mark.dickinson, ncoghlan, rupole, kermode, pitrou, pv
messages: + msg128477
2011-02-13 03:47:32pvsetnosy: + pv
messages: + msg128475
2011-01-07 11:22:35ncoghlansetnosy: loewis, teoliphant, mark.dickinson, ncoghlan, rupole, kermode, pitrou
messages: + msg125641
2011-01-07 10:41:14mark.dickinsonsetnosy: loewis, teoliphant, mark.dickinson, ncoghlan, rupole, kermode, pitrou
messages: + msg125640
2011-01-07 09:34:48ncoghlansetnosy: loewis, teoliphant, mark.dickinson, ncoghlan, rupole, kermode, pitrou
messages: + msg125635
2011-01-07 09:10:24mark.dickinsonsetnosy: loewis, teoliphant, mark.dickinson, ncoghlan, rupole, kermode, pitrou
messages: + msg125631
2011-01-07 09:05:53mark.dickinsonsetnosy: loewis, teoliphant, mark.dickinson, ncoghlan, rupole, kermode, pitrou
messages: + msg125630
2011-01-07 03:59:23ncoghlansetnosy: loewis, teoliphant, mark.dickinson, ncoghlan, rupole, kermode, pitrou
messages: + msg125618
2011-01-06 19:35:24loewissetnosy: loewis, teoliphant, mark.dickinson, ncoghlan, rupole, kermode, pitrou
messages: + msg125582
2011-01-06 09:20:26mark.dickinsonsetnosy: loewis, teoliphant, mark.dickinson, ncoghlan, rupole, kermode, pitrou
messages: + msg125531
2011-01-06 06:14:00pitrousetnosy: loewis, teoliphant, mark.dickinson, ncoghlan, rupole, kermode, pitrou
messages: + msg125507
2011-01-06 02:15:39ncoghlansetnosy: loewis, teoliphant, mark.dickinson, ncoghlan, rupole, kermode, pitrou
messages: + msg125502
2011-01-06 02:09:52ncoghlansetnosy: loewis, teoliphant, mark.dickinson, ncoghlan, rupole, kermode, pitrou
messages: + msg125499
2011-01-05 19:40:41loewissetfiles: + pybuffer.diff

messages: + msg125462
keywords: + patch
nosy: loewis, teoliphant, mark.dickinson, ncoghlan, rupole, kermode, pitrou
2011-01-05 16:52:46pitrousetnosy: loewis, teoliphant, mark.dickinson, ncoghlan, rupole, kermode, pitrou
messages: + msg125433
2011-01-05 16:30:52ncoghlansetnosy: loewis, teoliphant, mark.dickinson, ncoghlan, rupole, kermode, pitrou
messages: + msg125431
2011-01-05 13:51:10mark.dickinsonsetnosy: loewis, teoliphant, mark.dickinson, ncoghlan, rupole, kermode, pitrou
messages: + msg125424
2011-01-05 12:25:05ncoghlansetnosy: + teoliphant
messages: + msg125418
2011-01-05 07:30:22kermodesetnosy: + kermode
2011-01-04 14:58:15ncoghlansetnosy: loewis, mark.dickinson, ncoghlan, rupole, pitrou
messages: + msg125332
2011-01-04 14:56:10ncoghlanlinkissue10001 superseder
2011-01-04 14:54:40ncoghlansetnosy: + loewis
messages: + msg125328
2011-01-04 14:49:16ncoghlansetnosy: mark.dickinson, ncoghlan, rupole, pitrou
messages: + msg125327
2011-01-04 08:22:38mark.dickinsonsetnosy: mark.dickinson, ncoghlan, rupole, pitrou
messages: + msg125297
2011-01-04 08:18:56mark.dickinsonsetnosy: mark.dickinson, ncoghlan, rupole, pitrou
messages: + msg125296
2011-01-04 00:20:02pitrousetassignee: mark.dickinson

nosy: + mark.dickinson
2010-11-03 13:00:31ncoghlansetmessages: + msg120322
2010-11-03 12:51:50ncoghlansetmessages: + msg120321
2010-11-03 12:21:52ncoghlansettitle: get_shape0 in memoryobject.c not checked for error return -> Problems with Py_buffer management in memoryobject.c (and elsewhere?)
2010-11-03 12:19:54ncoghlansetmessages: + msg120319
2010-11-03 11:59:53rupolesetmessages: + msg120318
2010-11-03 01:17:31rupolesetmessages: + msg120297
2010-11-02 23:01:12pitrousetmessages: + msg120281
2010-11-02 22:57:06ncoghlansetmessages: + msg120280
2010-11-02 22:34:57pitrousetmessages: + msg120270
2010-11-02 22:30:01ncoghlansetmessages: + msg120268
2010-11-02 22:16:18pitrousetnosy: + ncoghlan
messages: + msg120267
2010-11-02 21:30:35eric.araujosetnosy: + pitrou
2010-10-23 18:22:37rupolecreate