Issue 10181
Created on 2010-10-23 18:22 by rupole, last changed 2022-04-11 14:57 by admin. This issue is now closed.
Files

File name | Uploaded | Description
---|---|---
pybuffer.diff | loewis, 2011-01-05 19:40 |
buffer-interface-clarify.patch | pv, 2011-02-13 19:08 | Clarifying the behavior of the buffer interface (for discussion, probably shouldn't be applied as-is)
bbe70ca4e0e5.diff | skrah, 2011-07-04 10:29 | PyManagedBuffer implementation
718801740bde.diff | skrah, 2011-07-05 10:47 | PyManagedBuffer implementation #2
78fb1181787a.diff | skrah, 2011-08-10 11:43 | Add Modules/_testbuffer.c and Lib/test/test_buffer.py
70a8ccd53ade.diff | skrah, 2011-08-16 13:20 | Fixed suboffsets representation in _testbuffer.c
51cedae5acfc.diff | skrah, 2011-09-08 14:46 | PyManagedBuffer, native format, complete 1D slicing, casts
4492afe05a07.diff | skrah, 2011-09-18 09:16 | Handle objects with an __index__() method
8dd9f0ea4876.diff | skrah, 2012-01-25 17:43 |
a5c4a96dc981.diff | skrah, 2012-01-31 20:32 |
issue10181-news.diff | skrah, 2012-02-24 15:16 |
Repositories containing patches

http://hg.python.org/features/pep-3118#memoryview
Messages (147)
msg119460 | Author: Roger Upole (rupole) | Date: 2010-10-23 18:22

There are a number of places in memoryobject.c where get_shape0 is used without the return value being checked. If it fails, this leads to hanging exceptions and crashes.
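For illustration, a hedged sketch (not the actual memoryobject.c code) of what a checked call site would look like, assuming get_shape0() follows the usual CPython convention of returning -1 with an exception set on failure:

```c
#include <Python.h>

/* Sketch only: the checked-return pattern the report asks for. */
static PyObject *
memory_length_checked(PyMemoryViewObject *self)
{
    Py_ssize_t len = get_shape0(&self->view);
    if (len < 0)
        return NULL;   /* propagate instead of leaving a dangling exception */
    return PyLong_FromSsize_t(len);
}
```
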
msg120267 | Author: Antoine Pitrou (pitrou) * | Date: 2010-11-02 22:16

Yes, there are probably many holes like this. I've done what I could to make the simple cases (builtin types) work, but the spec is rotten from the start. Blame the numpy people, sorry.
msg120268 | Author: Alyssa Coghlan (ncoghlan) * | Date: 2010-11-02 22:30

As far as I know, PEP 3118 serves its purpose of allowing extension modules like numpy and PIL to share data without needing to copy it around all the time. It's just that memoryview wasn't really part of that purpose (since all the affected third party libraries have their own objects for looking at memory), and it shows :P

Our near term goal should be to get memoryview in a good place for handling 1D contiguous data and throw exceptions (rather than crashing) for everything else. I think Antoine has achieved the former, but it sounds like there is still some work to do on the latter.
msg120270 | Author: Antoine Pitrou (pitrou) * | Date: 2010-11-02 22:34

> As far as I know, PEP 3118 serves its purpose of allowing extension
> modules like numpy and PIL to share data without needing to copy it
> around all the time. It's just that memoryview wasn't really part of
> that purpose (since all the affected third party libraries have their
> own objects for looking at memory), and it shows :P

The deeper issue is that there are no ownership or lifetime rules for dynamically allocated fields in Py_buffer (such as strides and shape), which itself isn't a PyObject. For very simple objects (such as bytes or bytearray) it isn't really a problem, but it is in more complicated situations; and there's no possible solution without a clear spec.

(And I'm not even talking about the issue of making slices of memoryviews, since the buffer API itself doesn't handle slicing, which means the memoryview object has to modify its inner Py_buffer...)
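For reference, the struct in question, roughly as it appeared in the CPython headers of this era (comments abridged; field order follows the 3.2 sources, and the smalltable member that comes up later in this thread was removed again in 3.3):

```c
typedef struct bufferinfo {
    void *buf;                 /* pointer to the exported memory */
    PyObject *obj;             /* reference to the exporter (undocumented at the time) */
    Py_ssize_t len;
    Py_ssize_t itemsize;
    int readonly;
    int ndim;
    char *format;
    Py_ssize_t *shape;         /* the dynamically allocated fields   */
    Py_ssize_t *strides;       /* with no stated ownership/lifetime  */
    Py_ssize_t *suboffsets;    /* rules, as described above          */
    Py_ssize_t smalltable[2];  /* static store for 1-d shape/strides */
    void *internal;            /* for the exporter's private use */
} Py_buffer;
```
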
msg120280 | Author: Alyssa Coghlan (ncoghlan) * | Date: 2010-11-02 22:57

Read the "Releasing the buffer" section in PEP 3118 again. Unless I'm misunderstanding you completely, the rules you're asking for are already in place: those fields are entirely the responsibility of the exporting object, and it needs to ensure they remain valid until the buffer is released.

Now, it may be that we haven't *implemented* this correctly or consistently in the builtin objects, in which case we should do something about it. But the rules are there.
msg120281 | Author: Antoine Pitrou (pitrou) * | Date: 2010-11-02 23:01

> Read the "Releasing the buffer" section in PEP 3118 again. Unless I'm
> misunderstanding you completely, the rules you're asking for are
> already in place: those fields are entirely the responsibility of the
> exporting object, and it needs to ensure they remain valid until the
> buffer is released.

In practice, though, we copy Py_buffer structs around and there's no way for the original object to know about that. That's the fundamental difference with a PyObject, where you would only increase a refcount instead of copying the structure's contents.
msg120297 | Author: Roger Upole (rupole) | Date: 2010-11-03 01:17

The culprit wrt copying Py_buffer structs seems mainly to be dup_buffer, which is called in memory_getbuf. This seems unnecessary in the case where there's an underlying object and it has returned the view thru its own tp_as_buffer. The underlying object at that point is solely responsible for releasing the buffer, so memory_getbuf shouldn't mess with it at all.

In the case where there is no underlying object (mainly thru PyMemoryView_FromBuffer), it probably should allocate any memory in the view in such a way that it can be freed in memory_releasebuf when the view->obj is NULL.
msg120318 | Author: Roger Upole (rupole) | Date: 2010-11-03 11:59

While on the subject, the docs for PyMemoryView_FromBuffer state that the resulting memoryview takes ownership of the Py_buffer struct and is responsible for freeing any associated memory. It does not do so, which is not surprising: the absence of a standard for allocation makes it impossible for it to do so safely.
msg120319 | Author: Alyssa Coghlan (ncoghlan) * | Date: 2010-11-03 12:19

Hmm, if we're ever creating a new copy of a Py_buffer without calling GetBuffer again on the original object, then *that* is a bug (exactly comparable to copying a PyObject pointer without incrementing the reference count - it's OK if you can guarantee the lifecycle of your borrowed pointer is shorter than the lifecycle of the original, but disallowed otherwise).

This still doesn't sound like a problem with the spec to me, it sounds like a problem with the implementation strategy that was originally used when integrating the spec into the interpreter core (which I readily agree received far less attention than the spec itself did).

It really sounds like we should be creating a _Py_ManagedBuffer internal object, with each instance containing a PyObject* and a Py_buffer instance. We don't need to alter PEP 3118 to do that - such a change is entirely consumer side, so the objects providing the buffers don't need to care how the lifecycle of the Py_buffer struct is managed. When we need another reference to the buffer, we can then just increment the refcount of the _Py_ManagedBuffer instance rather than having to call GetBuffer again on the original object.

The contents of Py_buffer objects that have been passed to GetBuffer really shouldn't be getting modified at all - for those cases, we should maintain *two* Py_buffer structs, one (unmodified) with the original results of the GetBuffer call, and a second owned by the caller (for manipulation).
msg120321 | Author: Alyssa Coghlan (ncoghlan) * | Date: 2010-11-03 12:51

It doesn't help that neither PEP 3118 nor the Py_buffer docs mention the "obj" member of the Py_buffer struct that refers back to the original object providing the buffer - that's fairly fundamental to understanding how PyBuffer_Release and PyMemoryView_FromBuffer can work even in theory.

(Given that, an additional _PyManagedBuffer object shouldn't be necessary - MemoryView just needs to call ReleaseBuffer and GetBuffer at the appropriate times)
msg120322 | Author: Alyssa Coghlan (ncoghlan) * | Date: 2010-11-03 13:00

So the overall to-do list here is sounding like:

1. Document "Py_buffer->obj" appropriately in both the C API documentation and in PEP 3118
2. Ensure GetBuffer/ReleaseBuffer are used as the moral equivalent of INCREF/DECREF for Py_buffer objects (see the sketch below)
3. Check builtin types correctly manage the lifecycle of all elements of their exported Py_buffer structs
4. Check return values and throw exceptions as noted in the original post
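A minimal consumer-side sketch of item 2, treating the Get/Release pair like INCREF/DECREF; consume_bytes is a hypothetical helper, while the two buffer calls are the real C API:

```c
#include <Python.h>

extern void consume_bytes(const void *buf, Py_ssize_t len);  /* hypothetical */

static int
process_buffer(PyObject *obj)
{
    Py_buffer view;

    /* the "INCREF": must be matched by exactly one release below */
    if (PyObject_GetBuffer(obj, &view, PyBUF_SIMPLE) < 0)
        return -1;                       /* exception already set */

    consume_bytes(view.buf, view.len);

    /* the "DECREF": same Py_buffer struct that was passed to GetBuffer */
    PyBuffer_Release(&view);
    return 0;
}
```
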
msg125296 | Author: Mark Dickinson (mark.dickinson) * | Date: 2011-01-04 08:18

It might be instructive to look at how NumPy itself manages sharing of ndarray data and ownership of the corresponding structs. I meant to find time to look at this over the break.
msg125297 | Author: Mark Dickinson (mark.dickinson) * | Date: 2011-01-04 08:22

BTW, I agree that it looks as though significant changes might be needed. I wonder whether it would make sense to exclude the Py_buffer struct from the Stable ABI PEP for now.
msg125327 | Author: Alyssa Coghlan (ncoghlan) * | Date: 2011-01-04 14:49

Agreed. I'll put something on python-dev about that, so MvL sees it.
msg125328 | Author: Alyssa Coghlan (ncoghlan) * | Date: 2011-01-04 14:54

More direct - added MvL to nosy list.

Martin, we would like to exclude Py_buffer from the stable ABI for Python 3.2, until we have a chance to thrash out the missing pieces of the documentation for 3.3. I *think* it is a documentation problem, but until we're certain, it seems safer to leave it out.
msg125332 | Author: Alyssa Coghlan (ncoghlan) * | Date: 2011-01-04 14:58

From Lenard Lindstrom in issue 10001 (closed as a dupe of this issue):

The Py_buffer.obj field is undocumented. Yet memoryview, which acts as a wrapper, includes the field in gc traversal. Also, if the field is not initialized by bf_getbuffer its value can be indeterminate. For memoryview the gc can crash (see attached C demo program).
msg125418 | Author: Alyssa Coghlan (ncoghlan) * | Date: 2011-01-05 12:25

Added Travis to nosy list - even if he doesn't have time to finish this off himself, hopefully he can point us in the right direction.
msg125424 | Author: Mark Dickinson (mark.dickinson) * | Date: 2011-01-05 13:51

Nick, it sounds as though you have an idea of how you think things should be working here---perhaps you can help me out. I'd really like to understand what the implementation should look like from the POV of a 3rd party module that defines some object exporting the buffer interface.

Here's a specific scenario I'd like to understand: module foo defines a type Foo that implements the buffer protocol. For simplicity, suppose it's exporting 1-dim buffers. When I do:

    >>> from foo import Foo
    >>> foo_object = Foo()
    >>> m = memoryview(foo_object)
    >>> n = m[::2]  # take a slice of m
    >>> del m       # delete the objects, in whichever order
    >>> del n

what's the sequence of getbuffer and releasebuffer calls that foo_object should expect to see?

Q1. Does foo get 2 requests to getbuffer (and 2 to releasebuffer), or just one each? I'm assuming at least that getbuffer and releasebuffer calls should be paired.

Q2. For each pair of getbuffer/releasebuffer calls, should the 'view' parameter passed into releasebuffer be identical to that provided to getbuffer? Or is it acceptable for the actual Py_buffer* pointers to be distinct, but the pointed-to Py_buffers to be exact copies? (The existence of the smalltable field complicates the notion of an exact copy a little bit, but I think that's a detail that can be ignored for these questions.)
msg125431 | Author: Alyssa Coghlan (ncoghlan) * | Date: 2011-01-05 16:30

PEP 3118 makes it clear that the underlying object should see *two* pairs of calls to the buffer methods: http://www.python.org/dev/peps/pep-3118/#the-py-buffer-struct

Even if we ignore the undocumented "obj" field, the target object needs to ensure the exported buffer remains valid as long as any memory views are referencing it. The only way to do that is to treat GetBuffer/ReleaseBuffer as the moral equivalent of INCREF/DECREF. However, I believe the current memoryview implementation does the wrong thing and only calls them once, and then duplicates the Py_buffer structures without ever going back to the original objects (that opinion was based on a quick scan of the code a while back, but it would fit with the uncomplimentary sentiments Antoine has expressed while trying to get all this to exhibit some kind of consistency).

For point 2, it must be the same pointer. When the PEP says "the same", I agree it could be taken as ambiguous, but the later reference to the exporter managing a linked-list of exported views makes it clear that identity is what matters.

As far as I can see, some of the things in the PEP were found to be a PITA in practice (such as every consumer of the API having to implement the equivalent of the "base" attribute in the original memoryview design), so Travis changed them. Unfortunately, those changes never made it back into the protocol documentation, leading to the current confusion.
msg125433 | Author: Antoine Pitrou (pitrou) * | Date: 2011-01-05 16:52

> However, I believe the current memoryview implementation does the
> wrong thing and only calls them once, and then duplicates the
> Py_buffer structures without ever going back to the original objects
> (that opinion was based on a quick scan of the code a while back, but
> it would fit with the uncomplimentary sentiments Antoine has expressed
> in trying to get all this to exhibit some kind of consistency)

Actually, and unless I made a mistake, it does call them twice. (and does the slicing by hand afterwards, which explains part of the hilarity with smalltable and friends :-))

> For point 2, it must be the same pointer. When the PEP says "the
> same", I agree it could be taken as ambiguous, but the later reference
> to the exporter managing a linked-list of exported views makes it
> clear that identity is what matters.

The common idiom (including in code not written by me :-)) is currently to use Py_buffer variables allocated on the C stack. Also, we have the C API function PyMemoryView_FromBuffer(), which basically mandates that Py_buffer structs can be copied around. And it's a very useful function, since it allows creating a memoryview from a chunk of anonymous memory.
msg125462 | Author: Martin v. Löwis (loewis) * | Date: 2011-01-05 19:40

> Martin, we would like to exclude Py_buffer from the stable ABI for
> Python 3.2, until we have a chance to thrash out the missing pieces
> of the documentation for 3.3. I *think* it is a documentation
> problem, but until we're certain, it seems safer to leave it out.

Fine with me. Attached is a patch; it would be good if someone could confirm that this exactly comprises the API that should be hidden.
msg125499 | Author: Alyssa Coghlan (ncoghlan) * | Date: 2011-01-06 02:09

It's OK if the Py_buffer is on the stack - it's just a unique identifier for the exporter to use as a key, not something the exporter controls the lifecycle of (the latter is true only for the pointers *inside* the struct, such as buf, shape, strides, etc).

PyMemoryView_FromBuffer should be calling PyObject_GetBuffer on the view->obj member (it's one of the things that embedding the reference allows, just as it allowed removal of the separate obj argument from the PyObject_ReleaseBuffer signature). That way the source object knows there is now a *second* Py_buffer struct kicking around, and can decide whether to re-use the same internal pointers or create new ones.
msg125502 | Author: Alyssa Coghlan (ncoghlan) * | Date: 2011-01-06 02:15

Martin's patch for the PEP 384 adjustments looks good to me. A note in the PEP that we've deliberately excluded this on a temporary basis due to the implementation/documentation mismatch and expect to have it back in for 3.3 might be helpful, too.
msg125507 | Author: Antoine Pitrou (pitrou) * | Date: 2011-01-06 06:14

> PyMemoryView_FromBuffer should be calling PyObject_GetBuffer on the
> view->obj member (it's one of the things that embedding the reference
> allows, just as it allowed removal of the separate obj argument from
> the PyObject_ReleaseBuffer signature).

One use case is anonymous memory, i.e. view->obj == NULL.
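A sketch of that use case (error handling trimmed): with obj == NULL, no exporter is ever notified, so freeing `data` remains the caller's responsibility, which is exactly the ambiguity Roger pointed out in msg120318:

```c
#include <Python.h>

static PyObject *
anonymous_view(void)
{
    Py_buffer info;
    char *data = PyMem_Malloc(1024);
    if (data == NULL)
        return PyErr_NoMemory();
    /* obj == NULL: anonymous memory, nobody gets a release callback */
    if (PyBuffer_FillInfo(&info, NULL, data, 1024, 0, PyBUF_CONTIG) < 0) {
        PyMem_Free(data);
        return NULL;
    }
    return PyMemoryView_FromBuffer(&info);
}
```
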
msg125531 | Author: Mark Dickinson (mark.dickinson) * | Date: 2011-01-06 09:20

[Nick]
> For point 2, it must be the same pointer. When the PEP says "the same",
> I agree it could be taken as ambiguous, but the later reference to the
> exporter managing a linked-list of exported views makes it clear that
> identity is what matters.

Okay, thanks. Next point of confusion: does that mean that in the example I gave, the object 'n' (the memoryview slice) needs to know about *two* distinct Py_buffer structs---one to pass back to releasebuffer and one to store its own information? It seems to me that these can't be the same, since (a) the Py_buffer for the slice will have different shape and stride from that returned by getbuffer, and (b) what's passed back to releasebuffer should be identical to what came from getbuffer, so we can't just modify the shape and stride information in place.
msg125582 | Author: Martin v. Löwis (loewis) * | Date: 2011-01-06 19:35

Removal of the buffer interface from the ABI has been committed as r87805, r87806, r87807, r87808, and r87809.
msg125618 | Author: Alyssa Coghlan (ncoghlan) * | Date: 2011-01-07 03:59

@Antoine: I actually had a closer look at the way dup_buffer is used, and I think it is currently legitimate (including for the anonymous memory case). It just isn't documented very well. For the "FromBuffer" case, it assumes the calling code already invoked GetBuffer as necessary when initialising the view, and for the various internal uses the dup_buffer call is immediately preceded by a GetBuffer call when view->obj is not NULL.

@Mark: I don't think that follows. PEP 3118 says that exporting objects own that memory, but it doesn't say that clients have to treat the structures as read-only. The only thing clients are obligated to do is pass the same Py_buffer address to ReleaseBuffer that was passed to GetBuffer. If the exporter actually needs to release buffer specific resources, then it should maintain an internal data structure keyed off the Py_buffer address. The one thing clients really shouldn't mess with is the obj field (since ReleaseBuffer needs that to find the correct release method to invoke), but the PEP is obviously silent on that matter since it doesn't even acknowledge the existence of that field.
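A hedged sketch of the bookkeeping described here; ExporterObject, ExportInfo and the table_*/exportinfo_* helpers are all hypothetical and assumed declared elsewhere, the point being only that the Py_buffer address serves as the lookup key:

```c
static int
exporter_getbuffer(ExporterObject *self, Py_buffer *view, int flags)
{
    ExportInfo *info = exportinfo_new(self);   /* allocates shape etc. */
    if (info == NULL)
        return -1;
    if (table_insert(self->exports, view, info) < 0) {  /* key = struct address */
        exportinfo_free(info);
        return -1;
    }
    exportinfo_fill_view(info, view, (PyObject *)self);
    return 0;
}

static void
exporter_releasebuffer(ExporterObject *self, Py_buffer *view)
{
    /* trust only the address; the struct contents may have been
       copied around or clobbered by the consumer */
    exportinfo_free(table_pop(self->exports, view));
}
```
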
msg125630 | Author: Mark Dickinson (mark.dickinson) * | Date: 2011-01-07 09:05

[Nick]
> @Mark: I don't think that follows. [...]
> If the exporter actually needs to release buffer specific
> resources, then it should maintain an internal data structure keyed off
> the Py_buffer address.

Ah, okay. So that would make issue 9990 just a documentation problem, right?
msg125631 | Author: Mark Dickinson (mark.dickinson) * | Date: 2011-01-07 09:10

> an internal data structure keyed off
> the Py_buffer address.

If we're using the Py_buffer address coming into getbuffer as a key, then we probably shouldn't be using a stack address, since it would be difficult to guarantee uniqueness that way.
msg125635 | Author: Alyssa Coghlan (ncoghlan) * | Date: 2011-01-07 09:34

It only needs to be unique for the lifetime of the buffer reference - for a Py_buffer struct on the stack, by the time the relevant C stack frame goes away, ReleaseBuffer should already have been called.
msg125640 | Author: Mark Dickinson (mark.dickinson) * | Date: 2011-01-07 10:41

> by the time the relevant C stack frame goes away, ReleaseBuffer should
> already have been called.

Hmm. I'm not sure I understand how/when that would happen. Looking at the current py3k code, in Objects/memoryobject.c at line 92, we have:

    PyObject *
    PyMemoryView_FromObject(PyObject *base)
    {
        PyMemoryViewObject *mview;
        Py_buffer view;

        if (!PyObject_CheckBuffer(base)) {
            PyErr_SetString(PyExc_TypeError,
                "cannot make memory view because object does "
                "not have the buffer interface");
            return NULL;
        }

        if (PyObject_GetBuffer(base, &view, PyBUF_FULL_RO) < 0)
            return NULL;

        mview = (PyMemoryViewObject *)PyMemoryView_FromBuffer(&view);
        if (mview == NULL) {
            PyBuffer_Release(&view);
            return NULL;
        }

        return (PyObject *)mview;
    }

So here 'view' is being allocated on the stack, and its address passed to PyObject_GetBuffer; PyBuffer_Release isn't called (except when an error happens) before the function exits and the stack frame becomes invalid.

Sorry for the odd questions; it's clear to me that I'm misunderstanding something fundamental, but I'm struggling to figure out what.
msg125641 | Author: Alyssa Coghlan (ncoghlan) * | Date: 2011-01-07 11:22

Ah, sorry - no, I misunderstood the question. I think that example actually *is* a bug in the memoryview implementation. The GetBuffer call should use the persistent address (&mview->view) that will be used in the ReleaseBuffer call, as per Antoine's patch on issue 9990.
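A sketch of the rule being applied here, with a hypothetical consumer type (this is not Antoine's actual patch): the Py_buffer lives at a stable address inside the owning object, so the pointer handed to GetBuffer is the very one later handed to Release:

```c
#include <Python.h>

typedef struct {
    PyObject_HEAD
    Py_buffer view;             /* persistent home for the buffer */
} ViewHolderObject;             /* hypothetical consumer */

static int
viewholder_init(ViewHolderObject *self, PyObject *base)
{
    /* fill self->view in place; &self->view stays valid for the
       object's whole lifetime */
    return PyObject_GetBuffer(base, &self->view, PyBUF_FULL_RO);
}

static void
viewholder_dealloc(ViewHolderObject *self)
{
    PyBuffer_Release(&self->view);   /* same address as above */
    Py_TYPE(self)->tp_free((PyObject *)self);
}
```
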
msg128475 | Author: Pauli Virtanen (pv) * | Date: 2011-02-13 03:47

I spent today some time rewriting `memoryobject.c` and cleaning up the Py_buffer handling in it. (I also wrote the Numpy PEP 3118 implementation, so this was straightforward to do.) The end result is here: https://bitbucket.org/pv/cpython-stuff/changesets (bookmark bug/memoryview)

Some points of note:

- Clarification of ownership conventions for Py_buffer in the docs: http://bitbucket.org/pv/cpython-stuff/changeset/805971191369 -- the new implementation is written to respect the invariants mentioned there.

- Rewritten memoryobject: http://bitbucket.org/pv/cpython-stuff/src/817edc63ce4d/Objects/memoryobject.c -- I didn't yet heavily test this (e.g. against Numpy), but at least the Python test suite passes. There are also some minor corners that need to be polished. Some new tests would also be needed, as I slightly extended the slicing capabilities of memoryview.

Only one nasty point turned up: the existence of PyMemoryView_FromBuffer. This function breaks the ownership rules: the pointers in the Py_buffer structure can point inside the structure --- and the structure can reside e.g. on the stack. Deep copying a Py_buffer cannot in general be done, because this can mess up whatever bf_releasebuffer tries to do. The workaround here was to do a deep copy *only* for those pointers that point inside the struct --- although this violates the spirit of the ownership model, it should be essentially always safe to do, and I believe it is the correct solution (at least if PyMemoryView_FromBuffer is to be retained).

***

Comments are welcome.
msg128477 | Author: Antoine Pitrou (pitrou) * | Date: 2011-02-13 04:23

Hello,

> I spent today some time to rewrite `memoryobject.c`, and cleaning up
> the Py_buffer handling in it. (I wrote also the Numpy PEP 3118
> implementation, so this was straightforward to do.)

Thanks for trying this.

> - Rewritten memoryobject:
> http://bitbucket.org/pv/cpython-stuff/src/817edc63ce4d/Objects/memoryobject.c

Some worrying things here:

* memoryview_getbuffer() doesn't call the original object's getbuffer. This means that if I do:

      m = memoryview(some_object)
      n = memoryview(m)
      m.release()

  n ends up holding a buffer to some_object's memory, but some_object doesn't know about it and can free the pointer at any time.

* same for _get_sub_buffer_index() and _get_sub_buffer_slice0(). Actually, the whole concept of non-owner memoryviews seems flawed.

* the fact that slicing increments the parent memoryview's refcount means that a loop like:

      while len(m):
          # do something
          m = m[1:]

  will end up keeping N chained memoryviews on the heap. Currently only the last memoryview remains alive.

Some other things:

* why do you accept the ellipsis when indexing? what is it supposed to mean?

* please don't use #warning. Just put the TODO in a /* ... */.

* please no "#if PY_VERSION_HEX >= 0x03000000". Just make your source py3k-compatible and we'll take care of backporting it to 2.x if your patch is accepted ;)
msg128491 | Author: Pauli Virtanen (pv) * | Date: 2011-02-13 13:20

Hi,

Please focus on the constraints: the consumer not mucking with the contents of `Py_buffer`, and `bf_releasebuffer` being called only with data obtained from `bf_getbuffer`, and only once. If we agree on this, then how exactly to do the implementation is just a detail.

The problem in the current way is that the structure sent to `bf_releasebuffer` does not contain the same data as what was filled in by `bf_getbuffer`, and since the contents are dup-ed, `bf_releasebuffer` is called multiple times with the same data. So `bf_releasebuffer` cannot rely on (i) the data in Py_buffer being what `bf_getbuffer` put there, or (ii) getting the same Py_buffer data only once. As a consequence, `bf_releasebuffer` cannot be used to release any resources allocated in `bf_getbuffer`.

For example, Numpy's PEP 3118 implementation does not define a `bf_releasebuffer`, although it needs to allocate resources in each call to `bf_getbuffer`. Instead, it has to keep track of allocated memory, and deallocate all of it in `array_dealloc`. This is a PITA, and one that could be avoided by a small change in the PEP implementation and documentation in Python.

> Some worrying things here:
>
> * memoryview_getbuffer() doesn't call the original object's getbuffer.
> This means that if I do:
> m = memoryview(some_object)
> n = memoryview(m)
> m.release()
> n ends up holding a buffer to some_object's memory, but some_object
> doesn't know about it and can free the pointer at any time.

Good point. There are two possible solutions to this:

- Keep a count of how many buffers memoryview() has "exported", and do not allow memoryview.release() if some are active. In a sense, this would be more in line with the PEP: the PyMemoryViewObject would here act as an ordinary object exporting some block of memory, and not do any magic tricks. It would guarantee that the buffers it has "exported" stay valid.

- Add additional fields to `PyMemoryViewObject` for storing new `strides`, `format`, and `shape` that override the stuff in Py_buffer. This would allow calling `PyObject_GetBuffer` for a second time.

Both would be reasonably easy to do.

***

Calling PyObject_GetBuffer to get a new hold of a buffer needs some precautions, though. For example:

    >>> mem = memoryview(some_object)
    # do some stuff
    >>> mem2 = memoryview(some_object)
    >>> assert mem.format == mem2.format  # not guaranteed

so there is some extra work `memoryview_getbuffer` could do on top of calling PyObject_GetBuffer.

> * same for _get_sub_buffer_index() and _get_sub_buffer_slice0().
> Actually, the whole concept of non-owner memoryviews seems flawed.

If the "parent" memoryview keeps its memory valid as long as such sub-memoryviews exist, such concerns go away.

> * the fact that slicing increments the parent memoryview's refcount
> means that a loop like:
> while len(m):
> # do something
> m = m[1:]
> will end up keeping N chained memoryviews on the heap. Currently only
> the last memoryview remains alive.

This has an easy solution: if the parent is a non-owner memoryview, take a reference to its obj. This then keeps only the buffer-owning view around.

> Some other things:
>
> * why do you accept the ellipsis when indexing? what is it supposed to
> mean?

Same meaning as in Numpy: a[...] == a

> * please don't use #warning. Just put the TODO in a /* ... */.

Sure.

> * please no "#if PY_VERSION_HEX >= 0x03000000". Just make your source
> py3k-compatible and we'll take care of backporting it to 2.x if your
> patch is accepted ;)

Those are just so that the backporting would be easy. It's not at the stage of a serious patch yet :)
msg128495 | Author: Antoine Pitrou (pitrou) * | Date: 2011-02-13 14:19

> The problem in the current way is that the structure sent to
> `bf_releasebuffer` does not contain the same data as what was filled
> in by `bf_getbuffer`, and since the contents are dup-ed,
> `bf_releasebuffer` is called multiple times with the same data.

Hmm, there's a misunderstanding. bf_releasebuffer is called exactly once for each call to bf_getbuffer. Of course, bf_getbuffer can be called several times!

> So, `bf_releasebuffer` cannot rely on (i) the data in Py_buffer being
> what `bf_getbuffer` put there,

Well, why should it rely on that?

> So, `bf_releasebuffer` cannot be used to release any resources
> allocated in `bf_getbuffer`.

AFAICT, it can. That's what the "internal" pointer is for:

    This is for use internally by the exporting object. For example,
    this might be re-cast as an integer by the exporter and used to
    store flags about whether or not the shape, strides, and suboffsets
    arrays must be freed when the buffer is released. The consumer
    should never alter this value.

(http://docs.python.org/dev/c-api/buffer.html#Py_buffer.internal)

> Good point. There are two possible solutions to this:
>
> - Keep a count of how many buffers memoryview() has "exported",
> and do not allow memoryview.release() if some are active.

Where would that count be stored?

> In a sense, this would be more in line with the PEP:
> the PyMemoryViewObject would here act as an ordinary object
> exporting some block of memory, and not do any magic tricks.

Well, that sounds wrong to me. The memoryview doesn't export anything; the original object does.

> It would guarantee that the buffers it has "exported" stay valid.

How would it, since it doesn't know the original object's semantics?

> - Add additional fields to `PyMemoryViewObject` for storing
> new `strides`, `format`, and `shape` that override the stuff
> in Py_buffer.
>
> This would allow calling `PyObject_GetBuffer` for a second time.

Sounds better to me :)

> Calling PyObject_GetBuffer to get a new hold of a buffer needs some
> precautions, though. For example:
>
> >>> mem = memoryview(some_object)
> # do some stuff
> >>> mem2 = memoryview(some_object)
> >>> assert mem.format == mem2.format  # not guaranteed

Well, if the original object decides to change the format between two calls, then memoryview() should simply reflect that.

> > * same for _get_sub_buffer_index() and _get_sub_buffer_slice0().
> > Actually, the whole concept of non-owner memoryviews seems flawed.
>
> If the "parent" memoryview keeps its memory valid as long as such
> sub-memoryviews exist, such concerns go away.

So release() wouldn't do what it claims to do? That sounds misleading.

> > * why do you accept the ellipsis when indexing? what is it supposed to
> > mean?
>
> Same meaning as in Numpy. a[...] == a

Yes, but why would we do that in core Python? a[...] might have a meaning in NumPy, but it has none in core Python currently. I'm not against it, but it would warrant discussion on python-dev.
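An exporter-side sketch of the pattern the quoted documentation suggests (the MyObject layout is hypothetical and flag negotiation is omitted): per-export allocations are remembered through view->internal and freed in releasebuffer:

```c
static int
myobj_getbuffer(MyObject *self, Py_buffer *view, int flags)
{
    Py_ssize_t *shape = PyMem_Malloc(sizeof(Py_ssize_t));
    if (shape == NULL) {
        PyErr_NoMemory();
        return -1;
    }
    shape[0] = self->size;

    view->buf = self->data;
    view->obj = (PyObject *)self;
    Py_INCREF(self);
    view->len = self->size;
    view->itemsize = 1;
    view->readonly = 0;
    view->ndim = 1;
    view->format = "B";
    view->shape = shape;
    view->strides = NULL;
    view->suboffsets = NULL;
    view->internal = shape;          /* remember what to free */
    return 0;
}

static void
myobj_releasebuffer(MyObject *self, Py_buffer *view)
{
    /* safe only if consumers never clobber view->internal --
       precisely the guarantee under debate here */
    PyMem_Free(view->internal);
}
```
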
msg128499 | Author: Pauli Virtanen (pv) * | Date: 2011-02-13 14:35

> Hmm, there's a misunderstanding. bf_releasebuffer is called exactly
> once for each call to bf_getbuffer.

Wrong: http://bugs.python.org/issue7433

    static int
    memory_getbuf(PyMemoryViewObject *self, Py_buffer *view, int flags)
    {
        int res = 0;
        CHECK_RELEASED_INT(self);
        if (self->view.obj != NULL)
            res = PyObject_GetBuffer(self->view.obj, view, flags);
        if (view)
            dup_buffer(view, &self->view);
        return res;
    }

After this, PyBuffer_Release will be called twice: once on the data in *view, by whoever acquired the buffer from memoryview, and once on self->view, by memory_dealloc. Both with the same bit-by-bit content of the Py_buffer structure. Because there are two Py_buffer structures here, setting view.obj to NULL in PyBuffer_Release does not guarantee correct calls to bf_releasebuffer. Note that the view.internal pointer is also clobbered above.

> > So, `bf_releasebuffer` cannot rely on (i) the data in Py_buffer
> > being what `bf_getbuffer` put there,
>
> Well, why should it rely on that?

Because that makes implementing the exporter much easier. Also, writing an implementation for MemoryViewObject does not require clobbering the structure, and I doubt it helps much.

> > So, `bf_releasebuffer` cannot be used to release any resources
> > allocated in `bf_getbuffer`.
>
> AFAICT, it can. That's what the "internal" pointer is for.

Sure, guaranteeing that the view->internal pointer is not toyed with would also be enough. But the documentation should spell out very explicitly what the bf_releasebuffer call can rely on.
msg128503 | Author: Antoine Pitrou (pitrou) * | Date: 2011-02-13 15:02

> > Hmm, there's a misunderstanding. bf_releasebuffer is called exactly
> > once for each call to bf_getbuffer.
>
> Wrong: http://bugs.python.org/issue7433

This is a different issue.

> static int
> memory_getbuf(PyMemoryViewObject *self, Py_buffer *view, int flags)
> {
>     int res = 0;
>     CHECK_RELEASED_INT(self);
>     if (self->view.obj != NULL)
>         res = PyObject_GetBuffer(self->view.obj, view, flags);
>     if (view)
>         dup_buffer(view, &self->view);
>     return res;
> }
>
> After this, PyBuffer_Release will be called twice: once on the data in
> *view, by whoever acquired the buffer from memoryview, and once on
> self->view, by memory_dealloc.

PyObject_GetBuffer() is called twice too: once when creating the memoryview, once when calling memory_getbuf. So again, bf_getbuffer is called the same number of times as bf_releasebuffer.

> Note that the view.internal pointer is also clobbered above.

Are you sure? memoryobject.c doesn't touch that pointer at all.

> > > So, `bf_releasebuffer` cannot be used to release any resources
> > > allocated in `bf_getbuffer`.
> >
> > AFAICT, it can. That's what the "internal" pointer is for.
>
> Sure, guaranteeing that the view->internal pointer is not toyed with
> would also be enough.
>
> But the documentation should spell out very explicitly what the
> bf_releasebuffer call can rely on.

Yes.
msg128505 | Author: Pauli Virtanen (pv) * | Date: 2011-02-13 15:09

[clip]
> This is a different issue.

It is the issue relevant for this discussion. As written in my comment: "So, `bf_releasebuffer` cannot rely on (i) the data in Py_buffer being what `bf_getbuffer` put there, and (ii) getting the same Py_buffer data only once."

> PyObject_GetBuffer() is called twice too: once when creating the
> memoryview, once when calling memory_getbuf.
> So again, bf_getbuffer is called the same number of times as
> bf_releasebuffer.

Yes, it is called twice, but with exactly the same data in Py_buffer. So I would rather say that bf_releasebuffer is called twice on the Py_buffer returned by the first GetBuffer, and zero times for the buffer returned by the second one.

> > Note that the view.internal pointer is also clobbered above.
>
> Are you sure? memoryobject.c doesn't touch that pointer at all.

dup_buffer does *dst = *src, which overwrites the view.internal pointer obtained from one GetBuffer call with a pointer obtained from a previous one.
msg128506 | Author: Antoine Pitrou (pitrou) * | Date: 2011-02-13 15:11

> dup_buffer does *dst = *src, which overwrites the view.internal
> pointer obtained from one GetBuffer call with a pointer obtained from
> a previous one.

Ah, ok, then it deserves fixing.
msg128508 | Author: Pauli Virtanen (pv) * | Date: 2011-02-13 15:34

Ok, good, that diversion was then avoided :)

***

So, am I on the right track that there would not be many objections to clarifying the docs of the Py_buffer spec by stating:

- For each buffer yielded by `bf_getbuffer`, `bf_releasebuffer` is called exactly once. Each `bf_releasebuffer` call is guaranteed to get the same view->internal pointer as filled in previously by the corresponding `bf_getbuffer`. All other fields in `Py_buffer` may be modified by the consumer, and `bf_releasebuffer` may not assume they contain valid data.

- The exporter of the buffer must ensure that, apart from the contents of the memory pointed to by `buf`, the contents of the returned Py_buffer (format, strides, shape, etc.) remain unchanged. If the latter is not true, it will lead to no end of trouble in e.g. multithreaded programs.

What about the more strict requirement:

- `bf_releasebuffer` is guaranteed that all fields except `obj` are not modified?

This could simplify the implementation of exporters, and I suspect that MemoryViewObject is a special case, as most consumers probably would not want to make such modifications.

I can try preparing an updated patch based on some combination of the above, taking into account your comments.

BTW, should I take this discussion to Python-dev? So far, I kept it here, as this bug report seemed to be about general issues in the current implementation and spec.
msg128514 | Author: Antoine Pitrou (pitrou) * | Date: 2011-02-13 16:04

> - For each buffer yielded by `bf_getbuffer`, `bf_releasebuffer`
> is called exactly once.
>
> Each `bf_releasebuffer` call is guaranteed to get the same
> view->internal pointer as filled in previously by the
> corresponding `bf_getbuffer`.
>
> All other fields in `Py_buffer` may be modified by the consumer,
> and `bf_releasebuffer` may not assume they contain valid data.
>
> - The exporter of the buffer must ensure that apart from the contents
> of the memory pointed to by `buf`, the contents of the returned
> Py_buffer (format, strides, shape, etc.) remain unchanged.

Yes! Can you write a patch that does these docs changes and also fixes the internal pointer issue?

> What about the more strict requirement:
>
> - `bf_releasebuffer` is guaranteed that all fields except `obj`
> are not modified?

Well, if you manage to change the memoryview implementation for that, then fine. But I don't know how you plan to manage ownership of fields such as strides and friends.

> BTW, should I take this discussion to Python-dev? So far, I kept it
> here, as this bug report seemed to be about general issues in the
> current implementation and spec.

I don't think you need to. Nick and Mark follow this issue (and so does Travis, if he's still alive ;)).
msg128516 | Author: Pauli Virtanen (pv) * | Date: 2011-02-13 19:08

Ok, here's a patch with some suggested documentation changes + the minimal changes in memoryview to make bf_releasebuffer behave as advertised. Probably shouldn't be applied as-is, though.

Problems remain. Suppose `bf_getbuffer` for memoryview is implemented so that it returns a view exported by the original object (and not the memoryview). Consider an example:

    >>> obj = PictureSet()   # exposes a buffer, say, to "pic_1"
    >>> a = memoryview(obj)  # points to pic_1

Here, `a` grabs a lock to pic_1.

    >>> obj.choose_picture(2)  # switches the exposed buffer to pic_2
    >>> b = memoryview(obj)    # points to pic_2
    >>> c = memoryview(a)

Here, PyObject_GetBuffer in bf_getbuffer for `a` grabs a lock to pic_2 and passes it on to `c`. The spec specifies no way to get additional locks on pic_1.

    >>> a.release()

Now the lock on pic_1 is released, and the buffer may be freed (also, if the handling of shape/stride arrays in memoryview stays as it is now, those are invalidated too). One of the two is now true: `memoryview(a)` does not always describe the same area of memory as `a`, OR `c` ends up with a lock on the wrong area of memory and the program hits a SEGV.

[clip]
> > In a sense, this would be more in line with the PEP:
> > the PyMemoryViewObject would here act as an ordinary object
> > exporting some block of memory, and not do any magic tricks.
>
> Well, that sounds wrong to me. The memoryview doesn't export
> anything; the original object does.

Having the memoryview "own" the exported buffer would be a simple solution to the above issue. The alternative would be to adjust the spec so that the above type of behavior is forbidden (any buffers and the shape/stride arrays ever exported must remain valid until the last one is released). Not sure if it makes sense to do this if the only gain is a simplification in the implementation of memoryview.

> > It would guarantee that the buffers it has "exported" stay valid.
>
> How would it, since it doesn't know the original object's semantics?

The original object must guarantee that the buffers remain valid until released, as specified in the PEP. Of course, this requires that memoryview counts how many buffers it has exported, and allows release() only if the count is zero.
msg128517 | Author: Antoine Pitrou (pitrou) * | Date: 2011-02-13 19:31

> Problems remain. Suppose `bf_getbuffer` for memoryview is implemented
> so that it returns a view exported by the original object (and not the
> memoryview). Consider an example:
>
> >>> obj = PictureSet()   # exposes a buffer, say, to "pic_1"
> >>> a = memoryview(obj)  # points to pic_1
>
> Here, `a` grabs a lock to pic_1.
>
> >>> obj.choose_picture(2)  # switches the exposed buffer to pic_2

That's a very bad API to begin with. Anyone thinking it is reasonable to have the concept of a "current element" in a container should write PHP code instead. If you have individual pictures, please use individual objects.

> >>> b = memoryview(obj)  # points to pic_2
> >>> c = memoryview(a)
>
> Here, PyObject_GetBuffer in bf_getbuffer for `a` grabs a lock to pic_2
> and passes it on to `c`

So what? c will represent pic_2 anyway, so the behaviour is right. Awful certainly for the user, but that's because PictureSet implements a braindead API.

> , and the buffer may be freed (also, if the handling of shape/stride
> arrays in memoryview is as it is now, also those are invalidated). One
> of the two is now true: `memoryview(a)` does not always describe the
> same area of memory as `a`, OR, `c` ends up with a lock on the wrong
> area of memory and the program hits a SEGV.

This all points to completely broken logic in the imaginary PictureSet class. If someone doesn't get their semantics right, it is not the buffer API's problem. bf_getbuffer and bf_releasebuffer must implement symmetric operations; perhaps that's not explicitly worded, but it seems quite obvious to me.

> The alternative would be to adjust the spec so that the above type of
> behavior is forbidden (any buffers and the shape/stride arrays ever
> exported must remain valid until the last one is released).

Why would you adjust the spec? This just stems from common sense. If you provide a pointer to a consuming API, you have to ensure the pointer remains valid as long as the consuming API is using that pointer. Actually, that's *exactly* why there's a bf_releasebuffer method at all.
msg128518 | Author: Antoine Pitrou (pitrou) * | Date: 2011-02-13 19:42

That said:

> Having the memoryview "own" the exported buffer would be a simple
> solution to the above issue.

If you can implement that without exhibiting the issues we discussed above (e.g. O(N) memory consumption when doing repetitive slicing), then why not.

Also, thanks for the patch. Please refrain from spurious formatting changes, though.
msg128525 | Author: Alyssa Coghlan (ncoghlan) * | Date: 2011-02-13 22:01

I'm still not comfortable with a convention that relies on *clients* of the PEP 3118 API not mucking with the internals of the Py_buffer struct. I'm *much* happier with the rule based on malloc/free semantics where the *pointer* passed to PyObject_GetBuffer must match a single later call to PyObject_ReleaseBuffer. This is also a convention that should be quite familiar to any C/C++ developers using the API directly.

Note my comment earlier, pointing out that under those rules, dup_buffer actually does the right thing given the way it is used in the current implementation, but there's one current bug (issue 9990) where the wrong address is passed to the release buffer call - get_buffer is called with a stack address, the contents are copied to a heap address, then the heap address is used in the release_buffer call.
msg128526 | Author: Alyssa Coghlan (ncoghlan) * | Date: 2011-02-13 22:12

As far as the question of re-exporting the underlying view or not goes, I agree having "memoryview(a)" potentially refer to different underlying memory from "a" itself (because the source object has changed since the first view was exported) is a recipe for confusion. It is also important for ensuring memoryview slicing works correctly. From a didactic point of view, this would also have the benefit of providing a canonical example of managing exported Py_buffer structs in the CPython source code.

The "repeated slicing" issue is definitely a concern, though. If we declare memoryview objects officially immutable, then we can avoid that by shortcutting the chaining with a typecheck on the referenced object (i.e. call GetBuffer on the source object directly if it is a memoryview, but on the memoryview we're copying otherwise).
msg128527 | Author: Alyssa Coghlan (ncoghlan) * | Date: 2011-02-13 22:20

I should note that the idea of using a Py_buffer's "internal" field as the unique ID rather than the Py_buffer's address does have some merit, but I still overall prefer a design that places fewer constraints on API consumers.
msg128528 | Author: Antoine Pitrou (pitrou) * | Date: 2011-02-13 22:33

> I'm still not comfortable with a convention that relies on *clients*
> of the PEP 3118 API not mucking with the internals of the Py_buffer
> struct.

Which clients? Those who export the buffer, or those who consume it?

> I'm *much* happier with the rule based on malloc/free semantics where
> the *pointer* passed to PyObject_GetBuffer must match a single later
> call to PyObject_ReleaseBuffer.

Agreed that Py_buffer should have been a PyObject from the start, but the PEP chose differently. We now have backwards compatibility constraints. What do we do with PyMemoryView_FromBuffer? Also, there's probably some code out there that likes to copy Py_buffers around.

> As far as the question of re-exporting the underlying view or not
> goes, I agree having "memoryview(a)" potentially refer to different
> underlying memory from "a" itself (because the source object has
> changed since the first view was exported) is a recipe for confusion.

If an object changes its buffer while it's exported somewhere, it will always result in confusion for the user, regardless of how the memoryview object is implemented. All normal uses of the buffer API assume that the buffer's memory doesn't change while it's being accessed by its consumer (what would it mean to SHA1-hash or zlib-compress a changing piece of memory?). So I don't know why the memoryview object *in particular* should care about this.
msg128537 | Author: Alyssa Coghlan (ncoghlan) * | Date: 2011-02-14 11:11

On Mon, Feb 14, 2011 at 8:33 AM, Antoine Pitrou <report@bugs.python.org> wrote:
> > I'm still not comfortable with a convention that relies on *clients*
> > of the PEP 3118 API not mucking with the internals of the Py_buffer
> > struct.
>
> Which clients? Those who export the buffer, or those who consume it?

Consumers. (I'll try to stick to provider/consumer terminology, as that's clearer in this context.)

> > I'm *much* happier with the rule based on malloc/free semantics where
> > the *pointer* passed to PyObject_GetBuffer must match a single later
> > call to PyObject_ReleaseBuffer.
>
> Agreed that Py_buffer should have been a PyObject from the start, but
> the PEP chose differently.

malloc/free modelled semantics have *nothing* to do with Py_buffer being a full PyObject in its own right. All I mean is that whatever pointer you call ReleaseBuffer with should be the one you passed to GetBuffer, and the only thing tp_releasebuffer implementations should rely on is the address of that pointer rather than the struct contents.

However, from what Pauli has said, we may want to go with the alternative approach of saying the struct address is irrelevant, and only the contents matter, using the "internal" field to disambiguate different exported buffers. I believe either will work, and either places additional constraints on buffer API consumers that aren't currently clearly documented.

> We now have backwards compatibility constraints. What do we do with
> PyMemoryView_FromBuffer? Also, there's probably some code out there that
> likes to copy Py_buffers around.

Such code is likely to be broken regardless of how we clarify the semantics, in the same way that our own dup_buffer is currently broken under either set of semantics (i.e. calling ReleaseBuffer with a different address in one case, clobbering the "internal" field in other cases). We will probably need to expose an official Py_buffer copying function that gets all the subtle details right so that extension authors can more easily avoid making the same mistakes.

> If an object changes its buffer while it's exported somewhere, it will
> always result in confusion for the user, regardless of how the
> memoryview object is implemented. All normal uses of the buffer API
> assume that the buffer's memory doesn't change while it's being accessed
> by its consumer (what would it mean to SHA1-hash or zlib-compress a
> changing piece of memory?).
> So I don't know why the memoryview object *in particular* should care
> about this.

I'm not talking about an exported view changing its details (that's explicitly disallowed by the PEP), I'm talking about the fact that sequential calls to PyObject_GetBuffer are permitted to return different answers. That's the point Pauli's PictureSet example illustrated - even though the toy example uses a somewhat clumsy API, it's perfectly legitimate according to the documentation, and it shows that the current memoryview implementation may behave strangely when you copy or slice a view of a mutable object, even though the view itself is guaranteed to remain valid.

Consider the following:

    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    BufferError: Existing exports of data: object cannot be re-sized

Now suppose that instead of disallowing the resize, bytearray (or a similar object) permitted it by allocating a new memory buffer, while keeping a reference to the old buffer around until the memoryview releases it (an approach that is perfectly legitimate according to the PEP). In that case, our current "use the source object" approach to memoryview copying and slicing will backfire badly, since copies and slices will be working off the *new* (empty) state of the object, while the original memoryview will still be looking at the old populated state.

I think Pauli's right, we need to make memoryview re-exporting significantly smarter in order to cope correctly with mutable objects.
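The snippet that produced the traceback above did not survive the tracker migration; a hedged C-level reconstruction of the same failure mode (error checking trimmed to keep the sketch short):

```c
#include <Python.h>

static void
resize_while_exported(void)
{
    Py_buffer view;
    PyObject *ba = PyByteArray_FromStringAndSize("abc", 3);

    PyObject_GetBuffer(ba, &view, PyBUF_SIMPLE);   /* export the buffer */
    if (PyByteArray_Resize(ba, 10) < 0) {
        /* BufferError: Existing exports of data: object cannot be re-sized */
        PyErr_Print();
    }
    PyBuffer_Release(&view);
    Py_DECREF(ba);
}
```
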
msg128538 | Author: Alyssa Coghlan (ncoghlan) * | Date: 2011-02-14 11:29

Another idea we may want to revisit is the PyManagedBuffer concept, which would be a true PyObject that existed solely to simplify sharing of Py_buffer structs.

If memoryview used such an object internally, then copying and slicing would be quite simple - just INCREF the managed buffer instance without letting the original source object know anything was going on.
msg128539 | Author: Alyssa Coghlan (ncoghlan) * | Date: 2011-02-14 11:41

OK, to summarise that revisit into a coherent proposal (a rough sketch follows the list):

1. Add "PyManagedBuffer" as an even lower level wrapper around Py_buffer. The only thing this would support is doing GetBuffer on construction and ReleaseBuffer when destroyed (or when explicitly asked to do so), as well as access to the information in the Py_buffer struct.
2. Adjust PyMemoryView to use that object to access the source object that implements the PEP 3118 API.
3. When copying or slicing memoryview objects, the new PyMemoryView instance acquires a reference to the existing PyManagedBuffer instance instead of creating a new one.
4. If memoryview.release() attempts to explicitly release the buffer, but there are additional references to the PyManagedBuffer object, the request will fail with a BufferError.
5. dup_buffer is deleted (with extreme prejudice)
6. 3rd party code is encouraged in the C API documentation to never ever copy Py_buffer structs around and to use PyManagedBuffer instead.
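A minimal sketch of points 1-4 (illustrative only: this is not the implementation that was eventually written, and the PyManagedBuffer_Type object is assumed defined elsewhere):

```c
#include <Python.h>

typedef struct {
    PyObject_HEAD
    Py_buffer view;     /* filled exactly once by GetBuffer */
} PyManagedBufferObject;

static PyObject *
managedbuffer_from_object(PyObject *base)
{
    PyManagedBufferObject *mbuf =
        PyObject_New(PyManagedBufferObject, &PyManagedBuffer_Type);
    if (mbuf == NULL)
        return NULL;
    if (PyObject_GetBuffer(base, &mbuf->view, PyBUF_FULL_RO) < 0) {
        mbuf->view.obj = NULL;   /* tell dealloc there is nothing to release */
        Py_DECREF(mbuf);
        return NULL;
    }
    return (PyObject *)mbuf;
}

/* Sharing (point 3) is then plain refcounting: a memoryview copy or
   slice just Py_INCREFs the managed buffer instead of calling
   GetBuffer on the source object again. */

static void
managedbuffer_dealloc(PyManagedBufferObject *mbuf)
{
    if (mbuf->view.obj != NULL)
        PyBuffer_Release(&mbuf->view);   /* the one and only release */
    PyObject_Del(mbuf);
}
```
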
msg128540 | Author: Alyssa Coghlan (ncoghlan) * | Date: 2011-02-14 12:02

On further reflection, I realised point 4 in that suggestion isn't reliable in a formal sense. Consider:

    with memoryview(a) as m:
        m[:]

The slice won't reference the buffer again, but isn't guaranteed by the language definition to have been garbage collected at the time m.release() is called by the with statement.

Altering release() to simply decrement the reference count of the managed buffer would defeat the whole point of having that method, so it may be necessary to allow early release with outstanding references and then include a "still alive" check in the code that allows access to the buffer details (similar to the way weak references work).
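A sketch of such a guard, modelled on the CHECK_RELEASED idiom already present in memoryobject.c; the mbuf field (the memoryview's hypothetical reference to its managed buffer) and the error wording are assumptions:

```c
/* Every accessor of the buffer details runs this first, so a view
   whose buffer was released early behaves like a dead weak reference. */
#define CHECK_RELEASED(self, errvalue)                          \
    do {                                                        \
        if ((self)->mbuf == NULL) {                             \
            PyErr_SetString(PyExc_ValueError,                   \
                "operation forbidden on released memoryview");  \
            return (errvalue);                                  \
        }                                                       \
    } while (0)
```
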
msg128542 | Author: Antoine Pitrou (pitrou) * | Date: 2011-02-14 13:03

> > > I'm *much* happier with the rule based on malloc/free semantics where
> > > the *pointer* passed to PyObject_GetBuffer must match a single later
> > > call to PyObject_ReleaseBuffer.
> >
> > Agreed that Py_buffer should have been a PyObject from the start, but
> > the PEP chose differently.
>
> malloc/free modelled semantics have *nothing* to do with Py_buffer
> being a full PyObject in its own right.

Well, the moment you're talking about having a struct of heterogeneous fields whose address should remain identical, why not use the one internal abstraction that is designed exactly for that, aka PyObject. Sure, you can devise another ownership model, but what's the point? I'm talking about avoiding NIH and an additional maintenance burden due to avoidable cruft.

> > We now have backwards compatibility constraints. What do we do with
> > PyMemoryView_FromBuffer? Also, there's probably some code out there that
> > likes to copy Py_buffers around.
>
> Such code is likely to be broken regardless of how we clarify the
> semantics,

Well, PyMemoryView_FromBuffer is used in the stdlib right now (in the IO lib).

> I think Pauli's right, we need to make memoryview
> re-exporting significantly smarter in order to cope correctly with
> mutable objects.

Ok.

> 1. Add "PyManagedBuffer" as an even lower level wrapper around
> Py_buffer. The only thing this would support is doing GetBuffer on
> construction and ReleaseBuffer when destroyed (or when explicitly
> asked to do so), as well as access to the information in the Py_buffer
> struct.
>
> 2. Adjust PyMemoryView to use that object to access the source object
> that implements the PEP 3118 API.

Ouch. Why not fold all that directly into memoryview? A third abstraction for support of what is supposed to be a simple API sounds too much. (see, if Py_buffer was a PyObject, we'd have only one abstraction)
msg128544 | Author: Alyssa Coghlan (ncoghlan) * | Date: 2011-02-14 13:49

The managed buffer needs to be a separate object so that multiple memoryview objects can share it. Essentially, we would end up with two APIs - the raw Py_buffer API and the higher level PyManagedBuffer one. C extensions would have the choice of using the low level API (most likely the case for existing extensions) or making life easier for themselves by using the higher level one.

It is, at least from the consumer side, almost exactly what you suggested in the first place: a PEP 3118 style API where the main interface object is a true PyObject instance. Providers will still need to deal with correctly populating the passed in Py_buffer instances, but that can be handled easily enough by storing exported managed buffers in a Python list and storing indices in the Py_buffer internal field.

PyMemoryView itself isn't appropriate as that higher level API, since it does a lot more than just provide an easy way to share a Py_buffer struct (specifically, thanks to slicing, a memoryview instance may not expose the entire underlying buffer). Managed buffers would be much simpler, simply providing read-only access to the fields of the managed Py_buffer struct.

Depending on how we end up defining the required consumer semantics, PyMemoryView_FromBuffer may or may not be fully salvageable. If we find dup_buffer semantics that are universally correct, then we may be able to keep it in its current form by building those into an alternate ManagedBuffer constructor, otherwise we may need to restrict it to cases where the obj field is NULL.
msg128546 - (view) | Author: Alyssa Coghlan (ncoghlan) * | Date: 2011-02-14 14:03 | |
If it helps, one way to think of it is that post-redesign, the PyManagedBuffer would be the "real" API object, with Py_buffer merely a way for data exporters to provide the information needed to initialise the managed buffer properly. (That actually fairly accurately tracks the origin of the Py_buffer struct - to avoid huge numbers of output variables in the GetBuffer API calls.)

Is such a solution ideal? No, but it would be significantly better than what we have now. If the PEP had been implemented as written, we would also have been much better off, but given that we failed to pick up the discrepancies between spec and implementation at the time, we're now stuck with trying to improve the situation without breaking backwards compatibility.

Since I now agree with your diagnosis that the heart of the problem is the fact that Py_buffer contains lots of pointers and thus is difficult to copy safely, wrapping it in a PyObject (without adding any additional behaviour) in order to minimise copying seems like the most obvious solution. |
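
To make the idea being converged on here concrete, a minimal sketch of such a wrapper object might look as follows. All names and layout are illustrative only - they track the discussion above, not the code that was eventually committed:

    #include <Python.h>

    /* A passive, refcounted wrapper: GetBuffer on construction,
       exactly one matching Release on destruction. */
    typedef struct {
        PyObject_HEAD
        Py_buffer master;       /* read-only for everyone after init */
    } PyManagedBufferObject;

    static void
    mbuf_dealloc(PyManagedBufferObject *self)
    {
        PyBuffer_Release(&self->master);  /* the single matching release */
        PyObject_Del(self);
    }

    static PyTypeObject PyManagedBuffer_Type = {
        PyVarObject_HEAD_INIT(NULL, 0)
        .tp_name = "managedbuffer",
        .tp_basicsize = sizeof(PyManagedBufferObject),
        .tp_dealloc = (destructor)mbuf_dealloc,
    };

    static PyObject *
    PyManagedBuffer_FromObject(PyObject *base, int flags)
    {
        PyManagedBufferObject *mbuf =
            PyObject_New(PyManagedBufferObject, &PyManagedBuffer_Type);
        if (mbuf == NULL)
            return NULL;
        if (PyObject_GetBuffer(base, &mbuf->master, flags) < 0) {
            mbuf->master.obj = NULL;  /* nothing acquired, nothing to release */
            Py_DECREF(mbuf);
            return NULL;
        }
        return (PyObject *)mbuf;
    }

Every memoryview (original, copy or slice) would then hold one strong reference to this object, and PyBuffer_Release would run exactly once no matter how many views shared the buffer.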
|||
msg128551 - (view) | Author: Pauli Virtanen (pv) * | Date: 2011-02-14 15:13 | |
Nick's plan of action above seems mostly OK to me. Though, it might be simpler to collapse the PyManagedBuffer object into PyMemoryView, even if this requires memoryviews to serve two purposes.

[Nick:]
> I'm still not comfortable with a convention that relies on *clients*
> of the PEP 3118 API not mucking with the internals of the Py_buffer
> struct.

Some points against:

(i) Having to look up keys by memory address from an additional PyDict is more work for the exporter than just passing around some PyMem_Malloc'd information in `internal`.

(ii) There is already an "obj" field in the structure that the consumers are supposed to not mess with.

(iii) The exporter cannot use the `internal` field for anything if bf_releasebuffer cannot rely on it being intact.

If the recommended consumer API is changed so that Py_buffer mainly sits inside a PyObject, it becomes more clear that Py_buffer is read-only for the consumer (-- which is what I believe the PEP intended, but did not write down).

[Nick:]
> Altering release() to simply decrement the reference count of the
> managed buffer would defeat the whole point of having that method, so
> it may be necessary to allow early release with outstanding references
> and then include a "still alive" check in the code that allows access
> to the buffer details (similar to the way weak references work).

Early release does not seem possible if the buffer does not come from the original object:

    lst = []
    with memoryview(a) as m:
        b = numpy.array(m)
        lst.append(b)

Now, m.__exit__ cannot release the buffer, since `b` holds a buffer-interface lock to `m`. `b` is 3rd party code, and does not know anything about memoryviews.

Some alternatives: (i) require that bf_getbuffer always gives a new lock on all exported buffers, if there are multiple, (ii) allow memoryview.__exit__ to fail silently, (iii) drop the context management.

I guess (i) would be a sane choice -- I don't see many use cases for the same object exporting multiple different buffers. It's not needed by Numpy, and I suspect there is no existing 3rd party code that relies on this (because it doesn't work with the current implementation of memoryview :) |
|||
msg128878 - (view) | Author: Mark Dickinson (mark.dickinson) * | Date: 2011-02-19 19:14 | |
Sorry, I'm unassigning myself from this; I'm still following the issue, but just don't foresee having time to work on it at all. |
|||
msg139162 - (view) | Author: Stefan Krah (skrah) * | Date: 2011-06-26 10:30 | |
[Nick]
> Another idea we may want to revisit is the PyManagedBuffer concept,
> which would be a true PyObject that existed solely to simplify sharing
> of Py_buffer structs.
> If memoryview used such an object internally, then copying and slicing
> would be quite simple - just INCREF the managed buffer instance without
> letting the original source object know anything was going on.

I think this is the nicest solution since memoryview would then always have a proper base object. Do I understand correctly that PyManagedBuffer should only handle 1-dimensional objects?

There is an additional point about slicing and sub-views: I think slicing (esp. multidimensional slicing) would be greatly simplified if we added a requirement for the *exporting* object to provide a sliced view. (The same applies to sub-views; also see the source comments below [1].)

For example, an exporting object could provide a sliced view by adding a getslicedbufferproc to PyBufferProcs:

    int PyObject_GetSlicedBuffer(PyObject *obj, Py_buffer *view,
                                 int flags, PyObject *key);

By "sliced view" I mean that the exporting object changes buf, shape and strides. There are several advantages:

o The invariant that all allocated memory in the buffer belongs to the exporting object remains intact.

o The responsibility for creating correct multidimensional sliced views is shifted to the implementor of the exporting object.

[1] memoryobject.c:

    /* XXX There should be an API to create a subbuffer */
    /* XXX: This needs to be fixed so it actually returns a sub-view */
 |
|||
msg139172 - (view) | Author: Alyssa Coghlan (ncoghlan) * | Date: 2011-06-26 12:31 | |
The idea of PyManagedBuffer is for it to be an almost completely passive object that *just* acts as a refcounted wrapper around the Py_buffer structure, so it doesn't care about the actual contents. The only supplemental functionality I think it should provide is to disallow explicitly releasing the buffer while the reference count is greater than 1.

I'm OK with my example cited above being unreliable. The correct way to write such code would then be:

    with memoryview(obj) as m:
        with m[:] as m2:
            ...

I think separating the concerns this way, letting PyManagedBuffer worry about the lifecycle issues of the underlying buffer reference while PyMemoryView deals with the *interpretation* of the buffer description (such as by providing useful slicing functionality), will make the whole arrangement easier to handle.

When a memoryview is sliced, it would create a new memoryview that has a reference to the same PyManagedBuffer object, but different internal state that affects how that buffer is accessed. This is better than requiring that every implementor of the buffer API worry about the slicing logic - we can do it right in memoryview and then implementers of producer objects don't have to worry about it.

Currently, however, memoryview gets tied up in knots since it is trying to do everything itself in a way that makes it unclear what is going on. The semantics of copying the Py_buffer struct or of accessing the PEP 3118 API on the underlying object when slicing or copying views are demonstrably broken. If we try to shoehorn reference counting semantics into the current object model, we would end up with two distinct modes of operation for memoryview:

Direct: the view is directly accessing an underlying object via the PEP 3118 API
Indirect: the view has a reference to another memoryview object that it is using as a data source

That's complicated - hard to implement in the first place and hard to follow when reading the code. Adding the PyManagedBuffer object makes the object model more complex, but simplifies the runtime semantics: every memoryview instance will access a PyManagedBuffer object which takes care of the underlying PEP 3118 details.

Direct use of the PEP 3118 consumer API in 3rd party code will also be strongly discouraged, with PyManagedBuffer promoted as the preferred alternative (producers, of course, will still need to provide the raw Py_buffer data that PyManagedBuffer exposes).

At the Python level, I don't think it is necessary to expose a new object, so we can stick with Antoine's preferred model where memoryview is the only public API. My proposed new PyManagedBuffer object would just be about making life easier at the C level. |
|||
msg139173 - (view) | Author: Alyssa Coghlan (ncoghlan) * | Date: 2011-06-26 12:35 | |
I should also note that if memoryview ends up growing enough state to cope correctly with numpy-style multi-dimensional slicing, I'm actually OK with that. You could get a long way just using the array module to allocate a large chunk of memory and then a suitably enhanced memoryview to manipulate it as a multidimensional array. That's a future concern, though - for now, the key task is to migrate to reliable, reference based semantics for Py_buffer management. |
|||
msg139256 - (view) | Author: Stefan Krah (skrah) * | Date: 2011-06-27 10:32 | |
Nick, you know a lot about this issue and I'm probably missing many things here. I misunderstood your concept of PyManagedBuffer, so my previous posting might have been hard to understand. I'd appreciate it if you (or anyone in this thread) could comment on whether the following would work *in theory*, even if you are against an additional getslicedbufferproc:

As I understand, there are two major issues that complicate the code:

1) the copying in PyMemoryView_FromBuffer()
2) slicing

To address 1), I wanted to create new memoryview objects exclusively from proper base objects that implement the buffer protocol. So the plan was to create a small wrapper object inside PyMemoryView_FromBuffer() that handles enough of the buffer protocol to be usable inside the stdlib for one-dimensional objects. The actual memoryview would then be created by calling PyMemoryView_FromObject() on that wrapper. [PyMemoryView_FromObject() would then obviously not call PyMemoryView_FromBuffer(), but would create the view directly.]

To address 2), buffers would *always* have to be filled in by the original exporting object, hence the proposal to add a getslicedbufferproc. Then memoryview would always have a proper base object and could always call getbuffer()/INCREF(base) and releasebuffer()/DECREF(base). I thought this would make the code much cleaner.

> Direct: the view is directly accessing an underlying object via the PEP 3118 API
> Indirect: the view has a reference to another memoryview object that it is using
> as a data source

Is there still a difference if only the original base object manages buffers and they are never copied?

> This is better than requiring that every implementor of the buffer API
> worry about the slicing logic - we can do it right in memoryview and then
> implementers of producer objects don't have to worry about it.

I'm not sure, but my reasoning was that e.g. in numpy the slicing logic is already in place. Then again, I don't know if it is a legitimate use of buf, shapes and strides to implement slicing. According to this mail, slicing information was supposed to be part of the memoryview struct: http://mail.python.org/pipermail/python-dev/2007-April/072584.html |
|||
msg139257 - (view) | Author: Pauli Virtanen (pv) * | Date: 2011-06-27 11:28 | |
skrah writes:
> I think slicing (esp. multidimensional slicing) would be greatly
> simplified if we added a requirement for the *exporting* object
> to provide a sliced view. (The same applies to sub-views, also
> see source comments below [1]).
>
> For example, an exporting object could provide a sliced view by adding
> a getslicedbufferproc to PyBufferProcs:
>
>     int PyObject_GetSlicedBuffer(PyObject *obj, Py_buffer *view,
>                                  int flags, PyObject *key);

The same thing can be done via

    PyObject_GetBuffer(obj, &view, flags);
    PyBuffer_Slice(&view, &sliced_view, flags, key);

given an implementation of PyBuffer_Slice. The logic in PyBuffer_Slice does not depend on where the buffer comes from, and every buffer can be sliced. As far as I see, the advantage of `getslicedbufferproc` would be to make the implementation of PyMemoryView simpler, but not much else. In my view, having each exporter implement the same logic by itself would only be an unnecessary burden.

> o The invariant that all allocated memory in the buffer belongs
>   to the exporting object remains intact.

Numpy arrays do not have this invariant, and they happily re-export memory owned by someone else.

This is one root of problems here: the PEP implicitly assumes that re-exporting buffers (e.g. memoryview's implementation of `getbuffer`) is done in the way Numpy does it. Because of this, there is no mechanism for "incrementing the refcount" of an existing buffer export. Maintaining the above invariant then unavoidably leads to strange behavior in corner cases (which probably are very rare, as mentioned above), and as happened here, makes the implementation messy and leads to bugs.

The invariant *is* required for guaranteeing that `memoryview.release()` always succeeds. Such a method probably wasn't foreseen in the PEP (and I did not remember that it existed in my first patch), as Numpy arrays don't have any equivalent.

The alternatives here are (i) do as Numpy does and give up the invariant and allow `.release()` to fail in some cases, or (ii) document the corner cases in the interface spec and try to detect them and fail if they occur. Which of these is chosen probably does not matter much in practice, but having PyManagedBuffer will make implementing either choice easier. |
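
As a rough illustration, a 1-D PyBuffer_Slice along these lines might look like the sketch below. PyBuffer_Slice is a proposed name, not an existing CPython API, and the consumer has to supply storage for the derived shape/strides - which is exactly the ownership question this thread keeps running into:

    #include <Python.h>

    typedef struct {
        Py_buffer view;
        Py_ssize_t shape[1];     /* private storage for the sliced view */
        Py_ssize_t strides[1];
    } SlicedBuffer;

    /* Derive the slice [start:stop:step] (0 <= start <= stop, step > 0)
       from an existing 1-D strided buffer by adjusting buf, shape and
       strides.  The result borrows src's reference to the exporter, so
       src must outlive dest -- the lifetime problem discussed here. */
    static int
    buffer_slice_1d(const Py_buffer *src, SlicedBuffer *dest,
                    Py_ssize_t start, Py_ssize_t stop, Py_ssize_t step)
    {
        if (src->ndim != 1 || src->strides == NULL) {
            PyErr_SetString(PyExc_BufferError,
                            "1-D buffer with stride information required");
            return -1;
        }
        dest->view = *src;                  /* shallow copy of the header */
        dest->view.buf = (char *)src->buf + start * src->strides[0];
        dest->shape[0] = (stop - start + step - 1) / step;
        dest->strides[0] = step * src->strides[0];
        dest->view.shape = dest->shape;     /* point at private storage */
        dest->view.strides = dest->strides;
        /* the PEP defines len as product(shape) * itemsize */
        dest->view.len = dest->shape[0] * src->itemsize;
        return 0;
    }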
|||
msg139266 - (view) | Author: Alyssa Coghlan (ncoghlan) * | Date: 2011-06-27 13:17 | |
I'll try to do a summary of the conversation so far, since it's quite long and hard to follow.

The basic issue is that memoryview needs to support copying and slicing that creates a new memoryview object. The major problem with that is that the PEP 3118 semantics as implemented operate in such a way that neither copying the Py_buffer struct *nor* requesting a new copy of the struct from the underlying object will do the right thing in all cases. (According to the PEP *as written* copying probably should have been OK, but the implementation doesn't match the PEP in several important respects, such that copying is definitely wrong in the absence of tight control of the lifecycles of copies relative to the original.)

Therefore, we either need to redesign the buffer export from memoryview to use daisy chaining (such that in "m = memoryview(obj); m2 = m[:]; m3 = m2[:]" m3 references m2, which references m, which in turn references obj) or else we need to introduce an internal reference counted object (PyManagedBuffer) which allows a single view of an underlying object to be safely shared amongst multiple clients (such that m, m2 and m3 would all reference the same managed buffer instance, which holds the reference to obj).

My preference is strongly for the latter approach, as it prevents unbounded and wasteful daisy chaining while also providing a clean, easy to use interface that will make it easier for 3rd parties to write PEP 3118 API consumers (by using PyManagedBuffer instead of the raw Py_buffer struct).

Once that basic lifecycle problem for the underlying buffers is dealt with, then we can start worrying about other problems like exporting Py_buffer objects from memoryview instances correctly. The lifecycle problem is unrelated to the details of the buffer *contents* though - it's entirely about the fact that clients can't safely copy all those pointers (as some may refer to addresses inside the struct) and asking the original object for a fresh copy is permitted to return a different answer each time.

The actual *slicing* code in memoryview isn't too bad - it just needs to use dedicated storage rather than messing with the contents of the Py_buffer struct it received from the underlying object. Probably the easiest way to handle that is by having the PyManagedBuffer reference be in *addition* to the current Py_buffer struct in the internal state - then the latter can be used to record the effects of the slicing, if any. Because we know the original Py_buffer struct is guaranteed to remain alive and unmodified, we don't need to worry about fiddling with any copied pointers - we can just leave them pointing into the original structure.

When accessed via the PEP 3118 API, memoryview objects would then export that modified Py_buffer struct rather than the original one (so daisy chaining would be possible, but we wouldn't make it easy to do from pure Python code, as both the memoryview constructor and slicing would give each new memoryview object a reference to the original managed buffer and just update the internal view details as appropriate).
Here's the current MemoryView definition:

    typedef struct {
        PyObject_HEAD
        Py_buffer view;
    } PyMemoryViewObject;

The TL;DR version of the above is that I would like to see it become:

    typedef struct {
        PyObject_HEAD
        PyManagedBuffer *source_data;  // shared read-only Py_buffer access
        Py_buffer view;                // shape, strides, etc potentially modified
    } PyMemoryViewObject;

Once the internal Py_buffer had been initialised, the memoryview code actually wouldn't *use* the source data reference all that much (aside from eventually releasing the buffer, it wouldn't use it at all). Instead, that reference would be retained solely to control the lifecycle of the original Py_buffer object relative to the modified copies in the various memoryview instances.

Does all that make my perspective any clearer? |
|||
msg139314 - (view) | Author: Stefan Krah (skrah) * | Date: 2011-06-27 16:47 | |
Pauli Virtanen <report@bugs.python.org> wrote:
> skrah writes:
> > For example, an exporting object could provide a sliced view by adding
> > a getslicedbufferproc to PyBufferProcs:
> >
> >     int PyObject_GetSlicedBuffer(PyObject *obj, Py_buffer *view,
> >                                  int flags, PyObject *key);
>
> The same thing can be done via
>
>     PyObject_GetBuffer(obj, &view, flags);
>     PyBuffer_Slice(&view, &sliced_view, flags, key);
>
> given an implementation of PyBuffer_Slice. The logic in PyBuffer_Slice does
> not depend on where the buffer comes from, and every buffer can be sliced.

Ok, that sounds good.

I came across a comment that base objects can change their memory layout: http://mail.python.org/pipermail/python-dev/2007-April/072606.html Is that something that must be taken care of?

> > o The invariant that all allocated memory in the buffer belongs
> >   to the exporting object remains intact.
>
> Numpy arrays do not have this invariant, and they happily re-export memory
> owned by someone else.

I'm not sure if we use the same terminology. By "exporting object" I meant the original base object, and this is the invariant I wanted:

    m1 = memoryview(base)   # directly from base
    m2 = memoryview(m1)     # redirects getbuffer/releasebuffer to base
    m3 = memoryview(m2)     # redirects getbuffer/releasebuffer to base
    s1 = m3[1::2, 1::2]     # redirects getslicedbuffer/releasebuffer to base

That's also what you mean by "re-exporting", right? |
|||
msg139316 - (view) | Author: Stefan Krah (skrah) * | Date: 2011-06-27 17:17 | |
Nick Coghlan <report@bugs.python.org> wrote:

[Snip liberally]

> The lifecycle problem is unrelated to the details of the buffer
> *contents* though - it's entirely about the fact that clients can't
> safely copy all those pointers (as some may refer to addresses inside
> the struct) and asking the original object for a fresh copy is
> permitted to return a different answer each time.
>
> The actual *slicing* code in memoryview isn't too bad

I promise that I'll keep quiet about the getslicedbufferproc from now on, since there isn't much enthusiasm. :) The reason I kept mentioning it was that I thought it would eliminate the need to copy anything at all. All buffers would come from a single, memory-owning base object.

> Does all that make my perspective any clearer?

Yes, thank you. The tricky part is to understand why always redirecting getbuffer/releasebuffer to the underlying *original base object* is not sufficient, but as I understood Pauli's last posting, that is due to the addition of the release() method. |
|||
msg139317 - (view) | Author: Antoine Pitrou (pitrou) * | Date: 2011-06-27 17:21 | |
On Mon, 27 Jun 2011 13:17:57 +0000, Nick Coghlan <report@bugs.python.org> wrote:
> The TL;DR version of the above is that I would like to see it become:
>
>     typedef struct {
>         PyObject_HEAD
>         PyManagedBuffer *source_data;  // shared read-only Py_buffer access
>         Py_buffer view;                // shape, strides, etc potentially modified
>     } PyMemoryViewObject;

Looks ok to me. (well, acceptable anyway) |
|||
msg139320 - (view) | Author: Lenard Lindstrom (kermode) | Date: 2011-06-27 18:14 | |
Using Python reference counting and the garbage collector to control when PyBuffer_Release is called has problems. First, it assumes the CPython interpreter will always use reference counting. Second, PyBuffer_Release may never get called if circular references exist. Third, an exception could keep a memoryview object alive in a traceback, delaying the PyBuffer_Release call. Finally, it does not play well with the memoryview __enter__, __exit__, and release methods. It makes Python level context management pointless; instead, just del the memoryview instance and hope the garbage collector cooperates. For the old buffer protocol and the Numpy array interface, resources have to be released in an object's destructor at garbage collection time. There is no other choice. If that also becomes the case for the new buffer protocol, then there is little incentive to migrate to it. |
|||
msg139323 - (view) | Author: Pauli Virtanen (pv) * | Date: 2011-06-27 18:36 | |
Lenard Lindstrom writes: > Using Python reference counting and the garbage collector to control > when PyBuffer_Release is called has problems. That's not what's being suggested. The refcounting discussed here is an implementation detail internal to memoryview, and does not affect the possibility to implement `release()` and context management reliably. |
|||
msg139325 - (view) | Author: Lenard Lindstrom (kermode) | Date: 2011-06-27 20:03 | |
It would if the proposed PyManagedBuffer only releases the Py_buffer struct - calls PyBuffer_Release - when its Python reference count goes to zero. So a separate reference count will be maintained instead. |
|||
msg139341 - (view) | Author: Alyssa Coghlan (ncoghlan) * | Date: 2011-06-28 00:58 | |
memoryview.release() will raise an exception if you call it when the underlying PyManagedBuffer instance has a refcount > 1. So the explicit memory management still works, and if you mess it up by leaving dangling references around you'll run the risk of getting an exception (CPython will mostly clean things up for you due to the refcounting, but we're well used to CPython users getting away with that kind of bug and being surprised when they port to a fully garbage collected implementation like PyPy, IronPython or Jython).

It isn't pretty, but it's the most practical way to cope with the lifecycle issues. The other alternative is to obediently force release of the buffer and then throw exceptions on subsequent *access*, the way we do with files. I'm less in favour of that approach, since it results in additional runtime overhead as you have to keep checking the state validity, and I think it generates the exception in the wrong place - the exception should be triggered where the implicit assertion that "all other references are gone and this object should be released right now" has been violated, not at some later random location where the object happens to be used.

(I'm also swayed by the backwards compatibility argument: we can easily move from "refcount must be 1" enforcement to the laissez-faire approach used by files. You can't go in the other direction without the potential to break working code.)

The reason redirecting all requests to the underlying object doesn't work is because repeated calls to getbuffer on mutable objects are allowed to return *different* answers. Assume we have a mutable array type that allows changes while memory is exported by keeping the old buffer around until all references are released. Then we want the following behaviour:

    a = mutable_byte_array(100)  # 100 byte array
    m1 = memoryview(a)           # View of original 100 byte array
    a.resize(0)   # Array is now zero length, but old buffer is still valid
    m2 = m1[:]
    m3 = memoryview(m1)
    # Both m2 and m3 should provide access to the *original* 100 byte
    # array, not to the now empty *current* array.
    # Having len(m2) and len(m3) differ from len(m1) because the
    # original source array had changed would be incredibly confusing.
    # When m1, m2 and m3 all go away then the original array will be
    # released.

The builtin bytearray avoids this issue by simply disallowing mutation while a buffer is exported, but memoryview needs to cope with arbitrary third party objects which may behave like the hypothetical mutable_byte_array(). |
|||
msg139747 - (view) | Author: Stefan Krah (skrah) * | Date: 2011-07-04 10:28 | |
In order to have a basis for discussion, I've set up a repo at http://hg.python.org/features/pep-3118#memoryview with an implementation of PyManagedBuffer. The whole test suite passes, also with refleak counting and Valgrind. Review is most welcome. If you think that this is roughly what PyManagedBuffer is supposed to look like, I can add some tests and documentation.

A couple of remarks:

o I have used refcounting assumptions that appear to be valid for the Python codebase, but that need to be documented.

o The docs state that if an underlying object supports writable views, the memoryview will be readable and writable. I use this in PyManagedBuffer_FromObject(). In fact there are tests that write to the original exporting object.

o Releasing a view raises if more than one view is active.

o get_shape0() is used in memory_length(). This does not seem correct if ndim > 1.

o memory_getbuf() is still not a real getbufferproc, since it disregards almost all flags.

o PyMemoryView_GetContiguous() needs to create a view with an underlying PyManagedBuffer. If the obj parameter is already a memoryview, the existing PyManagedBuffer has to be used, so I just do a check if the requested buffertype agrees with view->readonly. It would be possible to complicate the code further by having a PyMemoryView_FromObjectAndFlags().

o In the non-contiguous case, PyMemoryView_GetContiguous() creates a new buffer, but the returned view does not use that buffer. |
|||
msg139748 - (view) | Author: Stefan Krah (skrah) * | Date: 2011-07-04 10:42 | |
Nick Coghlan <report@bugs.python.org> wrote:
> The reason redirecting all requests to the underlying object doesn't work
> is because repeated calls to getbuffer on mutable objects are allowed to
> return *different* answers. Assume we have a mutable array type that allows
> changes while memory is exported by keeping the old buffer around until all
> references are released. Then we want the following behaviour:
>
> The builtin bytearray avoids this issue by simply disallowing mutation while
> a buffer is exported, but memoryview needs to cope with arbitrary third party
> objects which may behave like the hypothetical mutable_byte_array().

I find this behavior quite sane. Would it not be an option to make this a requirement in the PEP? As far as I can see, Numpy also disallows mutations on exported buffers. |
|||
msg139751 - (view) | Author: Pauli Virtanen (pv) * | Date: 2011-07-04 10:59 | |
@skrah: Yes, Numpy exposes only a single buffer per object. Making this a requirement in the PEP would probably be a sane change, as there is probably little real-world need to allow it to behave otherwise.

Comment on the patch: it seems you do not track the re-export count in memory_getbuf:

    a = memoryview(obj)
    b = numpy.asarray(a)
    a.release()
    b[0] = 123   # <-- BOOM: the buffer was already released

Could be fixed by Py_INCREF(self->mbuf) in getbuffer and DECREF in releasebuffer. In this design, the only choice is to make the `release()` call fail.

(I had some code for n-dim slicing etc. in my first patch that could be useful to have too; I'll see if I find time to dig it out here.) |
|||
msg139758 - (view) | Author: Stefan Krah (skrah) * | Date: 2011-07-04 12:21 | |
Pauli Virtanen <report@bugs.python.org> wrote:
> Comment on the patch: it seems you do not track the re-export count in
> memory_getbuf:
>
>     a = memoryview(obj)
>     b = numpy.asarray(a)
>     a.release()
>     b[0] = 123   # <-- BOOM: the buffer was already released

Could you give the exact sequence of events (including the creation of obj)? For me this works:

    Python 3.3a0 (memoryview:bbe70ca4e0e5+, Jul  4 2011, 13:55:55)
    [GCC 4.4.3] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import numpy
    >>> obj = bytearray(b'123456789')
    >>> a = memoryview(obj)
    >>> b = numpy.asarray(a)
    >>> a.release()
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    BufferError: several memoryviews are active
    >>> b[0] = 123
    >>> b
    array([123, 50, 51, 52, 53, 54, 55, 56, 57], dtype=uint8)
    >>> del a
    >>> b[0] = 224
    >>> b
    array([224, 50, 51, 52, 53, 54, 55, 56, 57], dtype=uint8)

> (I had some code for n-dim slicing etc. in my first patch that could be
> useful to have too; I'll see if I find time to dig them out here.)

That would be nice. |
|||
msg139760 - (view) | Author: Alyssa Coghlan (ncoghlan) * | Date: 2011-07-04 12:27 | |
It might be worth postponing the multi-dimensional support to a second patch, though. If we can get the buffer lifecycle solid first then that provides a better foundation for any further development. |
|||
msg139762 - (view) | Author: Alyssa Coghlan (ncoghlan) * | Date: 2011-07-04 12:30 | |
As far as the rule of disallowing shape changes while a buffer is exported, I actually think that's a more sane approach as well. However, I've been burned enough times by going "nobody would be insane enough to rely on that, would they?" that I'm seriously reluctant to impose additional backwards incompatible rules on a published spec when it isn't *too* difficult to adjust the infrastructure to better handle the existing complexity. |
|||
msg139764 - (view) | Author: Stefan Krah (skrah) * | Date: 2011-07-04 13:00 | |
[I agree that multi-dimensional support should not be part of this patch. I was thinking about creating a separate branch for that.]

Nick Coghlan <report@bugs.python.org> wrote:
> As far as the rule of disallowing shape changes while a buffer is exported,
> I actually think that's a more sane approach as well. However, I've been
> burned enough times by going "nobody would be insane enough to rely on that,
> would they?" that I'm seriously reluctant to impose additional backwards
> incompatible rules on a published spec when it isn't *too* difficult to
> adjust the infrastructure to better handle the existing complexity.

Yes, I understand. However, Pauli said earlier:

"I don't see many use cases for the same object exporting multiple different buffers. It's not needed by Numpy, and I suspect there is no existing 3rd party code that relies on this (because it doesn't work with the current implementation of memoryview :)"

That's why I thought that no one could be using this feature at present.

The main problem that I have with PyManagedBuffer is that the capabilities of the original underlying buffer (flags) *at the time of the export* aren't stored anywhere. This makes it hard to respond to specific queries in memory_getbuf() and PyMemoryView_GetContiguous(). You can't query the original object again since it might have changed.

I suppose that an existing memoryview could recompute its own capabilities by using PyBuffer_IsContiguous() and friends. It would be much easier, though, if the original flags were stored in the buffer by PyObject_GetBuffer(). |
|||
msg139766 - (view) | Author: Alyssa Coghlan (ncoghlan) * | Date: 2011-07-04 13:26 | |
Nice work with the patch Stefan - I've added a few review comments (including a suggestion on how to deal with the GetContiguous problem).

One idea that review did prompt is that if we aren't going back to the original object for fresh buffer requests, perhaps we should expose the underlying object as a read-only property of the memoryview objects. That way, if the original view is unsuitable (e.g. read-only when a writable buffer is needed), it should be straightforward to go back to the underlying object to request something different.

Another question Stefan raised is whether or not PyMemoryView_FromObject should accept a "flags" argument that it passes on to the underlying "getbuffer" call. I'm inclined to say yes - ideally I'd like to *only* expose PyMemoryView and PyManagedBuffer (rather than GetBuffer/ReleaseBuffer) for the limited API, which means accepting the flags argument for the two higher level interfaces. |
|||
msg139767 - (view) | Author: Alyssa Coghlan (ncoghlan) * | Date: 2011-07-04 13:29 | |
Is there anything stopping us just storing the flags on PyManagedBuffer? It's OK if the construction API requires the flag information in addition to the Py_buffer struct. |
|||
msg139775 - (view) | Author: Pauli Virtanen (pv) * | Date: 2011-07-04 14:36 | |
@skrah: Ahh, this always happens when I don't run it :)

But my point stands -- the reason why it does not crash with Numpy is that Numpy calls PyMemoryView_FromObject to obtain a new memoryview and then uses PyMemoryView_GET_BUFFER. Along this code path the refcount of self->mbuf gets properly incremented, as it's explicitly done in PyMemoryView_FromObject.

However, if you have a custom object Foo which just calls `PyObject_GetBuffer`, and then do the same sequence

    a = memoryview(whatever)
    b = Foo(a)                    # --> only grabs the buffer with PyObject_GetBuffer
    a.release()                   # --> will invalidate the buffer
    b.mogrify_buffer_contents()   # --> boom

Here, the buffer is released too early, as the refcount of `self->mbuf` is not incremented during the `PyObject_GetBuffer` call.

> Is there anything stopping us just storing the flags on
> PyManagedBuffer?

Slicing memoryviews can invalidate the contiguity flags and no-strides flags, so some checks are still probably needed in `memory_getbuf`.

> It's OK if the construction API requires the flag
> information in addition to the Py_buffer struct.

This would be useful, yes. |
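
The suggested fix could look roughly like this - a sketch against the structs of the patch under review, where self->mbuf is the shared PyManagedBuffer, not the code that was eventually committed:

    static int
    memory_getbuf(PyMemoryViewObject *self, Py_buffer *view, int flags)
    {
        *view = self->view;        /* hand out the (possibly sliced) view */
        view->obj = (PyObject *)self;
        Py_INCREF(view->obj);      /* the consumer keeps this view alive...   */
        Py_INCREF(self->mbuf);     /* ...and the master buffer, so an early
                                      release() cannot pull it away           */
        return 0;
    }

    static void
    memory_releasebuf(PyMemoryViewObject *self, Py_buffer *view)
    {
        Py_DECREF(self->mbuf);     /* matches the INCREF in memory_getbuf */
    }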
|||
msg139826 - (view) | Author: Alyssa Coghlan (ncoghlan) * | Date: 2011-07-05 02:43 | |
On Tue, Jul 5, 2011 at 12:36 AM, Pauli Virtanen <report@bugs.python.org> wrote:
> Slicing memoryviews can invalidate the contiguity flags, and no-strides
> flags, so some checks are still probably needed in `memory_getbuf`.

That makes sense, so just as memoryview has its own shape and strides data, it will also need its own flags data that *starts* as a copy of that in the original buffer, but may be subsequently modified. |
|||
msg139834 - (view) | Author: Stefan Krah (skrah) * | Date: 2011-07-05 08:39 | |
Nick, Pauli, thanks for all the comments. I'm busy implementing the easy changes; then it'll be easier to deal with the flags issues.

Pauli: Does numpy use the (undocumented) smalltable array in the Py_buffer structure? We would like to drop it. |
|||
msg139846 - (view) | Author: Stefan Krah (skrah) * | Date: 2011-07-05 10:49 | |
I've uploaded a revised version that addresses several suggestions. I think we have agreement on those now:

- Officially ditch smalltable.
- Comment static storage fields inside PyMemoryViewObject.
- Improve refcounting in PyMemoryView_FromBuffer()/PyMemoryView_FromObject().
- Increment mbuf refcount in memory_getbuf().
- Create separate sections for managedbuffer and memoryview.

Still open:

- Update documentation.
- Should PyManagedBuffer be private to this file? Do we need mbuf_new()?
- Add test to _testcapimodule.c. I wrote a small test for the problematic case in PyMemoryView_GetContiguous(), and it indeed returns an unaltered view. I suggest that we leave the NotImplementedError for now and handle that in a separate issue.
- Flag handling. |
|||
msg139856 - (view) | Author: Stefan Krah (skrah) * | Date: 2011-07-05 11:44 | |
I'm slightly confused about the implication chain in the flags. PyBUF_STRIDES seems to allow for discontiguous arrays, yet STRIDES -> ND -> C_CONTIGUOUS.

    PyBUF_FULL[_RO]
           |
    PyBUF_INDIRECT -------------- PyBUF_FORMAT ----------[PyBUF_WRITABLE]
           |
    PyBUF_STRIDES        (This would be used when the consumer can handle
           |              strided, discontiguous arrays ...)
    PyBUF_ND <-> PyBUF_CONTIG (why?)
           |
    PyBUF_C_CONTIGUOUS   (... but the implication chain leads us to a
                          contiguous buffer) |
|||
msg139859 - (view) | Author: Antoine Pitrou (pitrou) * | Date: 2011-07-05 11:49 | |
> I'm slightly confused about the implication chain in the flags. PyBUF_STRIDES > seem to allow for discontiguous arrays, yet STRIDES -> ND -> C_CONTIGUOUS. To be honest I have never understood anything about these flags, and I doubt anyone without a numpy background would. |
|||
msg139861 - (view) | Author: Pauli Virtanen (pv) * | Date: 2011-07-05 12:11 | |
The flags don't seem to be meant to describe the properties of the buffer, only what the exporter is required to fill in. STRIDES does not necessarily imply discontiguous, only that the `strides` field is present.

The C_/F_/ANY_CONTIGUOUS flags imply that the memory layout of an n-dim array is C/Fortran/either contiguous. Why these flags imply STRIDES is probably to make the result unambiguous, and because when dealing with n-d arrays you usually need to know the strides anyway.

`NULL` `strides` implies C-contiguous, so the CONTIG flag does not imply STRIDES (no idea why it's different from PyBUF_C_CONTIGUOUS). |
|||
msg139865 - (view) | Author: Alyssa Coghlan (ncoghlan) * | Date: 2011-07-05 12:47 | |
It took me a bit of thinking, but I figured out why the "contiguous" flags imply STRIDES. A quick recap of all the flags:

WRITABLE -> error if can't support write access

FORMAT -> request format info in Py_buffer struct. Should never error, but report unsigned bytes if not requested

ND -> requests shape info in Py_buffer struct. Report 1 dimensional if not requested. Error if data is not C contiguous (as STRIDES is required to handle any non-C contiguous cases).

STRIDES -> requests shape and stride info. Error if correct buffer access requires stride support and this flag is not passed.

C_CONTIGUOUS/F_CONTIGUOUS/ANY_CONTIGUOUS -> variants that also request shape and stride info but are limited to handling C contiguous memory, Fortran contiguous memory or either.

INDIRECT -> requests shape and suboffset info. Error if correct buffer access requires suboffset support and this flag is not passed.

So, to address the specific confusion, the basic "STRIDES" request just says "give me the strides info and I can deal with whatever you give me". The "CONTIGUOUS" variants say "give me the strides info, but I can only cope with certain layouts, so error if you can't provide them". "ND" is a way to say "I can cope with multiple dimensions, but only the C layout, without using strides info".

Suppose we have a 3x4 array of unsigned bytes (i.e. 12 bytes of data). In C format, the strides info would be [4, 1] (buf[0][0] and buf[0][1] are adjacent in memory, while buf[0][0] and buf[1][0] are 4 bytes apart). In Fortran format that layout is different, so the strides info would be [1, 3] (and now buf[0][0] and buf[1][0] are adjacent while buf[0][0] and buf[0][1] are 3 bytes apart).

The difference between ND and C_CONTIGUOUS is that the latter asks for both the shape and strides fields in the Py_buffer object to be populated, while the former only requests shape information. |
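
The strides arithmetic in that example generalizes to any shape. A small self-contained check (plain C, nothing CPython-specific; the helper names are ours):

    #include <assert.h>
    #include <stddef.h>

    /* C order: the last dimension is contiguous. */
    static void
    c_strides(const ptrdiff_t *shape, int ndim, ptrdiff_t itemsize,
              ptrdiff_t *strides)
    {
        strides[ndim - 1] = itemsize;
        for (int i = ndim - 2; i >= 0; i--)
            strides[i] = strides[i + 1] * shape[i + 1];
    }

    /* Fortran order: the first dimension is contiguous. */
    static void
    fortran_strides(const ptrdiff_t *shape, int ndim, ptrdiff_t itemsize,
                    ptrdiff_t *strides)
    {
        strides[0] = itemsize;
        for (int i = 1; i < ndim; i++)
            strides[i] = strides[i - 1] * shape[i - 1];
    }

    int main(void)
    {
        ptrdiff_t shape[2] = {3, 4}, s[2];
        c_strides(shape, 2, 1, s);
        assert(s[0] == 4 && s[1] == 1);   /* the [4, 1] case above */
        fortran_strides(shape, 2, 1, s);
        assert(s[0] == 1 && s[1] == 3);   /* the [1, 3] case above */
        return 0;
    }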
|||
msg139866 - (view) | Author: Alyssa Coghlan (ncoghlan) * | Date: 2011-07-05 12:49 | |
At least, that's the explanation based on the PEP - not sure where "CONTIG" as an alias for ND (N-dimensional) comes from. But then, smalltable was an undocumented novelty, too :) |
|||
msg139868 - (view) | Author: Alyssa Coghlan (ncoghlan) * | Date: 2011-07-05 12:59 | |
To address the "should PyManagedBuffer be public?" question: yes, I think so. Given the amount of grief the raw PEP 3118 API has caused the memoryview implementation, I expect the easier lifecycle management provided by the PyObject based API may also help 3rd parties. |
|||
msg139870 - (view) | Author: Alyssa Coghlan (ncoghlan) * | Date: 2011-07-05 13:13 | |
Regarding the Rietveld cc field: I tend not to add anyone to that and instead post comments to the tracker item to say that I've finished a review in Rietveld. If people want to see details they can go look at the review itself (or remove themselves from the bug nosy list if they have genuinely lost interest). |
|||
msg139871 - (view) | Author: Antoine Pitrou (pitrou) * | Date: 2011-07-05 13:42 | |
> Regarding the Reitveld cc field: I tend not to add anyone to that and > instead post comments to the tracker item to say that I've finished a > review in Reitveld. If people want to see details they can go look at > the review itself (or remove themselves from the bug nosy list if they > have genuinely lost interest). Be aware the Rietveld integration is buggy: for example, I got no notification of the current reviews. So it's better to post a message mentioning the review anyway. |
|||
msg139872 - (view) | Author: Alyssa Coghlan (ncoghlan) * | Date: 2011-07-05 14:02 | |
I don't think that's a bug, it's a missing feature in the integration (there's a request on the metatracker to add automatic notifications of new reviews on the bug itself). I did mention the review above but it would have been easy to miss amongst the other comments. |
|||
msg139873 - (view) | Author: Antoine Pitrou (pitrou) * | Date: 2011-07-05 14:09 | |
> I don't think that's a bug, it's a missing feature in the integration > (there's a request on the metatracker to add automatic notifications > of new reviews on the bug itself). It is a bug, actually. People on the nosy list are also on the Rietveld cc list, but in the wrong form. See http://psf.upfronthosting.co.za/roundup/meta/issue382 |
|||
msg139874 - (view) | Author: Antoine Pitrou (pitrou) * | Date: 2011-07-05 14:09 | |
(and so, for the record, I've added my own small review :)) |
|||
msg139875 - (view) | Author: Alyssa Coghlan (ncoghlan) * | Date: 2011-07-05 14:37 | |
Moving this discussion out of the review comments: Antoine is wanting to make release() nondeterministic by having the underlying buffer only released when all views using it either have release() called or are no longer referenced. I contend that release() needs to mean "release the underlying memory *right now*" or it is completely pointless. The "I don't want to care about lifecycle issues" approach is already handled quite adequately by the ordinary refcounting semantics. If ensuring that all references have been eliminated before release() is called is too much work for a user then the answer is simple: don't call release() and let the refcounting do the work. |
|||
msg139877 - (view) | Author: Antoine Pitrou (pitrou) * | Date: 2011-07-05 14:55 | |
> Antoine is wanting to make release() nondeterministic by having the
> underlying buffer only released when all views using it either have
> release() called or are no longer referenced.

That's not "nondeterministic" if everyone calls release(). Less so than garbage collection anyway.

> I contend that release() needs to mean "release the underlying memory
> *right now*" or it is completely pointless. The "I don't want to care
> about lifecycle issues" approach is already handled quite adequately
> by the ordinary refcounting semantics.

Well, if you assume refcounting and no reference cycles, then release() is AFAICT already useless. See issue9757 for the argument we had with Guido ;)

My issue is that until now sliced memoryviews are independent objects and are not affected by the releasing of the original memoryview. With this patch, they are, and that's why I'm advocating for a subtler approach (which would really mirror the current slicing semantics, and wouldn't break compatibility ;)).

release() is supposed to mean "you can dispose of this memoryview", not "you can dispose of any underlying memory area, even if there's some sharing that I as a user don't know anything about" (*). By making release() affect "related" memoryviews we are exposing an internal implementation detail (the PyManagedBuffer sharing) as part of the API.

(*) for something similar, if you close() a file-like object obtained through socket.makefile(), it doesn't close the underlying fd until all other file-like objects are closed too |
|||
msg139878 - (view) | Author: Stefan Krah (skrah) * | Date: 2011-07-05 15:33 | |
Antoine Pitrou <report@bugs.python.org> wrote:
> My issue is that until now sliced memoryviews are independent objects
> and are not affected by the releasing of the original memoryview. With
> this patch, they are, and that's why I'm advocating for a subtler
> approach (which would really mirror the current slicing semantics, and
> wouldn't break compatibility ;)).

I wrote a comment on Rietveld (which failed to get mailed again). My plan is to make the sliced views more independent by copying shape, strides, and suboffsets unconditionally on construction. Then it should always be possible to delete views independently. With respect to releasing, the views are of course still dependent.

> release() is supposed to mean "you can dispose of this memoryview", not
> "you can dispose of any underlying memory area, even if there's some
> sharing that I as a user don't know anything about" (*). By making
> release() affect "related" memoryviews we are exposing an internal
> implementation detail (the PyManagedBuffer sharing) as part of the API.

I thought the rationale for the release() method was to allow sequences like:

    b = bytearray()
    m1 = memoryview(b)
    m1.release()   # -> must call releasebuffer instantly.
    b.resize(10)   # -> this might fail otherwise if the garbage
                   #    collection is too slow.

So I think releasebuffer must be called on the original base object, and only the ManagedBuffer can do that. |
|||
msg139883 - (view) | Author: Antoine Pitrou (pitrou) * | Date: 2011-07-05 15:46 | |
> I thought the rationale for the release() method was to allow sequences like:
>
>     b = bytearray()
>     m1 = memoryview(b)
>     m1.release()   # -> must call releasebuffer instantly.
>     b.resize(10)   # -> this might fail otherwise if the garbage
>                    #    collection is too slow.

Well, that would still work with my proposal. Now consider:

    def some_library_function(byteslike):
        with memoryview(byteslike) as m2:
            ...  # do something with m2

    with memoryview(some_object) as m1:
        some_library_function(m1)
        ...
        print(m1[0])

That m1 becomes unusable after m2 is released in the library function is completely counter-intuitive, and will make memoryviews a pain to use in real life. |
|||
msg139919 - (view) | Author: Alyssa Coghlan (ncoghlan) * | Date: 2011-07-06 04:07 | |
If someone is calling release() on all of their views (including slices), then they won't have any problems. The only way they can get into trouble is if they have a slice or copy that they *aren't* explicitly releasing, and in that case they *already* have a problem and we're just picking it up sooner (although, in the case of CPython, refcounting will likely take care of it, just as it does for similar problems with files).

This is why I suggested that we should start exposing the source object at the Python level as an attribute of memoryview objects. Then people can decide for themselves if they want a "view-of-a-view" by slicing/copying the memoryview directly, or a new, independent view by going back to the original object. |
|||
msg139920 - (view) | Author: Alyssa Coghlan (ncoghlan) * | Date: 2011-07-06 04:16 | |
Oops, Antoine's right, the release() semantics in the patch are broken, albeit for precisely the opposite reason: that example will actually blow up with BufferError inside some_library_function().

I withdraw my objection - Antoine's right that release() on a memoryview object has to just mean "drop the reference to the ManagedBuffer instance". Calling release() when there are actual *exported* buffers outstanding should still trigger BufferError though (slices and copies don't count in that case, as they'll have their own independent references to the ManagedBuffer instance). |
|||
msg139921 - (view) | Author: Alyssa Coghlan (ncoghlan) * | Date: 2011-07-06 04:21 | |
Sorry, I mischaracterised Antoine's suggestion in my last comment. It's more in the nature of ManagedBuffer exposing two APIs:

1. request_release(): tells ManagedBuffer "if I have the last reference, release the buffer now".
2. release(): immediately releases the buffer, or triggers SystemError if there are additional references to the buffer.

memoryview.release() would call the first method before dropping the reference to the buffer. |
|||
msg139932 - (view) | Author: Stefan Krah (skrah) * | Date: 2011-07-06 12:39 | |
Antoine is right, this needs to be fixed. I think that for *practical* purposes, the existing release() method already behaves like a tryrelease() method:

    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    BufferError: Existing exports of data: object cannot be re-sized
    >>>

So while m1.release() in fact *does* release a buffer, the desired effect (freeing up 'b' for subsequent operations) only happens after also calling m2.release(). This applies to Python and NumPy objects.

With this amount of complexity, it might be a good idea to separate refcounting and exports, like so:

    typedef struct {
        PyObject_HEAD
        int released;
        Py_ssize_t exports;          /* total number of exports */
        Py_buffer master;
    } PyManagedBufferObject;

    typedef struct {
        PyObject_HEAD
        PyManagedBufferObject *mbuf;
        Py_ssize_t shape[Py_MEMORYVIEW_MAXSTATIC];
        Py_ssize_t strides[Py_MEMORYVIEW_MAXSTATIC];
        Py_ssize_t suboffsets[Py_MEMORYVIEW_MAXSTATIC];
        int released;                /* for clarity in the code */
        Py_ssize_t exports;          /* buffer exports */
        Py_buffer view;
    } PyMemoryViewObject;

Then it is possible for a MemoryView to call mbuf_release() from memory_release() if self->exports == 0 and --self->mbuf->exports == 0.

So, if I understand Antoine correctly, in the following graph m1, m2 and m4 could mark themselves as 'released' and decrement self->mbuf->exports. This could happen in any order.

What should happen with m2.release()? Raising would be odd here as well, since Pauli stated earlier that some people might actually construct new objects from an exported buffer. m2 could mark itself as 'release_pending', and henceforth only accept releasebuffer requests from b1, b2 and x1. Once those are all in, it releases itself properly and sends the message to mbuf. 'release_pending' could be expressed as (self->released && self->exports > 0).

    ManagedBuffer (mbuf)
            |
            -----------------------------------
            |                                 |
            m1 (private arrays)               m3 (private arrays)
            |                                 |
            m2                                m4 (private arrays)
            |
            -----------------
            |       |       |
            b1      b2      (regular temporary buffer exports)
            |
            x1 (new object constructed from a buffer export)

Antoine, was this roughly your suggestion? |
|||
msg139933 - (view) | Author: Stefan Krah (skrah) * | Date: 2011-07-06 12:42 | |
[The first part of the message again, this time via the web interface.]

Antoine is right, this needs to be fixed. I think that for *practical* purposes, the existing release() method already behaves like a tryrelease() method:

    >>> b = bytearray(b'123456789')
    >>> m1 = memoryview(b)
    >>> m2 = memoryview(m1)
    >>> m1.release()
    >>> b.append(1)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    BufferError: Existing exports of data: object cannot be re-sized
    >>>

So while m1.release() in fact *does* release a buffer, the desired effect (freeing up 'b' for subsequent operations) only happens after also calling m2.release(). This applies to Python and NumPy objects. |
|||
msg139935 - (view) | Author: Antoine Pitrou (pitrou) * | Date: 2011-07-06 13:01 | |
On Wednesday, 6 July 2011 at 12:39 +0000, Stefan Krah wrote:
> Antoine, was this roughly your suggestion?

I think so (!), but I also agree with Nick that raising BufferError when calling release() on a memoryview with exports is a reasonable alternative. |
|||
msg139936 - (view) | Author: Alyssa Coghlan (ncoghlan) * | Date: 2011-07-06 13:16 | |
Yeah, the reason my originally proposed semantics were wrong is because copying (or slicing) a memoryview object and then explicitly releasing that object would always fail, through no fault of that code. That would be broken, and the only way to fix it is to allow the release() call but not actually call the ReleaseBuffer API until all the buffer references are gone.

Exports are different - whenever you copy or slice a memoryview, you get a new memoryview object with the export count set to zero, so it is reasonable to require that anyone who wants to explicitly release the memoryview's buffer reference make sure that there aren't any exports hanging around. Keep in mind that memoryview copies and slices don't increase the export count, as those will refer directly back to the original managed buffer.

+1 on tracking buffer exports explicitly rather than relying solely on playing games with refcounts though - while technically redundant in the ManagedBuffer case, I agree it will make the code far more self-explanatory. |
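
Taken together, the semantics agreed here could be sketched as follows, reusing the released/exports fields from the structs proposed in msg139932 (illustrative only, not the committed implementation):

    static PyObject *
    memory_release(PyMemoryViewObject *self, PyObject *noargs)
    {
        if (self->released)
            Py_RETURN_NONE;              /* releasing twice is a no-op */
        if (self->exports > 0) {         /* raw getbuffer exports outstanding */
            PyErr_SetString(PyExc_BufferError,
                            "memoryview has exported buffers");
            return NULL;
        }
        self->released = 1;
        if (--self->mbuf->exports == 0)  /* last view: release for real */
            PyBuffer_Release(&self->mbuf->master);
        Py_RETURN_NONE;
    }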
|||
msg141858 - (view) | Author: Stefan Krah (skrah) * | Date: 2011-08-10 11:46 | |
I thought it might be productive to switch to documentation/test driven development for PEP-3118 in general. So I updated the documentation, trying to spell out the responsibilities of both exporter and consumer as clearly as possible.

In order to have a PEP-3118 reference implementation, I wrote Modules/_testbuffer.c and Lib/test/test_buffer.py. The test module contains an ndarray object (independent from NumPy's ndarray) with these features:

o Full base object capabilities, including responding to flag specific requests.
o Full re-exporter capability: The object obtains a buffer from another exporter and poses as a base object.
o Optional capability to change layout while buffers are exported.
o Full support for arbitrary format strings using the struct module.
o Fortran style arrays.
o Arbitrary multidimensional structures, including offsets and negative strides.
o Support for converting arrays to suboffset representations.
o Support for multidimensional indexing, slicing and tolist().
o Optional support for testing against NumPy.

In memoryobject.c I only fixed the buffer release issues that came up. Before proceeding with allocating private arrays for each memoryview, it would be great to have agreement on the revised documentation and a couple of other issues.

Documentation
-------------

I'll highlight some changes here:

1) view->obj: Must be a new reference to the provider if the buffer is obtained via getbuffer(), otherwise NULL. In case of failure the field MUST be set to NULL (was: undefined). So, logically this should be seen the same way as returning a new reference from a standard Python function.

2) view->buf: This can (and _does_ for NumPy arrays) point to any location in the underlying memory block. If a consumer doesn't request one of the simple or contiguous buffers, all bets are off.

3) view->len: Make it clear that the PEP defines it as product(shape) * itemsize.

4) Ownership for shape, strides, internal: read-only for the consumer.

5) Flag explanations: The new section puts emphasis on which fields a consumer can expect in response to a buffer request, starting with the fields that will always be set.

6) Rule for writable/read-only requests: Only raise if the buffer is read-only and the request is 'writable'. This seems to be the most practical solution.

7) Add NumPy-style and PIL-style sections to make clear what a consumer must be able to handle if complicated structures are requested (thanks Pauli Virtanen for clarifying what valid NumPy arrays may look like).

What I would like to spell out clearly:

8) A PEP-3118 compliant provider MUST always be able to respond to a PyBUF_FULL_RO request (i.e. fill in shape AND strides). ctypes doesn't currently do that, but could be fixed easily. This is easy to implement for the exporter (see how PyBuffer_FillInfo() does it in a few lines).

Memoryview
----------

9) Add PyMemoryView_FromBytes(). This could potentially replace PyMemoryView_FromBuffer().

10) I've come to think that memoryviews should only be created from PyBUF_FULL_RO requests (this also automatically succeeds if an object is writable, see above). One reason is that we don't constantly have to check for the presence of shape and strides and format (unless ndim = 0). Currently it is possible to create any kind of view via PyMemoryView_FromBuffer(). It would be possible to deprecate it or make it a rule that the buffer argument must have full information.

11) NumPy has a limit of ndim = 64. It would be possible to use that limit as well and make shape, strides and suboffsets static arrays of size 64. Perhaps this is a bit wasteful; it depends on how many views are typically created.

12) test_buffer.py contains an example (see: test_memoryview_release()) showing that Antoine's test case will not work if a re-exporter is involved. Should we leave that or fix it, too?

My apologies for the long list, but it'll be easier to proceed if some rules are written in stone. :) |
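
For a simple 1-D exporter, honouring rule 8) really is just a few lines, since PyBuffer_FillInfo() fills in shape and strides whenever the request asks for them. A sketch (MyObject and its data/size fields are hypothetical):

    typedef struct {
        PyObject_HEAD
        char *data;                 /* hypothetical exported memory */
        Py_ssize_t size;
    } MyObject;

    static int
    myobj_getbuffer(MyObject *self, Py_buffer *view, int flags)
    {
        /* Succeeds for PyBUF_FULL_RO: shape/strides are pointed at the
           view's own len/itemsize fields when requested. */
        return PyBuffer_FillInfo(view, (PyObject *)self,
                                 self->data, self->size,
                                 /*readonly=*/1, flags);
    }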
|||
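For illustration, here is how rule 6 looks from pure Python under the semantics proposed above, as a minimal sketch assuming the 3.3 behavior (a read-only exporter raises only when a write is actually attempted, and 'B' items are assigned as ints):

>>> m = memoryview(b'abc')              # read-only exporter
>>> m.readonly
True
>>> m[0] = 0
Traceback (most recent call last):
  ...
TypeError: cannot modify read-only memory
>>> w = memoryview(bytearray(b'abc'))   # writable exporter
>>> w.readonly
False
>>> w[0] = 120                          # ord('x')
>>> bytes(w)
b'xbc'
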
msg142191 - (view) | Author: Stefan Krah (skrah) * | Date: 2011-08-16 13:25 | |
70a8ccd53ade fixes conversion of NumPy-style arrays to a suboffset representation. The previous representation did not work for slicing non-contiguous arrays with backward strides. Also, I've added long comments that hopefully explain how slicing with arbitrary strides and suboffsets is supposed to work.

msg143727 - (view) | Author: Stefan Krah (skrah) * | Date: 2011-09-08 14:53 | |
Here's a completely restructured memoryview implementation that I believe is quite robust. Both memoryobject.c (the parts I worked on, which is 91%) and _testbuffer.c have 100% code coverage, including all error conditions [1].

memoryview is tested against the struct module (via _testbuffer's ndarray), array.array and bytearray. To deal with the case explosions inherent to the specification, test_buffer.py resorts to brute force in some cases (most notably in testing getbuffer() flags).

PyMemoryViewObject is now a PyVarObject with private arrays. Unless ndim = 0, shape and strides are always present after initialization.

Memoryview now has support for all native single-value struct module specifiers. I abandoned the idea of using the struct module for packing/unpacking. New benchmarks (see #10227) indicate that _testbuffer's tolist() is over 1000x slower than the optimized versions of memoryview.

The cast function from #5231 is completely implemented.

Review would be much appreciated. Perhaps it would be possible to do a mini PEP-3118 sprint? Usually I'm never on IRC, but I could dust off irssi for the occasion.

These are the main changes in detail:
-------------------------------------

o Restructure memoryobject.c into sections.
o Use struct hack in PyMemoryViewObject for the dynamic arrays.
o Rework initialization.
o Add a couple of invariants: A new memoryview will always have complete shape/strides/format information, unless ndim = 0.
o Add buffer flags: A new memoryview will always have flag information that determines whether it is a scalar (ndim = 0), C/Fortran contiguous, NumPy or PIL style, released or active. This eliminates the need for expensive re-computations during a getbuffer() request.
o Add support for all native struct module formats in indexing, assigning and tolist().
o Add memoryview.cast(format=x, shape=y), where x is a native format specifier and y is any multidimensional shape.
o Add PEP-3118 compliant getbuffer() method.
o Slice assignments now support non-contiguous 1-D slices.
o Comparisons now support non-contiguous 1-D buffers.
o Representation of empty shape etc. is now an empty tuple (O.K. with Mark Dickinson and Travis Oliphant).

[1] 100% coverage requires a patch for systematically triggering failures of Python API functions.

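A short interactive sketch of two of the new features, assuming the behavior described above (native single-value formats and cast() to a multidimensional shape):

>>> from array import array
>>> m = memoryview(array('d', [1.0, 1.5, 2.0]))
>>> m.format, m[1]                      # 'd' items unpack to Python floats
('d', 1.5)
>>> v = memoryview(bytes(range(12))).cast('B', shape=[2, 2, 3])
>>> v.tolist()
[[[0, 1, 2], [3, 4, 5]], [[6, 7, 8], [9, 10, 11]]]
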
msg143730 - (view) | Author: Stefan Krah (skrah) * | Date: 2011-09-08 15:05 | |
> is over 1000x slower than the optimized versions of memoryview.

Oh dear, scratch that. Let's make that 15x. ;)

msg144234 - (view) | Author: Stefan Krah (skrah) * | Date: 2011-09-18 09:22 | |
Revision 4492afe05a07 allows memoryview to handle objects with an __index__() method. This is for compatibility with the struct module (see also #8300).

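A minimal sketch of what this change enables, assuming the patched behavior (the Idx class here is purely illustrative):

>>> class Idx:
...     def __init__(self, i):
...         self.i = i
...     def __index__(self):
...         return self.i
...
>>> m = memoryview(bytearray(2)).cast('H')
>>> m[0] = Idx(1000)                    # accepted via __index__()
>>> m[0]
1000
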
msg151360 - (view) | Author: Alyssa Coghlan (ncoghlan) * | Date: 2012-01-16 13:23 | |
Finally reviewed Stefan's latest patch. It mostly looks great to me.

Aside from a few minor cosmetic quibbles, the only substantive change I would like is to expose the Py_buffer len field as "memoryview.size" instead of "memoryview.len" (to reduce confusion with "len(memoryview)" and to encourage a conceptual link with sys.getsizeof()).

As noted in the review, I also think fixing #12384 should be fairly straightforward and it would be nice to have that working when the change is applied.

(Kristjan: I added you to the nosy list, since you commented on another issue that you had been poking around in the memoryview internals and noticed a few dodgy things. Stefan's work on this issue is aimed at comprehensively addressing those problems.)

msg151445 - (view) | Author: Kristján Valur Jónsson (kristjan.jonsson) * | Date: 2012-01-17 11:41 | |
Thanks Nick. Looking through this discussion it looks as though you encountered all the confusing bits that I ran into (dup_buffer, ownership issues, reference counting and so forth). Good work.

msg151447 - (view) | Author: Stefan Krah (skrah) * | Date: 2012-01-17 11:53 | |
Thanks for the comments. Most of them should be easy to fix.

Nick Coghlan <report@bugs.python.org> wrote:
> [...] expose the Py_buffer len field as "memoryview.size" instead of
> "memoryview.len" (to reduce confusion with "len(memoryview)" and
> to encourage a conceptual link with sys.getsizeof()).

I agree with this. The best thing is probably to have both versions work in 3.3 and remove memoryview.len in 3.4 (or later).

> As noted in the review, I also think fixing #12384 should be fairly
> straightforward and it would be nice to have that working when the
> change is applied.

tobytes() currently calls PyBuffer_ToContiguous(..., 'C') from Objects/abstract.c. Since the function repeatedly calls PyBuffer_GetPointer(), it has this comment:

   /* XXX : This is not going to be the fastest code in the world
            several optimizations are possible. */

So if the fix for tobytes() should go in, I'd like to use the copy_buffer() function from _testbuffer.c. PyBuffer_ToContiguous() would need to be fixed separately.

test_buffer.py already asserts that for each result, result.tolist() and result.tobytes() match the results obtained from PyBuffer_GetPointer(indices) (see test_buffer.py:779), so I'm not worried about using the same implementation in the tests as in the production code.

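For reference, the behavior the tobytes() fix is after, sketched under the assumption that non-contiguous buffers end up being serialized element by element:

>>> m = memoryview(b'abcdefgh')[::2]    # non-contiguous 1-D slice
>>> m.tobytes()
b'aceg'
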
msg151537 - (view) | Author: Stefan Krah (skrah) * | Date: 2012-01-18 13:34 | |
Revisiting memoryview.size: I foresee problems for NumPy users, since array.size has a different meaning there:

>>> x = array([[1,2,3], [4,5,6]], dtype='q')
>>> x.shape
(2, 3)
>>> x.itemsize
8
>>> len(x)
2
>>> x.size
6
>>> x.nbytes
48

So here we have:

x.nbytes == product(shape) * itemsize == Py_buffer.len == (virtual!) byte length
x.size   == product(shape) == number of elements

My suggestion is to use memoryview.nbytes as well. memoryview.size would have the additional problem that Py_buffer.len is always the byte size of the logical structure (e.g. after slicing) and not necessarily the byte size of the physical memory area.

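The last point can be made concrete in plain Python, assuming an nbytes attribute with the proposed meaning:

>>> b = bytearray(10)
>>> m = memoryview(b)[::2]
>>> m.nbytes                            # logical structure: product(shape) * itemsize
5
>>> len(b)                              # the physical memory area is unchanged
10
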
msg151538 - (view) | Author: Alyssa Coghlan (ncoghlan) * | Date: 2012-01-18 14:40 | |
"nbytes" sounds reasonable to me, given the unfortunate ambiguity of both size and len. As far as #12834 goes, I'm happy to go along with whatever you think is best. You've spent a lot more time down in the guts of the implementation than I have, and the approach you describe seems to make sense :) |
|||
msg152086 - (view) | Author: Alyssa Coghlan (ncoghlan) * | Date: 2012-01-27 12:06 | |
Aside from some minor comments that I included in my review, the latest patch gets a +1 from me.

msg152090 - (view) | Author: Alyssa Coghlan (ncoghlan) * | Date: 2012-01-27 12:41 | |
Antoine's review picked up on several issues I missed or glossed over - I actually agree with his point about making most of the new APIs private rather than public.

With regards to exposing _testbuffer in the documentation of memoryview's hash support, perhaps it would be better to use a 1D bytes object + memoryview.cast() to get an officially supported multi-dimensional view of a region of memory?

msg152091 - (view) | Author: Antoine Pitrou (pitrou) * | Date: 2012-01-27 12:42 | |
> With regards to exposing _testbuffer in the documentation of
> memoryview's hash support, perhaps it would be better to use a 1D
> bytes object + memoryview.cast() to get an officially supported
> multi-dimensional view of a region of memory?

By the way, I didn't point them out, but there are other places in the memoryview doc where _testbuffer is mentioned.

msg152124 - (view) | Author: Stefan Krah (skrah) * | Date: 2012-01-27 21:42 | |
I've added some comments on Rietveld. This time, I pasted the email addresses manually into the CC field, apparently without success (I didn't receive mail).

Regarding the use of _testbuffer in the docs: I agree that it's strange; on the other hand it illustrates several examples much better (length of a multi-dimensional view, result of cast to ND, etc.). So I'm not sure what to do. Of course I'll take out the examples if you really don't like it.

msg152145 - (view) | Author: Alyssa Coghlan (ncoghlan) * | Date: 2012-01-28 00:56 | |
Radical suggestion: make it public as collections.simple_ndarray? (prefixing with simple_ to be explicit that this is not even *close* to being the all-singing, all-dancing NumPy.ndarray)

msg152161 - (view) | Author: Stefan Krah (skrah) * | Date: 2012-01-28 14:13 | |
I've been busy fixing the memoryview.h issues first. Could you take a look at:

http://hg.python.org/features/pep-3118/file/f020716704d4/Include/memoryobject.h

Changes:

a) Make all functions and the two buffer access macros part of the limited API again.
b) Make the managed buffer API definitely private.
c) Make PyBUF_MAX_NDIM=64 part of the official buffer API.

I think it might be OK to defer the decision about Py_MEMORYVIEW_C etc., since the comment already says "... Don't access their fields directly.".

msg152173 - (view) | Author: Stefan Krah (skrah) * | Date: 2012-01-28 17:19 | |
Nick Coghlan <report@bugs.python.org> wrote:
> Radical suggestion: make it public as collections.simple_ndarray?

Heh, that's a nice one. :)

While I think that the code in _testbuffer.c is sound and very well tested, the API is low-level and optimized for testing all sorts of quirks. Examples: _testbuffer.ndarray has the questionable capability of changing views while buffers are exported. This is to test the pathological case that came up in the discussion. Then, similar to NumPy's ndarray, _testbuffer.ndarray constructs arrays from a flat list. This is a bit clumsy but quite suitable for testing.

NumPy's ndarray is also the low level API. I think most people use the high level array:

>>> a = array([[1,2], [3,4]], dtype="L")
>>> a
array([[1, 2],
       [3, 4]], dtype=uint64)

So it would take some effort to polish the API.

Meanwhile, to eliminate the use of _testbuffer in the documentation, I think the following might work: I've just added complete support for displaying multi-dimensional lists and multi-dimensional comparisons to memoryobject.c in my private repo. The code is quite succinct and follows exactly the same pattern as copy_base/copy_rec.

Then, in the documentation we can use cast() and memoryview.tolist(), but add a comment that cast() is just used for demonstration purposes.

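A sketch of the multi-dimensional tolist() and comparison support mentioned above, assuming the behavior lands as described:

>>> a = memoryview(bytes([1, 2, 3, 4])).cast('B', shape=[2, 2])
>>> a.tolist()
[[1, 2], [3, 4]]
>>> b = memoryview(bytearray([1, 2, 3, 4])).cast('B', shape=[2, 2])
>>> a == b                              # element-wise, shape-aware comparison
True
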
msg152174 - (view) | Author: Antoine Pitrou (pitrou) * | Date: 2012-01-28 17:29 | |
> Nick Coghlan <report@bugs.python.org> wrote:
> > Radical suggestion: make it public as collections.simple_ndarray?
>
> Heh, that's a nice one. :)

Well, the question is: does it have a point? Is this ndarray useful outside of any third-party infrastructure such as the APIs offered by numpy? You might answer that we already have array.array but ISTM that array.array is quite useless in general :)

msg152175 - (view) | Author: Stefan Krah (skrah) * | Date: 2012-01-28 17:51 | |
Antoine Pitrou <report@bugs.python.org> wrote:
> > Nick Coghlan <report@bugs.python.org> wrote:
> > > Radical suggestion: make it public as collections.simple_ndarray?
> >
> > Heh, that's a nice one. :)
>
> Well, the question is: does it have a point? Is this ndarray useful
> outside of any third-party infrastructure such as the APIs offered by
> numpy?

I think it would be mainly educational. It's easier to understand the whole (multi-dimensional) point of PEP-3118 when there's a concrete object that you can play with. Of course serious users would go straight to NumPy.

The other issue is that it's slightly strange to have a fully featured memoryview with no object in the stdlib that can utilize all features (at least the testing side is now complete).

Anyway, right now I don't know myself if I'm for or against it. :)

msg152188 - (view) | Author: Antoine Pitrou (pitrou) * | Date: 2012-01-28 20:59 | |
> a) Make all functions and the two buffer access macros part
>    of the limited API again.

Hmm, I don't think buffer access macros should be part of the limited API. If they are truly important (which I doubt), we should have equivalent functions for them.

> I think it might be OK to defer the decision about Py_MEMORYVIEW_C etc.,
> since the comment already says "... Don't access their fields directly.".

My question is whether there is any point in making these flags. Does 3rd-party code want to manipulate memoryview internals, instead of querying the Py_buffer?

(And of course the memoryview object is becoming more and more like another Py_buffer :-))

msg152196 - (view) | Author: Stefan Krah (skrah) * | Date: 2012-01-28 22:44 | |
Antoine Pitrou <report@bugs.python.org> wrote:
> > a) Make all functions and the two buffer access macros part
> >    of the limited API again.
>
> Hmm, I don't think buffer access macros should be part of the limited
> API. If they are truly important (which I doubt), we should have
> equivalent functions for them.

I thought the whole Py_buffer API was only temporarily removed from the limited API until the issues were sorted out:

http://bugs.python.org/issue10181#msg125462

For instance, here the macros are not excluded:

http://hg.python.org/cpython/file/3292cc222d5c/Include/memoryobject.h

Since the issues seem resolved in general, I thought it was time to restore the previous state. [I just noticed that I didn't revert all of Martin's changes, so xxlimited currently does not build in the pep-3118 repo.]

As for the two macros specifically, I know Pauli was using them:

http://bugs.python.org/issue10181#msg139775

> > I think it might be OK to defer the decision about Py_MEMORYVIEW_C etc.,
> > since the comment already says "... Don't access their fields directly.".
>
> My question is whether there is any point in making these flags. Does
> 3rd-party code want to manipulate memoryview internals, instead of
> querying the Py_buffer?

The flags are primarily for the memoryview itself, to avoid repeated calls to PyBuffer_IsContiguous(). So when 3rd-party code uses PyMemoryView_GET_BUFFER to get the view and also needs to determine the contiguity type, that code could also use the flags.

Pauli: If you are still following the issue, do you think having access to contiguity flags is useful for 3rd-party code or not?

msg152197 - (view) | Author: Antoine Pitrou (pitrou) * | Date: 2012-01-28 23:09 | |
> Antoine Pitrou <report@bugs.python.org> wrote:
> > > a) Make all functions and the two buffer access macros part
> > >    of the limited API again.
> >
> > Hmm, I don't think buffer access macros should be part of the limited
> > API. If they are truly important (which I doubt), we should have
> > equivalent functions for them.
>
> I thought the whole Py_buffer API was only temporarily removed from the
> limited API until the issues were sorted out:
>
> http://bugs.python.org/issue10181#msg125462

I'm talking about the memoryview access macros. It's like PyList_GET_SIZE: it isn't part of the limited API and you have to use PyList_GetSize instead.

The limited API promises binary (ABI) compatibility across releases. It's a very strong promise to make and we must be careful not to put too much stuff in the limited API. I don't think the rule for inclusion is "we should put everything useful in the limited API".

> For instance, here the macros are not excluded:
>
> http://hg.python.org/cpython/file/3292cc222d5c/Include/memoryobject.h

It might be a mistake, then.

> As for the two macros specifically, I know Pauli was using them:
>
> http://bugs.python.org/issue10181#msg139775

Well, Pauli might use them, but it just means his code isn't compatible with the limited API, then.

> The flags are primarily for the memoryview itself, to avoid repeated calls
> to PyBuffer_IsContiguous(). So when 3rd-party code uses PyMemoryView_GET_BUFFER
> to get the view and also needs to determine the contiguity type, that
> code could also use the flags.

But why would 3rd-party code use PyMemoryView_GET_BUFFER instead of, for example, PyObject_GetBuffer? You seldom have APIs which *expect* a memoryview, I think. Instead, they would expect a buffer-compatible object.

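At the Python level, the contiguity information under discussion ends up visible as read-only properties, assuming the 3.3 attributes c_contiguous, f_contiguous and contiguous:

>>> m = memoryview(bytes(12)).cast('B', shape=[3, 4])
>>> m.c_contiguous, m.f_contiguous, m.contiguous
(True, False, True)
>>> memoryview(bytes(8))[::2].contiguous
False
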
msg152206 - (view) | Author: Alyssa Coghlan (ncoghlan) * | Date: 2012-01-29 01:09 | |
I'm with Antoine here - we want to be *very* conservative with what we expose through the limited API. Due to the ABI compatibility promise, anything exposed that way is very hard to change.

Keeping things out of the limited API isn't really an issue - it just means that anyone who wants to use them needs to rebuild their extensions for each new version of the C API (just as they do now). Adding things to the stable ABI later is easy though - that's why we had Martin take all the Py_buffer related stuff out of the initial version of the limited API.

msg152208 - (view) | Author: Alyssa Coghlan (ncoghlan) * | Date: 2012-01-29 02:24 | |
Also, +1 to replacing _testbuffer with .cast() to demonstrate the multi-dimensional support. Without an actual multi-dimensional array object in the standard library, demonstrating those aspects is always going to be a little tricky. However, it's worth doing so that the likes of NumPy and PIL don't need to explain those interactions.

msg152223 - (view) | Author: Stefan Krah (skrah) * | Date: 2012-01-29 12:40 | |
Antoine Pitrou <report@bugs.python.org> wrote:
> > I thought the whole Py_buffer API was only temporarily removed from the
> > limited API until the issues were sorted out:
> >
> > http://bugs.python.org/issue10181#msg125462
>
> I'm talking about the memoryview access macros. It's like
> PyList_GET_SIZE: it isn't part of the limited API and you have to use
> PyList_GetSize instead.

You're right: I presumed that the macros were excluded temporarily when in fact that had already happened in 62b61abd02b8.

> > The flags are primarily for the memoryview itself, to avoid repeated calls
> > to PyBuffer_IsContiguous(). So when 3rd-party code uses PyMemoryView_GET_BUFFER
> > to get the view and also needs to determine the contiguity type, that
> > code could also use the flags.
>
> But why would 3rd-party code use PyMemoryView_GET_BUFFER instead of, for
> example, PyObject_GetBuffer? You seldom have APIs which *expect* a
> memoryview, I think. Instead, they would expect a buffer-compatible
> object.

That's a good question. It looks to me that the macro was present as PyMemoryView() initially. You renamed it in #3560 (with Guido's approval), and later PyMemoryView_GET_BUFFER appeared in the docs.

I think 3rd-party code uses the macros mainly because they are present and, in the case of PyMemoryView_GET_BUFFER, documented. In most situations PyObject_GetBuffer() could be used indeed.

Most use cases I can think of would also involve having access to the managed buffer API. As an example, here's a technique that is similar to what goes on in PyMemoryView_GetContiguous(): Suppose you have an initialized bytes object that you want to wrap as a multi-dimensional exporter. Then:

- Base the memoryview on the bytes object and keep exactly one reference to it.
- Adjust the shape, strides etc. to get the structure you want.
- Return the view: you now have a fully compliant exporter that only needs a single Py_DECREF() to disappear and do all cleanup.

Of course this could also be exposed as a function, e.g.:

   /* stealing a reference to bytes */
   PyMemoryView_FromBytesAndInfo(PyObject *bytes, Py_buffer *info);

So let's make the flags private. What do you prefer?

1) Leave them in memoryview.h, but with a leading underscore: _Py_MEMORYVIEW_C [...]
2) Move them to memoryobject.c, with a leading underscore.
3) Move them to memoryobject.c, without a leading underscore (I find this more readable).
4) Move them to memoryobject.c as MV_C, MV_FORTRAN, etc.

Also, I'll add a note to the docs that PyMemoryView_GET_BUFFER can probably be avoided by using PyObject_GetBuffer() directly.

msg152245 - (view) | Author: Antoine Pitrou (pitrou) * | Date: 2012-01-29 17:19 | |
> Of course this could also be exposed as a function, e.g.:
>
>    /* stealing a reference to bytes */
>    PyMemoryView_FromBytesAndInfo(PyObject *bytes, Py_buffer *info);

I think we should minimize the number of reference-stealing functions.

> So let's make the flags private. What do you prefer?

I don't really mind, whatever you think is best :)

msg152404 - (view) | Author: Stefan Krah (skrah) * | Date: 2012-01-31 20:34 | |
I've uploaded a new patch that should address the remaining issues:

o In the documentation _testbuffer has been replaced by m.cast() + the now multi-dimensional m.tolist().
o I restored the state of the limited API. If we want to include Py_buffer again, I think this should be done in a separate patch.
o Flags of the memoryview object are private.

Additionally, because NumPy allows non-aligned array accesses, I changed the packing functions to use memcpy for multi-byte types. On x86/amd64 gcc is smart enough to produce almost exactly the same asm output as before, with a slowdown of 0-1%, depending on the benchmark. On other platforms the situation might be worse, but I don't have access to real hardware where alignment actually matters.

msg153870 - (view) | Author: Alyssa Coghlan (ncoghlan) * | Date: 2012-02-21 13:16 | |
Latest version looks good to me - I vote for landing it whenever you're ready :)

msg154138 - (view) | Author: Stefan Krah (skrah) * | Date: 2012-02-24 15:16 | |
Thanks, Nick. I'll try to get it done this weekend.

I've uploaded Misc/NEWS and Doc/whatsnew/3.3.rst (my apologies to Antoine for plagiarizing the first sentence, I found it hard to come up with a better version).

I wasn't sure whether to put the "whatsnew" entry into the section "Other Language Changes" or into the PEP section. I chose the latter, since many new features have been added that are part of the PEP.

msg154141 - (view) | Author: Alyssa Coghlan (ncoghlan) * | Date: 2012-02-24 17:25 | |
PEP section makes sense - I plan to mark PEP 3118 as Final once you commit this (or you can do that yourself, for that matter).

msg154240 - (view) | Author: Roundup Robot (python-dev) | Date: 2012-02-25 11:25 | |
New changeset 3f9b3b6f7ff0 by Stefan Krah in branch 'default':
- Issue #10181: New memoryview implementation fixes multiple ownership
http://hg.python.org/cpython/rev/3f9b3b6f7ff0

msg154323 - (view) | Author: Stefan Krah (skrah) * | Date: 2012-02-26 10:30 | |
Nick Coghlan <report@bugs.python.org> wrote:
> PEP section makes sense - I plan to mark PEP 3118 as Final once you commit
> this (or you can do that yourself, for that matter).

Array features are complete except for multi-dimensional indexing and slicing. I think it would be nice to add those in a separate issue; it's not a lot of additional code and it doesn't interfere with the current code. The overall array handling scheme is sound.

Life would be a bit easier if contiguity flags were a mandatory part of the Py_buffer structure that the exporter has to fill in.

Then there is an open issue (#3132) for expanding the struct module syntax, where the wording in some sections of the PEP led to a bit of head-scratching. :)

In another issue (#13072) the question came up whether the proposed 'u' and 'w' formats still make sense after PEP-393 (I think they do; they should map to UCS-2 and UCS-4).

We need to decide what to do about 2.7 and 3.2. It's pretty difficult by now to separate the bug fixes from the features. I could follow the example of CPU manufacturers: start with the whole feature set and disable as much as possible.

Another problem for 2.7 and 3.2 is that the 'B' format would still need to accept bytes instead of ints. Or can we change that as a bug fix?

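To make that last incompatibility concrete (the int result assumes the 3.3 implementation; 2.7 and 3.2 return a length-1 str/bytes object instead):

>>> memoryview(b'abc')[0]
97
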
msg154327 - (view) | Author: Alyssa Coghlan (ncoghlan) * | Date: 2012-02-26 11:34 | |
Aw, I guess I was too optimistic in thinking this was the last gap before we could declare it Final...

+1 on creating a new feature request for NumPy style multi-dimensional indexing and slicing support on memoryview (I'm assuming that's what you meant).

As far as your last question goes, I'm not sure we *can* salvage this in 2.7/3.2. We do have the option of throwing our hands up in the air and saying "Sorry, we couldn't figure out how to fix it without completely rewriting it. Take a look at 3.3 if you want to see how it's *supposed* to work."

I hesitate to suggest this (since I'm not volunteering to do the work), but perhaps it would be worth packaging up the 3.3 memoryview and publishing it on PyPI for the benefit of 3.2 and 2.7?

May also be worth bringing up on python-dev to see if anyone else has any bright ideas. Myself, I'm not seeing any real choices beyond "won't fix", "backport 3.3 version and remove/disable the new features", "backport 3.3 version, new features and all" and "publish a full featured version on PyPI".

msg154616 - (view) | Author: Stefan Krah (skrah) * | Date: 2012-02-29 11:33 | |
I'm busy adding the C-API changes to the docs. Regarding the stable ABI: The general mood was to *keep* the removal of the buffer interface for some time, did I get that right?

In that case this removal (especially of the Py_buffer struct) should be documented also for 3.2.

msg154651 - (view) | Author: Roundup Robot (python-dev) | Date: 2012-02-29 16:54 | |
New changeset 9d3c7dd55c7f by Stefan Krah in branch 'default':
Issue #10181: Add warning that structure layouts in memoryobject.h and
http://hg.python.org/cpython/rev/9d3c7dd55c7f

msg154665 - (view) | Author: Alyssa Coghlan (ncoghlan) * | Date: 2012-02-29 21:18 | |
For 3.2, there's no removal to document - we asked MvL to drop the buffer APIs from PEP 384, so they've never been part of the stable ABI.

From the PEP: "The buffer interface (type Py_buffer, type slots bf_getbuffer and bf_releasebuffer, etc) has been omitted from the ABI, since the stability of the Py_buffer structure is not clear at this time. Inclusion in the ABI can be considered in future releases."

I agree we shouldn't be hasty in making even the updated buffer interface implementation officially part of the stable ABI. Better to leave it out for 3.3, get some real world feedback on the latest version, then commit to stability for 3.4+.

msg154680 - (view) | Author: Stefan Krah (skrah) * | Date: 2012-03-01 08:32 | |
> From the PEP: "The buffer interface (type Py_buffer, type slots bf_getbuffer
> and bf_releasebuffer, etc) has been omitted from the ABI, since the stability
> of the Py_buffer structure is not clear at this time. Inclusion in the ABI
> can be considered in future releases."

Great, that's exactly what I was looking for. - As far as I can see the issue is finished, so I'm closing it.

Thanks again everyone for all the help!

msg154764 - (view) | Author: Alyssa Coghlan (ncoghlan) * | Date: 2012-03-02 13:14 | |
It occurs to me that one thing that *could* be backported to earlier versions is some of the documentation improvements on how to correctly handle the lifecycle of fields in Py_buffer. That gets messy, though, because memoryview doesn't behave nicely in those versions, so we'd be violating our own guidelines.

Perhaps the relevant sections should be updated with a reference saying that the semantics have been cleaned up in 3.3, and if in doubt, try to follow the 3.3 documentation?

msg154937 - (view) | Author: Roundup Robot (python-dev) | Date: 2012-03-05 09:51 | |
New changeset 373f6cdc6d08 by Stefan Krah in branch 'default':
Issue #10181: The decision was to raise a buffer error in memory_exit()
http://hg.python.org/cpython/rev/373f6cdc6d08

msg157834 - (view) | Author: Josh Triplett (joshtriplett) | Date: 2012-04-09 08:08 | |
I currently use Python 2.7, and I'd like to make use of memoryview. Specifically, I work on BITS (http://biosbits.org/), which runs Python in ring 0 as part of GRUB, and I'd like to use memoryview to give Python access to data in physical memory. I ran into several of the problems documented here when trying to do so. I'd really love to see a backport of this fixed version into 2.7.

msg157835 - (view) | Author: Josh Triplett (joshtriplett) | Date: 2012-04-09 08:15 | |
> I currently use Python 2.7, and I'd like to make use of memoryview.
> Specifically, I work on BITS (http://biosbits.org/), which runs Python in
> ring 0 as part of GRUB, and I'd like to use memoryview to give Python
> access to data in physical memory. I ran into several of the problems
> documented here when trying to do so. I'd really love to see a backport
> of this fixed version into 2.7.

More specifically, the primary functionality that I'd like to use exists in 3.3 as PyMemoryView_FromMemory. I've tried to approximate that function using the available API in 2.7, but that led me here.

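One rough way to approximate PyMemoryView_FromMemory from pure Python in 2.7 is via ctypes. This is only a sketch: the helper name is made up, nothing enforces read-only access, and the caller must guarantee that addr stays valid for size bytes while the view is alive:

    import ctypes

    def memoryview_from_memory(addr, size):
        # ctypes arrays export the buffer interface, so memoryview()
        # can wrap them directly without copying the underlying memory.
        buf = (ctypes.c_char * size).from_address(addr)
        return memoryview(buf)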
History | |||
---|---|---|---|
Date | User | Action | Args |
2022-04-11 14:57:07 | admin | set | github: 54390 |
2012-04-09 08:15:56 | joshtriplett | set | messages: + msg157835 |
2012-04-09 08:08:31 | joshtriplett | set | nosy:
+ joshtriplett messages: + msg157834 |
2012-03-05 09:51:05 | python-dev | set | messages: + msg154937 |
2012-03-02 13:14:40 | ncoghlan | set | messages: + msg154764 |
2012-03-01 08:32:50 | skrah | set | status: open -> closed resolution: fixed messages: + msg154680 stage: resolved |
2012-02-29 21:18:37 | ncoghlan | set | messages: + msg154665 |
2012-02-29 16:54:30 | python-dev | set | messages: + msg154651 |
2012-02-29 11:33:51 | skrah | set | nosy:
+ loewis messages: + msg154616 |
2012-02-26 11:34:31 | ncoghlan | set | messages: + msg154327 |
2012-02-26 10:42:23 | skrah | link | issue2394 superseder |
2012-02-26 10:30:34 | skrah | set | messages: + msg154323 |
2012-02-25 11:25:30 | python-dev | set | nosy:
+ python-dev messages: + msg154240 |
2012-02-24 17:25:45 | ncoghlan | set | messages: + msg154141 |
2012-02-24 15:16:20 | skrah | set | files:
+ issue10181-news.diff messages: + msg154138 |
2012-02-21 13:16:01 | ncoghlan | set | messages: + msg153870 |
2012-02-13 15:39:53 | skrah | unlink | issue10227 dependencies |
2012-02-09 22:42:30 | skrah | link | issue9990 superseder |
2012-02-09 22:42:30 | skrah | unlink | issue9990 dependencies |
2012-02-09 22:33:54 | skrah | link | issue8305 superseder |
2012-02-09 22:33:54 | skrah | unlink | issue8305 dependencies |
2012-02-09 22:31:39 | skrah | link | issue7433 superseder |
2012-02-09 22:31:39 | skrah | unlink | issue7433 dependencies |
2012-01-31 20:34:30 | skrah | set | messages: + msg152404 |
2012-01-31 20:33:08 | skrah | set | files: + a5c4a96dc981.diff |
2012-01-29 20:59:16 | skrah | link | issue5231 superseder |
2012-01-29 20:59:16 | skrah | unlink | issue5231 dependencies |
2012-01-29 17:19:05 | pitrou | set | messages: + msg152245 |
2012-01-29 12:40:42 | skrah | set | messages: + msg152223 |
2012-01-29 02:24:36 | ncoghlan | set | messages: + msg152208 |
2012-01-29 01:09:51 | ncoghlan | set | messages: + msg152206 |
2012-01-28 23:09:36 | pitrou | set | messages: + msg152197 |
2012-01-28 22:44:49 | skrah | set | messages: + msg152196 |
2012-01-28 20:59:27 | pitrou | set | messages: + msg152188 |
2012-01-28 17:51:37 | skrah | set | messages: + msg152175 |
2012-01-28 17:29:12 | pitrou | set | messages: + msg152174 |
2012-01-28 17:19:17 | skrah | set | messages: + msg152173 |
2012-01-28 14:13:15 | skrah | set | messages: + msg152161 |
2012-01-28 00:56:31 | ncoghlan | set | messages: + msg152145 |
2012-01-27 21:42:25 | skrah | set | messages: + msg152124 |
2012-01-27 12:42:39 | pitrou | set | messages: + msg152091 |
2012-01-27 12:41:03 | ncoghlan | set | messages: + msg152090 |
2012-01-27 12:06:48 | ncoghlan | set | messages: + msg152086 |
2012-01-25 18:00:18 | skrah | unlink | issue12834 dependencies |
2012-01-25 17:44:16 | skrah | unlink | issue13411 dependencies |
2012-01-25 17:43:55 | skrah | set | files: + 8dd9f0ea4876.diff |
2012-01-18 14:41:48 | ncoghlan | set | messages: - msg151539 |
2012-01-18 14:41:24 | ncoghlan | set | messages: + msg151539 |
2012-01-18 14:40:40 | ncoghlan | set | messages: + msg151538 |
2012-01-18 13:34:54 | skrah | set | messages: + msg151537 |
2012-01-17 12:58:03 | skrah | link | issue13411 dependencies |
2012-01-17 11:53:19 | skrah | set | messages: + msg151447 |
2012-01-17 11:41:55 | kristjan.jonsson | set | messages: + msg151445 |
2012-01-16 13:23:45 | ncoghlan | set | nosy:
+ kristjan.jonsson messages: + msg151360 versions: + Python 3.3, - Python 3.2 |
2012-01-16 13:09:27 | ncoghlan | link | issue12834 dependencies |
2012-01-16 10:29:30 | ncoghlan | link | issue13797 dependencies |
2011-11-13 11:56:08 | paul.moore | set | nosy:
+ paul.moore |
2011-09-18 09:22:30 | skrah | set | messages: + msg144234 |
2011-09-18 09:17:06 | skrah | set | files: + 4492afe05a07.diff |
2011-09-08 15:05:06 | skrah | set | messages: + msg143730 |
2011-09-08 14:57:25 | skrah | link | issue10227 dependencies |
2011-09-08 14:56:21 | skrah | link | issue5231 dependencies |
2011-09-08 14:53:24 | skrah | set | messages: + msg143727 |
2011-09-08 14:47:15 | skrah | set | files: + 51cedae5acfc.diff |
2011-08-16 13:25:53 | skrah | set | messages: + msg142191 |
2011-08-16 13:20:32 | skrah | set | files: + 70a8ccd53ade.diff |
2011-08-10 12:43:05 | skrah | link | issue7433 dependencies |
2011-08-10 12:39:21 | skrah | link | issue8305 dependencies |
2011-08-10 12:31:52 | skrah | link | issue9990 dependencies |
2011-08-10 11:46:42 | skrah | set | messages: + msg141858 |
2011-08-10 11:43:36 | skrah | set | files: + 78fb1181787a.diff |
2011-08-10 03:19:45 | kermode | set | nosy:
- kermode |
2011-07-06 13:16:43 | ncoghlan | set | messages: + msg139936 |
2011-07-06 13:01:03 | pitrou | set | messages: + msg139935 |
2011-07-06 12:42:22 | skrah | set | messages: + msg139933 |
2011-07-06 12:39:20 | skrah | set | messages: + msg139932 |
2011-07-06 04:21:36 | ncoghlan | set | messages: + msg139921 |
2011-07-06 04:16:49 | ncoghlan | set | messages: + msg139920 |
2011-07-06 04:07:31 | ncoghlan | set | messages: + msg139919 |
2011-07-05 15:46:40 | pitrou | set | messages: + msg139883 |
2011-07-05 15:33:19 | skrah | set | messages: + msg139878 |
2011-07-05 14:55:51 | pitrou | set | messages: + msg139877 |
2011-07-05 14:37:02 | ncoghlan | set | messages: + msg139875 |
2011-07-05 14:09:46 | pitrou | set | messages: + msg139874 |
2011-07-05 14:09:25 | pitrou | set | messages: + msg139873 |
2011-07-05 14:02:55 | ncoghlan | set | messages: + msg139872 |
2011-07-05 13:42:07 | pitrou | set | messages: + msg139871 |
2011-07-05 13:13:27 | ncoghlan | set | messages: + msg139870 |
2011-07-05 12:59:01 | ncoghlan | set | messages: + msg139868 |
2011-07-05 12:49:17 | ncoghlan | set | messages: + msg139866 |
2011-07-05 12:47:13 | ncoghlan | set | messages: + msg139865 |
2011-07-05 12:11:51 | pv | set | messages: + msg139861 |
2011-07-05 11:49:04 | pitrou | set | messages: + msg139859 |
2011-07-05 11:44:53 | skrah | set | messages: + msg139856 |
2011-07-05 10:49:41 | skrah | set | messages: + msg139846 |
2011-07-05 10:47:50 | skrah | set | files: + 718801740bde.diff |
2011-07-05 08:39:58 | skrah | set | messages: + msg139834 |
2011-07-05 02:43:09 | ncoghlan | set | messages: + msg139826 |
2011-07-04 14:36:56 | pv | set | messages: + msg139775 |
2011-07-04 13:29:22 | ncoghlan | set | messages: + msg139767 |
2011-07-04 13:26:46 | ncoghlan | set | messages: + msg139766 |
2011-07-04 13:00:53 | skrah | set | messages: + msg139764 |
2011-07-04 12:30:50 | ncoghlan | set | messages: + msg139762 |
2011-07-04 12:27:27 | ncoghlan | set | messages: + msg139760 |
2011-07-04 12:21:09 | skrah | set | messages: + msg139758 |
2011-07-04 10:59:06 | pv | set | messages: + msg139751 |
2011-07-04 10:42:28 | skrah | set | messages: + msg139748 |
2011-07-04 10:36:14 | vstinner | set | nosy:
+ vstinner |
2011-07-04 10:29:41 | skrah | set | files: + bbe70ca4e0e5.diff |
2011-07-04 10:28:58 | skrah | set | hgrepos:
+ hgrepo36 messages: + msg139747 |
2011-06-28 00:58:24 | ncoghlan | set | messages: + msg139341 |
2011-06-27 20:03:59 | kermode | set | messages: + msg139325 |
2011-06-27 18:36:27 | pv | set | messages: + msg139323 |
2011-06-27 18:14:36 | kermode | set | messages: + msg139320 |
2011-06-27 17:21:58 | pitrou | set | messages: + msg139317 |
2011-06-27 17:17:07 | skrah | set | messages: + msg139316 |
2011-06-27 16:47:35 | skrah | set | messages: + msg139314 |
2011-06-27 13:17:56 | ncoghlan | set | messages: + msg139266 |
2011-06-27 11:28:13 | pv | set | messages: + msg139257 |
2011-06-27 10:32:49 | skrah | set | messages: + msg139256 |
2011-06-26 17:58:05 | petri.lehtinen | set | nosy:
+ petri.lehtinen |
2011-06-26 12:35:47 | ncoghlan | set | messages: + msg139173 |
2011-06-26 12:31:34 | ncoghlan | set | messages: + msg139172 |
2011-06-26 10:30:20 | skrah | set | nosy:
+ skrah messages: + msg139162 |
2011-06-20 18:58:35 | jcon | set | nosy:
+ jcon |
2011-02-19 19:14:26 | mark.dickinson | set | assignee: mark.dickinson -> messages: + msg128878 nosy: teoliphant, mark.dickinson, ncoghlan, rupole, kermode, pitrou, pv |
2011-02-14 21:15:10 | loewis | set | nosy:
- loewis |
2011-02-14 15:13:21 | pv | set | nosy:
loewis, teoliphant, mark.dickinson, ncoghlan, rupole, kermode, pitrou, pv messages: + msg128551 |
2011-02-14 14:03:10 | ncoghlan | set | nosy:
loewis, teoliphant, mark.dickinson, ncoghlan, rupole, kermode, pitrou, pv messages: + msg128546 |
2011-02-14 13:49:02 | ncoghlan | set | nosy:
loewis, teoliphant, mark.dickinson, ncoghlan, rupole, kermode, pitrou, pv messages: + msg128544 |
2011-02-14 13:03:50 | pitrou | set | nosy:
loewis, teoliphant, mark.dickinson, ncoghlan, rupole, kermode, pitrou, pv messages: + msg128542 |
2011-02-14 12:02:23 | ncoghlan | set | nosy:
loewis, teoliphant, mark.dickinson, ncoghlan, rupole, kermode, pitrou, pv messages: + msg128540 |
2011-02-14 11:41:42 | ncoghlan | set | nosy:
loewis, teoliphant, mark.dickinson, ncoghlan, rupole, kermode, pitrou, pv messages: + msg128539 |
2011-02-14 11:29:13 | ncoghlan | set | nosy:
loewis, teoliphant, mark.dickinson, ncoghlan, rupole, kermode, pitrou, pv messages: + msg128538 |
2011-02-14 11:11:53 | ncoghlan | set | nosy:
loewis, teoliphant, mark.dickinson, ncoghlan, rupole, kermode, pitrou, pv messages: + msg128537 |
2011-02-13 22:33:44 | pitrou | set | nosy:
loewis, teoliphant, mark.dickinson, ncoghlan, rupole, kermode, pitrou, pv messages: + msg128528 |
2011-02-13 22:20:19 | ncoghlan | set | nosy:
loewis, teoliphant, mark.dickinson, ncoghlan, rupole, kermode, pitrou, pv messages: + msg128527 |
2011-02-13 22:12:42 | ncoghlan | set | nosy:
loewis, teoliphant, mark.dickinson, ncoghlan, rupole, kermode, pitrou, pv messages: + msg128526 |
2011-02-13 22:01:05 | ncoghlan | set | nosy:
loewis, teoliphant, mark.dickinson, ncoghlan, rupole, kermode, pitrou, pv messages: + msg128525 |
2011-02-13 19:42:17 | pitrou | set | nosy:
loewis, teoliphant, mark.dickinson, ncoghlan, rupole, kermode, pitrou, pv messages: + msg128518 |
2011-02-13 19:31:31 | pitrou | set | nosy:
loewis, teoliphant, mark.dickinson, ncoghlan, rupole, kermode, pitrou, pv messages: + msg128517 |
2011-02-13 19:08:43 | pv | set | files:
+ buffer-interface-clarify.patch nosy: loewis, teoliphant, mark.dickinson, ncoghlan, rupole, kermode, pitrou, pv messages: + msg128516 |
2011-02-13 16:04:23 | pitrou | set | nosy:
loewis, teoliphant, mark.dickinson, ncoghlan, rupole, kermode, pitrou, pv messages: + msg128514 |
2011-02-13 15:34:17 | pv | set | nosy:
loewis, teoliphant, mark.dickinson, ncoghlan, rupole, kermode, pitrou, pv messages: + msg128508 |
2011-02-13 15:11:38 | pitrou | set | nosy:
loewis, teoliphant, mark.dickinson, ncoghlan, rupole, kermode, pitrou, pv messages: + msg128506 |
2011-02-13 15:09:37 | pv | set | nosy:
loewis, teoliphant, mark.dickinson, ncoghlan, rupole, kermode, pitrou, pv messages: + msg128505 |
2011-02-13 15:02:14 | pitrou | set | nosy:
loewis, teoliphant, mark.dickinson, ncoghlan, rupole, kermode, pitrou, pv messages: + msg128503 |
2011-02-13 14:35:10 | pv | set | nosy:
loewis, teoliphant, mark.dickinson, ncoghlan, rupole, kermode, pitrou, pv messages: + msg128499 |
2011-02-13 14:19:13 | pitrou | set | nosy:
loewis, teoliphant, mark.dickinson, ncoghlan, rupole, kermode, pitrou, pv messages: + msg128495 |
2011-02-13 13:20:55 | pv | set | nosy:
loewis, teoliphant, mark.dickinson, ncoghlan, rupole, kermode, pitrou, pv messages: + msg128491 |
2011-02-13 04:23:03 | pitrou | set | nosy:
loewis, teoliphant, mark.dickinson, ncoghlan, rupole, kermode, pitrou, pv messages: + msg128477 |
2011-02-13 03:47:32 | pv | set | nosy:
+ pv messages: + msg128475 |
2011-01-07 11:22:35 | ncoghlan | set | nosy:
loewis, teoliphant, mark.dickinson, ncoghlan, rupole, kermode, pitrou messages: + msg125641 |
2011-01-07 10:41:14 | mark.dickinson | set | nosy:
loewis, teoliphant, mark.dickinson, ncoghlan, rupole, kermode, pitrou messages: + msg125640 |
2011-01-07 09:34:48 | ncoghlan | set | nosy:
loewis, teoliphant, mark.dickinson, ncoghlan, rupole, kermode, pitrou messages: + msg125635 |
2011-01-07 09:10:24 | mark.dickinson | set | nosy:
loewis, teoliphant, mark.dickinson, ncoghlan, rupole, kermode, pitrou messages: + msg125631 |
2011-01-07 09:05:53 | mark.dickinson | set | nosy:
loewis, teoliphant, mark.dickinson, ncoghlan, rupole, kermode, pitrou messages: + msg125630 |
2011-01-07 03:59:23 | ncoghlan | set | nosy:
loewis, teoliphant, mark.dickinson, ncoghlan, rupole, kermode, pitrou messages: + msg125618 |
2011-01-06 19:35:24 | loewis | set | nosy:
loewis, teoliphant, mark.dickinson, ncoghlan, rupole, kermode, pitrou messages: + msg125582 |
2011-01-06 09:20:26 | mark.dickinson | set | nosy:
loewis, teoliphant, mark.dickinson, ncoghlan, rupole, kermode, pitrou messages: + msg125531 |
2011-01-06 06:14:00 | pitrou | set | nosy:
loewis, teoliphant, mark.dickinson, ncoghlan, rupole, kermode, pitrou messages: + msg125507 |
2011-01-06 02:15:39 | ncoghlan | set | nosy:
loewis, teoliphant, mark.dickinson, ncoghlan, rupole, kermode, pitrou messages: + msg125502 |
2011-01-06 02:09:52 | ncoghlan | set | nosy:
loewis, teoliphant, mark.dickinson, ncoghlan, rupole, kermode, pitrou messages: + msg125499 |
2011-01-05 19:40:41 | loewis | set | files:
+ pybuffer.diff messages: + msg125462 keywords: + patch nosy: loewis, teoliphant, mark.dickinson, ncoghlan, rupole, kermode, pitrou |
2011-01-05 16:52:46 | pitrou | set | nosy:
loewis, teoliphant, mark.dickinson, ncoghlan, rupole, kermode, pitrou messages: + msg125433 |
2011-01-05 16:30:52 | ncoghlan | set | nosy:
loewis, teoliphant, mark.dickinson, ncoghlan, rupole, kermode, pitrou messages: + msg125431 |
2011-01-05 13:51:10 | mark.dickinson | set | nosy:
loewis, teoliphant, mark.dickinson, ncoghlan, rupole, kermode, pitrou messages: + msg125424 |
2011-01-05 12:25:05 | ncoghlan | set | nosy:
+ teoliphant messages: + msg125418 |
2011-01-05 07:30:22 | kermode | set | nosy:
+ kermode |
2011-01-04 14:58:15 | ncoghlan | set | nosy:
loewis, mark.dickinson, ncoghlan, rupole, pitrou messages: + msg125332 |
2011-01-04 14:56:10 | ncoghlan | link | issue10001 superseder |
2011-01-04 14:54:40 | ncoghlan | set | nosy:
+ loewis messages: + msg125328 |
2011-01-04 14:49:16 | ncoghlan | set | nosy:
mark.dickinson, ncoghlan, rupole, pitrou messages: + msg125327 |
2011-01-04 08:22:38 | mark.dickinson | set | nosy:
mark.dickinson, ncoghlan, rupole, pitrou messages: + msg125297 |
2011-01-04 08:18:56 | mark.dickinson | set | nosy:
mark.dickinson, ncoghlan, rupole, pitrou messages: + msg125296 |
2011-01-04 00:20:02 | pitrou | set | assignee: mark.dickinson nosy: + mark.dickinson |
2010-11-03 13:00:31 | ncoghlan | set | messages: + msg120322 |
2010-11-03 12:51:50 | ncoghlan | set | messages: + msg120321 |
2010-11-03 12:21:52 | ncoghlan | set | title: get_shape0 in memoryobject.c not checked for error return -> Problems with Py_buffer management in memoryobject.c (and elsewhere?) |
2010-11-03 12:19:54 | ncoghlan | set | messages: + msg120319 |
2010-11-03 11:59:53 | rupole | set | messages: + msg120318 |
2010-11-03 01:17:31 | rupole | set | messages: + msg120297 |
2010-11-02 23:01:12 | pitrou | set | messages: + msg120281 |
2010-11-02 22:57:06 | ncoghlan | set | messages: + msg120280 |
2010-11-02 22:34:57 | pitrou | set | messages: + msg120270 |
2010-11-02 22:30:01 | ncoghlan | set | messages: + msg120268 |
2010-11-02 22:16:18 | pitrou | set | nosy:
+ ncoghlan messages: + msg120267 |
2010-11-02 21:30:35 | eric.araujo | set | nosy:
+ pitrou |
2010-10-23 18:22:37 | rupole | create |