|
|
|
Created:
2 years, 3 months ago by rupole Modified:
1 year, 3 months ago Reviewers:
pitrou, ncoghlan, stefan-usenet CC:
loewis, oliphant_enthought.com, p.f.moore_gmail.com, mark.dickinson, Nick Coghlan, rupole_hotmail.com, AntoinePitrou, krisvale, haypo, josh_joshtriplett.org, pv, skrah, devnull_psf.upfronthosting.co.za, jcon, Petri Lehtinen Visibility:
Public. |
Patch Set 1 #Patch Set 2 #
Total comments: 21
Patch Set 3 #
Total comments: 13
Patch Set 4 #Patch Set 5 #Patch Set 6 #Patch Set 7 #
Total comments: 60
Patch Set 8 #
Total comments: 22
Patch Set 9 #
Total comments: 1
Patch Set 10 #MessagesTotal messages: 31
That looks a *lot* cleaner than what we have now. See detailed comments for a few suggestions on layout, as well as handling the case where GetContiguous requires copying. http://bugs.python.org/review/10181/diff/2978/7717 File Include/memoryobject.h (right): http://bugs.python.org/review/10181/diff/2978/7717#newcode78 Include/memoryobject.h:78: #define Py_MEMORYVIEW_MAXSTATIC 3 The significance of this constant escapes me when looking just at the header file - the implementation may make it clearer, but a comment would be helpful. http://bugs.python.org/review/10181/diff/2978/7718 File Include/object.h (left): http://bugs.python.org/review/10181/diff/2978/7718#oldcode161 Include/object.h:161: mono-dimensional buffers. */ Alas, I think we're stuck with keeping this around for backwards compatibility reasons, now. Fortunately, since we'll strictly control the lifecycle via PyManagedBuffer now, it's OK if memoryview instances have additional pointers that refer to this space so there's no need to fiddle with the pointer values. Then again, it's undocumented, so we're within our rights to ditch it... http://bugs.python.org/review/10181/diff/2978/7719 File Lib/test/test_sys.py (right): http://bugs.python.org/review/10181/diff/2978/7719#newcode773 Lib/test/test_sys.py:773: check(memoryview(b''), size(h + 'P3P3P3P2P2P2i5P')) With smalltable added back in, this may need adjusting again. http://bugs.python.org/review/10181/diff/2978/7720 File Objects/memoryobject.c (right): http://bugs.python.org/review/10181/diff/2978/7720#newcode40 Objects/memoryobject.c:40: /* XXX any meaning if ndim > 1? */ Just the size in the first dimension - as noted on the tracker, fixing that can be the *next* task on the list... http://bugs.python.org/review/10181/diff/2978/7720#newcode67 Objects/memoryobject.c:67: */ Agreed these semantics make sense and should be documented. http://bugs.python.org/review/10181/diff/2978/7720#newcode90 Objects/memoryobject.c:90: return PyMemoryView_FromObject((PyObject *)mbuf); There's a ref leak here if PyMemoryView_FromObject fails (unlikely but possible). http://bugs.python.org/review/10181/diff/2978/7720#newcode95 Objects/memoryobject.c:95: { I suggest moving this *before* PyMemoryView_FromBuffer so the two memoryview constructors are together (that's more important than having the two "_FromObject" functions together) http://bugs.python.org/review/10181/diff/2978/7720#newcode142 Objects/memoryobject.c:142: mbuf = (PyManagedBufferObject *)v; The refcounting here bothers me - there should really be a Py_INCREF(mbuf) on this branch, with PyMemoryView_FromBuffer adjusted accordingly to include an unconditional Py_DECREF after creating the memoryview instance. http://bugs.python.org/review/10181/diff/2978/7720#newcode167 Objects/memoryobject.c:167: { Just as a general comment, rather than intermingling the managedbuffer and memoryview implementations, it would probably be cleared to have the entire managedbuffer implementation first, followed by the memoryview implementation (see listobject.c for an example of a single C file that defines multiple objects - lists themselves along with some iterators) http://bugs.python.org/review/10181/diff/2978/7720#newcode441 Objects/memoryobject.c:441: "underlying PyManagedBuffer"); This does make me wonder if memoryview should expose a reference to the underlying object as a read-only property. http://bugs.python.org/review/10181/diff/2978/7720#newcode465 Objects/memoryobject.c:465: This isn't too difficult to handle - once the data has been copied out to the bytes instance, discard the first memoryview object and return a new one based on the bytes object (alternatively, just switch the ManagedBuffer pointers around appropriately without destroying and recreating the memoryview itself).
Sign in to reply to this message.
http://bugs.python.org/review/10181/diff/2978/7720 File Objects/memoryobject.c (right): http://bugs.python.org/review/10181/diff/2978/7720#newcode465 Objects/memoryobject.c:465: My comment was a bit cryptic. I meant that in the current implementation dest is filled in but apparently not used, i.e. view->buf is never set to dest. Also, releasing the view later on (see last XXX) seems wrong. Then, I think the strides and shape need to be adjusted as well.
Sign in to reply to this message.
http://bugs.python.org/review/10181/diff/2978/7717 File Include/memoryobject.h (right): http://bugs.python.org/review/10181/diff/2978/7717#newcode78 Include/memoryobject.h:78: #define Py_MEMORYVIEW_MAXSTATIC 3 Indeed. The constant is supposed to be used in init_arrays() in the slicing function when multi-dimensional slicing will be implemented. I thought it might be good to have storage for three dimensions and switch to dynamic arrays for everything above.
Sign in to reply to this message.
http://bugs.python.org/review/10181/diff/2978/7718 File Include/object.h (left): http://bugs.python.org/review/10181/diff/2978/7718#oldcode161 Include/object.h:161: mono-dimensional buffers. */ Would it be OK to keep it but add a "deprecated" comment?
Sign in to reply to this message.
http://bugs.python.org/review/10181/diff/2978/7720 File Objects/memoryobject.c (right): http://bugs.python.org/review/10181/diff/2978/7720#newcode142 Objects/memoryobject.c:142: mbuf = (PyManagedBufferObject *)v; Yes, this is nicer. I first regarded PyManagedBuffer_FromObject() as internal, in which case the refcounting should work out. I'll change that.
Sign in to reply to this message.
http://bugs.python.org/review/10181/diff/2978/7717 File Include/memoryobject.h (right): http://bugs.python.org/review/10181/diff/2978/7717#newcode78 Include/memoryobject.h:78: #define Py_MEMORYVIEW_MAXSTATIC 3 Makes sense, although you may want to leave that until the patch that fixes the multi-dimensional support. http://bugs.python.org/review/10181/diff/2978/7718 File Include/object.h (left): http://bugs.python.org/review/10181/diff/2978/7718#oldcode161 Include/object.h:161: mono-dimensional buffers. */ I suggest asking the NumPy folks about this one. If they use it, we should keep it. If they don't, ditch it as an obsolete undocumented feature. http://bugs.python.org/review/10181/diff/2978/7720 File Objects/memoryobject.c (right): http://bugs.python.org/review/10181/diff/2978/7720#newcode465 Objects/memoryobject.c:465: It wouldn't surprise me in the least to find out the previous code here was broken. Is there even a test that exercises it properly in the current test suite?
Sign in to reply to this message.
http://bugs.python.org/review/10181/diff/2978/7718 File Include/object.h (left): http://bugs.python.org/review/10181/diff/2978/7718#oldcode161 Include/object.h:161: mono-dimensional buffers. */ Numpy does not touch the `smalltable` field, and I haven't seen any other code using it either. This does not mean that no such code exists, but it's undocumented (and ugly), so... http://bugs.python.org/review/10181/diff/2978/7720 File Objects/memoryobject.c (right): http://bugs.python.org/review/10181/diff/2978/7720#newcode67 Objects/memoryobject.c:67: */ Maybe also update the documentation at the same time? http://bugs.python.org/review/10181/diff/2978/7720#newcode465 Objects/memoryobject.c:465: It could be useful to add tests for the buffer C-API to _testcapimodule.c. (Including a simple extension object that exposes the buffer interface, so that the correct implementation of the getbuffer/releasebuffer lifecycle in memoryview could be tested.)
Sign in to reply to this message.
What's the etiquette for CC'ing? I don't want to leave anyone out, but perhaps the mail volume gets too high for some people. http://bugs.python.org/review/10181/diff/2987/7744 File Objects/memoryobject.c (right): http://bugs.python.org/review/10181/diff/2987/7744#newcode257 Objects/memoryobject.c:257: mview->view = mbuf->master; This still isn't right if 'v' is a sliced memoryview. The sliced buffer should be cloned.'v' must also be kept alive, since the new view will refer to the shape/stride arrays (read-only). That's the way it is done in memory_getbuf() now. But won't keeping 'v' alive have the exact efficiency issues that Antoine mentioned earlier? We could of course clone all arrays right here. That's another efficiency issue.
Sign in to reply to this message.
http://bugs.python.org/review/10181/diff/2987/7744 File Objects/memoryobject.c (right): http://bugs.python.org/review/10181/diff/2987/7744#newcode257 Objects/memoryobject.c:257: mview->view = mbuf->master; I think cloning the arrays right here is best. Copying into the static arrays is fast, if ndim > 3 (or some other small number) there are probably bigger efficiency issues.
Sign in to reply to this message.
I think you already came to the same conclusion I did, but adding my comment for the record anyway. http://bugs.python.org/review/10181/diff/2987/7744 File Objects/memoryobject.c (right): http://bugs.python.org/review/10181/diff/2987/7744#newcode257 Objects/memoryobject.c:257: mview->view = mbuf->master; In the view-of-a-view case, we can exploit the fact that we *know* how view internals work to clone the info safely and make sure that any pointers either refer to memory owned by the underlying object, by mbuf or by the new mview. We don't suffer from the problem the old dup_buffer code did where it was manipulating structures that were first populated by 3rd party code. Probably the best thing to do is set a flag in the type-checking code and then check it when deciding how to populate mview->view (i.e. either by copying directly from the master or from the source view's internal state)
Sign in to reply to this message.
Hi, Looks good on the principle. I have an issue with the strategy adopted for the release() method, see inline comments. Antoine. http://bugs.python.org/review/10181/diff/2987/7744 File Objects/memoryobject.c (right): http://bugs.python.org/review/10181/diff/2987/7744#newcode61 Objects/memoryobject.c:61: _PyObject_GC_TRACK(mbuf); Looks like this could be done above, since it's shared by all code paths. http://bugs.python.org/review/10181/diff/2987/7744#newcode74 Objects/memoryobject.c:74: /* XXX Do we need mbuf_new()? */ Not if it's not supposed to be instantiated through anything else than PyManagedBuffer_FromObject(). http://bugs.python.org/review/10181/diff/2987/7744#newcode83 Objects/memoryobject.c:83: PyErr_SetString(PyExc_BufferError, "several memoryviews are active"); I think this should be a SystemError of some sort (it should not be possible to call mbuf_release() if there are active memoryviews around). http://bugs.python.org/review/10181/diff/2987/7744#newcode147 Objects/memoryobject.c:147: (((PyMemoryViewObject *) memobj)->mbuf->released) This looks wrong to me. Since release() is a method of the memoryview object, and the shared buffer is invisible to the user, release() should switch a "released" flag on the memoryview itself and not affect other memoryviews. The memoryview must also keep count of the number of exports made through memory_getbuf. When both "released" is true and the number of exports falls to 0, "mbuf" should be set to NULL. This way you get the desired behaviour (a refcounted shared managed buffer) without messing with the semantics of the release() method.
Sign in to reply to this message.
Since the "released" flag refers to whether or not ReleaseBuffer has been called, I think ManagedBuffer is actually the right place for it. http://bugs.python.org/review/10181/diff/2987/7744 File Objects/memoryobject.c (right): http://bugs.python.org/review/10181/diff/2987/7744#newcode83 Objects/memoryobject.c:83: PyErr_SetString(PyExc_BufferError, "several memoryviews are active"); The memoryviews *don't know* if there are multiple views around - they're only coupled via mbuf, so it has to do the check on their behalf (and produce a user friendly error if someone hasn't managed their views correctly). http://bugs.python.org/review/10181/diff/2987/7744#newcode147 Objects/memoryobject.c:147: (((PyMemoryViewObject *) memobj)->mbuf->released) For the refcounting of exports part, Stefan is going to handle that via Py_INCREF(mbuf) and Py_DECREF(mbuf) as appropriate in the buffer export related methods. Having the "released" flag on the managed buffer avoids the need for a bunch of "!= NULL" checks in the memoryview code (which would be needed if the "released" status was recorded by setting mbuf = NULL). I'm not necessarily *opposed* to doing things that way, but I think it would actually make the code messier rather than cleaner. Remember one of the goals here is for release() to either free the underlying buffer immediately or raise BufferError if it is still in use. The shared state reflects.
Sign in to reply to this message.
http://bugs.python.org/review/10181/diff/2987/7744 File Objects/memoryobject.c (right): http://bugs.python.org/review/10181/diff/2987/7744#newcode83 Objects/memoryobject.c:83: PyErr_SetString(PyExc_BufferError, "several memoryviews are active"); On 2011/07/05 16:15:10, Nick Coghlan wrote: > The memoryviews *don't know* if there are multiple views around - they're only > coupled via mbuf, so it has to do the check on their behalf (and produce a user > friendly error if someone hasn't managed their views correctly). I'm worried that you're adding extra burden on the user here. "Managing their views" is another layer of complexity which we could handle transparently just fine (well, I hope so). http://bugs.python.org/review/10181/diff/2987/7744#newcode147 Objects/memoryobject.c:147: (((PyMemoryViewObject *) memobj)->mbuf->released) On 2011/07/05 16:15:10, Nick Coghlan wrote: > > Remember one of the goals here is for release() to either free the underlying > buffer immediately or raise BufferError if it is still in use. The shared state > reflects. Ok. However, the problem here is that several memoryviews share the same underlying state, for example if one is a slice of another. This makes it rather awkward and tedious to manage them, if releasing one of them invalidates the other ones (it isn't the case currently, by construction). Sure, managing this ourselves makes *our* code less simple, but at the benefit of users, and that's what most of Python is about ;)
Sign in to reply to this message.
> Ok. However, the problem here is that several memoryviews share the same underlying state Actually I want to change that, see: http://bugs.python.org/review/10181/diff/2987/7744#newcode257 With the static arrays inside the MemoryViewObject, I think it's fast enough to copy shape, strides and suboffsets unconditionally on construction. Then each MemoryViewObject can disappear independently, which is a lot cleaner.
Sign in to reply to this message.
http://bugs.python.org/review/10181/diff/2987/7744 File Objects/memoryobject.c (right): http://bugs.python.org/review/10181/diff/2987/7744#newcode61 Objects/memoryobject.c:61: _PyObject_GC_TRACK(mbuf); mbuf->master.obj is followed by tp_traverse() though. The docs say that all the fields followed by the tp_traverse handler should be valid before calling PyObject_GC_TRACK(). Or does this not apply to the macro version?
Sign in to reply to this message.
http://bugs.python.org/review/10181/diff/2987/7744 File Objects/memoryobject.c (right): http://bugs.python.org/review/10181/diff/2987/7744#newcode61 Objects/memoryobject.c:61: _PyObject_GC_TRACK(mbuf); On 2011/07/06 18:21:32, skrah wrote: > mbuf->master.obj is followed by tp_traverse() though. The docs say that all the > fields followed by the tp_traverse handler should be valid before calling > PyObject_GC_TRACK(). Well, it's either valid, or it's NULL. It shouldn't make any difference since the tp_traverse handler explicitly checks for NULL.
Sign in to reply to this message.
Mostly just cosmetic comments on the docs and various comments, but there are two substantive points: 1. I think the "len" field in the Py_buffer struct should be exposed as a "size" attribute on the memoryview object to better match the sys.getsizeof() and len() terminology 2. I think memoryview.tobytes() and PyMemoryView_GetContiguous() can be fixed fairly easily (and fixing the former provides the means to test the latter) Otherwise looks great to me - huge improvements to both the documentation and the implementation. http://bugs.python.org/review/10181/diff/3343/10406 File Doc/c-api/buffer.rst (right): http://bugs.python.org/review/10181/diff/3343/10406#newcode134 Doc/c-api/buffer.rst:134: can use :c:member:`~Py_buffer.itemsize` to navigate the buffer. Use '==' rather '=' http://bugs.python.org/review/10181/diff/3343/10406#newcode230 Doc/c-api/buffer.rst:230: return a readable or writable buffer, but the choice MUST be consistent Minor quibble: I tend to favour "provide" over "return" when data is provided to a caller via an output parameter rather than a function call's actual return value. So here, for example, I'd write something like "If set, the exporter MUST provide a writable buffer or else report failure. Otherwise, the exporter MAY provide either a readonly or writable buffer, but the choice MUST be consistent for all consumers." http://bugs.python.org/review/10181/diff/3343/10406#newcode319 Doc/c-api/buffer.rst:319: Perhaps include PyBUF_WRITABLE and PyBUF_SIMPLE as the final two lines in this table? (I realise they aren't technically compound requests, but they continue the pattern down to the simple 1-D case) http://bugs.python.org/review/10181/diff/3343/10406#newcode335 Doc/c-api/buffer.rst:335: n-dimensional array as follows: "as as" -> "as a" (split across the two lines) http://bugs.python.org/review/10181/diff/3343/10406#newcode424 Doc/c-api/buffer.rst:424: Calls to :c:func:`PyObject_GetBuffer` and :c:func:`PyBuffer_Release` must Grammar nitpicking: "Successful calls to GetBuffer must be paired with calls to ReleaseBuffer, similar..." http://bugs.python.org/review/10181/diff/3343/10406#newcode426 Doc/c-api/buffer.rst:426: after the consumer is done with the buffer, :c:func:`PyBuffer_Release` must "to to" http://bugs.python.org/review/10181/diff/3343/10406#newcode432 Doc/c-api/buffer.rst:432: Release the buffer *view* and decrement :c:member:`view->obj`. This function "decrement the reference count for" http://bugs.python.org/review/10181/diff/3343/10406#newcode447 Doc/c-api/buffer.rst:447: Unrelated: perhaps these "fortran" arguments should all be changed to "contig_kind"? The fortran naming suggests a boolean, but they're actually an ASCII mnemonic. http://bugs.python.org/review/10181/diff/3343/10406#newcode463 Doc/c-api/buffer.rst:463: with writability set according to *readonly*. *buf* is interpreted a a sequence "a a" -> "as a" http://bugs.python.org/review/10181/diff/3343/10408 File Doc/c-api/typeobj.rst (right): http://bugs.python.org/review/10181/diff/3343/10408#newcode1240 Doc/c-api/typeobj.rst:1240: :c:func:`PyBuffer_FillInfo` provides an easy way of exposing a simple While it's already mentioned in the field descriptions, perhaps it's worth repeating here that the "internal" field is provided specifically to support passing arbitrary data from getbufferproc to the corresponding releasebufferproc call? http://bugs.python.org/review/10181/diff/3343/10409 File Doc/library/stdtypes.rst (right): http://bugs.python.org/review/10181/diff/3343/10409#newcode2348 Doc/library/stdtypes.rst:2348: :class:`bytes`, :class:`bytearray` and :class:`array.array`. If you're going to mention array.array, then the lead in should be updated to something like "Built-in and standard library objects..." http://bugs.python.org/review/10181/diff/3343/10409#newcode2365 Doc/library/stdtypes.rst:2365: from the :mod:`struct` module, index will return a single element "indexing will" http://bugs.python.org/review/10181/diff/3343/10409#newcode2411 Doc/library/stdtypes.rst:2411: ValueError: memoryview assignment: lvalue and rvalue have different structures Minor quibble: a quick grep suggests that LHS/RHS are currently more common in the docs and code than lvalue/rvalue. (not a big deal either way, though) http://bugs.python.org/review/10181/diff/3343/10409#newcode2479 Doc/library/stdtypes.rst:2479: Parameter definitions are missing. http://bugs.python.org/review/10181/diff/3343/10409#newcode2485 Doc/library/stdtypes.rst:2485: Perhaps note explicitly that size changes can be handled by slicing the memoryview either before (1D -> C-contig) or after (C-contig -> 1D) casting? http://bugs.python.org/review/10181/diff/3343/10409#newcode2577 Doc/library/stdtypes.rst:2577: This attribute should probably be "size" rather than "len", since it's much closer in spirit to sys.getsizeof() than it is to len(). http://bugs.python.org/review/10181/diff/3343/10409#newcode2615 Doc/library/stdtypes.rst:2615: I'm guessing the meaning here is that tolist() and tobytes() don't work with complex underlying formats? Better phrasing may be: ", but some methods (e.g. tolist()) will fail for anything other than native single element formats.) http://bugs.python.org/review/10181/diff/3343/10409#newcode2649 Doc/library/stdtypes.rst:2649: I don't understand this explanation (is it even possible to properly access the elements of PIL-style array with suboffsets from Python? Wouldn't each pointer reference need to be replaced with a separate memorview somewhere down in the C code?) http://bugs.python.org/review/10181/diff/3343/10414 File Lib/test/test_buffer.py (right): http://bugs.python.org/review/10181/diff/3343/10414#newcode13 Lib/test/test_buffer.py:13: A caveat on my review: I've only skimmed this new test file rather than reviewed it thoroughly. I still think this is a darn sight better than what we have in place now :) http://bugs.python.org/review/10181/diff/3343/10417 File Modules/_testbuffer.c (right): http://bugs.python.org/review/10181/diff/3343/10417#newcode3 Modules/_testbuffer.c:3: Same caveat as for the corresponding test (i.e. only skimmed, not reviewed thoroughly) http://bugs.python.org/review/10181/diff/3343/10419 File Objects/abstract.c (right): http://bugs.python.org/review/10181/diff/3343/10419#newcode654 Objects/abstract.c:654: if (view == NULL) return 0; /* XXX why not -1? */ That's a very good question - perhaps raise it specifically on the tracker item? http://bugs.python.org/review/10181/diff/3343/10420 File Objects/memoryobject.c (right): http://bugs.python.org/review/10181/diff/3343/10420#newcode104 Objects/memoryobject.c:104: is called from mbuf_clear() to break up a reference cycle. */ "can still" http://bugs.python.org/review/10181/diff/3343/10420#newcode346 Objects/memoryobject.c:346: mv_alloc(int ndim) The mv_alloc name meant I was looking for this to appear in a tp_alloc slot and was confused why it didn't start with "memory_" like the other memory view private functions. I suggest using the name "memory_alloc", but explicitly noting that it is called from memory_new (via the mbuf APIs) rather than via the tp_alloc slot. http://bugs.python.org/review/10181/diff/3343/10420#newcode508 Objects/memoryobject.c:508: "memoryview: object does not have the buffer interface"); This error message should report the type of "v" (it makes debugging a *lot* easier) http://bugs.python.org/review/10181/diff/3343/10420#newcode1086 Objects/memoryobject.c:1086: struct module is around 15x slower than the two functions below. */ Differences in the desired error messages may make this impossible, but it may be worth exploring (as a separate issue) whether or not *struct* could be updated to take advantage of these optimised cases. http://bugs.python.org/review/10181/diff/3343/10420#newcode1371 Objects/memoryobject.c:1371: return NULL; If PyMemoryView_GetContiguous is fixed, then this should become fairly straightforward (add var declarations and typecasts as necessary): mem = PyMemoryView_GetContiguous(self); contiguous_bytes = mem->view->obj; Py_INCREF(contiguous_bytes); Py_DECREF(mem); return contiguous_bytes; http://bugs.python.org/review/10181/diff/3343/10420#newcode1419 Objects/memoryobject.c:1419: { I suggest explicitly noting that this is implemented separately from memory_subscript in order to take advantage of the automatic iteration support provided by PySequence_Iter (and to avoid duplicating the PyNumber_Index logic already provided in the core) http://bugs.python.org/review/10181/diff/3343/10420#newcode1984 Objects/memoryobject.c:1984: XXX PyBUF_SHADOW has been removed: issue #10293 Any reason not to nuke it from the comment as well, then? http://bugs.python.org/review/10181/diff/3343/10420#newcode2038 Objects/memoryobject.c:2038: 'dest' is filled in but not used: 'mem' is returned unchanged. I suggest referencing #12834 here - I think if this is fixed, then it will be straightforward to fix tobytes() (as noted above). That also provides a vector for testing: creating non-contiguous arrays by slicing your new _testbuffer implementation, then checking that tobytes() on a memoryview of those arrays gives the right answer. http://bugs.python.org/review/10181/diff/3343/10420#newcode2063 Objects/memoryobject.c:2063: return (PyObject *)mem; With all of your other fixes in place, I'm fairly sure this now just needs the following change: Py_DECREF(mem); mem = (PyMemoryViewObject *) PyMemoryView_FromObject(dest); Py_DECREF(dest); return (PyObject *) mem; http://bugs.python.org/review/10181/diff/3343/10420#newcode2187 Objects/memoryobject.c:2187: {"len", (getter)memory_len_get, NULL, NULL}, As I noted in the docs, this is probably better exposed at the Python level as 'size'. Even though that differs from the C field name, it better maps to the general size = "number of bytes", len = "number of objects" convention.
Sign in to reply to this message.
Mostly just cosmetic comments on the docs and various comments, but there are two substantive points: 1. I think the "len" field in the Py_buffer struct should be exposed as a "size" attribute on the memoryview object to better match the sys.getsizeof() and len() terminology 2. I think memoryview.tobytes() and PyMemoryView_GetContiguous() can be fixed fairly easily (and fixing the former provides the means to test the latter) Otherwise looks great to me - huge improvements to both the documentation and the implementation.
Sign in to reply to this message.
Mostly just cosmetic comments on the docs and various comments, but there are two substantive points: 1. I think the "len" field in the Py_buffer struct should be exposed as a "size" attribute on the memoryview object to better match the sys.getsizeof() and len() terminology 2. I think memoryview.tobytes() and PyMemoryView_GetContiguous() can be fixed fairly easily (and fixing the former provides the means to test the latter) Otherwise looks great to me - huge improvements to both the documentation and the implementation.
Sign in to reply to this message.
My apologies for the review spam - Reitveld was reporting a traceback, so I didn't think it had actually accepted the review :(
Sign in to reply to this message.
http://bugs.python.org/review/10181/diff/3343/10406 File Doc/c-api/buffer.rst (right): http://bugs.python.org/review/10181/diff/3343/10406#newcode134 Doc/c-api/buffer.rst:134: can use :c:member:`~Py_buffer.itemsize` to navigate the buffer. Fixed. http://bugs.python.org/review/10181/diff/3343/10406#newcode230 Doc/c-api/buffer.rst:230: return a readable or writable buffer, but the choice MUST be consistent OK, changed. http://bugs.python.org/review/10181/diff/3343/10406#newcode319 Doc/c-api/buffer.rst:319: This also depends on ... http://mail.scipy.org/pipermail/numpy-discussion/2011-August/058189.html ... to which I didn't receive a conclusive answer. Synopsis: PyBUF_SIMPLE is used widely where PyBUF_CONTIG should have be used. So I suggested that an exporter must expose any C-contiguous array as a 1-D array with an implicit cast to 'B' (format field is NULL) in response to a PyBUF_SIMPLE request. Otherwise a *lot* of code will probably break. With those rules, PyBUF_SIMPLE doesn't really fit in nicely. http://bugs.python.org/review/10181/diff/3343/10406#newcode335 Doc/c-api/buffer.rst:335: n-dimensional array as follows: Fixed. http://bugs.python.org/review/10181/diff/3343/10406#newcode424 Doc/c-api/buffer.rst:424: Calls to :c:func:`PyObject_GetBuffer` and :c:func:`PyBuffer_Release` must Fixed. http://bugs.python.org/review/10181/diff/3343/10406#newcode426 Doc/c-api/buffer.rst:426: after the consumer is done with the buffer, :c:func:`PyBuffer_Release` must Fixed. http://bugs.python.org/review/10181/diff/3343/10406#newcode432 Doc/c-api/buffer.rst:432: Release the buffer *view* and decrement :c:member:`view->obj`. This function Fixed. http://bugs.python.org/review/10181/diff/3343/10406#newcode447 Doc/c-api/buffer.rst:447: NumPy uses "order", which I prefer slightly. Would you be happy with that, too? http://bugs.python.org/review/10181/diff/3343/10406#newcode463 Doc/c-api/buffer.rst:463: with writability set according to *readonly*. *buf* is interpreted a a sequence Fixed. http://bugs.python.org/review/10181/diff/3343/10408 File Doc/c-api/typeobj.rst (right): http://bugs.python.org/review/10181/diff/3343/10408#newcode1240 Doc/c-api/typeobj.rst:1240: :c:func:`PyBuffer_FillInfo` provides an easy way of exposing a simple I added this to the releasebufferproc section: "The exporter MUST use the internal field to keep track of buffer-specific resources (if present). This field is guaranteed to remain constant, while a consumer MAY pass a copy of the original buffer as the *view* argument." http://bugs.python.org/review/10181/diff/3343/10409 File Doc/library/stdtypes.rst (right): http://bugs.python.org/review/10181/diff/3343/10409#newcode2348 Doc/library/stdtypes.rst:2348: :class:`bytes`, :class:`bytearray` and :class:`array.array`. Ok, I reverted this: two examples are enough anyway. http://bugs.python.org/review/10181/diff/3343/10409#newcode2365 Doc/library/stdtypes.rst:2365: from the :mod:`struct` module, index will return a single element Fixed. http://bugs.python.org/review/10181/diff/3343/10409#newcode2411 Doc/library/stdtypes.rst:2411: ValueError: memoryview assignment: lvalue and rvalue have different structures lvalue emphasizes that something is written to. Like in c-api/bytes.rst: "Pass the address of an existing bytes object as an lvalue (it may be written into)" That said, I'd also be happy with LHS/RHS. http://bugs.python.org/review/10181/diff/3343/10409#newcode2479 Doc/library/stdtypes.rst:2479: Fixed: cast(format[, shape]) "... shape defaults to [byte_length//new_itemsize]..." If someone calls cast("B", shape=None), is it OK to raise (None is not a list), or is there a convention that None must always be interpreted as the default?
Sign in to reply to this message.
I fixed the first batch of documentation issues.
Sign in to reply to this message.
http://bugs.python.org/review/10181/diff/3343/10406 File Doc/c-api/buffer.rst (right): http://bugs.python.org/review/10181/diff/3343/10406#newcode447 Doc/c-api/buffer.rst:447: I changed it to "order". I think it's best not to diverge from NumPy if possible. http://bugs.python.org/review/10181/diff/3343/10409 File Doc/library/stdtypes.rst (right): http://bugs.python.org/review/10181/diff/3343/10409#newcode2485 Doc/library/stdtypes.rst:2485: Hmm, I think this is the sort of thing that a user might figure out by himself. Also, I'm a bit worried that the paragraph could get too long. http://bugs.python.org/review/10181/diff/3343/10409#newcode2577 Doc/library/stdtypes.rst:2577: Changed to 'nbytes'. http://bugs.python.org/review/10181/diff/3343/10409#newcode2615 Doc/library/stdtypes.rst:2615: Done here: http://hg.python.org/features/pep-3118/rev/ae595128415c http://bugs.python.org/review/10181/diff/3343/10409#newcode2649 Doc/library/stdtypes.rst:2649: This was an attempt to condense the "Slicing PIL arrays" comment from _testbuffer.c. You can't actually use those values from Python, so I marked them as informational now: http://hg.python.org/features/pep-3118/rev/8dd9f0ea4876 http://bugs.python.org/review/10181/diff/3343/10419 File Objects/abstract.c (right): http://bugs.python.org/review/10181/diff/3343/10419#newcode654 Objects/abstract.c:654: if (view == NULL) return 0; /* XXX why not -1? */ I opened #13860 for this. http://bugs.python.org/review/10181/diff/3343/10420 File Objects/memoryobject.c (right): http://bugs.python.org/review/10181/diff/3343/10420#newcode104 Objects/memoryobject.c:104: is called from mbuf_clear() to break up a reference cycle. */ Fixed. http://bugs.python.org/review/10181/diff/3343/10420#newcode346 Objects/memoryobject.c:346: mv_alloc(int ndim) Fixed. http://bugs.python.org/review/10181/diff/3343/10420#newcode508 Objects/memoryobject.c:508: "memoryview: object does not have the buffer interface"); Fixed. http://bugs.python.org/review/10181/diff/3343/10420#newcode1086 Objects/memoryobject.c:1086: struct module is around 15x slower than the two functions below. */ I'll keep that in mind for later. http://bugs.python.org/review/10181/diff/3343/10420#newcode1371 Objects/memoryobject.c:1371: return NULL; Done in http://hg.python.org/features/pep-3118/rev/7af5316ac5fd . http://bugs.python.org/review/10181/diff/3343/10420#newcode1419 Objects/memoryobject.c:1419: { I added a new comment as part of http://hg.python.org/features/pep-3118/rev/da065680a63a . http://bugs.python.org/review/10181/diff/3343/10420#newcode1984 Objects/memoryobject.c:1984: XXX PyBUF_SHADOW has been removed: issue #10293 I rewrote the comment, since also locks have disappeared from the PEP long ago. http://bugs.python.org/review/10181/diff/3343/10420#newcode2063 Objects/memoryobject.c:2063: return (PyObject *)mem; Fixing the function required a lot of changes. Done in http://hg.python.org/features/pep-3118/rev/7af5316ac5fd . http://bugs.python.org/review/10181/diff/3343/10420#newcode2187 Objects/memoryobject.c:2187: {"len", (getter)memory_len_get, NULL, NULL}, Exposed as 'nbytes' now.
Sign in to reply to this message.
I think I've addressed all of Nick's comments. In particular, tobytes() and PyMemoryView_GetContiguous() are fixed and I added the missing hash function (also generalized for the multi-dimensional case).
Sign in to reply to this message.
Aside from a couple of very minor style nits and one questionable assert, the latest version looks good to me. (Also built and ran just fine after I pulled it into my sandbox repo). So +1 from me for pushing this to the main repo. http://bugs.python.org/review/10181/diff/4057/13587 File Doc/library/stdtypes.rst (right): http://bugs.python.org/review/10181/diff/4057/13587#newcode20 Doc/library/stdtypes.rst:20: the collection instance itself but ``None``. Heh, the perils of reviewing the patch set deltas: Reitveld gets confused by the changes you've merged in from python.org :) I'll be assuming the non-memoryview related "changes" I'm seeing in this file between patch set 7 and patch set 8 are due to that. http://bugs.python.org/review/10181/diff/4057/13587#newcode2467 Doc/library/stdtypes.rst:2467: >>> from _testbuffer import * "from _testbuffer import ndarray" would be preferable here http://bugs.python.org/review/10181/diff/4057/13599 File Objects/memoryobject.c (right): http://bugs.python.org/review/10181/diff/4057/13599#newcode662 Objects/memoryobject.c:662: assert(src->ndim <= Py_MEMORYVIEW_MAX_NDIM); I'm confused by this line. The function comment says src->ndim isn't used, but the assert checks it anyway. If it isn't used, why do we care what the value is? Was this meant to be ensuring that the comment about the 'ndim' was enforced in debug builds? http://bugs.python.org/review/10181/diff/4057/13599#newcode762 Objects/memoryobject.c:762: v->ob_type->tp_name); There's a macro for this that's the preferred form these days. "Py_TYPE(v)->tp_name", IIRC.
Sign in to reply to this message.
http://bugs.python.org/review/10181/diff/4057/13587 File Doc/library/stdtypes.rst (right): http://bugs.python.org/review/10181/diff/4057/13587#newcode2467 Doc/library/stdtypes.rst:2467: >>> from _testbuffer import * Do we really want to expose _testbuffer in the documentation? Why not simply remove the example? http://bugs.python.org/review/10181/diff/4057/13587#newcode2643 Doc/library/stdtypes.rst:2643: the array would use in a contiguous representation. It is not neccesarily Perhaps add that it's equal to len(m.tobytes())? http://bugs.python.org/review/10181/diff/4057/13589 File Include/memoryobject.h (right): http://bugs.python.org/review/10181/diff/4057/13589#newcode78 Include/memoryobject.h:78: } PyManagedBufferObject; Is this a public API? I think it'd better be private (i.e. have it and related APIs prefixed with an underscore). http://bugs.python.org/review/10181/diff/4057/13589#newcode81 Include/memoryobject.h:81: #define Py_MEMORYVIEW_MAX_NDIM 64 Same here and other Py_MEMORYVIEW_* constants.
Sign in to reply to this message.
Thanks for the additional review. I've added some answers and also two questions. http://bugs.python.org/review/10181/diff/4057/13586 File Doc/c-api/typeobj.rst (right): http://bugs.python.org/review/10181/diff/4057/13586#newcode1263 Doc/c-api/typeobj.rst:1263: as the *view* argument. We might need to word this more carefully: The consensus somewhere in the middle of the discussion was that the *address* of the view must be the same, but the pointers *except* for view->internal might have changed. I worded it like this because it's in fact sufficient that view->internal stays the same (I'm using that in _testbuffer.c and it's actually a good scheme), but after re-reading the discussion I'm not sure if everyone agrees. http://bugs.python.org/review/10181/diff/4057/13587 File Doc/library/stdtypes.rst (right): http://bugs.python.org/review/10181/diff/4057/13587#newcode2467 Doc/library/stdtypes.rst:2467: >>> from _testbuffer import * That's a good point, because people might start using _testbuffer. There are a couple of reasons why I did this: 1) It's easier to understand many examples if a multi-dimensional list is printed. 2) If cast() is used in the examples, the danger is that people think that cast() is the normal way of creating multi-dimensional arrays. I think that cast() should be used with care and only when one has a good grasp on how the buffers will be accessed by other objects. 1) could be addressed by simply adding the ~200 lines from _testbuffer that add complete multi-dimensional support for tolist() and slicing, if you can stomach more diffs for this issue. :) richcompare() would need a recursive cmp function as well... For 2), I really only know of NumPy or _testbuffer() as a way of creating regular multi-dimensional arrays for demo purposes. http://bugs.python.org/review/10181/diff/4057/13587#newcode2643 Doc/library/stdtypes.rst:2643: the array would use in a contiguous representation. It is not neccesarily Sounds good. http://bugs.python.org/review/10181/diff/4057/13589 File Include/memoryobject.h (right): http://bugs.python.org/review/10181/diff/4057/13589#newcode60 Include/memoryobject.h:60: #ifndef Py_LIMITED_API Is there still a reason now to exclude these from the limited API? http://bugs.python.org/review/10181/diff/4057/13589#newcode78 Include/memoryobject.h:78: } PyManagedBufferObject; Yes, this should definitely be private. I'll change that. http://bugs.python.org/review/10181/diff/4057/13589#newcode81 Include/memoryobject.h:81: #define Py_MEMORYVIEW_MAX_NDIM 64 Actually I think we should limit NDIM officially as part of the buffer API, e.g. PyBUF_MAX_NDIM. IIRC NumPy has a max of 32. 64 dimensions with a shape of 2 in each dimension and an itemsize of 1 would require 2**64 bytes, so I don't think this will be a problem. :) About the other flags: Py_MEMORYVIEW_C etc. are somewhat useful. Once they are set, you can avoid calling the relatively expensive PyBuffer_IsContiguous(). Perhaps even MV_C_CONTIGUOUS() etc. should be public (under different names)? http://bugs.python.org/review/10181/diff/4057/13599 File Objects/memoryobject.c (right): http://bugs.python.org/review/10181/diff/4057/13599#newcode662 Objects/memoryobject.c:662: assert(src->ndim <= Py_MEMORYVIEW_MAX_NDIM); Good catch. This should really be 'ndim' in the assert. The assert is technically not needed, but I left it in case someone uses mbuf_add_incomplete_view() in other places and forgets to pass a validated 'ndim'. http://bugs.python.org/review/10181/diff/4057/13599#newcode762 Objects/memoryobject.c:762: v->ob_type->tp_name); OK, that's nicer.
Sign in to reply to this message.
http://bugs.python.org/review/10181/diff/4057/13589 File Include/memoryobject.h (right): http://bugs.python.org/review/10181/diff/4057/13589#newcode78 Include/memoryobject.h:78: } PyManagedBufferObject; Done in: http://hg.python.org/features/pep-3118/rev/6608872a2519 http://bugs.python.org/review/10181/diff/4057/13589#newcode81 Include/memoryobject.h:81: #define Py_MEMORYVIEW_MAX_NDIM 64 Introduced PyBUF_MAX_NDIM=64: http://hg.python.org/features/pep-3118/rev/1705a1f69532 http://bugs.python.org/review/10181/diff/4057/13599 File Objects/memoryobject.c (right): http://bugs.python.org/review/10181/diff/4057/13599#newcode662 Objects/memoryobject.c:662: assert(src->ndim <= Py_MEMORYVIEW_MAX_NDIM); Done in: http://hg.python.org/features/pep-3118/rev/3b82f529c397 http://bugs.python.org/review/10181/diff/4057/13599#newcode762 Objects/memoryobject.c:762: v->ob_type->tp_name); Done in: http://hg.python.org/features/pep-3118/rev/a3d917d67351
Sign in to reply to this message.
http://bugs.python.org/review/10181/diff/4057/13587 File Doc/library/stdtypes.rst (right): http://bugs.python.org/review/10181/diff/4057/13587#newcode2643 Doc/library/stdtypes.rst:2643: the array would use in a contiguous representation. It is not neccesarily Done in: http://hg.python.org/features/pep-3118/rev/52251eca42fa
Sign in to reply to this message.
I didn't see anything significant this time around - I say go for it :) http://bugs.python.org/review/10181/diff/4057/13586 File Doc/c-api/typeobj.rst (right): http://bugs.python.org/review/10181/diff/4057/13586#newcode1263 Doc/c-api/typeobj.rst:1263: as the *view* argument. I was the original proponent of the "address must stay the same" point of view, but I was persuaded that keying off the internal pointer (as you now advise) is the better approach. http://bugs.python.org/review/10181/diff/4095/13888 File Doc/library/stdtypes.rst (right): http://bugs.python.org/review/10181/diff/4095/13888#newcode2472 Doc/library/stdtypes.rst:2472: >>> m2.tolist() Very minor style nit - you switched the later examples to x/y, but still use m1/m2 here.
Sign in to reply to this message.
|