
Author ncoghlan
Recipients jcon, kermode, mark.dickinson, ncoghlan, petri.lehtinen, pitrou, pv, rupole, skrah, teoliphant
Date 2011-06-27.13:17:55
Message-id <1309180676.88.0.541176207355.issue10181@psf.upfronthosting.co.za>
I'll try to do a summary of the conversation so far, since it's quite long and hard to follow.

The basic issue is that memoryview needs to support copying and slicing operations that create new memoryview objects. The major problem with that is that the PEP 3118 semantics, as implemented, mean that neither copying the Py_buffer struct *nor* requesting a new copy of the struct from the underlying object will do the right thing in all cases. (According to the PEP *as written*, copying probably should have been OK, but the implementation departs from the PEP in several important respects, so copying is definitely wrong unless the lifecycles of the copies are tightly controlled relative to the original.)

Therefore, we need either to redesign the buffer export from memoryview to use daisy chaining (so that in "m = memoryview(obj); m2 = m[:]; m3 = m2[:]", m3 references m2, which references m, which in turn references obj), or to introduce an internal reference-counted object (PyManagedBuffer) that allows a single view of an underlying object to be safely shared amongst multiple clients (so that m, m2 and m3 would all reference the same managed buffer instance, which holds the reference to obj). I strongly prefer the latter approach: it prevents unbounded and wasteful daisy chaining, and it provides a clean, easy-to-use interface that will make it easier for 3rd parties to write PEP 3118 API consumers (by using PyManagedBuffer instead of the raw Py_buffer struct).

Once that basic lifecycle problem for the underlying buffers is dealt with, we can start worrying about other problems, such as correctly exporting Py_buffer structs from memoryview instances. The lifecycle problem is unrelated to the details of the buffer *contents*, though: it is entirely about the fact that clients can't safely copy all those pointers (as some may refer to addresses inside the struct itself), and asking the original object for a fresh copy is permitted to return a different answer each time.

The actual *slicing* code in memoryview isn't too bad: it just needs to use dedicated storage rather than modifying the contents of the Py_buffer struct it received from the underlying object. Probably the easiest way to handle that is to hold the PyManagedBuffer reference *in addition* to the current Py_buffer struct in the internal state; the latter can then record the effects of any slicing. Because we know the original Py_buffer struct is guaranteed to remain alive and unmodified, we don't need to worry about fiddling with any copied pointers: we can just leave them pointing into the original structure.

When accessed via the PEP 3118 API, memoryview objects would then export that modified Py_buffer struct rather than the original one. Daisy chaining would therefore still be possible, but we wouldn't make it easy to do from pure Python code, as both the memoryview constructor and slicing would give each new memoryview object a reference to the original managed buffer and just update the internal view details as appropriate.

Here's the current MemoryView definition:

typedef struct {
    PyObject_HEAD
    Py_buffer view;
} PyMemoryViewObject;

The TL;DR version of the above is that I would like to see it become:

typedef struct {
    PyObject_HEAD
    PyManagedBuffer *source_data; // shared, read-only Py_buffer access
    Py_buffer view;  // shape, strides, etc. potentially modified
} PyMemoryViewObject;

Once the internal Py_buffer has been initialised, the memoryview code wouldn't actually *use* the source data reference much (aside from eventually releasing the buffer, it wouldn't use it at all). Instead, that reference would be retained solely to control the lifecycle of the original Py_buffer struct relative to the modified copies in the various memoryview instances.

Does all that make my perspective any clearer?