Message 128537 - Python tracker


Author	ncoghlan
Recipients	kermode, loewis, mark.dickinson, ncoghlan, pitrou, pv, rupole, teoliphant
Date	2011-02-14.11:11:53
Message-id	<AANLkTimc8H8Ayiosr+x0KddMv-=enR+zQnHDxZyguOkg@mail.gmail.com>
In-reply-to	<1297636420.3802.72.camel@localhost.localdomain>

Content

On Mon, Feb 14, 2011 at 8:33 AM, Antoine Pitrou <report@bugs.python.org> wrote:
>> I'm still not comfortable with a convention that relies on *clients*
>> of the PEP 3118 API not mucking with the internals of the Py_buffer
>> struct.
>
> Which clients? Those who export the buffer, or those who consume it?

Consumers. (I'll try to stick to provider/consumer terminology, as
that's clearer in this context.)

>> I'm *much* happier with the rule based on malloc/free semantics where
>> the *pointer* passed to PyObject_GetBuffer must match a single later
>> call to PyObject_ReleaseBuffer.
>
> Agreed that Py_buffer should have been a PyObject from the start, but
> the PEP chose differently.

malloc/free modelled semantics have *nothing* to do with Py_buffer
being a full PyObject in its own right. All I mean is that whatever
pointer you call ReleaseBuffer with should be the one you passed to
GetBuffer, and the only thing tp_releasebuffer implementations should
rely on is the address of that pointer rather than the struct
contents. However, from what Pauli has said, we may want to go with
the alternative approach of saying the struct address is irrelevant,
and only the contents matter, using the "internal" field to
disambiguate different exported buffers. I believe either will work,
and either places additional constraints on buffer API consumers that
aren't currently clearly documented.

> We now have backwards compatibility constraints. What do we do with
> PyMemoryView_FromBuffer? Also, there's probably some code out there that
> likes to copy Py_buffers around.

Such code is likely to be broken regardless of how we clarify the
semantics, in the same way that our own dup_buffer is currently broken
under either set of semantics (i.e. calling ReleaseBuffer with a
different address in one case, clobbering the "internal" field in
other cases). We will probably need to expose an official Py_buffer
copying function that gets all the subtle details right so that
extension authors can more easily avoid making the same mistakes.
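
The two candidate release rules, and the way a naive struct copy trips
over each of them, can be modelled in pure Python. This is only a toy
sketch, not the C API: FakePyBuffer, AddressKeyedProvider,
ContentKeyedProvider and release_fails are hypothetical names invented
for illustration, and id() stands in for the struct's address.

```python
import copy


class FakePyBuffer:
    """Toy stand-in for the Py_buffer struct (not the real C layout)."""

    def __init__(self):
        self.buf = None
        self.internal = None


class AddressKeyedProvider:
    """Rule 1: only the identity ("address") of the struct matters."""

    def __init__(self, data):
        self.data = data
        self.exports = {}

    def get_buffer(self, view):
        view.buf = self.data
        # Holding a reference keeps id(view) stable while it is exported.
        self.exports[id(view)] = view

    def release_buffer(self, view):
        if id(view) not in self.exports:
            raise BufferError("released with a struct that was never filled in")
        del self.exports[id(view)]


class ContentKeyedProvider:
    """Rule 2: only the contents matter; "internal" carries a token."""

    def __init__(self, data):
        self.data = data
        self.exports = set()
        self.next_token = 0

    def get_buffer(self, view):
        view.buf = self.data
        view.internal = self.next_token
        self.exports.add(self.next_token)
        self.next_token += 1

    def release_buffer(self, view):
        if view.internal not in self.exports:
            raise BufferError("unknown or clobbered 'internal' token")
        self.exports.remove(view.internal)


def release_fails(provider, view):
    """Return True if releasing this struct is rejected by the provider."""
    try:
        provider.release_buffer(view)
    except BufferError:
        return True
    return False


# Releasing through a copy of the struct: rejected under rule 1 ...
p1, v1 = AddressKeyedProvider(b"abc"), FakePyBuffer()
p1.get_buffer(v1)
copy_breaks_address_rule = release_fails(p1, copy.copy(v1))

# ... but accepted under rule 2, because the token was copied along.
p2, v2 = ContentKeyedProvider(b"abc"), FakePyBuffer()
p2.get_buffer(v2)
copy_breaks_content_rule = release_fails(p2, copy.copy(v2))

# Overwriting "internal" before release: rejected under rule 2.
p3, v3 = ContentKeyedProvider(b"abc"), FakePyBuffer()
p3.get_buffer(v3)
v3.internal = None
clobber_breaks_content_rule = release_fails(p3, v3)

print(copy_breaks_address_rule, copy_breaks_content_rule,
      clobber_breaks_content_rule)
```

In this model each rule is self-consistent, but a consumer that both
copies the struct and overwrites "internal" cannot release cleanly under
either one, which is why an official copying helper would be useful.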

>> As far as the question of re-exporting the underlying view or not
>> goes, I agree having "memoryview(a)" potentially refer to different
>> underlying memory from "a" itself (because the source object has
>> changed since the first view was exported) is a recipe for confusion.
>
> If an object changes its buffer while it's exported somewhere, it will
> always result in confusion for the user, regardless of how the
> memoryview object is implemented. All normal uses of the buffer API
> assume that the buffer's memory doesn't change while it's being accessed
> by its consumer (what would it mean to SHA1-hash or zlib-compress a
> changing piece of memory?).
> So I don't know why the memoryview object *in particular* should care
> about this.

I'm not talking about an exported view changing its details (that's
explicitly disallowed by the PEP); I'm talking about the fact that
sequential calls to PyObject_GetBuffer are permitted to return
different answers. That's the point Pauli's PictureSet example
illustrated - even though the toy example uses a somewhat clumsy API,
it's perfectly legitimate according to the documentation, and it shows
that the current memoryview implementation may behave strangely when
you copy or slice a view of a mutable object, even though the view
itself is guaranteed to remain valid.

Consider the following:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
BufferError: Existing exports of data: object cannot be re-sized
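
The statements that produced this traceback are not preserved in the
archived message. A minimal sketch that raises the same kind of
BufferError on current CPython, assuming the operation was an attempt
to empty a bytearray while a memoryview of it was still exported (the
names a and m are illustrative, and memoryview.release() needs
Python 3.2 or later):

```python
a = bytearray(b"populated")
m = memoryview(a)      # a now has one live buffer export

try:
    del a[:]           # emptying the bytearray would have to resize it
    error = None
except BufferError as exc:
    error = str(exc)

print(error)

m.release()            # drop the export
del a[:]               # the same resize is now allowed
print(len(a))
```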

Now suppose that instead of disallowing the resize, bytearray (or a
similar object) permitted it by allocating a new memory buffer, while
keeping a reference to the old buffer around until the memoryview
releases it (an approach that is perfectly legitimate according to the
PEP). In that case, our current "use the source object" approach to
memoryview copying and slicing will backfire badly, since copies and
slices will be working off the *new* (empty) state of the object,
while the original memoryview will still be looking at the old
populated state. I think Pauli's right: we need to make memoryview
re-exporting significantly smarter in order to cope correctly with
mutable objects.
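
The failure mode described above can be imitated in pure Python. This
is a rough model under stated assumptions, not how bytearray or
memoryview actually behave: SwappingProvider is a hypothetical object
whose get_buffer() method stands in for PyObject_GetBuffer, and whose
clear() swaps in fresh storage instead of raising BufferError.

```python
class SwappingProvider:
    """Hypothetical provider that resizes by swapping in new storage."""

    def __init__(self, data):
        self._storage = bytearray(data)

    def get_buffer(self):
        # Each call exports whatever storage is current at that moment.
        return memoryview(self._storage)

    def clear(self):
        # The old storage stays alive for as long as earlier views
        # still reference it; new exports see the empty storage.
        self._storage = bytearray()


src = SwappingProvider(b"populated")
view = src.get_buffer()           # original export

src.clear()                       # permitted "resize" via reallocation

copy_via_source = src.get_buffer()   # "use the source object" approach
copy_via_view = memoryview(view)     # re-export from the existing view

print(bytes(view))
print(bytes(copy_via_source))
print(bytes(copy_via_view))
```

Going back to the source object yields an empty buffer, while the
original view and the copy taken from that view still see the old
populated memory.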

History
Date	User	Action	Args
2011-02-14 11:11:54	ncoghlan	set	recipients: + ncoghlan, loewis, teoliphant, mark.dickinson, rupole, kermode, pitrou, pv
2011-02-14 11:11:53	ncoghlan	link	issue10181 messages
2011-02-14 11:11:53	ncoghlan	create