classification
Title: Add context manager protocol to memoryviews
Type: enhancement Stage: resolved
Components: Interpreter Core Versions: Python 3.2
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: gvanrossum, ncoghlan, pitrou
Priority: normal Keywords: patch

Created on 2010-09-03 15:11 by pitrou, last changed 2010-09-09 13:00 by pitrou. This issue is now closed.

Files
File name Uploaded Description Edit
memcontext.patch pitrou, 2010-09-03 15:11
memcontext2.patch pitrou, 2010-09-08 19:14
Messages (12)
msg115459 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-09-03 15:11
This adds the context manager protocol to memoryview objects, as proposed on python-dev. Once the `with` block has finished, the underlying buffer is released and any operation on the memoryview raises a ValueError.
msg115477 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-09-03 17:49
Guido expressed skepticism at the idea:
http://mail.python.org/pipermail/python-dev/2010-September/103442.html
msg115759 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2010-09-07 13:41
I closed 9789 as a duplicate of this. Bringing the details from that issue over here:
====================================
memoryview objects currently offer no way to explicitly release the underlying buffer.

This may cause problems for mutable objects that are locked while PEP 3118 buffer references remain unreleased (e.g. in 3.2, io.BytesIO supports getbuffer() for direct access to the underlying memory, but disallows resizing until the associated memoryview goes away).

This isn't too bad in CPython due to explicit refcounting, but may be an issue when using other implementations since the time between release of the last reference and actual garbage collection is indeterminate. For example, the new test_getbuffer in the BytesIOMixin class in the test suite can't rely on "del buf" promptly releasing the underlying PEP 3118 buffer, so it is forced to also invoke a full GC collection cycle in order to be portable to other implementations.

So there are two separate questions here:
1. Whether or not to add an explicit "release()" method to memoryview objects (this would be sufficient to address the problem)
2. Whether or not to add direct context management support to memoryview objects (this isn't really necessary, since a short context manager can do the same thing, but may be a nice convenience)

Guido was -0 on the idea of supporting the context management protocol, but the rationale presented to him at the time was lacking the key concept of behavioural changes in the object owning the buffer based on whether or not there were any outstanding buffer references.
msg115782 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2010-09-07 15:46
Given this explanation, of course I am +1 on an explicit release() method.  But I'm still skeptical that a context manager adds much (not sure if that counts as -0 or +0 :-).

I suppose after release() is called all accesses through the memoryview object should be invalid, right?  Is this not covered by PEP 3118 at all?
msg115785 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-09-07 15:56
> Given this explanation, of course I am +1 on an explicit release()
> method.  But I'm still skeptical that a context manager adds much (not
> sure if that counts as -0 or +0 :-).

Ok, release() is probably enough.

> I suppose after release() is called all accesses through the
> memoryview object should be invalid, right?

Indeed. The patch tests for that (it uses "with" to release the
memoryview, but it wouldn't be hard to change it for a release()
method).

> Is this not covered by PEP 3118 at all?

The PEP says “this memory view object holds on to the memory of base
[i.e. the object the buffer was acquired from] until it is deleted”.
Apparently issues pertaining to delayed garbage collection weren't
raised at the time.
msg115810 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2010-09-07 22:03
>> Is this not covered by PEP 3118 at all?
>
> The PEP says “this memory view object holds on to the memory of base
> [i.e. the object the buffer was acquired from] until it is deleted”.
> Apparently issues pertaining to delayed garbage collection weren't
> raised at the time.

As with a few(!) other things in relation to this PEP, the primary
consumers were most interested in the C API side of things, so we
collectively missed relevant details on the Python side.

+1 on adding release() (obviously), and +1 on direct support for
context management (it seems very analogous to file.close to me, so
direct support makes more sense than leaving it out).
msg115890 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-09-08 19:14
Here is a new patch adding the release() method as well.
msg115903 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2010-09-08 21:22
On an eyeball review, the structure of do_release seems a little questionable to me. I'd be happier if view.obj and view.buf were copied to C locals and then set to NULL at the start of the function before any real work is done.

I believe the buffer release process can trigger __del__ methods (and possibly other Python code), which may currently see the memoryview in a slightly inconsistent state when reference cycles are involved. Setting these two references to NULL early should keep that from happening (the idea of using C locals to avoid this is the same concept as is employed by the Py_CLEAR macro)

For the test code, I suggest including:
- a test where release() is called inside the cm to check that __exit__ does not raise ValueError.
- check getbuffer() correctly raises ValueError
- tests that the other properties correctly raise ValueError (ndim, shape, strides, format, itemsize, suboffsets)

(For the properties, perhaps using a list of strings, a loop and getattr rather than writing each one out separately)

_check_released should perhaps also take a second argument that is used for the last two NotEqual checks. That is:

def _check_released(self, m, tp):
  ...
  self.assertNotEqual(m, memoryview(tp()))
  self.assertNotEqual(m, tp())

And then pass in tp as well as m for the respective tests.

There's also a minor typo in a comment in the tests: "called at second" should be "called a second".

Other than that, looks good to me.
msg115904 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2010-09-08 21:24
One other question: should IS_RELEASED use "||" rather than "&&"?

Is there any case where either of those pointers can be NULL and we still want to continue on rather than bailing out with a ValueError?
msg115913 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-09-08 22:48
> On an eyeball review, the structure of do_release seems a little
> questionable to me. I'd be happier if view.obj and view.buf were
> copied to C locals and then set to NULL at the start of the function
> before any real work is done.

You can't do that, since PyBuffer_Release() needs the information (at
least `obj`, and then the underlying object might rely on `buf` in its
bf_releasebuffer method).

(note, this code is simply factored out from tp_dealloc)

> For the test code, I suggest including:
> - a test where release() is called inside the cm to check that
> __exit__ does not raise ValueError.
> - check getbuffer() correctly raises ValueError
> - tests that the other properties correctly raise ValueError (ndim,
> shape, strides, format, itemsize, suboffsets)

Ok. I was bit too lazy and didn't include all the API.

> (For the properties, perhaps using a list of strings, a loop and
> getattr rather than writing each one out separately)

It's more clever but less practical if there's a failure: you can't know
up front which property failed the test.

> _check_released should perhaps also take a second argument that is
> used for the last two NotEqual checks. That is:

Hmm, why not.

> There's also a minor typo in a comment in the tests: "called at
> second" should be "called a second".

Ah, thanks.

> One other question: should IS_RELEASED use "||" rather than "&&"?
> 
> Is there any case where either of those pointers can be NULL and we
> still want to continue on rather than bailing out with a ValueError?

Well, view.obj can be NULL in case the buffer has been filled by hand by
some C code which simply wants to export a lazy view to some chunk of
memory (the IO lib uses this).

I could simply test for view.buf. I was thinking that maybe NULL could
be a proper data pointer on some platforms, but that would probably be
crazy (since using NULL as "non initialized" or "failed doing something"
is so common in C code).
msg115941 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2010-09-09 12:32
Sounds good - I'd say just commit whenever you're happy with it then.
msg115945 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-09-09 13:00
Committed in r84649. Thanks for the comments!
History
Date User Action Args
2010-09-09 13:00:03pitrousetstatus: open -> closed
resolution: fixed
messages: + msg115945

stage: patch review -> resolved
2010-09-09 12:32:44ncoghlansetmessages: + msg115941
2010-09-08 22:49:00pitrousetmessages: + msg115913
2010-09-08 21:24:38ncoghlansetmessages: + msg115904
2010-09-08 21:22:35ncoghlansetmessages: + msg115903
2010-09-08 19:14:12pitrousetfiles: + memcontext2.patch

messages: + msg115890
2010-09-07 22:03:43ncoghlansetmessages: + msg115810
2010-09-07 15:56:25pitrousetmessages: + msg115785
2010-09-07 15:46:55gvanrossumsetnosy: + gvanrossum
messages: + msg115782
2010-09-07 13:41:13ncoghlansetnosy: + ncoghlan
messages: + msg115759
2010-09-07 13:37:30ncoghlanlinkissue9789 superseder
2010-09-03 17:49:46pitrousetmessages: + msg115477
2010-09-03 15:11:02pitroucreate