msg170183 - (view) |
Author: Richard Oudkerk (sbt) * |
Date: 2012-09-10 12:23 |
Currently rawiobase_read() reads to a bytearray object and then copies the data to a bytes object.
There is a TODO comment saying that the bytes object should be created directly. The attached patch does that.
|
msg170226 - (view) |
Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * |
Date: 2012-09-10 20:01 |
It works as along as the bytes object cannot leak to Python code, (imagine a custom readinto() method which plays with gc.get_referrers, then calls hash(b)...)
This is OK with this patch.
|
msg170624 - (view) |
Author: Richard Oudkerk (sbt) * |
Date: 2012-09-17 20:17 |
New patch which checks the refcount of the memoryview and bytes object after calling readinto().
If either refcount is larger than the expected value of 1, then the data is copied rather than resized.
|
msg170627 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2012-09-17 22:30 |
> If either refcount is larger than the expected value of 1, then the
> data is copied rather than resized.
I think that's a useless precaution. The bytes object cannot "leak" since you are using PyMemoryView_FromMemory(), which doesn't know about the original object.
Out of curiousity, have you done any benchmarks?
|
msg170628 - (view) |
Author: Richard Oudkerk (sbt) * |
Date: 2012-09-17 23:04 |
> I think that's a useless precaution. The bytes object cannot "leak"
> since you are using PyMemoryView_FromMemory(), which doesn't know about
> the original object.
The bytes object cannot "leak" so, as you say, checking that refcount is pointless. But the view might "leak", and since it does not own a reference to the base object we have a problem: we can't deallocate the bytes object for fear of breaking the view.
It looks like objects returned by PyMemoryView_FromMemory() must never be allowed to "leak", so I am not sure there are many circumstances in which PyMemoryView_FromMemory() is safe to use.
Perhaps using PyBuffer_FillInfo() and PyMemory_FromBuffer() would keep alive the bytes object while the view is alive, without letting the bytes object "leak".
> Out of curiousity, have you done any benchmarks?
No.
|
msg170629 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2012-09-17 23:14 |
> The bytes object cannot "leak" so, as you say, checking that refcount
> is pointless. But the view might "leak", and since it does not own a
> reference to the base object we have a problem: we can't deallocate the
> bytes object for fear of breaking the view.
Indeed, that's a problem (but your patch does deallocate the bytes object).
It's quite fishy, I'm not sure how to solve the issue cleanly. Stefan, do you have an idea?
|
msg170638 - (view) |
Author: Stefan Krah (skrah) * |
Date: 2012-09-18 08:33 |
So the problem is that readinto(view) might result in several references
to view? I don't think that can be solved on the memoryview side.
One could do:
view = PyMemoryView_FromObject(b);
// Lie about writability
((PyMemoryViewObject *)view)->view.readonly = 0;
[...]
Then the view owns a reference to the bytes object. But that does not
solve the problem that writable memoryviews based on a readonly object
might be hanging around.
|
msg170639 - (view) |
Author: Richard Oudkerk (sbt) * |
Date: 2012-09-18 10:47 |
> Then the view owns a reference to the bytes object. But that does not
> solve the problem that writable memoryviews based on a readonly object
> might be hanging around.
How about doing
PyObject_GetBuffer(b, &buf, PyBUF_WRITABLE);
view = PyMemoryView_FromBuffer(&buf);
// readinto view
PyBuffer_Release(&buf);
Would attempts to access a "leaked" reference to view now result in ValueError("operation forbidden on released memoryview object")? If so then I think this would be safe.
|
msg170642 - (view) |
Author: Richard Oudkerk (sbt) * |
Date: 2012-09-18 12:49 |
The current non-test uses of PyMemoryView_FromBuffer() are in _io.BufferedReader.read(), _io.BufferedWriter.write(), PyUnicode_Decode().
It looks like they can each be made to leak a memoryview that references a deallocated buffer. (Maybe the answer is Don't Do That.)
import codecs, sys
def decode(buf):
global view
view = buf
return codecs.latin_1_decode(buf)
def getregentry():
return codecs.CodecInfo(name='foobar', decode=decode,
encode=codecs.latin_1_encode)
@codecs.register
def search_function(encoding):
if encoding == 'foobar':
return codecs.CodecInfo(*getregentry())
b = b'hello'.upper()
b.decode('foobar')
print(view.tobytes()) # => b'HELLO'
del b
x = b'dump'.upper()
print(view.tobytes()) # => b'DUMP\x00'
import io, sys
class File(io.RawIOBase):
def readinto(self, buf):
global view
view = buf
n = len(buf)
buf[:] = b'x'*n
return n
def readable(self):
return True
f = io.BufferedReader(File())
f.read(1)
print(view[:5].tobytes()) # => b'xxxxx'
del f
print(view[:5].tobytes()) # => b'\xdd\xdd\xdd\xdd\xdd'
|
msg170658 - (view) |
Author: Richard Oudkerk (sbt) * |
Date: 2012-09-18 18:19 |
I am rather confused about the ownership semantics when one uses PyMemoryView_FromBuffer().
It looks as though PyMemoryView_FromBuffer() "steals" ownership of the buffer since, when the associated _PyManagedBufferObject is garbage collected, PyBuffer_Release() is called on its copy of the buffer info. However, the _PyManagedBufferObject does not own a reference of the base object, so one still needs to decref the base object (at some time when it is safe to do so).
So am I right in thinking that
PyObject_GetBuffer(obj, &buf, ...);
view = PyMemoryView_FromBuffer(&buf); // view->master owns the buffer, but view->master->obj == NULL
...
Py_DECREF(view); // releases buffer (assuming no other exports)
Py_XDECREF(buf.obj);
has balanced refcounting and is more or less equivalent to
view = PyMemoryView_FromObject(obj);
...
Py_DECREF(view);
The documentation is not very helpful. It just says that calls to PyObject_GetBuffer() must be matched with calls to PyBuffer_Release().
|
msg170659 - (view) |
Author: Stefan Krah (skrah) * |
Date: 2012-09-18 18:20 |
Richard Oudkerk <report@bugs.python.org> wrote:
> PyObject_GetBuffer(b, &buf, PyBUF_WRITABLE);
> view = PyMemoryView_FromBuffer(&buf);
> // readinto view
> PyBuffer_Release(&buf);
>
> Would attempts to access a "leaked" reference to view now result in ValueError("operation forbidden on released memoryview object")? If so then I think this would be safe.
You would need to call memory_release(). Perhaps we can just expose it on the
C-API level as PyMemoryView_Release().
IMO the use of PyObject_GetBuffer() should be discouraged. The semantics
aren't clear (see #15821). I'd suggest using:
1) A buffer provider is involved (the case here):
PyMemoryView_From Object()
2) A piece of memory needs to be wrapped temporarily and no references
to the memoryview are "leaked" on the Python level:
PyMemoryView_FromMemory()
3) A piece of memory needs to be packaged as a memoryview with automatic
cleanup in mbuf_dealloc():
PyMemoryView_FromBufferWithCleanup() (proposed in msg169613)
So I think the combination of PyMemoryView_FromObject() with a call to
PyMemoryView_Release() should indeed work here.
|
msg170660 - (view) |
Author: Stefan Krah (skrah) * |
Date: 2012-09-18 18:22 |
Richard Oudkerk <report@bugs.python.org> wrote:
> The documentation is not very helpful. It just says that calls
> to PyObject_GetBuffer() must be matched with calls to PyBuffer_Release().
Yes, we need to sort that out, see #15821.
|
msg170675 - (view) |
Author: Richard Oudkerk (sbt) * |
Date: 2012-09-18 20:47 |
> You would need to call memory_release(). Perhaps we can just expose it on the
> C-API level as PyMemoryView_Release().
Should PyMemoryView_Release() release the _PyManagedBufferObject by doing mbuf_release(view->mbuf) even if view->mbuf->exports > 0?
Doing
Py_TYPE(view->mbuf)->tp_clear((PyObject *)view->mbuf);
seems to have the desired effect of causing ValueError when I try to access any associated memoryview.
> 3) A piece of memory needs to be packaged as a memoryview with automatic
> cleanup in mbuf_dealloc():
>
> PyMemoryView_FromBufferWithCleanup() (proposed in msg169613)
Maybe this should also handle decrefing the base object (given a flag PyManagedBuffer_FreeObj). I do worry about creating memoryviews that survive deallocation of the base object.
|
msg170676 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2012-09-18 20:56 |
> So I think the combination of PyMemoryView_FromObject() with a call to
> PyMemoryView_Release() should indeed work here.
I don't think we want to expose a mutable bytes object to outside code, so IMO PyMemoryView_FromMemory() is preferrable.
|
msg170678 - (view) |
Author: Stefan Krah (skrah) * |
Date: 2012-09-18 21:19 |
Richard Oudkerk <report@bugs.python.org> wrote:
> Should PyMemoryView_Release() release the _PyManagedBufferObject by doing mbuf_release(view->mbuf) even if view->mbuf->exports > 0?
No, I think it should really just be a wrapper:
diff --git a/Objects/memoryobject.c b/Objects/memoryobject.c
--- a/Objects/memoryobject.c
+++ b/Objects/memoryobject.c
@@ -1093,6 +1093,12 @@
return memory_release((PyMemoryViewObject *)self, NULL);
}
+PyObject *
+PyMemoryView_Release(PyObject *m)
+{
+ return memory_release((PyMemoryViewObject *)m, NULL);
+}
+
We decided in #10181 not to allow releasing a view with exports, since the
logic is already quite complex. Is there a reasonable expectation that
existing code creates memoryviews of the readinto() argument?
|
msg170679 - (view) |
Author: Stefan Krah (skrah) * |
Date: 2012-09-18 21:41 |
Antoine Pitrou <report@bugs.python.org> wrote:
> I don't think we want to expose a mutable bytes object to outside code,
> so IMO PyMemoryView_FromMemory() is preferrable.
I agree, but PyMemoryView_FromMemory(PyBytes_AS_STRING(b), n, PyBUF_WRITE)
just hides the fact that a mutable bytes object is exposed.
Are we talking about a big speedup here or could we perhaps just keep
the existing code?
|
msg170680 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2012-09-18 21:51 |
> Antoine Pitrou <report@bugs.python.org> wrote:
> > I don't think we want to expose a mutable bytes object to outside code,
> > so IMO PyMemoryView_FromMemory() is preferrable.
>
> I agree, but PyMemoryView_FromMemory(PyBytes_AS_STRING(b), n, PyBUF_WRITE)
> just hides the fact that a mutable bytes object is exposed.
Except that the mutable bytes object is not exposed to any outside code,
so that weird behaviour can't be observed.
|
msg170681 - (view) |
Author: Richard Oudkerk (sbt) * |
Date: 2012-09-18 22:02 |
> Are we talking about a big speedup here or could we perhaps just keep
> the existing code?
I doubt it is worth the hassle. But I did want to know if there was a clean way to do what I wanted.
|
msg190457 - (view) |
Author: Dwight Guth (dwight.guth) |
Date: 2013-06-01 23:23 |
I was programming something today and thought I should let you know I came across a situation where the current behavior of this function is able to expose what seems to be raw memory to the user.
import io
class A(io.RawIOBase):
def readinto(self, b):
return len(b)
A().read(100)
|
msg223225 - (view) |
Author: Mark Lawrence (BreamoreBoy) * |
Date: 2014-07-16 15:48 |
Please note this is also referred to from #15994.
|
|
Date |
User |
Action |
Args |
2022-04-11 14:57:35 | admin | set | github: 60107 |
2019-04-08 13:06:30 | BreamoreBoy | set | nosy:
- BreamoreBoy
|
2019-04-08 13:02:20 | methane | set | nosy:
+ methane
|
2014-10-14 18:01:58 | skrah | set | nosy:
- skrah
|
2014-07-16 15:48:19 | BreamoreBoy | set | nosy:
+ BreamoreBoy
messages:
+ msg223225 versions:
+ Python 3.5 |
2013-06-01 23:23:23 | dwight.guth | set | nosy:
+ dwight.guth messages:
+ msg190457
|
2012-09-18 22:02:18 | sbt | set | messages:
+ msg170681 |
2012-09-18 21:51:38 | pitrou | set | messages:
+ msg170680 |
2012-09-18 21:41:23 | skrah | set | messages:
+ msg170679 |
2012-09-18 21:19:27 | skrah | set | messages:
+ msg170678 |
2012-09-18 20:56:43 | pitrou | set | messages:
+ msg170676 |
2012-09-18 20:50:10 | vstinner | set | nosy:
+ vstinner
|
2012-09-18 20:47:55 | sbt | set | messages:
+ msg170675 |
2012-09-18 18:22:55 | skrah | set | messages:
+ msg170660 |
2012-09-18 18:20:41 | skrah | set | messages:
+ msg170659 |
2012-09-18 18:19:43 | sbt | set | messages:
+ msg170658 |
2012-09-18 12:49:29 | sbt | set | messages:
+ msg170642 |
2012-09-18 10:47:49 | sbt | set | messages:
+ msg170639 |
2012-09-18 08:33:54 | skrah | set | messages:
+ msg170638 |
2012-09-17 23:14:26 | pitrou | set | nosy:
+ skrah messages:
+ msg170629
|
2012-09-17 23:04:56 | sbt | set | messages:
+ msg170628 |
2012-09-17 22:30:01 | pitrou | set | nosy:
+ pitrou messages:
+ msg170627
|
2012-09-17 20:59:07 | serhiy.storchaka | set | type: performance components:
+ Library (Lib), IO |
2012-09-17 20:17:09 | sbt | set | files:
+ iobase_read.patch
messages:
+ msg170624 |
2012-09-10 20:01:10 | amaury.forgeotdarc | set | nosy:
+ amaury.forgeotdarc messages:
+ msg170226
|
2012-09-10 18:01:46 | jcea | set | nosy:
+ jcea stage: patch review
versions:
+ Python 3.4 |
2012-09-10 12:23:03 | sbt | create | |