classification
Title: memoryview from BufferedWriter becomes garbage
Type: behavior Stage:
Components: IO Versions: Python 3.6, Python 3.5, Python 2.7
process
Status: closed Resolution: duplicate
Dependencies: Superseder: memoryview to freed memory can cause segfault
View: 15994
Assigned To: Nosy List: benjamin.peterson, martin.panter, pitrou, serhiy.storchaka, stutzbach
Priority: normal Keywords:

Created on 2016-04-09 12:08 by martin.panter, last changed 2016-04-18 13:43 by martin.panter. This issue is now closed.

Messages (9)
msg263083 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2016-04-09 12:08
>>> class Raw(RawIOBase):
...     def writable(self): return True
...     def write(self, b):
...         global written
...         written = b
...         return len(b)
... 
>>> writer = BufferedWriter(Raw())
>>> writer.write(b"blaua")
5
>>> raw = writer.detach()
>>> written
<memory at 0x7fd37f7b1aa8>
>>> written.tobytes()
b'blaua'
>>> del writer
>>> written.tobytes()  # Garbage
b'\x80f\xab\x00\x00'

Assuming this is pointing into unallocated memory, maybe it could trigger a segfault, though I haven’t seen that.

I haven’t looked at the implementation. But I am guessing that BufferedWriter is passing a view of its internal buffer to write(). For Python 2, perhaps the fix is to check if that memoryview is still referenced, and allocate a new buffer if so. 3.5 should probably inherit this fix.

Another option for 3.6 might be to call release() when write() returns. This should be documented (along with the fact that memoryview is possible in the first place; see Issue 20699).
msg263113 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-04-09 21:39
Another option is to use the bytes object as internal buffer. If its refcount is == 1 (normal case), it is modified in-place, otherwise (if the reference is saved outside) new bytes object is created.
msg263114 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2016-04-09 22:11
On further thought, I think releasing the buffer would not be the best long-term solution. I encountered this bug when wrapping custom AuditableBytesIO objects (Lib/test/test_httpservers.py) with BufferedWriter, where the raw object just saves each write() buffer in a list for later use.

Serhiy: what you say sounds like what I had in mind, except I suspect it doesn’t matter whether the memoryview is backed by a bytes object or something else. The main point is we allocate a new buffer if the old one is still referenced by the memoryview.

It seems this problem has already been discovered, along with BufferedReader and PyUnicode_Decode(): <https://bugs.python.org/issue15903#msg170642>. The BufferedReader case can have more serious consequences because it is writable (Issue 15994).
msg263172 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2016-04-11 09:29
To create a memoryview with unlimited lifetime, I understand we need to nominate an “exporting object”, which becomes memoryview.obj. Using a bytes object here might be the simplest fix for just BufferedWriter.

However it looks like the buffer is shared with BufferedReader and others. To fix the analogous bug with BufferedReader, a bytearray might be better, because the user could see it being mutated when reading into the memoryview.

I think bytearray might be okay for BufferedWriter too, as long as we prevent it being resized. The user would be able to alter the contents of the buffer, but I don’t see that as a problem. An alternative would be a new opaque object that doesn’t do much except have a reference count.
msg263248 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2016-04-12 13:39
I realize there is another problem, and doing tricks with a bytes object won’t help that. BufferedWriter bypasses its own buffer for large writes:

>>> writer = BufferedWriter(Raw())
>>> large = bytearray(10000)
>>> writer.write(large)
10000
>>> written.tobytes()[:10]
b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
>>> large[:5] = b"blaua"
>>> written.tobytes()[:10]
b'blaua\x00\x00\x00\x00\x00'

BufferedWriter is passing a view of the original input through, without any copying. Perhaps the simplest thing is to warn and prevent the user from accessing the buffer after write() returns. I suggested some imperfect ideas in Issue 15994. Maybe I should just close this as a duplicate.
msg263272 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-04-12 19:39
I meant getting rid of memoryview at all and passing bytes buffer object directly to underlying write. But now I see that this idea perhaps is not feasible since we must write not only from the start of the buffer.

A writer like AuditableBytesIO is very attractive implementation. I used it multiple times, especially in tests. Your initial proposition LGTM, and perhaps this is only the solution applicable in maintained releases. Thus we should implement it in any case. In 3.6 we can add a warning.

Note that there are two kinds of references to a buffer: a reference to memoryview object and new memoryview that refers to the same buffer.
msg263304 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2016-04-13 04:24
The trouble with my original idea is that it is complex to implement, and inconsistent. If you wrote small amounts to your BufferedWriter, you would get a memoryview of bytes that you save for later. If there was a write of a large bytes object, we could pass a memoryview of that bytes object (or the actual bytes object). But if there was a write of a large bytearray, we could end up locking that bytearray, preventing it from being resized, and there could be problems mutating the bytearray.

Because I can’t see how to eliminate all inconsistencies, I prefer to document against saving the memoryview for later.
msg263623 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-04-17 19:19
Python implementation passes bytearray to underlying write and delete it's content just after that. Thus saving written data was never worked, and making it working don't worth efforts. I agree with you, we should add a warning against saving. This might be a part of issue20699.

As for original issue, this is potential crash, and we should fix this.
msg263665 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2016-04-18 13:43
Closing this as a duplicate of Issue 15994, where I have proposed a patch to add a note for RawIOBase.write(), and call memoryview.release() when the method returns.
History
Date User Action Args
2016-04-18 13:43:40martin.pantersetstatus: open -> closed
superseder: memoryview to freed memory can cause segfault
resolution: duplicate
messages: + msg263665
2016-04-17 19:19:30serhiy.storchakasetmessages: + msg263623
2016-04-13 04:24:46martin.pantersetmessages: + msg263304
2016-04-12 19:39:06serhiy.storchakasetmessages: + msg263272
2016-04-12 13:39:11martin.pantersetmessages: + msg263248
2016-04-11 09:29:54martin.pantersetmessages: + msg263172
2016-04-09 22:11:44martin.pantersetmessages: + msg263114
2016-04-09 21:39:51serhiy.storchakasetmessages: + msg263113
2016-04-09 12:43:12serhiy.storchakasetnosy: + pitrou, benjamin.peterson, stutzbach, serhiy.storchaka
2016-04-09 12:08:46martin.pantercreate