bytearrays are not thread safe #47389
Comments
I found this problem when adding "print" statements to multi-threaded code.
The following code is an extract of fileio_write (in Modules/_fileio.c); io.BufferedWriter calls this function with a bytearray. |
Wasn't that always known? Maybe I'm missing something here. |
The problem is not only about concurrent prints.
import bz2, threading
bz2c = bz2.BZ2Compressor()
b = bytearray(b"a" * 1000000)
def f():
    for x in range(10):
        b[:] = b""
        b[:] = bytearray(b"a" * 1000000)
threading.Thread(target=f).start()
for x in range(10):
    bz2c.compress(b)
bz2c.compress is a slow function that happens to accept bytearray and |
Wow... |
Probably, but this affects all PyArg_ParseTuple("s#") calls that release the GIL. |
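For illustration, a minimal sketch (not the actual _fileio.c code; the function name and details are made up) of the unsafe pattern being described: "s#" yields a borrowed pointer into the bytearray's storage, and nothing stops another thread from resizing the object once the GIL is released.

#define PY_SSIZE_T_CLEAN
#include <Python.h>
#include <unistd.h>

/* Hypothetical extension function illustrating the unsafe "s#" pattern. */
static PyObject *
unsafe_write(PyObject *self, PyObject *args)
{
    const char *buf;
    Py_ssize_t len;

    /* "s#" hands back a borrowed pointer into the bytearray's storage;
       no buffer export is kept once argument parsing is done. */
    if (!PyArg_ParseTuple(args, "s#:unsafe_write", &buf, &len))
        return NULL;

    Py_BEGIN_ALLOW_THREADS
    /* While the GIL is released, another thread may run b[:] = b"",
       which reallocates or frees the memory buf points to. */
    write(1, buf, len);
    Py_END_ALLOW_THREADS

    Py_RETURN_NONE;
}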
If I try to follow the chain of consequences:
|
By the way, here's a more reliable way to make it crash (on my Linux machine):
import bz2, threading
bz2c = bz2.BZ2Compressor()
# Span at least a whole arena (256KB long)
junk_len = 512 * 1024
junk = b"a" * junk_len
buf = bytearray()
for x in range(50):
    buf[:] = junk
    t = threading.Thread(target=bz2c.compress, args=[buf])
    t.start()
    buf[:] = b""
    t.join() |
Now I've just discovered there is the same problem with the array module:
import bz2, threading, array
bz2c = bz2.BZ2Compressor()
# Span at least a whole arena (256KB long)
junk_len = 512 * 1024
junk = array.array("c", b"a") * junk_len
empty = array.array("c")
buf = empty[:]
for x in range(50):
    buf[:] = junk
    t = threading.Thread(target=bz2c.compress, args=[buf])
    t.start()
    buf[:] = empty
    t.join() |
I believe the 2.6 s# processing works correctly; the error is in the
For 3k, convertbuffer shouldn't call bf_releasebuffer; instead, the |
On Saturday, 5 July 2008 at 22:20 +0000, Martin v. Löwis wrote:
getbuffer and releasebuffer exist in both 2.6 and 3.0, and bytearray
As for array.array, it only implements the old buffer API. |
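As a rough illustration of that difference (a sketch using the 2.6 C API; the helper name is made up), C code can check for the new protocol with PyObject_CheckBuffer(), which bytearray satisfies and 2.6's array.array does not.

#include <Python.h>

/* Hypothetical helper: report which buffer API an object supports. */
static int
buffer_api_kind(PyObject *obj)
{
    if (PyObject_CheckBuffer(obj))
        return 2;   /* new API: bf_getbuffer/bf_releasebuffer are filled in (bytearray) */
    if (Py_TYPE(obj)->tp_as_buffer != NULL)
        return 1;   /* only the old read/write-buffer slots (2.6 array.array) */
    return 0;       /* no buffer support at all */
}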
For reference, here is a proof-of-concept patch which shows how to fix |
Fixing this in the methods themselves has the disadvantage that the |
Not blocking beta 2 because there's not enough time for the fix, but |
Here is a patch adding the s* format, and changing files, sockets, and
I'd like reviewers to focus on flaws in the approach and bugs in the
If this is accepted in principle, I'll forward-port it to 3.0. |
I haven't yet studied the patch in detail but I have a few questions:
(1) are you sure it is safe not to INCREF the obj pointer in the
(2) is it necessary to call bf_getbuffer & the like directly, or is there
(3) didn't you forget to convert "PyArg_ParseTuple(args, "s#iO:sendto",
(4) is it really necessary to do a special case with PyString_Check() |
Travis, it would be really nice to have your input on this. |
Yes, that's the semantics of the current buffer interface, and I cannot
Perhaps. I cannot quite see what undesirable consequences that might
There is PyObject_GetBuffer and PyObject_ReleaseBuffer, but it is not
True.
That's what the code always did (for s#). It would be possible to |
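To make the "s*" approach concrete, here is a minimal sketch of the pattern the patch enables (a hypothetical function, not code from the patch): the filled Py_buffer keeps the bytearray's buffer exported across the GIL release, and a concurrent resize then fails with BufferError instead of invalidating the pointer.

#define PY_SSIZE_T_CLEAN
#include <Python.h>
#include <unistd.h>

/* Hypothetical function showing the "s*" pattern. */
static PyObject *
safe_write(PyObject *self, PyObject *args)
{
    Py_buffer view;
    Py_ssize_t n;

    /* "s*" fills a Py_buffer; the export stays alive until
       PyBuffer_Release() is called. */
    if (!PyArg_ParseTuple(args, "s*:safe_write", &view))
        return NULL;

    Py_BEGIN_ALLOW_THREADS
    /* The bytearray cannot be resized while the export is held,
       so view.buf stays valid even with the GIL released. */
    n = write(1, view.buf, view.len);
    Py_END_ALLOW_THREADS

    PyBuffer_Release(&view);
    if (n < 0)
        return PyErr_SetFromErrno(PyExc_IOError);
    return PyLong_FromSsize_t(n);
}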
With the patch, the following script crashes the 2.6 interpreter on Windows:
import sys, io, threading
stdout2 = io.open(sys.stdout.fileno(), mode="w")
def f(i):
    for x in range(10):
        stdout2.write(unicode((x, i)) + '\n')
for x in range(10):
    t = threading.Thread(target=f, args=(x,))
    t.start()
(with py3k, replace "stdout2.write" with a simple "print") |
The problem is that the fix for bpo-3295 was committed in the py3k branch |
Sorry, that was my oversight! I've backported the fix. |
It's indeed better.
Exception in thread Thread-2:
Traceback (most recent call last):
File "C:\dev\python\trunk1\lib\threading.py", line 523, in
__bootstrap_inner
self.run()
File "C:\dev\python\trunk1\lib\threading.py", line 478, in run
self.__target(*self.__args, **self.__kwargs)
File "<stdin>", line 3, in f
File "C:\dev\python\trunk1\lib\io.py", line 1473, in write
self.buffer.write(b)
File "C:\dev\python\trunk1\lib\io.py", line 1041, in write
self._write_buf.extend(b)
BufferError: Existing exports of data: object cannot be re-sized
Again, I think this is unfortunate for a simple script that prints from |
On Wednesday, 30 July 2008 at 23:03 +0000, Amaury Forgeot d'Arc wrote:
Yes, but it's an issue with the BufferedWriter implementation, since it
(in the middle term, BufferedReader and BufferedWriter should perhaps be |
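For context, the BufferError in the traceback above is raised by the guard on bytearray's resize path: as long as any buffer export is outstanding, the object refuses to move its storage. A simplified sketch of that guard (not the actual Objects/bytearrayobject.c code):

/* Simplified sketch of the export guard in bytearray's resize logic. */
static int
bytearray_resize_sketch(PyByteArrayObject *self, Py_ssize_t newsize)
{
    if (self->ob_exports > 0) {
        /* A Py_buffer (e.g. filled via "s*") or a memoryview still points
           into our storage: refuse to reallocate it. */
        PyErr_SetString(PyExc_BufferError,
            "Existing exports of data: object cannot be re-sized");
        return -1;
    }
    /* ... otherwise reallocate the internal buffer to newsize ... */
    return 0;
}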
Two comments:
+ decoded = PyUnicode_DecodeMBCSStateful(pbuf.buf, pbuf.len, errors,
+ final ? NULL : &consumed);
+ PyBuffer_Release(&pbuf);
+ if (decoded == NULL)
return NULL;
- return codec_tuple(decoded, final ? size : consumed);
+ return codec_tuple(decoded, consumed);
}
You dropped the "final ? size : " for no apparent reason. |
You've missed the preceding line that says:
+ consumed = pbuf.len;
If final is true, NULL is passed, so consumed isn't updated by the call to PyUnicode_DecodeMBCSStateful and keeps the value pbuf.len.
(in all honesty, I didn't test under Windows so this particular function |
On 2008-08-02 16:49, Antoine Pitrou wrote:
Thanks for clarifying this. |
Here is a new patch wrapping up Martin's patch with the following additions:
The whole test suite now passes, which is a good start :-) |
The last beta is arriving soon. We need to go ahead on this, or retarget |
I'll look into it tomorrow. Regards, |
I have now committed the patch to 2.6 as r65654, adding changes for the
I also decided to make the Py_buffer structure own its reference, as I |
I also started working on porting it to 3.0, but couldn't complete that |
On Tuesday, 12 August 2008 at 16:15 +0000, Martin v. Löwis wrote:
I've seen your recent merge and I don't know if you have finished with
I think we should drop the "base" member in PyMemoryViewObject, it has
(if people want to get easily at the base object, we could provide a
By the way, perhaps PyBuffer_Release should set view->obj and view->buf
What do you think? |
Indeed, I'm done, r65654. Unless there are actual bugs in these patches
I tried, and failed; feel free to submit a patch (as a new issue).
Ok, that's an open issue. Is the caller of FromMemory supposed to do
To put it another way: view.obj has *not* been INCREFed. *view holds
In any case, the corresponding DECREF *does* exist: in memory_dealloc,
That can be done; it's again a separate patch. |
I'm a bit confused. In the PyBuffer_Release implementation you
Ok. |
Oops, I meant to make the reference owned by Py_buffer, but actually
Now, when I try to merge that into the 3k branch, test_unicode terribly
Regards, |
On Thursday, 14 August 2008 at 16:13 +0000, Martin v. Löwis wrote:
If you don't have the time to work on it anymore, you can post the
Regards, Antoine. |
The patch is really trivial, and attached. |
On Thursday, 14 August 2008 at 18:52 +0000, Martin v. Löwis wrote:
By the way, even without that patch, there is a memory leak:
Python 3.0b2+ (py3k, Aug 14 2008, 20:49:19)
[GCC 4.3.1 20080626 (prerelease)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys, codecs
>>> b = bytearray()
>>> sys.getrefcount(b)
2
>>> codecs.ascii_decode(memoryview(b))
('', 0)
>>> sys.getrefcount(b)
3 |
With the patch, this memory leak goes away. Regards, |
On Thursday, 14 August 2008 at 19:06 +0000, Martin v. Löwis wrote:
But now:
30
>>> m = memoryview(b)
>>> sys.getrefcount(b)
32
>>> del m
>>> sys.getrefcount(b)
31 |
Sorry, the roundup e-mail interface ate some lines of code:
>>> b = b''
>>> sys.getrefcount(b)
30
>>> m = memoryview(b)
>>> sys.getrefcount(b)
32
>>> del m
>>> sys.getrefcount(b)
31
It doesn't happen with bytearrays, so I suspect it's that you only
>>> b = bytearray()
>>> sys.getrefcount(b)
2
>>> m = memoryview(b)
>>> sys.getrefcount(b)
4
>>> del m
>>> sys.getrefcount(b)
2 |
Thanks, that was indeed the problem; this is now fixed in r65683 and r65685.
My initial observation that test_unicode leaks a lot of memory was
test_unicode still leaks 6 references each time; one reference is leaked |
This seems to break test_unicode on MacOSX 10.5.4, test_unicode |
On Saturday, 16 August 2008 at 05:50 +0000, Ismail Donmez wrote:
Can you run Lib/test/test_unicode.py directly to know which test fails? |
The latest py3k branch is fine now and the test passes. |
I'm sorry that I was unavailable for comment during July and August as
I get the impression that most of the problems are related to objects
I'm not convinced that Py_buffer should have grown a link to an object.
I'm not sure where PyBuffer_Release came from. I can't find it in the |
Hi Travis,
Glad you're back!
What consequences are you thinking about?
Specifically, why shouldn't Py_buffer have a link to the object? It's
(please note that link can be NULL if you don't want to have the
It's a replacement for PyObject_ReleaseBuffer(). Since a Py_buffer now |
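Put differently, the intended pairing looks roughly like this (a sketch of the usage, assuming the reference-owning behaviour discussed above; the function name is made up):

#include <Python.h>

/* Sketch: acquiring and releasing a buffer with the 2.6+ API. */
static int
consume_buffer(PyObject *obj)
{
    Py_buffer view;

    if (PyObject_GetBuffer(obj, &view, PyBUF_SIMPLE) < 0)
        return -1;            /* object doesn't support the new protocol */

    /* ... use view.buf and view.len while the export is held ... */

    /* With the Py_buffer owning a reference to obj, this single call
       both ends the export and drops that reference. */
    PyBuffer_Release(&view);
    return 0;
}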
Although the bug is fixed, the following three code segments seem
(1)
escape_decode(PyObject *self,
              PyObject *args)
{
    ...
    const char *data;
    ...
    if (!PyArg_ParseTuple(args, "s#|z:escape_decode",
                          &data, &size, &errors))
}
(2)
readbuffer_encode(PyObject *self,
                  PyObject *args)
{
    const char *data;
    ...
    if (!PyArg_ParseTuple(args, "s#|z:readbuffer_encode",
                          &data, &size, &errors))
    ...
}
(3)
charbuffer_encode(PyObject *self,
                  PyObject *args)
{
    const char *data;
    ...
    if (!PyArg_ParseTuple(args, "t#|z:charbuffer_encode",
                          &data, &size, &errors))
    ...
}
Firstly, "char *data;" has been replaced by "Py_buffer pbuf;" in many
I could be wrong about it. Does anyone have any opinions on the above |
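For comparison, here is a sketch of what the first of those segments could look like once converted to "s*" (illustrative only, not the code that was actually committed; it assumes the 2.x helper PyString_DecodeEscape and the module's codec_tuple helper, as used elsewhere in the thread):

/* Sketch: escape_decode converted from "s#" to "s*". */
static PyObject *
escape_decode(PyObject *self, PyObject *args)
{
    Py_buffer pbuf;
    const char *errors = NULL;
    PyObject *decoded;
    Py_ssize_t len;

    if (!PyArg_ParseTuple(args, "s*|z:escape_decode", &pbuf, &errors))
        return NULL;
    len = pbuf.len;
    decoded = PyString_DecodeEscape(pbuf.buf, len, errors, 0, NULL);
    PyBuffer_Release(&pbuf);
    if (decoded == NULL)
        return NULL;
    return codec_tuple(decoded, len);
}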
In theory, yes. But it happens that the GIL is not released before the |
I am still a little bit confused. Can you explain a little more in detail? |
A problem can only occur if you preserve a pointer to the buffer, |
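In other words, the three codec functions above stay correct because they use the parsed pointer immediately, under the GIL, and never store it. A sketch of that harmless pattern (hypothetical function, not from the codecs module):

#define PY_SSIZE_T_CLEAN
#include <Python.h>

/* Hypothetical function: "s#" without releasing the GIL is harmless. */
static PyObject *
harmless_use(PyObject *self, PyObject *args)
{
    const char *data;
    Py_ssize_t size;

    if (!PyArg_ParseTuple(args, "s#:harmless_use", &data, &size))
        return NULL;
    /* The GIL is held for the whole time `data` is used and the pointer
       is not stored anywhere, so no other thread can resize the
       bytearray in the meantime. */
    return PyInt_FromSsize_t(size);
}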