
bytearrays are not thread safe #47389

Closed
amauryfa opened this issue Jun 19, 2008 · 64 comments
Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs) release-blocker type-crash A hard crash of the interpreter, possibly with a core dump

Comments

@amauryfa
Member

BPO 3139
Nosy @malemburg, @loewis, @warsaw, @amauryfa, @pitrou, @benjaminp
Files
  • test_threaded_print.diff: Modification of test_queue that shows print problems
  • bzcrash.patch
  • s_star.diff
  • s_star2.patch
  • refcount.diff
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = None
    closed_at = <Date 2008-08-14.20:36:30.501>
    created_at = <Date 2008-06-19.10:06:53.362>
    labels = ['interpreter-core', 'type-crash', 'release-blocker']
    title = 'bytearrays are not thread safe'
    updated_at = <Date 2009-09-02.18:45:10.438>
    user = 'https://github.com/amauryfa'

    bugs.python.org fields:

    activity = <Date 2009-09-02.18:45:10.438>
    actor = 'loewis'
    assignee = 'none'
    closed = True
    closed_date = <Date 2008-08-14.20:36:30.501>
    closer = 'loewis'
    components = ['Interpreter Core']
    creation = <Date 2008-06-19.10:06:53.362>
    creator = 'amaury.forgeotdarc'
    dependencies = []
    files = ['10658', '10823', '11026', '11040', '11114']
    hgrepos = []
    issue_num = 3139
    keywords = ['patch']
    message_count = 64.0
    messages = ['68397', '68398', '68400', '69220', '69245', '69286', '69287', '69288', '69304', '69307', '69309', '69318', '69774', '70212', '70243', '70244', '70301', '70319', '70432', '70435', '70443', '70444', '70447', '70448', '70495', '70505', '70515', '70529', '70555', '70557', '70559', '70562', '70575', '70583', '70587', '70595', '70633', '70634', '70640', '70645', '71012', '71027', '71052', '71059', '71093', '71098', '71099', '71134', '71136', '71139', '71140', '71142', '71145', '71146', '71152', '71195', '71480', '71502', '71873', '71878', '92143', '92148', '92168', '92175']
    nosy_count = 10.0
    nosy_names = ['lemburg', 'loewis', 'barry', 'teoliphant', 'amaury.forgeotdarc', 'pitrou', 'donmez', 'benjamin.peterson', 'gpolo', 'boya']
    pr_nums = []
    priority = 'release blocker'
    resolution = 'fixed'
    stage = None
    status = 'closed'
    superseder = None
    type = 'crash'
    url = 'https://bugs.python.org/issue3139'
    versions = ['Python 2.6', 'Python 3.0']

    @amauryfa
    Member Author

    I found this problem when adding "print" statements to multi-threaded
    code. When applying the attached diff to a py3k installation, the output
    on screen always contains some garbage.

    The following code is an extract of fileio_write (in Modules/_fileio.c),
    but the same behavior appears everywhere:

    if (!PyArg_ParseTuple(args, "s#", &ptr, &n))
    	return NULL;
    
    Py_BEGIN_ALLOW_THREADS
    errno = 0;
n = write(self->fd, ptr, n);
    Py_END_ALLOW_THREADS
    

    io.BufferedWriter calls this function with a bytearray.
    In this case, the GIL is released when holding a pointer to the
    bytearray data.
But another thread may mutate the bytearray in between; the pointer
becomes stale and can lead to segfaults or funny results.

    @amauryfa amauryfa added interpreter-core (Objects, Python, Grammar, and Parser dirs) type-crash A hard crash of the interpreter, possibly with a core dump labels Jun 19, 2008
    @gpolo
    Mannequin

    gpolo mannequin commented Jun 19, 2008

Wasn't that always known? Maybe I'm missing something here.

    @amauryfa
    Member Author

    The problem is not only about concurrent prints.
It is about an invalid pointer being passed to a C function.
    Here is an example that reliably crashes the interpreter on my windows
    machine:

    import bz2, threading
    bz2c = bz2.BZ2Compressor()
    b = bytearray(b"a" * 1000000)
    def f():
        for x in range(10):
            b[:] = b""
            b[:] = bytearray(b"a" * 1000000)
    threading.Thread(target=f).start()
    for x in range(10):
        bz2c.compress(b)

bz2c.compress is a slow function that happens to accept bytearrays and
to release the GIL. If the other thread reallocates the bytearray,
bz2c.compress will read invalid data.

    @amauryfa amauryfa changed the title print is not thread safe bytearrays are not thread safe Jun 24, 2008
    @pitrou
    Member

    pitrou commented Jul 3, 2008

    Wow...
    Isn't this kind of code supposed to ask for a buffer on the bytearray
    object, together with an optional lock on it (or something like that)?

    @amauryfa
    Member Author

    amauryfa commented Jul 4, 2008

    Probably, but this affects all PyArg_ParseTuple("s#") calls that release
    the GIL afterwards. How many of them are there?

    @pitrou
    Member

    pitrou commented Jul 5, 2008

If I try to follow the chain of consequences:

    • all PyArg_ParseTuple("s#") calls that release the GIL afterwards
      should be re-written to use another API (which one I don't know exactly,
      but hopefully the appropriate functions are already provided by the
      buffer API); this applies to third-party extension modules as well
    • consequently, forward compatibility is broken in an important way
      (but it would probably be ok for py3k)
    • perhaps the bytearray type should not have been backported to 2.6, or
      perhaps it should carry a big warning in the documentation

    @pitrou
    Member

    pitrou commented Jul 5, 2008

    By the way, here's a more reliable way to make it crash (on my Linux
    machine):

    import bz2, threading
    
    bz2c = bz2.BZ2Compressor()
    # Span at least a whole arena (256KB long)
    junk_len = 512 * 1024
    junk = b"a" * junk_len
    buf = bytearray()
    
    for x in range(50):
        buf[:] = junk
        t = threading.Thread(target=bz2c.compress, args=[buf])
        t.start()
        buf[:] = b""
        t.join()

    @pitrou
    Member

    pitrou commented Jul 5, 2008

    Now I've just discovered there is the same problem with the
    array.array() type (see following code).

    import bz2, threading, array
    
    bz2c = bz2.BZ2Compressor()
    # Span at least a whole arena (256KB long)
    junk_len = 512 * 1024
    junk = array.array("c", b"a") * junk_len
    empty = array.array("c")
    buf = empty[:]
    
    for x in range(50):
        buf[:] = junk
        t = threading.Thread(target=bz2c.compress, args=[buf])
        t.start()
        buf[:] = empty
        t.join()

    @loewis
    Mannequin

    loewis mannequin commented Jul 5, 2008

    I believe the 2.6 s# processing works correctly; the error is in the
    bytearray object. This either shouldn't support the buffer interface, or
    it shouldn't reallocate the buffer after having returned a pointer to it.

For 3k, convertbuffer shouldn't call bf_releasebuffer; instead, the
caller of ParseTuple somehow needs to release the buffer. As a
consequence, s# should refuse buffer objects that have a bf_releasebuffer
operation, and another format code might get added to fill out a Py_buffer - or
s# can get changed to always produce a Py_buffer (which would affect the
code significantly).

    @pitrou
    Member

    pitrou commented Jul 5, 2008

On Saturday, 5 July 2008, at 22:20 +0000, Martin v. Löwis wrote:

    I believe the 2.6 s# processing works correctly; the error is in the
    bytearray object. This either shouldn't support the buffer interface, or
    it shouldn't reallocate the buffer after having returned a pointer to it.

    getbuffer and releasebuffer exist in both 2.6 and 3.0, and bytearray
    supports those methods in both.

    As for array.array, it only implements old buffer API.

    @pitrou
    Member

    pitrou commented Jul 5, 2008

    For reference, here is a proof-of-concept patch which shows how to fix
    the bytearray crasher above (it raises a BufferError instead).
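This guard eventually became standard behavior: on a modern CPython, resizing a bytearray while its buffer is exported raises BufferError instead of leaving a stale pointer behind. A minimal sketch, using a memoryview to stand in for a C consumer (such as bz2c.compress) holding the exported buffer:

```python
b = bytearray(b"a" * 1000)
view = memoryview(b)   # stands in for a C consumer holding the buffer

try:
    b[:] = b""         # would reallocate the buffer: refused
except BufferError as exc:
    print(exc)

view.release()
b[:] = b""             # no exports outstanding: resizing works again
```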

    @loewis
    Mannequin

    loewis mannequin commented Jul 6, 2008

    Fixing this in the methods themselves has the disadvantage that the
    error reporting is not that good anymore.

    @loewis loewis mannequin added the release-blocker label Jul 6, 2008
    @warsaw
    Member

    warsaw commented Jul 16, 2008

    Not blocking beta 2 because there's not enough time for the fix, but
    this will definitely block beta 3.

    @loewis
    Mannequin

    loewis mannequin commented Jul 24, 2008

    Here is a patch adding the s* format, and changing files, sockets, and
    fileio to use it. For bz2, the immediate effect is that you get a type
    error (as an object providing bf_releasebuffer cannot be converted
    through s#/w# anymore); it would be possible to fix bz2 also to use
    s*/w* instead.

    I'd like reviewers to focus on flaws in the approach and bugs in the
    implementation; existing code should be converted to the new API
    afterwards (or not converted at all for 2.6/3.0, but only as patches get
    contributed).

    If this is accepted in principle, I'll forward-port it to 3.0.
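The s* contract (acquire a buffer view, use it, then release it deterministically) has a Python-level analogue in the memoryview context manager. A hedged sketch; write_all is an illustrative helper, not part of any API:

```python
import io

def write_all(fileobj, data):
    # Acquire a view of any bytes-like object, use it, then release it
    # so the owner may resize its buffer again afterwards.
    with memoryview(data) as view:
        return fileobj.write(view)

buf = bytearray(b"hello")
f = io.BytesIO()
write_all(f, buf)
buf[:] = b""           # the view was released, so resizing is allowed again
print(f.getvalue())    # b'hello'
```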

    @pitrou
    Member

    pitrou commented Jul 25, 2008

    I haven't yet studied the patch in detail but I have a few questions:

    (1) are you sure it is safe not to INCREF the obj pointer in the
    Py_buffer? in the cases handled by your patch we still hold a reference
    to the original args tuple and dict before calling PyBuffer_Release(),
    but some functions may want to store the Py_buffer object somewhere to
    re-use it later. It would seem more logical for PyBuffer_FillInfo to
    INCREF the obj, and for PyBuffer_Release to DECREF it and set it to NULL.

(2) is it necessary to call bf_getbuffer & the like directly, or is there
a higher-level API to do it?

    (3) didn't you forget to convert "PyArg_ParseTuple(args, "s#iO:sendto",
    [...])" in sock_sendto?

    (4) is it really necessary to do a special case with PyString_Check()
    rather than rely on the string type's getbuffer method?

    @pitrou
    Member

    pitrou commented Jul 25, 2008

    Travis, it would be really nice to have your input on this.

    @loewis
    Mannequin

    loewis mannequin commented Jul 26, 2008

    (1) are you sure it is safe not to INCREF the obj pointer in the
    Py_buffer?

    Yes, that's the semantics of the current buffer interface, and I cannot
    see a flaw in it. When you call getbuffer, you certainly hold a
    reference, and it is your obligation to hold onto this reference
    somehow. So it is definitely safe (if properly documented).

    It would seem more logical for PyBuffer_FillInfo to
    INCREF the obj, and for PyBuffer_Release to DECREF it and set it to NULL.

    Perhaps. I cannot quite see what undesirable consequences that might
    have - feel free to develop and test an alternative patch that takes
    that approach.

(2) is it necessary to call bf_getbuffer & the like directly, or is there
a higher-level API to do it?

    There is PyObject_GetBuffer and PyObject_ReleaseBuffer, but it is not
    higher-level. I need to check the function pointers, anyway (to
    determine whether the object supports getbuffer and requires
    releasebuffer or not), so calling them directly is the right level
    of abstraction (IMO).

    (3) didn't you forget to convert "PyArg_ParseTuple(args, "s#iO:sendto",
    [...])" in sock_sendto?

    True.

    (4) is it really necessary to do a special case with PyString_Check()
    rather than rely on the string type's getbuffer method?

    That's what the code always did (for s#). It would be possible to
    eliminate that case, yes.

    @amauryfa
    Member Author

    With the patch the following script crashes the 2.6 interpreter on Windows:

    import sys, io, threading
    stdout2 = io.open(sys.stdout.fileno(), mode="w")
    def f(i):
        for i in range(10):
            stdout2.write(unicode((x, i)) + '\n')
    for x in range(10):
        t = threading.Thread(target=f, args=(x,))
        t.start()

    (with py3k, replace "stdout2.write" with a simple "print")

    @pitrou
    Member

    pitrou commented Jul 30, 2008

The problem is that the fix for bpo-3295 was committed on the py3k branch
(in r64751) rather than on the trunk!
    Once PyExc_BufferError is defined properly the crash disappears and
    exceptions are printed instead.

    @benjaminp
    Contributor

    Sorry, that was my oversight! I've backported the fix.

    @amauryfa
    Member Author

    It's indeed better.
Now when running my previous script I can see the exception ;-)

    Exception in thread Thread-2:
    Traceback (most recent call last):
      File "C:\dev\python\trunk1\lib\threading.py", line 523, in
    __bootstrap_inner
        self.run()
      File "C:\dev\python\trunk1\lib\threading.py", line 478, in run
        self.__target(*self.__args, **self.__kwargs)
      File "<stdin>", line 3, in f
      File "C:\dev\python\trunk1\lib\io.py", line 1473, in write
        self.buffer.write(b)
      File "C:\dev\python\trunk1\lib\io.py", line 1041, in write
        self._write_buf.extend(b)
    BufferError: Existing exports of data: object cannot be re-sized

    Again, I think this is unfortunate for a simple script that prints from
    several threads.

    @pitrou
    Member

    pitrou commented Jul 30, 2008

On Wednesday, 30 July 2008, at 23:03 +0000, Amaury Forgeot d'Arc wrote:

    Again, I think this is unfortunate for a simple script that prints from
    several threads.

    Yes, but it's an issue with the BufferedWriter implementation, since it
    changes the length of its underlying bytearray object. If it was
    rewritten to use a fixed-size bytearray, the problem would probably
    disappear.

    (in the middle term BufferedReader and BufferedWriter should perhaps be
    both rewritten in C)
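Indeed, only resizing is blocked while a buffer is exported; same-size, in-place mutation of a bytearray remains legal, which is what a fixed-size buffer implementation would rely on. A quick sketch on a modern CPython:

```python
buf = bytearray(4)
view = memoryview(buf)      # an export is outstanding

buf[:] = b"abcd"            # same length: no reallocation, allowed
print(bytes(view))          # the export sees the new contents

try:
    buf.extend(b"x")        # changing the size is refused
except BufferError:
    print("resize refused while exported")

view.release()
buf.extend(b"x")            # allowed again after release
```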

    @malemburg
    Member

    Two comments:

    • I like the new *-getarg parameters, but it would be better to test
      for '#' first since this is still by far the most used getarg
      parameter.

    • Antoine, I think your codecs.c patch has a glitch:

    +    decoded = PyUnicode_DecodeMBCSStateful(pbuf.buf, pbuf.len, errors,
    +					   final ? NULL : &consumed);
    +	PyBuffer_Release(&pbuf);
    +    if (decoded == NULL)
     	return NULL;
    -    return codec_tuple(decoded, final ? size : consumed);
    +    return codec_tuple(decoded, consumed);
     }
     
       You dropped the "final ? size : " for no apparent reason.

    @pitrou
    Member

    pitrou commented Aug 2, 2008

    You've missed the preceding line that says:

    + consumed = pbuf.len;

If final is true, NULL is passed, so consumed isn't updated by the call
to PyUnicode_DecodeMBCSStateful and keeps its original value of pbuf.len.

    (in all honesty, I didn't test under Windows so this particular function
    wasn't enabled when I compiled and ran the test suite)

    @malemburg
    Member

    On 2008-08-02 16:49, Antoine Pitrou wrote:


    You've missed the preceding line that says:

+ consumed = pbuf.len;

If final is true, NULL is passed, so consumed isn't updated by the call
to PyUnicode_DecodeMBCSStateful and keeps its original value of pbuf.len.

    (in all honesty, I didn't test under Windows so this particular function
    wasn't enabled when I compiled and ran the test suite)

    Thanks for clarifying this.

    @pitrou
    Member

    pitrou commented Aug 2, 2008

    Here is a new patch wrapping up Martin's patch with the following additions:

    • getargs.c fixes
    • codecs module fixes
    • multiprocessing module fix
    • fileobject readinto fix
    • in bytearray deallocator, print out a SystemError if there are
      existing exports

    The whole test suite now passes, which is a good start :-)

    @pitrou
    Member

    pitrou commented Aug 11, 2008

    The last beta is arriving soon. We need to go ahead on this, or retarget
    it for 2.7/3.1.

    @loewis
    Mannequin

    loewis mannequin commented Aug 11, 2008

    The last beta is arriving soon. We need to go ahead on this, or retarget
    it for 2.7/3.1.

    I'll look into it tomorrow.

    Regards,
    Martin

    @loewis
    Mannequin

    loewis mannequin commented Aug 12, 2008

    I have now committed the patch to 2.6 as r65654, adding changes for the
    bz2module.

I also decided to make the Py_buffer structure own its reference, as I
was running out of arguments why not to. In the process, I removed
PyObject_ReleaseBuffer, as it is redundant and would have unclear
semantics (what if the object passed directly and the object passed
indirectly were different?).

    @loewis
    Mannequin

    loewis mannequin commented Aug 12, 2008

    I also started working on porting it to 3.0, but couldn't complete that
    port yet - the memoryview object doesn't play nicely.

    @pitrou
    Member

    pitrou commented Aug 13, 2008

On Tuesday, 12 August 2008, at 16:15 +0000, Martin v. Löwis wrote:

    I also started working on porting it to 3.0, but couldn't complete that
    port yet - the memoryview object doesn't play nicely.

    I've seen your recent merge and I don't know if you have finished with
    it.

    I think we should drop the "base" member in PyMemoryViewObject, it has
    become redundant and confusing. There are some places in memoryobject.c
where base seems mistakenly used instead of view.obj, e.g.
    PyMemoryView_FromMemory INCREFs view.obj, but memory_dealloc DECREFs
    base.
    Also, I don't understand why memory_getbuf INCREFs view.obj, there is no
    corresponding DECREF in memory_releasebuf and view.obj should already
    have been INCREFed anyway.

(if people want to get easily at the base object, we could provide a
macro, e.g. PyMemory_GET_BASE())

    By the way, perhaps PyBuffer_Release should set view->obj and view->buf
    to NULL (and view->len to -1?), it would be a simple way to signal that
    the buffer can't be used anymore.

    What do you think?

    @loewis
    Mannequin

    loewis mannequin commented Aug 13, 2008

    I've seen your recent merge and I don't know if you have finished with
    it.

    Indeed, I'm done, r65654. Unless there are actual bugs in these patches
    (in the sense that they don't fix the reported problem, or introduce new
    bugs), I'd like to close this issue.

    I think we should drop the "base" member in PyMemoryViewObject, it has
    become redundant and confusing.

    I tried, and failed; feel free to submit a patch (as a new issue).
    The tricky part is that base is sometimes used as a tuple.

    Also, I don't understand why memory_getbuf INCREFs view.obj, there is no
    corresponding DECREF in memory_releasebuf and view.obj should already
    have been INCREFed anyway.

    Ok, that's an open issue. Is the caller of FromMemory supposed to do
    Buffer_Release afterwards, or is ownership of the buffer transferred
    in the FromMemory call? (the issue didn't exist when the embedded
    reference was borrowed)

    To put it another way: view.obj has *not* been INCREFed. *view holds
    a reference, but that reference belongs to the caller, not to the memory
    object. Every time you initialize a PyObject* (such as view.obj), you
    need to INCREF, unless it's a borrowed reference.

    In any case, the corresponding DECREF *does* exist: in memory_dealloc,
    PyBuffer_Release is invoked, which releases the reference.

    By the way, perhaps PyBuffer_Release should set view->obj and view->buf
    to NULL (and view->len to -1?), it would be a simple way to signal that
    the buffer can't be used anymore.

    That can be done; it's again a separate patch.

    @pitrou
    Member

    pitrou commented Aug 13, 2008

    In any case, the corresponding DECREF *does* exist: in memory_dealloc,
    PyBuffer_Release is invoked, which releases the reference.

    I'm a bit confused. In the PyBuffer_Release implementation you
    committed, there is no DECREF at all.

    > By the way, perhaps PyBuffer_Release should set view->obj and view->buf
    > to NULL (and view->len to -1?), it would be a simple way to signal that
    > the buffer can't be used anymore.

    That can be done; it's again a separate patch.

    Ok.

    @loewis
    Mannequin

    loewis mannequin commented Aug 14, 2008

    I'm a bit confused. In the PyBuffer_Release implementation you
    committed, there is no DECREF at all.

Oops, I meant to make the reference owned by the Py_buffer, but actually
forgot to implement that. Fixed in r65677, r65678.

    Now, when I try to merge that into the 3k branch, test_unicode terribly
    leaks memory :-( It's really frustrating.

    Regards,
    Martin

    @pitrou
    Member

    pitrou commented Aug 14, 2008

On Thursday, 14 August 2008, at 16:13 +0000, Martin v. Löwis wrote:

    Now, when I try to merge that into the 3k branch, test_unicode terribly
    leaks memory :-( It's really frustrating.

    If you don't have the time to work on it anymore, you can post the
    current patch here and I'll take a try.

    Regards

    Antoine.

    @loewis
    Mannequin

    loewis mannequin commented Aug 14, 2008

    The patch is really trivial, and attached.

    @pitrou
    Member

    pitrou commented Aug 14, 2008

On Thursday, 14 August 2008, at 18:52 +0000, Martin v. Löwis wrote:

    The patch is really trivial, and attached.

    Added file: http://bugs.python.org/file11114/refcount.diff

    By the way, even without that patch, there is a memory leak:

    Python 3.0b2+ (py3k, Aug 14 2008, 20:49:19) 
    [GCC 4.3.1 20080626 (prerelease)] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import sys, codecs
    >>> b = bytearray()
    >>> sys.getrefcount(b)
    2
    >>> codecs.ascii_decode(memoryview(b))
    ('', 0)
    >>> sys.getrefcount(b)
    3

    @loewis
    Mannequin

    loewis mannequin commented Aug 14, 2008

    By the way, even without that patch, there is a memory leak:

    With the patch, this memory leak goes away.

    Regards,
    Martin

    @pitrou
    Member

    pitrou commented Aug 14, 2008

On Thursday, 14 August 2008, at 19:06 +0000, Martin v. Löwis wrote:

    > By the way, even without that patch, there is a memory leak:

    With the patch, this memory leak goes away.

    But now:

    30
    >>> m = memoryview(b)
    >>> sys.getrefcount(b)
    32
    >>> del m
    >>> sys.getrefcount(b)
    31

    @pitrou
    Member

    pitrou commented Aug 14, 2008

    Sorry, the roundup e-mail interface ate some lines of code:

    >>> b = b''
    >>> sys.getrefcount(b)
    30
    >>> m = memoryview(b)
    >>> sys.getrefcount(b)
    32
    >>> del m
    >>> sys.getrefcount(b)
    31

    It doesn't happen with bytearrays, so I suspect it's that you only
    DECREF when releasebuffer method exists:

    >>> b = bytearray()
    >>> sys.getrefcount(b)
    2
    >>> m = memoryview(b)
    >>> sys.getrefcount(b)
    4
    >>> del m
    >>> sys.getrefcount(b)
    2

    @loewis
    Mannequin

    loewis mannequin commented Aug 14, 2008

    It doesn't happen with bytearrays, so I suspect it's that you only
    DECREF when releasebuffer method exists:

    Thanks, that was indeed the problem; this is now fixed in r65683 and r65685.

    My initial observation that test_unicode leaks a lot of memory was
    incorrect - it's rather that test_raiseMemError consumes all my
    memory (probably without leaking).

    test_unicode still leaks 6 references each time; one reference is leaked
    whenever a SyntaxError is raised. I'm not sure though whether this was
    caused by this patch, so I'll close this issue as fixed. Any further
    improvements should be done through separate patches (without my
    involvement, most likely).
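The ownership rule that came out of this (the Py_buffer held by a memoryview owns a reference to its base object, dropped exactly once on release) can be sanity-checked from Python. Exact refcount values are CPython-specific, so only the deltas matter in this sketch:

```python
import sys

b = bytearray()
before = sys.getrefcount(b)

m = memoryview(b)        # the view's buffer now owns a reference to b
held = sys.getrefcount(b)

del m                    # releasing the view drops that reference again
after = sys.getrefcount(b)

print(held > before, after == before)
```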

    @loewis loewis mannequin closed this as completed Aug 14, 2008
    @donmez
    Mannequin

    donmez mannequin commented Aug 16, 2008

    This seems to break test_unicode on MacOSX 10.5.4,

    test_unicode
    python(80320,0xa0659fa0) malloc: *** mmap(size=2147483648) failed (error
    code=12)
    *** error: can't allocate region
    *** set a breakpoint in malloc_error_break to debug
    python(80320,0xa0659fa0) malloc: *** mmap(size=2147483648) failed (error
    code=12)
    *** error: can't allocate region
    *** set a breakpoint in malloc_error_break to debug

    @pitrou
    Member

    pitrou commented Aug 19, 2008

On Saturday, 16 August 2008, at 05:50 +0000, Ismail Donmez wrote:

    This seems to break test_unicode on MacOSX 10.5.4,

    test_unicode
    python(80320,0xa0659fa0) malloc: *** mmap(size=2147483648) failed (error
    code=12)
    *** error: can't allocate region
    *** set a breakpoint in malloc_error_break to debug
    python(80320,0xa0659fa0) malloc: *** mmap(size=2147483648) failed (error
    code=12)
    *** error: can't allocate region
    *** set a breakpoint in malloc_error_break to debug

    Can you run Lib/test/test_unicode.py directly to know which test
    precisely crashes?

    @donmez
    Mannequin

    donmez mannequin commented Aug 20, 2008

    Can you run Lib/test/test_unicode.py directly to know which test
    precisely crashes?

    The latest py3k branch is fine now and the test passes.

    @teoliphant
    Mannequin

    teoliphant mannequin commented Aug 24, 2008

    I'm sorry that I was unavailable for comment during July and August as
    it looks like a lot of decisions were made that have changed the
    semantics a bit. I'm still trying to figure out why the decisions were
    made that were made.

    I get the impression that most of the problems are related to objects
    incorrectly managing their exported buffers, but there may be some
    semantic issues related to "t#" that were not conceived of during the
    many discussions surrounding the design of PEP-3118.

    I'm not convinced that Py_buffer should have grown a link to an object.
    I think this is a shortcut solution due to misuse of the protocol that
    may have unfortunate consequences.

I'm not sure where PyBuffer_Release came from. I can't find it in the
PEP and don't remember what its purpose is. Did I add it, or did
somebody else?

    @pitrou
    Member

    pitrou commented Aug 24, 2008

    Hi Travis,

    Glad you're back!

    I'm not convinced that Py_buffer should have grown a link to an object.
    I think this is a shortcut solution due to misuse of the protocol that
    may have unfortunate consequences.

    What consequences are you thinking about?

    Specifically, why shouldn't Py_buffer have a link to the object? It's
    the best way we've found to be able to release the buffer without having
    to keep a link to the originator ourselves. The concern is to simplify
    the API for most of its users. Especially, the new format codes ("s*" et
    al.) can just fill the Py_buffer rather than return several things at
    once.

    (please note that link can be NULL if you don't want to have the
    associated resource management)

    I'm not sure where PyBuffer_Release came from. I can't find it in the
    PEP and don't remember what it's purpose is.

    It's a replacement for PyObject_ReleaseBuffer(). Since a Py_buffer now
    has a link to its originator, there's no need to pass it separately to
    the releasing function.

    @boya
    Mannequin

    boya mannequin commented Sep 1, 2009

Although the bug is fixed, the following three code segments seem
suspicious in _codecsmodule.c in the latest revision, 74624, and they are
similar to the bug described here:

    (1)
    escape_decode(PyObject *self,
    	      PyObject *args)
    {
        ...
        const char *data;
        ...
        if (!PyArg_ParseTuple(args, "s#|z:escape_decode",
    			  &data, &size, &errors))
    }
    
    (2)
    readbuffer_encode(PyObject *self,
    		  PyObject *args)
    {
        const char *data;
        ...
        if (!PyArg_ParseTuple(args, "s#|z:readbuffer_encode",
    			  &data, &size, &errors))
        ...
    }
    
    (3)
    charbuffer_encode(PyObject *self,
    		  PyObject *args)
    {
        const char *data;
        ...
        if (!PyArg_ParseTuple(args, "t#|z:charbuffer_encode",
    			  &data, &size, &errors))
        ...
    }

Firstly, "char *data;" has been replaced by "Py_buffer pbuf;" in many
functions in this file by the bug fix, but this code was not changed;
secondly, they use "s#" or "t#", which should probably be changed to "s*".

    I could be wrong about it. Does anyone have any opinions on the above
    code? Are they really buggy or am I misunderstanding anything?

    @amauryfa
    Member Author

    amauryfa commented Sep 2, 2009

    In theory, yes. But it happens that the GIL is not released before the
    buffer is used.

    @boya
    Mannequin

    boya mannequin commented Sep 2, 2009

    I am still a little bit confused. Can you explain a little more in
    detail? What is the difference between the suspicious code and the ones
    that are fixed?

    @loewis
    Mannequin

    loewis mannequin commented Sep 2, 2009

A problem can only occur if you preserve a pointer to the buffer,
and the object gets a chance to change its buffer underneath. This
can happen when there are user-defined callbacks, and when other
threads can get control. In the cases being fixed, other threads
*can* get control, as the GIL is released. In the cases you discuss,
this cannot happen, since the GIL is not released, and no codepath
can lead to buffer reallocation.
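A user-level way to stay clear of the hazard entirely is to hand other threads an immutable snapshot rather than the mutable bytearray itself; a hedged sketch of that workaround:

```python
import bz2
import threading

buf = bytearray(b"a" * 100000)
data = bytes(buf)        # immutable snapshot: safe to share across threads

result = []
t = threading.Thread(target=lambda: result.append(bz2.compress(data)))
t.start()
buf[:] = b""             # mutating buf cannot affect the worker's input
t.join()

print(bz2.decompress(result[0]) == b"a" * 100000)   # True
```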

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022