multiprocessing.Value() hangs #56561

Closed

gregath mannequin opened this issue Jun 17, 2011 · 27 comments
Labels
type-bug An unexpected behavior, bug, or error

Comments

@gregath
Mannequin

gregath mannequin commented Jun 17, 2011

BPO 12352
Nosy @pitrou, @vstinner
Files
  • gdb_stack.txt: complete gdb stack of the hanging process
  • heap_gc_deadlock_lockless.diff
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = None
    closed_at = <Date 2011-07-03.09:40:37.713>
    created_at = <Date 2011-06-17.13:19:47.913>
    labels = ['type-bug']
    title = 'multiprocessing.Value() hangs'
    updated_at = <Date 2011-07-03.09:40:37.712>
    user = 'https://bugs.python.org/gregath'

    bugs.python.org fields:

    activity = <Date 2011-07-03.09:40:37.712>
    actor = 'neologix'
    assignee = 'none'
    closed = True
    closed_date = <Date 2011-07-03.09:40:37.713>
    closer = 'neologix'
    components = ['None']
    creation = <Date 2011-06-17.13:19:47.913>
    creator = 'greg.ath'
    dependencies = []
    files = ['22393', '22546']
    hgrepos = []
    issue_num = 12352
    keywords = ['patch']
    message_count = 27.0
    messages = ['138506', '138741', '138774', '138775', '138776', '138779', '139102', '139126', '139130', '139134', '139156', '139160', '139220', '139329', '139330', '139332', '139372', '139397', '139627', '139630', '139637', '139639', '139640', '139641', '139642', '139643', '139678']
    nosy_count = 5.0
    nosy_names = ['pitrou', 'vstinner', 'neologix', 'python-dev', 'greg.ath']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue12352'
    versions = ['Python 2.6']

    @gregath
    Mannequin Author

    gregath mannequin commented Jun 17, 2011

    Hi,

    My multithreaded application uses multiprocessing.Value() to ensure thread-safe operations on shared data.
    For reasons I don't understand, after some change in my code, that call now hangs consistently.

    I did a gdb backtrace of the hanging process, and I discovered that multiprocessing/heap.py tries to acquire the same non-recursive lock twice.
    The first acquire is done in the "malloc" function:
    #61 call_function (f=
    Frame 0x13be190, for file /usr/lib/python2.6/multiprocessing/heap.py, line 173, in malloc (self=<Heap(_stop_to_block={}, _lengths=[], _lock=<thread.lock at remote 0x7f00fc770eb8>, _allocated_blocks=set([...

    The second acquire is done in the "free" function:
    #3 0x00000000004a7c5e in call_function (f=
    Frame 0x1662d50, for file /usr/lib/python2.6/multiprocessing/heap.py, line 155, in free (self=<Heap(_stop_to_block={}, _lengths=[], _lock=<thread.lock at remote 0x7f00fc770eb8>, _allocated_blocks=set([...

    I don't understand the link between these two calls, so I am unable to write a simple script that reproduces the problem. My guess is that a garbage collection ran inside malloc, which in turn called free.

    Python 2.6.5
    Linux dev 2.6.32-25-server #45-Ubuntu SMP Sat Oct 16 20:06:58 UTC 2010 x86_64 GNU/Linux
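
    For context, a hedged sketch of the kind of usage that goes through the shared heap in multiprocessing/heap.py (the actual application code is not shown in this report; the names below are illustrative):

    from ctypes import c_int, c_double
    from multiprocessing import Value

    # Every Value() allocates its storage from a single process-wide Heap
    # (Lib/multiprocessing/heap.py); Heap.malloc() serializes allocations
    # with a plain, non-recursive threading.Lock.
    counter = Value(c_int, 0)        # shared integer with its own lock
    average = Value(c_double, 0.0)   # shared double with its own lock

    with counter.get_lock():
        counter.value += 1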

    @gregath gregath mannequin added the type-bug An unexpected behavior, bug, or error label Jun 17, 2011
    @neologix
    Mannequin

    neologix mannequin commented Jun 20, 2011

    Thanks for reporting this.
    There's indeed a bug which can lead to this deadlock.
    Relevant code in Lib/multiprocessing/heap.py

    • the BufferWrapper class uses a single Heap() shared among instances, protected by a mutex (threading.Lock), from which blocks are allocated
    • when a BufferWrapper is allocated, a finalizer (multiprocessing.util.Finalize) is installed to free the corresponding block allocated from the Heap
    • if another BufferWrapper is garbage collected while the mutex protecting the Heap is held (in your case, while a new BufferWrapper is being allocated), the corresponding finalizer will try to free its block from the heap
    • free tries to lock the mutex, which is already held
    • deadlock

    The obvious solution is to use a recursive lock instead.
    Could you try your application after changing:
    """
    class Heap(object):

        _alignment = 8
    
        def __init__(self, size=mmap.PAGESIZE):     
            self._lastpid = os.getpid()     
            self._lock = threading.Lock()
    """

    to
    """
    class Heap(object):

        _alignment = 8
    
        def __init__(self, size=mmap.PAGESIZE):     
            self._lastpid = os.getpid()     
    ->      self._lock = threading.RLock()
    """

    One could probably reproduce this by allocating and freeing many multiprocessing.Values, preferably with a lower GC threshold.
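
    A hedged sketch of that reproduction idea (not the script that appears later in this issue; the threshold and loop counts are arbitrary):

    import gc
    from ctypes import c_int
    from multiprocessing import Value

    gc.set_threshold(10)        # make the cyclic garbage collector run very often

    tab = []
    for i in range(100000):
        v = Value(c_int, i)     # each Value allocates a block from the shared Heap
        v.cycle = tab           # build a reference cycle so only the GC can reclaim it
        tab.append(v)
        if i % 100 == 0:
            tab = []            # drop the external reference; a later GC pass frees the blocks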

    @neologix
    Mannequin

    neologix mannequin commented Jun 21, 2011

    The obvious solution is to use a recursive lock instead.

    Note that it's not really a solution, just a workaround to avoid
    deadlocks, because this might lead to corruption if free is called
    while the heap is in an inconsistent state. I have to think some more
    about a final solution, but I'd like to check first that this is
    really what's happening here.

    @gregath
    Mannequin Author

    gregath mannequin commented Jun 21, 2011

    Hi,

    I also wonder about the performance cost of a recursive lock.

    I am still unable to reproduce the bug in a simple script. Looking
    closely at the gdb stack, there is this frame:
    Frame 0x13be190, for file /usr/lib/python2.6/multiprocessing/heap.py, line 173
    I understand that Python reuses only the beginning of a memory block,
    so it frees the remainder of the block.
    I use both Value(c_int) and Value(c_double), which have different
    sizes. That may explain this behaviour.

    in heap.py, in the malloc function:
    167         self._lock.acquire()
    168         try:
    169             size = self._roundup(max(size,1), self._alignment)
    170             (arena, start, stop) = self._malloc(size)
    171             new_stop = start + size
    172             if new_stop < stop:
    173                 self._free((arena, new_stop, stop))

    Thanks for your help

    2011/6/21 Charles-François Natali <report@bugs.python.org>:

    Charles-François Natali <neologix@free.fr> added the comment:

    > The obvious solution is to use a recursive lock instead.

    Note that it's not really a solution, just a workaround to avoid
    deadlocks, because this might lead to corruption if free is called
    while the heap is in an inconsistent state. I have to think some more
    about a final solution, but I'd like to check first that this is
    really what's happening here.



    @neologix
    Mannequin

    neologix mannequin commented Jun 21, 2011

    Looking closely at the gdb stack, there is this frame:

    Yeah, but it calls _free, which runs unlocked. That's not the problem.

    I am still unable to reproduce the bug in a simple script.

    Try with this one:

    """
    import multiprocessing.heap

    tab = []
    
    for i in range(100000):
        print(i)
        b = multiprocessing.heap.BufferWrapper(10)
        # create a circular reference (we want GC and not refcount collection when
        # the block goes out of scope)
        b.tab = tab
        tab.append(b)
        # drop the external reference so the cycles become unreachable and eligible to GC
        if i % 100 == 0:
            del tab
            tab = []
    """

    It deadlocks pretty quickly (well, on my box).
    And, as expected, disabling/enabling the GC inside malloc solves the problem.
    I have to think a little bit more for a clean solution.

    @vstinner
    Member

    I also wonder about the performance cost of a recursive lock.

    An alternative is to disable the garbage collector in malloc():

        def malloc(self, size):
            ...
            enabled = gc.isenabled()
            if enabled:
                # disable the garbage collector to avoid a deadlock if block
                # is freed (if self.free() is called)
                gc.disable()
            try:
                with self._lock:
                    size = self._roundup(max(size,1), self._alignment)
                    ...
                    return block
            finally:
                if enabled:
                    gc.enable()

    gc.disable() and gc.enable() just set an internal flag and so should be fast.

    @neologix
    Mannequin

    neologix mannequin commented Jun 25, 2011

    Patch (with test) attached. It disables the garbage collector inside
    critical sections.
    Of course, if another thread re-enables the gc while the current
    thread is inside a critical section, things can break (the probability
    is really low, but who knows).
    I can't think of any satisfying solution; it's tricky, because we
    don't just need it to be thread-safe but also reentrant, which is much
    more difficult.
    I've got another patch which does the following:

    • in free(), perform a trylock of the mutex
    • if the trylock fails, then create a new Finalizer to postpone the
      freeing of the same block to a later time, when the gc is called
    • the only problem is that I have to build a dummy reference cycle to
      pass it to Finalize if I want free() to be called by the GC later (and
      not when the object's reference count drops to 0, otherwise we would
      get an infinite recursion).

    Another solution would be to make the Finalizer callback run lockless:
    just add the block to a list/set, and perform the freeing of pending
    blocks synchronously when malloc() is called (or when the heap is
    finalized). There are two drawbacks to that:

    • adding an element to a set is not guaranteed to be thread-safe
      (well, it is under cPython because of the GIL)
    • freeing blocks synchronously means that the blocks won't be freed
      until malloc() is called (which could be never)

    Suggestions welcome.

    @vstinner
    Member

    Or you can combine your two ideas:

    in free(), perform a trylock of the mutex
    if the trylock fails, then create a new Finalizer to postpone the
    freeing of the same block to a later time,...
    ... perform the freeing of pending blocks synchronously
    when malloc() is called

    If free() is called indirectly from malloc() (by the garbage collector), free() adds the block to free in a "pending free" list. Pseudo code:

    -----------------------

    def __init__(self):
      ...
      self._pending_free = queue.Queue()
    
    def _exec_pending_free(self):
      while True:
        try:
          block = self._pending_free.get_nowait()
        except queue.Empty:
          break
        self._free(block)
    
    def free(self, block):
      if self._lock.acquire(False):
         self._exec_pending_free()
         self._free(block)
      else:
         # malloc holds the lock
         self._pending_free.put(block)
    
    def malloc(self, size):
      with self._lock:
        self._malloc(size)
        self._exec_pending_free()

    Problem: if self._pending_free.put() adds a block after malloc() has already exited, the free will have to wait until the next call to malloc() or free(). I don't know if this case (free() called while malloc() is running, but malloc() exiting before free() completes) really occurs: this issue is a deadlock because free() is "called" from malloc(), and so malloc() waits until the free() is done. It might occur if the garbage collector calls free() after _exec_pending_free() but before the lock is released.

    @neologix
    Mannequin

    neologix mannequin commented Jun 25, 2011

    [...]

    def free(self, block):
     if self._lock.acquire(False):
        self._exec_pending_free()
        self._free(block)
     else:
        # malloc holds the lock
        self._pending_free.put(block)

    _pending_free uses a lock internally to make it thread-safe, so I
    think this will have exactly the same problem (the trylock can fail in
    case of contention or free() from multiple threads, thus we can't be
    sure that the else clause is executed on behalf of the garbage
    collector and it won't run while we're adding the block to
    _pending_free).

    Anyway, this seems complicated and error-prone, disabling the gc seems
    the most straightforward way to handle that.

    @vstinner
    Member

    _pending_free uses a lock internally to make it thread-safe, so I
    think this will have exactly the same problem

    You are probably right. Can't we use a lock-less list? list.append is
    atomic thanks to the GIL, isn't it? But I don't know how to implement
    the lock-less list consumer. It would be nice to have a function to
    remove and return the content of the list, an atomic "content=mylist[:];
    del mylist[:]" function.

    (the trylock can fail in case of contention or free() from multiple
    threads, thus we can't be
    sure that the else clause is executed on behalf of the garbage
    collector and it won't run while we're adding the block to
    _pending_free)

    If two threads call free() at the same time, the "second" one (taking
    the GIL) will add the block to pending_free.

    Anyway, this seems complicated and error-prone, disabling the gc seems
    the most straightforward way to handle that.

    I don't like touching such a global "variable", but you are right.

    @neologix
    Mannequin

    neologix mannequin commented Jun 26, 2011

    [...]

    I don't like touching such a global "variable", but you are right.

    Well, I don't like it either, but I can't really think of any other solution.
    Antoine, any thought on that?

    @neologix
    Mannequin

    neologix mannequin commented Jun 26, 2011

    You are probably right. Can't we use a lock-less list? list.append is
    atomic thanks to the GIL, isn't it? But I don't know how to implement
    the lock-less list consumer. It would be nice to have a function to
    remove and return the content of the list, an atomic "content=mylist[:];
    del mylist[:]" function.

    Why not just something like:

    while True:
        try:
            block = list.pop()
        except IndexError:
            break
        _free(block)

    Lock-less lists are not strictly atomic (only on cPython), but I doubt
    that gc.disable() is available and works on every Python interpreter
    anyway...
    So the idea would be (a sketch follows the list):

    • in free(), perform a trylock
    • if trylock fails, append the block to a list of pending blocks to free
    • if trylock succeeds, free the pending blocks and proceed as usual
      (do the same thing in malloc())
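
    A minimal sketch of that trylock-plus-pending-list idea (the method and attribute names here are mine, not necessarily those of the patch, and the real allocator internals are stubbed out):

    import threading

    class Heap(object):

        def __init__(self):
            self._lock = threading.Lock()
            self._pending_free_blocks = []   # appended to without holding the lock

        def _free_pending_blocks(self):
            # Called with self._lock held: drain the blocks queued by free()
            # calls that could not take the lock (e.g. finalizers run by the GC).
            while True:
                try:
                    block = self._pending_free_blocks.pop()
                except IndexError:
                    break
                self._free(block)

        def free(self, block):
            if not self._lock.acquire(False):
                # The lock is already held, possibly by malloc() on this very
                # thread (we may be running inside the garbage collector): defer.
                self._pending_free_blocks.append(block)
            else:
                try:
                    self._free_pending_blocks()
                    self._free(block)
                finally:
                    self._lock.release()

        def malloc(self, size):
            with self._lock:
                self._free_pending_blocks()
                return self._malloc(size)

        # Stubs standing in for the real bookkeeping in Lib/multiprocessing/heap.py.
        def _free(self, block):
            pass

        def _malloc(self, size):
            return size

    list.append() and list.pop() are only atomic thanks to CPython's GIL, which is what lets _pending_free_blocks be used without its own lock.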

    @neologix
    Mannequin

    neologix mannequin commented Jun 26, 2011

    Here's a patch based on the second approach.

    @vstinner
    Member

    heap_gc_deadlock_lockless.diff: _free_pending_blocks() and free() execute the following instructions in a different order; is that a problem?

    + self._free(block)
    + self._allocated_blocks.remove(block)

    vs

    + self._allocated_blocks.remove(block)
    + self._free(block)

    You could call _free_pending_blocks() just after lock.acquire() and a second time just before lock.release()... it may be overkill, but it should reduce the probability of the delayed-free problem.

    You may document that _pending_free_blocks.append() and _pending_free_blocks.pop() are atomic in CPython and don't need a specific lock.

    @vstinner
    Member

    You may document that _pending_free_blocks.append()
    and _pending_free_blocks.pop() are atomic in CPython
    and don't need a specific lock.

    Oops, I completely skipped your long comment explaining everything! It is enough.

    @vstinner
    Member

    There are different techniques to work around this issue. My preferred one is heap_gc_deadlock_lockless.diff because it has fewer side effects and a well-defined behaviour.

    @pitrou
    Member

    pitrou commented Jun 28, 2011

    Nice work! I also think heap_gc_deadlock_lockless.diff is good, except for Victor's reservation: is it deliberate that you reversed the following two statements in _free_pending_blocks(), compared to the code in free()?

    + self._free(block)
    + self._allocated_blocks.remove(block)

    @neologix
    Mannequin

    neologix mannequin commented Jun 29, 2011

    Nice work! I also think heap_gc_deadlock_lockless.diff is good, except for Victor's reservation: is it deliberate that you reversed the following two statements in _free_pending_blocks(), compared to the code in free()?

    •            self._free(block)
    •            self._allocated_blocks.remove(block)

    No, it's not deliberate (it shouldn't have any impact since they're
    protected by the mutex though).
    As for calling _free_pending_blocks() a second time, I'm not sure
    that's necessary; I find the code simpler and cleaner as it is.
    I'll provide a new patch in a couple days (no access to my development
    box right now).

    @neologix
    Mannequin

    neologix mannequin commented Jul 2, 2011

    Updated patch.

    @vstinner
    Member

    vstinner commented Jul 2, 2011

    The last heap_gc_deadlock_lockless.diff looks good.

    Note: please try to use different filenames for different versions of the same patch. For example, add a number (heap_gc_deadlock_lockless-2.diff) to the name.

    @pitrou
    Member

    pitrou commented Jul 2, 2011

    + if gc.isenabled():
    +     thresholds = gc.get_threshold()
    +     self.addCleanup(gc.set_threshold, *thresholds)
    + else:
    +     gc.enable()
    +     self.addCleanup(gc.disable)

    It seems you won't restore the thresholds if the GC wasn't enabled at first.
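
    A hedged sketch of one way to address that (my illustration, not necessarily the follow-up commit): save and restore the thresholds unconditionally, and only toggle the enabled state when needed.

    thresholds = gc.get_threshold()
    self.addCleanup(gc.set_threshold, *thresholds)
    if not gc.isenabled():
        gc.enable()
        self.addCleanup(gc.disable)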

    @python-dev
    Mannequin

    python-dev mannequin commented Jul 2, 2011

    New changeset 96a0788583c6 by Charles-François Natali in branch '2.7':
    Issue bpo-12352: Fix a deadlock in multiprocessing.Heap when a block is freed by
    http://hg.python.org/cpython/rev/96a0788583c6

    @python-dev
    Mannequin

    python-dev mannequin commented Jul 2, 2011

    New changeset 874143242d79 by Charles-François Natali in branch '2.7':
    Issue bpo-12352: In test_free_from_gc(), restore the GC thresholds even if the GC
    http://hg.python.org/cpython/rev/874143242d79

    @python-dev
    Mannequin

    python-dev mannequin commented Jul 2, 2011

    New changeset 0d4ca1e77205 by Charles-François Natali in branch '3.1':
    Issue bpo-12352: Fix a deadlock in multiprocessing.Heap when a block is freed by
    http://hg.python.org/cpython/rev/0d4ca1e77205

    @python-dev
    Mannequin

    python-dev mannequin commented Jul 2, 2011

    New changeset 37606505b227 by Charles-François Natali in branch '3.2':
    Merge issue bpo-12352: Fix a deadlock in multiprocessing.Heap when a block is
    http://hg.python.org/cpython/rev/37606505b227

    @python-dev
    Mannequin

    python-dev mannequin commented Jul 2, 2011

    New changeset fd8dc3746992 by Charles-François Natali in branch 'default':
    Merge issue bpo-12352: Fix a deadlock in multiprocessing.Heap when a block is
    http://hg.python.org/cpython/rev/fd8dc3746992

    @neologix
    Mannequin

    neologix mannequin commented Jul 3, 2011

    The test passes on all the buildbots, closing.
    greg, thanks for reporting this!

    @neologix neologix mannequin closed this as completed Jul 3, 2011
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022