multiprocessing.Value() hangs #56561
Hi,

My multithreaded application uses multiprocessing.Value() to ensure thread-safe operations on shared data. I did a gdb backtrace of the hanging process and discovered that multiprocessing/heap.py tries to acquire the same non-recursive lock twice. The second acquire is done in the "free" function. I don't understand the link between these two method calls, so I am unable to write an easy script to reproduce the problem. My guess is that some garbage collection was done within the malloc, which called the free.

Python 2.6.5 |
Thanks for reporting this.

The obvious solution is to use a recursive lock instead, i.e. change

"""
    _alignment = 8

    def __init__(self, size=mmap.PAGESIZE):
        self._lastpid = os.getpid()
        self._lock = threading.Lock()
"""

to

"""
    _alignment = 8

    def __init__(self, size=mmap.PAGESIZE):
        self._lastpid = os.getpid()
        self._lock = threading.RLock()
"""

One could probably reproduce this by allocating and freeing many multiprocessing.Values, preferably with a lower GC threshold. |
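The difference matters because a thread re-acquiring a plain threading.Lock it already holds blocks forever, while an RLock just increments a recursion counter for its owner. A minimal standard-library sketch of the two behaviours (not part of the patch):

```python
import threading

plain = threading.Lock()
reentrant = threading.RLock()

# A plain Lock is not reentrant: a second blocking acquire from the same
# thread would deadlock, so we probe it with a zero timeout instead.
plain.acquire()
second = plain.acquire(timeout=0)   # False: a blocking call would hang
plain.release()

# An RLock can be re-acquired by the owning thread; it only becomes free
# again after a matching number of releases.
reentrant.acquire()
reentrant.acquire()                 # succeeds immediately
reentrant.release()
reentrant.release()

print(second)  # False
```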
Note that it's not really a solution, just a workaround to avoid the deadlock. |
Hi,

I also wonder about the performance cost of a recursive lock. I am still unable to reproduce the bug in a simple script. Looking in heap.py, in the malloc function:

Thanks for your help.

2011/6/21 Charles-François Natali <report@bugs.python.org>: |
Yeah, but it calls _free, which runs unlocked. That's not the problem.

Try with this one:

"""
tab = []
for i in range(100000):
    print(i)
    b = multiprocessing.heap.BufferWrapper(10)
    # create a circular reference (we want GC and not refcount collection
    # when the block goes out of scope)
    b.tab = tab
    tab.append(b)
    # drop the buffers' refcount to 0 to make them eligible to GC
    if i % 100 == 0:
        del tab
        tab = []
"""

It deadlocks pretty quickly (well, on my box). |
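The circular reference is the key to this reproducer: without a cycle, the wrapper's refcount drops to zero as soon as it goes out of scope and CPython frees it immediately, but with a cycle, reclamation is deferred to the cyclic garbage collector, which can fire at an arbitrary point, including in the middle of malloc(). A self-contained illustration of the two collection paths (Block and freed are made-up names for the demo):

```python
import gc

freed = []

class Block:
    def __del__(self):
        freed.append("freed")

gc.disable()  # make the timing deterministic for the demo

# Refcount path: the object dies the moment its last reference goes away.
b = Block()
del b
assert freed == ["freed"]

# Cycle path: the refcount never reaches zero, so only the cyclic
# collector can reclaim the object, whenever it happens to run.
b = Block()
b.ref = b                       # circular reference
del b
assert freed == ["freed"]       # not collected yet
gc.collect()                    # the GC finds and breaks the cycle
assert freed == ["freed", "freed"]

gc.enable()
```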
An alternative is to disable the garbage collector in malloc():

"""
    def malloc(self, size):
        ...
        enabled = gc.isenabled()
        if enabled:
            # disable the garbage collector to avoid a deadlock if a block
            # is freed (i.e. if self.free() is called)
            gc.disable()
        try:
            with self._lock:
                size = self._roundup(max(size, 1), self._alignment)
                ...
                return block
        finally:
            if enabled:
                gc.enable()
"""

gc.disable() and gc.enable() just set an internal flag and so should be fast. |
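The disable/re-enable dance in that snippet is a general pattern and can be packaged as a small context manager (disabled_gc is an illustrative name, not part of any patch):

```python
import gc
from contextlib import contextmanager

@contextmanager
def disabled_gc():
    """Temporarily disable the cyclic garbage collector.

    Only the *cyclic* collector is paused; ordinary refcount-based
    deallocation still happens. That is enough here, because the deadlock
    only involves finalizers triggered by a GC pass.
    """
    enabled = gc.isenabled()
    if enabled:
        gc.disable()
    try:
        yield
    finally:
        if enabled:          # restore only what we actually changed
            gc.enable()

# Usage: the critical section runs with no risk of a GC pass re-entering
# free() while the heap lock is held.
with disabled_gc():
    pass  # ... take the lock and carve out a block ...
```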
Patch (with test) attached. It disables the garbage collector inside malloc().

Another solution would be to make the Finalizer callback run lock-less.

Suggestions welcome. |
Or you can combine your two ideas: if free() is called indirectly from malloc() (by the garbage collector), free() adds the block to a "pending free" list. Pseudo code:

"""
def __init__(self):
    ...
    self._pending_free = queue.Queue()

def _exec_pending_free(self):
    while True:
        try:
            block = self._pending_free.get_nowait()
        except queue.Empty:
            break
        self._free(block)

def free(self, block):
    if self._lock.acquire(False):
        try:
            self._exec_pending_free()
            self._free(block)
        finally:
            self._lock.release()
    else:
        # malloc holds the lock
        self._pending_free.put(block)

def malloc(self, size):
    with self._lock:
        self._malloc()
        self._exec_pending_free()
"""

Problem: if self._pending_free.put() enqueues a block after malloc() has already exited, the free will have to wait until the next call to malloc() or free(). I don't know if this case (free() called while malloc() is running, but malloc() exiting before free()) can really occur: this issue is a deadlock because free() is "called" from malloc(), and so malloc() waits until the free() is done. It might occur if the garbage collector calls free() after _exec_pending_free() but before the lock is released. |
[...]

_pending_free uses a lock internally to make it thread-safe. Anyway, this seems complicated and error-prone; disabling the gc seems simpler. |
You are probably right. Can't we use a lock-less list? list.append is atomic in CPython.

If two threads call free() at the same time, the "second" (taking the GIL) [...]

I don't like touching such a global "variable", but you are right. |
[...]
Well, I don't like it either, but I can't really think of any other solution. |
Why not just something like: [...]

Lock-less lists are not strictly atomic (only on CPython), but I doubt [...] |
Here's a patch based on the second approach. |
heap_gc_deadlock_lockless.diff: _free_pending_blocks() and free() execute the following instructions in a different order, is that a problem?

+ self._free(block)

vs

+ self._allocated_blocks.remove(block)

You may call _free_pending_blocks() just after lock.acquire() and a second time before lock.release(); it is maybe overkill, but it should reduce the probability of the delayed-free problem. You may also document that _pending_free_blocks.append() and _pending_free_blocks.pop() are atomic in CPython and don't need a specific lock. |
Oops, I completely skipped your long comment explaining everything! It is enough. |
There are different techniques to work around this issue. My preferred one is heap_gc_deadlock_lockless.diff, because it has fewer side effects and a well-defined behaviour. |
Nice work! I also think heap_gc_deadlock_lockless.diff is good, except for Victor's reservation: is it deliberate that you reversed the following two statements in _free_pending_blocks(), compared to the code in free()? + self._free(block) |
No, it's not deliberate (it shouldn't have any impact since they're |
Updated patch. |
The last heap_gc_deadlock_lockless.diff looks good.

Note: please try to use different filenames for different versions of the same patch, for example by adding a number (heap_gc_deadlock_lockless-2.diff) to the name. |
+ if gc.isenabled():

It seems you won't restore the thresholds if the GC wasn't enabled at first. |
New changeset 96a0788583c6 by Charles-François Natali in branch '2.7': |
New changeset 874143242d79 by Charles-François Natali in branch '2.7': |
New changeset 0d4ca1e77205 by Charles-François Natali in branch '3.1': |
New changeset 37606505b227 by Charles-François Natali in branch '3.2': |
New changeset fd8dc3746992 by Charles-François Natali in branch 'default': |
The test passes on all the buildbots, closing. |