Issue 42963: [multiprocessing] Calling pool.terminate() from an error_callback causes deadlock

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/87129

classification

Title:	[multiprocessing] Calling pool.terminate() from an error_callback causes deadlock
Type:	behavior	Stage:
Components:	Library (Lib)	Versions:	Python 3.7, Python 3.6

process

Status:	open	Resolution:
Dependencies:		Superseder:
Assigned To:		Nosy List:	sjelin
Priority:	normal	Keywords:

Created on 2021-01-19 01:08 by sjelin, last changed 2022-04-11 14:59 by admin.

Messages (1)

msg385240 - (view)

Author: Sammy Jelin (sjelin)

Date: 2021-01-19 01:08

As the title says, calling `pool.terminate()` inside an `error_callback` handler causes a deadlock.  The deadlock always happens, so it's not a race condition or thread-safety issue.

Simple repro:

```
from multiprocessing import Pool
p = Pool()

def error_callback(x):
    print(f'error: {x!r}')
    p.terminate()
    print('this message is never seen, because p.terminate() deadlocks')

p.apply_async(lambda: None, error_callback=error_callback)

# The following lines technically aren't thread-safe,
# but I manually verified that that wasn't the problem.
p.close()
p.join()
print('this is also never seen, because the task handler is stuck in the deadlock')
```

This will print the following line and then hang:
```
error: PicklingError("Can't pickle <function <lambda> at 0x112c55e18>: attribute lookup <lambda> on __main__ failed")
```

The hang occurs inside `_help_stuff_finish`, at the call to `inqueue._rlock.acquire()`.  As far as I can tell, `_handle_tasks` is already holding the lock when it calls the `error_callback`, so a deadlock occurs when `_terminate_pool` calls `_help_stuff_finish`.
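
To make the suspected shape of the hang concrete, here is a minimal standalone sketch. It does not use the pool internals at all: a plain `threading.Lock` stands in for `inqueue._rlock`, and `handler_like` and `terminate_like` are hypothetical stand-ins for `_handle_tasks` and `_help_stuff_finish`. It only illustrates the scenario hypothesized above (a callback trying to take a non-reentrant lock that its caller already holds), which may not be exactly what the pool does. A timeout is used so the sketch returns instead of hanging.

```python
import threading

# Non-reentrant lock, standing in for the queue's read lock (assumption).
lock = threading.Lock()


def terminate_like():
    # Stand-in for the cleanup path: it needs the lock before it can proceed.
    # The real call has no timeout and would block forever; the timeout here
    # only exists so this sketch terminates.
    acquired = lock.acquire(timeout=0.2)
    if acquired:
        lock.release()
    return acquired


def handler_like(callback):
    # Stand-in for a handler that runs a user callback while holding the lock.
    with lock:
        return callback()


reacquired = handler_like(terminate_like)
print(f"re-acquired inside callback: {reacquired}")
```

Called on its own, outside `handler_like`, `terminate_like()` acquires the lock without trouble, which would be consistent with `terminate()` only hanging when it is reached from inside the callback.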

Calling `p.terminate()` from a success callback doesn't appear to be an issue, likely because success callbacks are called via `_handle_results` instead of via `_handle_tasks`.

If calling `p.terminate()` from an error_callback isn't supported, that should be one of those big red warnings in the documentation.
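
In the meantime, one possible way to sidestep the problem is to have the `error_callback` only record the failure and signal another thread, and to call `terminate()` from the thread that owns the pool. The sketch below is an assumption-laden illustration, not a confirmed fix: `DeferredTerminate` is a hypothetical helper name, and I have not verified it against the affected versions.

```python
import threading
from multiprocessing import Pool


class DeferredTerminate:
    """Hypothetical helper: record the error in the callback, terminate elsewhere."""

    def __init__(self):
        self._failed = threading.Event()
        self.error = None

    def on_error(self, exc):
        # Runs on one of the pool's internal threads, so it only records the
        # exception and signals. It never calls pool.terminate() itself.
        self.error = exc
        self._failed.set()

    def wait_and_terminate(self, pool, timeout=None):
        # Runs on the caller's thread. Returns True if an error was signalled
        # and the pool was terminated, False if the wait timed out.
        if self._failed.wait(timeout):
            pool.terminate()
            return True
        return False


def main():
    guard = DeferredTerminate()
    pool = Pool(2)
    # A lambda cannot be pickled, so sending the task fails and
    # error_callback fires, as in the repro above.
    pool.apply_async(lambda: None, error_callback=guard.on_error)
    if guard.wait_and_terminate(pool, timeout=10):
        print(f"terminated after error: {guard.error!r}")
    else:
        pool.close()
    pool.join()


if __name__ == "__main__":
    main()
```

The design choice is simply to keep the callback as small as possible, so that nothing inside it depends on the pool's own handler threads making progress.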

I verified that this bug happens on 3.6 and 3.7, and I skimmed the code in the GitHub project to confirm that it is likely still an issue.

History
Date	User	Action	Args
2022-04-11 14:59:40	admin	set	github: 87129
2021-01-19 01:10:42	sjelin	set	type: behavior
2021-01-19 01:08:51	sjelin	create