multiprocessing.Pool hangs atexit (and garbage collection sometimes) #82682
The following code hangs on Python 3.8.0rc0 on Ubuntu 19.10 when exiting the interpreter:

```python
from multiprocessing import Pool

class A(object):
    def __init__(self):
        self.pool = Pool(processes=2)

solver = A()
```

When you eventually press Ctrl-C, the traceback is:

```
^CProcess ForkPoolWorker-2:
Error in atexit._run_exitfuncs:
Process ForkPoolWorker-1:
Traceback (most recent call last):
  File "/usr/lib/python3.8/multiprocessing/util.py", line 277, in _run_finalizers
    finalizer()
  File "/usr/lib/python3.8/multiprocessing/util.py", line 201, in __call__
    res = self._callback(*self._args, **self._kwargs)
  File "/usr/lib/python3.8/multiprocessing/pool.py", line 689, in _terminate_pool
    cls._help_stuff_finish(inqueue, task_handler, len(pool))
  File "/usr/lib/python3.8/multiprocessing/pool.py", line 674, in _help_stuff_finish
    inqueue._rlock.acquire()
KeyboardInterrupt
Traceback (most recent call last):
  File "/usr/lib/python3.8/multiprocessing/process.py", line 313, in _bootstrap
    self.run()
  File "/usr/lib/python3.8/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib/python3.8/multiprocessing/pool.py", line 114, in worker
    task = get()
  File "/usr/lib/python3.8/multiprocessing/queues.py", line 355, in get
    with self._rlock:
  File "/usr/lib/python3.8/multiprocessing/synchronize.py", line 95, in __enter__
    return self._semlock.__enter__()
KeyboardInterrupt
Traceback (most recent call last):
  File "/usr/lib/python3.8/multiprocessing/process.py", line 313, in _bootstrap
    self.run()
  File "/usr/lib/python3.8/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib/python3.8/multiprocessing/pool.py", line 114, in worker
    task = get()
  File "/usr/lib/python3.8/multiprocessing/queues.py", line 356, in get
    res = self._reader.recv_bytes()
  File "/usr/lib/python3.8/multiprocessing/connection.py", line 216, in recv_bytes
    buf = self._recv_bytes(maxlength)
  File "/usr/lib/python3.8/multiprocessing/connection.py", line 414, in _recv_bytes
    buf = self._recv(4)
  File "/usr/lib/python3.8/multiprocessing/connection.py", line 379, in _recv
    chunk = read(handle, remaining)
KeyboardInterrupt
```

A similar problem seems to occur during garbage collection when the pool is part of a class, and that class has not yet called

Cross-ref in SciPy: scipy/scipy#10927. It appears to also cause hangs on Travis 3.8-dev builds: https://travis-ci.org/scipy/scipy/jobs/598786785

---
Same issue on Windows 10 and macOS Catalina with Python 3.8.0:

```
C:\Users\x>py -3.8
Python 3.8.0 (tags/v3.8.0:fa919fd, Oct 14 2019, 19:37:50) [MSC v.1916 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> from multiprocessing import get_context
>>>
>>> class A(object):
...     def __init__(self):
...         self.pool = get_context(None).Pool(2)
...
>>> solver=A()
>>> quit()
Process SpawnPoolWorker-2:
Error in atexit._run_exitfuncs:
Process SpawnPoolWorker-1:
Traceback (most recent call last):
  File "C:\Users\x\AppData\Local\Programs\Python\Python38\lib\multiprocessing\process.py", line 313, in _bootstrap
    self.run()
  File "C:\Users\x\AppData\Local\Programs\Python\Python38\lib\multiprocessing\process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "C:\Users\x\AppData\Local\Programs\Python\Python38\lib\multiprocessing\pool.py", line 114, in worker
    task = get()
  File "C:\Users\x\AppData\Local\Programs\Python\Python38\lib\multiprocessing\queues.py", line 355, in get
    with self._rlock:
  File "C:\Users\x\AppData\Local\Programs\Python\Python38\lib\multiprocessing\synchronize.py", line 95, in __enter__
    return self._semlock.__enter__()
Traceback (most recent call last):
KeyboardInterrupt
Traceback (most recent call last):
  File "C:\Users\x\AppData\Local\Programs\Python\Python38\lib\multiprocessing\util.py", line 277, in _run_finalizers
    finalizer()
  File "C:\Users\x\AppData\Local\Programs\Python\Python38\lib\multiprocessing\util.py", line 201, in __call__
    res = self._callback(*self._args, **self._kwargs)
  File "C:\Users\x\AppData\Local\Programs\Python\Python38\lib\multiprocessing\pool.py", line 689, in _terminate_pool
    cls._help_stuff_finish(inqueue, task_handler, len(pool))
  File "C:\Users\x\AppData\Local\Programs\Python\Python38\lib\multiprocessing\pool.py", line 674, in _help_stuff_finish
    inqueue._rlock.acquire()
KeyboardInterrupt
  File "C:\Users\x\AppData\Local\Programs\Python\Python38\lib\multiprocessing\process.py", line 313, in _bootstrap
    self.run()
  File "C:\Users\x\AppData\Local\Programs\Python\Python38\lib\multiprocessing\process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "C:\Users\x\AppData\Local\Programs\Python\Python38\lib\multiprocessing\pool.py", line 114, in worker
    task = get()
  File "C:\Users\x\AppData\Local\Programs\Python\Python38\lib\multiprocessing\queues.py", line 356, in get
    res = self._reader.recv_bytes()
  File "C:\Users\x\AppData\Local\Programs\Python\Python38\lib\multiprocessing\connection.py", line 216, in recv_bytes
    buf = self._recv_bytes(maxlength)
  File "C:\Users\x\AppData\Local\Programs\Python\Python38\lib\multiprocessing\connection.py", line 305, in _recv_bytes
    waitres = _winapi.WaitForMultipleObjects(
KeyboardInterrupt

C:\Users\x>
```

---
Can we close the issue then? If not, could you provide a reproducer and also check whether the bug is fixed on master or 3.8 HEAD? Thanks!

---
On Linux, the reproducer in https://bugs.python.org/issue38501#msg354813 fails on ac10e0c^ and passes on ac10e0c, which is why I say PR 19009 fixes it. I'm not sure whether there are any special considerations for Windows here. HEAD is good too, so from my point of view this is good to close.

---
There should not be anything Windows-specific here. Also note that although "it works", the reproducer in https://bugs.python.org/issue38501#msg354813 is out of contract: it does not close or terminate the pool correctly, and it does not use the pool as a context manager. If you are dealing with code equivalent to that, I would suggest taking a look at properly finalizing/closing the pool to avoid similar problems (or other problems related to late finalization).

---
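A minimal sketch of the explicit finalization being suggested, using a hypothetical worker function `square` that is not part of the issue:

```python
from multiprocessing import Pool

def square(x):
    # Hypothetical worker, just for illustration.
    return x * x

if __name__ == "__main__":
    # Explicit finalization: close() stops new task submission,
    # join() waits for the worker processes to exit, so nothing
    # is left for atexit finalizers to block on.
    pool = Pool(processes=2)
    try:
        print(pool.map(square, [1, 2, 3]))  # [1, 4, 9]
    finally:
        pool.close()
        pool.join()
```

With this pattern (or the equivalent `with` block), the pool's worker processes are gone before interpreter shutdown, so the hang described above cannot occur.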
If that's out of contract, perhaps there should be a big, visible warning at the top of the multiprocessing docs stating that creating one of these objects requires either using a context manager or ensuring manual cleanup. The first example in the multiprocessing docs used to be:

```python
from multiprocessing import Pool

def f(x):
    return x*x

if __name__ == '__main__':
    p = Pool(5)
    print(p.map(f, [1, 2, 3]))
```

So I suspect this (difficult to track down) problem might hit users without more adequate warning.

and

Or perhaps I misunderstand what this is saying?

---
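For reference, a version of that old docs example rewritten to manage the pool with a context manager, which is one of the fixes suggested in this thread:

```python
from multiprocessing import Pool

def f(x):
    return x * x

if __name__ == '__main__':
    # The with-block terminates the pool on exit, so no pool
    # finalizer is left to hang at interpreter shutdown.
    with Pool(5) as p:
        print(p.map(f, [1, 2, 3]))  # [1, 4, 9]
```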
Why? This is a resource like any other, and it requires proper resource management. Would you also put a big warning on `open()` stating that opening a file requires either using a context manager or ensuring a manual `close()`?

Python 2.7 is not supported, and the pool has changed *a lot* since Python 2. For instance, the pool now does more correct resource management: it does not leak threads, and it supports safer mechanisms such as the context manager. The reason it didn't hang that much in Python 2.7 is likely that some threads were being leaked.

Yeah, and CPython does not promise that the `__del__` method of any object will be called, so it is not assured that the finalizer will call `close()`: https://docs.python.org/3/reference/datamodel.html#object.__del__ says "It is not guaranteed that `__del__()` methods are called for objects that still exist when the interpreter exits."

---
One potential reason would be that the consequences of bad resource management are different in this case than in the `open()` case: here the interpreter hangs, or Travis runs for your repo (SciPy) get stuck with over-50-minute errors, which is how we started trying to track this down.

Indeed, my point is more about potential prevalence: this (now incorrect) usage pattern was the first example in the multiprocessing docs for a long time, which suggests there may be a lot of code in the wild that still uses it.

Ahh, I didn't know that. That might be another good reason to add a warning about the risks of not ensuring closure yourself: others might have the same gap in knowledge I had and assume the docs are saying that things will be taken care of by garbage collection on exit, when they will not. But you of course cannot document every possible problematic situation users could put themselves in, so I leave it up to your judgment whether it's worth it in this case.

---
Agreed, but it is the same problem: resources need to be managed. I am nevertheless OK with adding a targeted warning to the multiprocessing pool docs indicating what can happen if the resource is not properly managed.

Right, we put great effort into making the code such that even incorrect usages do not hang (and believe me, it is *very* challenging), but we cannot sacrifice correct-usage fixes or big improvements just so incorrect usages keep working even while they leak resources. Hopefully, more and more people will start using the context manager, or will at least become aware that they are doing something wrong by leaking the pool.

In conclusion, I agree that adding a targeted warning about this to the pool docs is in order. When I prepare the PR, would you like to review the message to confirm that it is clear enough and makes sense?