classification
Title: multiprocessing.Pool hangs atexit (and garbage collection sometimes)
Type: behavior Stage: resolved
Components: Library (Lib) Versions: Python 3.8
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: Andrew Nelson, Paul Silisteanu, cjrh, larsoner, miss-islington, pablogsal, ralf.gommers, stefanor
Priority: normal Keywords: patch

Created on 2019-10-16 18:31 by larsoner, last changed 2020-04-11 02:11 by miss-islington. This issue is now closed.

Pull Requests
URL Status Linked Edit
PR 11488 pablogsal, 2020-04-10 11:03
PR 19466 merged pablogsal, 2020-04-11 00:52
PR 19467 merged miss-islington, 2020-04-11 02:05
PR 19468 merged miss-islington, 2020-04-11 02:05
Messages (14)
msg354813 - (view) Author: Eric Larson (larsoner) * Date: 2019-10-16 18:31
The following code hangs on Python 3.8.0.rc0 on Ubuntu 19.10 when exiting the interpreter:


from multiprocessing import Pool

class A(object):
    def __init__(self):
        self.pool = Pool(processes=2)

solver = A()


When you eventually do ctrl-C, the traceback is:


^CProcess ForkPoolWorker-2:
Error in atexit._run_exitfuncs:
Process ForkPoolWorker-1:
Traceback (most recent call last):
  File "/usr/lib/python3.8/multiprocessing/util.py", line 277, in _run_finalizers
    finalizer()
  File "/usr/lib/python3.8/multiprocessing/util.py", line 201, in __call__
    res = self._callback(*self._args, **self._kwargs)
  File "/usr/lib/python3.8/multiprocessing/pool.py", line 689, in _terminate_pool
    cls._help_stuff_finish(inqueue, task_handler, len(pool))
  File "/usr/lib/python3.8/multiprocessing/pool.py", line 674, in _help_stuff_finish
    inqueue._rlock.acquire()
KeyboardInterrupt
Traceback (most recent call last):
  File "/usr/lib/python3.8/multiprocessing/process.py", line 313, in _bootstrap
    self.run()
  File "/usr/lib/python3.8/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib/python3.8/multiprocessing/pool.py", line 114, in worker
    task = get()
  File "/usr/lib/python3.8/multiprocessing/queues.py", line 355, in get
    with self._rlock:
  File "/usr/lib/python3.8/multiprocessing/synchronize.py", line 95, in __enter__
    return self._semlock.__enter__()
KeyboardInterrupt
Traceback (most recent call last):
  File "/usr/lib/python3.8/multiprocessing/process.py", line 313, in _bootstrap
    self.run()
  File "/usr/lib/python3.8/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib/python3.8/multiprocessing/pool.py", line 114, in worker
    task = get()
  File "/usr/lib/python3.8/multiprocessing/queues.py", line 356, in get
    res = self._reader.recv_bytes()
  File "/usr/lib/python3.8/multiprocessing/connection.py", line 216, in recv_bytes
    buf = self._recv_bytes(maxlength)
  File "/usr/lib/python3.8/multiprocessing/connection.py", line 414, in _recv_bytes
    buf = self._recv(4)
  File "/usr/lib/python3.8/multiprocessing/connection.py", line 379, in _recv
    chunk = read(handle, remaining)
KeyboardInterrupt


A similar problem seems to occur when garbage collecting when it's part of a class, and that class has not yet called `self.pool.terminate()` and `self.pool.close()`. Hopefully fixing the `atexit` behavior will similarly fix the `gc.collect()` behavior.

Cross-ref is in SciPy, https://github.com/scipy/scipy/issues/10927. It appears to also cause hangs on Travis 3.8-dev builds: https://travis-ci.org/scipy/scipy/jobs/598786785
msg354824 - (view) Author: Paul Silisteanu (Paul Silisteanu) Date: 2019-10-17 01:16
Same issue on Window 10 and macOS Catalina with Python 3.8.0.
Here is the error I get on Windows 10:

C:\Users\x>py -3.8
Python 3.8.0 (tags/v3.8.0:fa919fd, Oct 14 2019, 19:37:50) [MSC v.1916 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> from multiprocessing import get_context
>>>
>>> class A(object):
...     def __init__(self):
...         self.pool = get_context(None).Pool(2)
...
>>> solver=A()
>>> quit()
Process SpawnPoolWorker-2:
Error in atexit._run_exitfuncs:
Process SpawnPoolWorker-1:
Traceback (most recent call last):
  File "C:\Users\x\AppData\Local\Programs\Python\Python38\lib\multiprocessing\process.py", line 313, in _bootstrap
    self.run()
  File "C:\Users\x\AppData\Local\Programs\Python\Python38\lib\multiprocessing\process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "C:\Users\x\AppData\Local\Programs\Python\Python38\lib\multiprocessing\pool.py", line 114, in worker
    task = get()
  File "C:\Users\x\AppData\Local\Programs\Python\Python38\lib\multiprocessing\queues.py", line 355, in get
    with self._rlock:
  File "C:\Users\x\AppData\Local\Programs\Python\Python38\lib\multiprocessing\synchronize.py", line 95, in __enter__
    return self._semlock.__enter__()
Traceback (most recent call last):
KeyboardInterrupt
Traceback (most recent call last):
  File "C:\Users\x\AppData\Local\Programs\Python\Python38\lib\multiprocessing\util.py", line 277, in _run_finalizers
    finalizer()
  File "C:\Users\x\AppData\Local\Programs\Python\Python38\lib\multiprocessing\util.py", line 201, in __call__
    res = self._callback(*self._args, **self._kwargs)
  File "C:\Users\x\AppData\Local\Programs\Python\Python38\lib\multiprocessing\pool.py", line 689, in _terminate_pool
    cls._help_stuff_finish(inqueue, task_handler, len(pool))
  File "C:\Users\x\AppData\Local\Programs\Python\Python38\lib\multiprocessing\pool.py", line 674, in _help_stuff_finish
    inqueue._rlock.acquire()
KeyboardInterrupt
  File "C:\Users\x\AppData\Local\Programs\Python\Python38\lib\multiprocessing\process.py", line 313, in _bootstrap
    self.run()
  File "C:\Users\x\AppData\Local\Programs\Python\Python38\lib\multiprocessing\process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "C:\Users\x\AppData\Local\Programs\Python\Python38\lib\multiprocessing\pool.py", line 114, in worker
    task = get()
  File "C:\Users\x\AppData\Local\Programs\Python\Python38\lib\multiprocessing\queues.py", line 356, in get
    res = self._reader.recv_bytes()
  File "C:\Users\x\AppData\Local\Programs\Python\Python38\lib\multiprocessing\connection.py", line 216, in recv_bytes
    buf = self._recv_bytes(maxlength)
  File "C:\Users\x\AppData\Local\Programs\Python\Python38\lib\multiprocessing\connection.py", line 305, in _recv_bytes
    waitres = _winapi.WaitForMultipleObjects(
KeyboardInterrupt

C:\Users\x>
msg361464 - (view) Author: Stefano Rivera (stefanor) * Date: 2020-02-06 01:19
A bisect points at https://github.com/python/cpython/pull/11488 being the change that introduced this hang (bpo-35493)
msg366133 - (view) Author: Stefano Rivera (stefanor) * Date: 2020-04-10 15:54
Looks like it was fixed by PR 19009 (bpo-39360)).
msg366134 - (view) Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2020-04-10 15:57
>Looks like it was fixed by PR 19009 (bpo-39360)).

Can we close the issue then? If not, could yo provide a reproducer and also check if the big is fixed on master or 3.8 HEAD? Thanks!
msg366135 - (view) Author: Stefano Rivera (stefanor) * Date: 2020-04-10 16:00
On Linux, the reproducer in https://bugs.python.org/issue38501#msg354813 fails on ac10e0c93218627d1a639db0b7b41714c5f6a883^ and passes on ac10e0c93218627d1a639db0b7b41714c5f6a883, which is why I say PR 19009 fixes it.

Not sure if there are any special considerations for Windows here.

HEAD is good too. So from my PoV, good to close this.
msg366136 - (view) Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2020-04-10 16:10
> Not sure if there are any special considerations for Windows here.

It should not be anything windows specific here.

Also notice that although "it works", the reproducer in https://bugs.python.org/issue38501#msg354813 is out of contract because is not closing or terminating the pool correctly and is not using it with the context manager so if you are dealing with code equivalent to that I would suggest to take a look to properly finalize/finalize the pool to about similar problems (or other problems related to late finalization).
msg366137 - (view) Author: Eric Larson (larsoner) * Date: 2020-04-10 16:31
If that's out of contract, perhaps there should probably a big, visible warning at the top of the multiprocessning docs stating that creating one of these objects requires either using a context manager or ensuring manual `.close()`ing?

1. It either used to be in contract not to close it manually or was wrongly represented, the first Python 2.7 example in the docs (https://docs.python.org/2/library/multiprocessing.html#introduction) is:

from multiprocessing import Pool

def f(x):
    return x*x

if __name__ == '__main__':
    p = Pool(5)
    print(p.map(f, [1, 2, 3]))

So I suspect this (difficult to track down) problem might hit users without more adequate warning.


2. I'm surprised it's actually out of contract when the 3.8 docs state that close will be automatically called:

> close()
>
> Indicate that no more data will be put on this queue by the current process. The background thread will quit once it has flushed all buffered data to the pipe. This is called automatically when the queue is garbage collected.

and

> terminate()
>
> Stops the worker processes immediately without completing outstanding work. When the pool object is garbage collected terminate() will be called immediately.

Or perhaps I misunderstand what this is saying?
msg366142 - (view) Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2020-04-10 17:54
> If that's out of contract, perhaps there should probably a big, visible warning at the top of the multiprocessning docs stating that creating one of these objects requires either using a context manager or ensuring manual `.close()`ing?

Why? This is a resource like any other and it requires proper resource management. Would you also put a big warning on "open()" stating that opening a file requires either using a context manager or ensure a manual close()?

>  the first Python 2.7 example in the docs 

Python 2.7 is not supported and the pool has changed *a lot* since Python 2. For instance, the pool now does more correct resource management, it does not leak threads and it supports more safe mechanisms as a context manager. The reason it didn't hang that much in Python2.7 is likely because some threads were being leaked.

> This is called automatically when the queue is garbage collected.

Yeah, and CPython does not promise that the __del__ method of any object will be called, so is not assured that the finalized will call close():

https://docs.python.org/3/reference/datamodel.html#object.__del__

"It is not guaranteed that __del__() methods are called for objects that still exist when the interpreter exits"
msg366145 - (view) Author: Eric Larson (larsoner) * Date: 2020-04-10 18:21
> Why? This is a resource like any other and it requires proper resource management. Would you also put a big warning on "open()" stating that opening a file requires either using a context manager or ensure a manual close()?

One potential reason would be that the consequences of bad resource management in this case are different than in the open() case, i.e., here the interpreter hangs -- or Travis runs for your repo (SciPy) get stuck with over-50-minute errors, which is how we started looking for how to track it down.

> > the first Python 2.7 example in the docs 

> Python 2.7 is not supported and the pool has changed *a lot* since Python 2. 

Indeed, my point is more about potential prevalence: this (now incorrect) problematic usage pattern was the first example in the docs for multiprocessing for a long time, indicating that there might be a lot of code in the wild that might (still) make use of it.

> Yeah, and CPython does not promise that the __del__ method of any object will be called, so is not assured that the finalized will call close():

Ahh, I didn't know that. This might be another good reason to add a warning about the risks of not ensuring closure yourself: others might have the same gap in knowledge I had, and assume the docs are saying that things will be taken care of by garbage collection on exit when they will not.

But you of course cannot document every possible problematic situation users could put themselves in, so I leave it up to your judgment whether or not it's worth it in this case.
msg366147 - (view) Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2020-04-10 18:53
> One potential reason would be that the consequences of bad resource management in this case are different than in the open() case, i.e., here the interpreter hangs -- or Travis runs for your repo (SciPy) get stuck with over-50-minute errors, which is how we started looking for how to track it down.

Agreed, but is the same problem: resources need to be managed. I am although ok if you want to add some targeted warning to the multiprocessing pool docs indicating what can happen if the resource is not properly managed.

> Indeed, my point is more about potential prevalence: this (now incorrect) problematic usage pattern was the first example in the docs for multiprocessing for a long time, indicating that there might be a lot of code in the wild that might (still) make use of it.

Right, we put great efforts to make the code such that even incorrect usages do not hang (and believe me, is *very* challenging) but we cannot sacrifice correct usage fixes or big improvements so incorrect usages keep working even if they are leaking resources. Hopefully, more and more people start using the context manager or are aware that are doing something wrong leaking the pool.

----

In conclusion, I agree that maybe adding some targetted warning to the pool docs about this is in place. When I prepare the PR, would you like to review the message to confirm that is clear enough and that makes sense?
msg366178 - (view) Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2020-04-11 02:05
New changeset 7ec43a73092d43c6c95e7dd2669f49d54b57966f by Pablo Galindo in branch 'master':
bpo-38501: Add a warning section to multiprocessing.Pool docs about resource managing (GH-19466)
https://github.com/python/cpython/commit/7ec43a73092d43c6c95e7dd2669f49d54b57966f
msg366179 - (view) Author: miss-islington (miss-islington) Date: 2020-04-11 02:11
New changeset 2e49b5254aa0888dd21a7929be14b6cfe03d56ef by Miss Islington (bot) in branch '3.7':
bpo-38501: Add a warning section to multiprocessing.Pool docs about resource managing (GH-19466)
https://github.com/python/cpython/commit/2e49b5254aa0888dd21a7929be14b6cfe03d56ef
msg366180 - (view) Author: miss-islington (miss-islington) Date: 2020-04-11 02:11
New changeset ceba0648d70524ec4ebb7a46d5d1162d4938c7ba by Miss Islington (bot) in branch '3.8':
bpo-38501: Add a warning section to multiprocessing.Pool docs about resource managing (GH-19466)
https://github.com/python/cpython/commit/ceba0648d70524ec4ebb7a46d5d1162d4938c7ba
History
Date User Action Args
2020-04-11 02:11:20miss-islingtonsetmessages: + msg366180
2020-04-11 02:11:19miss-islingtonsetmessages: + msg366179
2020-04-11 02:05:59miss-islingtonsetpull_requests: + pull_request18823
2020-04-11 02:05:52miss-islingtonsetnosy: + miss-islington

pull_requests: + pull_request18822
2020-04-11 02:05:43pablogsalsetmessages: + msg366178
2020-04-11 00:52:59pablogsalsetpull_requests: + pull_request18821
2020-04-10 18:53:20pablogsalsetmessages: + msg366147
2020-04-10 18:21:32larsonersetmessages: + msg366145
2020-04-10 17:54:06pablogsalsetmessages: + msg366142
2020-04-10 16:31:07larsonersetmessages: + msg366137
2020-04-10 16:10:33pablogsalsetstatus: open -> closed
resolution: fixed
messages: + msg366136

stage: patch review -> resolved
2020-04-10 16:00:53stefanorsetmessages: + msg366135
2020-04-10 15:57:36pablogsalsetmessages: + msg366134
2020-04-10 15:54:00stefanorsetmessages: + msg366133
2020-04-10 11:03:43pablogsalsetkeywords: + patch
nosy: + pablogsal

pull_requests: + pull_request18808
stage: patch review
2020-04-09 01:52:45Andrew Nelsonsetnosy: + Andrew Nelson
2020-02-06 01:19:21stefanorsetnosy: + stefanor
messages: + msg361464
2019-11-09 14:58:29ralf.gommerssetnosy: + ralf.gommers
2019-10-19 12:27:36cjrhsetnosy: + cjrh
2019-10-17 01:16:41Paul Silisteanusetnosy: + Paul Silisteanu
messages: + msg354824
2019-10-16 18:31:14larsonercreate