
multiprocessing.Pool hangs atexit (and garbage collection sometimes) #82682

Closed
larsoner mannequin opened this issue Oct 16, 2019 · 14 comments
Labels
3.8 (only security fixes), stdlib (Python modules in the Lib dir), type-bug (An unexpected behavior, bug, or error)

Comments


larsoner mannequin commented Oct 16, 2019

BPO 38501
Nosy @cjrh, @stefanor, @pablogsal, @miss-islington, @larsoner
PRs
  • bpo-35493: Use Process.sentinel instead of sleeping for polling worker status in multiprocessing.Pool #11488
  • bpo-38501: Add a warning section to multiprocessing.Pool docs about resource managing #19466
  • [3.8] bpo-38501: Add a warning section to multiprocessing.Pool docs about resource managing (GH-19466) #19467
  • [3.7] bpo-38501: Add a warning section to multiprocessing.Pool docs about resource managing (GH-19466) #19468
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = None
    closed_at = <Date 2020-04-10.16:10:33.859>
    created_at = <Date 2019-10-16.18:31:14.663>
    labels = ['3.8', 'type-bug', 'library']
    title = 'multiprocessing.Pool hangs atexit (and garbage collection sometimes)'
    updated_at = <Date 2020-04-11.02:11:20.736>
    user = 'https://github.com/larsoner'

    bugs.python.org fields:

    activity = <Date 2020-04-11.02:11:20.736>
    actor = 'miss-islington'
    assignee = 'none'
    closed = True
    closed_date = <Date 2020-04-10.16:10:33.859>
    closer = 'pablogsal'
    components = ['Library (Lib)']
    creation = <Date 2019-10-16.18:31:14.663>
    creator = 'larsoner'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 38501
    keywords = ['patch']
    message_count = 14.0
    messages = ['354813', '354824', '361464', '366133', '366134', '366135', '366136', '366137', '366142', '366145', '366147', '366178', '366179', '366180']
    nosy_count = 8.0
    nosy_names = ['cjrh', 'stefanor', 'ralf.gommers', 'pablogsal', 'miss-islington', 'larsoner', 'Andrew Nelson', 'Paul Silisteanu']
    pr_nums = ['11488', '19466', '19467', '19468']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue38501'
    versions = ['Python 3.8']


    larsoner mannequin commented Oct 16, 2019

    The following code hangs on Python 3.8.0.rc0 on Ubuntu 19.10 when exiting the interpreter:

    from multiprocessing import Pool
    
    class A(object):
        def __init__(self):
            self.pool = Pool(processes=2)
    
    solver = A()

    When you eventually do ctrl-C, the traceback is:

    ^CProcess ForkPoolWorker-2:
    Error in atexit._run_exitfuncs:
    Process ForkPoolWorker-1:
    Traceback (most recent call last):
      File "/usr/lib/python3.8/multiprocessing/util.py", line 277, in _run_finalizers
        finalizer()
      File "/usr/lib/python3.8/multiprocessing/util.py", line 201, in __call__
        res = self._callback(*self._args, **self._kwargs)
      File "/usr/lib/python3.8/multiprocessing/pool.py", line 689, in _terminate_pool
        cls._help_stuff_finish(inqueue, task_handler, len(pool))
      File "/usr/lib/python3.8/multiprocessing/pool.py", line 674, in _help_stuff_finish
        inqueue._rlock.acquire()
    KeyboardInterrupt
    Traceback (most recent call last):
      File "/usr/lib/python3.8/multiprocessing/process.py", line 313, in _bootstrap
        self.run()
      File "/usr/lib/python3.8/multiprocessing/process.py", line 108, in run
        self._target(*self._args, **self._kwargs)
      File "/usr/lib/python3.8/multiprocessing/pool.py", line 114, in worker
        task = get()
      File "/usr/lib/python3.8/multiprocessing/queues.py", line 355, in get
        with self._rlock:
      File "/usr/lib/python3.8/multiprocessing/synchronize.py", line 95, in __enter__
        return self._semlock.__enter__()
    KeyboardInterrupt
    Traceback (most recent call last):
      File "/usr/lib/python3.8/multiprocessing/process.py", line 313, in _bootstrap
        self.run()
      File "/usr/lib/python3.8/multiprocessing/process.py", line 108, in run
        self._target(*self._args, **self._kwargs)
      File "/usr/lib/python3.8/multiprocessing/pool.py", line 114, in worker
        task = get()
      File "/usr/lib/python3.8/multiprocessing/queues.py", line 356, in get
        res = self._reader.recv_bytes()
      File "/usr/lib/python3.8/multiprocessing/connection.py", line 216, in recv_bytes
        buf = self._recv_bytes(maxlength)
      File "/usr/lib/python3.8/multiprocessing/connection.py", line 414, in _recv_bytes
        buf = self._recv(4)
      File "/usr/lib/python3.8/multiprocessing/connection.py", line 379, in _recv
        chunk = read(handle, remaining)
    KeyboardInterrupt

    A similar problem seems to occur during garbage collection when the pool is part of a class instance that has not yet called self.pool.terminate() and self.pool.close(). Hopefully fixing the atexit behavior will similarly fix the gc.collect() behavior.

    Cross-ref is in SciPy, scipy/scipy#10927. It appears to also cause hangs on Travis 3.8-dev builds: https://travis-ci.org/scipy/scipy/jobs/598786785
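    One way to avoid the hang with a pool held by a class, sketched against the reproducer above (the explicit close() method here is illustrative and not part of the original report):

```python
from multiprocessing import Pool

def square(x):
    return x * x

class A(object):
    def __init__(self):
        self.pool = Pool(processes=2)

    def close(self):
        # Shut the pool down explicitly so nothing is left for the
        # interpreter's atexit machinery to finalize.
        self.pool.close()
        self.pool.join()

if __name__ == '__main__':
    solver = A()
    print(solver.pool.map(square, [1, 2, 3]))  # [1, 4, 9]
    solver.close()
```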

    larsoner mannequin added the 3.8, stdlib, and type-bug labels Oct 16, 2019

    PaulSilisteanu mannequin commented Oct 17, 2019

    Same issue on Windows 10 and macOS Catalina with Python 3.8.0.
    Here is the error I get on Windows 10:

    C:\Users\x>py -3.8
    Python 3.8.0 (tags/v3.8.0:fa919fd, Oct 14 2019, 19:37:50) [MSC v.1916 64 bit (AMD64)] on win32
    Type "help", "copyright", "credits" or "license" for more information.
    >>> from multiprocessing import get_context
    >>>
    >>> class A(object):
    ...     def __init__(self):
    ...         self.pool = get_context(None).Pool(2)
    ...
    >>> solver=A()
    >>> quit()
    Process SpawnPoolWorker-2:
    Error in atexit._run_exitfuncs:
    Process SpawnPoolWorker-1:
    Traceback (most recent call last):
      File "C:\Users\x\AppData\Local\Programs\Python\Python38\lib\multiprocessing\process.py", line 313, in _bootstrap
        self.run()
      File "C:\Users\x\AppData\Local\Programs\Python\Python38\lib\multiprocessing\process.py", line 108, in run
        self._target(*self._args, **self._kwargs)
      File "C:\Users\x\AppData\Local\Programs\Python\Python38\lib\multiprocessing\pool.py", line 114, in worker
        task = get()
      File "C:\Users\x\AppData\Local\Programs\Python\Python38\lib\multiprocessing\queues.py", line 355, in get
        with self._rlock:
      File "C:\Users\x\AppData\Local\Programs\Python\Python38\lib\multiprocessing\synchronize.py", line 95, in __enter__
        return self._semlock.__enter__()
    Traceback (most recent call last):
    KeyboardInterrupt
    Traceback (most recent call last):
      File "C:\Users\x\AppData\Local\Programs\Python\Python38\lib\multiprocessing\util.py", line 277, in _run_finalizers
        finalizer()
      File "C:\Users\x\AppData\Local\Programs\Python\Python38\lib\multiprocessing\util.py", line 201, in __call__
        res = self._callback(*self._args, **self._kwargs)
      File "C:\Users\x\AppData\Local\Programs\Python\Python38\lib\multiprocessing\pool.py", line 689, in _terminate_pool
        cls._help_stuff_finish(inqueue, task_handler, len(pool))
      File "C:\Users\x\AppData\Local\Programs\Python\Python38\lib\multiprocessing\pool.py", line 674, in _help_stuff_finish
        inqueue._rlock.acquire()
    KeyboardInterrupt
      File "C:\Users\x\AppData\Local\Programs\Python\Python38\lib\multiprocessing\process.py", line 313, in _bootstrap
        self.run()
      File "C:\Users\x\AppData\Local\Programs\Python\Python38\lib\multiprocessing\process.py", line 108, in run
        self._target(*self._args, **self._kwargs)
      File "C:\Users\x\AppData\Local\Programs\Python\Python38\lib\multiprocessing\pool.py", line 114, in worker
        task = get()
      File "C:\Users\x\AppData\Local\Programs\Python\Python38\lib\multiprocessing\queues.py", line 356, in get
        res = self._reader.recv_bytes()
      File "C:\Users\x\AppData\Local\Programs\Python\Python38\lib\multiprocessing\connection.py", line 216, in recv_bytes
        buf = self._recv_bytes(maxlength)
      File "C:\Users\x\AppData\Local\Programs\Python\Python38\lib\multiprocessing\connection.py", line 305, in _recv_bytes
        waitres = _winapi.WaitForMultipleObjects(
    KeyboardInterrupt

    C:\Users\x>


    stefanor mannequin commented Feb 6, 2020

    A bisect points at #11488 being the change that introduced this hang (bpo-35493)


    stefanor mannequin commented Apr 10, 2020

    Looks like it was fixed by PR 19009 (bpo-39360).

    pablogsal (Member) commented:

    > Looks like it was fixed by PR 19009 (bpo-39360).

    Can we close the issue then? If not, could you provide a reproducer and also check if the bug is fixed on master or 3.8 HEAD? Thanks!


    stefanor mannequin commented Apr 10, 2020

    On Linux, the reproducer in https://bugs.python.org/issue38501#msg354813 fails on ac10e0c^ and passes on ac10e0c, which is why I say PR 19009 fixes it.

    Not sure if there are any special considerations for Windows here.

    HEAD is good too. So from my PoV, good to close this.

    pablogsal (Member) commented:

    > Not sure if there are any special considerations for Windows here.

    There should not be anything Windows-specific here.

    Also notice that although "it works", the reproducer in https://bugs.python.org/issue38501#msg354813 is out of contract, because it does not close or terminate the pool correctly and does not use it as a context manager. If you are dealing with code equivalent to that, I would suggest taking a look at properly finalizing the pool to avoid similar problems (or other problems related to late finalization).
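    The in-contract pattern described here can be sketched as follows (a minimal example; per the docs, the pool's context manager calls terminate() on exit):

```python
from multiprocessing import Pool

def square(x):
    return x * x

if __name__ == '__main__':
    # The context manager terminates the pool on exit, so no
    # finalizer is left to run (and possibly hang) at interpreter exit.
    with Pool(processes=2) as pool:
        results = pool.map(square, [1, 2, 3])
    print(results)  # [1, 4, 9]
```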


    larsoner mannequin commented Apr 10, 2020

    If that's out of contract, perhaps there should be a big, visible warning at the top of the multiprocessing docs stating that creating one of these objects requires either using a context manager or ensuring a manual .close()?

    1. It either used to be in contract not to close it manually, or this was wrongly represented: the first Python 2.7 example in the docs (https://docs.python.org/2/library/multiprocessing.html#introduction) is:
    from multiprocessing import Pool
    
    def f(x):
        return x*x
    
    if __name__ == '__main__':
        p = Pool(5)
        print(p.map(f, [1, 2, 3]))

    So I suspect this (difficult to track down) problem might hit users without a more prominent warning.

    2. I'm surprised it's actually out of contract, given that the 3.8 docs state that close will be called automatically:

    close()

    Indicate that no more data will be put on this queue by the current process. The background thread will quit once it has flushed all buffered data to the pipe. This is called automatically when the queue is garbage collected.

    and

    terminate()

    Stops the worker processes immediately without completing outstanding work. When the pool object is garbage collected terminate() will be called immediately.

    Or perhaps I misunderstand what this is saying?

    pablogsal (Member) commented:

    > If that's out of contract, perhaps there should be a big, visible warning at the top of the multiprocessing docs stating that creating one of these objects requires either using a context manager or ensuring a manual .close()?

    Why? This is a resource like any other and it requires proper resource management. Would you also put a big warning on "open()" stating that opening a file requires either using a context manager or ensuring a manual close()?

    > the first Python 2.7 example in the docs

    Python 2.7 is not supported and the pool has changed *a lot* since Python 2. For instance, the pool now does more correct resource management, it does not leak threads, and it supports safer mechanisms such as the context manager. The reason it didn't hang that much in Python 2.7 is likely because some threads were being leaked.

    > This is called automatically when the queue is garbage collected.

    Yeah, and CPython does not promise that the __del__ method of any object will be called, so it is not assured that the finalizer will call close():

    https://docs.python.org/3/reference/datamodel.html#object.__del__

    "It is not guaranteed that __del__() methods are called for objects that still exist when the interpreter exits"
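    Since __del__ is not guaranteed at interpreter exit, one explicit alternative (an illustrative sketch, not something proposed in the thread) is to register the cleanup with atexit, whose callbacks do run on a normal interpreter shutdown:

```python
import atexit
from multiprocessing import Pool

def square(x):
    return x * x

if __name__ == '__main__':
    pool = Pool(processes=2)
    # atexit callbacks run in LIFO order, so register join first and
    # close last: close() then runs before join(), as required.
    atexit.register(pool.join)
    atexit.register(pool.close)
    print(pool.map(square, [1, 2, 3]))  # [1, 4, 9]
```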


    larsoner mannequin commented Apr 10, 2020

    > Why? This is a resource like any other and it requires proper resource management. Would you also put a big warning on "open()" stating that opening a file requires either using a context manager or ensuring a manual close()?

    One potential reason would be that the consequences of bad resource management in this case are different than in the open() case, i.e., here the interpreter hangs -- or Travis runs for your repo (SciPy) get stuck with over-50-minute errors, which is how we started looking into how to track it down.

    > the first Python 2.7 example in the docs

    Python 2.7 is not supported and the pool has changed *a lot* since Python 2.

    Indeed, my point is more about potential prevalence: this (now incorrect) usage pattern was the first example in the multiprocessing docs for a long time, suggesting that there may be a lot of code in the wild that still uses it.

    > Yeah, and CPython does not promise that the __del__ method of any object will be called, so it is not assured that the finalizer will call close():

    Ahh, I didn't know that. This might be another good reason to add a warning about the risks of not ensuring closure yourself: others might have the same gap in knowledge I had, and assume the docs are saying that things will be taken care of by garbage collection on exit when they will not.

    But you of course cannot document every possible problematic situation users could put themselves in, so I leave it up to your judgment whether or not it's worth it in this case.

    pablogsal (Member) commented:

    > One potential reason would be that the consequences of bad resource management in this case are different than in the open() case, i.e., here the interpreter hangs -- or Travis runs for your repo (SciPy) get stuck with over-50-minute errors, which is how we started looking into how to track it down.

    Agreed, but it is the same problem: resources need to be managed. I am nevertheless OK if you want to add some targeted warning to the multiprocessing pool docs indicating what can happen if the resource is not properly managed.

    > Indeed, my point is more about potential prevalence: this (now incorrect) usage pattern was the first example in the multiprocessing docs for a long time, suggesting that there may be a lot of code in the wild that still uses it.

    Right, we put great effort into making the code such that even incorrect usages do not hang (and believe me, it is *very* challenging), but we cannot sacrifice correct-usage fixes or big improvements just so incorrect usages keep working even while they leak resources. Hopefully, more and more people will start using the context manager, or become aware that they are doing something wrong by leaking the pool.

    ----

    In conclusion, I agree that adding some targeted warning to the pool docs about this is in order. When I prepare the PR, would you like to review the message to confirm that it is clear enough and makes sense?

    pablogsal (Member) commented:

    New changeset 7ec43a7 by Pablo Galindo in branch 'master':
    bpo-38501: Add a warning section to multiprocessing.Pool docs about resource managing (GH-19466)

    miss-islington (Contributor) commented:

    New changeset 2e49b52 by Miss Islington (bot) in branch '3.7':
    bpo-38501: Add a warning section to multiprocessing.Pool docs about resource managing (GH-19466)

    miss-islington (Contributor) commented:

    New changeset ceba064 by Miss Islington (bot) in branch '3.8':
    bpo-38501: Add a warning section to multiprocessing.Pool docs about resource managing (GH-19466)

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022