
multiprocessing.Pool hangs atexit (and garbage collection sometimes) #82682

Closed
larsoner mannequin opened this issue Oct 16, 2019 · 14 comments
Labels
3.8 (only security fixes), stdlib (Python modules in the Lib dir), type-bug (An unexpected behavior, bug, or error)

Comments


larsoner mannequin commented Oct 16, 2019

BPO 38501
Nosy @cjrh, @stefanor, @pablogsal, @miss-islington, @larsoner
PRs
  • bpo-35493: Use Process.sentinel instead of sleeping for polling worker status in multiprocessing.Pool #11488
  • bpo-38501: Add a warning section to multiprocessing.Pool docs about resource managing #19466
  • [3.8] bpo-38501: Add a warning section to multiprocessing.Pool docs about resource managing (GH-19466) #19467
  • [3.7] bpo-38501: Add a warning section to multiprocessing.Pool docs about resource managing (GH-19466) #19468
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = None
    closed_at = <Date 2020-04-10.16:10:33.859>
    created_at = <Date 2019-10-16.18:31:14.663>
    labels = ['3.8', 'type-bug', 'library']
    title = 'multiprocessing.Pool hangs atexit (and garbage collection sometimes)'
    updated_at = <Date 2020-04-11.02:11:20.736>
    user = 'https://github.com/larsoner'

    bugs.python.org fields:

    activity = <Date 2020-04-11.02:11:20.736>
    actor = 'miss-islington'
    assignee = 'none'
    closed = True
    closed_date = <Date 2020-04-10.16:10:33.859>
    closer = 'pablogsal'
    components = ['Library (Lib)']
    creation = <Date 2019-10-16.18:31:14.663>
    creator = 'larsoner'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 38501
    keywords = ['patch']
    message_count = 14.0
    messages = ['354813', '354824', '361464', '366133', '366134', '366135', '366136', '366137', '366142', '366145', '366147', '366178', '366179', '366180']
    nosy_count = 8.0
    nosy_names = ['cjrh', 'stefanor', 'ralf.gommers', 'pablogsal', 'miss-islington', 'larsoner', 'Andrew Nelson', 'Paul Silisteanu']
    pr_nums = ['11488', '19466', '19467', '19468']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue38501'
    versions = ['Python 3.8']


    larsoner mannequin commented Oct 16, 2019

    The following code hangs on Python 3.8.0.rc0 on Ubuntu 19.10 when exiting the interpreter:

    from multiprocessing import Pool
    
    class A(object):
        def __init__(self):
            self.pool = Pool(processes=2)
    
    solver = A()

    When you eventually do ctrl-C, the traceback is:

    ^CProcess ForkPoolWorker-2:
    Error in atexit._run_exitfuncs:
    Process ForkPoolWorker-1:
    Traceback (most recent call last):
      File "/usr/lib/python3.8/multiprocessing/util.py", line 277, in _run_finalizers
        finalizer()
      File "/usr/lib/python3.8/multiprocessing/util.py", line 201, in __call__
        res = self._callback(*self._args, **self._kwargs)
      File "/usr/lib/python3.8/multiprocessing/pool.py", line 689, in _terminate_pool
        cls._help_stuff_finish(inqueue, task_handler, len(pool))
      File "/usr/lib/python3.8/multiprocessing/pool.py", line 674, in _help_stuff_finish
        inqueue._rlock.acquire()
    KeyboardInterrupt
    Traceback (most recent call last):
      File "/usr/lib/python3.8/multiprocessing/process.py", line 313, in _bootstrap
        self.run()
      File "/usr/lib/python3.8/multiprocessing/process.py", line 108, in run
        self._target(*self._args, **self._kwargs)
      File "/usr/lib/python3.8/multiprocessing/pool.py", line 114, in worker
        task = get()
      File "/usr/lib/python3.8/multiprocessing/queues.py", line 355, in get
        with self._rlock:
      File "/usr/lib/python3.8/multiprocessing/synchronize.py", line 95, in __enter__
        return self._semlock.__enter__()
    KeyboardInterrupt
    Traceback (most recent call last):
      File "/usr/lib/python3.8/multiprocessing/process.py", line 313, in _bootstrap
        self.run()
      File "/usr/lib/python3.8/multiprocessing/process.py", line 108, in run
        self._target(*self._args, **self._kwargs)
      File "/usr/lib/python3.8/multiprocessing/pool.py", line 114, in worker
        task = get()
      File "/usr/lib/python3.8/multiprocessing/queues.py", line 356, in get
        res = self._reader.recv_bytes()
      File "/usr/lib/python3.8/multiprocessing/connection.py", line 216, in recv_bytes
        buf = self._recv_bytes(maxlength)
      File "/usr/lib/python3.8/multiprocessing/connection.py", line 414, in _recv_bytes
        buf = self._recv(4)
      File "/usr/lib/python3.8/multiprocessing/connection.py", line 379, in _recv
        chunk = read(handle, remaining)
    KeyboardInterrupt

    A similar problem seems to occur during garbage collection when the pool is part of a class instance that has not yet called self.pool.terminate() and self.pool.close(). Hopefully fixing the atexit behavior will similarly fix the gc.collect() behavior.

    Cross-ref is in SciPy, scipy/scipy#10927. It appears to also cause hangs on Travis 3.8-dev builds: https://travis-ci.org/scipy/scipy/jobs/598786785
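    One way to avoid the hang with a pool held by a class, sketched against the reproducer above (the explicit close() method here is illustrative and not part of the original report):

```python
from multiprocessing import Pool

def square(x):
    return x * x

class A(object):
    def __init__(self):
        self.pool = Pool(processes=2)

    def close(self):
        # Shut the pool down explicitly so nothing is left for the
        # interpreter's atexit machinery to finalize.
        self.pool.close()
        self.pool.join()

if __name__ == '__main__':
    solver = A()
    print(solver.pool.map(square, [1, 2, 3]))  # [1, 4, 9]
    solver.close()
```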

    larsoner mannequin added the 3.8, stdlib, and type-bug labels Oct 16, 2019

    PaulSilisteanu mannequin commented Oct 17, 2019

    Same issue on Windows 10 and macOS Catalina with Python 3.8.0.
    Here is the error I get on Windows 10:

    C:\Users\x>py -3.8
    Python 3.8.0 (tags/v3.8.0:fa919fd, Oct 14 2019, 19:37:50) [MSC v.1916 64 bit (AMD64)] on win32
    Type "help", "copyright", "credits" or "license" for more information.
    >>> from multiprocessing import get_context
    >>>
    >>> class A(object):
    ...     def __init__(self):
    ...         self.pool = get_context(None).Pool(2)
    ...
    >>> solver=A()
    >>> quit()
    Process SpawnPoolWorker-2:
    Error in atexit._run_exitfuncs:
    Process SpawnPoolWorker-1:
    Traceback (most recent call last):
      File "C:\Users\x\AppData\Local\Programs\Python\Python38\lib\multiprocessing\process.py", line 313, in _bootstrap
        self.run()
      File "C:\Users\x\AppData\Local\Programs\Python\Python38\lib\multiprocessing\process.py", line 108, in run
        self._target(*self._args, **self._kwargs)
      File "C:\Users\x\AppData\Local\Programs\Python\Python38\lib\multiprocessing\pool.py", line 114, in worker
        task = get()
      File "C:\Users\x\AppData\Local\Programs\Python\Python38\lib\multiprocessing\queues.py", line 355, in get
        with self._rlock:
      File "C:\Users\x\AppData\Local\Programs\Python\Python38\lib\multiprocessing\synchronize.py", line 95, in __enter__
        return self._semlock.__enter__()
    Traceback (most recent call last):
    KeyboardInterrupt
    Traceback (most recent call last):
      File "C:\Users\x\AppData\Local\Programs\Python\Python38\lib\multiprocessing\util.py", line 277, in _run_finalizers
        finalizer()
      File "C:\Users\x\AppData\Local\Programs\Python\Python38\lib\multiprocessing\util.py", line 201, in __call__
        res = self._callback(*self._args, **self._kwargs)
      File "C:\Users\x\AppData\Local\Programs\Python\Python38\lib\multiprocessing\pool.py", line 689, in _terminate_pool
        cls._help_stuff_finish(inqueue, task_handler, len(pool))
      File "C:\Users\x\AppData\Local\Programs\Python\Python38\lib\multiprocessing\pool.py", line 674, in _help_stuff_finish
        inqueue._rlock.acquire()
    KeyboardInterrupt
      File "C:\Users\x\AppData\Local\Programs\Python\Python38\lib\multiprocessing\process.py", line 313, in _bootstrap
        self.run()
      File "C:\Users\x\AppData\Local\Programs\Python\Python38\lib\multiprocessing\process.py", line 108, in run
        self._target(*self._args, **self._kwargs)
      File "C:\Users\x\AppData\Local\Programs\Python\Python38\lib\multiprocessing\pool.py", line 114, in worker
        task = get()
      File "C:\Users\x\AppData\Local\Programs\Python\Python38\lib\multiprocessing\queues.py", line 356, in get
        res = self._reader.recv_bytes()
      File "C:\Users\x\AppData\Local\Programs\Python\Python38\lib\multiprocessing\connection.py", line 216, in recv_bytes
        buf = self._recv_bytes(maxlength)
      File "C:\Users\x\AppData\Local\Programs\Python\Python38\lib\multiprocessing\connection.py", line 305, in _recv_bytes
        waitres = _winapi.WaitForMultipleObjects(
    KeyboardInterrupt

    C:\Users\x>


    stefanor mannequin commented Feb 6, 2020

    A bisect points at #11488 being the change that introduced this hang (bpo-35493)


    stefanor mannequin commented Apr 10, 2020

    Looks like it was fixed by PR 19009 (bpo-39360).

    pablogsal (Member) commented:

    > Looks like it was fixed by PR 19009 (bpo-39360).

    Can we close the issue then? If not, could you provide a reproducer and also check if the bug is fixed on master or 3.8 HEAD? Thanks!


    stefanor mannequin commented Apr 10, 2020

    On Linux, the reproducer in https://bugs.python.org/issue38501#msg354813 fails on ac10e0c^ and passes on ac10e0c, which is why I say PR 19009 fixes it.

    Not sure if there are any special considerations for Windows here.

    HEAD is good too. So from my PoV, good to close this.

    pablogsal (Member) commented:

    > Not sure if there are any special considerations for Windows here.

    There should not be anything Windows-specific here.

    Also notice that although "it works", the reproducer in https://bugs.python.org/issue38501#msg354813 is out of contract, because it does not close or terminate the pool correctly and does not use it as a context manager. If you are dealing with code equivalent to that, I would suggest taking a look at properly finalizing the pool to avoid similar problems (or other problems related to late finalization).
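    The in-contract pattern described here can be sketched as follows (a minimal example; per the docs, the pool's context manager calls terminate() on exit):

```python
from multiprocessing import Pool

def square(x):
    return x * x

if __name__ == '__main__':
    # The context manager terminates the pool on exit, so no
    # finalizer is left to run (and possibly hang) at interpreter exit.
    with Pool(processes=2) as pool:
        results = pool.map(square, [1, 2, 3])
    print(results)  # [1, 4, 9]
```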


    larsoner mannequin commented Apr 10, 2020

    If that's out of contract, perhaps there should be a big, visible warning at the top of the multiprocessing docs stating that creating one of these objects requires either using a context manager or ensuring a manual .close()?

    1. It either used to be in contract not to close it manually, or this was wrongly represented: the first Python 2.7 example in the docs (https://docs.python.org/2/library/multiprocessing.html#introduction) is:
    from multiprocessing import Pool
    
    def f(x):
        return x*x
    
    if __name__ == '__main__':
        p = Pool(5)
        print(p.map(f, [1, 2, 3]))

    So I suspect this (difficult to track down) problem might hit users without a more prominent warning.

    2. I'm surprised it's actually out of contract, given that the 3.8 docs state that close will be called automatically:

    close()

    Indicate that no more data will be put on this queue by the current process. The background thread will quit once it has flushed all buffered data to the pipe. This is called automatically when the queue is garbage collected.

    and

    terminate()

    Stops the worker processes immediately without completing outstanding work. When the pool object is garbage collected terminate() will be called immediately.

    Or perhaps I misunderstand what this is saying?

    pablogsal (Member) commented:

    > If that's out of contract, perhaps there should be a big, visible warning at the top of the multiprocessing docs stating that creating one of these objects requires either using a context manager or ensuring a manual .close()?

    Why? This is a resource like any other and it requires proper resource management. Would you also put a big warning on "open()" stating that opening a file requires either using a context manager or ensuring a manual close()?

    > the first Python 2.7 example in the docs

    Python 2.7 is not supported and the pool has changed *a lot* since Python 2. For instance, the pool now does more correct resource management, it does not leak threads, and it supports safer mechanisms such as the context manager. The reason it didn't hang that much in Python 2.7 is likely because some threads were being leaked.

    > This is called automatically when the queue is garbage collected.

    Yeah, and CPython does not promise that the __del__ method of any object will be called, so it is not assured that the finalizer will call close():

    https://docs.python.org/3/reference/datamodel.html#object.__del__

    "It is not guaranteed that __del__() methods are called for objects that still exist when the interpreter exits"
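    Since __del__ is not guaranteed at interpreter exit, one explicit alternative (an illustrative sketch, not something proposed in the thread) is to register the cleanup with atexit, whose callbacks do run on a normal interpreter shutdown:

```python
import atexit
from multiprocessing import Pool

def square(x):
    return x * x

if __name__ == '__main__':
    pool = Pool(processes=2)
    # atexit callbacks run in LIFO order, so register join first and
    # close last: close() then runs before join(), as required.
    atexit.register(pool.join)
    atexit.register(pool.close)
    print(pool.map(square, [1, 2, 3]))  # [1, 4, 9]
```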


    larsoner mannequin commented Apr 10, 2020

    > Why? This is a resource like any other and it requires proper resource management. Would you also put a big warning on "open()" stating that opening a file requires either using a context manager or ensuring a manual close()?

    One potential reason would be that the consequences of bad resource management in this case are different than in the open() case, i.e., here the interpreter hangs -- or Travis runs for your repo (SciPy) get stuck with over-50-minute errors, which is how we started looking into how to track it down.

    > the first Python 2.7 example in the docs

    Python 2.7 is not supported and the pool has changed *a lot* since Python 2.

    Indeed, my point is more about potential prevalence: this (now incorrect) usage pattern was the first example in the multiprocessing docs for a long time, suggesting that there may be a lot of code in the wild that still uses it.

    > Yeah, and CPython does not promise that the __del__ method of any object will be called, so it is not assured that the finalizer will call close():

    Ahh, I didn't know that. This might be another good reason to add a warning about the risks of not ensuring closure yourself: others might have the same gap in knowledge I had, and assume the docs are saying that things will be taken care of by garbage collection on exit when they will not.

    But you of course cannot document every possible problematic situation users could put themselves in, so I leave it up to your judgment whether or not it's worth it in this case.

    pablogsal (Member) commented:

    > One potential reason would be that the consequences of bad resource management in this case are different than in the open() case, i.e., here the interpreter hangs -- or Travis runs for your repo (SciPy) get stuck with over-50-minute errors, which is how we started looking into how to track it down.

    Agreed, but it is the same problem: resources need to be managed. I am nevertheless OK if you want to add some targeted warning to the multiprocessing pool docs indicating what can happen if the resource is not properly managed.

    > Indeed, my point is more about potential prevalence: this (now incorrect) usage pattern was the first example in the multiprocessing docs for a long time, suggesting that there may be a lot of code in the wild that still uses it.

    Right, we put great effort into making the code such that even incorrect usages do not hang (and believe me, it is *very* challenging), but we cannot sacrifice correct-usage fixes or big improvements just so incorrect usages keep working even while they leak resources. Hopefully, more and more people will start using the context manager, or become aware that they are doing something wrong by leaking the pool.

    ----

    In conclusion, I agree that adding some targeted warning to the pool docs about this is in order. When I prepare the PR, would you like to review the message to confirm that it is clear enough and makes sense?

    pablogsal (Member) commented:

    New changeset 7ec43a7 by Pablo Galindo in branch 'master':
    bpo-38501: Add a warning section to multiprocessing.Pool docs about resource managing (GH-19466)

    miss-islington (Contributor) commented:

    New changeset 2e49b52 by Miss Islington (bot) in branch '3.7':
    bpo-38501: Add a warning section to multiprocessing.Pool docs about resource managing (GH-19466)

    miss-islington (Contributor) commented:

    New changeset ceba064 by Miss Islington (bot) in branch '3.8':
    bpo-38501: Add a warning section to multiprocessing.Pool docs about resource managing (GH-19466)

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022