classification
Title: python 3.8 hang in multiprocessing.Pool() locking on FreeBSD / Linux
Type: Stage: resolved
Components: Library (Lib) Versions: Python 3.9, Python 3.8
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: AMDmi3, arekm, davin, koobs, pablogsal, pitrou
Priority: normal Keywords:

Created on 2019-11-08 15:00 by AMDmi3, last changed 2020-03-15 19:54 by pablogsal. This issue is now closed.

Messages (11)
msg356243 - (view) Author: Dmitry Marakasov (AMDmi3) * Date: 2019-11-08 15:00
System: FreeBSD 12.0-RELEASE, amd64.

This simple program

    from multiprocessing import Pool
    from time import sleep

    Pool().map(sleep, [0.01] * 10)

works fine with python 3.7, but is likely (about 20-50% probability on my 4 core box) to hang with python 3.8. Example backtraces after interruption:

% python3.8 1.py
^CException ignored in: <Finalize object, dead>
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/multiprocessing/util.py", line 201, in __call__
Process ForkPoolWorker-2:
Process ForkPoolWorker-4:
    res = self._callback(*self._args, **self._kwargs)
  File "/usr/local/lib/python3.8/multiprocessing/pool.py", line 689, in _terminate_pool
    cls._help_stuff_finish(inqueue, task_handler, len(pool))
  File "/usr/local/lib/python3.8/multiprocessing/pool.py", line 674, in _help_stuff_finish
    inqueue._rlock.acquire()
KeyboardInterrupt: 
Process ForkPoolWorker-3:
Process ForkPoolWorker-1:

% python3.8 1.py
^CException ignored in: <Finalize object, dead>
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/multiprocessing/util.py", line 201, in __call__
Process ForkPoolWorker-3:
Process ForkPoolWorker-4:
Process ForkPoolWorker-1:
    res = self._callback(*self._args, **self._kwargs)
  File "/usr/local/lib/python3.8/multiprocessing/pool.py", line 689, in _terminate_pool
    cls._help_stuff_finish(inqueue, task_handler, len(pool))
  File "/usr/local/lib/python3.8/multiprocessing/pool.py", line 674, in _help_stuff_finish
    inqueue._rlock.acquire()
KeyboardInterrupt: 
Process ForkPoolWorker-2:

% python3.8 1.py
^CException ignored in: <Finalize object, dead>
Process ForkPoolWorker-2:
Process ForkPoolWorker-3:
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/multiprocessing/util.py", line 201, in __call__
Process ForkPoolWorker-1:
    res = self._callback(*self._args, **self._kwargs)
  File "/usr/local/lib/python3.8/multiprocessing/pool.py", line 689, in _terminate_pool
    cls._help_stuff_finish(inqueue, task_handler, len(pool))
  File "/usr/local/lib/python3.8/multiprocessing/pool.py", line 674, in _help_stuff_finish
Traceback (most recent call last):
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/multiprocessing/process.py", line 313, in _bootstrap
    self.run()
  File "/usr/local/lib/python3.8/multiprocessing/process.py", line 313, in _bootstrap
    self.run()
  File "/usr/local/lib/python3.8/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python3.8/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python3.8/multiprocessing/pool.py", line 114, in worker
    task = get()
  File "/usr/local/lib/python3.8/multiprocessing/pool.py", line 114, in worker
    task = get()
  File "/usr/local/lib/python3.8/multiprocessing/queues.py", line 355, in get
    with self._rlock:
  File "/usr/local/lib/python3.8/multiprocessing/queues.py", line 355, in get
    with self._rlock:
  File "/usr/local/lib/python3.8/multiprocessing/synchronize.py", line 95, in __enter__
    return self._semlock.__enter__()
  File "/usr/local/lib/python3.8/multiprocessing/synchronize.py", line 95, in __enter__
    return self._semlock.__enter__()
KeyboardInterrupt
KeyboardInterrupt
    inqueue._rlock.acquire()
KeyboardInterrupt: 
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/multiprocessing/process.py", line 313, in _bootstrap
    self.run()
  File "/usr/local/lib/python3.8/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python3.8/multiprocessing/pool.py", line 114, in worker
    task = get()
  File "/usr/local/lib/python3.8/multiprocessing/queues.py", line 356, in get
    res = self._reader.recv_bytes()
  File "/usr/local/lib/python3.8/multiprocessing/connection.py", line 216, in recv_bytes
    buf = self._recv_bytes(maxlength)
  File "/usr/local/lib/python3.8/multiprocessing/connection.py", line 414, in _recv_bytes
    buf = self._recv(4)
  File "/usr/local/lib/python3.8/multiprocessing/connection.py", line 379, in _recv
    chunk = read(handle, remaining)
KeyboardInterrupt
Process ForkPoolWorker-4:

Related issue in FreeBSD bugzilla: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=241801
msg356546 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2019-11-13 18:59
What happens if you instead write:

with Pool() as pool:
    pool.map(sleep, [0.01] * 10)
msg356551 - (view) Author: Dmitry Marakasov (AMDmi3) * Date: 2019-11-13 20:18
> What happens if you instead write:

It survived more than 4000 runs, it looks like it doesn't experience the problem.
msg364003 - (view) Author: Arkadiusz Miśkiewicz (arekm) Date: 2020-03-12 08:54
That test program hangs even on Linux (kernel 4.9.184, glibc 2.30):


$ python3 a.py
Process ForkPoolWorker-13:
Process ForkPoolWorker-12:
Process ForkPoolWorker-16:
Process ForkPoolWorker-15:
Process ForkPoolWorker-11:
Process ForkPoolWorker-6:
Process ForkPoolWorker-7:
Process ForkPoolWorker-14:
Process ForkPoolWorker-5:
Process ForkPoolWorker-2:
Process ForkPoolWorker-10:
Process ForkPoolWorker-3:
Process ForkPoolWorker-8:
Process ForkPoolWorker-1:
Process ForkPoolWorker-4:
Exception ignored in: <Finalize object, dead>
Traceback (most recent call last):
  File "/usr/lib64/python3.8/multiprocessing/util.py", line 201, in __call__
    res = self._callback(*self._args, **self._kwargs)
  File "/usr/lib64/python3.8/multiprocessing/pool.py", line 689, in _terminate_pool
Process ForkPoolWorker-9:
    cls._help_stuff_finish(inqueue, task_handler, len(pool))
  File "/usr/lib64/python3.8/multiprocessing/pool.py", line 674, in _help_stuff_finish
    inqueue._rlock.acquire()
KeyboardInterrupt:

$ python3 --version
Python 3.8.2
msg364178 - (view) Author: Arkadiusz Miśkiewicz (arekm) Date: 2020-03-14 14:59
Bisecting led to this commit and reverting it on 3.8 branch makes the hang go away.

7c994549dcffd0d9d3bb37475e6374f356e7240e is the first bad commit
commit 7c994549dcffd0d9d3bb37475e6374f356e7240e
Author: Pablo Galindo <Pablogsal@gmail.com>
Date:   Sat Mar 16 22:34:24 2019 +0000

    bpo-35493: Use Process.sentinel instead of sleeping for polling worker status in multiprocessing.Pool (#11488)

    * bpo-35493: Use Process.sentinel instead of sleeping for polling worker status in multiprocessing.Pool

    * Use self-pipe pattern to avoid polling for changes

    * Refactor some variable names and add comments

    * Restore timeout and poll

    * Use reader object only on wait()

    * Recompute worker sentinels every time

    * Remove timeout and use change notifier

    * Refactor some methods to be overloaded by the ThreadPool, document the cache class and fix typos

 Lib/multiprocessing/pool.py                        | 88 +++++++++++++++++++---
 .../2019-01-09-23-43-08.bpo-35493.kEcRGE.rst       |  3 +
 2 files changed, 80 insertions(+), 11 deletions(-)
 create mode 100644 bpo-35493.kEcRGE.rst">Misc/NEWS.d/next/Library/2019-01-09-23-43-08.bpo-35493.kEcRGE.rst
msg364182 - (view) Author: Arkadiusz Miśkiewicz (arekm) Date: 2020-03-14 15:18
bpo-35493
msg364183 - (view) Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2020-03-14 15:36
Check out https://bugs.python.org/issue39360 foe more context on this issue.
msg364184 - (view) Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2020-03-14 15:41
Also, notice that writing

Pool().map(sleep, [0.01] * 10)

is out of contact as the Pool object can
be collected immediately and there is no
proper termination and cleanup. The correct
way is to use the context manager:

with Pool() as pool:
    pool.map(sleep, [0.01] * 10)
msg364211 - (view) Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2020-03-15 02:15
I am mentoring @BTaskaya and I asked to take a look a this. After some debugging, we found that the problem is the following:

When the pull is not used via the context manager or terminate() is called, there is a system in multiprocessing.util that handles finalization of all pools via an atexit handler (the Finalize) class. This class registers the _terminate_pool handler in the registry of finalizers of the module, and that registry is called on interpreter exit via _exit_function. 

The problem is that the "happy" path with the context manager or manual call to finalize() does some extra steps that _terminate_pool does not:

        def terminate(self):
            util.debug('terminating pool')
            self._state = TERMINATE
            self._worker_handler._state = TERMINATE
            self._change_notifier.put(None)
            self._terminate()

In this code self._terminate() calls _terminate_pool. The step that is not executed when the atexit() handler calls _terminate_pool is pinging the _change_notifier queue to unblock the maintenance threads.

The fix is moving the "self._change_notifier.put(None)" to the _terminate_pool function.
msg364212 - (view) Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2020-03-15 02:15
BTaskaya will create a PR to fix this soon.
msg364256 - (view) Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2020-03-15 19:54
Closed via PR 19009
History
Date User Action Args
2020-03-15 19:54:21pablogsalsetversions: + Python 3.9
2020-03-15 19:54:16pablogsalsetstatus: open -> closed
resolution: fixed
messages: + msg364256

stage: resolved
2020-03-15 02:15:35pablogsalsetmessages: + msg364212
2020-03-15 02:15:13pablogsalsetmessages: + msg364211
2020-03-14 15:41:12pablogsalsetmessages: + msg364184
2020-03-14 15:36:32pablogsalsetmessages: + msg364183
2020-03-14 15:18:58arekmsetmessages: + msg364182
2020-03-14 14:59:40arekmsetmessages: + msg364178
2020-03-12 08:54:01arekmsetnosy: + arekm

messages: + msg364003
title: python 3.8 hang in multiprocessing.Pool() locking on FreeBSD -> python 3.8 hang in multiprocessing.Pool() locking on FreeBSD / Linux
2019-11-18 02:52:20koobssetnosy: + koobs
2019-11-13 20:18:30AMDmi3setmessages: + msg356551
2019-11-13 19:05:34pitrousetnosy: + pablogsal
2019-11-13 18:59:19pitrousetmessages: + msg356546
2019-11-08 19:42:59ned.deilysetnosy: + pitrou, davin
2019-11-08 15:00:44AMDmi3create