This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

classification
Title: concurrent.futures deadlock
Type: behavior
Stage: resolved
Components: Library (Lib)
Versions: Python 3.8, Python 3.7
process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To:
Nosy List: bentoi, bquinlan, cagney, gregory.p.smith, hroncok, hugh, josh.r, jwilk, pablogsal, pitrou, vstinner
Priority: normal
Keywords:

Created on 2019-01-31 10:05 by jwilk, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Files
File name  Uploaded  Description
cf-deadlock.py jwilk, 2019-01-31 10:05
gdb-bt-parent.txt jwilk, 2019-03-19 19:37
gdb-bt-child.txt jwilk, 2019-03-19 19:37
cf-deadlock-alarm.py cagney, 2019-04-02 18:32 Version of cf-deadlock.py that tries to always exit (eventually)
cf-deadlock-1.py cagney, 2019-04-17 18:49
gdb.sh cagney, 2019-04-17 18:50
stack-python.txt bentoi, 2019-09-18 07:34
Messages (34)
msg334618 - (view) Author: Jakub Wilk (jwilk) Date: 2019-01-31 10:05
The attached test program hangs eventually (it may need a few thousand iterations).

Tested with Python v3.7.2 on Linux, amd64.
msg334648 - (view) Author: Josh Rosenberg (josh.r) * (Python triager) Date: 2019-02-01 05:22
I've only got 3.7.1 Ubuntu bash on Windows (also amd64) immediately available, but I'm not seeing a hang, nor is there any obvious memory leak that might eventually lead to problems (memory regularly drops back to under 10 MB shared, 24 KB private working set). I modified your code to add a sys.stdout.flush() after the write so it would actually echo the dots as they were written instead of waiting for a few thousand of them to build up in the buffer, but otherwise it's the same code.

Are you sure you're actually hanging, and it's not just the output getting buffered?
msg334652 - (view) Author: Jakub Wilk (jwilk) Date: 2019-02-01 06:51
You're right that sys.stdout.flush() is missing in my code; but on Linux it doesn't make a big difference, because multiprocessing flushes stdout before fork()ing.

And yes, it really hangs.
msg338376 - (view) Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2019-03-19 16:13
This seems related to https://bugs.python.org/issue35809
msg338377 - (view) Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2019-03-19 16:15
Could you use gdb/lldb to attach to the process hanging and give us a stack trace?
msg338402 - (view) Author: Jakub Wilk (jwilk) Date: 2019-03-19 19:37
There are two processes running (parent and child) when the thing hangs.
I'm attaching GDB backtraces for both.
msg338501 - (view) Author: Hugh Redelmeier (hugh) Date: 2019-03-20 17:28
@jwilk: thanks for creating cf-deadlock.py

I can replicate the test program hang on Fedora 29 with python3-3.7.2-4.fc29.x86_64

The test program hasn't yet hung on Fedora 29 with older packages, in particular 
python3-3.7.1-4.fc29.x86_64

I'm interested because the libreswan.org test suite has started to hang and we don't know why.  It might well be this bug.
msg338549 - (view) Author: Hugh Redelmeier (hugh) Date: 2019-03-21 15:38
I've filed a Fedora bug report that points to this one: <https://bugzilla.redhat.com/show_bug.cgi?id=1691434>
msg339334 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-04-02 13:52
Any update on this issue? I don't understand why the example hangs.
msg339361 - (view) Author: cagney (cagney) Date: 2019-04-02 18:32
I've attached a variation on cf-deadlock.py that, should nothing happen for 2 minutes, will kill itself.  Useful with git bisect.
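(The attached file isn't reproduced here, but the watchdog idea can be sketched as follows; the 120-second budget matches the description above, and the loop body is a placeholder, not the actual attachment:)

```python
import signal
import sys

def die(signum, frame):
    # SIGALRM fired: no iteration finished within the budget.
    sys.exit("no progress for 120s; assuming deadlock")

signal.signal(signal.SIGALRM, die)

for i in range(3):
    signal.alarm(120)  # re-arm: each iteration gets a fresh 2-minute budget
    # ... one iteration of the real workload would go here ...
signal.alarm(0)        # disarm once the loop completes normally
print("done")
```

A non-zero exit like this is what makes the hang visible to scripts such as `git bisect run`.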
msg339370 - (view) Author: cagney (cagney) Date: 2019-04-02 21:34
I'm seeing cf-deadlock-alarm.py hang on vanilla Python 3.7.[0123] with:

Linux 5.0.5-100.fc28.x86_64 #1 SMP Wed Mar 27 22:16:29 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
glibc-2.27-37.fc28.x86_64

Can anyone reproduce this?

I also wonder if this is connected to bpo-6721, where a recent "fix" made things worse - the Fedora versions that work for libreswan don't have the "fix".
msg339451 - (view) Author: cagney (cagney) Date: 2019-04-04 16:04
More info from adding a faulthandler ...

    15	def f():
    16	    import ctypes
    17	
    18	for i in range(0,50):
    19	    sys.stdout.write("\r%d" % i)
    20	    sys.stdout.flush()
    21	    signal.alarm(60*2)
    22	    for j in range(0,1000):
    23	        with concurrent.futures.ProcessPoolExecutor() as executor:
    24	            ftr = executor.submit(f)
    25	            ftr.result()
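(For reference, dumps like the ones that follow come from the stdlib faulthandler module; this is a generic sketch of wiring it up, not the exact instrumentation used here:)

```python
import faulthandler
import signal
import tempfile

# Register a handler so a seemingly hung process dumps every thread's
# Python stack on SIGUSR1 (kill -USR1 <pid>), without attaching gdb.
# Unix-only: SIGUSR1 does not exist on Windows.
faulthandler.register(signal.SIGUSR1, all_threads=True)

# dump_traceback() can also be called directly; it writes to a real
# file descriptor, so use an actual file rather than io.StringIO.
with tempfile.TemporaryFile() as f:
    faulthandler.dump_traceback(file=f, all_threads=True)
    f.seek(0)
    print(f.read().decode().splitlines()[0])  # first line names the current thread
```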

Thread 0x00007f1ce7fff700 (most recent call first):
  File "/home/python/v3.7.3/lib/python3.7/threading.py", line 296 in wait
  File "/home/python/v3.7.3/lib/python3.7/multiprocessing/queues.py", line 224 in _feed
  File "/home/python/v3.7.3/lib/python3.7/threading.py", line 865 in run
  File "/home/python/v3.7.3/lib/python3.7/threading.py", line 917 in _bootstrap_inner
  File "/home/python/v3.7.3/lib/python3.7/threading.py", line 885 in _bootstrap

Thread 0x00007f1cec917700 (most recent call first):
  File "/home/python/v3.7.3/lib/python3.7/selectors.py", line 415 in select
  File "/home/python/v3.7.3/lib/python3.7/multiprocessing/connection.py", line 920 in wait
  File "/home/python/v3.7.3/lib/python3.7/concurrent/futures/process.py", line 354 in _queue_management_worker
  File "/home/python/v3.7.3/lib/python3.7/threading.py", line 865 in run
  File "/home/python/v3.7.3/lib/python3.7/threading.py", line 917 in _bootstrap_inner
  File "/home/python/v3.7.3/lib/python3.7/threading.py", line 885 in _bootstrap

Current thread 0x00007f1cfd9486c0 (most recent call first):
  File "/home/python/v3.7.3/lib/python3.7/threading.py", line 296 in wait
  File "/home/python/v3.7.3/lib/python3.7/concurrent/futures/_base.py", line 427 in result
  File "cf-deadlock.py", line 25 in <module>
msg339464 - (view) Author: cagney (cagney) Date: 2019-04-04 21:45
Here are the children; yes, there are somehow 4 children sitting around.  Hopefully this is enough to figure out where things deadlock.

29970  8752  8752 29970 pts/6     8752 Sl+   1000   1:00  |   |   \_ ./v3.7.3/bin/python3 cf-deadlock.py
 8752  8975  8752 29970 pts/6     8752 S+    1000   0:00  |   |       \_ ./v3.7.3/bin/python3 cf-deadlock.py
 8752  8976  8752 29970 pts/6     8752 S+    1000   0:00  |   |       \_ ./v3.7.3/bin/python3 cf-deadlock.py
 8752  8977  8752 29970 pts/6     8752 S+    1000   0:00  |   |       \_ ./v3.7.3/bin/python3 cf-deadlock.py
 8752  8978  8752 29970 pts/6     8752 S+    1000   0:00  |   |       \_ ./v3.7.3/bin/python3 cf-deadlock.py

8975

Current thread 0x00007f3be65126c0 (most recent call first):
  File "<frozen importlib._bootstrap>", line 219 in _call_with_frames_removed
  File "<frozen importlib._bootstrap_external>", line 1043 in create_module
  File "<frozen importlib._bootstrap>", line 583 in module_from_spec
  File "<frozen importlib._bootstrap>", line 670 in _load_unlocked
  File "<frozen importlib._bootstrap>", line 967 in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 983 in _find_and_load
  File "/home/python/v3.7.3/lib/python3.7/ctypes/__init__.py", line 7 in <module>
  File "<frozen importlib._bootstrap>", line 219 in _call_with_frames_removed
  File "<frozen importlib._bootstrap_external>", line 728 in exec_module
  File "<frozen importlib._bootstrap>", line 677 in _load_unlocked
  File "<frozen importlib._bootstrap>", line 967 in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 983 in _find_and_load
  File "cf-deadlock.py", line 17 in f
  File "/home/python/v3.7.3/lib/python3.7/concurrent/futures/process.py", line 232 in _process_worker
  File "/home/python/v3.7.3/lib/python3.7/multiprocessing/process.py", line 99 in run
  File "/home/python/v3.7.3/lib/python3.7/multiprocessing/process.py", line 297 in _bootstrap
  File "/home/python/v3.7.3/lib/python3.7/multiprocessing/popen_fork.py", line 74 in _launch
  File "/home/python/v3.7.3/lib/python3.7/multiprocessing/popen_fork.py", line 20 in __init__
  File "/home/python/v3.7.3/lib/python3.7/multiprocessing/context.py", line 277 in _Popen
  File "/home/python/v3.7.3/lib/python3.7/multiprocessing/process.py", line 112 in start
  File "/home/python/v3.7.3/lib/python3.7/concurrent/futures/process.py", line 593 in _adjust_process_count
  File "/home/python/v3.7.3/lib/python3.7/concurrent/futures/process.py", line 569 in _start_queue_management_thread
  File "/home/python/v3.7.3/lib/python3.7/concurrent/futures/process.py", line 615 in submit
  File "cf-deadlock.py", line 25 in <module>

8976

Current thread 0x00007f3be65126c0 (most recent call first):
  File "/home/python/v3.7.3/lib/python3.7/multiprocessing/connection.py", line 379 in _recv
  File "/home/python/v3.7.3/lib/python3.7/multiprocessing/connection.py", line 407 in _recv_bytes
  File "/home/python/v3.7.3/lib/python3.7/multiprocessing/connection.py", line 216 in recv_bytes
  File "/home/python/v3.7.3/lib/python3.7/multiprocessing/queues.py", line 94 in get
  File "/home/python/v3.7.3/lib/python3.7/concurrent/futures/process.py", line 226 in _process_worker
  File "/home/python/v3.7.3/lib/python3.7/multiprocessing/process.py", line 99 in run
  File "/home/python/v3.7.3/lib/python3.7/multiprocessing/process.py", line 297 in _bootstrap
  File "/home/python/v3.7.3/lib/python3.7/multiprocessing/popen_fork.py", line 74 in _launch
  File "/home/python/v3.7.3/lib/python3.7/multiprocessing/popen_fork.py", line 20 in __init__
  File "/home/python/v3.7.3/lib/python3.7/multiprocessing/context.py", line 277 in _Popen
  File "/home/python/v3.7.3/lib/python3.7/multiprocessing/process.py", line 112 in start
  File "/home/python/v3.7.3/lib/python3.7/concurrent/futures/process.py", line 593 in _adjust_process_count
  File "/home/python/v3.7.3/lib/python3.7/concurrent/futures/process.py", line 569 in _start_queue_management_thread
  File "/home/python/v3.7.3/lib/python3.7/concurrent/futures/process.py", line 615 in submit
  File "cf-deadlock.py", line 25 in <module>

8977

Current thread 0x00007f3be65126c0 (most recent call first):
  File "/home/python/v3.7.3/lib/python3.7/multiprocessing/synchronize.py", line 95 in __enter__
  File "/home/python/v3.7.3/lib/python3.7/multiprocessing/queues.py", line 93 in get
  File "/home/python/v3.7.3/lib/python3.7/concurrent/futures/process.py", line 226 in _process_worker
  File "/home/python/v3.7.3/lib/python3.7/multiprocessing/process.py", line 99 in run
  File "/home/python/v3.7.3/lib/python3.7/multiprocessing/process.py", line 297 in _bootstrap
  File "/home/python/v3.7.3/lib/python3.7/multiprocessing/popen_fork.py", line 74 in _launch
  File "/home/python/v3.7.3/lib/python3.7/multiprocessing/popen_fork.py", line 20 in __init__
  File "/home/python/v3.7.3/lib/python3.7/multiprocessing/context.py", line 277 in _Popen
  File "/home/python/v3.7.3/lib/python3.7/multiprocessing/process.py", line 112 in start
  File "/home/python/v3.7.3/lib/python3.7/concurrent/futures/process.py", line 593 in _adjust_process_count
  File "/home/python/v3.7.3/lib/python3.7/concurrent/futures/process.py", line 569 in _start_queue_management_thread
  File "/home/python/v3.7.3/lib/python3.7/concurrent/futures/process.py", line 615 in submit
  File "cf-deadlock.py", line 25 in <module>

8978

Current thread 0x00007f3be65126c0 (most recent call first):
  File "/home/python/v3.7.3/lib/python3.7/multiprocessing/synchronize.py", line 95 in __enter__
  File "/home/python/v3.7.3/lib/python3.7/multiprocessing/queues.py", line 93 in get
  File "/home/python/v3.7.3/lib/python3.7/concurrent/futures/process.py", line 226 in _process_worker
  File "/home/python/v3.7.3/lib/python3.7/multiprocessing/process.py", line 99 in run
  File "/home/python/v3.7.3/lib/python3.7/multiprocessing/process.py", line 297 in _bootstrap
  File "/home/python/v3.7.3/lib/python3.7/multiprocessing/popen_fork.py", line 74 in _launch
  File "/home/python/v3.7.3/lib/python3.7/multiprocessing/popen_fork.py", line 20 in __init__
  File "/home/python/v3.7.3/lib/python3.7/multiprocessing/context.py", line 277 in _Popen
  File "/home/python/v3.7.3/lib/python3.7/multiprocessing/process.py", line 112 in start
  File "/home/python/v3.7.3/lib/python3.7/concurrent/futures/process.py", line 593 in _adjust_process_count
  File "/home/python/v3.7.3/lib/python3.7/concurrent/futures/process.py", line 569 in _start_queue_management_thread
  File "/home/python/v3.7.3/lib/python3.7/concurrent/futures/process.py", line 615 in submit
  File "cf-deadlock.py", line 25 in <module>
msg340263 - (view) Author: Miro Hrončok (hroncok) * Date: 2019-04-15 11:05
Reverting 3b699932e5ac3e76031bbb6d700fbea07492641d makes problem go away.
msg340278 - (view) Author: cagney (cagney) Date: 2019-04-15 14:30
@hroncok see comment msg339370

Vanilla 3.7.0 (re-confirmed) didn't contain the change, nor did 3.6.8 (OK, that isn't vanilla), but both can hang using the test.  It can take a while and, subjectively, it seems to depend on machine load.  I've even struggled to get 3.7.3 to fail without load.

Presumably there's a race and grinding the test machine into the ground increases the odds of it happening.

The patch for bpo-6721 could be causing many things, but two come to mind:

- turning this bug into bpo-36533 (aka bpo-6721 caused a regression)
- slowing down the fork (spending time acquiring locks), which increased the odds of this hang

My hunch is the latter as the stack dumps look nothing like those I analyzed for bpo-36533 (see messages msg339454 and msg339458).
msg340343 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-04-16 13:49
Gregory: It seems like https://github.com/python/cpython/commit/3b699932e5ac3e76031bbb6d700fbea07492641d is causing deadlocks which is not a good thing. What do you think of reverting this change?
msg340344 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-04-16 13:54
At least 2 projects were broken by the logging change: libreswan and Anaconda.

> I've filed a Fedora bug report that points to this one: <https://bugzilla.redhat.com/show_bug.cgi?id=1691434>

That's related to the libreswan project.


Last year, there was another regression in Anaconda, the Fedora installer:

> https://bugzilla.redhat.com/show_bug.cgi?id=1644936

The workaround/fix was to revert 3b699932e5ac3e7 in Python. Anaconda has since been modified, and we were able to revert the revert of 3b699932e5ac3e7 :-)

I'm not sure what the Anaconda fix was. Maybe this change?
https://github.com/rhinstaller/anaconda/pull/1721
msg340358 - (view) Author: cagney (cagney) Date: 2019-04-16 17:05
(disclaimer: I'm mashing my high-level backtraces in with @jwilk's low-level backtraces)

The Python backtrace shows the deadlocked process called 'f' which then 'called':
    import ctypes
which, in turn 'called':
    from _ctypes import Union, Structure, Array
and that hung.

The low-level back-trace shows it was trying to acquire a lock (no surprises there); but the surprise is that it is inside of dlopen() trying to load '_ctypes...so'!

#11 __dlopen (file=file@entry=0x7f398da4b050 "_ctypes.cpython-37m-x86_64-linux-gnu.so", mode=<optimized out>) at dlopen.c:87
...
#3 _dl_map_object_from_fd (name="_ctypes.cpython-37m-x86_64-linux-gnu.so", origname=origname@entry=0x0, fd=-1, fbp=<optimized out>, realname=<optimized out>, loader=loader@entry=0x0, l_type=<optimized out>, mode=<optimized out>, stack_endp=<optimized out>, nsid=<optimized out>) at dl-load.c:1413
#2 _dl_add_to_namespace_list (new=0x55f8b8f34540, nsid=0) at dl-object.c:34
#1 __GI___pthread_mutex_lock (mutex=0x7f3991fb9970 <_rtld_global+2352>) at ../nptl/pthread_mutex_lock.c:115

and the lock in question (assuming my sources roughly match above) seems to be:

  /* We modify the list of loaded objects.  */
  __rtld_lock_lock_recursive (GL(dl_load_write_lock));

Presumably a thread in the parent held this lock at the time of the fork.

If one of the other children also has the lock pre-acquired then this is confirmed (unfortunately not having the lock won't rebut the theory).

So, any guesses as to what dl related operation was being performed by the parent?

----

I don't think the remaining processes are involved (and I've probably got 4 in total because my machine has 4 cores).

8976 - this acquired the multi-process semaphore and is blocked in '_recv' awaiting further instructions
8978, 8977 - these are blocked waiting for above to free the multi-process semaphore
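The theory above (a parent thread holds a lock at fork time, so the child inherits a locked lock with no owner left to release it) can be demonstrated with an ordinary threading.Lock. This sketch is only an analogy for glibc's dl_load_write_lock, not the actual code path:

```python
import os
import threading
import time

lock = threading.Lock()

def holder():
    with lock:
        time.sleep(1.0)  # hold the lock while the main thread forks

threading.Thread(target=holder).start()
time.sleep(0.1)          # make sure the helper thread owns the lock

pid = os.fork()
if pid == 0:
    # Child: the lock state was copied as "held", but the holder thread
    # was not copied, so nothing will ever release it.  Without a timeout
    # this acquire() would block forever -- the shape of this deadlock.
    hung = not lock.acquire(timeout=0.5)
    os._exit(0 if hung else 1)

_, status = os.waitpid(pid, 0)
print("child would deadlock:", os.WEXITSTATUS(status) == 0)
```

The same mechanism applies to any lock glibc's dynamic loader holds internally when fork() happens.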
msg340359 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2019-04-16 17:20
Please do not blindly revert that.  See my PR in https://bugs.python.org/issue36533 which is specific to this "issue" with logging.
msg340360 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2019-04-16 17:22
I'd appreciate it if someone with an application running into the issue could be tested with my PR from issue36533 (https://github.com/python/cpython/pull/12704) applied.
msg340362 - (view) Author: Jakub Wilk (jwilk) Date: 2019-04-16 19:49
https://github.com/python/cpython/pull/12704 doesn't fix the bug for me.
Reverting 3b699932e5ac3e76031bbb6d700fbea07492641d doesn't fix it either.
msg340430 - (view) Author: cagney (cagney) Date: 2019-04-17 18:46
Here's a possible stack taken during the fork():

Thread 1 "python3" hit Breakpoint 1, 0x00007ffff7124734 in fork () from /lib64/libc.so.6

Thread 1814 (Thread 0x7fffe69d5700 (LWP 23574)):
#0  0x00007ffff7bc24e5 in __pthread_mutex_unlock_usercnt () from /lib64/libpthread.so.0
#1  0x00007ffff71928e3 in dl_iterate_phdr () from /lib64/libc.so.6
#2  0x00007fffe5fcfe55 in _Unwind_Find_FDE () from /lib64/libgcc_s.so.1
#3  0x00007fffe5fcc403 in uw_frame_state_for () from /lib64/libgcc_s.so.1
#4  0x00007fffe5fcd90f in _Unwind_ForcedUnwind_Phase2 () from /lib64/libgcc_s.so.1
#5  0x00007fffe5fcdf30 in _Unwind_ForcedUnwind () from /lib64/libgcc_s.so.1
#6  0x00007ffff7bc7712 in __pthread_unwind () from /lib64/libpthread.so.0
#7  0x00007ffff7bbf7e7 in pthread_exit () from /lib64/libpthread.so.0
#8  0x000000000051b2fc in PyThread_exit_thread () at Python/thread_pthread.h:238
#9  0x000000000055ed16 in t_bootstrap (boot_raw=0x7fffe8da0e40) at ./Modules/_threadmodule.c:1021
#10 0x00007ffff7bbe594 in start_thread () from /lib64/libpthread.so.0
#11 0x00007ffff7157e5f in clone () from /lib64/libc.so.6

Thread 1 (Thread 0x7ffff7fca080 (LWP 20524)):
#0  0x00007ffff7124734 in fork () from /lib64/libc.so.6
#1  0x0000000000532c8a in os_fork_impl (module=<optimized out>) at ./Modules/posixmodule.c:5423
#2  os_fork (module=<optimized out>, _unused_ignored=<optimized out>) at ./Modules/clinic/posixmodule.c.h:1913

where, in my source code, dl_iterate_phdr() starts with something like:

  /* Make sure nobody modifies the list of loaded objects.  */
  __rtld_lock_lock_recursive (GL(dl_load_write_lock));

i.e., when the fork occurs, the non-fork thread has acquired dl_load_write_lock - the same lock that the child will later try to acquire (and hang on).

No clue what that thread is doing, though; other than that it looks like it is trying to generate a backtrace?
msg340431 - (view) Author: cagney (cagney) Date: 2019-04-17 18:49
Run ProcessPoolExecutor with one fixed child (override the default of #cores).
msg340432 - (view) Author: cagney (cagney) Date: 2019-04-17 18:50
Script to capture a stack backtrace at the time of the fork; the last backtrace printed will be for the hang.
msg340442 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2019-04-17 20:07
I am unable to get cf-deadlock.py to hang on my own builds of pure CPython 3.7.2+ d7cb2034bb or 3.6.8+ be77fb7a6e (versions I had in a local git clone).

Which specific Python builds are you seeing the hang with?  Which specific platform/distro version?  "3.7.2" isn't enough; if you are using a distro-supplied interpreter, please try to reproduce this with a build from the CPython tree itself.  Distros always apply their own patches to their interpreters.

...

Do realize that while working on this it is fundamentally *impossible* per POSIX for os.fork() to be safely used at the Python level in a process also using pthreads.  That this _ever_ appeared to work is a pure accident of implementations of underlying libc, malloc, system libraries, and kernel behaviors.  POSIX considers it undefined behavior.  Nothing done in CPython can avoid that.  Any "fix" for these kinds of issues is merely working around the inevitable which will re-occur.

concurrent.futures.ProcessPoolExecutor uses multiprocessing for its process management.  As of 3.7, ProcessPoolExecutor accepts an mp_context parameter to specify the multiprocessing start method.  Alternatively, the default appears to be controllable as a global setting: https://docs.python.org/3/library/multiprocessing.html#contexts-and-start-methods

Use the 'spawn' start method and the problem should go away as it'll no longer be misusing os.fork().  You _might_ be able to get the 'forkserver' start method to work, but only reliably if you make sure the forkserver is spawned _before_ any threads in the process (such as ProcessPoolExecutor's own queue management thread - which appears to be spawned upon the first call to .submit()).
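Concretely, the suggested workaround can be sketched like this (mp_context is the real 3.7+ ProcessPoolExecutor parameter; the toy f() mirrors the reproducer):

```python
import concurrent.futures
import multiprocessing

def f():
    import ctypes  # the import that deadlocked under the default 'fork' method

if __name__ == "__main__":           # required: 'spawn' re-imports this module
    ctx = multiprocessing.get_context("spawn")
    with concurrent.futures.ProcessPoolExecutor(mp_context=ctx) as executor:
        executor.submit(f).result()  # raises if the worker process failed
        print("ok")
```

Because 'spawn' starts workers via a fresh interpreter rather than fork(), no thread-held locks are inherited by the children.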
msg340451 - (view) Author: cagney (cagney) Date: 2019-04-17 22:01
We're discussing vanilla Python, for instance v3.7.0 is:

  git clone .../cpython
  cd cpython
  git checkout v3.7.0
  ./configure --prefix=/home/python/v3.7.0
  make -j && make -j install

(my 3.6.x wasn't vanilla, but I clearly stated that)

As I also mentioned, loading down the machine helps.  Try something like running #cores*2 instances of the script in parallel?
msg340456 - (view) Author: cagney (cagney) Date: 2019-04-17 23:23
@gregory.p.smith, I'm puzzled by your references to POSIX and/or os.fork().

The code in question looks like:

import concurrent.futures
import sys

def f():
    import ctypes

while True:
    with concurrent.futures.ProcessPoolExecutor() as executor:
        ftr = executor.submit(f)
        ftr.result()

which, to me, looks like pure Python.

Are you saying that this code can't work on GNU/Linux systems?
msg340459 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2019-04-17 23:55
concurrent.futures.ProcessPoolExecutor uses both multiprocessing and threading.  multiprocessing defaults to using os.fork().
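(A quick way to check what your interpreter defaults to; the result is platform-dependent:)

```python
import multiprocessing

# The default start method governs how ProcessPoolExecutor creates its
# workers: 'fork' on Linux in these versions, 'spawn' on Windows (and on
# macOS since 3.8).
print(multiprocessing.get_start_method())
```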
msg340464 - (view) Author: cagney (cagney) Date: 2019-04-18 02:34
So:

#1 we've a bug: the single-threaded ProcessPoolExecutor test program should work 100% reliably - it does not

#2 we've a cause: ProcessPoolExecutor is implemented internally using an unfortunate combination of fork and threads; this is causing the deadlock

#3 we've got a workaround - something like:
   ProcessPoolExecutor(mp_context=multiprocessing.get_context('spawn'))
but I'm guessing; the documentation is scant.

As for a fix, maybe:
- have ProcessPoolExecutor use 'spawn' by default; this way things always work
- have ProcessPoolExecutor properly synchronize its threads before "spawning"/"forking"/... so that "single-threaded" code works
- document that combining ProcessPoolExecutor's "fork" option and user threads isn't a good idea
msg340468 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2019-04-18 05:07
> "the single-threaded ProcessPoolExecutor test program"

I doubt it is single-threaded; the .submit() method appears to spawn a thread internally.
msg352707 - (view) Author: bentoi (bentoi) Date: 2019-09-18 07:34
FYI, I'm getting a similar deadlock in a child Python process which is stuck on locking a mutex from the dl library. See attached stack. 

I'm not using concurrent.futures however, the parent Python process is a test driver that uses threading.Thread and subprocess.Popen to spawn new processes... I'm not using os.fork().

This occurred on ArchLinux with Python 3.7.4.
msg396675 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-06-28 23:45
> #6  0x00007ffff7bc7712 in __pthread_unwind () from /lib64/libpthread.so.0
> #7  0x00007ffff7bbf7e7 in pthread_exit () from /lib64/libpthread.so.0
> #8  0x000000000051b2fc in PyThread_exit_thread () at Python/thread_pthread.h:238
> #9  0x000000000055ed16 in t_bootstrap (boot_raw=0x7fffe8da0e40) at ./Modules/_threadmodule.c:1021

The thread_run() function (previously "t_bootstrap()"), the low-level C function that runs a thread started by _thread.start_new_thread(), no longer calls PyThread_exit_thread(): see bpo-44434.

I'm able to reproduce the issue using attached cf-deadlock.py with Python 3.8:

* Run 3 processes running cf-deadlock.py
* Run the Python test suite to stress the machine: ./python -m test -r -j4  # try different -j values to stress the machine more or less

In the main branch (with bpo-44434 fix), I can no longer reproduce the issue. I ran cf-deadlock.py in 4 terminals in parallel with "./python -m test -r -j2" in a 5th terminal for 5 minutes. I couldn't reproduce the issue. On Python 3.8, I reproduced the issue in less than 1 minute.

Can someone please confirm that the issue is now fixed? Can we mark this issue as a duplicate of bpo-44434?
msg396910 - (view) Author: Jakub Wilk (jwilk) Date: 2021-07-03 15:30
I can no longer reproduce the bug with Python from git.
msg396929 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-07-03 21:53
Great! I'm closing the issue.
History
Date User Action Args
2022-04-11 14:59:10  admin  set  github: 80047
2021-07-03 21:53:10  vstinner  set  messages: + msg396929
2021-07-03 21:52:31  vstinner  set  status: open -> closed; resolution: fixed; stage: resolved
2021-07-03 15:30:56  jwilk  set  messages: + msg396910
2021-06-28 23:45:32  vstinner  set  messages: + msg396675
2019-09-18 07:34:39  bentoi  set  files: + stack-python.txt; nosy: + bentoi; messages: + msg352707
2019-04-18 05:07:53  gregory.p.smith  set  messages: + msg340468
2019-04-18 02:34:06  cagney  set  messages: + msg340464
2019-04-17 23:55:56  gregory.p.smith  set  messages: + msg340459
2019-04-17 23:23:45  cagney  set  messages: + msg340456
2019-04-17 22:01:48  cagney  set  messages: + msg340451
2019-04-17 20:07:40  gregory.p.smith  set  messages: + msg340442
2019-04-17 18:50:20  cagney  set  files: + gdb.sh; messages: + msg340432
2019-04-17 18:49:45  cagney  set  files: + cf-deadlock-1.py; messages: + msg340431
2019-04-17 18:46:21  cagney  set  messages: + msg340430
2019-04-16 19:49:45  jwilk  set  messages: + msg340362
2019-04-16 17:22:31  gregory.p.smith  set  messages: + msg340360
2019-04-16 17:20:47  gregory.p.smith  set  messages: + msg340359; versions: + Python 3.8
2019-04-16 17:05:00  cagney  set  messages: + msg340358
2019-04-16 13:54:57  vstinner  set  messages: + msg340344
2019-04-16 13:49:44  vstinner  set  nosy: + gregory.p.smith; messages: + msg340343
2019-04-15 14:30:09  cagney  set  messages: + msg340278
2019-04-15 11:05:54  hroncok  set  nosy: + hroncok; messages: + msg340263
2019-04-04 21:45:30  cagney  set  messages: + msg339464
2019-04-04 16:04:24  cagney  set  messages: + msg339451
2019-04-02 21:34:08  cagney  set  messages: + msg339370
2019-04-02 18:32:48  cagney  set  files: + cf-deadlock-alarm.py; messages: + msg339361
2019-04-02 13:52:34  vstinner  set  nosy: + vstinner; messages: + msg339334
2019-03-21 15:38:50  hugh  set  messages: + msg338549
2019-03-20 17:28:38  hugh  set  nosy: + hugh; messages: + msg338501
2019-03-19 19:37:37  jwilk  set  files: + gdb-bt-child.txt
2019-03-19 19:37:28  jwilk  set  files: + gdb-bt-parent.txt; messages: + msg338402
2019-03-19 16:15:12  pablogsal  set  messages: + msg338377
2019-03-19 16:13:45  pablogsal  set  nosy: + pablogsal; messages: + msg338376
2019-03-19 16:06:46  cagney  set  nosy: + cagney
2019-02-01 06:51:35  jwilk  set  messages: + msg334652
2019-02-01 05:22:29  josh.r  set  nosy: + josh.r; messages: + msg334648
2019-01-31 10:45:59  SilentGhost  set  nosy: + bquinlan, pitrou
2019-01-31 10:05:11  jwilk  create