Author cagney
Recipients bquinlan, cagney, gregory.p.smith, hroncok, hugh, josh.r, jwilk, pablogsal, pitrou, vstinner
Date 2019-04-16.17:05:00
Message-id <1555434300.64.0.221149233392.issue35866@roundup.psfhosted.org>
Content
(Disclaimer: I'm merging my high-level backtraces with @jwilk's low-level backtraces.)

The Python backtrace shows that the deadlocked process was running 'f', which then executed:
    import ctypes
which, in turn, executed:
    from _ctypes import Union, Structure, Array
and that hung.

The low-level backtrace shows it was trying to acquire a lock (no surprises there); the surprise is that it was inside dlopen(), trying to load '_ctypes...so'!

#11 __dlopen (file=file@entry=0x7f398da4b050 "_ctypes.cpython-37m-x86_64-linux-gnu.so", mode=<optimized out>) at dlopen.c:87
...
#3 _dl_map_object_from_fd (name="_ctypes.cpython-37m-x86_64-linux-gnu.so", origname=origname@entry=0x0, fd=-1, fbp=<optimized out>, realname=<optimized out>, loader=loader@entry=0x0, l_type=<optimized out>, mode=<optimized out>, stack_endp=<optimized out>, nsid=<optimized out>) at dl-load.c:1413
#2 _dl_add_to_namespace_list (new=0x55f8b8f34540, nsid=0) at dl-object.c:34
#1 __GI___pthread_mutex_lock (mutex=0x7f3991fb9970 <_rtld_global+2352>) at ../nptl/pthread_mutex_lock.c:115

and the lock in question (assuming my sources roughly match the above) seems to be:

  /* We modify the list of loaded objects.  */
  __rtld_lock_lock_recursive (GL(dl_load_write_lock));

Presumably a thread in the parent held this lock at the time of the fork, so the child inherited it in the locked state with no thread left to release it.

If one of the other children also has the lock pre-acquired, that would confirm the theory (unfortunately, a child not having the lock wouldn't rebut it).
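
As an illustration of the theory (not the actual glibc code path), here is a minimal Python sketch of the same failure mode. It substitutes an ordinary threading.Lock for dl_load_write_lock, and the function name child_can_acquire_after_fork is made up for this example. A helper thread holds the lock while the main thread forks; the child then tries to take the lock with a timeout so that the demo reports the problem instead of hanging. It assumes a POSIX system where os.fork() is available.

```python
import os
import threading
import warnings


def child_can_acquire_after_fork(hold_in_parent: bool) -> bool:
    """Fork and report whether the child could acquire a lock copied from the parent.

    Hypothetical helper for illustration only. The threading.Lock stands in
    for the dynamic loader's lock seen in the backtrace.
    """
    lock = threading.Lock()
    held = threading.Event()
    release = threading.Event()

    def holder():
        # Another thread in the parent holds the lock across the fork.
        with lock:
            held.set()
            release.wait()

    thread = None
    if hold_in_parent:
        thread = threading.Thread(target=holder)
        thread.start()
        held.wait()  # make sure the lock really is held before forking

    with warnings.catch_warnings():
        # Newer Python versions warn when forking a multi-threaded process.
        warnings.simplefilter("ignore", DeprecationWarning)
        pid = os.fork()

    if pid == 0:
        # Child: only the forking thread exists here, so nobody can ever
        # release a lock that was held by a different parent thread.
        acquired = lock.acquire(timeout=0.5)
        os._exit(0 if acquired else 1)

    _, status = os.waitpid(pid, 0)
    if thread is not None:
        release.set()
        thread.join()
    return os.WEXITSTATUS(status) == 0


if __name__ == "__main__":
    print("lock free at fork:", child_can_acquire_after_fork(False))
    print("lock held at fork:", child_can_acquire_after_fork(True))
```

Without the timeout, the child in the second case would block forever, which matches the shape of the hang in frame #1 above.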

So, any guesses as to what dl-related operation was being performed by the parent?

----

I don't think the remaining processes are involved (and I've probably got 4 in total because my machine has 4 cores).

8976 - this acquired the multi-process semaphore and is blocked in '_recv' awaiting further instructions
8978, 8977 - these are blocked waiting for the above to free the multi-process semaphore