classification
Title: deadlocked child process after forking on pystate.c's head_mutex
Type: enhancement
Stage: needs patch
Components: Interpreter Core
Versions: Python 3.6, Python 3.7
process
Status: open
Resolution: not a bug
Dependencies:
Superseder:
Assigned To:
Nosy List: Louis Brandy, Piotr Wysocki, cooperlees, drothera, gregory.p.smith, haypo, lukasz.langa, phantez, pixelbeat
Priority: normal
Keywords:

Created on 2017-05-17 23:25 by Louis Brandy, last changed 2017-05-24 17:40 by gregory.p.smith.

Pull Requests
URL Status Linked Edit
PR 1734 merged fried, 2017-05-22 23:18
PR 1740 merged lukasz.langa, 2017-05-23 00:03
Messages (9)
msg293906 - (view) Author: Louis Brandy (Louis Brandy) Date: 2017-05-17 23:25
A forked process (via os.fork) can inherit a locked `head_mutex` from its parent process and will promptly deadlock in this stack (on my Linux box):

Child Process (deadlocked):

#0  0x00007f1a4da82e3c in futex_abstimed_wait_cancelable (private=0, abstime=0x0, expected=0, futex_word=0x7f1a4c2964e0) at ../sysdeps/unix/sysv/linux/futex-internal.h:205
#1  0x00007f1a4da82e3c in do_futex_wait (sem=sem@entry=0x7f1a4c2964e0, abstime=0x0) at sem_waitcommon.c:111
#2  0x00007f1a4da82efc in __new_sem_wait_slow (sem=0x7f1a4c2964e0, abstime=0x0) at sem_waitcommon.c:181
#3  0x00007f1a4da82fb9 in __new_sem_wait (sem=<optimized out>) at sem_wait.c:29
#4  0x00007f1a4de4c605 in PyThread_acquire_lock_timed (lock=0x7f1a4c2964e0, microseconds=-1, intr_flag=0) at Python/thread_pthread.h:352
#5  0x00007f1a4de4c4b4 in PyThread_acquire_lock (lock=<optimized out>, waitflag=waitflag@entry=1) at Python/thread_pthread.h:556
#6  0x00007f1a4dd59e08 in _PyThreadState_DeleteExcept (tstate=0x7f19f2301800) at Python/pystate.c:483
#7  0x00007f1a4dd46af4 in PyEval_ReInitThreads () at Python/ceval.c:326
#8  0x00007f1a4dd78b0b in PyOS_AfterFork () at ./Modules/signalmodule.c:1608


The parent process has a race between one thread calling `os.fork` (while holding the GIL) and another thread (in our case, a C++ thread) trying to use PyGILState_Ensure. PyGILState_Ensure grabs head_mutex before it tries to take the GIL, so if a fork happens inside that critical section, the child process inherits a locked head_mutex.

The path from PyGILState_Ensure -> head_mutex looks like this:

#0  new_threadstate (interp=0x7fb5fd483d80, init=init@entry=1) at Python/pystate.c:183
#1  0x00007fb5ff149027 in PyThreadState_New (interp=<optimized out>) at Python/pystate.c:250
#2  0x00007fb5ff006ac7 in PyGILState_Ensure () at Python/pystate.c:838


----

Possible fix?

A simple fix would be to reset/unlock pystate.c's head_mutex inside PyOS_AfterFork if it is still locked.

Unclear if this is related to: https://bugs.python.org/issue28812
msg294165 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2017-05-22 17:49
You cannot safely use Python's os.fork() in a process that has threads.  Because of POSIX.

The CPython interpreter is by definition not async-signal-safe, so no Python code may safely be executed after the fork() system call if the process contained _any_ threads at fork() time.

The only reliable recommendation: *never use os.fork()* (if your process _might ever_ contain any threads, now or in the future, started by you or by any of your dependencies).

The closest thing to a fix is a bunch of band-aids: set up atfork functions to clear the state of known locks in the forked child. But even that can't guarantee anything, because it can't know about all locks.

The C standard library can contain locks, malloc() for example. Any C/C++ extension module can contain locks of its own. The problem isn't solvable within CPython. Adding a clear of pystate.c's head_mutex after forking makes sense, and may even get you further, but you will encounter other problems of the same nature in the future.

related: https://bugs.python.org/issue6721 and https://bugs.python.org/issue16500
msg294205 - (view) Author: Łukasz Langa (lukasz.langa) * (Python committer) Date: 2017-05-22 23:58
New changeset f82c951d1c5416f3550d544e50ff5662d3836e73 by Łukasz Langa (Jason Fried) in branch 'master':
bpo-30395 _PyGILState_Reinit deadlock fix (#1734)
https://github.com/python/cpython/commit/f82c951d1c5416f3550d544e50ff5662d3836e73
msg294222 - (view) Author: Łukasz Langa (lukasz.langa) * (Python committer) Date: 2017-05-23 05:23
New changeset d29feccec3ce3dcd9ab3100f8956171c70ec3027 by Łukasz Langa in branch '3.6':
[3.6] bpo-30395 _PyGILState_Reinit deadlock fix (GH-1734) (#1740)
https://github.com/python/cpython/commit/d29feccec3ce3dcd9ab3100f8956171c70ec3027
msg294256 - (view) Author: Louis Brandy (Louis Brandy) Date: 2017-05-23 15:37
Thanks to everyone jumping in.

I need no convincing that mixing forks and threads isn't just a problem but a problem factory. Given that the rest of this code seems to try to avoid similar deadlocks with similar mutexes, I figured we'd want to include this mutex to make a best-effort at being safe here. That made it worth reporting. To be sure, I still believe that the application code that led us here needs deeper fixes to address the fork/thread problems.
msg294316 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2017-05-24 04:23
+ head_mutex = NULL;

Shouldn't we free memory here?
msg294326 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2017-05-24 05:57
Would PyThread_free_lock (effectively sem_destroy()) work without (additional) problems?
msg294331 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2017-05-24 07:22
Gregory P. Smith added the comment:

Would PyThread_free_lock (effectively sem_destroy()) work without
(additional) problems?

If I recall correctly, no, you can get issues if the lock is still
acquired... But I should check.
msg294366 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2017-05-24 17:40
Another alternative *might* be to check whether the lock is locked (via a non-blocking acquire?) and release it if so, under the usual assumption that we are the only thread running immediately post-fork().

I'm not sure that can be guaranteed reliable given that other C code could've used pthread_atfork to register an "after" fork handler that spawns threads.  But that should be rare, and nothing here can really fix the underlying "programmer has mixed fork and threads" issue. Only ameliorate it.
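A minimal sketch of that idea, assuming a sem_t-based lock (as in CPython's thread_pthread.h on Linux) and a single-threaded child; release_inherited_lock is a hypothetical helper, not CPython API. Unlike a plain mutex, a semaphore may legally be posted by a thread other than the one that took it, so the probe-then-release sequence is well-defined:

```c
#include <semaphore.h>

/* Post-fork cleanup: whether the lock was free (sem_trywait succeeds)
   or was left locked by a now-gone parent thread (sem_trywait fails),
   a single sem_post leaves it in the released state. Only safe while
   the child is single-threaded. */
void release_inherited_lock(sem_t *lock)
{
    sem_trywait(lock);   /* takes the lock iff it was free */
    sem_post(lock);      /* either way, release it         */
}
```

The caveat above still applies: a pthread_atfork child handler registered by other C code could already have spawned threads by the time this runs.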

But I'm not sure this post-fork memory leak is really a problem, other than potentially showing up in memory sanitizer runs involving forked children.  The scenario where it would be an actual leak is a process doing serial fork() calls, with the parent(s) dying rather than forking new children from a common parent.  That would grow the leak, as each child would have an additional lock allocated (a few bytes).  I don't _believe_ that kind of process pattern is common (but people do everything).
History
Date User Action Args
2017-05-24 17:40:04 gregory.p.smith set messages: + msg294366
2017-05-24 07:22:27 haypo set messages: + msg294331
2017-05-24 05:57:12 gregory.p.smith set messages: + msg294326
2017-05-24 04:23:06 haypo set messages: + msg294316
2017-05-23 15:37:09 Louis Brandy set messages: + msg294256
2017-05-23 05:23:07 lukasz.langa set messages: + msg294222
2017-05-23 00:03:46 lukasz.langa set pull_requests: + pull_request1830
2017-05-22 23:58:57 lukasz.langa set nosy: + lukasz.langa; messages: + msg294205
2017-05-22 23:18:32 fried set pull_requests: + pull_request1825
2017-05-22 17:49:51 gregory.p.smith set type: enhancement; versions: + Python 3.7; nosy: + gregory.p.smith; messages: + msg294165; resolution: not a bug; stage: needs patch
2017-05-22 16:08:44 cooperlees set nosy: + cooperlees
2017-05-18 14:55:41 Piotr Wysocki set nosy: + Piotr Wysocki
2017-05-18 07:12:34 phantez set nosy: + phantez
2017-05-18 06:23:17 drothera set nosy: + drothera
2017-05-17 23:41:42 pixelbeat set nosy: + pixelbeat
2017-05-17 23:28:37 haypo set nosy: + haypo
2017-05-17 23:25:58 Louis Brandy create