Issue 30676: [Windows] Potential for GIL deadlock on Windows in threadmodule acquire_lock

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/74861

classification

Title:	[Windows] Potential for GIL deadlock on Windows in threadmodule acquire_lock
Type:	behavior	Stage:	resolved
Components:	Library (Lib), Windows	Versions:	Python 3.6, Python 3.4, Python 3.5

process

Status:	closed	Resolution:	not a bug
Dependencies:		Superseder:
Assigned To:		Nosy List:	paul.moore, pitrou, sdeibel, steve.dower, tim.golden, zach.ware
Priority:	normal	Keywords:

Created on 2017-06-15 13:34 by sdeibel, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Messages (3)
msg296088 - (view)	Author: Stephan R.A. Deibel (sdeibel)	Date: 2017-06-15 13:34
In acquire_timed in _threadmodule.c (used to implement threading lock acquire()) there is an optimization from https://github.com/python/cpython/commit/e75ff35af2b6c85d48c68b95f295aeac7396b162 that says it first tries non-blocking lock w/o releasing the GIL. However, it seems that this call can block on Windows. The call to PyThread_acquire_lock_timed(lock, 0, 0) (w/o releasing GIL) on Windows calls EnterNonRecursiveMutex which in turn, in the _PY_USE_CV_LOCKS impl, immediately calls PyMUTEX_LOCK as if it's not going to block. However, various implementations of this seem like they can block. Specifically, the impls of PyMUTEX_LOCK that end up using AcquireSRWLockExclusive and EnterCriticalSection both block. The only case that is correct (does not block) is when !_PY_USE_CV_LOCKS where EnterNonRecursiveMutex() instead calls WaitForSingleObjectEx() with the zero timeout. This seems like an incorrect potential for GIL deadlock, because the thread blocks without releasing GIL. Shouldn't it ultimately be using TryAcquireSRWLockExclusive and TryEnterCriticalSection in these two cases? This seems like an issue on Windows only. The pthreads implementations look like they correctly handle timeout of 0 sent to PyThread_acquire_lock_timed().
msg296113 - (view)	Author: Steve Dower (steve.dower) *	Date: 2017-06-15 16:56
We recently tried switching the GIL implementation on Windows by changing the setting you refer to. It didn't work, so we didn't keep the change, but I don't recall the details - likely the same issue you bring up here. I don't have the other issue number handy right now, but it may be worth trying to find it. We should be able to switch to the OS implementation of condition variables, and I'd expect performance improvement, but we need to resolve these issues first.
msg296117 - (view)	Author: Stephan R.A. Deibel (sdeibel)	Date: 2017-06-15 17:30
I think I misunderstood the implementation of EnterNonRecursiveMutex -- the mutex that could block there is the internal 'cs' mutex, which would only be held only briefly while Enter/LeaveNonRecursiveMutex are running, and it looks like the 'cs' mutex is released before doing anything that blocks (in the two impls of PyCOND_WAIT and PyCOND_TIMEDWAIT). So my report is invalid and I'm closing it. Sorry about that!

History
Date	User	Action	Args
2022-04-11 14:58:47	admin	set	github: 74861
2017-06-15 17:30:43	sdeibel	set	status: open -> closed resolution: not a bug messages: + msg296117 stage: resolved
2017-06-15 16:56:22	steve.dower	set	messages: + msg296113
2017-06-15 13:44:58	vstinner	set	title: Potential for GIL deadlock on Windows in threadmodule acquire_lock -> [Windows] Potential for GIL deadlock on Windows in threadmodule acquire_lock
2017-06-15 13:44:33	vstinner	set	nosy: + pitrou
2017-06-15 13:34:17	sdeibel	create