classification
Title: LoadLibraryExW called with GIL held can cause deadlock
Type: behavior
Stage: resolved
Components: Windows
Versions: Python 3.8
process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To:
Nosy List: Tony Roberts, eryksun, paul.moore, pitrou, steve.dower, tim.golden, zach.ware
Priority: normal
Keywords: patch

Created on 2018-06-18 10:45 by Tony Roberts, last changed 2019-02-02 18:59 by steve.dower. This issue is now closed.

Pull Requests
URL | Status | Linked
PR 7789 | merged | Tony Roberts, 2018-06-19 10:55
Messages (10)
msg319876 - Author: Tony Roberts (Tony Roberts) * Date: 2018-06-18 10:45
In dynload_win.c LoadLibraryExW is called with the GIL held.

This can cause a deadlock in an uncommon case where the GIL also needs to be acquired when another thread is being detached.

Both LoadLibrary and FreeLibrary acquire the Windows loader lock. If FreeLibrary is called on a module that acquires the GIL when detaching, a deadlock occurs when another thread with the GIL held blocks on the loader lock by calling LoadLibrary.

This can happen when Python is embedded in another application via an extension, and that application creates threads that call into the extension in a way that results in Python code being run. Because the application creates the thread, the extension that's embedding Python doesn't know when the thread will terminate. The first time the extension is called from that thread and needs to run some Python code, it has to create a new thread state, and when the thread terminates that thread state should be destroyed. In other situations the thread state would be destroyed as part of cleaning up the thread, but here the extension does not know when the thread terminates and so must do it on thread detach in DllMain. Attempting to destroy the thread state this way requires acquiring the GIL, which can cause the deadlock described above.

The safest way to avoid this deadlock (without potentially leaking thread states) would be to release the GIL before calling LoadLibrary.
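
The lock-order inversion described above can be modelled in a few lines of Python. This is only an illustrative sketch, not the CPython change itself: two plain threading.Lock objects stand in for the GIL and the Windows loader lock, and the function name detach_can_take_gil and its parameters are hypothetical. The detaching thread uses a timeout so that the model terminates; in the real scenario it would block forever.

```python
import threading


def detach_can_take_gil(release_gil_before_load, timeout=0.5):
    """Model the scenario; return True if the detaching thread gets the 'GIL'.

    Both locks are stand-ins (assumptions for illustration only):
    `gil` models the GIL, `loader_lock` models the Windows loader lock.
    """
    gil = threading.Lock()
    loader_lock = threading.Lock()
    importer_holds_gil = threading.Event()
    detach_holds_loader_lock = threading.Event()
    result = {}

    def importing_thread():
        # A Python thread that is about to load an extension module.
        gil.acquire()
        importer_holds_gil.set()
        detach_holds_loader_lock.wait()
        if release_gil_before_load:
            gil.release()  # the proposed fix: drop the GIL first
        with loader_lock:  # stands in for the LoadLibraryExW call
            pass
        if release_gil_before_load:
            gil.acquire()  # take the GIL back after the call returns
        gil.release()

    def detaching_thread():
        # A thread-detach handler (DllMain) runs with the loader lock held
        # and then needs the GIL to destroy its thread state.
        importer_holds_gil.wait()
        with loader_lock:
            detach_holds_loader_lock.set()
            # The timeout exists only so this model can finish.
            got_gil = gil.acquire(timeout=timeout)
            if got_gil:
                gil.release()
            result["got_gil"] = got_gil

    threads = [
        threading.Thread(target=importing_thread),
        threading.Thread(target=detaching_thread),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result["got_gil"]


if __name__ == "__main__":
    print("GIL held during load:", detach_can_take_gil(False))
    print("GIL released before load:", detach_can_take_gil(True))
```

With the "GIL" held across the load call, the detaching thread cannot acquire it while it owns the loader lock, so the first call returns False. Releasing the "GIL" before taking the loader lock breaks the cycle and the second call returns True.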
msg319905 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2018-06-18 18:27
Do you want to submit a PR for this?  You can take a look at our developer's guide if you're new to contributing to Python:
https://devguide.python.org/
msg319906 - Author: Tony Roberts (Tony Roberts) * Date: 2018-06-18 18:29
Sure, I'll get that done in the next couple of days.
msg319972 - Author: Steve Dower (steve.dower) * (Python committer) Date: 2018-06-19 14:42
What about existing code that assumes the GIL is already held?

Also, your patch addresses many more situations than raised here, most of which are unnecessary (GetProcAddress and GetModuleHandle don't block in the way that LoadLibrary and FreeLibrary may).

Perhaps the application needs a way to terminate its threads cleanly, other than simply letting them exit and assuming it will be fine?
msg319975 - Author: Tony Roberts (Tony Roberts) * Date: 2018-06-19 14:54
GetProcAddress and GetModuleHandle do block in the same way as LoadLibrary and FreeLibrary - they acquire the loader lock too.

Yes, ideally the application would terminate its threads cleanly. However, when Python is embedded in another application it may not have control over when or how the threads are terminated (see original problem description).

I would be surprised if there is any code that assumes the GIL is held during any of those functions. When going through the code I found that sometimes the GIL was held when calling one of those four functions and at other times it wasn't.

See https://docs.microsoft.com/en-us/dotnet/framework/debug-trace-profile/loaderlock-mda for a (possibly incomplete) list of functions that acquire the loader lock.
msg319977 - Author: Steve Dower (steve.dower) * (Python committer) Date: 2018-06-19 15:16
Yeah, just after posting I remembered that the blocker is the loader lock and not filesystem/arbitrary code.


Still, "I don't think" isn't sufficient to justify making the change in 3.7 or earlier. If we can show that properly working code could never have assumed the GIL was held, then I guess it's fine, but otherwise it's too risky.

Making the change without any deprecation period in 3.8 is fine.
msg319981 - Author: Tony Roberts (Tony Roberts) * Date: 2018-06-19 15:27
Sure, that's reasonable :)

For my case I have a usable workaround, so not backporting it to versions before 3.8 is fine for me. My workaround will just leak the thread state if another thread is in __import__, which happens so rarely that it's not really a problem (but not rarely enough that blocking is acceptable!). The other cases changed in this PR could cause the same issue, but in practice that's unlikely for my application.

Thanks!
msg319986 - Author: Eryk Sun (eryksun) * (Python triager) Date: 2018-06-19 16:34
In some of these cases, it would be simpler to just remove the explicit dynamic linking. 3.6+ doesn't support XP, so CancelIoEx, CreateSymbolicLinkW, RegDeleteKeyExW, RegDisableReflectionKey, RegEnableReflectionKey, and RegQueryReflectionKey can be linked implicitly. 3.7+ doesn't support Vista, in which case GetMaximumProcessorCount can be linked implicitly. (It should be GetActiveProcessorCount. See issue 33166.) OTOH, ShellExecuteExW is loaded explicitly on purpose to avoid a static dependency on shell32.dll.
msg334750 - Author: Steve Dower (steve.dower) * (Python committer) Date: 2019-02-02 16:19
Sorry for the delay on this. I've approved the PR and restarted the CI systems to make sure it's all okay.

I agree that many of these cases no longer have to dynamically load the functions, but that should be fixed separately.
msg334757 - Author: Steve Dower (steve.dower) * (Python committer) Date: 2019-02-02 17:16
New changeset 4860f01ac0f07cdc8fc0cc27c33f5a64e5cfec9f by Steve Dower (Tony Roberts) in branch 'master':
bpo-33895: Relase GIL while calling functions that acquire Windows loader lock (GH-7789)
https://github.com/python/cpython/commit/4860f01ac0f07cdc8fc0cc27c33f5a64e5cfec9f
History
Date | User | Action | Args
2019-02-02 18:59:06 | steve.dower | set | status: open -> closed; resolution: fixed; stage: patch review -> resolved
2019-02-02 17:16:44 | steve.dower | set | messages: + msg334757
2019-02-02 16:19:31 | steve.dower | set | messages: + msg334750; versions: - Python 3.6, Python 3.7
2018-06-19 16:34:46 | eryksun | set | nosy: + eryksun; messages: + msg319986
2018-06-19 15:27:46 | Tony Roberts | set | messages: + msg319981
2018-06-19 15:16:33 | steve.dower | set | messages: + msg319977
2018-06-19 14:54:51 | Tony Roberts | set | messages: + msg319975
2018-06-19 14:42:20 | steve.dower | set | messages: + msg319972
2018-06-19 10:55:30 | Tony Roberts | set | keywords: + patch; stage: needs patch -> patch review; pull_requests: + pull_request7393
2018-06-18 18:29:40 | Tony Roberts | set | messages: + msg319906
2018-06-18 18:27:06 | pitrou | set | versions: - Python 2.7, Python 3.4, Python 3.5; nosy: + pitrou; messages: + msg319905; type: behavior; stage: needs patch
2018-06-18 10:45:01 | Tony Roberts | create |