LoadLibraryExW called with GIL held can cause deadlock #78076

Closed
tonyroberts mannequin opened this issue Jun 18, 2018 · 10 comments
Labels
3.8 (only security fixes), OS-windows, type-bug (An unexpected behavior, bug, or error)

Comments

@tonyroberts
Mannequin

tonyroberts mannequin commented Jun 18, 2018

BPO 33895
Nosy @pfmoore, @pitrou, @tjguk, @zware, @eryksun, @zooba, @tonyroberts
PRs
  • bpo-33895: Relase GIL while calling functions that acquire Windows loader lock #7789
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    GitHub fields:

    assignee = None
    closed_at = <Date 2019-02-02.18:59:06.944>
    created_at = <Date 2018-06-18.10:45:01.046>
    labels = ['type-bug', '3.8', 'OS-windows']
    title = 'LoadLibraryExW called with GIL held can cause deadlock'
    updated_at = <Date 2019-02-02.18:59:06.944>
    user = 'https://github.com/tonyroberts'

    bugs.python.org fields:

    activity = <Date 2019-02-02.18:59:06.944>
    actor = 'steve.dower'
    assignee = 'none'
    closed = True
    closed_date = <Date 2019-02-02.18:59:06.944>
    closer = 'steve.dower'
    components = ['Windows']
    creation = <Date 2018-06-18.10:45:01.046>
    creator = 'Tony Roberts'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 33895
    keywords = ['patch']
    message_count = 10.0
    messages = ['319876', '319905', '319906', '319972', '319975', '319977', '319981', '319986', '334750', '334757']
    nosy_count = 7.0
    nosy_names = ['paul.moore', 'pitrou', 'tim.golden', 'zach.ware', 'eryksun', 'steve.dower', 'Tony Roberts']
    pr_nums = ['7789']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue33895'
    versions = ['Python 3.8']

    In dynload_win.c LoadLibraryExW is called with the GIL held.

    This can cause a deadlock in an uncommon case where the GIL also needs to be acquired when another thread is being detached.

    Both LoadLibrary and FreeLibrary acquire the Windows loader lock. If FreeLibrary is called on a module that acquires the GIL when detaching, a deadlock occurs when another thread that holds the GIL blocks on the loader lock by calling LoadLibrary.
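
The lock-order inversion described above can be modelled in a few lines of Python. This is only a sketch: `gil` and `loader_lock` are plain `threading.Lock` objects standing in for the real GIL and the Windows loader lock, the function and thread names are made up for illustration, and a timed-out `acquire` stands in for "would block forever".

```python
import threading


def simulate_deadlock(timeout=0.2):
    """Model the GIL / loader-lock inversion with two ordinary locks.

    A timed-out acquire stands in for a thread that would block forever.
    Returns a dict saying whether each thread obtained its second lock.
    """
    gil = threading.Lock()          # stand-in for the GIL
    loader_lock = threading.Lock()  # stand-in for the Windows loader lock
    gil_held = threading.Event()
    loader_held = threading.Event()
    acquired = {}

    def importer():
        # Thread A: holds the GIL, then "calls LoadLibrary",
        # which needs the loader lock.
        with gil:
            gil_held.set()
            loader_held.wait()
            ok = loader_lock.acquire(timeout=timeout)
            if ok:
                loader_lock.release()
            acquired["importer"] = ok

    def detacher():
        # Thread B: is inside a thread-detach notification, so it holds
        # the loader lock, and it needs the GIL to destroy its thread state.
        gil_held.wait()
        with loader_lock:
            loader_held.set()
            ok = gil.acquire(timeout=timeout)
            if ok:
                gil.release()
            acquired["detacher"] = ok

    threads = [threading.Thread(target=importer),
               threading.Thread(target=detacher)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return acquired


if __name__ == "__main__":
    print(simulate_deadlock())
```

Each thread keeps its first lock while waiting for the second, so at least one of the two values in the returned dict is always `False`. With real, untimed locks both threads would simply hang.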

    This can happen when Python is embedded in another application via an extension, and that application creates threads that call into the extension, which in turn runs Python code. Because the application creates the thread, the extension that embeds Python doesn't know when the thread will terminate. The first time the extension is called from such a thread and needs to run Python code, it has to create a new thread state, and when the thread terminates that thread state should be destroyed. In other situations the thread state would be destroyed as part of cleaning up the thread, but here the extension does not know when the thread terminates and so must do it on thread detach in DllMain. Destroying the thread state this way requires acquiring the GIL, which can cause the deadlock described above.

    The safest way to avoid this deadlock (without potentially leaking thread states) would be to release the GIL before calling LoadLibrary.
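
As a rough illustration of "release the GIL before calling LoadLibrary", the sketch below uses a plain `threading.Lock` as a stand-in for the GIL and a hypothetical `load_library` helper. None of these names come from CPython. In CPython's C source the analogous construct is the `Py_BEGIN_ALLOW_THREADS` / `Py_END_ALLOW_THREADS` macro pair wrapped around the blocking call.

```python
import threading
from contextlib import contextmanager

gil = threading.Lock()  # hypothetical stand-in for the GIL


@contextmanager
def allow_threads(lock):
    """Release `lock` for the duration of the block, then take it back."""
    lock.release()
    try:
        yield
    finally:
        lock.acquire()


def load_library(name, loader_lock):
    """Hypothetical loader; must be called with `gil` held."""
    with allow_threads(gil):
        # The GIL is free here, so a thread-detach handler that holds
        # the loader lock can take the GIL, finish, and let us proceed.
        with loader_lock:
            handle = f"<handle for {name}>"
    # The GIL is held again by the time we return to the caller.
    return handle


if __name__ == "__main__":
    loader_lock = threading.Lock()
    with gil:
        print(load_library("spam.pyd", loader_lock))
```

The caller still holds the "GIL" before and after the call, so code around the load sees no change. Only the window in which the loader lock is awaited no longer overlaps with holding the GIL.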

    @tonyroberts tonyroberts mannequin added 3.7 (EOL) end of life 3.8 only security fixes OS-windows labels Jun 18, 2018
    @pitrou
    Member

    pitrou commented Jun 18, 2018

    Do you want to submit a PR for this? You can take a look at our developer's guide if you're new to contributing to Python:
    https://devguide.python.org/

    @pitrou pitrou added the type-bug An unexpected behavior, bug, or error label Jun 18, 2018
    @tonyroberts
    Mannequin Author

    tonyroberts mannequin commented Jun 18, 2018

    Sure, I'll get that done in the next couple of days.

    @zooba
    Member

    zooba commented Jun 19, 2018

    What about existing code that assumes the GIL is already held?

    Also, your patch addresses many more situations than raised here, most of which are unnecessary (GetProcAddress and GetModuleHandle don't block in the way that LoadLibrary and FreeLibrary may).

    Perhaps the application needs a way to terminate its threads cleanly, other than simply letting them exit and assuming it will be fine?

    @tonyroberts
    Mannequin Author

    tonyroberts mannequin commented Jun 19, 2018

    GetProcAddress and GetModuleHandle do block in the same way as LoadLibrary and FreeLibrary - they acquire the loader lock too.

    Yes, ideally the application would terminate its threads cleanly. However, when Python is embedded in another application it may not have control over when or how the threads are terminated (see original problem description).

    I would be surprised if there is any code that assumes the GIL is held during any of those functions. When going through the code I found that the GIL was sometimes held when calling one of those four functions and sometimes not.

    See https://docs.microsoft.com/en-us/dotnet/framework/debug-trace-profile/loaderlock-mda for a (possibly incomplete) list of functions that acquire the loader lock.

    @zooba
    Member

    zooba commented Jun 19, 2018

    Yeah, just after posting I remembered that the blocker is the loader lock and not filesystem/arbitrary code.

    Still, "I don't think" isn't sufficient to justify making the change in 3.7 or earlier. If we can show that properly working code could never have assumed the GIL, then I guess it's fine, but otherwise it's too risky.

    Making the change without any deprecation period in 3.8 is fine.

    @tonyroberts
    Mannequin Author

    tonyroberts mannequin commented Jun 19, 2018

    Sure, that's reasonable :)

    In my case I have a usable workaround, so not backporting it to versions before 3.8 is fine for me. My workaround will just leak the thread state if another thread is in __import__, which happens so rarely that it's not really a problem (but not rarely enough that blocking is acceptable!). The other cases changed in this PR would cause the same issue, but in practice that is unlikely for my application.

    Thanks!

    @eryksun
    Contributor

    eryksun commented Jun 19, 2018

    In some of these cases, it would be simpler to just remove the explicit dynamic linking. 3.6+ doesn't support XP, so CancelIoEx, CreateSymbolicLinkW, RegDeleteKeyExW, RegDisableReflectionKey, RegEnableReflectionKey, and RegQueryReflectionKey can be linked implicitly. 3.7+ doesn't support Vista, in which case GetMaximumProcessorCount can be linked implicitly. (It should be GetActiveProcessorCount. See bpo-33166.) OTOH, ShellExecuteExW is loaded explicitly on purpose to avoid a static dependency on shell32.dll.

    @zooba
    Member

    zooba commented Feb 2, 2019

    Sorry for the delay on this. I've approved the PR and restarted the CI systems to make sure it's all okay.

    I agree that many of these cases no longer have to dynamically load the functions, but that should be fixed separately.

    @zooba zooba removed the 3.7 (EOL) end of life label Feb 2, 2019
    @zooba
    Member

    zooba commented Feb 2, 2019

    New changeset 4860f01 by Steve Dower (Tony Roberts) in branch 'master':
    bpo-33895: Relase GIL while calling functions that acquire Windows loader lock (GH-7789)

    @zooba zooba closed this as completed Feb 2, 2019
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022