This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: PyGILState_Release does not release gil correctly, resulting in deadlock
Type: Stage:
Components: Interpreter Core Versions: Python 3.7
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: 123 wlpha, Damien LEFEVRE, ronaldoussoren
Priority: normal Keywords:

Created on 2019-11-04 04:48 by 123 wlpha, last changed 2022-04-11 14:59 by admin.

Files
File name Uploaded Description Edit
interpreterlock.zip Damien LEFEVRE, 2019-11-21 12:50
Messages (9)
msg355921 - (view) Author: 123 wlpha (123 wlpha) Date: 2019-11-04 04:48
PyGILState_Release does not really release gil, causing the next PyGILState_Ensure deadlock, now you need to call if (PyGILState_Check () > 0) {PyEval_SaveThread ();}


```
auto gil = PyGILState_Ensure();

// call c api

PyGILState_Release (gil);
if (PyGILState_Check () > 0) {
   PyEval_SaveThread ();
}
    
```
msg355922 - (view) Author: 123 wlpha (123 wlpha) Date: 2019-11-04 04:55
Usually in the python extension, only the python function needs to be written between Py_BEGIN_ALLOW_THREADS and Py_END_ALLOW THREADS, but the c callback function needs PyGILState_Ensure and PyGILState_Release, but PyGILState_Releas cannot release gil correctly.
msg355943 - (view) Author: Ronald Oussoren (ronaldoussoren) * (Python committer) Date: 2019-11-04 14:25
What's the value of PyGILState_Check() before the call to PyGILState_Ensure()? 

PyGILState_Ensure() and PyGILState_Release() should bracket code where you're not sure about the current GIL state, but need the GIL.  The Release function only undoes the action of the Ensure function, and won't release the GIL if the GIL was held before the call to the Ensure function.
msg356011 - (view) Author: Damien LEFEVRE (Damien LEFEVRE) Date: 2019-11-05 08:28
I see the same problem.

We call python functions from C++ and use this locking class within each call:

'''
#pragma once
#ifdef _DEBUG
    #undef _DEBUG
    #include <Python.h>
    #define _DEBUG
#else
    #include <Python.h>
#endif // _DEBUG

#include "gcc_helper.h"

GCC_DIAG_OFF(maybe-uninitialized)

class GIL_lock
{
public:
    GIL_lock()
    {
        m_wasPreviouslyLockedByThisThread = PyGILState_Check() == 1;

        // If already locked, we shouldn't lock again.
        if (m_wasPreviouslyLockedByThisThread == false)
        {
            m_state = PyGILState_Ensure();
        }
    }

    ~GIL_lock()
    {
        release();
    }

    void release()
    {
        // Release only if it wasn't locked previously and wasn't released during lifetime.
        if (m_released == false && m_wasPreviouslyLockedByThisThread == false)
        {
            PyGILState_Release(m_state);
        }

        m_released = true;
    }

private:
    PyGILState_STATE m_state;
    bool m_released = false;
    bool m_wasPreviouslyLockedByThisThread = false;
};
'''

Example:
'''
void PythonInterpreterPrivate::setPythonHome(const QString& path)
{
    GIL_lock gilLock;

#ifdef _WIN32
    wchar_t* pyHome = new wchar_t[MAX_PATH];
#else
    wchar_t* pyHome = new wchar_t[PATH_MAX];
#endif
    QDir pythonDir(path);

    if (pythonDir.exists())
    {
        path.toWCharArray(pyHome);
        pyHome[path.size()] = 0;
        Py_SetPythonHome(pyHome);
    }
}
'''

With Python3.6, the first instance of GIL_lock in the main thread goes to PyGILState_Ensure(). With Python3.7, still in the main thread PyGILState_Check() reports the lock is already taken while we don't take it.

In a new thread, with Python3.7 GIL_lock() goes to PyGILState_Ensure() and deadlocks there. With Python3.6 PyGILState_Ensure() returns the state.
msg356012 - (view) Author: Ronald Oussoren (ronaldoussoren) * (Python committer) Date: 2019-11-05 09:00
I don't think this is a bug:

- PyGILState_Ensure() acquires the GIL when it is not already acquired (and can be called without checking).  

It cannot acquire the GIL when some other thread has already taken the GIL, but wait until the GIL is released.  The GIL will get released periodically when running Python code.  

If the thread holding the GIL is primarily running native code that doesn't use the Python API it should release the GIL (for example by bracketing the long-running C code with Py_BEGIN_ALLOW_THREADS/Py_END_ALLOW_THREADS)


- PyGILState_Release() restores the state from before the call to PyGILState_Ensure().
msg356018 - (view) Author: Damien LEFEVRE (Damien LEFEVRE) Date: 2019-11-05 10:47
@ronaldoussoren

The issue here is that the behavior between Python 3.6 and 3.7 has changed.

Our code runs perfectly fine with 3.6. We release the lock each time our functions get out of scope and so we know for sure there is no lock left behind.

Starting with 3.7 GIL is acquired when we start the application, but not by us, maybe internally?, and never gets released; thus the deadlock.
msg356059 - (view) Author: Ronald Oussoren (ronaldoussoren) * (Python committer) Date: 2019-11-05 20:29
I'm afraid it will be close to impossible to determine what's going on without a reproducer. 

There's not enough context in the issue to understand the report, in particular what the main thread is doing and how that's releasing the GIL. 

BTW. The code in msg356011 is too complex, it can be simplified to:

// START

class GIL_lock
{
public:
    GIL_lock()
    {
        m_state = PyGILState_Ensure();
    }

    ~GIL_lock()
    {
       PyGILState_Release(m_state);
    }

private:
    PyGILState_STATE m_state;
};

// END

This is always safe, even if the thread currently owns the GIL. This will block when another thread owns the GIL, and will block forever when that other thread is not running Python code (and will therefore not release the GIL)

Is the problem also present with Python 3.8, or only 3.7? 

And for what it is worth, I don't have problems in my own code that uses these APIs.
msg356961 - (view) Author: Damien LEFEVRE (Damien LEFEVRE) Date: 2019-11-19 10:54
Back to testing this.

I have the same issue on Linux, i.e: the deadlock on PyGILState_Ensure.

I cannot test 3.8 because many of the modules are not yet available for this version.

I'll try making a small test application to reproduce the issue.
msg357165 - (view) Author: Damien LEFEVRE (Damien LEFEVRE) Date: 2019-11-21 12:50
Here is a code example reproducing the issue.
History
Date User Action Args
2022-04-11 14:59:22adminsetgithub: 82861
2019-11-21 12:50:43Damien LEFEVREsetfiles: + interpreterlock.zip

messages: + msg357165
2019-11-19 10:54:58Damien LEFEVREsetmessages: + msg356961
2019-11-05 20:29:41ronaldoussorensetmessages: + msg356059
2019-11-05 10:47:24Damien LEFEVREsetmessages: + msg356018
2019-11-05 09:00:14ronaldoussorensetmessages: + msg356012
2019-11-05 08:28:30Damien LEFEVREsetnosy: + Damien LEFEVRE
messages: + msg356011
2019-11-04 14:25:10ronaldoussorensetnosy: + ronaldoussoren
messages: + msg355943
2019-11-04 04:55:37123 wlphasetmessages: + msg355922
2019-11-04 04:48:28123 wlphacreate