classification
Title: _tkinter PythonCmd fails to acquire GIL
Type: crash Stage: patch review
Components: Tkinter Versions: Python 3.10, Python 3.9, Python 3.8
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: epaine, paul.moore, serhiy.storchaka, speleo3, steve.dower, tim.golden, zach.ware
Priority: normal Keywords: patch

Created on 2020-03-26 12:16 by speleo3, last changed 2020-09-29 17:45 by speleo3.

Files
File name Uploaded Description Edit
qt_tk_demo.py speleo3, 2020-03-26 12:16 Minimal example
simple_tk_demo.py speleo3, 2020-07-30 07:30 Minimal example with only stdlib
tstate_acquire.diff epaine, 2020-07-30 09:56
Pull Requests
URL Status Linked Edit
PR 19178 closed speleo3, 2020-03-26 16:18
PR 22453 open epaine, 2020-09-29 17:11
Messages (8)
msg365064 - (view) Author: Thomas Holder (speleo3) * Date: 2020-03-26 12:16
The attached demo application runs a Tkinter GUI and a PyQt GUI in the same thread. PyQt owns the main loop and keeps updating the Tkinter instance by calling `update()`.

On Windows, when binding a "<Configure>" event, resizing the Tk window will lead to a crash:

```
Fatal Python error: PyEval_RestoreThread: NULL tstate

Current thread 0x00001f1c (most recent call first):
  File "qt_tk_demo.py", line 50 in <module>
```

This crash happens in `_tkinter.c` in `PythonCmd` inside the `ENTER_PYTHON` macro.

The issue can be fixed by using `PyGILState_Ensure` and `PyGILState_Release` instead of the `ENTER_PYTHON` macro inside the `PythonCmd` function.
msg374567 - (view) Author: E. Paine (epaine) * Date: 2020-07-29 09:21
Sorry for taking so long to do something with this issue. Can you please explain *why* (mostly because I don't really have a clue) acquiring the GIL fixes this crash (I am not disputing that it does - I have compared the current 3.9 branch against yours and confirmed this). Also, can it be reproduced with just the stdlib (so we can write a unittest for this)? Is it an assumption in QT about the GIL rather than a problem with _tkinter?
msg374587 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2020-07-29 21:25
At a glance, it looks like ENTER_PYTHON will *restore* the GIL on the current thread, but it may be coming in on a thread that's never seen the GIL before.

"The GIL" is actually the Python thread state, which is actually a per-thread data structure that's either active/locked or inactive/unlocked. If the current thread doesn't have a thread state, PyGILState_Ensure will create one, while ENTER_PYTHON will not.

So the underlying issue is probably that the callbacks are coming in from a thread that they shouldn't be, and really ought to be marshalled back into the correct event loop first.
msg374602 - (view) Author: Thomas Holder (speleo3) * Date: 2020-07-30 07:30
Attaching an even simpler example which doesn't use PyQt. It shows a native Windows dialog using ctypes. I could not reproduce the crash without showing any non-tk dialog.

Steve's explanation sounds plausible.
msg374608 - (view) Author: E. Paine (epaine) * Date: 2020-07-30 09:56
Thanks Steve for your explanation. I had a quick experiment with the ENTER_PYTHON definition and initially just added a call to PyThreadState_Get if the tstate was NULL. This still crashed the interpreter with the following error (which I think reaffirms Steve's explanation):

Fatal Python error: PyThreadState_Get: the function must be called with the GIL held, but the GIL is released (the current Python thread state is NULL)

To work-around this error, I temporarily acquired the GIL and then released it after the call to PyThreadState_Get. The result worked correctly and can be found in the attached diff. I am not saying that the attached diff is the solution, but I don't think we need to stop using tstate in favour of the gstate introduced in the PR.

> I could not reproduce the crash without showing any non-tk dialog.

That would help explain why I cannot reproduce it on Linux - it is differences in the windowing API under Tcl not some platform-specific code in Tcl/Tk or _tkinter.
msg377672 - (view) Author: Thomas Holder (speleo3) * Date: 2020-09-29 15:55
I finally managed to test tstate_acquire.diff and it works perfectly! Should I close my pull request? Or update it with your patch?
msg377675 - (view) Author: E. Paine (epaine) * Date: 2020-09-29 16:55
I do think we should close the existing PR (TBH, I completely forgot about this issue). Briefly looking again at the issue, it does indeed look like the GIL hasn't seen the thread before the attempted call to restore the tstate (as Steve said). If its alright with you, Thomas, I will open a new PR with the attached patch so it can be formally reviewed (though you can open the PR if you would prefer).
msg377676 - (view) Author: Thomas Holder (speleo3) * Date: 2020-09-29 17:45
All good, thanks for opening PR 22453! I'll close my old PR.
History
Date User Action Args
2020-09-29 17:45:58speleo3setmessages: + msg377676
2020-09-29 17:11:18epainesetpull_requests: + pull_request21482
2020-09-29 16:55:45epainesetmessages: + msg377675
components: - Windows
2020-09-29 15:55:27speleo3setmessages: + msg377672
2020-07-30 09:56:02epainesetfiles: + tstate_acquire.diff

messages: + msg374608
2020-07-30 07:30:53speleo3setfiles: + simple_tk_demo.py

messages: + msg374602
2020-07-29 21:25:44steve.dowersetmessages: + msg374587
2020-07-29 09:21:29epainesetnosy: + epaine

messages: + msg374567
versions: + Python 3.8, Python 3.9, Python 3.10
2020-03-28 22:46:44terry.reedysetnosy: + serhiy.storchaka
2020-03-26 16:18:11speleo3setkeywords: + patch
stage: patch review
pull_requests: + pull_request18538
2020-03-26 12:16:09speleo3create