classification
Title: tkinter objects garbage collected from non-tkinter thread cause crash
Type: crash Stage:
Components: Tkinter Versions: Python 3.10, Python 3.9, Python 3.8, Python 3.7
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: epaine, gpolo, obserience, pitrou, serhiy.storchaka
Priority: normal Keywords:

Created on 2019-12-19 02:56 by obserience, last changed 2020-06-18 10:04 by epaine.

Files
File name Uploaded Description Edit
error_case.py obserience, 2019-12-19 02:56
quit_behavior.py obserience, 2019-12-19 16:36
Messages (6)
msg358652 - (view) Author: obserience (obserience) Date: 2019-12-19 02:56
All tkinter objects have a reference to the TCL interpreter object "self.tk". The current cleanup code does not remove these when a widget is destroyed.

Garbage collection of the TCL interpreter object occurs only after all gui objects are garbage collected. This may  be triggered from another thread causing TCL to panic and trigger a core dump.

Error message:
>Tcl_AsyncDelete: async handler deleted by the wrong thread
>Aborted (core dumped)

Adding:
"self.tk = None"
to the end of Misc.destroy() (tkinter/__init__.py line:439) should fix this by removing these reference when widgets are destroyed. (Note:destroy is recursive on the widget tree and called on the root object when a Tkinter GUI exits)

I can't see any problem with removing the interpreter object from a widget when it is destroyed. There doesn't seem to be any way to reassign a widget to a new parent so this shouldn't affect anything.

Doing this makes it safe(r) to use tkinter from a non-main thread since if the GUI cleans up properly no "landmines" are left to cause a crash when garbage collected in the wrong thread.
msg358654 - (view) Author: obserience (obserience) Date: 2019-12-19 03:25
A quick google search for "Tcl_AsyncDelete: async handler deleted by the wrong thread" yields plenty of users complaining about this error case. Usually they're trying add a gui to an existing application. They run it in a thread other than main and then leak a reference to a GUI widget into another thread. The GUI exits and then some time later, the last reference to a widget dissapears but this is triggered by the leaked reference and occurs in another thread context. The TK library panics and the application crashes.

https://github.com/matplotlib/matplotlib/issues/12085
https://github.com/gboeing/osmnx/issues/75
https://stackoverflow.com/questions/27073762/tcl-asyncdelete-error-multithreading-python
(and so on)

The response time and time again is just "you can't run tkinter in any thread but main". Hopefully if this fix is implemented, that becomes less true.
msg358685 - (view) Author: obserience (obserience) Date: 2019-12-19 16:36
My proposed fix isn't compatible with a lot of current Tkinter using code. It was a naive first attempt and breaks existing code and doesn't completely solve the problem.

Variables currently retain a reference to the TCL interpreter self._tk and aren't linked into the widget tree or otherwise accessible to any cleanup logic.

Existing code:
-may expect the self.tk attribute of destroyed widgets to still be set

Existing code (after calling root.destroy()):
-calls methods like Tk.quit() which runs Tcl commands requiring self.tk to still be there
--Idle does this "# (Needed for clean exit on Windows 98)" <--apparently
---is this still the case??
-may call get() and set() on Variable instances (this currently works)


Given this it might be better to track all references to the TCL interpreter in a separate data structure and add a Tk.cleanup() method that gets rid of them all with documentation indicating all calls after this will fail. Alternate implementation of this deallocates the TCL interpreter in the _tkinter.c bindings with the same results.

One of the core issues here is that the cause of the crash is asynchronous signal handlers installed by the TCL library and necessary for the TCL interpreter to run at all. If you can run TCL commands, the crash can still be triggered.

###separate issue###
The Tk.Quit() doc is wrong. Current inline docs say that it destroys all widgets, which it does not. All it seems to do is cause mainloop to return after which mainloop can be called again and the GUI resumes functioning. This has caused problems when the main thread doesn't exit since the GUI continues to exist but is frozen. (See quit_behavior.py)
msg371536 - (view) Author: E. Paine (epaine) * Date: 2020-06-15 10:36
My initial thoughts were just to give the standard lecture on why "tkinter + threads = bad". However, looking again at the problem, the async handler error causes the Python interpreter to crash, which, no matter how frowned upon the code is, shouldn't happen.

Ultimately, the issue is that a Tkapp object is being deleted in any thread other than the one it was created in. I believe there are two parts to the solution, both of which are for the Tkapp object in the _tkinter module (as there is no guarantee that the user will go through the standard tkinter module):

1. If a Tkapp object is created in a thread other than the main and it is made global, it is deleted in the main thread as part of the interpreter finalisation, not when the thread is finalising. I propose that the Tkapp object can only be created in the main thread, helping with part 2.
2. The Tkapp object 'refuses' to delete in a thread other than main, and instead gets collected on interpreter finalisation if needed (in the main thread).

I don't know how to implement this, but if someone has some ideas, I am more than willing to help solidify them into a PR.

Antoine, I hope you are the right person to add to the nosy for this (apologies if not).
msg371773 - (view) Author: obserience (obserience) Date: 2020-06-17 19:59
I looked into this in greater depth a while back and never followed up.

In "Modules/_tkinter.c"  the Tkapp_Dealloc() function has a commented out check that would ensure it isn't called from the wrong thread.

Tkapp_Dealloc(PyObject *self)
{
    PyObject *tp = (PyObject *) Py_TYPE(self);
    /*CHECK_TCL_APPARTMENT;*/
    ENTER_TCL
    Tcl_DeleteInterp(Tkapp_Interp(self));
    LEAVE_TCL
    PyObject_Del(self);
    Py_DECREF(tp);
    DisableEventHook();
}

If this is uncommented, garbage collection from a non-main thread will fail and the object will be leaked.
Adding a warning would be a good idea so the user knows that a memory leak has occurred.

**Doing this makes the code safe.** No more crashes. It should be done a bugfix regardless of anything else.

####################
>> I propose that the Tkapp object can only be created in the main thread

Restricting interpreter object creation/destruction to the main thread is very much an API change that would break any code that puts the interpreter in a non-main thread.

Since we're keeping the objects around till the interpreter exits we still "leak" the memory and calling the destructor before the process exits has no benefits. Directly leaking the memory isn't any worse in terms of memory consumption and doesn't break currently working code.

####################
My opinion is we should leak the memory, warn the user and forget about it. TCL interpreters are tiny and anyone who was doing this before was crashing their application anyway.

A nice to have would be changing the python tkinter code to cleanup the references when the associated GUI exits (as it arguably should)
--root.quit() doesn't do what the docs say and is regularly abused(ignore it)
--root.destroy() is more clear cut and a better candidate for adding cleanup behavior

implementations:
1)explicitly manage multiple references to the interpreter removing when appropriate (messy)
  --remove widget.tk refs on widget.destroy()
  --remove variable._tk when root.destroy() called (requires tracking variables with weakrefs)
2)replace root.tk with a weakproxy at initialisation and everything references the weakproxy (no other code changes)
3)replace widget.tk and variable._tk with a property that fetches root.tk
  --no weakrefs or explicit cleanup needed
  --**I favor this options**

For all implementations we need code to handle post-cleanup tk retrieval failure:
--Calls to root.quit() and other NOP equivalent methods should be ignored
--Variable sets should be ignored but trigger a warning
--Variable reads should RuntimeError since the TCL interpreter is dead

The user can shoot themselves in the foot by grabbing an explicit reference to the TCL interpreter but I've never seen that done and now it only causes a memory leak (which we can warn about).

Again, the python changes are nice to have but the "Modules/_tkinter.c" "CHECK_TCL_APPARTMENT;" macro needs to be uncommented.

I'd be happy to do the python side-changes if that's likely to be accepted.
msg371804 - (view) Author: E. Paine (epaine) * Date: 2020-06-18 10:04
> ... would break any code that puts the interpreter in a non-main thread
Have people not been warned enough?! But seriously, looking anywhere on Google will scream at you to stop using using threads with tkinter, let alone having the interpreter in a thread.

I am not sure I agree with your statement that we would be keeping the interpreter exits, as that was only a 'fallback' if a delete was attempted in a thread. However, I agree with you that we should avoid breaking existing code where possible and so such a change shouldn't be added.

As for uncommenting CHECK_TCL_APPARTMENT in Tkapp_Dealloc, I looked at the blame and found that it was added as a comment:
https://github.com/python/cpython/commit/b5bfb9f38c786c3330b2d52d93b664588c42283dbecaus
I don't know the reason for adding it commented, but it is possible Martin couldn't find a situation in which it was required. My only problem with uncommenting it would be that I don't understand how the GC works. I suspect it would not be a problem, but if the error caused the GC to halt collecting, we would be in exactly the same situation (though I doubt this would be true).

> It should be done a bugfix regardless of anything else.
Agreed. Such a simple change to stop the interpreter crashing is definitely beneficial.

As for actually fixing the root of the problem, I suspect any of the changes you proposed, similar to mine, would be too controversial and not really go anywhere (being fairly large changes to fix just a handful of situations).
History
Date User Action Args
2020-06-18 10:04:23epainesetmessages: + msg371804
2020-06-17 19:59:31obseriencesetmessages: + msg371773
title: tkinter objects garbage collected from non-tkinter thread cause panic and core dump -> tkinter objects garbage collected from non-tkinter thread cause crash
2020-06-15 10:36:14epainesetnosy: + pitrou, epaine

messages: + msg371536
versions: + Python 3.7, Python 3.8, Python 3.9, Python 3.10
2019-12-19 16:36:27obseriencesetfiles: + quit_behavior.py

messages: + msg358685
2019-12-19 05:28:19xtreaksetnosy: + gpolo, serhiy.storchaka
2019-12-19 03:25:41obseriencesetmessages: + msg358654
2019-12-19 02:56:56obseriencecreate