Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Race conditions in Tkinter with non-threaded Tcl #77438

Open
native-api mannequin opened this issue Apr 10, 2018 · 30 comments
Open

Race conditions in Tkinter with non-threaded Tcl #77438

native-api mannequin opened this issue Apr 10, 2018 · 30 comments
Labels
3.7 (EOL) end of life 3.8 only security fixes topic-tkinter type-crash A hard crash of the interpreter, possibly with a core dump

Comments

@native-api
Copy link
Mannequin

native-api mannequin commented Apr 10, 2018

BPO 33257
Nosy @terryjreedy, @roseman, @serhiy-storchaka, @native-api
PRs
  • bpo-33257: Fix race conditions for non-threaded Tcl #6444
  • [2.7] bpo-33257: Fix race conditions for non-threaded Tcl (GH-6444) #6972
  • bpo-33479: Remove unqualified tkinter threadsafe claim. #6990
  • Files
  • TkinterCrash2-2.py: Py2 version, crashing
  • TkinterCrash3-2-2.py: Py3 version, working
  • TkinterHanders.py
  • TkinterHanders3.py
  • 0001-Build-non-threaded-debug-Tcl.zip: For testing only! Build master (win64) with nonthreaded debug Tcl
  • 0001-build-2.7-with-threaded-Tcl.patch: For testing only! Build 2.7 (win64) with threaded Tcl
  • test_threads_tk.py: reproduce hang when creating multiple Tk's
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = None
    created_at = <Date 2018-04-10.16:41:09.432>
    labels = ['3.8', '3.7', 'expert-tkinter', 'type-crash']
    title = 'Race conditions in Tkinter with non-threaded Tcl'
    updated_at = <Date 2018-10-27.23:10:44.251>
    user = 'https://github.com/native-api'

    bugs.python.org fields:

    activity = <Date 2018-10-27.23:10:44.251>
    actor = 'terry.reedy'
    assignee = 'none'
    closed = False
    closed_date = None
    closer = None
    components = ['Tkinter']
    creation = <Date 2018-04-10.16:41:09.432>
    creator = 'Ivan.Pozdeev'
    dependencies = []
    files = ['47529', '47530', '47536', '47561', '47565', '47566', '47620']
    hgrepos = []
    issue_num = 33257
    keywords = ['patch']
    message_count = 29.0
    messages = ['315172', '315271', '315283', '315285', '315286', '315304', '315878', '315899', '316087', '316089', '316092', '316093', '316095', '316096', '316097', '316098', '316099', '316100', '316103', '316105', '316106', '316241', '316265', '316418', '316451', '316689', '316921', '317875', '328662']
    nosy_count = 5.0
    nosy_names = ['terry.reedy', 'markroseman', 'PythonInTheGrass', 'serhiy.storchaka', 'Ivan.Pozdeev']
    pr_nums = ['6444', '6972', '6990']
    priority = 'normal'
    resolution = None
    stage = 'patch review'
    status = 'open'
    superseder = None
    type = 'crash'
    url = 'https://bugs.python.org/issue33257'
    versions = ['Python 2.7', 'Python 3.6', 'Python 3.7', 'Python 3.8']

    @native-api
    Copy link
    Mannequin Author

    native-api mannequin commented Apr 10, 2018

    (Marked only 2.7 as affected but this would affect any branch if built with nonthreaded Tcl.)

    When running the attached TkinterCrash2-2.py repeatedly with 2.7.14 and 2.7 head, win7 x64, two kinds of errors pop up randomly:

    1. Crashes and freezes.
    2. Exceptions on the console like:
    Exception in thread Thread-14:
    Traceback (most recent call last):
      File "C:\Users\Sasha\Documents\cpython\lib\threading.py", line 801, in __boots
    trap_inner
        self.run()
      File "../tkt/TkinterCrash2-2.py", line 50, in run
        self.deliverToqueue((self.target, z, y))
      File "../tkt/TkinterCrash2-2.py", line 133, in arrival_122
        new_yz[1])
      File "C:\Users\Sasha\Documents\cpython\lib\lib-tk\Tkinter.py", line 2328, in create_line
        return self._create('line', args, kw)
      File "C:\Users\Sasha\Documents\cpython\lib\lib-tk\Tkinter.py", line 2310, in _create
        *(args + self._options(cnf, kw))))
    ValueError: invalid literal for int() with base 10: 'None'

    Running the same code with the minimal required changes (attached as TkinterCrash3-2-2.py) under 3.6 (same platform) goes without any errors.

    Diagnostics showed:

    1. Under debug Python, a crash became an MSVC double-free assertion error. The stacktrace is: Tkapp_Call->Tkapp_CallDeallocArgs->Tcl_DecrRefCount->TclFreeObj->free

    2. The exceptions are caused by a <canvas> create line Tk call randomly returning "None" (a string) instead of an integer -- which it should never do according to its doc. Since it happens inconsistently, this also suggests a race condition.

    3. In Tkapp_Call and SetVar, Tcl lock isn't held when creating/destroying Tcl objects for the call. The underlying fns modify the global state (free objects list, reference counters), so it should be held.
      These two are the only such places in the code.

    I've fixed this, will file the PR shortly. Holding the lock when calling Tkapp_CallDeallocArgs in Tkapp_Call eliminated the crashes and freezes, locking objects' creation in the same fn decreased the number of exceptions greatly, and locking creation in SetVar, too, eliminated them completely.

    I did not check if objects created with AsObj in SetVar need to be disposed of like in Tkapp_Call.

    Also have no idea how to autotest the fix. If someone does, I'm all ears.

    @native-api native-api mannequin added topic-tkinter type-crash A hard crash of the interpreter, possibly with a core dump labels Apr 10, 2018
    @serhiy-storchaka serhiy-storchaka added 3.7 (EOL) end of life 3.8 only security fixes labels Apr 11, 2018
    @native-api
    Copy link
    Mannequin Author

    native-api mannequin commented Apr 13, 2018

    My best idea for a test as of now is to run the example ~20 times in a loop (or continuously for a comparable amount of time & threads) and catch and register any exceptions in all threads as per https://stackoverflow.com/questions/18349614/check-if-a-python-thread-threw-an-exception .

    @terryjreedy
    Copy link
    Member

    I believe I have seen this exact error message within the last few months, and I found one example on SO. https://stackoverflow.com/questions/21209124/weird-error-that-comes-up-during-python-program-run

    @native-api
    Copy link
    Mannequin Author

    native-api mannequin commented Apr 14, 2018

    @terry.reedy Apparently, this bug has gone unnoticed for years (to be precise, since b5bfb9f in 2002),
    and there are all kinds of weird rumors circulating about whether tkinter is thread-safe or not, and what can be run where ( see e.g. https://stackoverflow.com/search?q=tkinter+thread+safe ).
    Having run into it in my project, I set out to get to the bottom of this ( RedFantom/mtTkinter#5 ) ...and looks like I did.

    @native-api
    Copy link
    Mannequin Author

    native-api mannequin commented Apr 14, 2018

    Wait a second... I think I noticed another similar bug.

    In Tkapp_CallProc. Same case, Tkapp_CallArgs and Tkapp_CallDeallocArgs are called when not holding both locks.

    The attached example doesn't use Python event handlers, so it didn't manifest itself in my tests.

    @native-api
    Copy link
    Mannequin Author

    native-api mannequin commented Apr 14, 2018

    Attached reproducing example for event handlers. If launching more than one EventThread, it abort()'s immediately.

    @native-api
    Copy link
    Mannequin Author

    native-api mannequin commented Apr 29, 2018

    Finally dug up the reason but unsure how to fix it.

    Here's the trace. The format is `$TID: $FUNCTION [some of the args]'

    0x1A34: Tkapp_Call(_object *, _object *)
    0x1A34: PythonCmd(void *, Tcl_Interp *, int, const char * *) 0x000000000230f430 0x0000000002d297d0 "47566760dummy_handler" 0x0000000003a28580 "5" 0x0000000003a29140 "??" 0x0000000003a29140 "??"
    0x1D7C: Tkapp_Call(_object *, _object *)
    0x1D7C: PythonCmd(void *, Tcl_Interp *, int, const char * *) 0x000000000230f430 0x0000000002d297d0 "47566760dummy_handler" 0x0000000003a28580 "5" 0x0000000003a29140 "??" 0x0000000003a29140 "??"
    0x1A34: PythonCmd(void *, Tcl_Interp *, int, const char * *) setting result
    0x1A34: PythonCmd(void *, Tcl_Interp *, int, const char * *) exit: 0
    TclStackFree: incorrect freePtr (0000000002BA6430 != 0000000002BA6A20). Call out of sequence?python_d.exe has triggered a breakpoint.

    A second PythonCmd is started when the 1st one is still in progress, then the 1st one returns, the Tcl interpreter tries to unwind the topmost stack frame, and it's the wrong one.

    Denying other PythonCmd is out of question (a nested cmd deadlocks). Waiting for them to finish, either (would wait for potentially unlimited time).

    Look like the only way is to rearrange Tcl stack frames so that the right one is on top. (Since Tkinter's "interpreter calls" (sequences of Tcl calls done while holding the lock) are supposed to be independent from one another, it doesn't really matter which order the frames are in.)

    I seriously doubt that's possible with just the public interface though.

    @native-api
    Copy link
    Mannequin Author

    native-api mannequin commented Apr 29, 2018

    Look like the only way is to rearrange Tcl stack frames so that the right one is on top.

    Scratch that. Tkinter allows to execute entire scripts where the order of the frames is important.

    @native-api
    Copy link
    Mannequin Author

    native-api mannequin commented May 2, 2018

    Here are the possible courses of action to fix this.

    All require design changes, thus can be shot down on a whim. So I'll ask for advice and approval at python-dev first, to be sure that my efforts won't be scrapped.

    First, some terminology I'll be using:

    • A "(Tcl interpreter) call" (in quotes) is a sequence of interdependent calls into Tcl that a Tkinter routine makes in the course of its action.
    • Outstanding "calls" are those from other threads that have not started yet and are currently waiting for the Tcl lock.
    • A "downstream" "call" is a call made by a running PythonCmd's Python payload.
    • Each call to PythonCmd has a "start" -- before releasing the Tcl lock -- and an "end" -- when it reacquires it to Tcl_SetObjResult() and return
    • A PythonCmd is "active" when it's before its "start" and "end"

    Now, the fix options proper:

    • At the "end", rearrange the Tcl interpreter's stack frames so that the current PythonCmd's one is on top (effectively, make it look as if independent "calls" use independent Tcl stacks. Theoretically possible.)
      • Confirmed this to be impossible with Tcl's public API
    • Prohibit other `PythonCmd's while one is already "active", except its "downstream" ones. Options:
      • At the "end", the old PythonCmd checks if there are others "calls" on top of it (will only happen if such "call" has another "active" PythonCmd on top: all other Tkinter routines hold the Tcl lock for their entire duration after the 33257 fix), wait if there are.
        • Problematic: this wait algorithm is biased against the old PythonCmd. Will lead to a potentially infinite wait if there are many threads making Tcl calls that result in PythonCmds.
      • Hold the Tcl lock for the entire duration of a PythonCmd. To allow "downstream" calls, make the Tcl lock reentrant.
        • Requires redesign of about two dozen functions in thread_*.c and pycond.h: there are only non-reentrant locks currently.
        • Behaviour change: wouldn't allow outstanding "calls" while a PythonCmd is "active". This change is transparent to Python code though.
      • Reuse the threaded versions of Tkapp_* routines: bind a nonthreaded interpreter to a thread and forward calls to it as events like in the threaded case
        • Problematic: visible behaviour change: tk.call()s would now hang without a running mainloop(). Currently, only event handlers don't fire.

    Rejected options:

    • The new PythonCmd somehow checks at "start" if another one is "active" and waits if it's not a descendant of it.
      • The new PythonCmd's Tcl interpreter frame has already been allocated when it gets control, so the old one cannot return before the new one in any case.

    @gvanrossum
    Copy link
    Member

    So this issue doesn't occur when linked with a thread-aware Tcl/Tk, right? Maybe we should just make sure that's the only configuration we ensure?

    @Vano
    Copy link
    Mannequin

    Vano mannequin commented May 2, 2018

    So this issue doesn't occur when linked with a thread-aware Tcl/Tk,
    right? Maybe we should just make sure that's the only configuration we
    ensure?

    This would break compatibility, including some usage patterns (see the
    "Reuse the threaded versions" option above).
    Can this actually be considered for anything short of the next major
    release?
    While official Windows releases use a pocket version, POSIX ones would
    probably link against whatever a distribution provides.

    @terryjreedy
    Copy link
    Member

    I reproduced the error on installed 2.7.14, running tk 8.5.18.

    I ran TkinterCrash3-2-2.py 15 times on 64-bit Win10, installed x64 3.6.5, with tk 8.6.8, and experienced no crashes. I ran it another 10 times with 32-bit repository 3.8 debug and no crashes. My conclusion: this is at least partly a tcl/tk issue fixed in current tcl/tk.

    "this would affect any branch if built with nonthreaded Tcl." If python is delivered (on Windows and Mac) with 'threaded tcl' and rebuilding with 'nonthreaded tcl' causes a problem with tkinter, then one should not do that. In practice, essentially no one does so.

    Serhiy, do you know what type of tcl/tk is distributed on Linux?

    @terryjreedy
    Copy link
    Member

    As I reported on the pydev thread "bpo-33257: seeking advice & approval on the course of action", Firefox cannot find the string 'thread' in the tkinter chapter of the official doc. Nor in the tkinter.ttk chapter. IDLE does not find it in tkinter.__init__. So the claim of 'thread-safe' seems not to be 'official'.

    As I understand 'crash', arrival_122, which calls Canvas.create_line, get called in each thread by Track.run. On the other pydev thread, Mac expert Ronald Oussorn said "at least on macOS calling GUI methods in Apple’s libraries from secondary threads is unsafe unless those methods are explicitly documented as thread-safe."

    Maybe we should officially say that tkinter/tk/os gui calls from secondary threads is chancey, even while we try to make it less so.

    Ivan, as I also said there, it is not clear to me, given your subsequent comments, what you consider to be the status of the PR.

    @native-api
    Copy link
    Mannequin Author

    native-api mannequin commented May 3, 2018

    I ran TkinterCrash3-2-2.py 15 times on 64-bit Win10, installed x64 3.6.5, with tk 8.6.8, and experienced no crashes.

    I wrote in the initial message that this bug only happens with nonthreaded Tcl, regardless of Python version and Tcl version.

    I've built Py2 with threaded Tcl 8.5.19.0 just to make sure, and indeed, it went away.

    @terryjreedy
    Copy link
    Member

    Rerunning 2.7, I sometimes get the following almost immediately without clicking [launch].

    Exception in thread Thread-1:
    Traceback (most recent call last):
      File "C:\Programs\Python27\lib\threading.py", line 801, in __bootstrap_inner
        self.run()
      File "f:/python/a/tem.py", line 50, in run
        self.deliverToqueue((self.target, z, y))
      File "f:/python/a/tem.py", line 133, in arrival_122
        new_yz[1])
      File "C:\Programs\Python27\lib\lib-tk\Tkinter.py", line 2322, in create_line
        return self._create('line', args, kw)
      File "C:\Programs\Python27\lib\lib-tk\Tkinter.py", line 2310, in _create
        *(args + self._options(cnf, kw))))
    TclError: bad option "97380032LrunLocal": must be addtag, bbox, bind, canvasx, canvasy, cget, configure, coords, create, dchars, delete, dtag, find, focus, gettags, icursor, index, insert, itemcget, itemconfigure, lower, move, postscript, raise, scale, scan, select, type, xview, or yview

    An automated test would need to fail more consistently, without needing to click. Running faster would also be good (and perhaps increase failure rate). The the test would be that the file runs in subprocess without an error (perhaps multiple times). There may be a test.support function for this.

    The TkinterCrash files currently have way too many comments for a non-beginner audience.

    The t

    @native-api
    Copy link
    Mannequin Author

    native-api mannequin commented May 3, 2018

    Ivan, as I also said there, it is not clear to me, given your subsequent comments, what you consider to be the status of the PR.

    The PR fixes the problem exposed by TkinterCrash2-2.py and TkinterCrash3-2-2.py and only lacks an autotest (I asked for any ideas for it and got no response).

    Then I found another related bug (https://bugs.python.org/issue33257#msg315286 ) and though to fix it as well -- but it turned out to be deeper than expected.

    I'd be fine to commit the current fix and work on the second bug separately.

    @terryjreedy
    Copy link
    Member

    You did not say 'never with threaded tcl'. You also claimed that tkinter is so broken, with either threaded or non-threaded tcl/tk, that tkinter should be trashed. Anyway, I like to test things for myself.

    If 'never with threaded tcl' is true, then the crashes I observed indicate that CPython's current 2.7 windows installer is installing non-threaded 8.5.18. (I have no idea if or how I could check.) It would seem then, that we should change to threaded 8.5.18 before 2.7.16. (2.7.15 was just released). I am surprised since the Windows installers were in Martin Loewis' charge until sometime after 2.7 was released, and he is the one who claimed that tkinter should be thread-safe.

    @terryjreedy
    Copy link
    Member

    Is threaded tcl just a compile switch, as with CPython?

    @native-api
    Copy link
    Mannequin Author

    native-api mannequin commented May 3, 2018

    An automated test would need to fail more consistently, without needing to click. Running faster would also be good (and perhaps increase failure rate). The the test would be that the file runs in subprocess without an error (perhaps multiple times). There may be a test.support function for this.

    Thanks for the ideas. I'm on it.

    Was thinking of just wrapping the logic with unittest machinery but for a subprocess, it's easier to test stderr for exceptions.

    You also claimed that tkinter is so broken, with either threaded or non-threaded tcl/tk

    https://bugs.python.org/issue33412

    CPython's current 2.7 windows installer is installing non-threaded 8.5.18. (I have no idea if or how I could check.)

    You can check this by seeing that the Tcl/Tk DLLs lack the 't' suffix.

    It would seem then, that we should change to threaded 8.5.18 before 2.7.16. (2.7.15 was just released).

    https://bugs.python.org/issue33257#msg316092

    @native-api
    Copy link
    Mannequin Author

    native-api mannequin commented May 3, 2018

    Is threaded tcl just a compile switch, as with CPython?

    Yes.

    In tcl.vcxproj, tk.vcxproj and tix.vcxproj, there are parameters named like TclOpts that are passed as OPTS arg to the lib's makefiles. For Tcl 8.5, you need to pass "threads" to build with threads; for 8.6, "nothreads" to build without.

    Also, in tcltk.props, the resulting DLL/LIB names are hardcoded with or without the 't' suffix. You'll need to fix that if you wish to switch the Tcl flavor.

    @native-api
    Copy link
    Mannequin Author

    native-api mannequin commented May 3, 2018

    ... Finally, since https://bugs.python.org/issue30916, the 3.x Windows build uses a binary external and needs even more patching to build and link against a debug and/or custom Tcl/Tk (too long to describe, can give a patch).

    @terryjreedy
    Copy link
    Member

    This issue is effectively a re-opening and continuation of bpo-11077. TkinterCrash2(3)-2.py are altered versions of Serhiy's file of the same name on the old issue. (Ivan, you really should have said all this at the beginning of this issue.)

    Serhiy, you wrote "If there no other errors in TkinterCrash2-2.py, this issue can be closed." When I ran your file on Win10 with 2.7.15 multiple times, I got messages like one or more of the following about half the runs.

    Exception in thread Thread-1:
    Traceback (most recent call last):
      File "C:\Programs\Python27\lib\threading.py", line 801, in __bootstrap_inner
        self.run()
      File "f:/dev/tem/thread_event27.py", line 50, in run
        self.deliverToQueue((self.target, z, y))
      File "f:/dev/tem/thread_event27.py", line 133, in arrival_122
        new_yz[1])
      File "C:\Programs\Python27\lib\lib-tk\Tkinter.py", line 2322, in create_line
        return self._create('line', args, kw)
      File "C:\Programs\Python27\lib\lib-tk\Tkinter.py", line 2310, in _create
        *(args + self._options(cnf, kw))))
    TclError: unknown or ambiguous item type "87414752LtickHook"

    Exception in Tkinter callback
    TclStackFree: incorrect freePtr. Call out of sequence?

    This application has requested the Runtime to terminate it in an unusual way.
    Please contact the application's support team for more information.
    Traceback (most recent call last):
      File "C:\Programs\Python27\lib\lib-tk\Tkinter.py", line 1541, in __call__
        return self.func(*args)
    TypeError: callit() takes no arguments (6 given)

    @PythonInTheGrass
    Copy link
    Mannequin

    PythonInTheGrass mannequin commented May 7, 2018

    7 years and counting...
    My need for a fix is long gone, but I'd like to be able to tell the original group I worked with whether it's now safe to use tkinter from threads. It looks like my original guesses were validated and a fix has been made, but I can't tell if it's universally available. What version of Python would they need to be on, and are any platforms still misbehaved "out of the box"? They will not be willing to rebuild python itself. TIA.

    @native-api
    Copy link
    Mannequin Author

    native-api mannequin commented May 12, 2018

    The 1st PR is ready.
    I'll continue working on the 2nd one in this ticket, too, if there are no objections.

    @terryjreedy
    Copy link
    Member

    I opened bpo-33479 Document tkinter and threads, as a spinoff of both this issue and bpo-11077, msg183774.

    @terryjreedy
    Copy link
    Member

    Mark, would you review or at least comment on PR 6444? Your comments in msg316669 on bpo-33479 suggest that you have the requisite knowledge.

    @native-api
    Copy link
    Mannequin Author

    native-api mannequin commented May 17, 2018

    2nd fix ready.

    @DevTeam The issue is fixed completely now, the fix can be merged.
    Pushed to the same PR, see #6444 (comment) why.

    Used the option "Hold the Tcl lock for the entire duration of a PythonCmd. To allow "downstream" calls, make the Tcl lock reentrant." from https://bugs.python.org/issue33257#msg316087 as it's the cleanest fix.

    @scott M, you can help here by doing a code review.

    @native-api
    Copy link
    Mannequin Author

    native-api mannequin commented May 28, 2018

    Found yet another race condition -- when creating multiple Tk()s. Attached example (loops indefinitely inside Tcl: some internal list becomes circular for some reason).

    The obvious fix by wrapping Tcl calls with locks in Tkapp_New doesn't fix it -- must be something else.

    Since creatng multiple Tk's is an uncommon use case, this can wait.

    @terryjreedy
    Copy link
    Member

    Serhiy, if you now think that we should try to fix _tkinter to actually support non-thread tcl builds, please review the two PRs on this issue.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    @terryjreedy
    Copy link
    Member

    Serhiy, Given that 8.6 is compiled with thread support by default and that we (and most everybody) keeps the default and that a year ago only 1 *nix distribution was known to only supply tcl/tk 8.5, should be close this?

    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.7 (EOL) end of life 3.8 only security fixes topic-tkinter type-crash A hard crash of the interpreter, possibly with a core dump
    Projects
    Status: No status
    Development

    No branches or pull requests

    3 participants