Created on 2012-03-22 20:48 by jcbollinger, last changed 2012-06-08 19:01 by r.david.murray.
|deadlocktest-0.2.tar.gz||jcbollinger, 2012-03-26 20:00||Tarball of a standalone test exhibiting the faulty behavior|
|reentrant-tkinter.patch||jcbollinger, 2012-06-08 17:04||Patch for source, tests, and docs||review|
|msg156619 - (view)||Author: John Bollinger (jcbollinger) *||Date: 2012-03-22 20:48|
This is the same as issue 452973, created as a new issue pursuant to the instruction given when 452973 was closed as "out of date". In a nutshell, in a program combining Tkinter with Tcl callbacks written in C, it is possible for even a single-threaded program to deadlock.
The case I ran into had these particulars: The main program is in Python, but it relies on a custom extension written in C. Through that extension, C callbacks are registered for various Tcl GUI events, and most of these invoke Python functions via Python's C API. Many of those Python functions invoke Tkinter methods. For example, many of the callbacks are bound to menu item activations, and these typically [try to] construct a Tkinter dialog the first time they are called.
What happens in practice is that the program starts fine, but the GUI freezes as soon as any menu item is activated that has one of the affected callbacks bound to it. Gdb and I are confident that the problem is as described in issue 452973: the program's single thread acquires Tkinter's internal Tcl lock when the mouse event processing begins, and does not release it before control re-enters Python (there is no public API by which it can be made to do so). When the Python function invokes Tkinter methods, Tkinter attempts to acquire the lock again, at which point it deadlocks because it holds the lock already.
I encountered this issue on CentOS 6 (thus Python 2.6.6), but it appears that the problem is still present in the Python 3 trunk. I have flagged this issue only for version 2.6, however, because I cannot currently confirm that it affects later versions (see below regarding testing).
I developed a patch against 2.6.6. It fixes the problem by allowing the Tcl lock to be acquired multiple times by any one thread (and requiring it to be released the same number of times before another thread can acquire it).
That is perhaps technically inferior to creating public functions around _tkinter.c's ENTER_PYTHON and LEAVE_PYTHON macros, but it doesn't touch the public API. Even if new public functions were provided, the reentrant locking might still be a good fallback. The patch applies cleanly to the trunk, so probably also to every version between that and 2.6.6. I would be happy to contribute the patch, but I am a bit at a loss as to how to write an automated test for it because (1) such a test must depend on an extension module, and (2) test failure means causing a deadlock. Any advice as to whether such a patch would be considered, or as to how best to test it would be welcome.
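The self-deadlock described in this message can be illustrated in pure Python, without Tkinter or a C extension. This is only a sketch by analogy: `tcl_lock`, `event_handler`, and `callback` are hypothetical names, and a plain `threading.Lock` stands in for _tkinter's internal Tcl lock. A non-blocking acquire is used so that the example reports the problem instead of hanging.

```python
import threading

# Hypothetical stand-in for _tkinter's internal, non-reentrant Tcl lock.
tcl_lock = threading.Lock()


def callback():
    """Plays the role of a Python function that calls back into Tkinter."""
    # A blocking acquire() here would hang forever, because this same thread
    # already holds the lock. A non-blocking attempt lets us observe that.
    got_it = tcl_lock.acquire(blocking=False)
    if got_it:
        tcl_lock.release()
    return got_it


def event_handler():
    """Plays the role of Tcl event dispatch, which runs with the lock held."""
    with tcl_lock:
        return callback()


reacquired = event_handler()
print("inner acquire succeeded:", reacquired)
```

With a blocking acquire in `callback`, the single thread would wait on a lock that only it can release, which corresponds to the GUI freeze reported above.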
|msg156768 - (view)||Author: Andrew Svetlov (asvetlov) *||Date: 2012-03-25 20:33|
Can you provide test code that demonstrates your issue? I understand that it's not easy to extract failing code from a big project, but please make a simple example with Python code and a trivial C extension that shows the problem. Let's start with a manual test that everyone can reproduce. Also, please make it for the 'default' branch of the Python repo. It's possible to include the bugfix in 2.7, but the upcoming release is the most important. Thank you.
|msg156861 - (view)||Author: John Bollinger (jcbollinger) *||Date: 2012-03-26 20:00|
I was already working on a standalone test, and now I have it ready. Using it I can demonstrate the issue against both the cpython trunk and against my local v2.6.6 binary distribution, therefore I have added v3.3 as an affected version. It is reasonable to suppose that all versions in between are affected as well, but I have not tested versions 2.7, 3.1, or 3.2.
I attach a complete package with source and Autotools build scripts. A bit of overkill, I guess, but pretty easy to use. As is typical with the Autotools, the build system is far larger than the actual project sources (those are only 162 lines of C and 57 lines of Python, both reasonably well commented). The test should be run against a Python configured with --enable-shared --with-threads (I also used --with-pydebug), and that can be an uninstalled working copy.
To build and perform the test:
1) Unpack the tarball
    tar xzf deadlocktest-0.2.tar.gz
2) Change to the test source directory
    cd deadlocktest-0.2
3) Configure the test for building
    ./configure [--with-python-build=/path/to/working/copy]
4) Build the test
    make
5) Run the test
    make check
The test builds and runs (and fails) against both Python 2.6 and the current trunk (3.3). It passes when run against my patched versions of 2.6 and 3.3.
|msg156924 - (view)||Author: John Bollinger (jcbollinger) *||Date: 2012-03-27 13:57|
For what it's worth, I can convert my standalone test into a PyUnit testcase easily enough (or so it appears). I'm having trouble, however, figuring out how to get the extension it depends on built and accessible to the test, yet not installed with the normal modules.
|msg156927 - (view)||Author: R. David Murray (r.david.murray) *||Date: 2012-03-27 14:19|
I believe there is an example in the packaging unit tests of building an extension module in a test (but I'm not 100% sure). Also, the fact that the test will deadlock if it fails is not a deal breaker: the test should pass normally, and if it fails the buildbots will eventually time out and we'll see the error.
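For local runs, one way to keep a potentially deadlocking test from hanging indefinitely is to run the suspect call in a daemon thread and bound the wait. The following is a minimal sketch of that idea only; `run_with_timeout`, `deadlocks`, and `completes` are hypothetical names, not part of the Python test suite, and plain `threading` locks stand in for the Tcl lock.

```python
import threading


def run_with_timeout(func, timeout):
    """Run func() in a daemon thread; return True if it finished in time."""
    finished = threading.Event()

    def worker():
        func()
        finished.set()

    # A daemon thread does not keep the interpreter alive if it never returns.
    threading.Thread(target=worker, daemon=True).start()
    return finished.wait(timeout)


def deadlocks():
    """Acquire a non-reentrant lock twice from the same thread."""
    lock = threading.Lock()
    lock.acquire()
    lock.acquire()  # blocks forever: the holder is the one waiting


def completes():
    """The same pattern with a re-entrant lock returns normally."""
    lock = threading.RLock()
    with lock:
        with lock:
            pass


hung = not run_with_timeout(deadlocks, timeout=0.5)
ok = run_with_timeout(completes, timeout=5)
print("non-reentrant case hung:", hung)
print("re-entrant case completed:", ok)
```

The stuck thread cannot be cleaned up from within the process, so running the suspect code in a child process with a timeout may be the more robust choice for a real test.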
|msg156942 - (view)||Author: John Bollinger (jcbollinger) *||Date: 2012-03-27 17:00|
I looked at the packaging tests (thanks), but I didn't find anything useful to me. There were a couple whose names looked promising, but they turned out to be stubs. As far as I can tell, none of those tests actually invoke the system's C compiler, even indirectly. They are numerous, however, so I could have overlooked something. It occurs to me that because the extension only needs to provide one function, I could just add that to _tkinter. That would ease testing without adding anything to the *public* API, but it seems a bit smelly to me because the point is that a user extension can trigger the bug. Also, the added function would be accessible to programs that choose to ignore privacy convention. Also, I am assuming that tests only need to be runnable by developers and build automatons -- i.e. someone who can and did build Python from source. If they need also to be runnable by end users then a compiled version of any extension the tests depend upon needs to be included in binary distributions.
|msg156943 - (view)||Author: Ned Deily (ned.deily) *||Date: 2012-03-27 17:13|
For examples of tests that build extension modules, see Lib/packaging/tests/test_command_build_ext.py or the Distutils equivalent, Lib/distutils/tests/test_build_ext.py. These tests are also runnable from installed versions of Python, assuming the user has the necessary build tools (compiler, etc) installed.
|msg162536 - (view)||Author: John Bollinger (jcbollinger) *||Date: 2012-06-08 17:04|
I attach a patch fixing the issue and providing a test and docs. The fix is substantially as I described earlier: a thread that holds the Tcl lock is permitted to acquire it logically any number of times, but physically attempts to acquire it only if it doesn't already hold it. A thread-local counter ensures that the lock is logically released the same number of times it has been acquired before it is physically released. The external API is unchanged, and even source changes are minimized to the greatest extent possible. If this fix ultimately is accepted then I hope it can also be back-ported to 2.7.
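The logical/physical distinction described above can be sketched in Python. This is an analogy under stated assumptions, not the C code from the attached patch: `ReentrantLock` and its methods are hypothetical names, a `threading.Lock` plays the physical lock, and a `threading.local` attribute plays the thread-local counter.

```python
import threading


class ReentrantLock:
    """Illustrative re-entrant wrapper around a plain lock (hypothetical)."""

    def __init__(self):
        self._lock = threading.Lock()    # the "physical" lock
        self._state = threading.local()  # per-thread "logical" hold count

    def _depth(self):
        # Threads that never acquired the lock have no counter yet.
        return getattr(self._state, "depth", 0)

    def acquire(self):
        # Touch the physical lock only on the outermost acquisition.
        if self._depth() == 0:
            self._lock.acquire()
        self._state.depth = self._depth() + 1

    def release(self):
        depth = self._depth()
        if depth == 0:
            raise RuntimeError("release() by a thread that does not hold the lock")
        self._state.depth = depth - 1
        # Release the physical lock only when the outermost hold ends.
        if depth == 1:
            self._lock.release()

    def held_physically(self):
        return self._lock.locked()


lock = ReentrantLock()
lock.acquire()
lock.acquire()                              # nested acquire by the same thread
lock.release()
still_held = lock.held_physically()         # one logical hold remains
lock.release()
finally_free = not lock.held_physically()   # physical lock is now released
print(still_held, finally_free)
```

Because the counter is thread-local, a second thread calling `acquire` sees a depth of zero and blocks on the physical lock until the first thread has released it as many times as it acquired it.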
|msg162539 - (view)||Author: R. David Murray (r.david.murray) *||Date: 2012-06-08 17:35|
Thanks for working on this. This is not my area of expertise, but what you describe sounds like an RLock, and there is a C implementation of RLock in Python3. Could you just use that for Python3? Also, very minor comments on the patch format (I'm not in a position to review the patch itself): we prefer not to add additional copyright notices (some files have older ones). My understanding is you have the copyright by virtue of having published the patch here, and your contributor agreement on file allows us to incorporate it into the codebase, and nothing more is needed. I don't believe we generally include bug fixes in What's New, unless they are significant enough behavior changes that they don't get put into the older versions. It's Raymond's call, though.
|msg162542 - (view)||Author: John Bollinger (jcbollinger) *||Date: 2012-06-08 18:14|
Yes, I have basically made tkinter's Tcl lock into an RLock. With respect to Python3's RLock implementation, though, are you talking about what I see in Modules/_threadmodule.c? Even if it would be acceptable to make the tkinter module depend on the thread module (not clear), I don't think I can easily use that because it looks like all the relevant functions are static, in typical extension module fashion. In other words, it provides only a Python API, not a C API. Moreover, the current implementation can easily be backported to Python 2, but that would not be true of an implementation based on the thread module's RLock. If you would nevertheless prefer that the thread module's RLock be used then I would appreciate technical suggestions for how to overcome the lack of a C API.
I am content to comply with the PSF copyright marking policy. Is it documented somewhere? My understanding is that my copyright does not depend in any way on marking the work -- at least in the US -- but there are other reasons to prefer to mark. Anyway, show me the policy or else just confirm that it is to not mark in cases such as this, and I will remove it.
Tkinter threading and re-entrancy issues have been somewhat of a sore spot for a very long time, so I think this change is worth calling out. Nevertheless, if Raymond disagrees then so be it.
Thanks
|msg162547 - (view)||Author: R. David Murray (r.david.murray) *||Date: 2012-06-08 19:00|
That's why I phrased it as a question; I don't know enough about the C stuff. Someone else nosy on this bug will probably have a more informed opinion. I don't think the copyright marking policy is currently written down. It ought to be, but I have a sinking feeling that making that happen isn't going to be easy, because it involves lawyerly stuff.
messages: + msg162547
|2012-06-08 18:14:06||jcbollinger||set||messages: + msg162542|
|2012-06-08 17:35:37||r.david.murray||set||stage: patch review|
messages: + msg162539
versions: + Python 2.7, Python 3.2, - Python 2.6
keywords: + patch
messages: + msg162536
messages: + msg156943
|2012-03-27 17:00:26||jcbollinger||set||messages: + msg156942|
messages: + msg156927
|2012-03-27 13:57:57||jcbollinger||set||messages: + msg156924|
messages: + msg156861
versions: + Python 3.3
messages: + msg156768