classification
Title: test_ttk_guionly and test_tk can cause Tk segfaults on OS X when run with regrtest -j option
Type: crash Stage: resolved
Components: Tests, Tkinter Versions: Python 3.5, Python 3.4, Python 2.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: ned.deily Nosy List: ned.deily, python-dev, serhiy.storchaka, terry.reedy
Priority: normal Keywords: patch

Created on 2014-10-30 23:05 by ned.deily, last changed 2014-11-02 02:36 by ned.deily. This issue is now closed.

Files
File name Uploaded Description Edit
issue22770.patch serhiy.storchaka, 2014-10-31 09:02 review
Messages (10)
msg230307 - (view) Author: Ned Deily (ned.deily) * (Python committer) Date: 2014-10-30 23:05
The changes to tkinter tests introduced by the changes for Issue22236 (2.7 32fdaf401e50, 3.4 dd1dffe6f0d2, and default/3.5 014060738f7f) may cause Python to crash due to a Tk segfault on OS X.  The crash only shows up when using the -j option to regrtest and is dependent on other factors, e.g. 64-bit vs 32-bit and randomized address space allocations, so it is not 100% reproducible.  But it happens often enough (to me) when running the Python test suite from an installed framework location with a current Cocoa Tk 8.5.x or 8.6.x, for example, with an instance from a python.org binary installer:

    /usr/local/bin/python3.4 -m test -w -uall -j3

The reasons for the segfault are due to an arcane bug in Tk which shows up when Tcl interpreter instances are destroyed and created within the execution of an OS X app bundle, behavior which now happens as a result of the above changes when -j > 0 is used in an OS X framework build.  There is a more detailed analysis in the Tk bug I've opened:

    https://core.tcl.tk/tk/tktview?name=c18c36f80c

The bug is not one that most Tk users would run into, I think, so it's certainly not a critical bug in Tk.  But it is annoying to run into when running Python tests.  On the other hand, I'm not sure that it is worth trying to work around the problem in the tests; I think the basic idea of the Issue22236 changes is sound.

There is a relatively easy workaround if one runs into the problem: exclude the two problematic Tk tests and run them separately sequentially.

    pythonx.y -m test -w -uall -j3 -x test_tk test_ttk_guionly
    pythonx.y -m test -w -uall        test_tk test_ttk_guionly

(-m test.regrtest for Python 2.7.x)

It may be sufficient to just document the workaround here in case others run into it.
msg230309 - (view) Author: Ned Deily (ned.deily) * (Python committer) Date: 2014-10-30 23:16
For the record, a typical instance of this failure will look like this:

$ /usr/local/bin/python3.4 -m test -w -uall -j2
== CPython 3.4.2 (v3.4.2:ab2c023a9432, Oct 5 2014, 20:42:22) [GCC 4.2.1 (Apple Inc. build 5666) (dot 3)]
==   Darwin-14.0.0-x86_64-i386-64bit little-endian
==   hash algorithm: siphash24 64bit
==   /private/var/folders/fm/9wjgctqx61n796zt88qmmnxc0000gq/T/test_python_89951
Testing with flags: sys.flags(debug=0, inspect=0, interactive=0, optimize=0, dont_write_bytecode=0, no_user_site=0, no_site=0, ignore_environment=0, verbose=0, bytes_warning=0, quiet=0, hash_randomization=1, isolated=0)
[...] test_ttk_guionly
Fatal Python error: Segmentation fault

Current thread 0x00007fff747dd300 (most recent call first):
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/tkinter/__init__.py", line 496 in wait_visibility
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/tkinter/test/test_ttk/test_extensions.py", line 121 in test_horizontal_range
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/unittest/case.py", line 577 in run
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/unittest/case.py", line 625 in __call__
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/unittest/suite.py", line 125 in run
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/unittest/suite.py", line 87 in __call__
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/unittest/suite.py", line 125 in run
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/unittest/suite.py", line 87 in __call__
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/unittest/runner.py", line 168 in run
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/test/support/__init__.py", line 1750 in _run_suite
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/test/support/__init__.py", line 1784 in run_unittest
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/test/test_ttk_guionly.py", line 34 in test_main
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/test/regrtest.py", line 1280 in runtest_inner
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/test/regrtest.py", line 978 in runtest
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/test/regrtest.py", line 532 in main
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/test/regrtest.py", line 1564 in main_in_temp_cwd
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/test/regrtest.py", line 1589 in <module>
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/runpy.py", line 85 in _run_code
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/runpy.py", line 170 in _run_module_as_main
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/runpy.py", line 170, in _run_module_as_main
    "__main__", mod_spec)
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/test/__main__.py", line 3, in <module>
    regrtest.main_in_temp_cwd()
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/test/regrtest.py", line 1564, in main_in_temp_cwd
    main()
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/test/regrtest.py", line 738, in main
    raise Exception("Child error on {}: {}".format(test, result[1]))
Exception: Child error on test_ttk_guionly: Exit code -11

and the system crash report will look something like this:

[...]
Process:               Python [89955]
Path:                  /Library/Frameworks/Python.framework/Versions/3.4/Resources/Python.app/Contents/MacOS/Python
Identifier:            org.python.python
Version:               3.4.2 (3.4.2)
Code Type:             X86-64 (Native)
Parent Process:        Python [89951]
Responsible:           iTerm [64155]
User ID:               503

Date/Time:             2014-10-30 12:47:52.051 -0700
OS Version:            Mac OS X 10.10 (14A389)
Report Version:        11
Anonymous UUID:        CFED52E3-698C-835B-D61C-F4B1F18D2CB6


Time Awake Since Boot: 600000 seconds

Crashed Thread:        0  Dispatch queue: com.apple.main-thread

Exception Type:        EXC_BAD_ACCESS (SIGSEGV)
Exception Codes:       KERN_INVALID_ADDRESS at 0x0000000000000170

VM Regions Near 0x170:
--> 
    __TEXT                 0000000100000000-0000000100001000 [    4K] r-x/rwx SM=COW  /Library/Frameworks/Python.framework/Versions/3.4.2_10_6/Resources/Python.app/Contents/MacOS/Python

Thread 0 Crashed:: Dispatch queue: com.apple.main-thread
0   libsystem_kernel.dylib        	0x00007fff90cadc7e __kill + 10
1   ???                           	000000000000000000 0 + 0
2   Tcl                           	0x0000000101810430 Tcl_GetCommandInfo + 21
3   Tk                            	0x00000001019c0ae0 OappHandler + 55
4   com.apple.AE                  	0x00007fff88d6899c aeDispatchAppleEvent(AEDesc const*, AEDesc*, unsigned int, unsigned char*) + 531
5   com.apple.AE                  	0x00007fff88d68719 dispatchEventAndSendReply(AEDesc const*, AEDesc*) + 31
6   com.apple.AE                  	0x00007fff88d68623 aeProcessAppleEvent + 295
7   com.apple.HIToolbox           	0x00007fff8aa6e37e AEProcessAppleEvent + 56
8   com.apple.AppKit              	0x00007fff900d9ec6 _DPSNextEvent + 2665
9   com.apple.AppKit              	0x00007fff900d8fd0 -[NSApplication nextEventMatchingMask:untilDate:inMode:dequeue:] + 194
10  Tk                            	0x00000001019c9bc0 -[TKApplication(TKNotify) nextEventMatchingMask:untilDate:inMode:dequeue:] + 169
11  Tk                            	0x00000001019ca194 TkMacOSXEventsSetupProc + 209
12  Tcl                           	0x0000000101876991 Tcl_DoOneEvent + 220
13  Tk                            	0x000000010191793f Tk_TkwaitObjCmd + 398
14  Tcl                           	0x000000010181085a TclEvalObjvInternal + 773
15  Tcl                           	0x0000000101811a2d Tcl_EvalObjv + 66
16  _tkinter.so                   	0x000000010163cabe Tkapp_Call + 190
17  org.python.python             	0x00000001000e4745 PyEval_EvalFrameEx + 29589
[...]
msg230318 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2014-10-31 01:58
> when -j > 0 is used in an OS X framework build

Did you mean -j > 1? I checked the buildbots and the x86 Tiger is run -j1 while the AMD Snow Leopard is not running.
msg230320 - (view) Author: Ned Deily (ned.deily) * (Python committer) Date: 2014-10-31 05:36
> Did you mean -j > 1?

No.

> I checked the buildbots and the x86 Tiger is run -j1 while the AMD Snow Leopard is not running.

I'm not sure what you are referring to here.  This crash would not be seen on either of those buildbots because (1) they are not configured as framework builds, (2) they don't install Python and run the tests from there (rather they run from the build directory), (3) the user name under which the buildbot tests run would have to be logged in as the current interactive user (e.g. owning the window server) to run test_tk or test_ttk_guionly; (4) Tiger (OS X 10.4) is too old to run Cocoa Tk.
msg230321 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-10-31 08:00
What if add root.update_idletasks() before root.destroy() in Lib/test/support/__init__.py? Does this fix the problem?
msg230322 - (view) Author: Ned Deily (ned.deily) * (Python committer) Date: 2014-10-31 08:20
Adding root.update_idletasks() before root.destroy() doesn't help by itself.  As I noted in the Tk issue, to avoid the crash without changing Tk and while still destroying and creating roots/Tcl instances, it is necessary to get the "open application" Apple event from OS X to Tk to trigger while the initial Tcl interpreter instance is still valid.  As far as I can tell, the way to do that is to ensure that the initial Tcl interpreter actually draws something on the screen, not just create a test button object.  The drawing triggers the promotion of the bundle process to a full GUI process and that (I believe) causes the "open application" event to be sent.  As long as that event (which contains the unchecked pointer to the Tcl interpreter instance) is processed by the interpreter that created it, there is no problem.  Once the first root is destroyed (in each process) without having received the "open application" event, when the event does arrive the interp address will be invalid and a segfault can result.
msg230325 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-10-31 09:02
Ah, this code doesn't run on Mac OS at all (see issue18604).

Does this patch fixes the problem? May be just root.update() would be enough.
msg230444 - (view) Author: Ned Deily (ned.deily) * (Python committer) Date: 2014-11-01 09:57
Ah, Issue18604! Thanks for the reminder (and for the patch), Serhly! Focusing on these more recent occurrences of segfaults, I hadn't been thinking about those earlier ones.  Now, after digging into Tk and identifying the problem there, it's clear that those earlier segfaults reported in Issue18604 were a result of the same Tk bug, just caused slightly differently from the tkinter tests.  The #18604 changes also caused a Tcl interpreter instance to be created and destroyed without actually causing Tk to do enough to cause the "open application" Apple event to be fired and received. _is_gui_available is the right place and adding the additional Tk activity there does it at the right time.  And it appears that root.update() is sufficient, as you suggested.  I want to do some additional testing but, assuming no problems are found, I'll plan to apply the simplified patch.
msg230474 - (view) Author: Roundup Robot (python-dev) Date: 2014-11-02 02:34
New changeset bd4dc351d670 by Ned Deily in branch '2.7':
Issue #22770: Prevent some Tk segfaults on OS X when running gui tests.
https://hg.python.org/cpython/rev/bd4dc351d670

New changeset 121517deb318 by Ned Deily in branch '3.4':
Issue #22770: Prevent some Tk segfaults on OS X when running gui tests.
https://hg.python.org/cpython/rev/121517deb318

New changeset e119343bc3ec by Ned Deily in branch 'default':
Issue #22770: merge from 3.4
https://hg.python.org/cpython/rev/e119343bc3ec
msg230475 - (view) Author: Ned Deily (ned.deily) * (Python committer) Date: 2014-11-02 02:36
Applied for release in 2.7.9, 3.4.3, and 3.5.0.
History
Date User Action Args
2014-11-02 02:36:39ned.deilysetstatus: open -> closed
resolution: fixed
messages: + msg230475

stage: patch review -> resolved
2014-11-02 02:34:03python-devsetnosy: + python-dev
messages: + msg230474
2014-11-01 09:57:38ned.deilysetassignee: ned.deily
messages: + msg230444
stage: patch review
2014-10-31 09:02:32serhiy.storchakasetfiles: + issue22770.patch
keywords: + patch
messages: + msg230325
2014-10-31 08:20:48ned.deilysetassignee: ned.deily -> (no value)
messages: + msg230322
2014-10-31 08:00:03serhiy.storchakasetassignee: ned.deily
type: crash
messages: + msg230321
2014-10-31 05:36:55ned.deilysetmessages: + msg230320
2014-10-31 01:58:32terry.reedysetnosy: + terry.reedy
messages: + msg230318
2014-10-30 23:16:59ned.deilysetmessages: + msg230309
2014-10-30 23:05:44ned.deilycreate