Issue11870
This issue tracker has been migrated to GitHub,
and is currently read-only.
For more information,
see the GitHub FAQs in the Python's Developer Guide.
Created on 2011-04-18 20:17 by vstinner, last changed 2022-04-11 14:57 by admin. This issue is now closed.
Files | ||||
---|---|---|---|---|
File name | Uploaded | Description | Edit | |
test_threading_fork.diff | neologix, 2011-06-26 20:55 | review | ||
debug_stuck.diff | neologix, 2011-12-10 12:03 | review | ||
threading_reinit_lock.diff | neologix, 2011-12-18 16:05 | review |
Messages (19) | |||
---|---|---|---|
msg133990 - (view) | Author: STINNER Victor (vstinner) * | Date: 2011-04-18 20:17 | |
test_3_join_in_forked_from_thread() of test_threading failed on "x86 Ubuntu Shared 3.x" buildbot: ----------------------------------- [201/354] test_threading [41179 refs] [40407 refs] [40407 refs] [40407 refs] Timeout (1:00:00)! Thread 0x404218c0: File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/subprocess.py", line 466 in _eintr_retry_call File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/subprocess.py", line 1486 in _try_wait File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/subprocess.py", line 1528 in wait File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/test/test_threading.py", line 455 in _run_and_join File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/test/test_threading.py", line 518 in test_3_join_in_forked_from_thread File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/unittest/case.py", line 387 in _executeTestPart File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/unittest/case.py", line 442 in run File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/unittest/case.py", line 494 in __call__ File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/unittest/suite.py", line 105 in run File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/unittest/suite.py", line 67 in __call__ File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/unittest/suite.py", line 105 in run File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/unittest/suite.py", line 67 in __call__ File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/test/support.py", line 1078 in run File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/test/support.py", line 1166 in _run_suite File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/test/support.py", line 1192 in run_unittest File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/test/test_threading.py", line 728 in test_main File "./Lib/test/regrtest.py", line 1041 in runtest_inner File "./Lib/test/regrtest.py", line 835 in runtest File "./Lib/test/regrtest.py", line 659 in main File "./Lib/test/regrtest.py", line 1619 in <module> make: *** [buildbottest] Error 1 program finished with exit code 2 elapsedTime=4426.776675 http://www.python.org/dev/buildbot/all/builders/x86%20Ubuntu%20Shared%203.x/builds/3577/steps/test/logs/stdio ----------------------------------- Code of the test: ---------------------------------- def _run_and_join(self, script): script = """if 1: import sys, os, time, threading # a thread, which waits for the main program to terminate def joiningfunc(mainthread): mainthread.join() print('end of thread') # stdout is fully buffered because not a tty, we have to flush # before exit. sys.stdout.flush() \n""" + script p = subprocess.Popen([sys.executable, "-c", script], stdout=subprocess.PIPE) rc = p.wait() <~~~ HANG HERE ~~~~ data = p.stdout.read().decode().replace('\r', '') p.stdout.close() self.assertEqual(data, "end of main\nend of thread\n") self.assertFalse(rc == 2, "interpreter was blocked") self.assertTrue(rc == 0, "Unexpected error") @unittest.skipUnless(hasattr(os, 'fork'), "needs os.fork()") def test_3_join_in_forked_from_thread(self): # Like the test above, but fork() was called from a worker thread # In the forked process, the main Thread object must be marked as stopped. # Skip platforms with known problems forking from a worker thread. # See http://bugs.python.org/issue3863. if sys.platform in ('freebsd4', 'freebsd5', 'freebsd6', 'netbsd5', 'os2emx'): raise unittest.SkipTest('due to known OS bugs on ' + sys.platform) script = """if 1: main_thread = threading.current_thread() def worker(): childpid = os.fork() if childpid != 0: os.waitpid(childpid, 0) sys.exit(0) t = threading.Thread(target=joiningfunc, args=(main_thread,)) print('end of main') t.start() t.join() # Should not block: main_thread is already stopped w = threading.Thread(target=worker) w.start() """ self._run_and_join(script) ---------------------------------- |
|||
msg139076 - (view) | Author: Charles-François Natali (neologix) * | Date: 2011-06-25 14:51 | |
test_2_join_in_forked_process fails on FreeBSD 6.4 buildbot. http://www.python.org/dev/buildbot/all/builders/x86 FreeBSD 6.4 3.x/builds/1606/steps/test/logs/stdio """ ====================================================================== FAIL: test_2_join_in_forked_process (test.test_threading.ThreadJoinOnShutdown) ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/home/db3l/buildarea/3.x.bolen-freebsd/build/Lib/test/test_threading.py", line 464, in test_2_join_in_forked_process self._run_and_join(script) File "/usr/home/db3l/buildarea/3.x.bolen-freebsd/build/Lib/test/test_threading.py", line 436, in _run_and_join self.assertEqual(data, "end of main\nend of thread\n") AssertionError: '' != 'end of main\nend of thread\n' + end of main + end of thread """ I think it's the same problem as issue #12316: in the child process, even calling pthread_create can segfault/abort on FreeBSD6 (async-safe blahblah...). Tests creating a thread from a fork()ed process should be skipped on FreeBSD6. Patch attached, along with some refactoring to use the skipIf idiom. As for test_3_join_in_forked_from_thread, well, it could be more or less the same problem. We're really doing something prohibited by POSIX, so things might break in unexpected ways. For example, calling pthread_condition_destroy from the child process can deadlock (see http://bugs.python.org/issue6721#msg136047). Victor: to debug this kind of problem, it would be great if faulthandler could also dump tracebacks of children processes. Do you mind if I create a new issue? |
|||
msg139121 - (view) | Author: STINNER Victor (vstinner) * | Date: 2011-06-25 21:50 | |
> Victor: to debug this kind of problem, it would be great > if faulthandler could also dump tracebacks of children processes. > Do you mind if I create a new issue? Please open a new issue. |
|||
msg139129 - (view) | Author: STINNER Victor (vstinner) * | Date: 2011-06-25 23:23 | |
+ @unittest.skipIf(sys.platform in ('freebsd4', 'freebsd5', 'freebsd6', + 'netbsd5', 'os2emx'), "due to known OS bug") This skip gives very few information, and it is duplicated for each function. I would prefer a constant of the "broken OSes" with your following comment attached to the constant: + # Between fork() and exec(), only async-safe functions are allowed (issues + # #12316 and #11870), and fork() from a worker thread is known to trigger + # problems with some operating systems: skip problematic tests on platforms + # known to behave badly. Or split the test case into two testcases: one using fork and skipped on broken platforms, one not using fork? --- > As for test_3_join_in_forked_from_thread, well, it could be more > or less the same problem. We're really doing something prohibited > by POSIX, so things might break in unexpected ways. If the creation of a thread after a fork is reliable on some systems, we should not deny the creation of new threads after a fork. You can replace "creation of new threads" by any other non async-safe function in my previous sentence. Therefore I agree that the good answer to this issue is to skip the test on "broken systems" (or should we call them "POSIX compliant systems?" :-)). |
|||
msg139131 - (view) | Author: Charles-François Natali (neologix) * | Date: 2011-06-25 23:40 | |
> This skip gives very few information, and it is duplicated for each > function. I would prefer a constant of the "broken OSes" with your > following comment attached to the constant: Ok, I'll try to write something along those lines. > If the creation of a thread after a fork is reliable on some systems, > we should not deny the creation of new threads after a fork. Well, the problem is that it is not reliable on any platform, but happens to work "most of the time" on some platforms ;-) |
|||
msg139219 - (view) | Author: Charles-François Natali (neologix) * | Date: 2011-06-26 20:55 | |
Patch attached. |
|||
msg139337 - (view) | Author: STINNER Victor (vstinner) * | Date: 2011-06-27 23:14 | |
Your patch is linux3 compliant, go ahead! |
|||
msg139574 - (view) | Author: Roundup Robot (python-dev) | Date: 2011-07-01 12:53 | |
New changeset 0ed5e6ff10f8 by Victor Stinner in branch '3.2': Issue #11870: Skip test_threading.test_2_join_in_forked_process() on platforms http://hg.python.org/cpython/rev/0ed5e6ff10f8 New changeset f43dee86fffd by Victor Stinner in branch 'default': (merge 3.2) Issue #11870: Skip test_threading.test_2_join_in_forked_process() http://hg.python.org/cpython/rev/f43dee86fffd |
|||
msg139576 - (view) | Author: Roundup Robot (python-dev) | Date: 2011-07-01 13:05 | |
New changeset ff36b8cadfd6 by Victor Stinner in branch '2.7': Issue #11870: Skip test_threading.test_2_join_in_forked_process() on platforms http://hg.python.org/cpython/rev/ff36b8cadfd6 |
|||
msg139583 - (view) | Author: STINNER Victor (vstinner) * | Date: 2011-07-01 14:06 | |
The initial problem was test_3_join_in_forked_from_thread() and the hangs does still happen: [324/356] test_threading Timeout (1:00:00)! Thread 0x404248c0: File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/subprocess.py", line 1498 in _communicate_with_poll File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/subprocess.py", line 1423 in _communicate File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/subprocess.py", line 836 in communicate File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/test/script_helper.py", line 32 in _assert_python File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/test/script_helper.py", line 50 in assert_python_ok File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/test/test_threading.py", line 434 in _run_and_join File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/test/test_threading.py", line 493 in test_3_join_in_forked_from_thread File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/unittest/case.py", line 407 in _executeTestPart File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/unittest/case.py", line 462 in run File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/unittest/case.py", line 514 in __call__ File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/unittest/suite.py", line 105 in run File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/unittest/suite.py", line 67 in __call__ File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/unittest/suite.py", line 105 in run File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/unittest/suite.py", line 67 in __call__ File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/unittest/runner.py", line 168 in run File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/test/support.py", line 1259 in _run_suite File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/test/support.py", line 1285 in run_unittest File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/test/test_threading.py", line 774 in test_main File "./Lib/test/regrtest.py", line 1070 in runtest_inner File "./Lib/test/regrtest.py", line 861 in runtest File "./Lib/test/regrtest.py", line 669 in main File "./Lib/test/regrtest.py", line 1648 in <module> http://www.python.org/dev/buildbot/all/builders/x86%20Ubuntu%20Shared%203.x/builds/4081/steps/test/logs/stdio (neologix's patch doesn't change anything for x86 Ubuntu Shared 3.x buildbot, which is a Linux, not a FreeBSD) I don't know why it only hangs on this Linux buildbot. It's maybe an old Linux kernel, an old GNU libc version, or something like that? |
|||
msg139602 - (view) | Author: Charles-François Natali (neologix) * | Date: 2011-07-01 20:48 | |
> The initial problem was test_3_join_in_forked_from_thread() and the hangs does still happen: > Yes, the patch was there to fix test_2_join_in_forked_from_thread. > [324/356] test_threading > Timeout (1:00:00)! > Thread 0x404248c0: > File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/subprocess.py", line 1498 in _communicate_with_poll > File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/subprocess.py", line 1423 in _communicate > File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/subprocess.py", line 836 in communicate > File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/test/script_helper.py", line 32 in _assert_python > File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/test/script_helper.py", line 50 in assert_python_ok > File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/test/test_threading.py", line 434 in _run_and_join > File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/test/test_threading.py", line 493 in test_3_join_in_forked_from_thread > File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/unittest/case.py", line 407 in _executeTestPart > File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/unittest/case.py", line 462 in run > File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/unittest/case.py", line 514 in __call__ > File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/unittest/suite.py", line 105 in run > File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/unittest/suite.py", line 67 in __call__ > File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/unittest/suite.py", line 105 in run > File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/unittest/suite.py", line 67 in __call__ > File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/unittest/runner.py", line 168 in run > File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/test/support.py", line 1259 in _run_suite > File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/test/support.py", line 1285 in run_unittest > File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/test/test_threading.py", line 774 in test_main > File "./Lib/test/regrtest.py", line 1070 in runtest_inner > File "./Lib/test/regrtest.py", line 861 in runtest > File "./Lib/test/regrtest.py", line 669 in main > File "./Lib/test/regrtest.py", line 1648 in <module> > > http://www.python.org/dev/buildbot/all/builders/x86%20Ubuntu%20Shared%203.x/builds/4081/steps/test/logs/stdio > This means that the subprocess hangs, but without a backtrace of the child process (issue #12413), it's hard to analyse it further. I've had a look at the code, and couldn't find anything obviously wrong; Gregory's patches to sanitize threading's lock should have fixed this. I also tried running this test in a loop for 48 hours but couldn't reproduce it. One possible explanation (but it's just a wild guess) is that with a certain kernel/libc configuration, the lock deallocation code can deadlock (I've seen pthread_cond_destroy() block, this could maybe happen with sem_destroy()). So I suggest to try to come up with a solution to #12413, which should help analyzing this - and similar - issues. |
|||
msg148698 - (view) | Author: STINNER Victor (vstinner) * | Date: 2011-12-01 00:43 | |
> Gregory's patches to sanitize threading's lock should have fixed this The subprocess hang still occurs something, it just happened: http://www.python.org/dev/buildbot/all/builders/x86%20Ubuntu%20Shared%203.x/builds/4898/steps/test/logs/stdio [110/363] test_threading Timeout (1:00:00)! Thread 0x404888c0: File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/subprocess.py", line 1513 in _communicate_with_poll File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/subprocess.py", line 1438 in _communicate File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/subprocess.py", line 850 in communicate File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/test/script_helper.py", line 32 in _assert_python File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/test/script_helper.py", line 50 in assert_python_ok File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/test/test_threading.py", line 441 in _run_and_join File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/test/test_threading.py", line 497 in test_3_join_in_forked_from_thread ... |
|||
msg148993 - (view) | Author: STINNER Victor (vstinner) * | Date: 2011-12-07 21:35 | |
I removed my two previous message (msg148991 and msg148992), there were unrelated to this issue: the test hangs in debug mode in the IO module because of a deadleak in IO related to the fork... |
|||
msg148996 - (view) | Author: Charles-François Natali (neologix) * | Date: 2011-12-07 22:04 | |
To debug this, we should probably make use of faulthandler (but not dump_tracebacks_later, since it creates a new thread). The way to go could be to make the parent process send a fatal signal to the child process if the latter takes too long to complete. |
|||
msg149148 - (view) | Author: Charles-François Natali (neologix) * | Date: 2011-12-10 12:02 | |
Here's a patch to help nail this down. |
|||
msg149773 - (view) | Author: Charles-François Natali (neologix) * | Date: 2011-12-18 16:05 | |
Victor, could you try the patch attached? |
|||
msg149784 - (view) | Author: Roundup Robot (python-dev) | Date: 2011-12-18 17:47 | |
New changeset 775319cebaa3 by Charles-François Natali in branch '2.7': Issue #11870: threading: Properly reinitialize threads internal locks and http://hg.python.org/cpython/rev/775319cebaa3 New changeset de962ec0faaa by Charles-François Natali in branch '3.2': Issue #11870: threading: Properly reinitialize threads internal locks and http://hg.python.org/cpython/rev/de962ec0faaa New changeset cec0d77d01c4 by Charles-François Natali in branch 'default': Issue #11870: threading: Properly reinitialize threads internal locks and http://hg.python.org/cpython/rev/cec0d77d01c4 |
|||
msg149788 - (view) | Author: Charles-François Natali (neologix) * | Date: 2011-12-18 18:12 | |
Should be fixed now. |
|||
msg180158 - (view) | Author: Roundup Robot (python-dev) | Date: 2013-01-17 22:37 | |
New changeset cd54b48946ca by Stefan Krah in branch '3.3': Issue #11870: Skip test_3_join_in_forked_from_thread() on HP-UX. http://hg.python.org/cpython/rev/cd54b48946ca |
History | |||
---|---|---|---|
Date | User | Action | Args |
2022-04-11 14:57:16 | admin | set | github: 56079 |
2013-01-17 22:37:26 | python-dev | set | messages: + msg180158 |
2011-12-18 18:12:47 | neologix | set | status: open -> closed resolution: fixed messages: + msg149788 stage: resolved |
2011-12-18 17:47:32 | python-dev | set | messages: + msg149784 |
2011-12-18 16:05:27 | neologix | set | files:
+ threading_reinit_lock.diff messages: + msg149773 |
2011-12-10 12:03:00 | neologix | set | files:
+ debug_stuck.diff messages: + msg149148 |
2011-12-07 22:04:19 | neologix | set | messages: + msg148996 |
2011-12-07 21:35:34 | vstinner | set | messages: + msg148993 |
2011-12-07 21:34:25 | vstinner | set | messages: - msg148992 |
2011-12-07 21:34:23 | vstinner | set | messages: - msg148991 |
2011-12-07 21:26:20 | vstinner | set | messages: + msg148992 |
2011-12-07 21:19:07 | vstinner | set | messages: + msg148991 |
2011-12-01 00:43:31 | vstinner | set | messages: + msg148698 |
2011-07-01 20:48:47 | neologix | set | messages: + msg139602 |
2011-07-01 14:06:49 | vstinner | set | messages: + msg139583 |
2011-07-01 13:05:01 | python-dev | set | messages: + msg139576 |
2011-07-01 12:53:42 | python-dev | set | nosy:
+ python-dev messages: + msg139574 |
2011-06-27 23:14:10 | vstinner | set | messages: + msg139337 |
2011-06-26 20:55:31 | neologix | set | files: - test_threading_fork.diff |
2011-06-26 20:55:08 | neologix | set | files:
+ test_threading_fork.diff messages: + msg139219 |
2011-06-25 23:40:27 | neologix | set | messages: + msg139131 |
2011-06-25 23:23:05 | vstinner | set | messages: + msg139129 |
2011-06-25 21:50:24 | vstinner | set | messages: + msg139121 |
2011-06-25 14:51:09 | neologix | set | files:
+ test_threading_fork.diff nosy: + neologix messages: + msg139076 keywords: + patch |
2011-04-19 00:43:03 | pitrou | set | nosy:
+ gps |
2011-04-18 20:17:59 | vstinner | create |