|
msg91265 - (view) |
Author: Reid Kleckner (rnk)  |
Date: 2009-08-04 18:56 |
This bug is similar to the importlock deadlock, and it's really part of
a larger problem that you should release all locks before you fork.
However, we can fix this in the threading module directly by freeing and
resetting the locks on the main thread after a fork.
I've attached a test case that inserts calls to sleep at the right
places to make the following occur:
- Main thread spawns a worker thread.
- Main thread joins worker thread.
- To join, the main thread acquires the lock on the condition variable
(worker.__block.acquire()).
== switch to worker ==
- Worker thread forks.
== switch to child process ==
- Worker thread, which is now the only thread in the process, returns.
- __bootstrap_inner calls self.__stop() to notify any other threads
waiting for it that it returned.
- __stop() tries to acquire self.__block, which has been left in an
acquired state, so the child process hangs here.
== switch to worker in parent process ==
- Worker thread calls os.waitpid(), which hangs, since the child never
returns.
So there's the deadlock.
I think I should be able to fix it just by resetting the condition
variable lock and any other locks hanging off the only thread left
standing after the fork.
|
|
msg91273 - (view) |
Author: Reid Kleckner (rnk)  |
Date: 2009-08-04 21:18 |
Here's a patch for 3.2 which adds the fix and a test case. I also
verified that the problem exists in 3.1, 2.7, and 2.6 and backported the
patch to those versions, but someone should review this one before I
upload those.
|
|
msg109914 - (view) |
Author: Reid Kleckner (rnk)  |
Date: 2010-07-10 19:35 |
Here's an updated patch for py3k (3.2). The test still fails without the fix, and passes with the fix.
Thinking more about this, I'll try summarizing the bug more coherently:
When the main thread joins the child threads, it acquires some locks. If a fork in a child thread occurs while those locks are held, they remain locked in the child process. My solution is to do here what we do elsewhere in CPython: abandon radioactive locks and allocate fresh ones.
|
|
msg109933 - (view) |
Author: Reid Kleckner (rnk)  |
Date: 2010-07-10 20:39 |
I realized that in a later fix for unladen-swallow, we also cleared the condition variable waiters list, since it has radioactive synchronization primitives in it as well.
Here's an updated patch that simplifies the fix by just using __init__() to completely reinitialize the condition variables and adds a test.
This corresponds to unladen-swallow revisions r799 and r834.
|
|
msg110071 - (view) |
Author: Adam Olsen (Rhamphoryncus) |
Date: 2010-07-12 06:34 |
I don't have any direct opinions on this, as it is just a bandaid. fork, as defined by POSIX, doesn't allow what we do with it, so we're reliant on great deal of OS and library implementation details. The only portable and robust solution would be to replace it with a unified fork-and-exec API that's implemented directly in C.
|
|
msg110092 - (view) |
Author: Reid Kleckner (rnk)  |
Date: 2010-07-12 15:11 |
I completely agree, but the cat is out of the bag on this one. I don't see how we could get rid of fork until Py4K, and even then I'm sure there will be people who don't want to see it go, and I'd rather not spend my time arguing this point.
The only application of fork that doesn't use exec that I've heard of is pre-forked Python servers. But those don't seem like they would be very useful, since with refcounting the copy-on-write behavior doesn't get you very many wins.
The problem that this bandaid solves for me is that test_threading.py already tests thread+fork behaviors, and can fail non-deterministically.
This problem was exacerbated while I was working on making the compilation thread.
I don't think we can un-support fork and threads in the near future either, because subprocess.py uses fork, and libraries can use fork behind the user's back.
|
|
msg125236 - (view) |
Author: Gregory P. Smith (gregory.p.smith) *  |
Date: 2011-01-03 20:48 |
fwiw a unified fork-and-exec API implemented in C is what I added in Modules/_posixsubprocess.c to at least avoid this issue as much as possible when using subprocess.
|
|
msg125240 - (view) |
Author: Gregory P. Smith (gregory.p.smith) *  |
Date: 2011-01-03 21:07 |
patch looks good. committed in r87710 for 3.2. needs back porting to 3.1 and 2.7 and optionally 2.6.
|
|
msg125270 - (view) |
Author: Gregory P. Smith (gregory.p.smith) *  |
Date: 2011-01-04 01:10 |
r87726 for release31-maint
r87727 for release27-maint - this required a bit more fiddling as _block and _started and _cond were __ private.
|
|
msg125273 - (view) |
Author: Gregory P. Smith (gregory.p.smith) *  |
Date: 2011-01-04 01:25 |
Attached is a patch for Python 2.6 release26_maint for reference incase someone wants it. That branch is closed - security fixes only.
|
|
msg125338 - (view) |
Author: Nadeem Vawda (nadeem.vawda) *  |
Date: 2011-01-04 16:30 |
r87710 introduces an AttributeError in test_thread's TestForkInThread test case. If os.fork() is called from a thread created by the _thread module, threading._after_fork() will get a _DummyThread (with no _block attribute) as the current thread.
I've attached a patch that checks whether the thread has a _block attribute before trying to reinitialize it.
|
|
msg125346 - (view) |
Author: Gregory P. Smith (gregory.p.smith) *  |
Date: 2011-01-04 18:34 |
eek, thanks for noticing that!
r87740 fixes this in py3k. backporting to 3.1 and 2.7 now.
|
|
msg125350 - (view) |
Author: Gregory P. Smith (gregory.p.smith) *  |
Date: 2011-01-04 18:44 |
r87741 3.1
r87742 2.7
|
|
| Date |
User |
Action |
Args |
| 2011-06-25 10:43:10 | neologix | link | issue5114 superseder |
| 2011-01-04 18:44:17 | gregory.p.smith | set | status: open -> closed
messages:
+ msg125350 resolution: accepted -> fixed nosy:
barry, collinwinter, gregory.p.smith, Rhamphoryncus, jyasskin, nadeem.vawda, rnk |
| 2011-01-04 18:34:43 | gregory.p.smith | set | priority: normal -> release blocker nosy:
+ barry messages:
+ msg125346
|
| 2011-01-04 16:40:56 | pitrou | set | status: closed -> open nosy:
collinwinter, gregory.p.smith, Rhamphoryncus, jyasskin, nadeem.vawda, rnk |
| 2011-01-04 16:30:45 | nadeem.vawda | set | files:
+ test_thread.diff nosy:
+ nadeem.vawda messages:
+ msg125338
|
| 2011-01-04 01:25:20 | gregory.p.smith | set | status: open -> closed files:
+ issue6643-release26_maint_gps01.diff versions:
- Python 2.7 nosy:
collinwinter, gregory.p.smith, Rhamphoryncus, jyasskin, rnk messages:
+ msg125273
keywords:
+ patch |
| 2011-01-04 01:10:40 | gregory.p.smith | set | nosy:
collinwinter, gregory.p.smith, Rhamphoryncus, jyasskin, rnk messages:
+ msg125270 versions:
- Python 3.1, Python 3.2 |
| 2011-01-03 21:07:41 | gregory.p.smith | set | assignee: rnk -> gregory.p.smith messages:
+ msg125240 resolution: accepted nosy:
collinwinter, gregory.p.smith, Rhamphoryncus, jyasskin, rnk |
| 2011-01-03 20:48:25 | gregory.p.smith | set | nosy:
collinwinter, gregory.p.smith, Rhamphoryncus, jyasskin, rnk messages:
+ msg125236 |
| 2010-07-18 14:50:49 | rnk | link | issue6642 dependencies |
| 2010-07-18 14:49:23 | rnk | set | keywords:
+ needs review, - patch assignee: rnk |
| 2010-07-12 15:11:29 | rnk | set | messages:
+ msg110092 |
| 2010-07-12 06:34:26 | Rhamphoryncus | set | messages:
+ msg110071 |
| 2010-07-11 14:49:42 | pitrou | set | nosy:
+ gregory.p.smith, Rhamphoryncus
|
| 2010-07-11 13:22:50 | rnk | set | title: joining a child that forks can deadlock in the forked child process -> Throw away more radioactive locks that could be held across a fork in threading.py |
| 2010-07-10 20:39:02 | rnk | set | files:
+ thread-fork-join.diff
messages:
+ msg109933 |
| 2010-07-10 19:35:50 | rnk | set | files:
+ thread-fork-join.diff
messages:
+ msg109914 |
| 2009-08-11 18:19:33 | collinwinter | set | nosy:
+ jyasskin, collinwinter components:
+ Interpreter Core
|
| 2009-08-04 21:18:43 | rnk | set | files:
+ forkdeadlock.diff keywords:
+ patch messages:
+ msg91273
versions:
+ Python 3.1, Python 2.7, Python 3.2 |
| 2009-08-04 18:56:48 | rnk | create | |