msg192780 - (view) |
Author: A. Jesse Jiryu Davis (emptysquare) * |
Date: 2013-07-09 22:13 |
In threading.Thread.__bootstrap_inner(), the thread sets self.__started before adding itself to the _active collection. If a different thread forks between these two operations, we're left with a thread that believes it's alive although it isn't. _after_fork() doesn't call __stop() on this thread because the thread isn't in _active.
In the attached patch, Thread adds itself to _active before setting self.__started. This way, if it's interrupted by a fork, it may end up stopped without being started, rather than the reverse. In this case isAlive() correctly returns False.
One concern about the patch: my new test in test_threading.py only reveals the bug in Python 2.7.5 very occasionally. I don't know how to force the test to fail more frequently.
Looking at the threading.py on the 3.3 branch, it appears to have the same bug, but I can't reproduce it there, only on the 2.7 branch.
|
msg192933 - (view) |
Author: Charles-François Natali (neologix) * |
Date: 2013-07-12 08:22 |
This is just a special-case of the more general problem of forking() in a multi-threaded program (see #16500 and #6721).
Since fork() can occur at any time, even though your data structures are protected by locks, they can end up in an inconsistent state in the child process. There's nothing we can do about it, and it affects *every code that's not atomic*.
For example, even though your patch fixes this specific issue, you can still end up with the thread both in _active and _limbo at the same time (which shouldn't be a problem since both are cleared after fork(), but still).
OTOH, the patch is simple and safe, so it might be interesting (another maybe better approach would be to also iterate over _limbo in _after_fork()).
As for the reproduction in 3.3, you're probably having trouble to reproduce it because of the GIL change. You might want to play with sys.setswitchinterval().
|
msg193104 - (view) |
Author: A. Jesse Jiryu Davis (emptysquare) * |
Date: 2013-07-15 14:17 |
Charles-François, I agree that many guarantees are impossible to enforce in a multithreaded application that calls fork().
But the threading module does try to guarantee, after a fork, that isAlive() is False for all threads but one. I claim that it can more reliably provide this guarantee with my patch applied.
|
msg193124 - (view) |
Author: Gregory P. Smith (gregory.p.smith) * |
Date: 2013-07-15 19:15 |
This patch makes sense to me. I've gone over the code and I cannot spot any adverse effects. I was wondering in particular if anything would be surprised to find a non-started thread in _active within the short window where that will be true but nothing appears to care about that.
You should add a comment to the code mentioning this issue and why the .set() is done after the _active insertion. After that, I'd say commit it.
If you want a consistent reproducible test case for this I believe you will need to replace the Thread object's __started with a test wrapper who's set() method blocks waiting for for the fork to have happened before doing the actual set(). That is a bit tricky and may not be worth it.
|
msg193127 - (view) |
Author: Charles-François Natali (neologix) * |
Date: 2013-07-15 20:30 |
> If you want a consistent reproducible test case for this I believe you will need to replace the Thread object's __started with a test wrapper who's set() method blocks waiting for for the fork to have happened before doing the actual set(). That is a bit tricky and may not be worth it.
Actually, setting sys.setswitchinterval(1e-6) makes it fairly easy to reproduce.
|
msg193137 - (view) |
Author: A. Jesse Jiryu Davis (emptysquare) * |
Date: 2013-07-15 23:51 |
Patch #2:
* Add comment before .set() as requested.
* setcheckinterval(0) and try 20 times to repro the bug, inspired by test_enumerate_after_join. This reliably repros in 2.7.5.
|
msg193539 - (view) |
Author: A. Jesse Jiryu Davis (emptysquare) * |
Date: 2013-07-22 14:56 |
Bump.
|
msg193689 - (view) |
Author: Charles-François Natali (neologix) * |
Date: 2013-07-25 09:02 |
> Bump.
Did you see my review at http://bugs.python.org/review/18418/#ps8668 ?
|
msg193693 - (view) |
Author: A. Jesse Jiryu Davis (emptysquare) * |
Date: 2013-07-25 11:13 |
Oh, no I didn't. I'm learning how this system works. I'll respond there
soon.
Thanks
On Thursday, July 25, 2013, Charles-François Natali wrote:
>
> Charles-François Natali added the comment:
>
> > Bump.
>
> Did you see my review at http://bugs.python.org/review/18418/#ps8668 ?
>
> ----------
>
> _______________________________________
> Python tracker <report@bugs.python.org <javascript:;>>
> <http://bugs.python.org/issue18418>
> _______________________________________
>
|
msg193709 - (view) |
Author: A. Jesse Jiryu Davis (emptysquare) * |
Date: 2013-07-25 18:00 |
New patch for 3.3 branch after Charles-François's critique: instead of changing startup sequence in Thread._bootstrap_inner, stop all threads in _limbo after a fork.
|
msg193710 - (view) |
Author: A. Jesse Jiryu Davis (emptysquare) * |
Date: 2013-07-25 18:24 |
(Sorry about the extraneous XML file from my IDE, I made a mistake and allowed the diff to include it.)
|
msg193770 - (view) |
Author: A. Jesse Jiryu Davis (emptysquare) * |
Date: 2013-07-27 01:03 |
By the way, I'm going on vacation through August 4. When I return I'll
check if there's anything else I need to do to help resolve this.
On Thursday, July 25, 2013, A. Jesse Jiryu Davis wrote:
>
> A. Jesse Jiryu Davis added the comment:
>
> (Sorry about the extraneous XML file from my IDE, I made a mistake and
> allowed the diff to include it.)
>
> ----------
>
> _______________________________________
> Python tracker <report@bugs.python.org <javascript:;>>
> <http://bugs.python.org/issue18418>
> _______________________________________
>
|
msg194160 - (view) |
Author: Charles-François Natali (neologix) * |
Date: 2013-08-02 07:07 |
I've posted another review (not sure you receive notifications).
|
msg194385 - (view) |
Author: A. Jesse Jiryu Davis (emptysquare) * |
Date: 2013-08-04 15:42 |
I'm back from a Zen retreat--no email!--will address your comments
momentarily.
On Fri, Aug 2, 2013 at 3:07 AM, Charles-François Natali <
report@bugs.python.org> wrote:
>
> Charles-François Natali added the comment:
>
> I've posted another review (not sure you receive notifications).
>
> ----------
>
> _______________________________________
> Python tracker <report@bugs.python.org>
> <http://bugs.python.org/issue18418>
> _______________________________________
>
|
msg194387 - (view) |
Author: A. Jesse Jiryu Davis (emptysquare) * |
Date: 2013-08-04 15:50 |
New patch for 3.3 branch after Charles-François's critique: use _enumerate() where appropriate, and join the test thread before finishing the test.
|
msg194792 - (view) |
Author: A. Jesse Jiryu Davis (emptysquare) * |
Date: 2013-08-10 14:09 |
Bump.
I think my most recent patch (August 4) addresses all of Charles-François's comments.
|
msg195580 - (view) |
Author: A. Jesse Jiryu Davis (emptysquare) * |
Date: 2013-08-18 22:09 |
FYI I'm going on vacation again 8/20 through 8/25. I'll check in again as soon as I return to see if there's anything else I should do.
|
msg196576 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2013-08-30 21:14 |
Well... I can't say anything except that the patch looks harmless.
It's a pity the test failure is hard to trigger, though.
|
msg196581 - (view) |
Author: Roundup Robot (python-dev) |
Date: 2013-08-30 21:34 |
New changeset ba54011aa295 by Charles-François Natali in branch '2.7':
Issue #18418: After fork(), reinit all threads states, not only active ones.
http://hg.python.org/cpython/rev/ba54011aa295
New changeset 29fce7f31539 by Charles-François Natali in branch '3.3':
Issue #18418: After fork(), reinit all threads states, not only active ones.
http://hg.python.org/cpython/rev/29fce7f31539
New changeset 8f39e2f987fb by Charles-François Natali in branch 'default':
Issue #18418: After fork(), reinit all threads states, not only active ones.
http://hg.python.org/cpython/rev/8f39e2f987fb
|
msg196612 - (view) |
Author: A. Jesse Jiryu Davis (emptysquare) * |
Date: 2013-08-31 02:49 |
Thanks for accepting this patch!
Antoine, I initially had trouble writing a test that reliably reproduced the bug, especially in Python 3.3. But I read other threading tests and got smarter. The final patch includes tests that are very reliable at revealing the bug in 2.7 and 3.3.
|
msg196622 - (view) |
Author: Charles-François Natali (neologix) * |
Date: 2013-08-31 07:37 |
> The final patch includes tests that are very reliable at
> revealing the bug in 2.7 and 3.3.
Indeed, I could reproduce it systematically without the patch.
> Thanks for accepting this patch!
Well, thanks for the patch!
|
msg197252 - (view) |
Author: Kubilay Kocak (koobs) |
Date: 2013-09-08 11:37 |
For reference, this test is successfuly identifying failures on koobs-freebsd and koobs-freebsd10 buildbots:
======================================================================
FAIL: test_is_alive_after_fork (test.test_threading.ThreadTests)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/usr/home/buildbot/koobs-freebsd10/3.x.koobs-freebsd10/build/Lib/test/test_threading.py", line 478, in test_is_alive_after_fork
self.assertEqual(0, status)
AssertionError: 0 != 256
|
msg197254 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2013-09-08 11:40 |
I've identified a possible cause (and fix) for the sporadic test failures in 74dc664ad699. If the solution holds, it should be backported to 3.3 and 2.7.
|
|
Date |
User |
Action |
Args |
2022-04-11 14:57:47 | admin | set | github: 62618 |
2013-09-08 11:40:19 | pitrou | set | messages:
+ msg197254 |
2013-09-08 11:37:04 | koobs | set | nosy:
+ koobs messages:
+ msg197252
|
2013-08-31 07:37:57 | neologix | set | status: open -> closed type: behavior messages:
+ msg196622
resolution: fixed stage: commit review -> resolved |
2013-08-31 02:49:17 | emptysquare | set | messages:
+ msg196612 |
2013-08-30 21:34:38 | python-dev | set | nosy:
+ python-dev messages:
+ msg196581
|
2013-08-30 21:14:53 | pitrou | set | messages:
+ msg196576 |
2013-08-30 16:34:20 | neologix | set | nosy:
+ pitrou stage: patch review -> commit review
versions:
+ Python 3.3, Python 3.4 |
2013-08-18 22:09:10 | emptysquare | set | messages:
+ msg195580 |
2013-08-18 15:12:04 | neologix | set | stage: patch review |
2013-08-10 14:09:41 | emptysquare | set | messages:
+ msg194792 |
2013-08-04 15:50:59 | emptysquare | set | files:
+ issue18418-3.patch
messages:
+ msg194387 |
2013-08-04 15:42:49 | emptysquare | set | messages:
+ msg194385 |
2013-08-02 07:07:35 | neologix | set | messages:
+ msg194160 |
2013-07-27 01:03:29 | emptysquare | set | messages:
+ msg193770 |
2013-07-25 18:24:49 | emptysquare | set | messages:
+ msg193710 |
2013-07-25 18:00:42 | emptysquare | set | files:
+ issue18418-2.patch
messages:
+ msg193709 |
2013-07-25 11:13:08 | emptysquare | set | messages:
+ msg193693 |
2013-07-25 09:02:24 | neologix | set | messages:
+ msg193689 |
2013-07-22 14:56:02 | emptysquare | set | messages:
+ msg193539 |
2013-07-16 00:05:05 | emptysquare | set | files:
+ fix_is_alive_and_fork_python_33.patch |
2013-07-15 23:51:03 | emptysquare | set | files:
+ fix_is_alive_and_fork_2.patch
messages:
+ msg193137 |
2013-07-15 20:30:57 | neologix | set | messages:
+ msg193127 |
2013-07-15 19:15:57 | gregory.p.smith | set | messages:
+ msg193124 |
2013-07-15 14:17:39 | emptysquare | set | messages:
+ msg193104 |
2013-07-12 08:22:36 | neologix | set | nosy:
+ neologix messages:
+ msg192933
|
2013-07-09 23:03:08 | pitrou | set | nosy:
+ gregory.p.smith, sbt
|
2013-07-09 22:13:50 | emptysquare | create | |