classification
Title: test_no_refcycle_through_target sometimes fails in test_threading
Type: behavior Stage:
Components: Library (Lib) Versions: Python 2.6
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: jyasskin Nosy List: jyasskin, nnorwitz, pitrou
Priority: normal Keywords: patch

Created on 2008-03-27 12:04 by pitrou, last changed 2008-03-28 04:12 by jyasskin. This issue is now closed.

Files
File name Uploaded Description
test_threading.patch pitrou, 2008-03-27 12:35
test_threading2.patch pitrou, 2008-03-27 16:32
Messages (8)
msg64584 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2008-03-27 12:04
This is a reminder for the failing test which is affecting some buildbots.
I can't reproduce it right now (under Linux), even by surrounding the
test code with a pair of gc.disable() / gc.enable().
msg64586 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2008-03-27 12:35
This is a tentative patch. I can't verify it fixes anything but at least
it shouldn't do any harm ;)
If it doesn't fix it I see two possible explanations:
- the buildbots are running some kind of debug build which keeps
references to local variables, preventing them from being deallocated
- the C thread implementation needs fixing on some platforms
msg64596 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2008-03-27 16:06
Hmm, even with a Py_DEBUG build I can't reproduce the bug.
msg64600 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2008-03-27 16:22
Hmm, I think I know what happens. t_bootstrap() in threadmodule.c calls
the self.__bootstrap() method in the Thread object, and it is this
method which sets the __stopped flag at its end, which in turn wakes up
the join() method.

The problem is that at this point, t_bootstrap() still (rightly) holds a
reference to the Thread object, since it has a reference to its
__bootstrap() method which is still running. Depending on how the
operating system switches threads, this reference may or may not be
released when the join() method returns.

So I think it's the test that is flaky. Instead of calling the join()
method, it should wait for the OS-level thread to finish. Or it should
find another way of testing for the reference cycle.
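
One way to test for the reference cycle without depending on the exact moment join() returns is to poll a weak reference with a deadline while the cyclic collector is disabled. The following is only a rough sketch of that idea, written for Python 3 rather than the 2.6 code discussed here. It is not either of the attached patches, and the names Target and target_freed_without_gc are invented for the example.

```python
import gc
import threading
import time
import weakref


class Target:
    """Hypothetical callable used as the thread target."""

    def __call__(self):
        pass


def target_freed_without_gc(timeout=2.0):
    """Return True if the target is freed by refcounting alone.

    The cyclic collector is disabled, so a reference cycle through the
    target would keep the weak reference alive until the deadline.
    """
    gc_was_enabled = gc.isenabled()
    gc.disable()
    try:
        target = Target()
        ref = weakref.ref(target)
        thread = threading.Thread(target=target)
        thread.start()
        thread.join()
        del target, thread
        # Poll instead of checking once: the thread machinery may drop
        # its last reference slightly after join() has returned.
        deadline = time.monotonic() + timeout
        while ref() is not None and time.monotonic() < deadline:
            time.sleep(0.01)
        return ref() is None
    finally:
        if gc_was_enabled:
            gc.enable()
```

Polling bounds the wait without hard-coding a single sleep length, but it still only narrows the race described above rather than eliminating it.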
msg64602 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2008-03-27 16:32
I'm attaching a patch which tries to make the test a bit less flaky
(well, it still is, since I introduce a time.sleep() :-)).
msg64608 - Author: Jeffrey Yasskin (jyasskin) * (Python committer) Date: 2008-03-27 21:29
I'll look at this tonight.
msg64612 - Author: Jeffrey Yasskin (jyasskin) * (Python committer) Date: 2008-03-28 03:36
I think I've confirmed your diagnosis. If I add a _sleep(.01) to
Thread.__bootstrap_inner() just after the call to self.__stop(), the
test fails reliably. Very good catch! Given that, I think just adding a
short sleep to the test before counting references will fix it nearly
every time, but I'd like to kill the race dead if we can.
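
The effect of that added delay can be modelled outside the threading module. In the sketch below (Python 3, purely illustrative), one event plays the role of the stopped flag and a second event stands in for the _sleep(.01), so the waiter deterministically observes the window in which the worker has signalled completion but still holds its reference. Resource, flag_set_before_release and both event names are assumptions made for the example, not code from threading.py.

```python
import threading
import weakref


class Resource:
    """Hypothetical object whose lifetime is being observed."""


def flag_set_before_release():
    resource = Resource()
    ref = weakref.ref(resource)
    flagged = threading.Event()   # plays the role of the stopped flag
    may_exit = threading.Event()  # stands in for the injected delay

    def worker(res):
        flagged.set()    # the waiter is woken up here...
        may_exit.wait()  # ...while this frame still references res

    thread = threading.Thread(target=worker, args=(resource,))
    thread.start()
    del resource

    flagged.wait()
    alive_after_flag = ref() is not None  # the worker still holds it

    may_exit.set()
    thread.join()
    del thread
    alive_after_exit = ref() is not None
    return alive_after_flag, alive_after_exit
```

On CPython this should return (True, False): the object is still alive when the flag is observed and is gone only once the worker has really finished.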
msg64614 - Author: Jeffrey Yasskin (jyasskin) * (Python committer) Date: 2008-03-28 04:12
Fixed in r61984. I believe the exception info was actually keeping the
object alive. The thread itself didn't have any references to it, but
the traceback did.
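
The general mechanism, a traceback keeping frame locals alive, can be shown in isolation. This is a Python 3 sketch of that mechanism only; it does not reproduce the change made in r61984, and Payload, raise_with and traceback_keeps_local_alive are names invented for the example.

```python
import weakref


class Payload:
    """Hypothetical object held only by a frame's local variables."""


def raise_with(obj):
    local = obj  # kept alive by this frame for as long as the frame lives
    raise ValueError("boom")


def traceback_keeps_local_alive():
    obj = Payload()
    ref = weakref.ref(obj)
    try:
        raise_with(obj)
    except ValueError as exc:
        # The exception references its traceback, the traceback references
        # the frame of raise_with(), and that frame references obj.
        saved = exc
    del obj
    alive_with_traceback = ref() is not None
    saved = None  # dropping the exception releases the traceback and frames
    alive_after_clear = ref() is not None
    return alive_with_traceback, alive_after_clear
```

On CPython this should return (True, False): nothing but the saved exception info keeps the object alive, and it is freed as soon as that is dropped.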
History
Date User Action Args
2008-03-28 04:12:52 jyasskin set status: open -> closed; resolution: fixed; messages: + msg64614
2008-03-28 03:36:55 jyasskin set messages: + msg64612
2008-03-27 21:29:40 jyasskin set assignee: jyasskin; messages: + msg64608; nosy: + jyasskin
2008-03-27 16:32:05 pitrou set files: + test_threading2.patch; messages: + msg64602
2008-03-27 16:22:39 pitrou set messages: + msg64600
2008-03-27 16:06:53 pitrou set messages: + msg64596
2008-03-27 12:36:00 pitrou set files: + test_threading.patch; keywords: + patch; messages: + msg64586
2008-03-27 12:04:29 pitrou create