classification
Title: test_threaded_import fails with -j4
Type: behavior Stage: resolved
Components: Tests Versions: Python 3.4, Python 3.3
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: brett.cannon, christian.heimes, eric.snow, georg.brandl, haypo, jkloth, ncoghlan, pitrou, python-dev, skrah
Priority: release blocker Keywords: patch

Created on 2012-08-25 08:33 by skrah, last changed 2012-09-09 09:18 by python-dev. This issue is now closed.

Files
File name Uploaded Description Edit
importrace.patch pitrou, 2012-08-27 20:51 review
Messages (22)
msg169116 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2012-08-25 08:33
test_parallel_module_init() fails on Windows-64:

http://buildbot.python.org/all/builders/AMD64%20Windows7%20SP1%203.x/builds/553/steps/test/logs/stdio

======================================================================
FAIL: test_parallel_module_init (test.test_threaded_import.ThreadedImportTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "C:\buildbot.python.org\3.x.kloth-win64\build\lib\test\test_threaded_import.py", line 125, in test_parallel_module_init
    self.check_parallel_module_init()
  File "C:\buildbot.python.org\3.x.kloth-win64\build\lib\test\test_threaded_import.py", line 120, in check_parallel_module_init
    self.assertFalse(errors)
AssertionError: [AttributeError("'module' object has no attribute 'randrange'",)] is not false

----------------------------------------------------------------------
msg169118 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2012-08-25 09:07
The buildbot in question uses -j4, and I can reproduce this on Linux
with:

./python -m test -uall -F -j4 -v test_threaded_import


======================================================================
FAIL: test_parallel_meta_path (test.test_threaded_import.ThreadedImportTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/stefan/hg/cpython/Lib/test/test_threaded_import.py", line 131, in test_parallel_meta_path
    self.check_parallel_module_init()
  File "/home/stefan/hg/cpython/Lib/test/test_threaded_import.py", line 120, in check_parallel_module_init
    self.assertFalse(errors)
AssertionError: [AttributeError("'module' object has no attribute 'randrange'",), AttributeError("'module' object has no attribute 'randrange'",)] is not false

----------------------------------------------------------------------
Ran 6 tests in 4.221s
msg169122 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-08-25 10:01
With
$ ./python -m test -uall -F -j16 -v test_threaded_import
I couldn't reproduce after 1000 test passes.
msg169124 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2012-08-25 10:22
No joy here either - currently at 704 successful runs and counting with Stefan's command line.

$ uname -a
Linux lancre 3.5.1-1.fc17.x86_64 #1 SMP Thu Aug 9 17:50:43 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux

$ cat /proc/cpuinfo | grep "model name"
model name      : Intel(R) Core(TM) i7-2677M CPU @ 1.80GHz
model name      : Intel(R) Core(TM) i7-2677M CPU @ 1.80GHz
model name      : Intel(R) Core(TM) i7-2677M CPU @ 1.80GHz
model name      : Intel(R) Core(TM) i7-2677M CPU @ 1.80GHz
msg169129 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2012-08-25 10:51
I can only reproduce this on a Core i7. Native OS is Debian Wheezy,
also reproduced earlier on a FreeBSD VM hosted on that machine
(see msg167989). The machine has 4 physical cores, perhaps we
can blame hyper-threading. ;)


$ cat /proc/cpuinfo | grep "model name"
model name      : Intel(R) Core(TM) i7 CPU         930  @ 2.80GHz
model name      : Intel(R) Core(TM) i7 CPU         930  @ 2.80GHz
model name      : Intel(R) Core(TM) i7 CPU         930  @ 2.80GHz
model name      : Intel(R) Core(TM) i7 CPU         930  @ 2.80GHz
model name      : Intel(R) Core(TM) i7 CPU         930  @ 2.80GHz
model name      : Intel(R) Core(TM) i7 CPU         930  @ 2.80GHz
model name      : Intel(R) Core(TM) i7 CPU         930  @ 2.80GHz
model name      : Intel(R) Core(TM) i7 CPU         930  @ 2.80GHz



I can't reproduce it on a Core 2 Duo running Ubuntu natively.
msg169132 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2012-08-25 12:55
Up to 2279 runs without a failure here - yay for elusive hardware specific threading bugs :P

I'll leave it running overnight and see what happens.
msg169133 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2012-08-25 13:17
I can also reproduce it on the Core 2 machine with a ridiculuously
low switch interval:


 diff --git a/Lib/test/regrtest.py b/Lib/test/regrtest.py
--- a/Lib/test/regrtest.py
+++ b/Lib/test/regrtest.py
@@ -198,6 +198,8 @@
 except ImportError:
     multiprocessing = None
 
+sys.setswitchinterval(0.00000001)
+
 
 # Some times __path__ and __file__ are not absolute (e.g. while running from
 # Lib/) and, if we change the CWD to run the tests in a temporary dir, some
msg169134 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2012-08-25 13:19
Oh, and as usual, my machines have all CPUs at 100% load.
msg169150 - (view) Author: Eric Snow (eric.snow) * (Python committer) Date: 2012-08-25 18:31
So this was a random one-off (hardly even intermittent) failed test that passed on the retry.  Ah, threads.  Is there an easy way to find all other failures in test_threaded_import in the past, for this and for all buildbots?  Do we have information about load during tests?  Is there an easy way to run the test suite under load on a build bot?

With a large enough sample size we can make more sense of this.  For instance, do failures predate the addition of per-module import locks or the importlib migration in 3.3?  Are failures limited to particular platforms?  Is the problem load related?

BTW, test_threaded_import had another (different) failure a week ago on FreeBSD: issue15599.
msg169151 - (view) Author: Eric Snow (eric.snow) * (Python committer) Date: 2012-08-25 18:34
(sorry, only saw the first couple messages)
msg169154 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2012-08-25 22:02
On the i7 machine with hyper-threading the issue occurs since
edb9ce3a6c2e.


The tests also got slower:

3430d7329a3b:

$ time ./python -m test -uall -v  test_threaded_import

real    0m3.195s
user    0m0.656s
sys     0m0.284s


edb9ce3a6c2e:

$ time ./python -m test -uall -v  test_threaded_import

real    0m24.032s
user    0m0.880s
sys     0m0.744s


Here most of the time is spent in test_parallel_meta_path.


DISCLAIMER: This machine runs Debian Wheezy, which is not
stable yet! On the other hand the Windows bot also reproduced
the issue.
msg169217 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-08-27 20:51
Ok, I can now reproduce with setswitchinterval(). Here is a patch. There was a race between putting the new module in sys.modules and setting its __initializing__ attribute, so now __initializing__ is set before putting the module in sys.modules.

Also, there was another race when retrieving a module lock from the locks dict (the weakref could be destroyed between the __contains__ test and the actual fetch).
msg169218 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2012-08-27 20:59
Didn't patch it in to evaluate the surrounding code, but otherwise the patch LGTM.
msg169220 - (view) Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2012-08-27 21:15
Looks good to me.  Please commit to default and I'll cherry-pick to rc2.
msg169223 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-08-27 21:27
If possible I would like to wait for Stefan's confirmation that it fixes the failures for him.
msg169225 - (view) Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2012-08-27 21:30
Sure.
msg169229 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2012-08-27 22:21
Nice, on the finicky i7 machine test_threaded_import is fixed.

I'm getting another error in test_importlib, both with and without
the patch. I can open another issue for that if you think it's
completely unrelated:

$ ./python -m test -uall -F test_importlib      
[  1] test_importlib
test test_importlib failed -- Traceback (most recent call last):
  File "/home/stefan/hg/cpython/Lib/test/test_importlib/test_locks.py", line 80, in test_deadlock
    self.assertEqual(results.count((True, False)), 1)
AssertionError: 2 != 1

1 test failed:
    test_importlib
msg169230 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-08-27 22:26
> I'm getting another error in test_importlib, both with and without
> the patch. I can open another issue for that if you think it's
> completely unrelated:

Please do. Thanks!
msg169231 - (view) Author: Roundup Robot (python-dev) Date: 2012-08-27 22:29
New changeset 7fa6336e9864 by Antoine Pitrou in branch 'default':
Issue #15781: Fix two small race conditions in import's module locking.
http://hg.python.org/cpython/rev/7fa6336e9864
msg169423 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-08-29 22:54
Looks fixed!
msg170028 - (view) Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2012-09-08 05:49
Now picked into 3.3.0 release clone as 85070f284fd2.
msg170087 - (view) Author: Roundup Robot (python-dev) Date: 2012-09-09 09:18
New changeset 85070f284fd2 by Antoine Pitrou in branch 'default':
Issue #15781: Fix two small race conditions in import's module locking.
http://hg.python.org/cpython/rev/85070f284fd2
History
Date User Action Args
2012-09-09 09:18:54python-devsetmessages: + msg170087
2012-09-08 05:49:02georg.brandlsetmessages: + msg170028
2012-08-29 22:54:16pitrousetstatus: open -> closed
resolution: fixed
messages: + msg169423

stage: commit review -> resolved
2012-08-27 22:29:38python-devsetnosy: + python-dev
messages: + msg169231
2012-08-27 22:26:28pitrousetmessages: + msg169230
2012-08-27 22:21:36skrahsetmessages: + msg169229
2012-08-27 21:30:53georg.brandlsetmessages: + msg169225
2012-08-27 21:27:32pitrousetmessages: + msg169223
2012-08-27 21:15:58georg.brandlsetmessages: + msg169220
2012-08-27 20:59:47brett.cannonsetmessages: + msg169218
stage: patch review -> commit review
2012-08-27 20:51:32pitrousetfiles: + importrace.patch
priority: normal -> release blocker


keywords: + patch
nosy: + georg.brandl
messages: + msg169217
stage: patch review
2012-08-26 12:08:20christian.heimessetnosy: + christian.heimes
2012-08-25 22:02:08skrahsetmessages: + msg169154
2012-08-25 18:34:09eric.snowsetmessages: + msg169151
2012-08-25 18:31:50eric.snowsetnosy: + eric.snow
messages: + msg169150
2012-08-25 13:19:49skrahsetmessages: + msg169134
2012-08-25 13:17:13skrahsetmessages: + msg169133
2012-08-25 12:55:39ncoghlansetmessages: + msg169132
2012-08-25 10:51:12skrahsetmessages: + msg169129
2012-08-25 10:22:44ncoghlansetmessages: + msg169124
2012-08-25 10:01:05pitrousetmessages: + msg169122
2012-08-25 09:07:59skrahsetmessages: + msg169118
title: test_threaded_import fails on Windows -> test_threaded_import fails with -j4
2012-08-25 08:53:34ncoghlansetnosy: + ncoghlan
2012-08-25 08:33:39skrahcreate