classification
Title: test_threaded_import fails sporadically on Windows and FreeBSD
Type: Stage:
Components: Versions:
process
Status: closed Resolution: out of date
Dependencies: Superseder:
Assigned To: Nosy List: brett.cannon, db3l, eric.snow, koobs, ncoghlan, ned.deily, pitrou, python-dev, vstinner
Priority: normal Keywords: buildbot

Created on 2012-08-09 00:59 by vstinner, last changed 2015-03-18 11:03 by vstinner. This issue is now closed.

Files
File name Uploaded Description Edit
issue15599.diff skrah, 2012-09-01 14:57
issue15599_osx_10_4.patch ned.deily, 2012-09-24 03:55 review
Messages (39)
msg167747 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2012-08-09 00:59
http://buildbot.python.org/all/builders/AMD64%20FreeBSD%209.0%203.x/builds/3231/steps/test/logs/stdio

[111/369] test_threaded_import
test_circular_imports (test.test_threaded_import.ThreadedImportTests) ... Exception in thread Thread-586:
Traceback (most recent call last):
  File "/usr/home/buildbot/buildarea/3.x.krah-freebsd/build/Lib/threading.py", line 639, in _bootstrap_inner
    self.run()
  File "/usr/home/buildbot/buildarea/3.x.krah-freebsd/build/Lib/threading.py", line 596, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/home/buildbot/buildarea/3.x.krah-freebsd/build/Lib/test/test_threaded_import.py", line 194, in import_ab
    import A
ImportError: No module named 'A'

Exception in thread Thread-587:
Traceback (most recent call last):
  File "/usr/home/buildbot/buildarea/3.x.krah-freebsd/build/Lib/threading.py", line 639, in _bootstrap_inner
    self.run()
  File "/usr/home/buildbot/buildarea/3.x.krah-freebsd/build/Lib/threading.py", line 596, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/home/buildbot/buildarea/3.x.krah-freebsd/build/Lib/test/test_threaded_import.py", line 197, in import_ba
    import B
ImportError: No module named 'B'

FAIL
test_import_hangers (test.test_threaded_import.ThreadedImportTests) ... ok
test_parallel_meta_path (test.test_threaded_import.ThreadedImportTests) ... Trying 20 threads ... OK.
Trying 50 threads ... OK.
Trying 20 threads ... OK.
Trying 50 threads ... OK.
Trying 20 threads ... OK.
Trying 50 threads ... OK.
ok
test_parallel_module_init (test.test_threaded_import.ThreadedImportTests) ... Trying 20 threads ... OK.
Trying 50 threads ... OK.
Trying 20 threads ... OK.
Trying 50 threads ... OK.
Trying 20 threads ... OK.
Trying 50 threads ... OK.
ok
test_parallel_path_hooks (test.test_threaded_import.ThreadedImportTests) ... Trying 20 threads ... OK.
Trying 50 threads ... OK.
Trying 20 threads ... OK.
Trying 50 threads ... OK.
Trying 20 threads ... OK.
Trying 50 threads ... OK.
ok
test_side_effect_import (test.test_threaded_import.ThreadedImportTests) ... ok

======================================================================
FAIL: test_circular_imports (test.test_threaded_import.ThreadedImportTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/home/buildbot/buildarea/3.x.krah-freebsd/build/Lib/test/test_threaded_import.py", line 205, in test_circular_imports
    self.assertEqual(set(results), {'a', 'b'})
AssertionError: Items in the second set but not the first:
'a'
'b'

----------------------------------------------------------------------
msg167769 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2012-08-09 08:08
The buildbot is running inside kvm on a heavily loaded machine. Perhaps
some timeout is too low.
msg167795 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-08-09 17:06
> The buildbot is running inside kvm on a heavily loaded machine. Perhaps
> some timeout is too low.

Could you try to run the tests manually after having upped said timeouts? (look for "sleep" in the test file)
msg167814 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2012-08-09 18:54
Antoine Pitrou <report@bugs.python.org> wrote:
> Could you try to run the tests manually after having upped said timeouts? (look for "sleep" in the test file)

No luck there: The tests pass unmodified (100 times with the -F option).
msg167972 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-08-11 15:11
> No luck there: The tests pass unmodified (100 times with the -F option).

You could try "-F -j16" or something similar.
msg167989 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2012-08-11 18:44
With -F -j16 I get these failures both on FreeBSD and Linux, but not
the original failure:

======================================================================
FAIL: test_parallel_meta_path (test.test_threaded_import.ThreadedImportTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/stefan/hg/cpython/Lib/test/test_threaded_import.py", line 131, in test_parallel_meta_path
    self.check_parallel_module_init()
  File "/home/stefan/hg/cpython/Lib/test/test_threaded_import.py", line 120, in check_parallel_module_init
    self.assertFalse(errors)
AssertionError: [AttributeError("'module' object has no attribute 'randrange'",)] is not false
msg168481 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2012-08-17 21:52
Reproduced by another FreeBSD bot:

http://buildbot.python.org/all/builders/AMD64%20FreeBSD%209.0%20dtrace%203.x/builds/238/steps/test/logs/stdio
msg169635 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2012-09-01 14:20
The FreeBSD machine decided to hang for 1h:

http://buildbot.python.org/all/builders/AMD64%20FreeBSD%209.0%203.x/builds/3340/steps/test/logs/stdio


If I'm running the tests manually, they take over half an hour and I get:

======================================================================
FAIL: test_parallel_meta_path (test.test_threaded_import.ThreadedImportTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/home/stefan/cpython/Lib/test/test_threaded_import.py", line 133, in test_parallel_meta_path
    self.assertEqual(finder.x, finder.numcalls)
AssertionError: 41 != 42

----------------------------------------------------------------------
Ran 6 tests in 2090.042s


[I hope this is not a threading bug in FreeBSD that appears under
high load.]
msg169637 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2012-09-01 14:31
I see that in the following build test_threading also fails, so I
wouldn't spend too much time on debugging test_threaded_import:

 ======================================================================
FAIL: test_waitfor_timeout (test.test_threading.ConditionTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/home/buildbot/buildarea/3.x.krah-freebsd/build/Lib/test/lock_tests.py", line 531, in test_waitfor_timeout
    self.assertEqual(len(success), 1)
AssertionError: 0 != 1
msg169639 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-09-01 14:47
> I see that in the following build test_threading also fails, so I
> wouldn't spend too much time on debugging test_threaded_import:

Did something change recently on that machine? It's supposed to be a
stable buildbot.
msg169640 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2012-09-01 14:57
Nothing changed on the buildbot. -- It appears that FreeBSD is unable to
handle the low switchinterval. With the patch test_threaded_import runs in
10s.


Since Linux has no problems whatsoever, I'm inclined to think that's
a FreeBSD issue.


I've yet to check if the other problem in test_threading is related.
msg169661 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-09-01 18:49
> Since Linux has no problems whatsoever, I'm inclined to think that's
> a FreeBSD issue.

Maybe. If you believe it's so, I guess the patch can be committed.
msg169670 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2012-09-01 20:29
The extreme slowness with a low switch interval is already present in
3430d7329a3b, so it's not related to the finer-grained import lock.

The other FreeBSD 9.0 bots don't seem to have that problem, so I'm not
sure any more if the patch is a good idea.

FWIW, http://buildbot.python.org/all/buildslaves/koobs-freebsd runs in
VirtualBox, my system runs in kvm.
msg169711 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2012-09-02 17:57
One of the Windows bots has the same failure in test_parallel_meta_path:

http://buildbot.python.org/all/builders/x86%20Windows7%203.x/builds/5643/steps/test/logs/stdio
msg169713 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2012-09-02 18:02
I'm convinced now that some systems just can't handle a low switch
interval gracefully. Look at:

 http://buildbot.python.org/all/builders/x86%20Windows7%203.x/builds/5642/steps/test/logs/stdio


======================================================================
FAIL: test_pending_calls_race (test.test_concurrent_futures.ThreadPoolWaitTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "D:\cygwin\home\db3l\buildarea\3.x.bolen-windows7\build\lib\test\test_concurrent_futures.py", line 71, in tearDown
    self.assertLess(dt, 60, "synchronization issue: test lasted too long")
AssertionError: 152.02985382080078 not less than 60 : synchronization issue: test lasted too long

----------------------------------------------------------------------


That test also uses sys.setswitchinterval(1e-6).
msg169714 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-09-02 18:05
> I'm convinced now that some systems just can't handle a low switch
> interval gracefully. Look at:
[...]
> 
> That test also uses sys.setswitchinterval(1e-6).

Ok, let's bump it to 1e-5 then.

As for the parallel_meta_path failure, it could be a genuine issue
(albeit a rather unimportant one; I originally added those tests out of
completion).
msg169721 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2012-09-02 20:26
The Lion bot also hangs in test_threading:

http://buildbot.python.org/all/builders/AMD64%20Lion%203.x/builds/383/steps/test/logs/stdio



At least the test_threading failures seem to be recent.


The test_threaded_import *slowness* (40 min!) on FreeBSD/KVM with low switch interval goes back a long way. I checked revisions way before
importlib was bootstrapped.
msg170400 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-09-12 16:09
Stefan, do you plan to commit this patch (or a similar one)? The FreeBSD buildbot is crashing again.
msg170406 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2012-09-12 17:24
I was hesitating because the other FreeBSD bots don't have that problem.
The other option would be to kick out the FreeBSD/kvm bot in favor of
FreeBSD/Virtualbox.

http://buildbot.python.org/all/buildslaves/koobs-freebsd-clang

seems to be very stable. I could take down FreeBSD/kvm altogether.


Is Python supposed to work with gil_interval = 1 or is some
potential flakiness expected?
msg170409 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-09-12 17:35
> I was hesitating because the other FreeBSD bots don't have that problem.
> The other option would be to kick out the FreeBSD/kvm bot in favor of
> FreeBSD/Virtualbox.

Well, if the test is just running too slow, changing the switch interval
sounds easier to me than deploying a whole new setup.

> Is Python supposed to work with gil_interval = 1 or is some
> potential flakiness expected?

I don't know. It's not supposed to be widely tunable, though. Most
legitimate uses would be to lower or raise the interval a bit, not
choose insane values.
msg170412 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2012-09-12 17:41
OK, I'll commit the patch soon.
msg170416 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2012-09-12 19:37
New changeset 9b40d018e4cf by Stefan Krah in branch 'default':
Issue #15599: FreeBSD on KVM cannot handle a very low switch interval.
http://hg.python.org/cpython/rev/9b40d018e4cf
msg171092 - (view) Author: Ned Deily (ned.deily) * (Python committer) Date: 2012-09-24 03:55
OS X 10.4 (and presumably earlier versions) also does not handle the low switchinterval gracefully.  Monitoring system activity and running with -v, test_threaded_import does seem to progress but *very* slowly (I quit after 45 minutes with the test only partially complete) at 100% CPU (mainly sys time).  10.5 and later systems are fine.  Here's a patch that extends the FreeBSD solution to OS X 10.4 systems.
msg171105 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2012-09-24 09:22
test_threaded_import also takes around 5 min on Windows. I think
we should go for 0.00001 on all platforms.
msg171661 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2012-09-30 19:28
For once a Linux bot shows the failure that is the topic of this
issue:

http://buildbot.python.org/all/builders/x86%20Gentoo%20Non-Debug%203.x/builds/3101/steps/test/logs/stdio

======================================================================
FAIL: test_circular_imports (test.test_threaded_import.ThreadedImportTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/var/lib/buildslave/3.x.murray-gentoo-wide/build/Lib/test/test_threaded_import.py", line 205, in test_circular_imports
    self.assertEqual(set(results), {'a', 'b'})
AssertionError: Items in the second set but not the first:
'b'
'a'
msg171670 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-09-30 20:30
> For once a Linux bot shows the failure that is the topic of this
> issue:

Hmm, so there may be a small race condition after all?

The problem is that I can't reproduce, on a non-debug build as on that
buildbot, after more than 43000 (!) test runs. Either this failure is
incredibly sporadic, or it needs certain conditions my system doesn't
meet.
msg171740 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2012-10-01 18:10
I can't reproduce it either even on the machine that magically caught
every problem in #15781.

FWIW, the Gentoo bot also had a completely isolated segfault in
test_ssl lately.



Antoine, are you OK with setting the switch interval to 1e-5 for
all platforms? Otherwise we'll probably have many platform specific
workarounds.
msg171741 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-10-01 18:10
> Antoine, are you OK with setting the switch interval to 1e-5 for
> all platforms? Otherwise we'll probably have many platform specific
> workarounds.

Yes, I'm ok with it.
msg171750 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2012-10-01 21:39
New changeset 484a4b9349af by Stefan Krah in branch '3.3':
Issue #15599: Increase the switch interval. Several systems cannot handle
http://hg.python.org/cpython/rev/484a4b9349af
msg175488 - (view) Author: Eric Snow (eric.snow) * (Python committer) Date: 2012-11-13 06:37
Is this still an issue?
msg175496 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2012-11-13 10:47
Well, there was a test_circular_imports() failure on (I think) the
Windows-amd64 bot last week, but I can't access it any more.
msg175939 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2012-11-19 00:16
Not that it helps much, but here are recent failures on
Windows and FreeBSD:

http://buildbot.python.org/all/builders/x86%20Windows7%203.x/builds/5938/steps/test/logs/stdio

http://buildbot.python.org/all/builders/x86%20Windows7%203.x/builds/5936/steps/test/logs/stdio

http://buildbot.python.org/all/builders/AMD64%20FreeBSD%209.0%203.x/builds/3737/steps/test/logs/stdio
msg177717 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-12-18 21:22
The occasional test_parallel_meta_path failure is really weird. The sys.meta_path module finding in _bootstrap.py is protected by the global import lock, so there shouldn't be any failure here. I'm starting to suspect some issue in the lock code itself.
msg177718 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2012-12-18 21:23
New changeset 1ee9cf5a5351 by Antoine Pitrou in branch '3.3':
Add sanity assertions in some import lock code (issue #15599).
http://hg.python.org/cpython/rev/1ee9cf5a5351

New changeset 7cfca9ede861 by Antoine Pitrou in branch 'default':
Add sanity assertions in some import lock code (issue #15599).
http://hg.python.org/cpython/rev/7cfca9ede861
msg177720 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2012-12-18 22:06
New changeset 67d6bb557d67 by Antoine Pitrou in branch '3.3':
Try to make test_threaded_import faster and safer (issue #15599).
http://hg.python.org/cpython/rev/67d6bb557d67

New changeset 30c7cc62303a by Antoine Pitrou in branch 'default':
Try to make test_threaded_import faster and safer (issue #15599).
http://hg.python.org/cpython/rev/30c7cc62303a
msg206682 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2013-12-20 14:32
Another recent instance of the parallel meta path failure:

http://buildbot.python.org/all/builders/x86%20Windows7%203.x/builds/7740/steps/test/logs/stdio
msg219733 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2014-06-03 23:42
A recent failure. As usual, the test passed at the second run.

http://buildbot.python.org/all/builders/AMD64%20FreeBSD%2010.0%203.x/builds/2180/steps/test/logs/stdio

======================================================================
FAIL: test_parallel_meta_path (test.test_threaded_import.ThreadedImportTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/home/buildbot/koobs-freebsd10/3.x.koobs-freebsd10/build/Lib/test/test_threaded_import.py", line 135, in test_parallel_meta_path
    self.assertEqual(finder.x, finder.numcalls)
AssertionError: 89 != 90


Re-running test 'test_threaded_import' in verbose mode
(...)
Ran 6 tests in 3.109s
msg221377 - (view) Author: David Bolen (db3l) Date: 2014-06-23 21:10
I've been experimenting with setting up a Windows 8.1 buildbot, and found this ticket after finding a problem with test_threaded_import, testing against the 3.4 branch.

I seem to be have a low syscheckinterval issue similar to that discussed here on some platforms, though I run into it sooner than 0.00001.

If I change the syscheckinterval adjustment to 0.001 the tests run in about 4s.  Just slightly below that, 0.0009, it can take well over an hour when run manually, always getting killed due to a timeout when running the buildbot test batch file.  Each use of check_module_parallel_init in the test takes 20-30 minutes.  During this time the CPU remains pegged at 100%.  

I don't see any additional slow-down between 0.0009 and the default of 0.00001, so it feels more like crossing a threshold somewhere around 1ms than scaling equally around and through that point.

While the machine is not tremendously powerful (it's an Azure VM - single core ~2GHz), everything else is working well and build times and the remainder of the full test suite run in a reasonable time.

-- David
msg238414 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2015-03-18 11:03
No real activity on this issue. Since the issue is sporadic and I'm not interested to work on it, I just close the issue as out of date.
History
Date User Action Args
2015-03-18 11:03:32vstinnersetstatus: open -> closed
resolution: out of date
messages: + msg238414
2014-10-14 15:50:31skrahsetnosy: - skrah
2014-06-23 21:10:32db3lsetnosy: + db3l
messages: + msg221377
2014-06-03 23:42:29vstinnersetnosy: + vstinner
messages: + msg219733
2013-12-20 14:32:36ncoghlansetkeywords: + buildbot, - patch
nosy: + ncoghlan
messages: + msg206682

2013-01-03 02:45:31vstinnersetnosy: - vstinner
2012-12-18 22:06:01python-devsetmessages: + msg177720
2012-12-18 21:26:59pitrousettitle: test_circular_imports() of test_threaded_import fails on FreeBSD 9.0 -> test_threaded_import fails sporadically on Windows and FreeBSD
2012-12-18 21:23:41python-devsetmessages: + msg177718
2012-12-18 21:22:45pitrousetmessages: + msg177717
2012-12-04 09:46:37koobssetnosy: + koobs
2012-12-04 09:30:04pitroulinkissue16604 superseder
2012-11-19 00:16:11skrahsetmessages: + msg175939
2012-11-13 10:47:28skrahsetstatus: pending -> open

messages: + msg175496
2012-11-13 06:37:58eric.snowsetstatus: open -> pending
nosy: + eric.snow
messages: + msg175488

2012-10-01 21:39:24python-devsetmessages: + msg171750
2012-10-01 18:10:58pitrousetmessages: + msg171741
2012-10-01 18:10:17skrahsetmessages: + msg171740
2012-09-30 20:30:08pitrousetmessages: + msg171670
2012-09-30 19:28:03skrahsetmessages: + msg171661
2012-09-24 09:22:36skrahsetmessages: + msg171105
2012-09-24 03:55:03ned.deilysetfiles: + issue15599_osx_10_4.patch
nosy: + ned.deily
messages: + msg171092

2012-09-12 19:37:54python-devsetnosy: + python-dev
messages: + msg170416
2012-09-12 17:41:13skrahsetmessages: + msg170412
2012-09-12 17:35:20pitrousetmessages: + msg170409
2012-09-12 17:24:51skrahsetmessages: + msg170406
2012-09-12 16:09:49pitrousetmessages: + msg170400
2012-09-02 20:26:08skrahsetmessages: + msg169721
2012-09-02 18:05:52pitrousetmessages: + msg169714
2012-09-02 18:02:47skrahsetmessages: + msg169713
2012-09-02 17:57:43skrahsetmessages: + msg169711
2012-09-01 20:29:12skrahsetmessages: + msg169670
2012-09-01 18:49:20pitrousetmessages: + msg169661
2012-09-01 14:57:56skrahsetfiles: + issue15599.diff
keywords: + patch
messages: + msg169640
2012-09-01 14:47:37pitrousetmessages: + msg169639
2012-09-01 14:31:20skrahsetmessages: + msg169637
2012-09-01 14:20:06skrahsetmessages: + msg169635
2012-08-17 21:52:13skrahsetmessages: + msg168481
2012-08-11 18:44:50skrahsetmessages: + msg167989
2012-08-11 15:11:38pitrousetmessages: + msg167972
2012-08-09 18:54:02skrahsetmessages: + msg167814
2012-08-09 17:06:59pitrousetnosy: + pitrou
messages: + msg167795
2012-08-09 08:08:07skrahsetnosy: + skrah
messages: + msg167769
2012-08-09 00:59:33vstinnercreate