Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

test_threaded_import fails sporadically on Windows and FreeBSD #59804

Closed
vstinner opened this issue Aug 9, 2012 · 39 comments
Closed

test_threaded_import fails sporadically on Windows and FreeBSD #59804

vstinner opened this issue Aug 9, 2012 · 39 comments

Comments

@vstinner
Copy link
Member

vstinner commented Aug 9, 2012

BPO 15599
Nosy @brettcannon, @db3l, @ncoghlan, @pitrou, @vstinner, @ned-deily, @ericsnowcurrently, @koobs
Files
  • issue15599.diff
  • issue15599_osx_10_4.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2015-03-18.11:03:32.480>
    created_at = <Date 2012-08-09.00:59:33.465>
    labels = []
    title = 'test_threaded_import fails sporadically on Windows and FreeBSD'
    updated_at = <Date 2015-03-18.11:03:32.479>
    user = 'https://github.com/vstinner'

    bugs.python.org fields:

    activity = <Date 2015-03-18.11:03:32.479>
    actor = 'vstinner'
    assignee = 'none'
    closed = True
    closed_date = <Date 2015-03-18.11:03:32.480>
    closer = 'vstinner'
    components = []
    creation = <Date 2012-08-09.00:59:33.465>
    creator = 'vstinner'
    dependencies = []
    files = ['27087', '27271']
    hgrepos = []
    issue_num = 15599
    keywords = ['buildbot']
    message_count = 39.0
    messages = ['167747', '167769', '167795', '167814', '167972', '167989', '168481', '169635', '169637', '169639', '169640', '169661', '169670', '169711', '169713', '169714', '169721', '170400', '170406', '170409', '170412', '170416', '171092', '171105', '171661', '171670', '171740', '171741', '171750', '175488', '175496', '175939', '177717', '177718', '177720', '206682', '219733', '221377', '238414']
    nosy_count = 9.0
    nosy_names = ['brett.cannon', 'db3l', 'ncoghlan', 'pitrou', 'vstinner', 'ned.deily', 'python-dev', 'eric.snow', 'koobs']
    pr_nums = []
    priority = 'normal'
    resolution = 'out of date'
    stage = None
    status = 'closed'
    superseder = None
    type = None
    url = 'https://bugs.python.org/issue15599'
    versions = []

    @vstinner
    Copy link
    Member Author

    vstinner commented Aug 9, 2012

    http://buildbot.python.org/all/builders/AMD64%20FreeBSD%209.0%203.x/builds/3231/steps/test/logs/stdio

    [111/369] test_threaded_import
    test_circular_imports (test.test_threaded_import.ThreadedImportTests) ... Exception in thread Thread-586:
    Traceback (most recent call last):
      File "/usr/home/buildbot/buildarea/3.x.krah-freebsd/build/Lib/threading.py", line 639, in _bootstrap_inner
        self.run()
      File "/usr/home/buildbot/buildarea/3.x.krah-freebsd/build/Lib/threading.py", line 596, in run
        self._target(*self._args, **self._kwargs)
      File "/usr/home/buildbot/buildarea/3.x.krah-freebsd/build/Lib/test/test_threaded_import.py", line 194, in import_ab
        import A
    ImportError: No module named 'A'
    
    Exception in thread Thread-587:
    Traceback (most recent call last):
      File "/usr/home/buildbot/buildarea/3.x.krah-freebsd/build/Lib/threading.py", line 639, in _bootstrap_inner
        self.run()
      File "/usr/home/buildbot/buildarea/3.x.krah-freebsd/build/Lib/threading.py", line 596, in run
        self._target(*self._args, **self._kwargs)
      File "/usr/home/buildbot/buildarea/3.x.krah-freebsd/build/Lib/test/test_threaded_import.py", line 197, in import_ba
        import B
    ImportError: No module named 'B'

    FAIL
    test_import_hangers (test.test_threaded_import.ThreadedImportTests) ... ok
    test_parallel_meta_path (test.test_threaded_import.ThreadedImportTests) ... Trying 20 threads ... OK.
    Trying 50 threads ... OK.
    Trying 20 threads ... OK.
    Trying 50 threads ... OK.
    Trying 20 threads ... OK.
    Trying 50 threads ... OK.
    ok
    test_parallel_module_init (test.test_threaded_import.ThreadedImportTests) ... Trying 20 threads ... OK.
    Trying 50 threads ... OK.
    Trying 20 threads ... OK.
    Trying 50 threads ... OK.
    Trying 20 threads ... OK.
    Trying 50 threads ... OK.
    ok
    test_parallel_path_hooks (test.test_threaded_import.ThreadedImportTests) ... Trying 20 threads ... OK.
    Trying 50 threads ... OK.
    Trying 20 threads ... OK.
    Trying 50 threads ... OK.
    Trying 20 threads ... OK.
    Trying 50 threads ... OK.
    ok
    test_side_effect_import (test.test_threaded_import.ThreadedImportTests) ... ok

    ======================================================================
    FAIL: test_circular_imports (test.test_threaded_import.ThreadedImportTests)
    ----------------------------------------------------------------------

    Traceback (most recent call last):
      File "/usr/home/buildbot/buildarea/3.x.krah-freebsd/build/Lib/test/test_threaded_import.py", line 205, in test_circular_imports
        self.assertEqual(set(results), {'a', 'b'})
    AssertionError: Items in the second set but not the first:
    'a'
    'b'

    @skrah
    Copy link
    Mannequin

    skrah mannequin commented Aug 9, 2012

    The buildbot is running inside kvm on a heavily loaded machine. Perhaps
    some timeout is too low.

    @pitrou
    Copy link
    Member

    pitrou commented Aug 9, 2012

    The buildbot is running inside kvm on a heavily loaded machine. Perhaps
    some timeout is too low.

    Could you try to run the tests manually after having upped said timeouts? (look for "sleep" in the test file)

    @skrah
    Copy link
    Mannequin

    skrah mannequin commented Aug 9, 2012

    Antoine Pitrou <report@bugs.python.org> wrote:

    Could you try to run the tests manually after having upped said timeouts? (look for "sleep" in the test file)

    No luck there: The tests pass unmodified (100 times with the -F option).

    @pitrou
    Copy link
    Member

    pitrou commented Aug 11, 2012

    No luck there: The tests pass unmodified (100 times with the -F option).

    You could try "-F -j16" or something similar.

    @skrah
    Copy link
    Mannequin

    skrah mannequin commented Aug 11, 2012

    With -F -j16 I get these failures both on FreeBSD and Linux, but not
    the original failure:

    ======================================================================
    FAIL: test_parallel_meta_path (test.test_threaded_import.ThreadedImportTests)
    ----------------------------------------------------------------------

    Traceback (most recent call last):
      File "/home/stefan/hg/cpython/Lib/test/test_threaded_import.py", line 131, in test_parallel_meta_path
        self.check_parallel_module_init()
      File "/home/stefan/hg/cpython/Lib/test/test_threaded_import.py", line 120, in check_parallel_module_init
        self.assertFalse(errors)
    AssertionError: [AttributeError("'module' object has no attribute 'randrange'",)] is not false

    @skrah
    Copy link
    Mannequin

    skrah mannequin commented Aug 17, 2012

    @skrah
    Copy link
    Mannequin

    skrah mannequin commented Sep 1, 2012

    The FreeBSD machine decided to hang for 1h:

    http://buildbot.python.org/all/builders/AMD64%20FreeBSD%209.0%203.x/builds/3340/steps/test/logs/stdio

    If I'm running the tests manually, they take over half an hour and I get:

    ======================================================================
    FAIL: test_parallel_meta_path (test.test_threaded_import.ThreadedImportTests)
    ----------------------------------------------------------------------

    Traceback (most recent call last):
      File "/usr/home/stefan/cpython/Lib/test/test_threaded_import.py", line 133, in test_parallel_meta_path
        self.assertEqual(finder.x, finder.numcalls)
    AssertionError: 41 != 42

    Ran 6 tests in 2090.042s

    [I hope this is not a threading bug in FreeBSD that appears under
    high load.]

    @skrah
    Copy link
    Mannequin

    skrah mannequin commented Sep 1, 2012

    I see that in the following build test_threading also fails, so I
    wouldn't spend too much time on debugging test_threaded_import:

    ======================================================================
    FAIL: test_waitfor_timeout (test.test_threading.ConditionTests)
    ----------------------------------------------------------------------

    Traceback (most recent call last):
      File "/usr/home/buildbot/buildarea/3.x.krah-freebsd/build/Lib/test/lock_tests.py", line 531, in test_waitfor_timeout
        self.assertEqual(len(success), 1)
    AssertionError: 0 != 1

    @pitrou
    Copy link
    Member

    pitrou commented Sep 1, 2012

    I see that in the following build test_threading also fails, so I
    wouldn't spend too much time on debugging test_threaded_import:

    Did something change recently on that machine? It's supposed to be a
    stable buildbot.

    @skrah
    Copy link
    Mannequin

    skrah mannequin commented Sep 1, 2012

    Nothing changed on the buildbot. -- It appears that FreeBSD is unable to
    handle the low switchinterval. With the patch test_threaded_import runs in
    10s.

    Since Linux has no problems whatsoever, I'm inclined to think that's
    a FreeBSD issue.

    I've yet to check if the other problem in test_threading is related.

    @pitrou
    Copy link
    Member

    pitrou commented Sep 1, 2012

    Since Linux has no problems whatsoever, I'm inclined to think that's
    a FreeBSD issue.

    Maybe. If you believe it's so, I guess the patch can be committed.

    @skrah
    Copy link
    Mannequin

    skrah mannequin commented Sep 1, 2012

    The extreme slowness with a low switch interval is already present in
    3430d7329a3b, so it's not related to the finer-grained import lock.

    The other FreeBSD 9.0 bots don't seem to have that problem, so I'm not
    sure any more if the patch is a good idea.

    FWIW, http://buildbot.python.org/all/buildslaves/koobs-freebsd runs in
    VirtualBox, my system runs in kvm.

    @skrah
    Copy link
    Mannequin

    skrah mannequin commented Sep 2, 2012

    One of the Windows bots has the same failure in test_parallel_meta_path:

    http://buildbot.python.org/all/builders/x86%20Windows7%203.x/builds/5643/steps/test/logs/stdio

    @skrah
    Copy link
    Mannequin

    skrah mannequin commented Sep 2, 2012

    I'm convinced now that some systems just can't handle a low switch
    interval gracefully. Look at:

    http://buildbot.python.org/all/builders/x86%20Windows7%203.x/builds/5642/steps/test/logs/stdio

    ======================================================================
    FAIL: test_pending_calls_race (test.test_concurrent_futures.ThreadPoolWaitTests)
    ----------------------------------------------------------------------

    Traceback (most recent call last):
      File "D:\cygwin\home\db3l\buildarea\3.x.bolen-windows7\build\lib\test\test_concurrent_futures.py", line 71, in tearDown
        self.assertLess(dt, 60, "synchronization issue: test lasted too long")
    AssertionError: 152.02985382080078 not less than 60 : synchronization issue: test lasted too long

    That test also uses sys.setswitchinterval(1e-6).

    @pitrou
    Copy link
    Member

    pitrou commented Sep 2, 2012

    I'm convinced now that some systems just can't handle a low switch
    interval gracefully. Look at:
    [...]

    That test also uses sys.setswitchinterval(1e-6).

    Ok, let's bump it to 1e-5 then.

    As for the parallel_meta_path failure, it could be a genuine issue
    (albeit a rather unimportant one; I originally added those tests out of
    completion).

    @skrah
    Copy link
    Mannequin

    skrah mannequin commented Sep 2, 2012

    The Lion bot also hangs in test_threading:

    http://buildbot.python.org/all/builders/AMD64%20Lion%203.x/builds/383/steps/test/logs/stdio

    At least the test_threading failures seem to be recent.

    The test_threaded_import *slowness* (40 min!) on FreeBSD/KVM with low switch interval goes back a long way. I checked revisions way before
    importlib was bootstrapped.

    @pitrou
    Copy link
    Member

    pitrou commented Sep 12, 2012

    Stefan, do you plan to commit this patch (or a similar one)? The FreeBSD buildbot is crashing again.

    @skrah
    Copy link
    Mannequin

    skrah mannequin commented Sep 12, 2012

    I was hesitating because the other FreeBSD bots don't have that problem.
    The other option would be to kick out the FreeBSD/kvm bot in favor of
    FreeBSD/Virtualbox.

    http://buildbot.python.org/all/buildslaves/koobs-freebsd-clang

    seems to be very stable. I could take down FreeBSD/kvm altogether.

    Is Python supposed to work with gil_interval = 1 or is some
    potential flakiness expected?

    @pitrou
    Copy link
    Member

    pitrou commented Sep 12, 2012

    I was hesitating because the other FreeBSD bots don't have that problem.
    The other option would be to kick out the FreeBSD/kvm bot in favor of
    FreeBSD/Virtualbox.

    Well, if the test is just running too slow, changing the switch interval
    sounds easier to me than deploying a whole new setup.

    Is Python supposed to work with gil_interval = 1 or is some
    potential flakiness expected?

    I don't know. It's not supposed to be widely tunable, though. Most
    legitimate uses would be to lower or raise the interval a bit, not
    choose insane values.

    @skrah
    Copy link
    Mannequin

    skrah mannequin commented Sep 12, 2012

    OK, I'll commit the patch soon.

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Sep 12, 2012

    New changeset 9b40d018e4cf by Stefan Krah in branch 'default':
    Issue bpo-15599: FreeBSD on KVM cannot handle a very low switch interval.
    http://hg.python.org/cpython/rev/9b40d018e4cf

    @ned-deily
    Copy link
    Member

    OS X 10.4 (and presumably earlier versions) also does not handle the low switchinterval gracefully. Monitoring system activity and running with -v, test_threaded_import does seem to progress but *very* slowly (I quit after 45 minutes with the test only partially complete) at 100% CPU (mainly sys time). 10.5 and later systems are fine. Here's a patch that extends the FreeBSD solution to OS X 10.4 systems.

    @skrah
    Copy link
    Mannequin

    skrah mannequin commented Sep 24, 2012

    test_threaded_import also takes around 5 min on Windows. I think
    we should go for 0.00001 on all platforms.

    @skrah
    Copy link
    Mannequin

    skrah mannequin commented Sep 30, 2012

    For once a Linux bot shows the failure that is the topic of this
    issue:

    http://buildbot.python.org/all/builders/x86%20Gentoo%20Non-Debug%203.x/builds/3101/steps/test/logs/stdio

    ======================================================================
    FAIL: test_circular_imports (test.test_threaded_import.ThreadedImportTests)
    ----------------------------------------------------------------------

    Traceback (most recent call last):
      File "/var/lib/buildslave/3.x.murray-gentoo-wide/build/Lib/test/test_threaded_import.py", line 205, in test_circular_imports
        self.assertEqual(set(results), {'a', 'b'})
    AssertionError: Items in the second set but not the first:
    'b'
    'a'

    @pitrou
    Copy link
    Member

    pitrou commented Sep 30, 2012

    For once a Linux bot shows the failure that is the topic of this
    issue:

    Hmm, so there may be a small race condition after all?

    The problem is that I can't reproduce, on a non-debug build as on that
    buildbot, after more than 43000 (!) test runs. Either this failure is
    incredibly sporadic, or it needs certain conditions my system doesn't
    meet.

    @skrah
    Copy link
    Mannequin

    skrah mannequin commented Oct 1, 2012

    I can't reproduce it either even on the machine that magically caught
    every problem in bpo-15781.

    FWIW, the Gentoo bot also had a completely isolated segfault in
    test_ssl lately.

    Antoine, are you OK with setting the switch interval to 1e-5 for
    all platforms? Otherwise we'll probably have many platform specific
    workarounds.

    @pitrou
    Copy link
    Member

    pitrou commented Oct 1, 2012

    Antoine, are you OK with setting the switch interval to 1e-5 for
    all platforms? Otherwise we'll probably have many platform specific
    workarounds.

    Yes, I'm ok with it.

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Oct 1, 2012

    New changeset 484a4b9349af by Stefan Krah in branch '3.3':
    Issue bpo-15599: Increase the switch interval. Several systems cannot handle
    http://hg.python.org/cpython/rev/484a4b9349af

    @ericsnowcurrently
    Copy link
    Member

    Is this still an issue?

    @skrah
    Copy link
    Mannequin

    skrah mannequin commented Nov 13, 2012

    Well, there was a test_circular_imports() failure on (I think) the
    Windows-amd64 bot last week, but I can't access it any more.

    @pitrou
    Copy link
    Member

    pitrou commented Dec 18, 2012

    The occasional test_parallel_meta_path failure is really weird. The sys.meta_path module finding in _bootstrap.py is protected by the global import lock, so there shouldn't be any failure here. I'm starting to suspect some issue in the lock code itself.

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Dec 18, 2012

    New changeset 1ee9cf5a5351 by Antoine Pitrou in branch '3.3':
    Add sanity assertions in some import lock code (issue bpo-15599).
    http://hg.python.org/cpython/rev/1ee9cf5a5351

    New changeset 7cfca9ede861 by Antoine Pitrou in branch 'default':
    Add sanity assertions in some import lock code (issue bpo-15599).
    http://hg.python.org/cpython/rev/7cfca9ede861

    @pitrou pitrou changed the title test_circular_imports() of test_threaded_import fails on FreeBSD 9.0 test_threaded_import fails sporadically on Windows and FreeBSD Dec 18, 2012
    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Dec 18, 2012

    New changeset 67d6bb557d67 by Antoine Pitrou in branch '3.3':
    Try to make test_threaded_import faster and safer (issue bpo-15599).
    http://hg.python.org/cpython/rev/67d6bb557d67

    New changeset 30c7cc62303a by Antoine Pitrou in branch 'default':
    Try to make test_threaded_import faster and safer (issue bpo-15599).
    http://hg.python.org/cpython/rev/30c7cc62303a

    @ncoghlan
    Copy link
    Contributor

    Another recent instance of the parallel meta path failure:

    http://buildbot.python.org/all/builders/x86%20Windows7%203.x/builds/7740/steps/test/logs/stdio

    @vstinner
    Copy link
    Member Author

    vstinner commented Jun 3, 2014

    A recent failure. As usual, the test passed at the second run.

    http://buildbot.python.org/all/builders/AMD64%20FreeBSD%2010.0%203.x/builds/2180/steps/test/logs/stdio

    ======================================================================
    FAIL: test_parallel_meta_path (test.test_threaded_import.ThreadedImportTests)
    ----------------------------------------------------------------------

    Traceback (most recent call last):
      File "/usr/home/buildbot/koobs-freebsd10/3.x.koobs-freebsd10/build/Lib/test/test_threaded_import.py", line 135, in test_parallel_meta_path
        self.assertEqual(finder.x, finder.numcalls)
    AssertionError: 89 != 90

    Re-running test 'test_threaded_import' in verbose mode
    (...)
    Ran 6 tests in 3.109s

    @db3l
    Copy link
    Contributor

    db3l commented Jun 23, 2014

    I've been experimenting with setting up a Windows 8.1 buildbot, and found this ticket after finding a problem with test_threaded_import, testing against the 3.4 branch.

    I seem to be have a low syscheckinterval issue similar to that discussed here on some platforms, though I run into it sooner than 0.00001.

    If I change the syscheckinterval adjustment to 0.001 the tests run in about 4s. Just slightly below that, 0.0009, it can take well over an hour when run manually, always getting killed due to a timeout when running the buildbot test batch file. Each use of check_module_parallel_init in the test takes 20-30 minutes. During this time the CPU remains pegged at 100%.

    I don't see any additional slow-down between 0.0009 and the default of 0.00001, so it feels more like crossing a threshold somewhere around 1ms than scaling equally around and through that point.

    While the machine is not tremendously powerful (it's an Azure VM - single core ~2GHz), everything else is working well and build times and the remainder of the full test suite run in a reasonable time.

    -- David

    @vstinner
    Copy link
    Member Author

    No real activity on this issue. Since the issue is sporadic and I'm not interested to work on it, I just close the issue as out of date.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    None yet
    Projects
    None yet
    Development

    No branches or pull requests

    6 participants