classification
Title: test_default_timeout() of test_threading.BarrierTests failure: BrokenBarrierError
Type: Stage: resolved
Components: Tests Versions: Python 3.9, Python 3.8, Python 3.7
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: Ivan.Pozdeev, db3l, neologix, python-dev, skrah, vstinner
Priority: normal Keywords: needs review, patch

Created on 2011-04-18 21:59 by vstinner, last changed 2019-07-09 09:38 by vstinner.

Files
File name Uploaded Description Edit
test_barrier_timeout.diff neologix, 2011-07-25 20:43 review
Messages (12)
msg133995 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-04-18 21:59
While trying to reproduce issue #11870 using "gdb -args ./python Lib/test/regrtest.py -F -v --timeout=600 test_threading", I had the following error on Linux:
----------------------
test_default_timeout (test.test_threading.BarrierTests) ... [Thread 0x7ffff1acf700 (LWP 27178) exited]
[New Thread 0x7ffff1acf700 (LWP 27181)]
[New Thread 0x7ffff12ce700 (LWP 27182)]
[New Thread 0x7ffff3c99700 (LWP 27183)]
[New Thread 0x7ffff325a700 (LWP 27184)]
Unhandled exception in thread started by <function task at 0x1302340>
Unhandled exception in thread started by <function task at 0x1302340>
Unhandled exception in thread started by <function task at 0x1302340>
Traceback (most recent call last):
Unhandled exception in thread started by <function task at 0x1302340>
  File "/home/haypo/prog/HG/cpython/Lib/test/lock_tests.py", line 37, in task
Traceback (most recent call last):
Traceback (most recent call last):
  File "/home/haypo/prog/HG/cpython/Lib/test/lock_tests.py", line 37, in task
Traceback (most recent call last):
  File "/home/haypo/prog/HG/cpython/Lib/test/lock_tests.py", line 37, in task
    f()
    f()
  File "/home/haypo/prog/HG/cpython/Lib/test/lock_tests.py", line 37, in task
  File "/home/haypo/prog/HG/cpython/Lib/test/lock_tests.py", line 838, in f
ERROR
----------------------

----------------------
ERROR: test_default_timeout (test.test_threading.BarrierTests)

Traceback (most recent call last):
  File "/home/haypo/prog/HG/cpython/Lib/test/lock_tests.py", line 843, in test_default_timeout
    self.run_threads(f)
  File "/home/haypo/prog/HG/cpython/Lib/test/lock_tests.py", line 672, in run_threads
    f()
  File "/home/haypo/prog/HG/cpython/Lib/test/lock_tests.py", line 838, in f
    i = barrier.wait()
  File "/home/haypo/prog/HG/cpython/Lib/threading.py", line 472, in wait
    self._enter() # Block while the barrier drains.
  File "/home/haypo/prog/HG/cpython/Lib/threading.py", line 496, in _enter
    raise BrokenBarrierError
threading.BrokenBarrierError
----------------------

The error occured on:

 * Ubuntu 10.04
 * Python 3.3 (2127df2c972e)
 * Intel(R) Core(TM)2 Quad  CPU   Q9300  @ 2.50GHz
 * 4 GB of memory
msg134583 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2011-04-27 15:37
The most obvious explanation for that failure is that the barrier's timeout is too low.

   def test_default_timeout(self):
       """
       Test the barrier's default timeout
       """
       #create a barrier with a low default timeout
       barrier = self.barriertype(self.N, timeout=0.1)

If the last thread waits on the barrier more than 0.1s after the first thread, then you'll get a BrokenBarrierError.
A 0.1s delay is not that much, 100ms was the default quantum with Linux O(1) scheduler...
msg141124 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2011-07-25 20:43
The attached patch bumps the barrier's default timeout to 300ms: it should be more than enough (unless you got a really crappy scheduler, or a really heavily loaded machine), especially since this problem doesn't seem to occur often (AFAICT).
msg141241 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2011-07-27 16:44
Victor, can I commit it?
msg141263 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-07-27 18:50
YES YOU CAN
msg141270 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2011-07-27 19:28
New changeset aa9c0fdf2143 by Charles-François Natali in branch '3.2':
Issue #11871: In test_threading.BarrierTests, bump the default barrier timeout
http://hg.python.org/cpython/rev/aa9c0fdf2143

New changeset e8da570d29a8 by Charles-François Natali in branch 'default':
Issue #11871: In test_threading.BarrierTests, bump the default barrier timeout
http://hg.python.org/cpython/rev/e8da570d29a8
msg141274 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2011-07-27 20:35
> YES YOU CAN

:-)
msg200564 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2013-10-20 10:56
It looks like it happened again:

http://buildbot.python.org/all/builders/x86%20Gentoo%20Non-Debug%203.x/builds/5223/steps/test/logs/stdio


======================================================================
ERROR: test_default_timeout (test.test_threading.BarrierTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/var/lib/buildslave/3.x.murray-gentoo-wide/build/Lib/test/lock_tests.py", line 876, in test_default_timeout
    self.run_threads(f)
  File "/var/lib/buildslave/3.x.murray-gentoo-wide/build/Lib/test/lock_tests.py", line 705, in run_threads
    f()
  File "/var/lib/buildslave/3.x.murray-gentoo-wide/build/Lib/test/lock_tests.py", line 871, in f
    i = barrier.wait()
  File "/var/lib/buildslave/3.x.murray-gentoo-wide/build/Lib/threading.py", line 616, in wait
    self._wait(timeout)
  File "/var/lib/buildslave/3.x.murray-gentoo-wide/build/Lib/threading.py", line 654, in _wait
    raise BrokenBarrierError
threading.BrokenBarrierError

----------------------------------------------------------------------
Ran 138 tests in 22.880s
msg340889 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-04-26 07:35
The test still fails randomly:
https://buildbot.python.org/all/#/builders/3/builds/2469
msg340949 - (view) Author: David Bolen (db3l) Date: 2019-04-26 22:07
I should mention that a high level of test parallelism on the part of my worker might have be a contributing factor in this most recent case.

The worker was recently upgraded to a faster 4-core VM, but with limited I/O.  In a test run the test processes invariably end up stuck on I/O heavy tests, idling the CPUs.

So I've been running the tests under -j8, as I found it the most effective combination of supporting tests stuck on I/O while keeping the CPUs busy, but it does mean that in some cases there's a lot pending on the CPUs, and depending on the exact test ordering in a run presumably some more sensitive tests could be impacted.

I have in fact seen an increase in random tests generating warnings (fail, then pass) than the worker had previously.  I suspect the benefits of the extra parallelism on total test time (-j8 is about 20% faster than -j4) probably isn't valuable enough and will most likely be reducing it a bit.
msg344118 - (view) Author: Ivan Pozdeev (Ivan.Pozdeev) * Date: 2019-05-31 20:24
Got this issue today in AppVeyor's PR check: https://ci.appveyor.com/project/python/cpython/builds/24945165, so it's not local to David's worker.
(At rerun, the test succeeeded, so the check status was not affected.)
msg347533 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-07-09 09:38
test_threading failed on AMD64 Windows8.1 Refleaks 3.x, the unittest errors are missing, but unraisable exceptions look similar:
https://buildbot.python.org/all/#/builders/80/builds/646

0:21:11 load avg: 7.30 [156/419/3] test_threading failed (2 min 56 sec) -- running: test_concurrent_futures (7 min 40 sec), test_email (1 min 34 sec)
beginning 6 repetitions
123456
.....Warning -- Unraisable exceptionWarning -- Unraisable exception

Exception ignored in thread started byException ignored in thread started by: : <function Bunch.__init__.<locals>.task at 0x000000C7CAC57AF0><function Bunch.__init__.<locals>.task at 0x000000C7CAC57AF0>

Traceback (most recent call last):
Traceback (most recent call last):
  File "D:\buildarea\3.x.ware-win81-release.refleak\build\lib\test\lock_tests.py", line 41, in task
  File "D:\buildarea\3.x.ware-win81-release.refleak\build\lib\test\lock_tests.py", line 41, in task
    f()    
f()  File "D:\buildarea\3.x.ware-win81-release.refleak\build\lib\test\lock_tests.py", line 939, in f

  File "D:\buildarea\3.x.ware-win81-release.refleak\build\lib\test\lock_tests.py", line 939, in f
    i = barrier.wait()    
i = barrier.wait()  File "D:\buildarea\3.x.ware-win81-release.refleak\build\lib\threading.py", line 620, in wait

  File "D:\buildarea\3.x.ware-win81-release.refleak\build\lib\threading.py", line 620, in wait
    self._wait(timeout)
      File "D:\buildarea\3.x.ware-win81-release.refleak\build\lib\threading.py", line 660, in _wait
self._wait(timeout)
  File "D:\buildarea\3.x.ware-win81-release.refleak\build\lib\threading.py", line 660, in _wait
    raise BrokenBarrierError
threading.    BrokenBarrierErrorraise BrokenBarrierError: 

threading.BrokenBarrierError: 
Warning -- Unraisable exception
Exception ignored in thread started by: <function Bunch.__init__.<locals>.task at 0x000000C7CAC57AF0>
Traceback (most recent call last):
  File "D:\buildarea\3.x.ware-win81-release.refleak\build\lib\test\lock_tests.py", line 41, in task
    f()
  File "D:\buildarea\3.x.ware-win81-release.refleak\build\lib\test\lock_tests.py", line 939, in f
    i = barrier.wait()
  File "D:\buildarea\3.x.ware-win81-release.refleak\build\lib\threading.py", line 611, in wait
    self._enter() # Block while the barrier drains.
  File "D:\buildarea\3.x.ware-win81-release.refleak\build\lib\threading.py", line 635, in _enter
    raise BrokenBarrierError
threading.BrokenBarrierError: 
Warning -- Unraisable exception
Exception ignored in thread started by: <function Bunch.__init__.<locals>.task at 0x000000C7CAC57AF0>
Traceback (most recent call last):
  File "D:\buildarea\3.x.ware-win81-release.refleak\build\lib\test\lock_tests.py", line 41, in task
    f()
  File "D:\buildarea\3.x.ware-win81-release.refleak\build\lib\test\lock_tests.py", line 939, in f
    i = barrier.wait()
  File "D:\buildarea\3.x.ware-win81-release.refleak\build\lib\threading.py", line 611, in wait
    self._enter() # Block while the barrier drains.
  File "D:\buildarea\3.x.ware-win81-release.refleak\build\lib\threading.py", line 635, in _enter
    raise BrokenBarrierError
threading.BrokenBarrierError: 
test test_threading failed -- multiple errors occurred; run in verbose mode for details
History
Date User Action Args
2019-07-09 09:38:49vstinnersetmessages: + msg347533
2019-05-31 20:24:46Ivan.Pozdeevsetnosy: + Ivan.Pozdeev

messages: + msg344118
versions: + Python 3.7, Python 3.8, Python 3.9, - Python 3.2, Python 3.3
2019-04-26 22:07:11db3lsetnosy: + db3l
messages: + msg340949
2019-04-26 07:35:47vstinnersetstatus: closed -> open
resolution: fixed ->
messages: + msg340889
2013-10-20 10:56:04skrahsetnosy: + skrah
messages: + msg200564
2011-07-27 20:35:11neologixsetstatus: open -> closed
versions: + Python 3.2
messages: + msg141274

resolution: fixed
stage: patch review -> resolved
2011-07-27 19:28:01python-devsetnosy: + python-dev
messages: + msg141270
2011-07-27 18:50:23vstinnersetmessages: + msg141263
2011-07-27 16:44:39neologixsetmessages: + msg141241
2011-07-25 20:43:31neologixsetkeywords: + patch, needs review
files: + test_barrier_timeout.diff
messages: + msg141124

stage: patch review
2011-04-27 15:37:25neologixsetnosy: + neologix
messages: + msg134583
2011-04-18 21:59:03vstinnercreate