classification
Title: test_semaphore_tracker() of test_multiprocessing_spawn fails randomly (race condition?)
Type: Stage:
Components: Tests Versions: Python 3.7, Python 3.6
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: ncoghlan, stevezh, vstinner
Priority: normal Keywords:

Created on 2017-10-04 09:36 by vstinner, last changed 2018-01-28 14:28 by ncoghlan.

Messages (6)
msg303678 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-10-04 09:36
Recent example of the slow and busy (load of 8.05 with 2 CPUs) x86 Gentoo Refleaks 3.6 buildbot:

http://buildbot.python.org/all/builders/x86%20Gentoo%20Refleaks%203.6/builds/111/steps/test/logs/stdio

== CPU count: 2
...
1:41:09 load avg: 8.05 [170/405/1] test_multiprocessing_spawn failed -- running: test_asyncio (1118 sec)
beginning 6 repetitions
123456
/buildbot/buildarea/3.6.ware-gentoo-x86.refleak/build/Lib/unittest/case.py:633: ResourceWarning: unclosed file <_io.BufferedReader name=22>
  outcome.errors.clear()
test test_multiprocessing_spawn failed -- Traceback (most recent call last):
  File "/buildbot/buildarea/3.6.ware-gentoo-x86.refleak/build/Lib/test/_test_multiprocessing.py", line 4091, in test_semaphore_tracker
    _multiprocessing.sem_unlink(name2)
AssertionError: OSError not raised
1:41:12 load avg: 7.97 [171/405/1] test_posixpath passed -- running: test_asyncio (1122 sec)
beginning 6 repetitions
123456
......
msg306557 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-11-20 15:39
Failure on x86 Gentoo Refleaks 3.6, two days ago:

http://buildbot.python.org/all/#/builders/72/builds/41

======================================================================
FAIL: test_semaphore_tracker (test.test_multiprocessing_spawn.TestSemaphoreTracker)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/buildbot/buildarea/3.6.ware-gentoo-x86.refleak/build/Lib/test/_test_multiprocessing.py", line 4161, in test_semaphore_tracker
    _multiprocessing.sem_unlink(name2)
AssertionError: OSError not raised
======================================================================
FAIL: test_sys_exit (test.test_multiprocessing_spawn.WithProcessesTestSubclassingProcess)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/buildbot/buildarea/3.6.ware-gentoo-x86.refleak/build/Lib/test/_test_multiprocessing.py", line 584, in test_sys_exit
    self.assertEqual(p.exitcode, 1)
AssertionError: None != 1
msg307355 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-12-01 01:21
x86 Tiger 3.x:

http://buildbot.python.org/all/#/builders/30/builds/260

======================================================================
FAIL: test_semaphore_tracker (test.test_multiprocessing_forkserver.TestSemaphoreTracker)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/test/_test_multiprocessing.py", line 4380, in test_semaphore_tracker
    _multiprocessing.sem_unlink(name2)
AssertionError: OSError not raised

======================================================================
FAIL: test_forkserver_sigkill (test.test_multiprocessing_forkserver.WithProcessesTestProcess)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/test/_test_multiprocessing.py", line 653, in test_forkserver_sigkill
    self.check_forkserver_death(signal.SIGKILL)
  File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/test/_test_multiprocessing.py", line 643, in check_forkserver_death
    self.assertTrue(evt.is_set())
AssertionError: False is not true
msg307356 - (view) Author: stevezh (stevezh) * Date: 2017-12-01 02:38
It maybe the semaphore tracker process doesn't unlink the semaphore as soon as possible when child process exit when the system is slow. Increase test case's time in the child and before assertRaises will help.
msg310360 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-01-20 21:20
To reproduce the issue, someone should increase the system load to make the system slower and make timeouts more likely, and change the order in which processes are run. I'm using the "stress" utility on Linux for that. But sometimes, the best stress tool... is the Python test suite, since running the full test suite in multiple processes in parallel (./python -m test -j0 -r) runs various workloads, and so sometimes trigger bugs which would be very hard to trigger othewise.

Terminal 1: run "./python -m test -j0 -r", stress or anything else to make the system slower

Terminal 2: run "./python -m test test_multiprocessing_fork -m test_semaphore_tracker -F -v" which runs the test in a loop until it fails.
msg310949 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2018-01-28 14:28
A warning I'm sometimes seeing currently on successful test runs is as follows:

------
/home/ncoghlan/devel/cpython/Lib/multiprocessing/semaphore_tracker.py:55: UserWarning: semaphore_tracker: process died unexpectedly, relaunching.  Some semaphores might leak.
  warnings.warn('semaphore_tracker: process died unexpectedly, '
------

So it may be that the system has to be under load for the test suite to notice that failure.
History
Date User Action Args
2018-01-28 14:28:32ncoghlansetnosy: + ncoghlan
messages: + msg310949
2018-01-20 21:20:10vstinnersetmessages: + msg310360
2017-12-01 02:38:08stevezhsetnosy: + stevezh
messages: + msg307356
2017-12-01 01:21:37vstinnersetmessages: + msg307355
2017-11-20 15:39:14vstinnersetmessages: + msg306557
2017-10-04 09:36:35vstinnercreate