classification
Title: test_eintr randomly fails on FreeBSD
Type: Stage:
Components: Versions: Python 3.6, Python 3.5
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: Arfrever, cem, python-dev, vstinner
Priority: normal Keywords:

Created on 2015-09-15 10:09 by vstinner, last changed 2015-10-11 20:10 by vstinner. This issue is now closed.

Files
File name Uploaded Description Edit
eintr_bug.tar.gz vstinner, 2015-09-16 22:06
Messages (17)
msg250753 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2015-09-15 10:09
I'm unable to reproduce the hang. It's probably a race condition since sometimes the test pass. It may depend on the system load, the number of CPU cores, and other factors.

I tried to use faulthandler.dump_traceback_later() in the changeset ebccac60b9e7, but it didn't print the traceback of the blocked test. So I don't know which eintr test is blocked.

In the changeset ed0e6a9c11af, I rewrote test_eintr.py to use subprocess instead of os.fork(). I was supposed to reduce the risk of race condition, but the hang still occurs.

On FreeBSD, I know that any thread can receive a signal, and most Python tests only expect that the signal is received from a specific thread. But test_eintr.py runs Lib/test/eintrdata/eintr_tester.py in a subprocess to avoid threads, and Lib/test/eintrdata/eintr_tester.py itself uses subprocess to spawn child processes, so it should also avoid threads. So I don't think that the issue is related to threads.

At least, it would help to know which test hangs.

Example:

http://buildbot.python.org/all/builders/AMD64%20FreeBSD%209.x%203.x/builds/3337/steps/test/logs/stdio

[398/398] test_eintr
Timeout (1:00:00)!
Thread 0x0000000801807400 (most recent call first):
  File "/usr/home/buildbot/python/3.x.koobs-freebsd9/build/Lib/selectors.py", line 375 in select
  File "/usr/home/buildbot/python/3.x.koobs-freebsd9/build/Lib/subprocess.py", line 1698 in _communicate
  File "/usr/home/buildbot/python/3.x.koobs-freebsd9/build/Lib/subprocess.py", line 1068 in communicate
  File "/usr/home/buildbot/python/3.x.koobs-freebsd9/build/Lib/test/support/script_helper.py", line 86 in run_python_until_end
  File "/usr/home/buildbot/python/3.x.koobs-freebsd9/build/Lib/test/support/script_helper.py", line 96 in _assert_python
  File "/usr/home/buildbot/python/3.x.koobs-freebsd9/build/Lib/test/support/script_helper.py", line 135 in assert_python_ok
  File "/usr/home/buildbot/python/3.x.koobs-freebsd9/build/Lib/test/test_eintr.py", line 17 in test_all
  File "/usr/home/buildbot/python/3.x.koobs-freebsd9/build/Lib/unittest/case.py", line 600 in run
  File "/usr/home/buildbot/python/3.x.koobs-freebsd9/build/Lib/unittest/case.py", line 648 in __call__
  File "/usr/home/buildbot/python/3.x.koobs-freebsd9/build/Lib/unittest/suite.py", line 122 in run
  File "/usr/home/buildbot/python/3.x.koobs-freebsd9/build/Lib/unittest/suite.py", line 84 in __call__
  File "/usr/home/buildbot/python/3.x.koobs-freebsd9/build/Lib/unittest/suite.py", line 122 in run
  File "/usr/home/buildbot/python/3.x.koobs-freebsd9/build/Lib/unittest/suite.py", line 84 in __call__
  File "/usr/home/buildbot/python/3.x.koobs-freebsd9/build/Lib/unittest/suite.py", line 122 in run
  File "/usr/home/buildbot/python/3.x.koobs-freebsd9/build/Lib/unittest/suite.py", line 84 in __call__
  File "/usr/home/buildbot/python/3.x.koobs-freebsd9/build/Lib/unittest/runner.py", line 176 in run
  File "/usr/home/buildbot/python/3.x.koobs-freebsd9/build/Lib/test/support/__init__.py", line 1775 in _run_suite
  File "/usr/home/buildbot/python/3.x.koobs-freebsd9/build/Lib/test/support/__init__.py", line 1809 in run_unittest
  File "/usr/home/buildbot/python/3.x.koobs-freebsd9/build/Lib/test/regrtest.py", line 1280 in test_runner
  File "/usr/home/buildbot/python/3.x.koobs-freebsd9/build/Lib/test/regrtest.py", line 1281 in runtest_inner
  File "/usr/home/buildbot/python/3.x.koobs-freebsd9/build/Lib/test/regrtest.py", line 968 in runtest
  File "/usr/home/buildbot/python/3.x.koobs-freebsd9/build/Lib/test/regrtest.py", line 532 in main
  File "/usr/home/buildbot/python/3.x.koobs-freebsd9/build/Lib/test/regrtest.py", line 1565 in main_in_temp_cwd
  File "/usr/home/buildbot/python/3.x.koobs-freebsd9/build/Lib/test/regrtest.py", line 1590 in <module>
  File "/usr/home/buildbot/python/3.x.koobs-freebsd9/build/Lib/runpy.py", line 85 in _run_code
  File "/usr/home/buildbot/python/3.x.koobs-freebsd9/build/Lib/runpy.py", line 170 in _run_module_as_main
Traceback (most recent call last):
  File "/usr/home/buildbot/python/3.x.koobs-freebsd9/build/Lib/runpy.py", line 170, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/home/buildbot/python/3.x.koobs-freebsd9/build/Lib/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/usr/home/buildbot/python/3.x.koobs-freebsd9/build/Lib/test/__main__.py", line 3, in <module>
    regrtest.main_in_temp_cwd()
  File "/usr/home/buildbot/python/3.x.koobs-freebsd9/build/Lib/test/regrtest.py", line 1565, in main_in_temp_cwd
    main()
  File "/usr/home/buildbot/python/3.x.koobs-freebsd9/build/Lib/test/regrtest.py", line 738, in main
    raise Exception("Child error on {}: {}".format(test, result[1]))
Exception: Child error on test_eintr: Exit code 1
*** [buildbottest] Error code 1
msg250754 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2015-09-15 10:19
New changeset 3d9164aecc6f by Victor Stinner in branch 'default':
Issue #25122: try to debug test_eintr hang on FreeBSD
https://hg.python.org/cpython/rev/3d9164aecc6f
msg250804 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2015-09-15 20:40
New changeset edbc35d8babb by Victor Stinner in branch 'default':
Issue #25122: Fix test_eintr, kill child process on error
https://hg.python.org/cpython/rev/edbc35d8babb
msg250809 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2015-09-15 21:00
New changeset 24dbca4e746c by Victor Stinner in branch 'default':
Issue #25122: test_eintr: don't redirect stdout to stderr
https://hg.python.org/cpython/rev/24dbca4e746c
msg250810 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2015-09-15 22:02
New changeset 5388fa98f7a3 by Victor Stinner in branch 'default':
Issue #25122: optimize test_eintr
https://hg.python.org/cpython/rev/5388fa98f7a3
msg250818 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2015-09-16 06:55
Ok, I got the first interesting result thanks to my enhancements of test_eintr:

[383/398/1] test_eintr
test_read (__main__.OSEINTRTest) ... ok
test_wait (__main__.OSEINTRTest) ... ok
test_wait3 (__main__.OSEINTRTest) ... ok
test_wait4 (__main__.OSEINTRTest) ... ok
test_waitpid (__main__.OSEINTRTest) ... ok
test_write (__main__.OSEINTRTest) ... ok
test_accept (__main__.SocketEINTRTest) ... ok
test_open (__main__.SocketEINTRTest) ...
Timeout (0:05:00)!
Thread 0x0000000801807400 (most recent call first):
  File "/usr/home/buildbot/python/3.x.koobs-freebsd9/build/Lib/test/eintrdata/eintr_tester.py", line 340 in <lambda>
  File "/usr/home/buildbot/python/3.x.koobs-freebsd9/build/Lib/test/eintrdata/eintr_tester.py", line 334 in _test_open
  File "/usr/home/buildbot/python/3.x.koobs-freebsd9/build/Lib/test/eintrdata/eintr_tester.py", line 340 in test_open
  File "/usr/home/buildbot/python/3.x.koobs-freebsd9/build/Lib/unittest/case.py", line 600 in run
  File "/usr/home/buildbot/python/3.x.koobs-freebsd9/build/Lib/unittest/case.py", line 648 in __call__
  File "/usr/home/buildbot/python/3.x.koobs-freebsd9/build/Lib/unittest/suite.py", line 122 in run
  File "/usr/home/buildbot/python/3.x.koobs-freebsd9/build/Lib/unittest/suite.py", line 84 in __call__
  File "/usr/home/buildbot/python/3.x.koobs-freebsd9/build/Lib/unittest/suite.py", line 122 in run
  File "/usr/home/buildbot/python/3.x.koobs-freebsd9/build/Lib/unittest/suite.py", line 84 in __call__
  File "/usr/home/buildbot/python/3.x.koobs-freebsd9/build/Lib/unittest/runner.py", line 176 in run
  File "/usr/home/buildbot/python/3.x.koobs-freebsd9/build/Lib/test/support/__init__.py", line 1775 in _run_suite
  File "/usr/home/buildbot/python/3.x.koobs-freebsd9/build/Lib/test/support/__init__.py", line 1809 in run_unittest
  File "/usr/home/buildbot/python/3.x.koobs-freebsd9/build/Lib/test/eintrdata/eintr_tester.py", line 461 in test_main
  File "/usr/home/buildbot/python/3.x.koobs-freebsd9/build/Lib/test/eintrdata/eintr_tester.py", line 465 in <module>
test_all (test.test_eintr.EINTRTests) ... FAIL

======================================================================
FAIL: test_all (test.test_eintr.EINTRTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/home/buildbot/python/3.x.koobs-freebsd9/build/Lib/test/test_eintr.py", line 25, in test_all
    self.assertEqual(exitcode, 0)
AssertionError: 1 != 0

----------------------------------------------------------------------
Ran 1 test in 306.939s

FAILED (failures=1)
Warning -- files was modified by test_eintr
test test_eintr failed
msg250822 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2015-09-16 07:23
New changeset 30bc256f2346 by Victor Stinner in branch 'default':
Issue #25122: add debug traces to test_eintr.test_open()
https://hg.python.org/cpython/rev/30bc256f2346
msg250828 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2015-09-16 07:59
More debug traces:


test_open (__main__.SocketEINTRTest) ... try to open '@test_57236_tmp' fifo for writing, pid 57236
try to open '@test_57236_tmp' fifo for reading, pid 57305, ppid 57236
'@test_57236_tmp' fifo opened for reading and closed, pid 57305, ppid 57236
Timeout (0:05:00)!
Thread 0x0000000802006400 (most recent call first):
  File "/usr/home/buildbot/python/3.x.koobs-freebsd10/build/Lib/test/eintrdata/eintr_tester.py", line 354 in <lambda>
  File "/usr/home/buildbot/python/3.x.koobs-freebsd10/build/Lib/test/eintrdata/eintr_tester.py", line 345 in _test_open
  File "/usr/home/buildbot/python/3.x.koobs-freebsd10/build/Lib/test/eintrdata/eintr_tester.py", line 354 in test_open
  File "/usr/home/buildbot/python/3.x.koobs-freebsd10/build/Lib/unittest/case.py", line 600 in run
  File "/usr/home/buildbot/python/3.x.koobs-freebsd10/build/Lib/unittest/case.py", line 648 in __call__
  File "/usr/home/buildbot/python/3.x.koobs-freebsd10/build/Lib/unittest/suite.py", line 122 in run
  File "/usr/home/buildbot/python/3.x.koobs-freebsd10/build/Lib/unittest/suite.py", line 84 in __call__
  File "/usr/home/buildbot/python/3.x.koobs-freebsd10/build/Lib/unittest/suite.py", line 122 in run
  File "/usr/home/buildbot/python/3.x.koobs-freebsd10/build/Lib/unittest/suite.py", line 84 in __call__
  File "/usr/home/buildbot/python/3.x.koobs-freebsd10/build/Lib/unittest/runner.py", line 176 in run
  File "/usr/home/buildbot/python/3.x.koobs-freebsd10/build/Lib/test/support/__init__.py", line 1775 in _run_suite
  File "/usr/home/buildbot/python/3.x.koobs-freebsd10/build/Lib/test/support/__init__.py", line 1809 in run_unittest
  File "/usr/home/buildbot/python/3.x.koobs-freebsd10/build/Lib/test/eintrdata/eintr_tester.py", line 475 in test_main
  File "/usr/home/buildbot/python/3.x.koobs-freebsd10/build/Lib/test/eintrdata/eintr_tester.py", line 479 in <module>
msg250831 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2015-09-16 08:51
Syscalls traced by truss.


Parent process:

 1436: unlink("fifo_5950861521")                 ERR#2 'No such file or directory'
 1436: open("fifo_5950861521",O_WRONLY|O_CLOEXEC,00) ERR#4 'Interrupted system call'
 1436: open("fifo_5950861521",O_WRONLY|O_CLOEXEC,00) ERR#4 'Interrupted system call'
 ...
 1436: open("fifo_5950861521",O_WRONLY|O_CLOEXEC,00) ERR#4 'Interrupted system call'


Child process:

 2689: open("fifo_5950861521",O_CLOEXEC,00)      = 3 (0x3)
 2689: close(3)                                  ERR#4 'Interrupted system call'


Hum, it looks like the child process gets signals, many syscalls are interrupted by signals. In my simple test, 7 syscalls were interrupted by signals in the child process. I didn't expect "setitimer" to be inherited by subprocess.Popen.

The suspicious thing is that close() fails with EINTR in the child process and it is not retried.
msg250865 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2015-09-16 22:06
Ok, I'm now quite sure that it's a bug in the FreeBSD kernel or in the BSD C library.

The C library ignores EINTR: if the close() syscalls fails with EINTR, the C close() function returns a success.

When the close() syscall fails with EINTR, the test hangs. It's unclear to me if the close() syscall fails with EINTR in the parent or in the child process. I'm quite sure that the FreeBSD truss tool (tool to trace syscalls) has bugs too, so it's hard to be sure what happens exactly.

Attached tarball eintr_bug.tar.gz reproduces the bug in a program written in pure C language. So it's not a bug in Python.

For test_eintr, we should skip the test on FreeBSD (until the bug is fixed in FreeBSD).
msg250867 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2015-09-16 22:38
I reported the bug on the FreeBSD bug tracker:
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=203162
msg250958 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2015-09-18 09:31
New changeset 90722634f211 by Victor Stinner in branch 'default':
Issue #25122: Fix test_eintr.test_open() on FreeBSD
https://hg.python.org/cpython/rev/90722634f211

New changeset f347ea4391f3 by Victor Stinner in branch '3.5':
Issue #25122: sync test_eintr with Python 3.6
https://hg.python.org/cpython/rev/f347ea4391f3
msg250979 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2015-09-18 13:19
test_eintr should now be fixed in Python 3.5 and 3.6: I skipped the test_open() and test_os_open() tests of test_eintr on FreeBSD.
msg251219 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2015-09-21 12:05
New changeset 3184c43627f5 by Victor Stinner in branch '3.5':
Issue #25122: test_eintr: the FreeBSD fix will be released in FreeBSD 10.3
https://hg.python.org/cpython/rev/3184c43627f5
msg251764 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2015-09-28 13:04
New changeset e48ec5f5b82b by Victor Stinner in branch 'default':
Issue #25122: Remove verbose mode of test_eintr
https://hg.python.org/cpython/rev/e48ec5f5b82b
msg252797 - (view) Author: Arfrever Frehtes Taifersar Arahesis (Arfrever) * (Python triager) Date: 2015-10-11 10:38
> New changeset e48ec5f5b82b by Victor Stinner in branch 'default':
> Issue #25122: Remove verbose mode of test_eintr
> https://hg.python.org/cpython/rev/e48ec5f5b82b

This could be backported to 3.5 branch. (Verbose mode was added there just 10 days earlier.)
msg252824 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2015-10-11 20:10
We should remove it, but test_eintr still hangs randomly on freebsd.
History
Date User Action Args
2015-10-11 20:10:41vstinnersetmessages: + msg252824
2015-10-11 10:38:10Arfreversetnosy: + Arfrever
messages: + msg252797
2015-09-28 13:04:31python-devsetmessages: + msg251764
2015-09-21 12:05:27python-devsetmessages: + msg251219
2015-09-18 13:19:59vstinnersetstatus: open -> closed
resolution: fixed
messages: + msg250979
2015-09-18 09:31:35python-devsetmessages: + msg250958
2015-09-16 23:14:17cemsetnosy: + cem
2015-09-16 22:38:23vstinnersetmessages: + msg250867
2015-09-16 22:06:48vstinnersetfiles: + eintr_bug.tar.gz

messages: + msg250865
2015-09-16 08:51:46vstinnersetmessages: + msg250831
2015-09-16 07:59:11vstinnersetmessages: + msg250828
2015-09-16 07:23:46python-devsetmessages: + msg250822
2015-09-16 06:55:30vstinnersetmessages: + msg250818
2015-09-15 22:02:00python-devsetmessages: + msg250810
2015-09-15 21:00:20python-devsetmessages: + msg250809
2015-09-15 20:40:15python-devsetmessages: + msg250804
2015-09-15 10:19:37python-devsetnosy: + python-dev
messages: + msg250754
2015-09-15 10:09:38vstinnercreate