classification
Title: [2.7] test_io: race condition in test_interrupted_write_text() (test_io hangs on x86 Gentoo Refleaks 2.7)
Type: Stage: resolved
Components: Tests Versions: Python 2.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: miss-islington, vstinner, zach.ware
Priority: normal Keywords: patch

Created on 2017-10-09 07:13 by vstinner, last changed 2018-12-18 23:10 by miss-islington. This issue is now closed.

Pull Requests
URL Status Linked Edit
PR 7459 merged vstinner, 2018-06-06 21:18
PR 11225 merged vstinner, 2018-12-18 22:22
PR 11226 merged miss-islington, 2018-12-18 22:52
Messages (16)
msg303924 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-10-09 07:13
test_io is running since longer than 5 hours on x86 Gentoo Refleaks 2.7:

http://buildbot.python.org/all/#/builders/78/builds/1/steps/4/logs/stdio

2:12:11 load avg: 3.26 [402/403] test_contains passed -- running: test_io (1739 sec)
beginning 6 repetitions
123456
......
running: test_io (1769 sec)
running: test_io (1799 sec)
(...)
running: test_io (19529 sec)

http://buildbot.python.org/all/#/builders/78/builds/1/steps/4/logs/stdio
msg303925 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-10-09 07:14
I interrupted the build.
msg304879 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-10-24 09:13
test_io was blocked a second time:

http://buildbot.python.org/all/#/builders/78/builds/3

running: test_io (68517 sec)
running: test_io (68547 sec)
running: test_io (68577 sec)
command interrupted, attempting to kill
process killed by signal 9
program finished with exit code -1
elapsedTime=69935.628661
msg318111 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-05-29 21:58
I didn't see the failure recently. I close the issue.
msg318121 - (view) Author: Zachary Ware (zach.ware) * (Python committer) Date: 2018-05-29 22:10
test_io is currently hung on that builder:

buildbot 13541  0.0  0.1   4920  2508 pts/1    Ss+  May28   0:00  \_ make buildbottest TESTOPTS=-j2 -R 3:3 -u-cpu TESTPYTHONOPTS= TESTTIMEOUT=11700
buildbot 13556  0.2  0.4  33620  9952 pts/1    Sl+  May28   2:45  |   \_ ./python -Wd -3 -E -tt -R ./Lib/test/regrtest.py -uall --slowest -rwW -j2 -R 3:3 -u-cpu
buildbot 28750  0.0  1.1  65412 24428 pts/1    S+   May28   0:54  |       \_ /buildbot/buildarea/2.7.ware-gentoo-x86.refleak/build/python -E -3 -R -Wd -m test.regrtest --slaveargs [["test_io", 0, false], {"match_tests": null, "pgo": false, "huntrleaks": [3, 3, "reflog.txt"], "failfast": false, "use_resources": ["audio", "curses", "largefile", "network", "bsddb", "decimal", "subprocess", "urlfetch", "gui", "xpickle"]}]
msg318783 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-06-05 21:43
I succeeded to attach gdb to a regrtest slave process stuck in test_io... but somehow attaching the process into gdb unblocked the process... and the test completed.


Sadly, the first time that I attached the process, I forgot to allow a directory to load python-gdb.py. So I had to detach and attach again the process to get a working "py-bt" command. Maybe this action unblocked Python. It's hard to say.

(gdb) py-bt
Traceback (most recent call first):
  File "/buildbot/buildarea/2.7.ware-gentoo-x86.refleak/build/Lib/_pyio.py", line 1126, in _flush_unlocked
    def _flush_unlocked(self):
  File "/buildbot/buildarea/2.7.ware-gentoo-x86.refleak/build/Lib/_pyio.py", line 1104, in write
    self._flush_unlocked()
  File "/buildbot/buildarea/2.7.ware-gentoo-x86.refleak/build/Lib/_pyio.py", line 1302, in write
    return BufferedWriter.write(self, b)
  File "/buildbot/buildarea/2.7.ware-gentoo-x86.refleak/build/Lib/test/test_io.py", line 1186, in check_writes
    self.assertEqual(bufio.write(contents[n:n+size]), size)
  File "/buildbot/buildarea/2.7.ware-gentoo-x86.refleak/build/Lib/test/test_io.py", line 1723, in test_writes_and_peek
    self.check_writes(_peek)
  File "/buildbot/buildarea/2.7.ware-gentoo-x86.refleak/build/Lib/unittest/case.py", line 329, in run
    testMethod()
  File "/buildbot/buildarea/2.7.ware-gentoo-x86.refleak/build/Lib/unittest/case.py", line 393, in __call__
    return self.run(*args, **kwds)
  File "/buildbot/buildarea/2.7.ware-gentoo-x86.refleak/build/Lib/unittest/suite.py", line 108, in run
    test(result)
  File "/buildbot/buildarea/2.7.ware-gentoo-x86.refleak/build/Lib/unittest/suite.py", line 70, in __call__
    return self.run(*args, **kwds)
  File "/buildbot/buildarea/2.7.ware-gentoo-x86.refleak/build/Lib/unittest/suite.py", line 108, in run
    test(result)
  File "/buildbot/buildarea/2.7.ware-gentoo-x86.refleak/build/Lib/unittest/suite.py", line 70, in __call__
    return self.run(*args, **kwds)
  File "/buildbot/buildarea/2.7.ware-gentoo-x86.refleak/build/Lib/test/support/__init__.py", line 1461, in run
    test(result)
  File "/buildbot/buildarea/2.7.ware-gentoo-x86.refleak/build/Lib/test/support/__init__.py", line 1535, in _run_suite
    result = runner.run(suite)
  File "/buildbot/buildarea/2.7.ware-gentoo-x86.refleak/build/Lib/test/support/__init__.py", line 1626, in run_unittest
    _run_suite(suite)
  File "/buildbot/buildarea/2.7.ware-gentoo-x86.refleak/build/Lib/test/test_io.py", line 3367, in test_main
    support.run_unittest(*tests)
  File "/buildbot/buildarea/2.7.ware-gentoo-x86.refleak/build/Lib/test/regrtest.py", line 1361, in run_the_test
    indirect_test()
  File "/buildbot/buildarea/2.7.ware-gentoo-x86.refleak/build/Lib/test/regrtest.py", line 1375, in dash_R
    run_the_test()
  File "/buildbot/buildarea/2.7.ware-gentoo-x86.refleak/build/Lib/test/regrtest.py", line 1239, in runtest_inner
    huntrleaks)
  File "/buildbot/buildarea/2.7.ware-gentoo-x86.refleak/build/Lib/test/regrtest.py", line 1044, in runtest
    return runtest_inner(test, verbose, quiet, huntrleaks, pgo, testdir)
  File "/buildbot/buildarea/2.7.ware-gentoo-x86.refleak/build/Lib/test/regrtest.py", line 513, in main
    result = runtest(*args, **kwargs)
  File "/buildbot/buildarea/2.7.ware-gentoo-x86.refleak/build/Lib/test/regrtest.py", line 2025, in main_in_temp_cwd
    main()
  File "/buildbot/buildarea/2.7.ware-gentoo-x86.refleak/build/Lib/test/regrtest.py", line 2038, in <module>
    main_in_temp_cwd()
  File "/buildbot/buildarea/2.7.ware-gentoo-x86.refleak/build/Lib/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/buildbot/buildarea/2.7.ware-gentoo-x86.refleak/build/Lib/runpy.py", line 174, in _run_module_as_main
    "__main__", fname, loader, pkg_name)

(gdb) where
#0  0x0050b6fd in PyEval_EvalFrameEx (
    f=Frame 0xb6bd1634, for file /buildbot/buildarea/2.7.ware-gentoo-x86.refleak/build/Lib/_pyio.py, line 1126, in _flush_unlocked (self=<BufferedRandom(_raw=<PyMockRawIO(_write_stack=['\x00', '\x01', '\x02', '\x03', '\x04', '\x05', '\x06', '\x07', '\x08', '\t', '\n', '\x0b', '\x0c', '\r', '\x0e', '\x0f\x10', '\x11\x12', '\x13\x14', '\x15\x16', '\x17\x18', '\x19\x1a', '\x1b\x1c', '\x1d\x1e', '\x1f ', '!"', '#$', '%&', "'(", ')*', '+,', '-./', '012', '345', '678', '9:;', '<=>', '?@A', 'BCD', 'EFG', 'HIJ', 'KLM', 'NOP', 'QRS', 'TUV', 'WXY', 'Z[\\]', '^_`a', 'bcde', 'fghi', 'jklm', 'nopq', 'rstu', 'vwxy', 'z{|}', '~\x7f\x80\x81', '\x82\x83\x84\x85', '\x86\x87\x88\x89', '\x8a\x8b\x8c\x8d', '\x8e\x8f\x90\x91', '\x92\x93\x94\x95', '\x96\x97\x98\x99\x9a', '\x9b\x9c\x9d\x9e\x9f', '\xa0\xa1\xa2\xa3\xa4', '\xa5\xa6\xa7\xa8\xa9', '\xaa\xab\xac\xad\xae', '\xaf\xb0\xb1\xb2\xb3', '\xb4\xb5\xb6\xb7\xb8', '\xb9\xba\xbb\xbc\xbd', '\xbe\xbf\xc0\xc1\xc2', '\xc3\xc4\xc5\xc6\xc7', '\xc8\xc9\xca\xcb\xcc', '\xcd\xce\xcf\xd0\xd1', '\xd2\x...(truncated), 
    throwflag=0) at Python/ceval.c:1094
#1  0x0051a9e8 in PyEval_EvalCodeEx (co=0xb6ccdda8, 
    globals={'BufferedReader': <ABCMeta(_abc_cache=<WeakSet(data=set([<weakref at remote 0xb6c21434>, <weakref at remote 0xb6bcc8b4>]), _pending_removals=[], _remove=<function at remote 0xb6687c64>, _iterating=set([])) at remote 0xb668a4cc>, peek=<function at remote 0xb6687ad4>, _abc_negative_cache_version=26, tell=<function at remote 0xb6687bc4>, read=<function at remote 0xb6687a34>, __doc__=u'BufferedReader(raw[, buffer_size])\n\n    A buffer for a readable, sequential BaseRawIO object.\n\n    The constructor creates a BufferedReader for the given readable raw\n    stream and buffer_size. If buffer_size is omitted, DEFAULT_BUFFER_SIZE\n    is used.\n    ', __init__=<function at remote 0xb6687994>, _reset_read_buf=<function at remote 0xb66879e4>, _abc_negative_cache=<WeakSet(data=set([]), _pending_removals=[], _remove=<function at remote 0xb66b7da4>, _iterating=set([])) at remote 0xb668a0a4>, __module__='_pyio', __abstractmethods__=frozenset([]), _peek_unlocked=<function at remote 0xb6687b24>, _read_unlocked=<function a...(truncated), 
    locals=0x0, args=0xb6c1cb50, argcount=1, kws=0xb6c1cb54, kwcount=0, defs=0x0, defcount=0, closure=0x0) at Python/ceval.c:3604
#2  0x0051d618 in fast_function (func=<function at remote 0xb6687e94>, pp_stack=0xbfbcc660, n=1, na=1, nk=0) at Python/ceval.c:4467
#3  0x0051d283 in call_function (pp_stack=0xbfbcc660, oparg=0) at Python/ceval.c:4392
(...)
msg318785 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-06-05 22:17
Hum, I succeeded to reproduce a hang in test_io using the command:

***
buildbot@stormageddon ~/python27 $ ./python -m test -v -m 'test_interrupted*' -F test_io
(...)
0:04:38 load avg: 3.36 [ 44] test_io
test_interrupted_write_buffered (test.test_io.CSignalsTest) ... ok
test_interrupted_write_text (test.test_io.CSignalsTest) ... ok
test_interrupted_write_unbuffered (test.test_io.CSignalsTest) ... ok
test_interrupted_write_buffered (test.test_io.PySignalsTest) ... ok
test_interrupted_write_text (test.test_io.PySignalsTest) ... ok
test_interrupted_write_unbuffered (test.test_io.PySignalsTest) ... ok

----------------------------------------------------------------------
Ran 6 tests in 6.167s

OK
0:04:45 load avg: 3.65 [ 45] test_io
test_interrupted_write_buffered (test.test_io.CSignalsTest) ... ok
test_interrupted_write_text (test.test_io.CSignalsTest) ... ok
test_interrupted_write_unbuffered (test.test_io.CSignalsTest) ... ok
test_interrupted_write_buffered (test.test_io.PySignalsTest) ... ok
test_interrupted_write_text (test.test_io.PySignalsTest) ... 
***

Traces in gdb of the stuck process:

***
(gdb) py-bt
Traceback (most recent call first):
  Waiting for the GIL
  <built-in method acquire of thread.lock object at remote 0xb6c09258>
  File "/buildbot/python27/Lib/threading.py", line 340, in wait
    waiter.acquire()
  File "/buildbot/python27/Lib/threading.py", line 940, in join
    self.__block.wait()
  File "/buildbot/python27/Lib/test/test_io.py", line 3161, in check_interrupted_write
    t.join()
  File "/buildbot/python27/Lib/test/test_io.py", line 3186, in test_interrupted_write_text
    self.check_interrupted_write("xy", b"xy", mode="w", encoding="ascii")
  File "/buildbot/python27/Lib/unittest/case.py", line 329, in run
    testMethod()
  File "/buildbot/python27/Lib/unittest/case.py", line 393, in __call__
    return self.run(*args, **kwds)
  File "/buildbot/python27/Lib/unittest/suite.py", line 108, in run
    test(result)
  File "/buildbot/python27/Lib/unittest/suite.py", line 70, in __call__
    return self.run(*args, **kwds)
  File "/buildbot/python27/Lib/unittest/suite.py", line 108, in run
    test(result)
  File "/buildbot/python27/Lib/unittest/suite.py", line 70, in __call__
    return self.run(*args, **kwds)
  File "/buildbot/python27/Lib/unittest/runner.py", line 151, in run
    test(result)
  File "/buildbot/python27/Lib/test/support/__init__.py", line 1535, in _run_suite
    result = runner.run(suite)
  File "/buildbot/python27/Lib/test/support/__init__.py", line 1626, in run_unittest
    _run_suite(suite)
  File "/buildbot/python27/Lib/test/test_io.py", line 3367, in test_main
    support.run_unittest(*tests)
  File "/buildbot/python27/Lib/test/regrtest.py", line 1236, in runtest_inner
    indirect_test()
  File "/buildbot/python27/Lib/test/regrtest.py", line 1044, in runtest
    return runtest_inner(test, verbose, quiet, huntrleaks, pgo, testdir)
  File "/buildbot/python27/Lib/test/regrtest.py", line 837, in local_runtest
    testdir=testdir)
  File "/buildbot/python27/Lib/test/regrtest.py", line 851, in main
    result = local_runtest()
  File "/buildbot/python27/Lib/test/regrtest.py", line 2025, in main_in_temp_cwd
    main()
  File "/buildbot/python27/Lib/test/__main__.py", line 3, in <module>
    regrtest.main_in_temp_cwd()
  File "/buildbot/python27/Lib/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/buildbot/python27/Lib/runpy.py", line 174, in _run_module_as_main
    "__main__", fname, loader, pkg_name)

(gdb) info threads
  Id   Target Id         Frame 
* 1    Thread 0xb7518700 (LWP 14442) "python" 0xb7772b49 in __kernel_vsyscall ()
  2    Thread 0xb6b36b40 (LWP 18865) "python" 0xb7772b49 in __kernel_vsyscall ()

(gdb) t 2
[Switching to thread 2 (Thread 0xb6b36b40 (LWP 18865))]
#0  0xb7772b49 in __kernel_vsyscall ()

(gdb) py-bt
Traceback (most recent call first):
  <built-in function read>
  File "/buildbot/python27/Lib/test/test_io.py", line 3141, in _read
    s = os.read(r, 1)
  File "/buildbot/python27/Lib/threading.py", line 754, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/buildbot/python27/Lib/threading.py", line 801, in __bootstrap_inner
    self.run()
  File "/buildbot/python27/Lib/threading.py", line 774, in __bootstrap
    self.__bootstrap_inner()
***

And now I recall how much I hate this test. It made me crazy in Python 2 for a long time.

This specific test was my motiviation to add signal.pthread_sigmask() to Python 3.3.

Note: I also implemented the PEP 475 in Python 3.5 to make Python more reliable when getting signals.
msg318786 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-06-05 22:27
Sadly, issues with test_io.test_interrupted*() are old. See for example the bpo-23680.

Two years ago,  Martin Panter saw test_io.test_interrupted_write_text() hangs on Python 3.6 and on AMD64 FreeBSD 9.x 3.5:
https://bugs.python.org/issue22331#msg266688
https://bugs.python.org/issue22331#msg266725

Martin proposed a fix using signal.pthread_kill() rather than scheduling a SIGALRM signal in 1 second, but the fix was not merged and I closed the issue since I didn't see the issue recently.
msg318788 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-06-05 22:59
> buildbot@stormageddon ~/python27 $ ./python -m test -v -m 'test_interrupted*' -F test_io

I'm unable to reproduce this issue on my Fedora 28 laptop (Linux kernel 4.16.11-300.fc28.x86_64). In the previous message, I reproduced the issue directly on Zach's Gentoo buildbot builder.
msg318859 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-06-06 21:33
New changeset 02b4d67a38b0c17d87749eeefc51e91baef10857 by Victor Stinner in branch '2.7':
bpo-31731: test_io hangs with --huntrleaks: exclude it (GH-7459)
https://github.com/python/cpython/commit/02b4d67a38b0c17d87749eeefc51e91baef10857
msg318860 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-06-06 21:35
test_io hangs often on Gentoo Refleaks 2.7 buildbot: Zachary Ware, who owns the builder, has to interrupt the test regularly. So I disabled test_io in regrtest when --huntrleaks is used, since at least one bug has been identified in test_io in test_interrupted*().
msg319158 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-06-09 16:54
AMD64 FreeBSD 10.x Shared 3.7 issue:

http://buildbot.python.org/all/#/builders/124/builds/380

0:15:14 load avg: 0.18 [415/415/1] test_io crashed (Exit code 1)
Timeout (0:15:00)!
Thread 0x0000000806943c00 (most recent call first):
  File "/usr/home/buildbot/python/3.7.koobs-freebsd10/build/Lib/test/test_io.py", line 4090 in _read
  File "/usr/home/buildbot/python/3.7.koobs-freebsd10/build/Lib/threading.py", line 865 in run
  File "/usr/home/buildbot/python/3.7.koobs-freebsd10/build/Lib/threading.py", line 917 in _bootstrap_inner
  File "/usr/home/buildbot/python/3.7.koobs-freebsd10/build/Lib/threading.py", line 885 in _bootstrap

Thread 0x0000000802006400 (most recent call first):
  File "/usr/home/buildbot/python/3.7.koobs-freebsd10/build/Lib/threading.py", line 1048 in _wait_for_tstate_lock
  File "/usr/home/buildbot/python/3.7.koobs-freebsd10/build/Lib/threading.py", line 1032 in join
  File "/usr/home/buildbot/python/3.7.koobs-freebsd10/build/Lib/test/test_io.py", line 4111 in check_interrupted_write
  File "/usr/home/buildbot/python/3.7.koobs-freebsd10/build/Lib/test/test_io.py", line 4135 in test_interrupted_write_text
msg332093 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-12-18 22:36
> AMD64 FreeBSD 10.x Shared 3.7 issue:

I wrote PR 11225 which may fix it.
msg332096 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-12-18 22:52
New changeset 05c9d31eb62cc45dc3c55a5cdb7cbc713d0421db by Victor Stinner in branch 'master':
bpo-31731: Fix test_io.check_interrupted_write() (GH-11225)
https://github.com/python/cpython/commit/05c9d31eb62cc45dc3c55a5cdb7cbc713d0421db
msg332098 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-12-18 22:57
The test can be fixed in Python 2.7 by exposing pthread_sigmask() somehow, or at least pthread_sigmask(SIG_BLOCK, [SIGALARM]), but honestly, I don't think that it's worth it. The test only very rarely hangs, and the bug has already been fixed in Python 3.

I close the issue. I fixed a race condition in test_io of Python 3. (The bot will shortly backport the fix to 3.7).
msg332099 - (view) Author: miss-islington (miss-islington) Date: 2018-12-18 23:10
New changeset 729fc5d2acab9eb672c4804092236af5362a4c66 by Miss Islington (bot) in branch '3.7':
bpo-31731: Fix test_io.check_interrupted_write() (GH-11225)
https://github.com/python/cpython/commit/729fc5d2acab9eb672c4804092236af5362a4c66
History
Date User Action Args
2018-12-18 23:10:49miss-islingtonsetnosy: + miss-islington
messages: + msg332099
2018-12-18 22:57:04vstinnersetstatus: open -> closed
resolution: fixed
messages: + msg332098

stage: patch review -> resolved
2018-12-18 22:52:53miss-islingtonsetpull_requests: + pull_request10461
2018-12-18 22:52:43vstinnersetmessages: + msg332096
2018-12-18 22:36:41vstinnersetmessages: + msg332093
2018-12-18 22:22:38vstinnersetpull_requests: + pull_request10460
2018-06-20 14:05:03vstinnersettitle: [2.7] test_io hangs on x86 Gentoo Refleaks 2.7 -> [2.7] test_io: race condition in test_interrupted_write_text() (test_io hangs on x86 Gentoo Refleaks 2.7)
2018-06-09 16:54:24vstinnersetmessages: + msg319158
2018-06-06 21:35:32vstinnersetmessages: + msg318860
2018-06-06 21:33:38vstinnersetmessages: + msg318859
2018-06-06 21:18:11vstinnersetkeywords: + patch
stage: patch review
pull_requests: + pull_request7081
2018-06-05 22:59:31vstinnersetmessages: + msg318788
2018-06-05 22:27:43vstinnersetmessages: + msg318786
2018-06-05 22:17:56vstinnersetmessages: + msg318785
2018-06-05 21:43:03vstinnersetmessages: + msg318783
2018-05-29 22:10:06zach.waresetstatus: closed -> open
resolution: out of date -> (no value)
messages: + msg318121

stage: resolved -> (no value)
2018-05-29 21:58:02vstinnersetstatus: open -> closed
resolution: out of date
messages: + msg318111

stage: resolved
2017-10-24 09:13:28vstinnersetmessages: + msg304879
2017-10-09 07:14:58vstinnersettitle: test_io hangs on -> [2.7] test_io hangs on x86 Gentoo Refleaks 2.7
nosy: + zach.ware

messages: + msg303925

versions: + Python 2.7
components: + Tests
2017-10-09 07:13:25vstinnercreate