Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[2.7] test_io: race condition in test_interrupted_write_text() (test_io hangs on x86 Gentoo Refleaks 2.7) #75912

Closed
vstinner opened this issue Oct 9, 2017 · 16 comments
Labels
tests Tests in the Lib/test dir

Comments

@vstinner
Copy link
Member

vstinner commented Oct 9, 2017

BPO 31731
Nosy @vstinner, @zware, @miss-islington
PRs
  • [2.7] bpo-31731: test_io hangs with --huntrleaks: exclude it #7459
  • bpo-31731: Fix test_io.check_interrupted_write() #11225
  • [3.7] bpo-31731: Fix test_io.check_interrupted_write() (GH-11225) #11226
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2018-12-18.22:57:04.637>
    created_at = <Date 2017-10-09.07:13:25.905>
    labels = ['tests']
    title = '[2.7] test_io: race condition in test_interrupted_write_text() (test_io hangs on x86 Gentoo Refleaks 2.7)'
    updated_at = <Date 2018-12-18.23:10:49.894>
    user = 'https://github.com/vstinner'

    bugs.python.org fields:

    activity = <Date 2018-12-18.23:10:49.894>
    actor = 'miss-islington'
    assignee = 'none'
    closed = True
    closed_date = <Date 2018-12-18.22:57:04.637>
    closer = 'vstinner'
    components = ['Tests']
    creation = <Date 2017-10-09.07:13:25.905>
    creator = 'vstinner'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 31731
    keywords = ['patch']
    message_count = 16.0
    messages = ['303924', '303925', '304879', '318111', '318121', '318783', '318785', '318786', '318788', '318859', '318860', '319158', '332093', '332096', '332098', '332099']
    nosy_count = 3.0
    nosy_names = ['vstinner', 'zach.ware', 'miss-islington']
    pr_nums = ['7459', '11225', '11226']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = None
    url = 'https://bugs.python.org/issue31731'
    versions = ['Python 2.7']

    @vstinner
    Copy link
    Member Author

    vstinner commented Oct 9, 2017

    test_io is running since longer than 5 hours on x86 Gentoo Refleaks 2.7:

    http://buildbot.python.org/all/#/builders/78/builds/1/steps/4/logs/stdio

    2:12:11 load avg: 3.26 [402/403] test_contains passed -- running: test_io (1739 sec)
    beginning 6 repetitions
    123456
    ......
    running: test_io (1769 sec)
    running: test_io (1799 sec)
    (...)
    running: test_io (19529 sec)

    http://buildbot.python.org/all/#/builders/78/builds/1/steps/4/logs/stdio

    @vstinner
    Copy link
    Member Author

    vstinner commented Oct 9, 2017

    I interrupted the build.

    @vstinner vstinner added the tests Tests in the Lib/test dir label Oct 9, 2017
    @vstinner vstinner changed the title test_io hangs on [2.7] test_io hangs on x86 Gentoo Refleaks 2.7 Oct 9, 2017
    @vstinner
    Copy link
    Member Author

    test_io was blocked a second time:

    http://buildbot.python.org/all/#/builders/78/builds/3

    running: test_io (68517 sec)
    running: test_io (68547 sec)
    running: test_io (68577 sec)
    command interrupted, attempting to kill
    process killed by signal 9
    program finished with exit code -1
    elapsedTime=69935.628661

    @vstinner
    Copy link
    Member Author

    I didn't see the failure recently. I close the issue.

    @zware
    Copy link
    Member

    zware commented May 29, 2018

    test_io is currently hung on that builder:

    buildbot 13541 0.0 0.1 4920 2508 pts/1 Ss+ May28 0:00 \_ make buildbottest TESTOPTS=-j2 -R 3:3 -u-cpu TESTPYTHONOPTS= TESTTIMEOUT=11700
    buildbot 13556 0.2 0.4 33620 9952 pts/1 Sl+ May28 2:45 | \_ ./python -Wd -3 -E -tt -R ./Lib/test/regrtest.py -uall --slowest -rwW -j2 -R 3:3 -u-cpu
    buildbot 28750 0.0 1.1 65412 24428 pts/1 S+ May28 0:54 | \_ /buildbot/buildarea/2.7.ware-gentoo-x86.refleak/build/python -E -3 -R -Wd -m test.regrtest --slaveargs [["test_io", 0, false], {"match_tests": null, "pgo": false, "huntrleaks": [3, 3, "reflog.txt"], "failfast": false, "use_resources": ["audio", "curses", "largefile", "network", "bsddb", "decimal", "subprocess", "urlfetch", "gui", "xpickle"]}]

    @zware zware reopened this May 29, 2018
    @vstinner
    Copy link
    Member Author

    vstinner commented Jun 5, 2018

    I succeeded to attach gdb to a regrtest slave process stuck in test_io... but somehow attaching the process into gdb unblocked the process... and the test completed.

    Sadly, the first time that I attached the process, I forgot to allow a directory to load python-gdb.py. So I had to detach and attach again the process to get a working "py-bt" command. Maybe this action unblocked Python. It's hard to say.

    (gdb) py-bt
    Traceback (most recent call first):
      File "/buildbot/buildarea/2.7.ware-gentoo-x86.refleak/build/Lib/_pyio.py", line 1126, in _flush_unlocked
        def _flush_unlocked(self):
      File "/buildbot/buildarea/2.7.ware-gentoo-x86.refleak/build/Lib/_pyio.py", line 1104, in write
        self._flush_unlocked()
      File "/buildbot/buildarea/2.7.ware-gentoo-x86.refleak/build/Lib/_pyio.py", line 1302, in write
        return BufferedWriter.write(self, b)
      File "/buildbot/buildarea/2.7.ware-gentoo-x86.refleak/build/Lib/test/test_io.py", line 1186, in check_writes
        self.assertEqual(bufio.write(contents[n:n+size]), size)
      File "/buildbot/buildarea/2.7.ware-gentoo-x86.refleak/build/Lib/test/test_io.py", line 1723, in test_writes_and_peek
        self.check_writes(_peek)
      File "/buildbot/buildarea/2.7.ware-gentoo-x86.refleak/build/Lib/unittest/case.py", line 329, in run
        testMethod()
      File "/buildbot/buildarea/2.7.ware-gentoo-x86.refleak/build/Lib/unittest/case.py", line 393, in __call__
        return self.run(*args, **kwds)
      File "/buildbot/buildarea/2.7.ware-gentoo-x86.refleak/build/Lib/unittest/suite.py", line 108, in run
        test(result)
      File "/buildbot/buildarea/2.7.ware-gentoo-x86.refleak/build/Lib/unittest/suite.py", line 70, in __call__
        return self.run(*args, **kwds)
      File "/buildbot/buildarea/2.7.ware-gentoo-x86.refleak/build/Lib/unittest/suite.py", line 108, in run
        test(result)
      File "/buildbot/buildarea/2.7.ware-gentoo-x86.refleak/build/Lib/unittest/suite.py", line 70, in __call__
        return self.run(*args, **kwds)
      File "/buildbot/buildarea/2.7.ware-gentoo-x86.refleak/build/Lib/test/support/__init__.py", line 1461, in run
        test(result)
      File "/buildbot/buildarea/2.7.ware-gentoo-x86.refleak/build/Lib/test/support/__init__.py", line 1535, in _run_suite
        result = runner.run(suite)
      File "/buildbot/buildarea/2.7.ware-gentoo-x86.refleak/build/Lib/test/support/__init__.py", line 1626, in run_unittest
        _run_suite(suite)
      File "/buildbot/buildarea/2.7.ware-gentoo-x86.refleak/build/Lib/test/test_io.py", line 3367, in test_main
        support.run_unittest(*tests)
      File "/buildbot/buildarea/2.7.ware-gentoo-x86.refleak/build/Lib/test/regrtest.py", line 1361, in run_the_test
        indirect_test()
      File "/buildbot/buildarea/2.7.ware-gentoo-x86.refleak/build/Lib/test/regrtest.py", line 1375, in dash_R
        run_the_test()
      File "/buildbot/buildarea/2.7.ware-gentoo-x86.refleak/build/Lib/test/regrtest.py", line 1239, in runtest_inner
        huntrleaks)
      File "/buildbot/buildarea/2.7.ware-gentoo-x86.refleak/build/Lib/test/regrtest.py", line 1044, in runtest
        return runtest_inner(test, verbose, quiet, huntrleaks, pgo, testdir)
      File "/buildbot/buildarea/2.7.ware-gentoo-x86.refleak/build/Lib/test/regrtest.py", line 513, in main
        result = runtest(*args, **kwargs)
      File "/buildbot/buildarea/2.7.ware-gentoo-x86.refleak/build/Lib/test/regrtest.py", line 2025, in main_in_temp_cwd
        main()
      File "/buildbot/buildarea/2.7.ware-gentoo-x86.refleak/build/Lib/test/regrtest.py", line 2038, in <module>
        main_in_temp_cwd()
      File "/buildbot/buildarea/2.7.ware-gentoo-x86.refleak/build/Lib/runpy.py", line 72, in _run_code
        exec code in run_globals
      File "/buildbot/buildarea/2.7.ware-gentoo-x86.refleak/build/Lib/runpy.py", line 174, in _run_module_as_main
        "__main__", fname, loader, pkg_name)

    (gdb) where
    #0 0x0050b6fd in PyEval_EvalFrameEx (
    f=Frame 0xb6bd1634, for file /buildbot/buildarea/2.7.ware-gentoo-x86.refleak/build/Lib/pyio.py, line 1126, in _flush_unlocked (self=<BufferedRandom(raw=<PyMockRawIO(write_stack=['\x00', '\x01', '\x02', '\x03', '\x04', '\x05', '\x06', '\x07', '\x08', '\t', '\n', '\x0b', '\x0c', '\r', '\x0e', '\x0f\x10', '\x11\x12', '\x13\x14', '\x15\x16', '\x17\x18', '\x19\x1a', '\x1b\x1c', '\x1d\x1e', '\x1f ', '!"', '#$', '%&', "'(", ')*', '+,', '-./', '012', '345', '678', '9:;', '<=>', '?@A', 'BCD', 'EFG', 'HIJ', 'KLM', 'NOP', 'QRS', 'TUV', 'WXY', 'Z[\\]', '^`a', 'bcde', 'fghi', 'jklm', 'nopq', 'rstu', 'vwxy', 'z{|}', '~\x7f\x80\x81', '\x82\x83\x84\x85', '\x86\x87\x88\x89', '\x8a\x8b\x8c\x8d', '\x8e\x8f\x90\x91', '\x92\x93\x94\x95', '\x96\x97\x98\x99\x9a', '\x9b\x9c\x9d\x9e\x9f', '\xa0\xa1\xa2\xa3\xa4', '\xa5\xa6\xa7\xa8\xa9', '\xaa\xab\xac\xad\xae', '\xaf\xb0\xb1\xb2\xb3', '\xb4\xb5\xb6\xb7\xb8', '\xb9\xba\xbb\xbc\xbd', '\xbe\xbf\xc0\xc1\xc2', '\xc3\xc4\xc5\xc6\xc7', '\xc8\xc9\xca\xcb\xcc', '\xcd\xce\xcf\xd0\xd1', '\xd2\x...(truncated),
    throwflag=0) at Python/ceval.c:1094
    #1 0x0051a9e8 in PyEval_EvalCodeEx (co=0xb6ccdda8,
    globals={'BufferedReader': <ABCMeta(abc_cache=<WeakSet(data=set([<weakref at remote 0xb6c21434>, <weakref at remote 0xb6bcc8b4>]), _pending_removals=[], _remove=<function at remote 0xb6687c64>, _iterating=set([])) at remote 0xb668a4cc>, peek=<function at remote 0xb6687ad4>, _abc_negative_cache_version=26, tell=<function at remote 0xb6687bc4>, read=<function at remote 0xb6687a34>, __doc
    =u'BufferedReader(raw[, buffer_size])\n\n A buffer for a readable, sequential BaseRawIO object.\n\n The constructor creates a BufferedReader for the given readable raw\n stream and buffer_size. If buffer_size is omitted, DEFAULT_BUFFER_SIZE\n is used.\n ', __init
    _=<function at remote 0xb6687994>, _reset_read_buf=<function at remote 0xb66879e4>, _abc_negative_cache=<WeakSet(data=set([]), _pending_removals=[], _remove=<function at remote 0xb66b7da4>, _iterating=set([])) at remote 0xb668a0a4>, __module__='_pyio', __abstractmethods__=frozenset([]), _peek_unlocked=<function at remote 0xb6687b24>, _read_unlocked=<function a...(truncated),
    locals=0x0, args=0xb6c1cb50, argcount=1, kws=0xb6c1cb54, kwcount=0, defs=0x0, defcount=0, closure=0x0) at Python/ceval.c:3604
    #2 0x0051d618 in fast_function (func=<function at remote 0xb6687e94>, pp_stack=0xbfbcc660, n=1, na=1, nk=0) at Python/ceval.c:4467
    #3 0x0051d283 in call_function (pp_stack=0xbfbcc660, oparg=0) at Python/ceval.c:4392
    (...)

    @vstinner
    Copy link
    Member Author

    vstinner commented Jun 5, 2018

    Hum, I succeeded to reproduce a hang in test_io using the command:


    buildbot@stormageddon ~/python27 $ ./python -m test -v -m 'test_interrupted*' -F test_io
    (...)
    0:04:38 load avg: 3.36 [ 44] test_io
    test_interrupted_write_buffered (test.test_io.CSignalsTest) ... ok
    test_interrupted_write_text (test.test_io.CSignalsTest) ... ok
    test_interrupted_write_unbuffered (test.test_io.CSignalsTest) ... ok
    test_interrupted_write_buffered (test.test_io.PySignalsTest) ... ok
    test_interrupted_write_text (test.test_io.PySignalsTest) ... ok
    test_interrupted_write_unbuffered (test.test_io.PySignalsTest) ... ok

    ----------------------------------------------------------------------
    Ran 6 tests in 6.167s

    OK
    0:04:45 load avg: 3.65 [ 45] test_io
    test_interrupted_write_buffered (test.test_io.CSignalsTest) ... ok
    test_interrupted_write_text (test.test_io.CSignalsTest) ... ok
    test_interrupted_write_unbuffered (test.test_io.CSignalsTest) ... ok
    test_interrupted_write_buffered (test.test_io.PySignalsTest) ... ok
    test_interrupted_write_text (test.test_io.PySignalsTest) ...
    ***

    Traces in gdb of the stuck process:

    ***
    (gdb) py-bt
    Traceback (most recent call first):
      Waiting for the GIL
      <built-in method acquire of thread.lock object at remote 0xb6c09258>
      File "/buildbot/python27/Lib/threading.py", line 340, in wait
        waiter.acquire()
      File "/buildbot/python27/Lib/threading.py", line 940, in join
        self.__block.wait()
      File "/buildbot/python27/Lib/test/test_io.py", line 3161, in check_interrupted_write
        t.join()
      File "/buildbot/python27/Lib/test/test_io.py", line 3186, in test_interrupted_write_text
        self.check_interrupted_write("xy", b"xy", mode="w", encoding="ascii")
      File "/buildbot/python27/Lib/unittest/case.py", line 329, in run
        testMethod()
      File "/buildbot/python27/Lib/unittest/case.py", line 393, in __call__
        return self.run(*args, **kwds)
      File "/buildbot/python27/Lib/unittest/suite.py", line 108, in run
        test(result)
      File "/buildbot/python27/Lib/unittest/suite.py", line 70, in __call__
        return self.run(*args, **kwds)
      File "/buildbot/python27/Lib/unittest/suite.py", line 108, in run
        test(result)
      File "/buildbot/python27/Lib/unittest/suite.py", line 70, in __call__
        return self.run(*args, **kwds)
      File "/buildbot/python27/Lib/unittest/runner.py", line 151, in run
        test(result)
      File "/buildbot/python27/Lib/test/support/__init__.py", line 1535, in _run_suite
        result = runner.run(suite)
      File "/buildbot/python27/Lib/test/support/__init__.py", line 1626, in run_unittest
        _run_suite(suite)
      File "/buildbot/python27/Lib/test/test_io.py", line 3367, in test_main
        support.run_unittest(*tests)
      File "/buildbot/python27/Lib/test/regrtest.py", line 1236, in runtest_inner
        indirect_test()
      File "/buildbot/python27/Lib/test/regrtest.py", line 1044, in runtest
        return runtest_inner(test, verbose, quiet, huntrleaks, pgo, testdir)
      File "/buildbot/python27/Lib/test/regrtest.py", line 837, in local_runtest
        testdir=testdir)
      File "/buildbot/python27/Lib/test/regrtest.py", line 851, in main
        result = local_runtest()
      File "/buildbot/python27/Lib/test/regrtest.py", line 2025, in main_in_temp_cwd
        main()
      File "/buildbot/python27/Lib/test/__main__.py", line 3, in <module>
        regrtest.main_in_temp_cwd()
      File "/buildbot/python27/Lib/runpy.py", line 72, in _run_code
        exec code in run_globals
      File "/buildbot/python27/Lib/runpy.py", line 174, in _run_module_as_main
        "__main__", fname, loader, pkg_name)

    (gdb) info threads
    Id Target Id Frame

    • 1 Thread 0xb7518700 (LWP 14442) "python" 0xb7772b49 in __kernel_vsyscall ()
      2 Thread 0xb6b36b40 (LWP 18865) "python" 0xb7772b49 in __kernel_vsyscall ()

    (gdb) t 2
    [Switching to thread 2 (Thread 0xb6b36b40 (LWP 18865))]
    #0 0xb7772b49 in __kernel_vsyscall ()

    (gdb) py-bt
    Traceback (most recent call first):
      <built-in function read>
      File "/buildbot/python27/Lib/test/test_io.py", line 3141, in _read
        s = os.read(r, 1)
      File "/buildbot/python27/Lib/threading.py", line 754, in run
        self.__target(*self.__args, **self.__kwargs)
      File "/buildbot/python27/Lib/threading.py", line 801, in __bootstrap_inner
        self.run()
      File "/buildbot/python27/Lib/threading.py", line 774, in __bootstrap
        self.__bootstrap_inner()
    ***

    And now I recall how much I hate this test. It made me crazy in Python 2 for a long time.

    This specific test was my motiviation to add signal.pthread_sigmask() to Python 3.3.

    Note: I also implemented the PEP-475 in Python 3.5 to make Python more reliable when getting signals.

    @vstinner
    Copy link
    Member Author

    vstinner commented Jun 5, 2018

    Sadly, issues with test_io.test_interrupted*() are old. See for example the bpo-23680.

    Two years ago, Martin Panter saw test_io.test_interrupted_write_text() hangs on Python 3.6 and on AMD64 FreeBSD 9.x 3.5:
    https://bugs.python.org/issue22331#msg266688
    https://bugs.python.org/issue22331#msg266725

    Martin proposed a fix using signal.pthread_kill() rather than scheduling a SIGALRM signal in 1 second, but the fix was not merged and I closed the issue since I didn't see the issue recently.

    @vstinner
    Copy link
    Member Author

    vstinner commented Jun 5, 2018

    buildbot@stormageddon ~/python27 $ ./python -m test -v -m 'test_interrupted*' -F test_io

    I'm unable to reproduce this issue on my Fedora 28 laptop (Linux kernel 4.16.11-300.fc28.x86_64). In the previous message, I reproduced the issue directly on Zach's Gentoo buildbot builder.

    @vstinner
    Copy link
    Member Author

    vstinner commented Jun 6, 2018

    New changeset 02b4d67 by Victor Stinner in branch '2.7':
    bpo-31731: test_io hangs with --huntrleaks: exclude it (GH-7459)
    02b4d67

    @vstinner
    Copy link
    Member Author

    vstinner commented Jun 6, 2018

    test_io hangs often on Gentoo Refleaks 2.7 buildbot: Zachary Ware, who owns the builder, has to interrupt the test regularly. So I disabled test_io in regrtest when --huntrleaks is used, since at least one bug has been identified in test_io in test_interrupted*().

    @vstinner
    Copy link
    Member Author

    vstinner commented Jun 9, 2018

    AMD64 FreeBSD 10.x Shared 3.7 issue:

    http://buildbot.python.org/all/#/builders/124/builds/380

    0:15:14 load avg: 0.18 [415/415/1] test_io crashed (Exit code 1)
    Timeout (0:15:00)!
    Thread 0x0000000806943c00 (most recent call first):
    File "/usr/home/buildbot/python/3.7.koobs-freebsd10/build/Lib/test/test_io.py", line 4090 in _read
    File "/usr/home/buildbot/python/3.7.koobs-freebsd10/build/Lib/threading.py", line 865 in run
    File "/usr/home/buildbot/python/3.7.koobs-freebsd10/build/Lib/threading.py", line 917 in _bootstrap_inner
    File "/usr/home/buildbot/python/3.7.koobs-freebsd10/build/Lib/threading.py", line 885 in _bootstrap

    Thread 0x0000000802006400 (most recent call first):
    File "/usr/home/buildbot/python/3.7.koobs-freebsd10/build/Lib/threading.py", line 1048 in _wait_for_tstate_lock
    File "/usr/home/buildbot/python/3.7.koobs-freebsd10/build/Lib/threading.py", line 1032 in join
    File "/usr/home/buildbot/python/3.7.koobs-freebsd10/build/Lib/test/test_io.py", line 4111 in check_interrupted_write
    File "/usr/home/buildbot/python/3.7.koobs-freebsd10/build/Lib/test/test_io.py", line 4135 in test_interrupted_write_text

    @vstinner vstinner changed the title [2.7] test_io hangs on x86 Gentoo Refleaks 2.7 [2.7] test_io: race condition in test_interrupted_write_text() (test_io hangs on x86 Gentoo Refleaks 2.7) Jun 20, 2018
    @vstinner
    Copy link
    Member Author

    AMD64 FreeBSD 10.x Shared 3.7 issue:

    I wrote PR 11225 which may fix it.

    @vstinner
    Copy link
    Member Author

    New changeset 05c9d31 by Victor Stinner in branch 'master':
    bpo-31731: Fix test_io.check_interrupted_write() (GH-11225)
    05c9d31

    @vstinner
    Copy link
    Member Author

    The test can be fixed in Python 2.7 by exposing pthread_sigmask() somehow, or at least pthread_sigmask(SIG_BLOCK, [SIGALARM]), but honestly, I don't think that it's worth it. The test only very rarely hangs, and the bug has already been fixed in Python 3.

    I close the issue. I fixed a race condition in test_io of Python 3. (The bot will shortly backport the fix to 3.7).

    @miss-islington
    Copy link
    Contributor

    New changeset 729fc5d by Miss Islington (bot) in branch '3.7':
    bpo-31731: Fix test_io.check_interrupted_write() (GH-11225)
    729fc5d

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    tests Tests in the Lib/test dir
    Projects
    None yet
    Development

    No branches or pull requests

    3 participants