This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: test_io.test_interrupted_write_text() hangs on the buildbot FreeBSD 7.2
Type: Stage: resolved
Components: Tests Versions: Python 3.6, Python 3.5, Python 2.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: martin.panter, python-dev, vstinner
Priority: normal Keywords: buildbot, patch

Created on 2014-09-03 21:31 by vstinner, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
int-write.patch martin.panter, 2016-06-03 13:56 review
Messages (10)
msg226326 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2014-09-03 21:31
http://buildbot.python.org/all/builders/x86%20FreeBSD%207.2%203.4/builds/332/steps/test/logs/stdio

Log:
---
[157/389/1] test_io
Timeout (1:00:00)!
Thread 0x2a4f0790 (most recent call first):
  File "/usr/home/db3l/buildarea/3.4.bolen-freebsd7/build/Lib/test/test_io.py", line 3257 in _read
  File "/usr/home/db3l/buildarea/3.4.bolen-freebsd7/build/Lib/threading.py", line 869 in run
  File "/usr/home/db3l/buildarea/3.4.bolen-freebsd7/build/Lib/threading.py", line 921 in _bootstrap_inner
  File "/usr/home/db3l/buildarea/3.4.bolen-freebsd7/build/Lib/threading.py", line 889 in _bootstrap

Thread 0x28401040 (most recent call first):
  File "/usr/home/db3l/buildarea/3.4.bolen-freebsd7/build/Lib/threading.py", line 1077 in _wait_for_tstate_lock
  File "/usr/home/db3l/buildarea/3.4.bolen-freebsd7/build/Lib/threading.py", line 1061 in join
  File "/usr/home/db3l/buildarea/3.4.bolen-freebsd7/build/Lib/test/test_io.py", line 3278 in check_interrupted_write
  File "/usr/home/db3l/buildarea/3.4.bolen-freebsd7/build/Lib/test/test_io.py", line 3302 in test_interrupted_write_text
  File "/usr/home/db3l/buildarea/3.4.bolen-freebsd7/build/Lib/unittest/case.py", line 577 in run
  File "/usr/home/db3l/buildarea/3.4.bolen-freebsd7/build/Lib/unittest/case.py", line 625 in __call__
  File "/usr/home/db3l/buildarea/3.4.bolen-freebsd7/build/Lib/unittest/suite.py", line 125 in run
  File "/usr/home/db3l/buildarea/3.4.bolen-freebsd7/build/Lib/unittest/suite.py", line 87 in __call__
  File "/usr/home/db3l/buildarea/3.4.bolen-freebsd7/build/Lib/unittest/suite.py", line 125 in run
  File "/usr/home/db3l/buildarea/3.4.bolen-freebsd7/build/Lib/unittest/suite.py", line 87 in __call__
  File "/usr/home/db3l/buildarea/3.4.bolen-freebsd7/build/Lib/unittest/suite.py", line 125 in run
  File "/usr/home/db3l/buildarea/3.4.bolen-freebsd7/build/Lib/unittest/suite.py", line 87 in __call__
  File "/usr/home/db3l/buildarea/3.4.bolen-freebsd7/build/Lib/unittest/runner.py", line 168 in run
  File "/usr/home/db3l/buildarea/3.4.bolen-freebsd7/build/Lib/test/support/__init__.py", line 1750 in _run_suite
  File "/usr/home/db3l/buildarea/3.4.bolen-freebsd7/build/Lib/test/support/__init__.py", line 1784 in run_unittest
  File "/usr/home/db3l/buildarea/3.4.bolen-freebsd7/build/Lib/test/regrtest.py", line 1279 in test_runner
  File "/usr/home/db3l/buildarea/3.4.bolen-freebsd7/build/Lib/test/regrtest.py", line 1280 in runtest_inner
  File "/usr/home/db3l/buildarea/3.4.bolen-freebsd7/build/Lib/test/regrtest.py", line 967 in runtest
  File "/usr/home/db3l/buildarea/3.4.bolen-freebsd7/build/Lib/test/regrtest.py", line 763 in main
  File "/usr/home/db3l/buildarea/3.4.bolen-freebsd7/build/Lib/test/regrtest.py", line 1564 in main_in_temp_cwd
  File "/usr/home/db3l/buildarea/3.4.bolen-freebsd7/build/Lib/test/__main__.py", line 3 in <module>
  File "/usr/home/db3l/buildarea/3.4.bolen-freebsd7/build/Lib/runpy.py", line 85 in _run_code
  File "/usr/home/db3l/buildarea/3.4.bolen-freebsd7/build/Lib/runpy.py", line 170 in _run_module_as_main
---

See also the issue #11859 (fixed).
msg226327 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2014-09-03 21:33
New changeset baa372eb731c by Victor Stinner in branch '3.4':
Issue #22331: Skip test_interrupted_write_text() on FreeBSD older than 8.0
http://hg.python.org/cpython/rev/baa372eb731c

New changeset 340d48347295 by Victor Stinner in branch 'default':
(Merge 3.4) Issue #22331: Skip test_interrupted_write_text() on FreeBSD older
http://hg.python.org/cpython/rev/340d48347295
msg228577 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2014-10-05 15:12
The test now pass on the buildbot FreeBSD 7.2.
msg266688 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2016-05-30 12:02
This recently hung AMD64 FreeBSD 9.x 3.5. The stack trace was different, and there is only one thread:

http://buildbot.python.org/all/builders/AMD64%20FreeBSD%209.x%203.5/builds/828/steps/test/logs/stdio
[398/398] test_io
Timeout (0:15:00)!
Thread 0x0000000801807400 (most recent call first):
  File "/usr/home/buildbot/python/3.5.koobs-freebsd9/build/Lib/unittest/case.py", line 176 in handle
  File "/usr/home/buildbot/python/3.5.koobs-freebsd9/build/Lib/unittest/case.py", line 727 in assertRaises
  File "/usr/home/buildbot/python/3.5.koobs-freebsd9/build/Lib/test/test_io.py", line 3714 in check_interrupted_write
  File "/usr/home/buildbot/python/3.5.koobs-freebsd9/build/Lib/test/test_io.py", line 3743 in test_interrupted_write_text

Also, x86 Ubuntu Shared 2.7 hung, but the only information I have is it was running test_io.

In my Free BSD case, the write() call is stuck, but in Victor’s original case, the background read() call is stuck. I could explain both cases as a race condition with the alarm signal being delivered:

Victor’s case: SIGALRM delivered somewhere inside assertRaises(), but before the write() system call, and Python has a chance to call the Python signal handler. No data is ever written, so the background read() hangs.

My case: SIGALRM delivered just before or as write() starts, so Python has no chance to interrupt the system call and call its own Python handler.

I wonder if we can change the test to only deliver a signal after first reading one byte, i.e. send the signal directly from the background thread. I think that should cover 99% of cases, though in theory it is possible for something else to interrupt write(), and our signal to be delivered while write() was restarting, causing it to hang. But the combination of those two events would be so unlikely it may not be worth worrying about.

Also, if you close the read end of the pipe before the write end, it should protect against flush() hanging without the EBADF hack by raising EPIPE instead.

I suggest rewriting the background thread like:

def _read():
    s = os.read(r, 1)
    read_results.append(s)
    # The main thread is very likely to be inside the write() syscall now, so interrupt it
    os.kill(os.getpid(), SIGALRM)

and cleaning up the pipe like:

finally:
    os.close(r)
    try:
        wio.close()
    except BrokenPipeError:
        pass
msg266691 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-05-30 12:47
When we talk about signals, we must always be careful of threads. Python unit tests are known (at least by me ;-)) to "leak" threads. IMO the first step to make the test more reliable is to start it in a fresh process to control exactly the number of threads.

A fresh process avoids also to inherit an expected signal and/or signal handler.

I modified test_eintr recently to do something similar.
msg266692 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-05-30 12:50
I also modified Python 3.6 to run each test file in a fresh process to avoid leaking too much things between tests, to isolate them better.
msg266723 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2016-05-31 00:10
Thanks, that is a very good point about threads. I remember that testing tkinter leaves an internal TK or TCL thread running in the background. And I understand Free BSD delivers process signals to arbitrary threads (not necessarily the main thread).

But the FreeBSD 3.5 buildbot looks like it ran test_io in a separate process (-j4 specified). Anyway it is another possible explanation of the hang (signal delivered to wrong thread, so write() call not interrupted).

Perhaps it is better to use signal.pthread_kill() instead of os.kill() in my proposal.
msg266725 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2016-05-31 00:29
Found the same hang on the 3.6 branch, which was definitely running separate processes:

http://buildbot.python.org/all/builders/AMD64%20FreeBSD%209.x%203.x/builds/4210/steps/test/logs/stdio
== CPython 3.6.0a0 (default:ed6345cb08ab, Apr 29 2016, 18:35:36) [GCC 4.2.1 20070831 patched [FreeBSD]]
Run tests in parallel using 4 child processes
msg267100 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2016-06-03 13:56
Here is a patch with my suggestion to send the signal only when read() returns, and also closing the read end of the pipe before the write end.
msg297114 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-06-28 01:26
Hum, the resolution is already set to Fixed, and I didn't see this bug last months on buildbots, so I consider that yes, it's fixed.

Sorry Martin, I lost track of this issue and so I don't know what to do with your patch :-/ I will just close the issue.
History
Date User Action Args
2022-04-11 14:58:07adminsetgithub: 66527
2017-06-28 01:26:58vstinnersetstatus: open -> closed

messages: + msg297114
stage: patch review -> resolved
2016-06-03 13:56:41martin.pantersetfiles: + int-write.patch
keywords: + patch
messages: + msg267100

stage: patch review
2016-05-31 00:29:39martin.pantersetmessages: + msg266725
2016-05-31 00:10:04martin.pantersetmessages: + msg266723
2016-05-30 12:50:06vstinnersetmessages: + msg266692
2016-05-30 12:47:45vstinnersetmessages: + msg266691
2016-05-30 12:02:49martin.pantersetstatus: closed -> open
versions: + Python 2.7, Python 3.6, - Python 3.4
nosy: + martin.panter

messages: + msg266688
2014-10-05 15:12:18vstinnersetstatus: open -> closed
resolution: fixed
messages: + msg228577
2014-09-03 21:33:53python-devsetnosy: + python-dev
messages: + msg226327
2014-09-03 21:31:48vstinnercreate