classification
Title: regrtest/buildbot: test run marked as failure even when re-run succeeds
Type: behavior Stage: resolved
Components: Tests Versions: Python 3.6, Python 3.4, Python 3.5, Python 2.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: zach.ware Nosy List: db3l, python-dev, r.david.murray, zach.ware
Priority: low Keywords: buildbot, easy, patch

Created on 2015-07-29 21:25 by zach.ware, last changed 2015-08-09 03:05 by python-dev. This issue is now closed.

Files
File name Uploaded
issue24751.diff zach.ware, 2015-07-31 02:52
Messages (7)
msg247632 - (view) Author: Zachary Ware (zach.ware) * (Python committer) Date: 2015-07-29 21:25
The buildbots all run the test suite with the '-w' flag, which re-runs any tests that failed in the main test sequence at a higher verbosity level.  More often than not it seems the re-run tests succeed, but the exit code is still 1 so the build is marked as a failure.

The simplest action I'd like would be to exit(0) iff all re-run tests pass on the re-run.  Alternatively, we could try to get a bit fancier and exit with some other return code, and adjust the build master to interpret that return code as "passed, with warnings" and mark the build as amber rather than red.
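
A minimal sketch of the first option, assuming the re-run results are available as a simple pass/fail check. The names `exit_code`, `failed`, and `rerun` are hypothetical and are not taken from regrtest itself:

```python
def exit_code(failed, rerun):
    """Return 0 if every initially failed test passes on re-run, else 1.

    `failed` is an iterable of test names that failed in the main run.
    `rerun` is a hypothetical callable that re-runs one test by name and
    returns True when it passes.
    """
    # Keep only the tests that fail again on the re-run.
    still_bad = [name for name in failed if not rerun(name)]
    return 1 if still_bad else 0
```

With this shape, a run whose only failure passes on re-run would exit with 0, and the buildbot would report the build as green.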
msg247634 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2015-07-29 21:37
I think option 1 is to be preferred.  One of the things we've been talking about for the workflow is gating on the buildbots passing, and the way that works with flaky tests is if the check fails, you just run the test again so you get a green and the patch can be gated in.  So from that perspective if the tests pass on rerun the result is most useful if it is green.

Unless we want to say amber is OK for gating...but in terms of cognitive load I think green is better.  After all, our current green state is morally equivalent to running the tests again and having them pass.
msg247728 - (view) Author: Zachary Ware (zach.ware) * (Python committer) Date: 2015-07-31 02:52
Here's a patch.
msg248013 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2015-08-05 03:00
New changeset 6987a9c7dde9 by Zachary Ware in branch '2.7':
Issue #24751: When running regrtest with '-w', don't fail if re-run succeeds.
https://hg.python.org/cpython/rev/6987a9c7dde9

New changeset 9964edf2fd1e by Zachary Ware in branch '3.4':
Issue #24751: When running regrtest with '-w', don't fail if re-run succeeds.
https://hg.python.org/cpython/rev/9964edf2fd1e

New changeset 9d1f6022261d by Zachary Ware in branch '3.5':
Issue #24751: Merge with 3.4
https://hg.python.org/cpython/rev/9d1f6022261d

New changeset 6f67c74608b6 by Zachary Ware in branch 'default':
Closes #24751: Merge with 3.5
https://hg.python.org/cpython/rev/6f67c74608b6
msg248291 - (view) Author: David Bolen (db3l) Date: 2015-08-08 19:48
While running a manual test (make buildbottest) on my 2.7 Ubuntu buildbot, I ran into an exception in this patch:

The tail end of the test run:

[401/401/1] test_signal
379 tests OK.
1 test failed:
    test_curses
21 tests skipped:
    test_aepack test_al test_applesingle test_bsddb185 test_cd test_cl
    test_dl test_gl test_imgfile test_kqueue test_linuxaudiodev
    test_macos test_macostools test_msilib test_ossaudiodev
    test_scriptpackages test_startfile test_sunaudiodev test_winreg
    test_winsound test_zipfile64
Those skips are all expected on linux2.
Re-running failed tests in verbose mode
Traceback (most recent call last):
  File "./Lib/test/regrtest.py", line 1598, in <module>
    main()
  File "./Lib/test/regrtest.py", line 655, in main
    for test in bad[:]:
TypeError: 'set' object has no attribute '__getitem__'


The code is attempting to iterate over a sliced copy of bad (bad[:]) due to later possible mutation, but by that point, if you had failures, bad is a set, from the block shortly above where it subtracts out the environment changed list.  I was testing 2.7, but I think the issue affects all branches.

Perhaps list(bad) instead of bad[:]?
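
To illustrate the suggestion (a sketch only, not the regrtest code; `rerun_failed` and `rerun` are hypothetical names): a set cannot be sliced, but `list(bad)` takes a snapshot of either a list or a set, so the original container can be mutated safely during the loop.

```python
def rerun_failed(bad, rerun):
    """Re-run each test in `bad`, removing the ones that now pass.

    `bad` may be a list or a set of test names; `rerun` is a hypothetical
    callable returning True when the re-run of a test succeeds.
    """
    # list(bad) copies the names up front, so removing from `bad`
    # inside the loop does not disturb the iteration.
    for test in list(bad):
        if rerun(test):
            bad.remove(test)
    return bad


# Slicing works on lists but raises TypeError on sets.
bad = {"test_curses", "test_signal"}
try:
    bad[:]
except TypeError:
    sliceable = False
else:
    sliceable = True
```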
msg248308 - (view) Author: Zachary Ware (zach.ware) * (Python committer) Date: 2015-08-09 02:24
Ah.  The problem is on 2.7 only; 3.x calls sorted() on the set operation.  The set operation should just go away, though; we don't count ENV_CHANGED as 'bad' anymore.

Will fix shortly.
msg248311 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2015-08-09 03:05
New changeset 7d69b214e668 by Zachary Ware in branch '2.7':
Issue #24751: Fix running regrtest with '-w' flag in case of test failures.
https://hg.python.org/cpython/rev/7d69b214e668
History
Date User Action Args
2015-08-09 03:05:48 python-dev set messages: + msg248311
2015-08-09 03:05:42 zach.ware set status: open -> closed
    assignee: zach.ware
    resolution: fixed
    stage: needs patch -> resolved
2015-08-09 02:24:28 zach.ware set messages: + msg248308
2015-08-08 19:59:33 r.david.murray set status: closed -> open
    resolution: fixed -> (no value)
    stage: resolved -> needs patch
2015-08-08 19:48:49 db3l set nosy: + db3l
    messages: + msg248291
2015-08-05 03:00:20 python-dev set status: open -> closed
    nosy: + python-dev
    messages: + msg248013
    resolution: fixed
    stage: patch review -> resolved
2015-07-31 02:52:37 zach.ware set files: + issue24751.diff
    components: + Tests
    versions: + Python 2.7, Python 3.4, Python 3.5, Python 3.6
    keywords: + patch
    type: behavior
    messages: + msg247728
    stage: patch review
2015-07-29 21:37:48 r.david.murray set nosy: + r.david.murray
    messages: + msg247634
2015-07-29 21:25:56 zach.ware create