Issue 24751: regrtest/buildbot: test run marked as failure even when re-run succeeds


This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/68939

classification

Title:	regrtest/buildbot: test run marked as failure even when re-run succeeds
Type:	behavior	Stage:	resolved
Components:	Tests	Versions:	Python 3.6, Python 3.4, Python 3.5, Python 2.7

process

Status:	closed	Resolution:	fixed
Dependencies:		Superseder:
Assigned To:	zach.ware	Nosy List:	db3l, python-dev, r.david.murray, zach.ware
Priority:	low	Keywords:	buildbot, easy, patch

Created on 2015-07-29 21:25 by zach.ware, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Files
File name	Uploaded	Description
issue24751.diff	zach.ware, 2015-07-31 02:52	

Messages (7)
msg247632	Author: Zachary Ware (zach.ware) *	Date: 2015-07-29 21:25
The buildbots all run the test suite with the '-w' option, which re-runs any tests that failed in the main test sequence at a higher verbosity level. More often than not the re-run tests seem to succeed, but the exit code is still 1, so the build is marked as a failure. The simplest option would be to exit(0) iff all failed tests pass on the re-run. Alternatively, we could get a bit fancier: exit with some other return code, and adjust the build master to interpret that return code as "passed, with warnings" and mark the build as amber rather than red.
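A minimal sketch of option 1 (exit 0 iff every failed test passes on the re-run), assuming the main run leaves a list of failed test names and the verbose re-run reports a pass/fail result per test. The function `rerun_exit_code` and the `rerun_passed` mapping are hypothetical illustrations, not regrtest's actual code.

```python
def rerun_exit_code(bad, rerun_passed):
    """Return the process exit code after the '-w' re-run phase.

    bad: list of test names that failed in the main run.
    rerun_passed: hypothetical mapping of test name -> True if the
    verbose re-run of that test succeeded.
    """
    # Keep only the tests that are still failing after the re-run;
    # a test with no recorded re-run result counts as still failing.
    still_failing = [name for name in bad if not rerun_passed.get(name, False)]
    # Option 1: success (0) iff nothing is still failing, otherwise 1.
    return 1 if still_failing else 0


if __name__ == "__main__":
    print(rerun_exit_code(["test_curses"], {"test_curses": True}))
    print(rerun_exit_code(["test_curses", "test_signal"],
                          {"test_curses": True, "test_signal": False}))
```

The amber alternative would instead return a distinct nonzero code when every re-run passes, and the build master would need a matching rule to map that code to a warning state.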
msg247634	Author: R. David Murray (r.david.murray) *	Date: 2015-07-29 21:37
I think option 1 is to be preferred. One of the things we've been talking about for the workflow is gating on the buildbots passing, and the way that works with flaky tests is that if the check fails, you just run the test again so you get a green and the patch can be gated in. So from that perspective, if the tests pass on rerun, the result is most useful if it is green. Unless we want to say amber is OK for gating... but in terms of cognitive load I think green is better. After all, our current green state is morally equivalent to running the tests again and having them pass.
msg247728	Author: Zachary Ware (zach.ware) *	Date: 2015-07-31 02:52
Here's a patch.
msg248013	Author: Roundup Robot (python-dev)	Date: 2015-08-05 03:00
New changeset 6987a9c7dde9 by Zachary Ware in branch '2.7':
Issue #24751: When running regrtest with '-w', don't fail if re-run succeeds.
https://hg.python.org/cpython/rev/6987a9c7dde9

New changeset 9964edf2fd1e by Zachary Ware in branch '3.4':
Issue #24751: When running regrtest with '-w', don't fail if re-run succeeds.
https://hg.python.org/cpython/rev/9964edf2fd1e

New changeset 9d1f6022261d by Zachary Ware in branch '3.5':
Issue #24751: Merge with 3.4
https://hg.python.org/cpython/rev/9d1f6022261d

New changeset 6f67c74608b6 by Zachary Ware in branch 'default':
Closes #24751: Merge with 3.5
https://hg.python.org/cpython/rev/6f67c74608b6
msg248291	Author: David Bolen (db3l) *	Date: 2015-08-08 19:48
While running a manual test (make buildbottest) on my 2.7 Ubuntu buildbot, I ran into an exception in this patch. The tail end of the test run:

    [401/401/1] test_signal
    379 tests OK.
    1 test failed:
        test_curses
    21 tests skipped:
        test_aepack test_al test_applesingle test_bsddb185 test_cd test_cl test_dl test_gl test_imgfile test_kqueue test_linuxaudiodev
        test_macos test_macostools test_msilib test_ossaudiodev test_scriptpackages test_startfile test_sunaudiodev test_winreg test_winsound test_zipfile64
    Those skips are all expected on linux2.
    Re-running failed tests in verbose mode
    Traceback (most recent call last):
      File "./Lib/test/regrtest.py", line 1598, in <module>
        main()
      File "./Lib/test/regrtest.py", line 655, in main
        for test in bad[:]:
    TypeError: 'set' object has no attribute '__getitem__'

The code is attempting to iterate over a sliced copy of bad (bad[:]) because bad may be mutated later, but by that point, if you had failures, bad is a set, produced by the block shortly above where it subtracts out the environment changed list.

I was testing 2.7, but I think the issue affects all branches. Perhaps list(bad) instead of bad[:]?
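The failure described above can be reproduced in isolation: slicing works on lists but not on sets, while `list(bad)` copies either. Below is a hedged sketch of a re-run loop that snapshots `bad` with `list()` before mutating it; `rerun_failed` and the `run_verbose` callable are hypothetical stand-ins for regrtest's re-run step. On Python 3 the slicing error reads "'set' object is not subscriptable" rather than the 2.7 wording in the traceback.

```python
def rerun_failed(bad, run_verbose):
    """Re-run each test in `bad`, removing those that now pass.

    `bad` may be a list or a set; `run_verbose` is a hypothetical callable
    that takes a test name and returns True if the verbose re-run passed.
    """
    # Snapshot the collection before the loop so removing items is safe.
    # list(bad) copies both lists and sets; bad[:] only works on sequences.
    for test in list(bad):
        if run_verbose(test):
            bad.remove(test)
    return bad


if __name__ == "__main__":
    bad = {"test_curses"}  # a set, as after the set difference in 2.7
    try:
        bad[:]
    except TypeError as exc:
        print("slicing a set fails:", exc)
    print(rerun_failed(bad, lambda name: True))  # set()
```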
msg248308	Author: Zachary Ware (zach.ware) *	Date: 2015-08-09 02:24
Ah. The problem is on 2.7 only; 3.x calls sorted() on the set operation. The set operation should just go away, though; we don't count ENV_CHANGED as 'bad' anymore. Will fix shortly.
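A sketch of the two ways of keeping `bad` a list mentioned here: the 3.x pattern, where `sorted()` turns the set difference back into a list, and the approach of keeping ENV_CHANGED results out of `bad` from the start, so the set difference can simply be dropped. The result codes and function names below are placeholders for illustration; they are not taken from regrtest's source.

```python
# Placeholder result codes for illustration; not copied from regrtest.
PASSED, FAILED, ENV_CHANGED = 1, 0, -1


def classify(results):
    """Split (name, result) pairs into good, bad and environment_changed.

    ENV_CHANGED tests go only into environment_changed, so `bad` never
    needs a set difference and stays a list that supports bad[:].
    """
    good, bad, environment_changed = [], [], []
    for name, result in results:
        if result == PASSED:
            good.append(name)
        elif result == ENV_CHANGED:
            environment_changed.append(name)
        else:
            bad.append(name)
    return good, bad, environment_changed


def bad_via_set_difference(bad, environment_changed):
    # The 3.x pattern: sorted() turns the set difference back into a
    # list, so slicing the result later still works.
    return sorted(set(bad) - set(environment_changed))
```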
msg248311	Author: Roundup Robot (python-dev)	Date: 2015-08-09 03:05
New changeset 7d69b214e668 by Zachary Ware in branch '2.7': Issue #24751: Fix running regrtest with '-w' flag in case of test failures. https://hg.python.org/cpython/rev/7d69b214e668

History
Date	User	Action	Args
2022-04-11 14:58:19	admin	set	github: 68939
2015-08-09 03:05:48	python-dev	set	messages: + msg248311
2015-08-09 03:05:42	zach.ware	set	status: open -> closed assignee: zach.ware resolution: fixed stage: needs patch -> resolved
2015-08-09 02:24:28	zach.ware	set	messages: + msg248308
2015-08-08 19:59:33	r.david.murray	set	status: closed -> open resolution: fixed -> (no value) stage: resolved -> needs patch
2015-08-08 19:48:49	db3l	set	nosy: + db3l messages: + msg248291
2015-08-05 03:00:20	python-dev	set	status: open -> closed nosy: + python-dev messages: + msg248013 resolution: fixed stage: patch review -> resolved
2015-07-31 02:52:37	zach.ware	set	files: + issue24751.diff components: + Tests versions: + Python 2.7, Python 3.4, Python 3.5, Python 3.6 keywords: + patch type: behavior messages: + msg247728 stage: patch review
2015-07-29 21:37:48	r.david.murray	set	nosy: + r.david.murray messages: + msg247634
2015-07-29 21:25:56	zach.ware	create