classification
Title: regrtest: reseed random with the same seed before running a test file
Type: Stage: resolved
Components: Tests Versions: Python 3.7, Python 3.6, Python 2.7
process
Status: closed Resolution: rejected
Dependencies: Superseder:
Assigned To: Nosy List: ezio.melotti, mark.dickinson, michael.foord, pitrou, rhettinger, serhiy.storchaka, vstinner
Priority: normal Keywords:

Created on 2017-08-17 15:32 by vstinner, last changed 2017-10-24 09:27 by vstinner. This issue is now closed.

Pull Requests
URL Status Linked
PR 3059 closed vstinner, 2017-08-17 15:32
Messages (7)
msg300438 - Author: STINNER Victor (vstinner) (Python committer) Date: 2017-08-17 15:32
The attached PR changes regrtest to reseed the RNG before each test file. It also uses more entropy for the seed: 2**32 (32 bits) rather than 10_000_000 (24 bits).
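To make the entropy comparison concrete, here is a minimal sketch. It assumes seeds are drawn uniformly from each range; `pick_seed` and `entropy_bits` are hypothetical helpers for illustration, not regrtest's actual code.

```python
import math
import random

# Seed ranges compared in the message above (assumption: uniform draws).
OLD_SEED_RANGE = 10_000_000  # seeds in [0, 10_000_000)
NEW_SEED_RANGE = 2 ** 32     # seeds in [0, 2**32)


def pick_seed(seed_range: int = NEW_SEED_RANGE) -> int:
    """Hypothetical helper: draw a fresh seed using OS randomness."""
    return random.SystemRandom().randrange(seed_range)


def entropy_bits(seed_range: int) -> float:
    """Bits of entropy in a seed chosen uniformly from range(seed_range)."""
    return math.log2(seed_range)


print(f"old: {entropy_bits(OLD_SEED_RANGE):.2f} bits")  # old: 23.25 bits
print(f"new: {entropy_bits(NEW_SEED_RANGE):.2f} bits")  # new: 32.00 bits
```

10_000_000 is just below 2**24 = 16,777,216, which is where the "24 bits" figure comes from. Measured exactly, it is about 23.25 bits.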

The change should avoid random failures of test_tools when hunting reference leaks: see bpo-31174.

Maybe it will also reduce false positives when hunting memory leaks, like bpo-31217.
msg300440 - Author: Antoine Pitrou (pitrou) (Python committer) Date: 2017-08-17 15:35
If refleaks depend on the random seed, perhaps it's a bug worth fixing?
msg300441 - Author: STINNER Victor (vstinner) (Python committer) Date: 2017-08-17 15:46
Antoine Pitrou: "If refleaks depend on the random seed, perhaps it's a bug worth fixing?"

I propose changing regrtest's behaviour even when -R is not used, to make regrtest more deterministic.

Currently, when you run "./python -m test -r test_xxx test_yyy", it's hard to guess the state of the RNG in test_yyy: it depends on how many bytes were consumed by test_xxx. For example, if test_xxx runs on a buildbot but is skipped when I run it locally, we get different behaviour.

I would prefer test_yyy to behave the same when run with "./python -m test -r --randseed=5 test_xxx test_yyy" (with test_xxx) and with "./python -m test -r --randseed=5 test_yyy" (without test_xxx).

With my change, "./python -m test -r --randseed=5 test_yyy test_yyy" (sequential) and "./python -m test -r --randseed=5 -j2 test_yyy test_yyy" (parallel) both run test_yyy twice, with the RNG in the same state each time.
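The determinism argument above can be illustrated with a toy runner. This is a hypothetical sketch, not regrtest's implementation: `run_tests`, `test_xxx` and `test_yyy` are stand-ins, and each "test file" is modelled as a function that draws from the global `random` module.

```python
import random
from typing import Callable, List


def run_tests(tests: List[Callable[[], list]], seed: int, reseed: bool) -> List[list]:
    """Run 'test files' in order; optionally reseed the RNG before each one."""
    random.seed(seed)  # seed once at startup
    results = []
    for test in tests:
        if reseed:
            # Proposed behaviour: every test file starts from the same RNG state.
            random.seed(seed)
        results.append(test())
    return results


def test_xxx() -> list:
    # Stand-in for a test file that consumes some random numbers.
    return [random.random() for _ in range(3)]


def test_yyy() -> list:
    # Stand-in for a test file whose behaviour depends on the RNG state.
    return [random.randrange(100) for _ in range(5)]


alone = run_tests([test_yyy], seed=5, reseed=True)[-1]
after_xxx = run_tests([test_xxx, test_yyy], seed=5, reseed=True)[-1]
no_reseed = run_tests([test_xxx, test_yyy], seed=5, reseed=False)[-1]
print(alone == after_xxx)  # True: test_yyy no longer depends on test_xxx
print(alone == no_reseed)  # without reseeding, test_xxx shifts the RNG state
```

With reseeding, whether test_xxx runs first (or is skipped) no longer affects what test_yyy draws. The trade-off is that every test file now sees the same stream for a given seed.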

The proposed change is part of a broader project to reduce the side effects of tests, making them more reproducible and more "isolated".
msg300444 - Author: STINNER Victor (vstinner) (Python committer) Date: 2017-08-17 16:16
I'm not sure whether we should use the same RNG seed for all tests, or create one seed per test when the -r option is used.

For example, I expect "./python -m test -r -F test_tools" to catch a random bug which only occurs for a specific random seed.
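The two policies can be contrasted with a small sketch. It is hypothetical: `derive_seed` and `seeds_for_forever_run` are illustrative names, and deriving per-iteration seeds from a master seed via SHA-256 is one possible scheme, not something regrtest does. With a single fixed seed, every iteration of a -F loop replays the same RNG state; with one seed per iteration, each repetition explores a different state while remaining replayable from the master seed.

```python
import hashlib
from typing import List


def derive_seed(master_seed: int, test_name: str, iteration: int) -> int:
    """Hypothetical scheme: a stable 32-bit seed per (test file, iteration)."""
    # hashlib is used instead of hash(): str hashes are randomized per process,
    # so hash() would not give the same seed across runs.
    key = f"{master_seed}:{test_name}:{iteration}".encode()
    return int.from_bytes(hashlib.sha256(key).digest()[:4], "big")


def seeds_for_forever_run(master_seed: int, test_name: str,
                          iterations: int, per_iteration: bool) -> List[int]:
    """Seeds a -F style loop would use under each policy."""
    if not per_iteration:
        # Same seed every time: every iteration replays the same RNG state.
        return [master_seed] * iterations
    # One seed per iteration: different states, each reproducible later.
    return [derive_seed(master_seed, test_name, i) for i in range(iterations)]


print(seeds_for_forever_run(5, "test_tools", 3, per_iteration=False))  # [5, 5, 5]
print(seeds_for_forever_run(5, "test_tools", 3, per_iteration=True))
```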
msg300446 - Author: Serhiy Storchaka (serhiy.storchaka) (Python committer) Date: 2017-08-17 17:35
The PRNG is not the only source of the randomness in the tests. The exact behavior depends on the order of files in directories, on string hashes randomization, on address randomization, and on many other things out of our control. Couldn't reseeding the PRNG just add a false promise? Success in making tests deterministic can also narrow down the coverage of the testing. Some branches that lead to failures may never be executed. Our goal is not just to make tests always succeed, but to catch and fix even fairly rare errors.
msg300449 - Author: STINNER Victor (vstinner) (Python committer) Date: 2017-08-17 17:51
> The exact behavior depends on the order of files in directories, on string hashes randomization, on address randomization, and on many other things out of our control.

For hash randomization, maybe we need to generate a PYTHONHASHSEED, as the tox test runner does.
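A minimal sketch of pinning hash randomization for a child interpreter, assuming the tests run in a subprocess. `pick_hash_seed` and `run_with_hash_seed` are hypothetical helpers; the seed range follows the documented PYTHONHASHSEED limits (0 to 4294967295, where 0 disables randomization), not tox's actual code.

```python
import os
import random
import subprocess
import sys


def pick_hash_seed() -> int:
    # Skip 0, which would disable hash randomization instead of pinning it.
    return random.randint(1, 4294967295)


def run_with_hash_seed(code: str, hash_seed: int) -> str:
    """Run a snippet in a child interpreter with a fixed PYTHONHASHSEED."""
    env = dict(os.environ, PYTHONHASHSEED=str(hash_seed))
    result = subprocess.run(
        [sys.executable, "-c", code],
        env=env, capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


seed = pick_hash_seed()
print(f"PYTHONHASHSEED={seed}")  # report it so a failing run can be repeated
first = run_with_hash_seed("print(hash('regrtest'))", seed)
second = run_with_hash_seed("print(hash('regrtest'))", seed)
print(first == second)  # True: same seed, same str hash
```

Printing the chosen seed matters as much as choosing it: randomness is kept across runs, but any single run can be replayed.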

For the filesystem: right, it's not possible to get 100% reproducible tests, but IMHO it's worth making them more reliable.

> Couldn't reseeding the PRNG just add a false promise?

I'm trying to fix random failures on the Refleaks buildbots, not to promise anything :-) To be honest, at this point I don't know if it would be enough, since I'm unable to reproduce the bugs...

> Success in making tests deterministic can also narrow down the coverage of the testing. Some branches that lead to failures may never be executed. Our goal is not just to make tests always succeed, but to catch and fix even fairly rare errors.

I know that it's a tough question, and that's why I opened this issue: to discuss it :-)

But I see more and more projects aiming for more reproducible software and tests:

* https://reproducible-builds.org/
* systemd's big project to make computers more "stateless", or, put differently, to better isolate services
* containers, which also aim to isolate services ("stateless" containers)
* etc.

Other test runners, like tox, also make efforts to get reproducible tests, for example by setting PYTHONHASHSEED.
msg304888 - Author: STINNER Victor (vstinner) (Python committer) Date: 2017-10-24 09:27
I didn't get a strong +1 on the issue and I'm not convinced myself by my approach. Moreover, Refleaks buildbots now seem to be reliable thanks to other fixes. For all these reasons, I close the issue.
History
Date User Action Args
2017-10-24 09:27:23 vstinner set status: open -> closed
  resolution: rejected
  stage: resolved
2017-10-24 09:27:09 vstinner set messages: + msg304888
2017-08-17 17:51:42 vstinner set messages: + msg300449
2017-08-17 17:35:26 serhiy.storchaka set nosy: + rhettinger, mark.dickinson, ezio.melotti, michael.foord
  messages: + msg300446
2017-08-17 16:16:26 vstinner set messages: + msg300444
2017-08-17 15:46:27 vstinner set messages: + msg300441
2017-08-17 15:35:40 pitrou set messages: + msg300440
2017-08-17 15:33:59 vstinner set nosy: + pitrou, serhiy.storchaka
2017-08-17 15:32:13 vstinner set pull_requests: + pull_request3159
2017-08-17 15:32:04 vstinner create