Issue 20910: Make sleep configurable in tests

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/65109

classification

Title:	Make sleep configurable in tests
Type:		Stage:
Components:		Versions:	Python 3.4

process

Status:	closed	Resolution:	out of date
Dependencies:		Superseder:
Assigned To:		Nosy List:	ncoghlan, ned.deily, neologix, pitrou, python-dev, r.david.murray, rhettinger, schwab, serhiy.storchaka, vstinner, zach.ware
Priority:	normal	Keywords:	patch

Created on 2014-03-13 13:18 by vstinner, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name	Uploaded	Description	Edit
test_sleep.patch	vstinner, 2014-03-13 13:18		review
more_reliable_tests.patch	vstinner, 2014-03-13 17:55

Messages (24)
msg213395 - (view)	Author: STINNER Victor (vstinner) *	Date: 2014-03-13 13:18
Hi, I'm trying to make Python test suite reliable, but it's hard because the speed of our farm of buildbots varies a lot. The whole test suite takes less than 10 minutes on some buildbots (or event less than 5 minutes), while it took 10 hours on other buildbots. Some tests use time.sleep() as a basic synchronization primitive, because other synchronization primitive cannot be used. For example, lock_tests.py tests locks and so cannot use lock in its tests. The problem is that slow buildbots require long sleep. It makes the test suite slow, whereas fast buildbots could use short lseep. Some tests make sure that an operation takes less than N seconds. It's very hard to choose a N value for all buildbots. To make tests pass on all buildbots, N can be configured for the slowest buildbots. I propose to make "time" configurable in the test suite: - add support.TEST_SLEEP constant: it can be between 50 ms and 1 sec, or longer for very slow buildbots - add support.TEST_SHORT_SLEEP constant: shorter than TEST_SLEEP, it is usually used to yield the control to another thread or another process and it can be very short (1 ms) - add a new check_time_delta() functions to check time deltas - add a new --test-sleep command line option to regrtest.py (python -m test) check_time_delta() uses the resolution of the clock to tolerate a small difference. This is important on Windows where the resolution is usually 15.6 ms whereas tests use timings close to 100 ms. Seen values in tests: - TEST_SLEEP: sleep between 100 ms and 3 sec - TEST_SHORT_SLEEP: 1 ms or 10 ms - check_time_delta(): tolerate between 500 and 800 ms on slow buildbots I chose TEST_SHORT_SLEEP for time < 100 ms, and TEST_SLEEP for the other sleep. Some issues related to slow buildbots: - #11185 - #20101 - #20336 The goal of my changes is to run the test suite as fast as possible, so to reduce the sleep. My plan: - keep current timings for buildbots, but use very short timings by default - configure slow buildbots to use longer sleep - use short sleep for the default buildbot configuration Buildbot config of my patch: - TEST_SLEEP: 500 ms - TEST_SHORT_SLEEP: 10 ms - check_time_delta().slow_buildbot: 1 sec Default config of my patch: - TEST_SLEEP: 50 ms - TEST_SHORT_SLEEP: 1 ms - check_time_delta().slow_buildbot: 0 ms The patch doesn't change timings in tests executed in subprocesses, because test.regrtest only modifies test.support variables in the current process. It can be improved later.
msg213397 - (view)	Author: STINNER Victor (vstinner) *	Date: 2014-03-13 13:33
The test suite pass with test_sleep.patch applied on my fast Linux PC. The test must be tested on Windows (because of the clock resolution of 15.6 ms). I run the unit test using the command "./python -m test -j0 -rW". By the way, do you have an idea to simulate a very slow buildbot? Incomplete list of tests which fail with a sleep of 1 nanosecond: test_asyncore test_socket test_wait4 test_faulthandler test_time test_threading test_wait3 test_fork1 I didn't check if it's a bug in the test or not. But at least, this one looks like a bug in the test: sleep(0.5) takes at least 0.5 seconds, max_dt should be bigger (maybe 0.6 ? or at least 0.5001). ====================================================================== FAIL: test_monotonic (test.test_time.TimeTestCase) ---------------------------------------------------------------------- Traceback (most recent call last): File "/home/haypo/prog/python/default/Lib/test/test_time.py", line 397, in test_monotonic support.check_time_delta(0.5, dt, 0.5, clock='monotonic') File "/home/haypo/prog/python/default/Lib/test/support/__init__.py", line 2241, in check_time_delta message)) AssertionError: timing 500.4 ms seconds > max timing 500.0 ms seconds; the clock 'monotonic' has a resolution of 1.0 ns; tolerate 1.0 ns seconds See also Nick's PEP: "Core development workflow automation for CPython" http://legacy.python.org/dev/peps/pep-0462/
msg213399 - (view)	Author: R. David Murray (r.david.murray) *	Date: 2014-03-13 14:14
Would it be possible for regrtest to do some operations to estimate the speed and choose "good values" for the constants automatically? If you have the right kind of VM setup, you could simulate a slow buildbot by giving the VM a low CPU share and then running something compute intensive elsewhere on the machine. Since many of the buidbots are VMs, this would probably be a good test :)
msg213404 - (view)	Author: STINNER Victor (vstinner) *	Date: 2014-03-13 15:02
"Would it be possible for regrtest to do some operations to estimate the speed and choose "good values" for the constants automatically?" It might be possible to make the parameters dynamic, for example using the system loads. But I'm not really interested to invest too much time on optimizing the timing parameters for buildbots. I would like to ensure that tests pass with a short "test sleep" on my PC. For example, it looks weird to check that a sleep of 500 ms takes less 1.5 seconds: why not testing that the sleep takes less than 600 ms?
msg213405 - (view)	Author: STINNER Victor (vstinner) *	Date: 2014-03-13 15:25
> The test must be tested on Windows (because of the clock resolution of 15.6 ms). The test suite pass on Windows too ("PCbuild\python_d.exe -m test").
msg213407 - (view)	Author: Antoine Pitrou (pitrou) *	Date: 2014-03-13 15:32
> It might be possible to make the parameters dynamic, for example > using > the system loads. But I'm not really interested to invest too much time > on optimizing the timing parameters for buildbots. I would like to > ensure that tests pass with a short "test sleep" on my PC. For example, > it looks weird to check that a sleep of 500 ms takes less 1.5 seconds: > why not testing that the sleep takes less than 600 ms? Because it's too tight for some systems.
msg213418 - (view)	Author: Zachary Ware (zach.ware) *	Date: 2014-03-13 16:56
I got a few failures on my Windows machine running `PCbuild\python_d.exe -m test -j0 -rW` [1], only one of which looks relevant [2]: ====================================================================== FAIL: test_monotonic (test.test_time.TimeTestCase) ---------------------------------------------------------------------- Traceback (most recent call last): File "P:\Projects\OSS\Python\hg.python.org\default\cpython\lib\test\test_time. py", line 397, in test_monotonic support.check_time_delta(0.5, dt, 0.5, clock='monotonic') File "P:\Projects\OSS\Python\hg.python.org\default\cpython\lib\test\support\__ init__.py", line 2241, in check_time_delta message)) AssertionError: timing 608.0 ms seconds > max timing 500.0 ms seconds; the clock 'monotonic' has a resolution of 15.6 ms; tolerate 1.0 ms seconds ---------------------------------------------------------------------- Ran 45 tests in 1.869s FAILED (failures=1, skipped=12) test test_time failed It may be interesting to also run this on a couple of the Windows buildbot 'custom' builders, particularly the XP bot (which has an odd resolution for time.monotonic) and the x86 Server 2003 bot (which is notoriously slow). [1] excluding test_urllib2_localnet and test_asyncio, test_urllib2_localnet doesn't play nice with my network and test_asyncio had an output-eating failure on a previous attempt [2] the other failures were test_os, test_ssl, and test_socket. I believe their failures to be issues with my machine.
msg213423 - (view)	Author: STINNER Victor (vstinner) *	Date: 2014-03-13 17:49
> AssertionError: timing 608.0 ms seconds > max timing 500.0 ms seconds; > the clock 'monotonic' has a resolution of 15.6 ms; > tolerate 1.0 ms seconds Is it a virtual machine or a physical machine? Was your Windows busy? Did you run tests in parallel? Only tolerate a difference of 1.0 ms is maybe too aggressive. A default of 100 ms is maybe more realistic? Currently, this specific test accepts a difference of 500 ms: # Issue #20101: On some Windows machines, dt may be slightly low - self.assertTrue(0.45 <= dt <= 1.0, dt) + support.check_time_delta(0.5, dt, 0.5, clock='monotonic')
msg213424 - (view)	Author: STINNER Victor (vstinner) *	Date: 2014-03-13 17:55
more_reliable_tests.patch: Work-in-progress patch to make some tests more reliable with very short "test sleep" (I tested with 1 ns on my Linux box).
msg213427 - (view)	Author: STINNER Victor (vstinner) *	Date: 2014-03-13 18:01
Other tests should use check_time_delta(): test_wait_for_handle() of Lib/test/test_asyncio/test_windows_events.py test_call_later() of Lib/test/test_asyncio/test_events.py test_run_until_complete() of Lib/test/test_asyncio/test_events.py
msg213441 - (view)	Author: Antoine Pitrou (pitrou) *	Date: 2014-03-13 18:49
I don't like these patches at all. It is not obvious that making sleep times configurable improves things at all, and it's one more complication in the test suite that people will have to care about.
msg213442 - (view)	Author: Ned Deily (ned.deily) *	Date: 2014-03-13 19:01
I agree with Antoine. There is never going to be a right value for sleep: the behavior of the buildbots can vary so much from run to run, especially if there are multiple VMs running on one host. If anything, we should be working to remove as many "sleep" dependencies from the test suite as possible, not encourage their use.
msg213445 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) *	Date: 2014-03-13 19:25
How much this patch speeds up testing? Especially interesting results for medium-speed buildbots (about a hour).
msg213473 - (view)	Author: STINNER Victor (vstinner) *	Date: 2014-03-13 21:20
"I don't like these patches at all. It is not obvious that making sleep times configurable improves things at all, and it's one more complication in the test suite that people will have to care about." The default timings should choosen to work on most computers. The idea is to add an option for very slow buildbots. In my experience, sporadic test failures related to time only occur on a few buildbots. Sorry, I don't have the list, but most timing issues occur on between 3 or maybe 4 buildbots. "I agree with Antoine. There is never going to be a right value for sleep: the behavior of the buildbots can vary so much from run to run, especially if there are multiple VMs running on one host. If anything, we should be working to remove as many "sleep" dependencies from the test suite as possible, not encourage their use." Making tests faster is not my first target, it's just a side effect. My first goal is to make tests more reliable. I want all buildbots to be always 100% green.
msg213481 - (view)	Author: STINNER Victor (vstinner) *	Date: 2014-03-13 21:46
"How much this patch speeds up testing? Especially interesting results for medium-speed buildbots (about a hour)." Quick benchmark, I only ran the test once. "Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz" (4 cores with HT: 8 logical cores) and 12 GB of RAM. I tested test_sleep.patch + more_reliable_tests.patch with TEST_SLEEP=1e-5 and TEST_SHORT_SLEEP=1e-6, Sequence tests ============== Command: "time ./python -m test". Original: 18 min 40 sec. real 18m39.936s user 10m43.139s sys 1m1.410s Patched patch: 7 min 52 sec (58% faster). real 7m51.608s user 5m1.324s sys 0m13.376s The speedup would be much lower in practice, a sleep of 10 us cannot be used for buildbots. But you may use such crazy sleeps can be used on your PC to test your patches faster. Parallel tests ============== Command: "time ./python -m test -j10" Original: 3 min 21 sec real 3m20.716s user 18m17.689s sys 0m45.073s Patched patch: 3 min 12 sec but 6 tests failed (test_asyncore, test_ftplib, test_multiprocessing_fork, test_multiprocessing_forkserver, test_multiprocessing_spawn, test_threading). real 3m11.723s user 19m32.329s sys 0m48.024s Since the test suite failed on the patched Python, it's not possible to compare performances, but the total duration is very close to the original python. The speed up looks to be null.
msg213492 - (view)	Author: Nick Coghlan (ncoghlan) *	Date: 2014-03-13 22:42
I've sped up "wait for something to happen" tests in other contexts and found the best solution to be a polling loop so that success is as fast as possible and failure may be slow. For example, rather than just having a 10 second timeout, it's often better to have a 500 ms timeout, and keep trying again until the overall 10 second deadline has expired. This means the worst case tolerance for slow systems can be increased without hurting test speed on fast systems (assuming you don't break the test). I haven't looked at Victor's cases here to see if any of them are amenable to that approach, though.
msg213493 - (view)	Author: STINNER Victor (vstinner) *	Date: 2014-03-13 22:58
"I've sped up "wait for something to happen" tests in other contexts and found the best solution to be a polling loop so that success is as fast as possible and failure may be slow." more_reliable_tests.patch tries to fix tests to not depend on timings. But it is not always possible to synchronize without "arbitrary sleep". test_sleep.patch helps to detect such unstable test.
msg213497 - (view)	Author: Raymond Hettinger (rhettinger) *	Date: 2014-03-14 00:08
> Some tests use time.sleep() as a basic synchronization primitive, > because other synchronization primitive cannot be used. For example, > lock_tests.py tests locks and so cannot use lock in its tests. > The problem is that slow buildbots require long sleep. It makes > the test suite slow, whereas fast buildbots could use short lseep Would it be possible to create a decorator to make these test adaptive (run once with a short sleep and retry with a long sleep if the first fails)?
msg213557 - (view)	Author: Zachary Ware (zach.ware) *	Date: 2014-03-14 14:45
Victor Stinner wrote: > Is it a virtual machine or a physical machine? Was your Windows busy? > Did you run tests in parallel? Physical, not really other than the tests, and I ran with -j0 (on a machine with 2 single core CPUs). I'm not sure what I think of the TEST_SLEEP/TEST_SHORT_SLEEP scheme, but I do like the idea behind support.check_time_delta. I've come up with a couple of alternative ideas (which may or may not actually be worth anything :) in the same vein as the TEST_SLEEP constants: 1) define a support.sleep function that multiplies the value given by some definable constant (default of 1) before passing the value to time.sleep. 2) define a support.sleep_until function, which would sleep for a given interval repeatedly until some condition is satisfied or a timeout is reached.
msg213561 - (view)	Author: STINNER Victor (vstinner) *	Date: 2014-03-14 15:10
> Victor Stinner wrote: >> Is it a virtual machine or a physical machine? Was your Windows busy? >> Did you run tests in parallel? > > Physical, not really other than the tests, and I ran with -j0 (on a machine with 2 single core CPUs). Ah yes, I missed the -j0. I didn't try with -j0. Some tests should be fixed to support shorter sleep, or default sleep should be increased. > I'm not sure what I think of the TEST_SLEEP/TEST_SHORT_SLEEP scheme, but I do like the idea behind support.check_time_delta. If we cannot agree on the whole patch, I will split it in two parts: check_time_delta() and TEST_SLEEP/TEST_SHORT_SLEEP. > 1) define a support.sleep function that multiplies the value given by some definable constant (default of 1) before passing the value to time.sleep. Oh, I like this idea. It's closer to what is done currently. > 2) define a support.sleep_until function, which would sleep for a given interval repeatedly until some condition is satisfied or a timeout is reached. I'm not sure that such helper will provide useful debug info if the test fails.
msg213657 - (view)	Author: Andreas Schwab (schwab) *	Date: 2014-03-15 16:02
test_io.CSignalsTest.check_interrupted_write can fail in an emulated environment when the alarm races with the write call.
msg213659 - (view)	Author: Andreas Schwab (schwab) *	Date: 2014-03-15 16:04
Also, sometimes tests fail in non-verbose mode, but succeed in verbose mode.
msg213916 - (view)	Author: STINNER Victor (vstinner) *	Date: 2014-03-17 23:23
I created #20964 to add the support.check_time_delta() function.
msg213919 - (view)	Author: Roundup Robot (python-dev)	Date: 2014-03-17 23:39
New changeset d1c80a5b42ad by Victor Stinner in branch 'default': Issue #20910: Make tests more reliable, less dependent on time http://hg.python.org/cpython/rev/d1c80a5b42ad

History
Date	User	Action	Args
2022-04-11 14:57:59	admin	set	github: 65109
2015-10-02 21:07:13	vstinner	set	status: open -> closed resolution: out of date
2014-03-17 23:39:13	python-dev	set	nosy: + python-dev messages: + msg213919
2014-03-17 23:23:51	vstinner	set	messages: + msg213916
2014-03-15 16:04:08	schwab	set	messages: + msg213659
2014-03-15 16:02:05	schwab	set	nosy: + schwab messages: + msg213657
2014-03-14 15:10:46	vstinner	set	messages: + msg213561
2014-03-14 14:45:18	zach.ware	set	messages: + msg213557
2014-03-14 00:08:12	rhettinger	set	nosy: + rhettinger messages: + msg213497
2014-03-13 22:58:29	vstinner	set	messages: + msg213493
2014-03-13 22:42:18	ncoghlan	set	messages: + msg213492
2014-03-13 21:46:17	vstinner	set	messages: + msg213481
2014-03-13 21:20:06	vstinner	set	messages: + msg213473
2014-03-13 19:25:09	serhiy.storchaka	set	messages: + msg213445
2014-03-13 19:01:17	ned.deily	set	nosy: + ned.deily messages: + msg213442
2014-03-13 18:49:53	pitrou	set	nosy: + neologix messages: + msg213441
2014-03-13 18:01:59	vstinner	set	messages: + msg213427
2014-03-13 17:55:01	vstinner	set	files: + more_reliable_tests.patch messages: + msg213424
2014-03-13 17:49:54	vstinner	set	messages: + msg213423
2014-03-13 16:56:05	zach.ware	set	nosy: + zach.ware messages: + msg213418
2014-03-13 15:32:27	pitrou	set	messages: + msg213407
2014-03-13 15:25:17	vstinner	set	messages: + msg213405
2014-03-13 15:02:58	vstinner	set	messages: + msg213404
2014-03-13 14:14:05	r.david.murray	set	nosy: + r.david.murray messages: + msg213399
2014-03-13 13:33:24	vstinner	set	nosy: + ncoghlan, pitrou, serhiy.storchaka messages: + msg213397
2014-03-13 13:18:57	vstinner	create