classification
Title: Make sleep configurable in tests
Type: Stage:
Components: Versions: Python 3.4
process
Status: closed Resolution: out of date
Dependencies: Superseder:
Assigned To: Nosy List: ncoghlan, ned.deily, neologix, pitrou, python-dev, r.david.murray, rhettinger, schwab, serhiy.storchaka, vstinner, zach.ware
Priority: normal Keywords: patch

Created on 2014-03-13 13:18 by vstinner, last changed 2015-10-02 21:07 by vstinner. This issue is now closed.

Files
File name Uploaded Description Edit
test_sleep.patch vstinner, 2014-03-13 13:18 review
more_reliable_tests.patch vstinner, 2014-03-13 17:55
Messages (24)
msg213395 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2014-03-13 13:18
Hi,

I'm trying to make Python test suite reliable, but it's hard because the speed of our farm of buildbots varies a lot. The whole test suite takes less than 10 minutes on some buildbots (or event less than 5 minutes), while it took 10 hours on other buildbots.

Some tests use time.sleep() as a basic synchronization primitive, because other synchronization primitive cannot be used. For example, lock_tests.py tests locks and so cannot use lock in its tests. The problem is that slow buildbots require long sleep. It makes the test suite slow, whereas fast buildbots could use short lseep.

Some tests make sure that an operation takes less than N seconds. It's very hard to choose a N value for all buildbots. To make tests pass on all buildbots, N can be configured for the slowest buildbots.

I propose to make "time" configurable in the test suite:

- add support.TEST_SLEEP constant: it can be between 50 ms and 1 sec, or longer for very slow buildbots
- add support.TEST_SHORT_SLEEP constant: shorter than TEST_SLEEP, it is usually used to yield the control to another thread or another process and it can be very short (1 ms)
- add a new check_time_delta() functions to check time deltas
- add a new --test-sleep command line option to regrtest.py (python -m test)

check_time_delta() uses the resolution of the clock to tolerate a small difference. This is important on Windows where the resolution is usually 15.6 ms whereas tests use timings close to 100 ms.

Seen values in tests:

- TEST_SLEEP: sleep between 100 ms and 3 sec
- TEST_SHORT_SLEEP: 1 ms or 10 ms
- check_time_delta(): tolerate between 500 and 800 ms on slow buildbots

I chose TEST_SHORT_SLEEP for time < 100 ms, and TEST_SLEEP for the other sleep.

Some issues related to slow buildbots:

- #11185
- #20101
- #20336

The goal of my changes is to run the test suite as fast as possible, so to reduce the sleep. My plan:

- keep current timings for buildbots, but use very short timings by default
- configure slow buildbots to use longer sleep
- use short sleep for the default buildbot configuration

Buildbot config of my patch:

- TEST_SLEEP: 500 ms
- TEST_SHORT_SLEEP: 10 ms
- check_time_delta().slow_buildbot: 1 sec

Default config of my patch:

- TEST_SLEEP: 50 ms
- TEST_SHORT_SLEEP: 1 ms
- check_time_delta().slow_buildbot: 0 ms

The patch doesn't change timings in tests executed in subprocesses, because test.regrtest only modifies test.support variables in the current process. It can be improved later.
msg213397 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2014-03-13 13:33
The test suite pass with test_sleep.patch applied on my fast Linux PC.

The test must be tested on Windows (because of the clock resolution of 15.6 ms).

I run the unit test using the command "./python -m test -j0 -rW".

By the way, do you have an idea to simulate a very slow buildbot?

Incomplete list of tests which fail with a sleep of 1 nanosecond:

test_asyncore
test_socket
test_wait4
test_faulthandler
test_time
test_threading
test_wait3
test_fork1

I didn't check if it's a bug in the test or not. But at least, this one looks like a bug in the test: sleep(0.5) takes *at least* 0.5 seconds, max_dt should be bigger (maybe 0.6 ? or at least 0.5001).

======================================================================
FAIL: test_monotonic (test.test_time.TimeTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/haypo/prog/python/default/Lib/test/test_time.py", line 397, in test_monotonic
    support.check_time_delta(0.5, dt, 0.5, clock='monotonic')
  File "/home/haypo/prog/python/default/Lib/test/support/__init__.py", line 2241, in check_time_delta
    message))
AssertionError: timing 500.4 ms seconds > max timing 500.0 ms seconds; the clock 'monotonic' has a resolution of 1.0 ns; tolerate 1.0 ns seconds


See also Nick's PEP:
"Core development workflow automation for CPython"
http://legacy.python.org/dev/peps/pep-0462/
msg213399 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2014-03-13 14:14
Would it be possible for regrtest to do some operations to estimate the speed and choose "good values" for the constants automatically?

If you have the right kind of VM setup, you could simulate a slow buildbot by giving the VM a low CPU share and then running something compute intensive elsewhere on the machine.  Since many of the buidbots *are* VMs, this would probably be a good test :)
msg213404 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2014-03-13 15:02
"Would it be possible for regrtest to do some operations to estimate the speed and choose "good values" for the constants automatically?"

It might be possible to make the parameters dynamic, for example using the system loads. But I'm not really interested to invest too much time on optimizing the timing parameters for buildbots. I would like to ensure that tests pass with a short "test sleep" on my PC. For example, it looks weird to check that a sleep of 500 ms takes less 1.5 seconds: why not testing that the sleep takes less than 600 ms?
msg213405 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2014-03-13 15:25
> The test must be tested on Windows (because of the clock resolution of 15.6 ms).

The test suite pass on Windows too ("PCbuild\python_d.exe -m test").
msg213407 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2014-03-13 15:32
> It might be possible to make the parameters dynamic, for example
> using
> the system loads. But I'm not really interested to invest too much time
> on optimizing the timing parameters for buildbots. I would like to
> ensure that tests pass with a short "test sleep" on my PC. For example,
> it looks weird to check that a sleep of 500 ms takes less 1.5 seconds:
> why not testing that the sleep takes less than 600 ms?

Because it's too tight for some systems.
msg213418 - (view) Author: Zachary Ware (zach.ware) * (Python committer) Date: 2014-03-13 16:56
I got a few failures on my Windows machine running `PCbuild\python_d.exe -m test -j0 -rW` [1], only one of which looks relevant [2]:

======================================================================
FAIL: test_monotonic (test.test_time.TimeTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "P:\Projects\OSS\Python\hg.python.org\default\cpython\lib\test\test_time.
py", line 397, in test_monotonic
    support.check_time_delta(0.5, dt, 0.5, clock='monotonic')
  File "P:\Projects\OSS\Python\hg.python.org\default\cpython\lib\test\support\__
init__.py", line 2241, in check_time_delta
    message))
AssertionError: timing 608.0 ms seconds > max timing 500.0 ms seconds; the clock
 'monotonic' has a resolution of 15.6 ms; tolerate 1.0 ms seconds

----------------------------------------------------------------------
Ran 45 tests in 1.869s

FAILED (failures=1, skipped=12)
test test_time failed

It may be interesting to also run this on a couple of the Windows buildbot 'custom' builders, particularly the XP bot (which has an odd resolution for time.monotonic) and the x86 Server 2003 bot (which is notoriously slow).

[1] excluding test_urllib2_localnet and test_asyncio, test_urllib2_localnet doesn't play nice with my network and test_asyncio had an output-eating failure on a previous attempt
[2] the other failures were test_os, test_ssl, and test_socket.  I believe their failures to be issues with my machine.
msg213423 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2014-03-13 17:49
> AssertionError: timing 608.0 ms seconds > max timing 500.0 ms seconds;
> the clock 'monotonic' has a resolution of 15.6 ms;
> tolerate 1.0 ms seconds

Is it a virtual machine or a physical machine? Was your Windows busy? Did you run tests in parallel?

Only tolerate a difference of 1.0 ms is maybe too aggressive. A default of 100 ms is maybe more realistic?

Currently, this specific test accepts a difference of 500 ms:

         # Issue #20101: On some Windows machines, dt may be slightly low
-        self.assertTrue(0.45 <= dt <= 1.0, dt)
+        support.check_time_delta(0.5, dt, 0.5, clock='monotonic')
msg213424 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2014-03-13 17:55
more_reliable_tests.patch: Work-in-progress patch to make some tests more reliable with very short "test sleep" (I tested with 1 ns on my Linux box).
msg213427 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2014-03-13 18:01
Other tests should use check_time_delta():

test_wait_for_handle() of Lib/test/test_asyncio/test_windows_events.py
test_call_later() of Lib/test/test_asyncio/test_events.py
test_run_until_complete() of Lib/test/test_asyncio/test_events.py
msg213441 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2014-03-13 18:49
I don't like these patches at all. It is not obvious that making sleep times configurable improves things at all, and it's one more complication in the test suite that people will have to care about.
msg213442 - (view) Author: Ned Deily (ned.deily) * (Python committer) Date: 2014-03-13 19:01
I agree with Antoine.  There is never going to be a right value for sleep: the behavior of the buildbots can vary so much from run to run, especially if there are multiple VMs running on one host.  If anything, we should be working to remove as many "sleep" dependencies from the test suite as possible, not encourage their use.
msg213445 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-03-13 19:25
How much this patch speeds up testing? Especially interesting results for medium-speed buildbots (about a hour).
msg213473 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2014-03-13 21:20
"I don't like these patches at all. It is not obvious that making sleep times configurable improves things at all, and it's one more complication in the test suite that people will have to care about."

The default timings should choosen to work on most computers. The idea is to add an option for very slow buildbots. In my experience, sporadic test failures related to time only occur on a few buildbots. Sorry, I don't have the list, but most timing issues occur on between 3 or maybe 4 buildbots.


"I agree with Antoine.  There is never going to be a right value for sleep: the behavior of the buildbots can vary so much from run to run, especially if there are multiple VMs running on one host.  If anything, we should be working to remove as many "sleep" dependencies from the test suite as possible, not encourage their use."

Making tests faster is not my first target, it's just a side effect. My first goal is to make tests more reliable. I want all buildbots to be always 100% green.
msg213481 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2014-03-13 21:46
"How much this patch speeds up testing? Especially interesting results for medium-speed buildbots (about a hour)."

Quick benchmark, I only ran the test once. "Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz" (4 cores with HT: 8 logical cores) and 12 GB of RAM. I tested test_sleep.patch + more_reliable_tests.patch with TEST_SLEEP=1e-5 and TEST_SHORT_SLEEP=1e-6, 

Sequence tests
==============

Command: "time ./python -m test".

Original: 18 min 40 sec.

real	18m39.936s
user	10m43.139s
sys	1m1.410s

Patched patch: 7 min 52 sec (58% faster).

real	7m51.608s
user	5m1.324s
sys	0m13.376s

The speedup would be much lower in practice, a sleep of 10 us cannot be used for buildbots. But you may use such crazy sleeps can be used on your PC to test your patches faster.


Parallel tests
==============

Command: "time ./python -m test -j10"

Original: 3 min 21 sec

real	3m20.716s
user	18m17.689s
sys	0m45.073s

Patched patch: 3 min 12 sec *but 6 tests failed* (test_asyncore, test_ftplib, test_multiprocessing_fork, test_multiprocessing_forkserver, test_multiprocessing_spawn, test_threading).

real	3m11.723s
user	19m32.329s
sys	0m48.024s

Since the test suite failed on the patched Python, it's not possible to compare performances, but the total duration is very close to the original python. The speed up looks to be null.
msg213492 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2014-03-13 22:42
I've sped up "wait for something to happen" tests in other contexts and
found the best solution to be a polling loop so that success is as fast as
possible and failure may be slow.

For example, rather than just having a 10 second timeout, it's often better
to have a 500 ms timeout, and keep trying again until the overall 10 second
deadline has expired.

This means the worst case tolerance for slow systems can be increased
without hurting test speed on fast systems (assuming you don't break the
test).

I haven't looked at Victor's cases here to see if any of them are amenable
to that approach, though.
msg213493 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2014-03-13 22:58
"I've sped up "wait for something to happen" tests in other contexts and
found the best solution to be a polling loop so that success is as fast as possible and failure may be slow."

more_reliable_tests.patch tries to fix tests to not depend on timings. But it is not always possible to synchronize without "arbitrary sleep". 

test_sleep.patch helps to detect such unstable test.
msg213497 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2014-03-14 00:08
> Some tests use time.sleep() as a basic synchronization primitive,
> because other synchronization primitive cannot be used. For example,
> lock_tests.py tests locks and so cannot use lock in its tests.
> The problem is that slow buildbots require long sleep. It makes 
> the test suite slow, whereas fast buildbots could use short lseep

Would it be possible to create a decorator to make these test adaptive (run once with a short sleep and retry with a long sleep if the first fails)?
msg213557 - (view) Author: Zachary Ware (zach.ware) * (Python committer) Date: 2014-03-14 14:45
Victor Stinner wrote:
> Is it a virtual machine or a physical machine? Was your Windows busy?
> Did you run tests in parallel?

Physical, not really other than the tests, and I ran with -j0 (on a machine with 2 single core CPUs).

I'm not sure what I think of the TEST_SLEEP/TEST_SHORT_SLEEP scheme, but I do like the idea behind support.check_time_delta.  I've come up with a couple of alternative ideas (which may or may not actually be worth anything :) in the same vein as the TEST_SLEEP constants:

1) define a support.sleep function that multiplies the value given by some definable constant (default of 1) before passing the value to time.sleep.
2) define a support.sleep_until function, which would sleep for a given interval repeatedly until some condition is satisfied or a timeout is reached.
msg213561 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2014-03-14 15:10
> Victor Stinner wrote:
>> Is it a virtual machine or a physical machine? Was your Windows busy?
>> Did you run tests in parallel?
>
> Physical, not really other than the tests, and I ran with -j0 (on a machine with 2 single core CPUs).

Ah yes, I missed the -j0. I didn't try with -j0. Some tests should be
fixed to support shorter sleep, or default sleep should be increased.

> I'm not sure what I think of the TEST_SLEEP/TEST_SHORT_SLEEP scheme, but I do like the idea behind support.check_time_delta.

If we cannot agree on the whole patch, I will split it in two parts:
check_time_delta() and TEST_SLEEP/TEST_SHORT_SLEEP.

> 1) define a support.sleep function that multiplies the value given by some definable constant (default of 1) before passing the value to time.sleep.

Oh, I like this idea. It's closer to what is done currently.

> 2) define a support.sleep_until function, which would sleep for a given interval repeatedly until some condition is satisfied or a timeout is reached.

I'm not sure that such helper will provide useful debug info if the test fails.
msg213657 - (view) Author: Andreas Schwab (schwab) * Date: 2014-03-15 16:02
test_io.CSignalsTest.check_interrupted_write can fail in an emulated environment when the alarm races with the write call.
msg213659 - (view) Author: Andreas Schwab (schwab) * Date: 2014-03-15 16:04
Also, sometimes tests fail in non-verbose mode, but succeed in verbose mode.
msg213916 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2014-03-17 23:23
I created #20964 to add the support.check_time_delta() function.
msg213919 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2014-03-17 23:39
New changeset d1c80a5b42ad by Victor Stinner in branch 'default':
Issue #20910: Make tests more reliable, less dependent on time
http://hg.python.org/cpython/rev/d1c80a5b42ad
History
Date User Action Args
2015-10-02 21:07:13vstinnersetstatus: open -> closed
resolution: out of date
2014-03-17 23:39:13python-devsetnosy: + python-dev
messages: + msg213919
2014-03-17 23:23:51vstinnersetmessages: + msg213916
2014-03-15 16:04:08schwabsetmessages: + msg213659
2014-03-15 16:02:05schwabsetnosy: + schwab
messages: + msg213657
2014-03-14 15:10:46vstinnersetmessages: + msg213561
2014-03-14 14:45:18zach.waresetmessages: + msg213557
2014-03-14 00:08:12rhettingersetnosy: + rhettinger
messages: + msg213497
2014-03-13 22:58:29vstinnersetmessages: + msg213493
2014-03-13 22:42:18ncoghlansetmessages: + msg213492
2014-03-13 21:46:17vstinnersetmessages: + msg213481
2014-03-13 21:20:06vstinnersetmessages: + msg213473
2014-03-13 19:25:09serhiy.storchakasetmessages: + msg213445
2014-03-13 19:01:17ned.deilysetnosy: + ned.deily
messages: + msg213442
2014-03-13 18:49:53pitrousetnosy: + neologix
messages: + msg213441
2014-03-13 18:01:59vstinnersetmessages: + msg213427
2014-03-13 17:55:01vstinnersetfiles: + more_reliable_tests.patch

messages: + msg213424
2014-03-13 17:49:54vstinnersetmessages: + msg213423
2014-03-13 16:56:05zach.waresetnosy: + zach.ware
messages: + msg213418
2014-03-13 15:32:27pitrousetmessages: + msg213407
2014-03-13 15:25:17vstinnersetmessages: + msg213405
2014-03-13 15:02:58vstinnersetmessages: + msg213404
2014-03-13 14:14:05r.david.murraysetnosy: + r.david.murray
messages: + msg213399
2014-03-13 13:33:24vstinnersetnosy: + ncoghlan, pitrou, serhiy.storchaka
messages: + msg213397
2014-03-13 13:18:57vstinnercreate