classification
Title: regrtest: log the system load?
Type: enhancement Stage: resolved
Components: Tests Versions: Python 3.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: jeremy.kloth, jkloth, louielu, terry.reedy, vstinner
Priority: normal Keywords:

Created on 2017-05-03 23:09 by vstinner, last changed 2017-08-21 16:01 by vstinner. This issue is now closed.

Pull Requests
URL Status Linked Edit
PR 1452 merged vstinner, 2017-05-04 12:33
PR 2244 merged vstinner, 2017-06-16 11:35
PR 3165 merged vstinner, 2017-08-21 15:14
Messages (20)
msg292944 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-05-03 23:09
It's more and more common to get test failures only on a small group of buildbots, or even a single buildbot. In many cases, it's a race condition which depends on the system load. The system load depends on how many tests are running in parallel, if the buildbot slave allows multiple builds in parallel, etc. Sometimes, the failing test pass when it's run sequentially at the end of regrtest, sometimes it fails again.

I propose to add the system load after the timestamp to ease debug. Example of output on a patched regrtest when tests are run sequentially:

0:19:37 load: 1.28 [304/405/1] test_sunau
0:19:37 load: 1.28 [305/405/1] test_ctypes
...
0:20:02 load: 1.86 [309/405/1] test__osx_support -- test_socketserver skipped (resource denied)
...
0:20:09 load: 1.79 [314/405/1] test_int_literal
0:20:09 load: 1.79 [315/405/1] test_queue
0:20:13 load: 1.73 [316/405/1] test_fork1
0:20:20 load: 1.67 [317/405/1] test_atexit
...
0:22:09 load: 1.16 [346/405/1] test_sys_setprofile
0:22:09 load: 1.16 [347/405/1] test_selectors
0:22:24 load: 0.98 [348/405/1] test_bufio
0:22:25 load: 0.98 [349/405/1] test_string_literals
msg292945 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-05-03 23:10
Example of test only failing on one specific buildbot: issue #30225. test_http_body_pipe() of test_urllib2 only fails on macOS Tiger under high system load.
msg292976 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-05-04 12:22
Logging the system load may also help to understand why, on the ARM buildbot, some builds fail with a timeout after 49 minutes, whereas an usual build take 13 min.

Example of build killed with "command timed out: 1200 seconds without output":

http://buildbot.python.org/all/builders/ARMv7%20Ubuntu%203.5/builds/235/steps/test/logs/stdio
msg292977 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-05-04 13:00
Hum, "load: 1.73" is confusing. I replaced it with "load avg: 1.73" to be more explicit, even if it's a little bit longer.
msg292978 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-05-04 13:21
New changeset 3d0056842c5e06b4102f990b59ab3b607f932dd8 by Victor Stinner in branch 'master':
bpo-30263: regrtest: log system load (#1452)
https://github.com/python/cpython/commit/3d0056842c5e06b4102f990b59ab3b607f932dd8
msg292979 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-05-04 13:39
Oh... I see many unexpected load average!

On the small ARMv7 Ubuntu 3.x (1 CPU), -j8 is used: the load average starts with 2.58 but goes up to 11.13 around "0:04:20 load avg: 11.13 [216/405] ". Load average of 11 on such device seems too high and can explain test failures.

PPC64 Fedora 3.x (64 CPU) starts at 8.48 and goes up to "0:01:33 load avg: 11.33", but it has much more CPUs, so it's ok.
msg292981 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-05-04 13:55
On Windows, there is a "System > Processor Queue Length" metric which is close to the UNIX system load, but I'm unable to see how to get this value.
msg292990 - (view) Author: Louie Lu (louielu) * Date: 2017-05-04 15:17
why will a 1 CPU environment using -j8?
isn't that too much for it?
msg292992 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-05-04 15:21
> why will a 1 CPU environment using -j8?

Many tests of the Python test suite spend a significant time on just sleeping and waiting for I/O. Using -jN with N > 1 runs tests much faster.

> isn't that too much for it?

That's a good question :-) At least, we now have one more tool to take a decision. Maybe the buildbot slave allows also more than one build in parallel? (ex: test 3.5 and 3.6 branches)
msg293148 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2017-05-06 02:24
I have a 6 core, 12 cpu Pentium.  I previously stopped with -j10.  Reading the above, I tried -j14 and -j18.  Times for one run on a quiet system of 'test -jnn -ugui' with a fresh download and debug build were 3:55 (-j10), 3:48, and 3:53.  Not much affect.

A more promising source of reduced running time on this system is shown by the last 4 finish times: 2:42 tarfile (:42), 3:13 zipfile (:43), 3:16 venv (:54), and 3:48 multiprocessing_spawn (2:29).  If m._s. were split into files taking at most  40 seconds, I suspect the total running time would be perhaps 20 seconds less due to better CPU utilization. Currently, the total running time could be from about 30 seconds less to about 2 minutes more depending on when m._s. starts in a randomized run.
msg296187 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-06-16 12:39
New changeset a0ccc54e6dffacf9e7c06f2a3e9056d2d35d21eb by Victor Stinner in branch '3.6':
Synchronize libregrtest from master to 3.6 (#2244)
https://github.com/python/cpython/commit/a0ccc54e6dffacf9e7c06f2a3e9056d2d35d21eb
msg296311 - (view) Author: Jeremy Kloth (jkloth) * Date: 2017-06-19 11:39
I've implemented a getloadavg() equivalent using Windows performance counters that could be used to display the load in regrtest, as it happens, before finding this issue.  It is implemented in C (although it *can* be done in only Python).

It has the identical properties of the Linux /proc/loadavg stats.  An exponential decaying average using the same equations for 1m, 5m and 15m buckets.  So, in theory, it would usable as `os.getloadavg()`.

Before implementing this as a PR, I would like guidance as to the best place to integrate this:
 1. as os.getloadavg(),
 2. in _testcapi
 3. in winapi (although it is a conglomeration of WinAPIs)
 4. in regrtest (as Python).

I believe that #1 would be a nice to have, but #2 as most likely. 
 #3 doesn't fit so much with the "feel" of the winapi module; loadavg() being a utility function whereas the rest of that module is raw APIs.  And, #4 as, well, last.  Mostly due to the fact of walking C structures in Python sucks.
msg296312 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-06-19 11:43
> Before implementing this as a PR, I would like guidance as to the best place to integrate this:
> 1. as os.getloadavg(),
> 2. in _testcapi
> 3. in winapi (although it is a conglomeration of WinAPIs)
> 4. in regrtest (as Python).

That's tough question. Usuaully, the os module is a thin wrapper to native functions. os.getloadavg() calls C getloadavg(). It doesn't implement the logic to open /proc/loadavg and parses it.

Can I see the code somewhere before giving an answer? :-)

If you want to add a public function, you must open a new issue.

In case of doubt, we might start with a first implementation in libregrtest?
msg296319 - (view) Author: Jeremy Kloth (jkloth) * Date: 2017-06-19 12:13
Pushed utility to github:  https://github.com/jkloth/loadavg

It is currently just a command-line utility PoC.  The routines of interest would be CalculateLoadReg and the loop in wmain().
msg296321 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-06-19 12:24
> Pushed utility to github:  https://github.com/jkloth/loadavg

Hum, I see two implementations:

* LOADAVG_USE_REG: use the registry, RegQueryValueExW(HKEY_PERFORMANCE_DATA, L"2", ...)
* LOADAVG_USE_PDH: PdhOpenQueryW()

Are these API accessible by everyone? On Linux, the data is public to everybody:

haypo@selma$ ls -l /proc/loadavg 
-r--r--r--. 1 root root 0 Jun 19 14:18 /proc/loadavg

Why do you prefer registry over PDH?

Should CalculateLoad be called every 5 seconds in a thread? If yes, IMHO it's a blocker issue for regrtest since some tests fail if other threads are running.

I'm not sure that such code belongs to Python stdlib. What do you think of putting it on PyPI?

For Python 2, sometimes I would like to use faulthandler on our buildbots since it's really hard to guess what crashed Python when a random test crash. But faulthandler was only added to Python 3.3. So using it on Python 2 would require to install it using pip. Right now, Python only requires itself to run tests. I'm not sure that it's a good idea to have dependencies, even if there are externals.

Again, I'm not sure that this specific issue is the best place to discuss implementing "os.getloadavg()", since your implementation is more complex than a single function call, it seems like it requires to run a thread "in background", so maybe an object to start/stop the thread, etc.
msg296325 - (view) Author: Jeremy Kloth (jeremy.kloth) Date: 2017-06-19 12:53
On Mon, Jun 19, 2017 at 6:24 AM, STINNER Victor <report@bugs.python.org> wrote:
>
> STINNER Victor added the comment:
>
>> Pushed utility to github:  https://github.com/jkloth/loadavg
>
> Hum, I see two implementations:
>
> * LOADAVG_USE_REG: use the registry, RegQueryValueExW(HKEY_PERFORMANCE_DATA, L"2", ...)
> * LOADAVG_USE_PDH: PdhOpenQueryW()
>
> Are these API accessible by everyone? On Linux, the data is public to everybody:

Yes.

> Why do you prefer registry over PDH?

A PDH query adds an additional 4MB to the running process.

> Should CalculateLoad be called every 5 seconds in a thread? If yes, IMHO it's a blocker issue for regrtest since some tests fail if other threads are running.

Yes, as that is exactly what Linux does (not in a thread, but as part
of the scheduler).  This is the reason for using the Thread Pool.
They are kernel provided worker threads for the running process.  As a
matter of fact, Windows 10 now uses them to speed up DLL loading on
startup so every Python process is running with 2+ threads.

However, regtest when run with -jX already starts an add'l thread so
that can no longer be an issue.

> I'm not sure that such code belongs to Python stdlib. What do you think of putting it on PyPI?

Sure, but packaging it up as a proper extension is quite low on my
todo list.  I am keen, however, on getting the buildbot fleet (well,
mine in particular) in good shape.

> For Python 2, sometimes I would like to use faulthandler on our buildbots since it's really hard to guess what crashed Python when a random test crash. But faulthandler was only added to Python 3.3. So using it on Python 2 would require to install it using pip. Right now, Python only requires itself to run tests. I'm not sure that it's a good idea to have dependencies, even if there are externals.

Agreed, which is why I offered to implement as a patch to the test suite.

> Again, I'm not sure that this specific issue is the best place to discuss implementing "os.getloadavg()", since your implementation is more complex than a single function call, it seems like it requires to run a thread "in background", so maybe an object to start/stop the thread, etc.

Don't get too caught up with `os.getloadavg()`, it was just a
possibility.  I was leaning more towards a private utility function
just for regrtest, but realized it may be useful outside of just
testing.  However, as you note, it does require a thread, which is why
it is best implemented using the Thread Pool API to minimize any
impact on Python proper.
msg296326 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-06-19 13:00
More general note about load average: at this point, I'm not sure of
the usefulness of this information to debug race conditions :-)
msg296328 - (view) Author: Jeremy Kloth (jkloth) * Date: 2017-06-19 13:03
Race conditions, not so much, but definitely helps with random timeout errors ;)
msg296858 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-06-26 09:35
The load average is now logged to 2.7, 3.5, 3.6 and master branches. I didn't see any bug releated to this tiny addition, so I close this issue.


Jeremy Kloth: "Pushed utility to github:  https://github.com/jkloth/loadavg"

I consider that this feature deserves its own issue since the implementation requires C code and is much more complex than just logging os.getloadavg().
msg300633 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-08-21 16:01
New changeset 02d4292df4bbd7a2cbf2716c9e6c22cae76e2093 by Victor Stinner in branch '2.7':
bpo-30263: regrtest: add system load average (#3165)
https://github.com/python/cpython/commit/02d4292df4bbd7a2cbf2716c9e6c22cae76e2093
History
Date User Action Args
2017-08-21 16:01:14vstinnersetmessages: + msg300633
2017-08-21 15:14:55vstinnersetpull_requests: + pull_request3201
2017-06-26 09:35:00vstinnersetstatus: open -> closed
resolution: fixed
messages: + msg296858

stage: resolved
2017-06-19 13:03:36jklothsetmessages: + msg296328
2017-06-19 13:00:50vstinnersetmessages: + msg296326
2017-06-19 12:53:15jeremy.klothsetnosy: + jeremy.kloth
messages: + msg296325
2017-06-19 12:24:05vstinnersetmessages: + msg296321
2017-06-19 12:13:56jklothsetmessages: + msg296319
2017-06-19 11:43:33vstinnersetmessages: + msg296312
2017-06-19 11:39:12jklothsetmessages: + msg296311
2017-06-17 14:18:19jklothsetnosy: + jkloth
2017-06-16 12:39:11vstinnersetmessages: + msg296187
2017-06-16 11:35:58vstinnersetpull_requests: + pull_request2292
2017-05-06 02:24:14terry.reedysetnosy: + terry.reedy
messages: + msg293148
2017-05-04 15:21:03vstinnersetmessages: + msg292992
2017-05-04 15:17:11louielusetnosy: + louielu
messages: + msg292990
2017-05-04 13:55:51vstinnersetmessages: + msg292981
2017-05-04 13:39:16vstinnersetmessages: + msg292979
2017-05-04 13:21:14vstinnersetmessages: + msg292978
2017-05-04 13:00:17vstinnersetmessages: + msg292977
2017-05-04 12:33:24vstinnersetpull_requests: + pull_request1550
2017-05-04 12:22:56vstinnersetmessages: + msg292976
2017-05-03 23:10:39vstinnersetmessages: + msg292945
2017-05-03 23:09:31vstinnercreate