Issue 30263: regrtest: log the system load?

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/74449

classification

Title:	regrtest: log the system load?
Type:	enhancement	Stage:	resolved
Components:	Tests	Versions:	Python 3.7

process

Status:	closed	Resolution:	fixed
Dependencies:		Superseder:
Assigned To:		Nosy List:	jeremy.kloth, jkloth, louielu, terry.reedy, vstinner
Priority:	normal	Keywords:

Created on 2017-05-03 23:09 by vstinner, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Pull Requests
URL	Status	Linked	Edit
PR 1452	merged	vstinner, 2017-05-04 12:33
PR 2244	merged	vstinner, 2017-06-16 11:35
PR 3165	merged	vstinner, 2017-08-21 15:14

Messages (20)
msg292944 - (view)	Author: STINNER Victor (vstinner) *	Date: 2017-05-03 23:09
It's more and more common to get test failures only on a small group of buildbots, or even a single buildbot. In many cases, it's a race condition which depends on the system load. The system load depends on how many tests are running in parallel, if the buildbot slave allows multiple builds in parallel, etc. Sometimes, the failing test pass when it's run sequentially at the end of regrtest, sometimes it fails again. I propose to add the system load after the timestamp to ease debug. Example of output on a patched regrtest when tests are run sequentially: 0:19:37 load: 1.28 [304/405/1] test_sunau 0:19:37 load: 1.28 [305/405/1] test_ctypes ... 0:20:02 load: 1.86 [309/405/1] test__osx_support -- test_socketserver skipped (resource denied) ... 0:20:09 load: 1.79 [314/405/1] test_int_literal 0:20:09 load: 1.79 [315/405/1] test_queue 0:20:13 load: 1.73 [316/405/1] test_fork1 0:20:20 load: 1.67 [317/405/1] test_atexit ... 0:22:09 load: 1.16 [346/405/1] test_sys_setprofile 0:22:09 load: 1.16 [347/405/1] test_selectors 0:22:24 load: 0.98 [348/405/1] test_bufio 0:22:25 load: 0.98 [349/405/1] test_string_literals
msg292945 - (view)	Author: STINNER Victor (vstinner) *	Date: 2017-05-03 23:10
Example of test only failing on one specific buildbot: issue #30225. test_http_body_pipe() of test_urllib2 only fails on macOS Tiger under high system load.
msg292976 - (view)	Author: STINNER Victor (vstinner) *	Date: 2017-05-04 12:22
Logging the system load may also help to understand why, on the ARM buildbot, some builds fail with a timeout after 49 minutes, whereas an usual build take 13 min. Example of build killed with "command timed out: 1200 seconds without output": http://buildbot.python.org/all/builders/ARMv7%20Ubuntu%203.5/builds/235/steps/test/logs/stdio
msg292977 - (view)	Author: STINNER Victor (vstinner) *	Date: 2017-05-04 13:00
Hum, "load: 1.73" is confusing. I replaced it with "load avg: 1.73" to be more explicit, even if it's a little bit longer.
msg292978 - (view)	Author: STINNER Victor (vstinner) *	Date: 2017-05-04 13:21
New changeset 3d0056842c5e06b4102f990b59ab3b607f932dd8 by Victor Stinner in branch 'master': bpo-30263: regrtest: log system load (#1452) https://github.com/python/cpython/commit/3d0056842c5e06b4102f990b59ab3b607f932dd8
msg292979 - (view)	Author: STINNER Victor (vstinner) *	Date: 2017-05-04 13:39
Oh... I see many unexpected load average! On the small ARMv7 Ubuntu 3.x (1 CPU), -j8 is used: the load average starts with 2.58 but goes up to 11.13 around "0:04:20 load avg: 11.13 [216/405] ". Load average of 11 on such device seems too high and can explain test failures. PPC64 Fedora 3.x (64 CPU) starts at 8.48 and goes up to "0:01:33 load avg: 11.33", but it has much more CPUs, so it's ok.
msg292981 - (view)	Author: STINNER Victor (vstinner) *	Date: 2017-05-04 13:55
On Windows, there is a "System > Processor Queue Length" metric which is close to the UNIX system load, but I'm unable to see how to get this value.
msg292990 - (view)	Author: Louie Lu (louielu) *	Date: 2017-05-04 15:17
why will a 1 CPU environment using -j8? isn't that too much for it?
msg292992 - (view)	Author: STINNER Victor (vstinner) *	Date: 2017-05-04 15:21
> why will a 1 CPU environment using -j8? Many tests of the Python test suite spend a significant time on just sleeping and waiting for I/O. Using -jN with N > 1 runs tests much faster. > isn't that too much for it? That's a good question :-) At least, we now have one more tool to take a decision. Maybe the buildbot slave allows also more than one build in parallel? (ex: test 3.5 and 3.6 branches)
msg293148 - (view)	Author: Terry J. Reedy (terry.reedy) *	Date: 2017-05-06 02:24
I have a 6 core, 12 cpu Pentium. I previously stopped with -j10. Reading the above, I tried -j14 and -j18. Times for one run on a quiet system of 'test -jnn -ugui' with a fresh download and debug build were 3:55 (-j10), 3:48, and 3:53. Not much affect. A more promising source of reduced running time on this system is shown by the last 4 finish times: 2:42 tarfile (:42), 3:13 zipfile (:43), 3:16 venv (:54), and 3:48 multiprocessing_spawn (2:29). If m._s. were split into files taking at most 40 seconds, I suspect the total running time would be perhaps 20 seconds less due to better CPU utilization. Currently, the total running time could be from about 30 seconds less to about 2 minutes more depending on when m._s. starts in a randomized run.
msg296187 - (view)	Author: STINNER Victor (vstinner) *	Date: 2017-06-16 12:39
New changeset a0ccc54e6dffacf9e7c06f2a3e9056d2d35d21eb by Victor Stinner in branch '3.6': Synchronize libregrtest from master to 3.6 (#2244) https://github.com/python/cpython/commit/a0ccc54e6dffacf9e7c06f2a3e9056d2d35d21eb
msg296311 - (view)	Author: Jeremy Kloth (jkloth) *	Date: 2017-06-19 11:39
I've implemented a getloadavg() equivalent using Windows performance counters that could be used to display the load in regrtest, as it happens, before finding this issue. It is implemented in C (although it can be done in only Python). It has the identical properties of the Linux /proc/loadavg stats. An exponential decaying average using the same equations for 1m, 5m and 15m buckets. So, in theory, it would usable as `os.getloadavg()`. Before implementing this as a PR, I would like guidance as to the best place to integrate this: 1. as os.getloadavg(), 2. in _testcapi 3. in winapi (although it is a conglomeration of WinAPIs) 4. in regrtest (as Python). I believe that #1 would be a nice to have, but #2 as most likely. #3 doesn't fit so much with the "feel" of the winapi module; loadavg() being a utility function whereas the rest of that module is raw APIs. And, #4 as, well, last. Mostly due to the fact of walking C structures in Python sucks.
msg296312 - (view)	Author: STINNER Victor (vstinner) *	Date: 2017-06-19 11:43
> Before implementing this as a PR, I would like guidance as to the best place to integrate this: > 1. as os.getloadavg(), > 2. in _testcapi > 3. in winapi (although it is a conglomeration of WinAPIs) > 4. in regrtest (as Python). That's tough question. Usuaully, the os module is a thin wrapper to native functions. os.getloadavg() calls C getloadavg(). It doesn't implement the logic to open /proc/loadavg and parses it. Can I see the code somewhere before giving an answer? :-) If you want to add a public function, you must open a new issue. In case of doubt, we might start with a first implementation in libregrtest?
msg296319 - (view)	Author: Jeremy Kloth (jkloth) *	Date: 2017-06-19 12:13
Pushed utility to github: https://github.com/jkloth/loadavg It is currently just a command-line utility PoC. The routines of interest would be CalculateLoadReg and the loop in wmain().
msg296321 - (view)	Author: STINNER Victor (vstinner) *	Date: 2017-06-19 12:24
> Pushed utility to github: https://github.com/jkloth/loadavg Hum, I see two implementations: * LOADAVG_USE_REG: use the registry, RegQueryValueExW(HKEY_PERFORMANCE_DATA, L"2", ...) * LOADAVG_USE_PDH: PdhOpenQueryW() Are these API accessible by everyone? On Linux, the data is public to everybody: haypo@selma$ ls -l /proc/loadavg -r--r--r--. 1 root root 0 Jun 19 14:18 /proc/loadavg Why do you prefer registry over PDH? Should CalculateLoad be called every 5 seconds in a thread? If yes, IMHO it's a blocker issue for regrtest since some tests fail if other threads are running. I'm not sure that such code belongs to Python stdlib. What do you think of putting it on PyPI? For Python 2, sometimes I would like to use faulthandler on our buildbots since it's really hard to guess what crashed Python when a random test crash. But faulthandler was only added to Python 3.3. So using it on Python 2 would require to install it using pip. Right now, Python only requires itself to run tests. I'm not sure that it's a good idea to have dependencies, even if there are externals. Again, I'm not sure that this specific issue is the best place to discuss implementing "os.getloadavg()", since your implementation is more complex than a single function call, it seems like it requires to run a thread "in background", so maybe an object to start/stop the thread, etc.
msg296325 - (view)	Author: Jeremy Kloth (jeremy.kloth)	Date: 2017-06-19 12:53
On Mon, Jun 19, 2017 at 6:24 AM, STINNER Victor <report@bugs.python.org> wrote: > > STINNER Victor added the comment: > >> Pushed utility to github: https://github.com/jkloth/loadavg > > Hum, I see two implementations: > > * LOADAVG_USE_REG: use the registry, RegQueryValueExW(HKEY_PERFORMANCE_DATA, L"2", ...) > * LOADAVG_USE_PDH: PdhOpenQueryW() > > Are these API accessible by everyone? On Linux, the data is public to everybody: Yes. > Why do you prefer registry over PDH? A PDH query adds an additional 4MB to the running process. > Should CalculateLoad be called every 5 seconds in a thread? If yes, IMHO it's a blocker issue for regrtest since some tests fail if other threads are running. Yes, as that is exactly what Linux does (not in a thread, but as part of the scheduler). This is the reason for using the Thread Pool. They are kernel provided worker threads for the running process. As a matter of fact, Windows 10 now uses them to speed up DLL loading on startup so every Python process is running with 2+ threads. However, regtest when run with -jX already starts an add'l thread so that can no longer be an issue. > I'm not sure that such code belongs to Python stdlib. What do you think of putting it on PyPI? Sure, but packaging it up as a proper extension is quite low on my todo list. I am keen, however, on getting the buildbot fleet (well, mine in particular) in good shape. > For Python 2, sometimes I would like to use faulthandler on our buildbots since it's really hard to guess what crashed Python when a random test crash. But faulthandler was only added to Python 3.3. So using it on Python 2 would require to install it using pip. Right now, Python only requires itself to run tests. I'm not sure that it's a good idea to have dependencies, even if there are externals. Agreed, which is why I offered to implement as a patch to the test suite. > Again, I'm not sure that this specific issue is the best place to discuss implementing "os.getloadavg()", since your implementation is more complex than a single function call, it seems like it requires to run a thread "in background", so maybe an object to start/stop the thread, etc. Don't get too caught up with `os.getloadavg()`, it was just a possibility. I was leaning more towards a private utility function just for regrtest, but realized it may be useful outside of just testing. However, as you note, it does require a thread, which is why it is best implemented using the Thread Pool API to minimize any impact on Python proper.
msg296326 - (view)	Author: STINNER Victor (vstinner) *	Date: 2017-06-19 13:00
More general note about load average: at this point, I'm not sure of the usefulness of this information to debug race conditions :-)
msg296328 - (view)	Author: Jeremy Kloth (jkloth) *	Date: 2017-06-19 13:03
Race conditions, not so much, but definitely helps with random timeout errors ;)
msg296858 - (view)	Author: STINNER Victor (vstinner) *	Date: 2017-06-26 09:35
The load average is now logged to 2.7, 3.5, 3.6 and master branches. I didn't see any bug releated to this tiny addition, so I close this issue. Jeremy Kloth: "Pushed utility to github: https://github.com/jkloth/loadavg" I consider that this feature deserves its own issue since the implementation requires C code and is much more complex than just logging os.getloadavg().
msg300633 - (view)	Author: STINNER Victor (vstinner) *	Date: 2017-08-21 16:01
New changeset 02d4292df4bbd7a2cbf2716c9e6c22cae76e2093 by Victor Stinner in branch '2.7': bpo-30263: regrtest: add system load average (#3165) https://github.com/python/cpython/commit/02d4292df4bbd7a2cbf2716c9e6c22cae76e2093

History
Date	User	Action	Args
2022-04-11 14:58:46	admin	set	github: 74449
2017-08-21 16:01:14	vstinner	set	messages: + msg300633
2017-08-21 15:14:55	vstinner	set	pull_requests: + pull_request3201
2017-06-26 09:35:00	vstinner	set	status: open -> closed resolution: fixed messages: + msg296858 stage: resolved
2017-06-19 13:03:36	jkloth	set	messages: + msg296328
2017-06-19 13:00:50	vstinner	set	messages: + msg296326
2017-06-19 12:53:15	jeremy.kloth	set	nosy: + jeremy.kloth messages: + msg296325
2017-06-19 12:24:05	vstinner	set	messages: + msg296321
2017-06-19 12:13:56	jkloth	set	messages: + msg296319
2017-06-19 11:43:33	vstinner	set	messages: + msg296312
2017-06-19 11:39:12	jkloth	set	messages: + msg296311
2017-06-17 14:18:19	jkloth	set	nosy: + jkloth
2017-06-16 12:39:11	vstinner	set	messages: + msg296187
2017-06-16 11:35:58	vstinner	set	pull_requests: + pull_request2292
2017-05-06 02:24:14	terry.reedy	set	nosy: + terry.reedy messages: + msg293148
2017-05-04 15:21:03	vstinner	set	messages: + msg292992
2017-05-04 15:17:11	louielu	set	nosy: + louielu messages: + msg292990
2017-05-04 13:55:51	vstinner	set	messages: + msg292981
2017-05-04 13:39:16	vstinner	set	messages: + msg292979
2017-05-04 13:21:14	vstinner	set	messages: + msg292978
2017-05-04 13:00:17	vstinner	set	messages: + msg292977
2017-05-04 12:33:24	vstinner	set	pull_requests: + pull_request1550
2017-05-04 12:22:56	vstinner	set	messages: + msg292976
2017-05-03 23:10:39	vstinner	set	messages: + msg292945
2017-05-03 23:09:31	vstinner	create