Issue 34060: regrtest: log "CPU usage" on Windows

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/78241

classification

Title:	regrtest: log "CPU usage" on Windows
Type:		Stage:	resolved
Components:	Tests, Windows	Versions:	Python 3.8

process

Status:	closed	Resolution:	fixed
Dependencies:		Superseder:
Assigned To:		Nosy List:	ammar2, cheryl.sabella, giampaolo.rodola, jkloth, paul.moore, steve.dower, tim.golden, vstinner, zach.ware
Priority:	normal	Keywords:	patch

Created on 2018-07-06 16:01 by vstinner, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Files
File name	Uploaded	Description	Edit
benchmark.PNG	ammar2, 2018-07-16 09:36

Pull Requests
URL	Status	Linked	Edit
PR 8287	closed	ammar2, 2018-07-15 02:57
PR 8357	merged	ammar2, 2018-07-20 20:26

Messages (31)
msg321177 - (view)	Author: STINNER Victor (vstinner) *	Date: 2018-07-06 16:01
I would help to debug race conditions on Windows to log the "CPU usage" on regrtest, as we do on other platforms (using os.getloadavg()). Links: * https://stackoverflow.com/questions/13666458/python-collect-system-stats * https://docs.microsoft.com/en-us/windows/desktop/api/pdh/nf-pdh-pdhmakecounterpatha * GetSystemTimes()? * https://stackoverflow.com/questions/23143693/retrieving-cpu-load-percent-total-in-windows-with-c
msg321646 - (view)	Author: Ammar Askar (ammar2) *	Date: 2018-07-14 07:52
Annoyingly, it looks like Windows does not provide an API that gives an average value. There is a counter exposed called "System \ Processor Queue Length" which does what the equivalent of unix's load https://blogs.technet.microsoft.com/askperf/2008/01/15/an-overview-of-processor-bottlenecks/ But we're gonna have to average it ourselves if we want this information.
msg321647 - (view)	Author: Ammar Askar (ammar2) *	Date: 2018-07-14 07:55
Taking a shot at this, should take a day or so.
msg321648 - (view)	Author: Giampaolo Rodola' (giampaolo.rodola) *	Date: 2018-07-14 10:31
There is an old ticket for this in psutil with some (possible useful) references in it: https://github.com/giampaolo/psutil/issues/604
msg321660 - (view)	Author: Jeremy Kloth (jkloth) *	Date: 2018-07-14 20:46
Also prior conversation: https://bugs.python.org/issue30263#msg296311
msg321672 - (view)	Author: Ammar Askar (ammar2) *	Date: 2018-07-15 02:57
Thanks a lot for that link Jeremy, it was really helpful. After reading up on it, my take is that winapi is the most appropriate place for this, it is a non public api that's used in the stdlib. I've used Windows APIs in a way that we don't need to manually start up a thread and call a calc_load function, instead using a callback invoked by windows. Internally this uses a thread pool, but it means we don't have to worry about managing the thread ourselves. The load is stored as a global but the winapi module is already marked as "-1" indicating it has global state, so that shouldn't be a problem. https://docs.python.org/3/c-api/module.html#c.PyModuleDef.m_size Like Jeremy noted, using WMI does add a 5mb overhead or so to the calling process. One more caveat is that the PdhAddEnglishCounterW function is only available in Vista+. I'm not sure if we still support Windows XP, but the alternative is to use PdhAddCounter, which breaks if the system language is not english because the counter paths are localized.
msg321673 - (view)	Author: Ammar Askar (ammar2) *	Date: 2018-07-15 03:00
Opened up a PR with a proof of concept to get feedback on if this approach is reasonable. Giampaolo, on your psutil issue you specifically said, "(possibly without using WMI)" Is there any particular problem with using WMI?
msg321687 - (view)	Author: Giampaolo Rodola' (giampaolo.rodola) *	Date: 2018-07-15 10:01
> Giampaolo, on your psutil issue you specifically said, "(possibly without using WMI)" Is there any particular problem with using WMI? Performance. In general WMI is (a lot) slower than the Windows API counterpart (psutil never uses WMI except in unit tests). Don't know if this matters for this specific issue though, or whether a correspondent Windows API for doing the same thing exists.
msg321688 - (view)	Author: Ammar Askar (ammar2) *	Date: 2018-07-15 10:10
Aah, yeah I don't think there's a good way of doing it purely from the windows API. There might be a way to enumerate through all the processes and see if they're queued up but I didn't look into it. In this case it should be fine, we just pay a bit of WMI cost to initialize the query, all the updating and retrieval is done asynchronously to the python code.
msg321690 - (view)	Author: Jeremy Kloth (jkloth) *	Date: 2018-07-15 12:40
Not that it matters all that much, but from a terminology standpoint, WMI != PDH != Performance Counters. Performance counters (the objects, not the topic) are provided by DLLs registered in the HKLM\SYSTEM\CurrentControlSet\Services key. Their data is accessed via registry API functions using the HKEY_PERFORMANCE_DATA root key. PDH (Performance Data Helper) provides an abstraction layer that can access those values among other things like a GUI or writing to log files. WMI (Windows Management Instrumentation) is yet another layer on top of PDH and raw Performance Counters. In this case of the "System" performance counter object, it is provided by a performance DLL (perfos.dll in the case of Win10 1803). If overhead (memory and/or CPU) is a concern, then accessing the counter data via the registry is the way to go.
msg321691 - (view)	Author: Giampaolo Rodola' (giampaolo.rodola) *	Date: 2018-07-15 12:57
I suppose there is no way to emulate os.getloadavg() on Windows because that would necessarily imply using a thread to call the necessary routine (WMI, PDH, whatever...) every X secs or something, correct?
msg321692 - (view)	Author: Jeremy Kloth (jkloth) *	Date: 2018-07-15 14:26
Correct. Windows provides the building blocks for implementing getloadavg(), but does not provide an interface that does the averaging. That is deferred to a per application basis. The best that an application can do for that is to use thread pools. You can think of thread pools as kernel-managed threads (different from user-managed threads via CreateThread()). As of Win10 1703, any process linked with DLLs automatically have thread pools created for them (to parallel-ize the loading of said DLLs). Leveraging that feature would minimize the costs incurred to do the running average.
msg321694 - (view)	Author: Ammar Askar (ammar2) *	Date: 2018-07-15 19:07
Is the function I used for the callback, RegisterWaitForSingleObject https://docs.microsoft.com/en-us/windows/desktop/api/winbase/nf-winbase-registerwaitforsingleobject using the thread pooling system you're talking about underneath? The documentation seems to suggest so, but I'll wait for your opinion on it.
msg321699 - (view)	Author: Jeremy Kloth (jkloth) *	Date: 2018-07-16 00:13
The RegisterWaitForSingleObject() function does use the thread pool API: https://docs.microsoft.com/en-us/windows/desktop/ProcThread/thread-pool-api However, PdhCollectQueryDataEx() also creates a user-space thread to handle its work of setting the event whenever the timeout elapses. I would think that it should be preferable to use a single thread pool worker to retrieve the queue length and calculate the average.
msg321700 - (view)	Author: Ammar Askar (ammar2) *	Date: 2018-07-16 00:24
Roger, personally I don't think its worth it to complicate the code in order to use the thread pool API. Especially considering this is just a private API and the only consumer will be the test suite runner. Let's see what the core devs think when this gets reviewed.
msg321715 - (view)	Author: STINNER Victor (vstinner) *	Date: 2018-07-16 07:39
Logging the current value can be an acceptable compromise. When we run tests in subprocesses (python3 -m test -jN), we can run a test every N milliseconds with no thread. It's ok if the average is not accurate. Most buildbots use -jN for performance but also to isolate the main processes from crashes and hard timeouts.
msg321724 - (view)	Author: STINNER Victor (vstinner) *	Date: 2018-07-16 09:26
https://docs.microsoft.com/en-us/sql/relational-databases/performance-monitor/monitor-cpu-usage?view=sql-server-2017 Processor Queue Length: Corresponds to the number of threads waiting for processor time. A processor bottleneck develops when threads of a process require more processor cycles than are available. If more than a few processes attempt to utilize the processor's time, you might need to install a faster processor. Or, if you have a multiprocessor system, you could add a processor. When you examine processor usage, consider the type of work that the instance of SQL Server performs. If SQL Server performs many calculations, such as queries involving aggregates or memory-bound queries that require no disk I/O, 100 percent of the processor's time can be used. If this causes the performance of other applications to suffer, try changing the workload. For example, dedicate the computer to running the instance of SQL Server. Usage rates around 100 percent, where many client requests are being processed, may indicate that processes are queuing up, waiting for processor time, and causing a bottleneck. Resolve the problem by adding faster processors. -- Is it exactly the same thing on Unix (load average)? If not, I would prefer to use a different name in regrtest and "loadavg". Maybe "PQL avg"? What is the impact of the number of CPUs on this value?
msg321726 - (view)	Author: Ammar Askar (ammar2) *	Date: 2018-07-16 09:36
I don't think taking instantaneous values instead of averaging will work out too well. For reference I've attached a screenshot. It has sampled values at every second on an unloaded computer and then with running prime95 for cpu stress testing. The load tends to peak and fall. >Is it exactly the same thing on Unix (load average)? Indeed it is: https://en.wikipedia.org/wiki/Load_(computing)#Unix-style_load_calculation "An idle computer has a load number of 0 (the idle process isn't counted). Each process using or waiting for CPU (the ready queue or run queue) increments the load number by 1." From what I can tell, the number of processors are dealt with the same way as on Linux, that is, a single core processor is overloaded when the load is >1 and a quad core processor is overloaded when the load is >4
msg321869 - (view)	Author: STINNER Victor (vstinner) *	Date: 2018-07-18 08:33
I found a command to get the CPU usage in percent per CPU. Here with 2 CPUs: vstinner@WIN C:\vstinner\python\master>wmic cpu get loadpercentage LoadPercentage 100 100
msg321886 - (view)	Author: Steve Dower (steve.dower) *	Date: 2018-07-18 15:00
> wmic cpu ... This is the WMI solution we are trying to avoid. But then again, if it's solely for our tests, perhaps the best way to approach this is to start a Python thread that periodically runs this command? I also haven't seen it suggested, but perhaps GetProcessTimes (https://docs.microsoft.com/en-us/windows/desktop/api/processthreadsapi/nf-processthreadsapi-getprocesstimes) (or GetThreadTimes) would provide enough information to detect the same information?
msg321902 - (view)	Author: STINNER Victor (vstinner) *	Date: 2018-07-18 17:10
> I also haven't seen it suggested, but perhaps GetProcessTimes (https://docs.microsoft.com/en-us/windows/desktop/api/processthreadsapi/nf-processthreadsapi-getprocesstimes) (or GetThreadTimes) would provide enough information to detect the same information? My intent is to get an idea if the whole system is busy. Not if the current Python process is busy. Most buildbots run tests with multiple worker processes (at least 2).
msg321903 - (view)	Author: Giampaolo Rodola' (giampaolo.rodola) *	Date: 2018-07-18 17:25
psutil exposes this functionality as "psutil.cpu_percent()": https://github.com/giampaolo/psutil/blob/ac9dccab6b038835b5d612f92cf4804ec2662c2e/psutil/_psutil_windows.c#L992 https://github.com/giampaolo/psutil/blob/ac9dccab6b038835b5d612f92cf4804ec2662c2e/psutil/__init__.py#L1675 I'm not sure if it's worth it to copy all that stuff into Modules/_winapi.c and test/libregrtest/main.py though. It would probably be simpler to change the policy and allow (at least some) some third party libs in cPython's test suite. =)
msg321904 - (view)	Author: Ammar Askar (ammar2) *	Date: 2018-07-18 17:26
> But then again, if it's solely for our tests, perhaps the best way to approach this is to start a Python thread that periodically runs this command? This sounds like a very good solution to me, it avoids adding the complexity of the C code. We actually have two options here, to keep the results consistent with the unix load, we can use `typeperf "\System\Processor Queue Length"` To get cpu usage, we can use the command Victor posted. I'll make an alternative PR with that today just so we can contrast the two approaches.
msg321906 - (view)	Author: Steve Dower (steve.dower) *	Date: 2018-07-18 17:33
> It would probably be simpler to change the policy and allow (at least some) some third party libs in cPython's test suite. =) I'm actually totally okay with this, as I'd really like to have JUnit XML output from the test suite, which is easiest to do with the existing third-party libraries. Can we formalize a way by which optional third-party libraries are allowed? Provided they aren't critical for the overall pass/fail state of the test suite (or the more strict alternative: pass/fail state of each test), I don't see any particular harm in certain site packages being used. (This is probably a discussion for python-dev, assuming the policy is written down somewhere.)
msg321911 - (view)	Author: STINNER Victor (vstinner) *	Date: 2018-07-18 19:57
Processor Queue Length seems simpler and easier to read. I don't want to log 24 numbers per regrtest output line if a machine has 24 CPUs... The load average is a "raw" value to give the idea if the system is "loaded" or not. More precise metrics can be used later to debug a test failure, but manually.
msg321912 - (view)	Author: STINNER Victor (vstinner) *	Date: 2018-07-18 19:59
PR 8287 seems short to me and it seems like psutils doesn't expose Processor Queue Length, so I'm not sure why we are talking about depending on psutils?
msg322033 - (view)	Author: Ammar Askar (ammar2) *	Date: 2018-07-20 20:29
> But then again, if it's solely for our tests, perhaps the best way to approach this is to start a Python thread that periodically runs this command? I opened up https://github.com/python/cpython/pull/8357 with this strategy, in my opinion its a lot nicer and doesn't clutter up _winapi with half baked, extremely specialized functions. Its a bit more involved than running a thread, details are on the PR.
msg322322 - (view)	Author: Giampaolo Rodola' (giampaolo.rodola) *	Date: 2018-07-24 18:39
> PR 8287 seems short to me and it seems like psutils doesn't expose Processor Queue Length, so I'm not sure why we are talking about depending on psutils? I'm not sure if you're strictly interested in getting system load or if CPU utilization is also fine. FWIW with psutil you would be able to get the system-wide CPU utilization occurred in a given period of time: >>> import psutil, time >>> psutil.cpu_percent(interval=None) # non-blocking 0.0 >>> time.sleep(60) >>> psutil.cpu_percent(interval=None) # average of the last 60 secs 23.6 >>> ...and you can do the same for the current process too (psutil.Process().cpu_percent()).
msg339737 - (view)	Author: Cheryl Sabella (cheryl.sabella) *	Date: 2019-04-09 12:20
New changeset e16467af0bfcc9f399df251495ff2d2ad20a1669 by Cheryl Sabella (Ammar Askar) in branch 'master': bpo-34060: Report system load when running test suite for Windows (GH-8357) https://github.com/python/cpython/commit/e16467af0bfcc9f399df251495ff2d2ad20a1669
msg339739 - (view)	Author: Cheryl Sabella (cheryl.sabella) *	Date: 2019-04-09 12:37
I've merged PR8357. I believe PR8287 can be closed now with PR8357 as the superseder? Thank you to everyone for your contributions to this discussion!
msg340900 - (view)	Author: STINNER Victor (vstinner) *	Date: 2019-04-26 10:16
New changeset 1069d38fa18f3a4f97c2e358bcb3b82cab1c051b by Victor Stinner in branch '3.7': [3.7] bpo-36719: sync regrtest with master branch (GH-12967) https://github.com/python/cpython/commit/1069d38fa18f3a4f97c2e358bcb3b82cab1c051b

History
Date	User	Action	Args
2022-04-11 14:59:02	admin	set	github: 78241
2019-04-26 10:16:34	vstinner	set	messages: + msg340900
2019-04-11 11:00:27	vstinner	set	status: open -> closed resolution: fixed stage: patch review -> resolved
2019-04-09 12:37:38	cheryl.sabella	set	messages: + msg339739
2019-04-09 12:20:47	cheryl.sabella	set	nosy: + cheryl.sabella messages: + msg339737
2018-07-24 18:39:10	giampaolo.rodola	set	messages: + msg322322
2018-07-20 20:29:24	ammar2	set	messages: + msg322033
2018-07-20 20:26:43	ammar2	set	pull_requests: + pull_request7892
2018-07-18 19:59:26	vstinner	set	messages: + msg321912
2018-07-18 19:57:31	vstinner	set	messages: + msg321911
2018-07-18 17:33:46	steve.dower	set	messages: + msg321906
2018-07-18 17:26:41	ammar2	set	messages: + msg321904
2018-07-18 17:25:08	giampaolo.rodola	set	messages: + msg321903
2018-07-18 17:10:07	vstinner	set	messages: + msg321902
2018-07-18 15:00:35	steve.dower	set	messages: + msg321886
2018-07-18 08:33:52	vstinner	set	messages: + msg321869
2018-07-16 09:36:43	ammar2	set	files: + benchmark.PNG messages: + msg321726
2018-07-16 09:26:41	vstinner	set	messages: + msg321724
2018-07-16 07:39:10	vstinner	set	messages: + msg321715
2018-07-16 00:24:37	ammar2	set	messages: + msg321700
2018-07-16 00:13:17	jkloth	set	messages: + msg321699
2018-07-15 19:07:18	ammar2	set	messages: + msg321694
2018-07-15 14:26:59	jkloth	set	messages: + msg321692
2018-07-15 12:57:42	giampaolo.rodola	set	messages: + msg321691
2018-07-15 12:40:32	jkloth	set	messages: + msg321690
2018-07-15 10:10:42	ammar2	set	messages: + msg321688
2018-07-15 10:01:36	giampaolo.rodola	set	messages: + msg321687
2018-07-15 03:00:14	ammar2	set	messages: + msg321673
2018-07-15 02:57:06	ammar2	set	keywords: + patch stage: patch review pull_requests: + pull_request7821
2018-07-15 02:57:01	ammar2	set	messages: + msg321672
2018-07-14 20:46:36	jkloth	set	nosy: + jkloth messages: + msg321660
2018-07-14 10:31:59	giampaolo.rodola	set	messages: + msg321648
2018-07-14 07:55:44	ammar2	set	messages: + msg321647
2018-07-14 07:52:49	ammar2	set	nosy: + ammar2 messages: + msg321646
2018-07-14 07:11:51	giampaolo.rodola	set	nosy: + giampaolo.rodola
2018-07-06 16:01:09	vstinner	create