classification
Title: regrtest: log "CPU usage" on Windows
Type: Stage: resolved
Components: Tests, Windows Versions: Python 3.8
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: ammar2, cheryl.sabella, giampaolo.rodola, jkloth, paul.moore, steve.dower, tim.golden, vstinner, zach.ware
Priority: normal Keywords: patch

Created on 2018-07-06 16:01 by vstinner, last changed 2019-04-26 10:16 by vstinner. This issue is now closed.

Files
File name Uploaded Description Edit
benchmark.PNG ammar2, 2018-07-16 09:36
Pull Requests
URL Status Linked Edit
PR 8287 closed ammar2, 2018-07-15 02:57
PR 8357 merged ammar2, 2018-07-20 20:26
Messages (31)
msg321177 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-07-06 16:01
I would help to debug race conditions on Windows to log the "CPU usage" on regrtest, as we do on other platforms (using os.getloadavg()).

Links:

* https://stackoverflow.com/questions/13666458/python-collect-system-stats
* https://docs.microsoft.com/en-us/windows/desktop/api/pdh/nf-pdh-pdhmakecounterpatha
* GetSystemTimes()?
* https://stackoverflow.com/questions/23143693/retrieving-cpu-load-percent-total-in-windows-with-c
msg321646 - (view) Author: Ammar Askar (ammar2) * (Python triager) Date: 2018-07-14 07:52
Annoyingly, it looks like Windows does not provide an API that gives an average value. There is a counter exposed called "System \ Processor Queue Length" which does what the equivalent of unix's load

https://blogs.technet.microsoft.com/askperf/2008/01/15/an-overview-of-processor-bottlenecks/

But we're gonna have to average it ourselves if we want this information.
msg321647 - (view) Author: Ammar Askar (ammar2) * (Python triager) Date: 2018-07-14 07:55
Taking a shot at this, should take a day or so.
msg321648 - (view) Author: Giampaolo Rodola' (giampaolo.rodola) * (Python committer) Date: 2018-07-14 10:31
There is an old ticket for this in psutil with some (possible useful) references in it: 
https://github.com/giampaolo/psutil/issues/604
msg321660 - (view) Author: Jeremy Kloth (jkloth) * Date: 2018-07-14 20:46
Also prior conversation:

https://bugs.python.org/issue30263#msg296311
msg321672 - (view) Author: Ammar Askar (ammar2) * (Python triager) Date: 2018-07-15 02:57
Thanks a lot for that link Jeremy, it was really helpful. After reading up on it, my take is that winapi is the most appropriate place for this, it is a non public api that's used in the stdlib.

I've used Windows APIs in a way that we don't need to manually start up a thread and call a calc_load function, instead using a callback invoked by windows. Internally this uses a thread pool, but it means we don't have to worry about managing the thread ourselves.

The load is stored as a global but the winapi module is already marked as "-1" indicating it has global state, so that shouldn't be a problem. https://docs.python.org/3/c-api/module.html#c.PyModuleDef.m_size

Like Jeremy noted, using WMI does add a 5mb overhead or so to the calling process. One more caveat is that the PdhAddEnglishCounterW function is only available in Vista+. I'm not sure if we still support Windows XP, but the alternative is to use PdhAddCounter, which breaks if the system language is not english because the counter paths are localized.
msg321673 - (view) Author: Ammar Askar (ammar2) * (Python triager) Date: 2018-07-15 03:00
Opened up a PR with a proof of concept to get feedback on if this approach is reasonable. 

Giampaolo, on your psutil issue you specifically said, "(possibly without using WMI)"

Is there any particular problem with using WMI?
msg321687 - (view) Author: Giampaolo Rodola' (giampaolo.rodola) * (Python committer) Date: 2018-07-15 10:01
> Giampaolo, on your psutil issue you specifically said, "(possibly without using WMI)" Is there any particular problem with using WMI?

Performance. In general WMI is (a lot) slower than the Windows API counterpart (psutil never uses WMI except in unit tests). Don't know if this matters for this specific issue though, or whether a correspondent Windows API for doing the same thing exists.
msg321688 - (view) Author: Ammar Askar (ammar2) * (Python triager) Date: 2018-07-15 10:10
Aah, yeah I don't think there's a good way of doing it purely from the windows API. There might be a way to enumerate through all the processes and see if they're queued up but I didn't look into it.

In this case it should be fine, we just pay a bit of WMI cost to initialize the query, all the updating and retrieval is done asynchronously to the python code.
msg321690 - (view) Author: Jeremy Kloth (jkloth) * Date: 2018-07-15 12:40
Not that it matters all that much, but from a terminology standpoint, WMI != PDH != Performance Counters.

Performance counters (the objects, not the topic) are provided by DLLs registered in the HKLM\SYSTEM\CurrentControlSet\Services key.  Their data is accessed via registry API functions using the HKEY_PERFORMANCE_DATA root key.

PDH (Performance Data Helper) provides an abstraction layer that can access those values among other things like a GUI or writing to log files.

WMI (Windows Management Instrumentation) is yet another layer on top of PDH and raw Performance Counters.

In this case of the "System" performance counter object, it is provided by a performance DLL (perfos.dll in the case of Win10 1803).  If overhead (memory and/or CPU) is a concern, then accessing the counter data via the registry is the way to go.
msg321691 - (view) Author: Giampaolo Rodola' (giampaolo.rodola) * (Python committer) Date: 2018-07-15 12:57
I suppose there is no way to emulate os.getloadavg() on Windows because that would necessarily imply using a thread to call the necessary routine (WMI, PDH, whatever...) every X secs or something, correct?
msg321692 - (view) Author: Jeremy Kloth (jkloth) * Date: 2018-07-15 14:26
Correct.  Windows provides the building blocks for implementing getloadavg(), but does not provide an interface that does the averaging.  That is deferred to a per application basis.  The best that an application can do for that is to use thread pools.  You can think of thread pools as kernel-managed threads (different from user-managed threads via CreateThread()).

As of Win10 1703, any process linked with DLLs automatically have thread pools created for them (to parallel-ize the loading of said DLLs).  Leveraging that feature would minimize the costs incurred to do the running average.
msg321694 - (view) Author: Ammar Askar (ammar2) * (Python triager) Date: 2018-07-15 19:07
Is the function I used for the callback, RegisterWaitForSingleObject  https://docs.microsoft.com/en-us/windows/desktop/api/winbase/nf-winbase-registerwaitforsingleobject
using the thread pooling system you're talking about underneath? The documentation seems to suggest so, but I'll wait for your opinion on it.
msg321699 - (view) Author: Jeremy Kloth (jkloth) * Date: 2018-07-16 00:13
The RegisterWaitForSingleObject() function does use the thread pool API:

https://docs.microsoft.com/en-us/windows/desktop/ProcThread/thread-pool-api

However, PdhCollectQueryDataEx() also creates a user-space thread to handle its work of setting the event whenever the timeout elapses.

I would think that it should be preferable to use a single thread pool worker to retrieve the queue length and calculate the average.
msg321700 - (view) Author: Ammar Askar (ammar2) * (Python triager) Date: 2018-07-16 00:24
Roger, personally I don't think its worth it to complicate the code in order to use the thread pool API. Especially considering this is just a private API and the only consumer will be the test suite runner. Let's see what the core devs think when this gets reviewed.
msg321715 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-07-16 07:39
Logging the current value can be an acceptable compromise.

When we run tests in subprocesses (python3 -m test -jN), we can run a test every N milliseconds with no thread. It's ok if the average is not accurate.

Most buildbots use -jN for performance but also to isolate the main processes from crashes and hard timeouts.
msg321724 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-07-16 09:26
https://docs.microsoft.com/en-us/sql/relational-databases/performance-monitor/monitor-cpu-usage?view=sql-server-2017

Processor Queue Length:

Corresponds to the number of threads waiting for processor time. A processor bottleneck develops when threads of a process require more processor cycles than are available. If more than a few processes attempt to utilize the processor's time, you might need to install a faster processor. Or, if you have a multiprocessor system, you could add a processor.

When you examine processor usage, consider the type of work that the instance of SQL Server performs. If SQL Server performs many calculations, such as queries involving aggregates or memory-bound queries that require no disk I/O, 100 percent of the processor's time can be used. If this causes the performance of other applications to suffer, try changing the workload. For example, dedicate the computer to running the instance of SQL Server.

Usage rates around 100 percent, where many client requests are being processed, may indicate that processes are queuing up, waiting for processor time, and causing a bottleneck. Resolve the problem by adding faster processors.

--

Is it exactly the same thing on Unix (load average)? If not, I would prefer to use a different name in regrtest and "loadavg". Maybe "PQL avg"?

What is the impact of the number of CPUs on this value?
msg321726 - (view) Author: Ammar Askar (ammar2) * (Python triager) Date: 2018-07-16 09:36
I don't think taking instantaneous values instead of averaging will work out too well. For reference I've attached a screenshot. It has sampled values at every second on an unloaded computer and then with running prime95 for cpu stress testing. The load tends to peak and fall.

>Is it exactly the same thing on Unix (load average)?

Indeed it is: https://en.wikipedia.org/wiki/Load_(computing)#Unix-style_load_calculation

"An idle computer has a load number of 0 (the idle process isn't counted). Each process using or waiting for CPU (the ready queue or run queue) increments the load number by 1."

From what I can tell, the number of processors are dealt with the same way as on Linux, that is, a single core processor is overloaded when the load is >1 and a quad core processor is overloaded when the load is >4
msg321869 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-07-18 08:33
I found a command to get the CPU usage in percent *per* CPU. Here with 2 CPUs:

vstinner@WIN C:\vstinner\python\master>wmic cpu get loadpercentage              
LoadPercentage                                                                  
100                                                                             
100
msg321886 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2018-07-18 15:00
> wmic cpu ...

This is the WMI solution we are trying to avoid.

But then again, if it's solely for our tests, perhaps the best way to approach this is to start a Python thread that periodically runs this command?

I also haven't seen it suggested, but perhaps GetProcessTimes (https://docs.microsoft.com/en-us/windows/desktop/api/processthreadsapi/nf-processthreadsapi-getprocesstimes) (or GetThreadTimes) would provide enough information to detect the same information?
msg321902 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-07-18 17:10
> I also haven't seen it suggested, but perhaps GetProcessTimes (https://docs.microsoft.com/en-us/windows/desktop/api/processthreadsapi/nf-processthreadsapi-getprocesstimes) (or GetThreadTimes) would provide enough information to detect the same information?

My intent is to get an idea if the whole system is busy. Not if the current Python process is busy. Most buildbots run tests with multiple worker processes (at least 2).
msg321903 - (view) Author: Giampaolo Rodola' (giampaolo.rodola) * (Python committer) Date: 2018-07-18 17:25
psutil exposes this functionality as "psutil.cpu_percent()":

https://github.com/giampaolo/psutil/blob/ac9dccab6b038835b5d612f92cf4804ec2662c2e/psutil/_psutil_windows.c#L992

https://github.com/giampaolo/psutil/blob/ac9dccab6b038835b5d612f92cf4804ec2662c2e/psutil/__init__.py#L1675

I'm not sure if it's worth it to copy all that stuff into Modules/_winapi.c and test/libregrtest/main.py though. It would probably be simpler to change the policy and allow (at least some) some third party libs in cPython's test suite. =)
msg321904 - (view) Author: Ammar Askar (ammar2) * (Python triager) Date: 2018-07-18 17:26
> But then again, if it's solely for our tests, perhaps the best way to approach this is to start a Python thread that periodically runs this command?

This sounds like a very good solution to me, it avoids adding the complexity of the C code. We actually have two options here, to keep the results consistent with the unix load, we can use `typeperf "\System\Processor Queue Length"`

To get cpu usage, we can use the command Victor posted. I'll make an alternative PR with that today just so we can contrast the two approaches.
msg321906 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2018-07-18 17:33
> It would probably be simpler to change the policy and allow (at least some) some third party libs in cPython's test suite. =)

I'm actually totally okay with this, as I'd really like to have JUnit XML output from the test suite, which is easiest to do with the existing third-party libraries.

Can we formalize a way by which optional third-party libraries are allowed? Provided they aren't critical for the overall pass/fail state of the test suite (or the more strict alternative: pass/fail state of *each* test), I don't see any particular harm in certain site packages being used.

(This is probably a discussion for python-dev, assuming the policy is written down somewhere.)
msg321911 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-07-18 19:57
Processor Queue Length seems simpler and easier to read. I don't want to log 24 numbers per regrtest output line if a machine has 24 CPUs... The load average is a "raw" value to give the idea if the system is "loaded" or not. More precise metrics can be used later to debug a test failure, but manually.
msg321912 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-07-18 19:59
PR 8287 seems short to me and it seems like psutils doesn't expose Processor Queue Length, so I'm not sure why we are talking about depending on psutils?
msg322033 - (view) Author: Ammar Askar (ammar2) * (Python triager) Date: 2018-07-20 20:29
> But then again, if it's solely for our tests, perhaps the best way to approach this is to start a Python thread that periodically runs this command?

I opened up https://github.com/python/cpython/pull/8357 with this strategy, in my opinion its a lot nicer and doesn't clutter up _winapi with half baked, extremely specialized functions. Its a bit more involved than running a thread, details are on the PR.
msg322322 - (view) Author: Giampaolo Rodola' (giampaolo.rodola) * (Python committer) Date: 2018-07-24 18:39
> PR 8287 seems short to me and it seems like psutils doesn't expose Processor Queue Length, so I'm not sure why we are talking about depending on psutils?

I'm not sure if you're strictly interested in getting system load or if CPU utilization is also fine. FWIW with psutil you would be able to get the system-wide CPU utilization occurred in a given period of time:

>>> import psutil, time
>>> psutil.cpu_percent(interval=None)  # non-blocking
0.0
>>> time.sleep(60)
>>> psutil.cpu_percent(interval=None)  # average of the last 60 secs
23.6
>>>

...and you can do the same for the current process too (psutil.Process().cpu_percent()).
msg339737 - (view) Author: Cheryl Sabella (cheryl.sabella) * (Python committer) Date: 2019-04-09 12:20
New changeset e16467af0bfcc9f399df251495ff2d2ad20a1669 by Cheryl Sabella (Ammar Askar) in branch 'master':
bpo-34060: Report system load when running test suite for Windows (GH-8357)
https://github.com/python/cpython/commit/e16467af0bfcc9f399df251495ff2d2ad20a1669
msg339739 - (view) Author: Cheryl Sabella (cheryl.sabella) * (Python committer) Date: 2019-04-09 12:37
I've merged PR8357.  I believe PR8287 can be closed now with PR8357 as the superseder?  

Thank you to everyone for your contributions to this discussion!
msg340900 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-04-26 10:16
New changeset 1069d38fa18f3a4f97c2e358bcb3b82cab1c051b by Victor Stinner in branch '3.7':
[3.7] bpo-36719: sync regrtest with master branch (GH-12967)
https://github.com/python/cpython/commit/1069d38fa18f3a4f97c2e358bcb3b82cab1c051b
History
Date User Action Args
2019-04-26 10:16:34vstinnersetmessages: + msg340900
2019-04-11 11:00:27vstinnersetstatus: open -> closed
resolution: fixed
stage: patch review -> resolved
2019-04-09 12:37:38cheryl.sabellasetmessages: + msg339739
2019-04-09 12:20:47cheryl.sabellasetnosy: + cheryl.sabella
messages: + msg339737
2018-07-24 18:39:10giampaolo.rodolasetmessages: + msg322322
2018-07-20 20:29:24ammar2setmessages: + msg322033
2018-07-20 20:26:43ammar2setpull_requests: + pull_request7892
2018-07-18 19:59:26vstinnersetmessages: + msg321912
2018-07-18 19:57:31vstinnersetmessages: + msg321911
2018-07-18 17:33:46steve.dowersetmessages: + msg321906
2018-07-18 17:26:41ammar2setmessages: + msg321904
2018-07-18 17:25:08giampaolo.rodolasetmessages: + msg321903
2018-07-18 17:10:07vstinnersetmessages: + msg321902
2018-07-18 15:00:35steve.dowersetmessages: + msg321886
2018-07-18 08:33:52vstinnersetmessages: + msg321869
2018-07-16 09:36:43ammar2setfiles: + benchmark.PNG

messages: + msg321726
2018-07-16 09:26:41vstinnersetmessages: + msg321724
2018-07-16 07:39:10vstinnersetmessages: + msg321715
2018-07-16 00:24:37ammar2setmessages: + msg321700
2018-07-16 00:13:17jklothsetmessages: + msg321699
2018-07-15 19:07:18ammar2setmessages: + msg321694
2018-07-15 14:26:59jklothsetmessages: + msg321692
2018-07-15 12:57:42giampaolo.rodolasetmessages: + msg321691
2018-07-15 12:40:32jklothsetmessages: + msg321690
2018-07-15 10:10:42ammar2setmessages: + msg321688
2018-07-15 10:01:36giampaolo.rodolasetmessages: + msg321687
2018-07-15 03:00:14ammar2setmessages: + msg321673
2018-07-15 02:57:06ammar2setkeywords: + patch
stage: patch review
pull_requests: + pull_request7821
2018-07-15 02:57:01ammar2setmessages: + msg321672
2018-07-14 20:46:36jklothsetnosy: + jkloth
messages: + msg321660
2018-07-14 10:31:59giampaolo.rodolasetmessages: + msg321648
2018-07-14 07:55:44ammar2setmessages: + msg321647
2018-07-14 07:52:49ammar2setnosy: + ammar2
messages: + msg321646
2018-07-14 07:11:51giampaolo.rodolasetnosy: + giampaolo.rodola
2018-07-06 16:01:09vstinnercreate