regrtest: log "CPU usage" on Windows #78241

vstinner · 2018-07-06T16:01:09Z

BPO	34060
Nosy	@pfmoore, @vstinner, @giampaolo, @tjguk, @jkloth, @zware, @zooba, @ammaraskar, @csabella
PRs	bpo-34060: Report system load when running test suite for Windows #8287 bpo-34060: Report system load when running test suite for Windows #8357
Files	benchmark.PNG

^{Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.}

Show more details

GitHub fields:

assignee = None
closed_at = <Date 2019-04-11.11:00:27.525>
created_at = <Date 2018-07-06.16:01:09.489>
labels = ['3.8', 'tests', 'OS-windows']
title = 'regrtest: log "CPU usage" on Windows'
updated_at = <Date 2019-04-26.10:16:34.571>
user = 'https://github.com/vstinner'

bugs.python.org fields:

activity = <Date 2019-04-26.10:16:34.571>
actor = 'vstinner'
assignee = 'none'
closed = True
closed_date = <Date 2019-04-11.11:00:27.525>
closer = 'vstinner'
components = ['Tests', 'Windows']
creation = <Date 2018-07-06.16:01:09.489>
creator = 'vstinner'
dependencies = []
files = ['47694']
hgrepos = []
issue_num = 34060
keywords = ['patch']
message_count = 31.0
messages = ['321177', '321646', '321647', '321648', '321660', '321672', '321673', '321687', '321688', '321690', '321691', '321692', '321694', '321699', '321700', '321715', '321724', '321726', '321869', '321886', '321902', '321903', '321904', '321906', '321911', '321912', '322033', '322322', '339737', '339739', '340900']
nosy_count = 9.0
nosy_names = ['paul.moore', 'vstinner', 'giampaolo.rodola', 'tim.golden', 'jkloth', 'zach.ware', 'steve.dower', 'ammar2', 'cheryl.sabella']
pr_nums = ['8287', '8357']
priority = 'normal'
resolution = 'fixed'
stage = 'resolved'
status = 'closed'
superseder = None
type = None
url = 'https://bugs.python.org/issue34060'
versions = ['Python 3.8']

vstinner · 2018-07-06T16:01:09Z

I would help to debug race conditions on Windows to log the "CPU usage" on regrtest, as we do on other platforms (using os.getloadavg()).

Links:

ammaraskar · 2018-07-14T07:52:49Z

Annoyingly, it looks like Windows does not provide an API that gives an average value. There is a counter exposed called "System \ Processor Queue Length" which does what the equivalent of unix's load

https://blogs.technet.microsoft.com/askperf/2008/01/15/an-overview-of-processor-bottlenecks/

But we're gonna have to average it ourselves if we want this information.

ammaraskar · 2018-07-14T07:55:44Z

Taking a shot at this, should take a day or so.

giampaolo · 2018-07-14T10:31:59Z

There is an old ticket for this in psutil with some (possible useful) references in it:
giampaolo/psutil#604

jkloth · 2018-07-14T20:46:36Z

Also prior conversation:

https://bugs.python.org/issue30263#msg296311

ammaraskar · 2018-07-15T02:57:00Z

Thanks a lot for that link Jeremy, it was really helpful. After reading up on it, my take is that winapi is the most appropriate place for this, it is a non public api that's used in the stdlib.

I've used Windows APIs in a way that we don't need to manually start up a thread and call a calc_load function, instead using a callback invoked by windows. Internally this uses a thread pool, but it means we don't have to worry about managing the thread ourselves.

The load is stored as a global but the winapi module is already marked as "-1" indicating it has global state, so that shouldn't be a problem. https://docs.python.org/3/c-api/module.html#c.PyModuleDef.m_size

Like Jeremy noted, using WMI does add a 5mb overhead or so to the calling process. One more caveat is that the PdhAddEnglishCounterW function is only available in Vista+. I'm not sure if we still support Windows XP, but the alternative is to use PdhAddCounter, which breaks if the system language is not english because the counter paths are localized.

ammaraskar · 2018-07-15T03:00:15Z

Opened up a PR with a proof of concept to get feedback on if this approach is reasonable.

Giampaolo, on your psutil issue you specifically said, "(possibly without using WMI)"

Is there any particular problem with using WMI?

giampaolo · 2018-07-15T10:01:37Z

Giampaolo, on your psutil issue you specifically said, "(possibly without using WMI)" Is there any particular problem with using WMI?

Performance. In general WMI is (a lot) slower than the Windows API counterpart (psutil never uses WMI except in unit tests). Don't know if this matters for this specific issue though, or whether a correspondent Windows API for doing the same thing exists.

ammaraskar · 2018-07-15T10:10:42Z

Aah, yeah I don't think there's a good way of doing it purely from the windows API. There might be a way to enumerate through all the processes and see if they're queued up but I didn't look into it.

In this case it should be fine, we just pay a bit of WMI cost to initialize the query, all the updating and retrieval is done asynchronously to the python code.

jkloth · 2018-07-15T12:40:33Z

Not that it matters all that much, but from a terminology standpoint, WMI != PDH != Performance Counters.

Performance counters (the objects, not the topic) are provided by DLLs registered in the HKLM\SYSTEM\CurrentControlSet\Services key. Their data is accessed via registry API functions using the HKEY_PERFORMANCE_DATA root key.

PDH (Performance Data Helper) provides an abstraction layer that can access those values among other things like a GUI or writing to log files.

WMI (Windows Management Instrumentation) is yet another layer on top of PDH and raw Performance Counters.

In this case of the "System" performance counter object, it is provided by a performance DLL (perfos.dll in the case of Win10 1803). If overhead (memory and/or CPU) is a concern, then accessing the counter data via the registry is the way to go.

giampaolo · 2018-07-15T12:57:42Z

I suppose there is no way to emulate os.getloadavg() on Windows because that would necessarily imply using a thread to call the necessary routine (WMI, PDH, whatever...) every X secs or something, correct?

jkloth · 2018-07-15T14:27:00Z

Correct. Windows provides the building blocks for implementing getloadavg(), but does not provide an interface that does the averaging. That is deferred to a per application basis. The best that an application can do for that is to use thread pools. You can think of thread pools as kernel-managed threads (different from user-managed threads via CreateThread()).

As of Win10 1703, any process linked with DLLs automatically have thread pools created for them (to parallel-ize the loading of said DLLs). Leveraging that feature would minimize the costs incurred to do the running average.

ammaraskar · 2018-07-15T19:07:18Z

Is the function I used for the callback, RegisterWaitForSingleObject https://docs.microsoft.com/en-us/windows/desktop/api/winbase/nf-winbase-registerwaitforsingleobject
using the thread pooling system you're talking about underneath? The documentation seems to suggest so, but I'll wait for your opinion on it.

jkloth · 2018-07-16T00:13:17Z

The RegisterWaitForSingleObject() function does use the thread pool API:

https://docs.microsoft.com/en-us/windows/desktop/ProcThread/thread-pool-api

However, PdhCollectQueryDataEx() also creates a user-space thread to handle its work of setting the event whenever the timeout elapses.

I would think that it should be preferable to use a single thread pool worker to retrieve the queue length and calculate the average.

ammaraskar · 2018-07-16T00:24:37Z

Roger, personally I don't think its worth it to complicate the code in order to use the thread pool API. Especially considering this is just a private API and the only consumer will be the test suite runner. Let's see what the core devs think when this gets reviewed.

vstinner · 2018-07-16T07:39:11Z

Logging the current value can be an acceptable compromise.

When we run tests in subprocesses (python3 -m test -jN), we can run a test every N milliseconds with no thread. It's ok if the average is not accurate.

Most buildbots use -jN for performance but also to isolate the main processes from crashes and hard timeouts.

vstinner · 2018-07-16T09:26:41Z

https://docs.microsoft.com/en-us/sql/relational-databases/performance-monitor/monitor-cpu-usage?view=sql-server-2017

Processor Queue Length:

Corresponds to the number of threads waiting for processor time. A processor bottleneck develops when threads of a process require more processor cycles than are available. If more than a few processes attempt to utilize the processor's time, you might need to install a faster processor. Or, if you have a multiprocessor system, you could add a processor.

When you examine processor usage, consider the type of work that the instance of SQL Server performs. If SQL Server performs many calculations, such as queries involving aggregates or memory-bound queries that require no disk I/O, 100 percent of the processor's time can be used. If this causes the performance of other applications to suffer, try changing the workload. For example, dedicate the computer to running the instance of SQL Server.

Usage rates around 100 percent, where many client requests are being processed, may indicate that processes are queuing up, waiting for processor time, and causing a bottleneck. Resolve the problem by adding faster processors.

--

Is it exactly the same thing on Unix (load average)? If not, I would prefer to use a different name in regrtest and "loadavg". Maybe "PQL avg"?

What is the impact of the number of CPUs on this value?

ammaraskar · 2018-07-16T09:36:43Z

I don't think taking instantaneous values instead of averaging will work out too well. For reference I've attached a screenshot. It has sampled values at every second on an unloaded computer and then with running prime95 for cpu stress testing. The load tends to peak and fall.

Is it exactly the same thing on Unix (load average)?

Indeed it is: https://en.wikipedia.org/wiki/Load_(computing)#Unix-style_load_calculation

"An idle computer has a load number of 0 (the idle process isn't counted). Each process using or waiting for CPU (the ready queue or run queue) increments the load number by 1."

From what I can tell, the number of processors are dealt with the same way as on Linux, that is, a single core processor is overloaded when the load is >1 and a quad core processor is overloaded when the load is >4

vstinner · 2018-07-18T08:33:52Z

I found a command to get the CPU usage in percent *per* CPU. Here with 2 CPUs:

vstinner@WIN C:\vstinner\python\master>wmic cpu get loadpercentage
LoadPercentage
100
100

zooba · 2018-07-18T15:00:35Z

wmic cpu ...

This is the WMI solution we are trying to avoid.

But then again, if it's solely for our tests, perhaps the best way to approach this is to start a Python thread that periodically runs this command?

I also haven't seen it suggested, but perhaps GetProcessTimes (https://docs.microsoft.com/en-us/windows/desktop/api/processthreadsapi/nf-processthreadsapi-getprocesstimes) (or GetThreadTimes) would provide enough information to detect the same information?

vstinner · 2018-07-18T17:10:07Z

I also haven't seen it suggested, but perhaps GetProcessTimes (https://docs.microsoft.com/en-us/windows/desktop/api/processthreadsapi/nf-processthreadsapi-getprocesstimes) (or GetThreadTimes) would provide enough information to detect the same information?

My intent is to get an idea if the whole system is busy. Not if the current Python process is busy. Most buildbots run tests with multiple worker processes (at least 2).

giampaolo · 2018-07-18T17:25:09Z

psutil exposes this functionality as "psutil.cpu_percent()":

https://github.com/giampaolo/psutil/blob/ac9dccab6b038835b5d612f92cf4804ec2662c2e/psutil/_psutil_windows.c#L992

https://github.com/giampaolo/psutil/blob/ac9dccab6b038835b5d612f92cf4804ec2662c2e/psutil/__init__.py#L1675

I'm not sure if it's worth it to copy all that stuff into Modules/_winapi.c and test/libregrtest/main.py though. It would probably be simpler to change the policy and allow (at least some) some third party libs in cPython's test suite. =)

ammaraskar · 2018-07-18T17:26:41Z

But then again, if it's solely for our tests, perhaps the best way to approach this is to start a Python thread that periodically runs this command?

This sounds like a very good solution to me, it avoids adding the complexity of the C code. We actually have two options here, to keep the results consistent with the unix load, we can use typeperf "\System\Processor Queue Length"

To get cpu usage, we can use the command Victor posted. I'll make an alternative PR with that today just so we can contrast the two approaches.

zooba · 2018-07-18T17:33:46Z

It would probably be simpler to change the policy and allow (at least some) some third party libs in cPython's test suite. =)

I'm actually totally okay with this, as I'd really like to have JUnit XML output from the test suite, which is easiest to do with the existing third-party libraries.

Can we formalize a way by which optional third-party libraries are allowed? Provided they aren't critical for the overall pass/fail state of the test suite (or the more strict alternative: pass/fail state of *each* test), I don't see any particular harm in certain site packages being used.

(This is probably a discussion for python-dev, assuming the policy is written down somewhere.)

vstinner · 2018-07-18T19:57:32Z

Processor Queue Length seems simpler and easier to read. I don't want to log 24 numbers per regrtest output line if a machine has 24 CPUs... The load average is a "raw" value to give the idea if the system is "loaded" or not. More precise metrics can be used later to debug a test failure, but manually.

vstinner · 2018-07-18T19:59:26Z

PR 8287 seems short to me and it seems like psutils doesn't expose Processor Queue Length, so I'm not sure why we are talking about depending on psutils?

ammaraskar · 2018-07-20T20:29:25Z

But then again, if it's solely for our tests, perhaps the best way to approach this is to start a Python thread that periodically runs this command?

I opened up #8357 with this strategy, in my opinion its a lot nicer and doesn't clutter up _winapi with half baked, extremely specialized functions. Its a bit more involved than running a thread, details are on the PR.

giampaolo · 2018-07-24T18:39:11Z

PR 8287 seems short to me and it seems like psutils doesn't expose Processor Queue Length, so I'm not sure why we are talking about depending on psutils?

I'm not sure if you're strictly interested in getting system load or if CPU utilization is also fine. FWIW with psutil you would be able to get the system-wide CPU utilization occurred in a given period of time:

>>> import psutil, time
>>> psutil.cpu_percent(interval=None)  # non-blocking
0.0
>>> time.sleep(60)
>>> psutil.cpu_percent(interval=None)  # average of the last 60 secs
23.6
>>>

...and you can do the same for the current process too (psutil.Process().cpu_percent()).

csabella · 2019-04-09T12:20:47Z

New changeset e16467a by Cheryl Sabella (Ammar Askar) in branch 'master':
bpo-34060: Report system load when running test suite for Windows (GH-8357)
e16467a

csabella · 2019-04-09T12:37:39Z

I've merged PR8357. I believe PR8287 can be closed now with PR8357 as the superseder?

Thank you to everyone for your contributions to this discussion!

vstinner · 2019-04-26T10:16:35Z

New changeset 1069d38 by Victor Stinner in branch '3.7':
[3.7] bpo-36719: sync regrtest with master branch (GH-12967)
1069d38

vstinner added 3.8 only security fixes tests Tests in the Lib/test dir OS-windows labels Jul 6, 2018

vstinner closed this as completed Apr 11, 2019

ezio-melotti transferred this issue from another repository Apr 10, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

regrtest: log "CPU usage" on Windows #78241

regrtest: log "CPU usage" on Windows #78241

vstinner commented Jul 6, 2018

vstinner commented Jul 6, 2018

ammaraskar commented Jul 14, 2018

ammaraskar commented Jul 14, 2018

giampaolo commented Jul 14, 2018

jkloth commented Jul 14, 2018

ammaraskar commented Jul 15, 2018

ammaraskar commented Jul 15, 2018

giampaolo commented Jul 15, 2018

ammaraskar commented Jul 15, 2018

jkloth commented Jul 15, 2018

giampaolo commented Jul 15, 2018

jkloth commented Jul 15, 2018

ammaraskar commented Jul 15, 2018

jkloth commented Jul 16, 2018

ammaraskar commented Jul 16, 2018

vstinner commented Jul 16, 2018

vstinner commented Jul 16, 2018

ammaraskar commented Jul 16, 2018

vstinner commented Jul 18, 2018

zooba commented Jul 18, 2018

vstinner commented Jul 18, 2018

giampaolo commented Jul 18, 2018

ammaraskar commented Jul 18, 2018

zooba commented Jul 18, 2018

vstinner commented Jul 18, 2018

vstinner commented Jul 18, 2018

ammaraskar commented Jul 20, 2018

giampaolo commented Jul 24, 2018

csabella commented Apr 9, 2019

csabella commented Apr 9, 2019

vstinner commented Apr 26, 2019

regrtest: log "CPU usage" on Windows #78241

regrtest: log "CPU usage" on Windows #78241

Comments

vstinner commented Jul 6, 2018

vstinner commented Jul 6, 2018

ammaraskar commented Jul 14, 2018

ammaraskar commented Jul 14, 2018

giampaolo commented Jul 14, 2018

jkloth commented Jul 14, 2018

ammaraskar commented Jul 15, 2018

ammaraskar commented Jul 15, 2018

giampaolo commented Jul 15, 2018

ammaraskar commented Jul 15, 2018

jkloth commented Jul 15, 2018

giampaolo commented Jul 15, 2018

jkloth commented Jul 15, 2018

ammaraskar commented Jul 15, 2018

jkloth commented Jul 16, 2018

ammaraskar commented Jul 16, 2018

vstinner commented Jul 16, 2018

vstinner commented Jul 16, 2018

ammaraskar commented Jul 16, 2018

vstinner commented Jul 18, 2018

zooba commented Jul 18, 2018

vstinner commented Jul 18, 2018

giampaolo commented Jul 18, 2018

ammaraskar commented Jul 18, 2018

zooba commented Jul 18, 2018

vstinner commented Jul 18, 2018

vstinner commented Jul 18, 2018

ammaraskar commented Jul 20, 2018

giampaolo commented Jul 24, 2018

csabella commented Apr 9, 2019

csabella commented Apr 9, 2019

vstinner commented Apr 26, 2019