Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

regrtest: log "CPU usage" on Windows #78241

Closed
vstinner opened this issue Jul 6, 2018 · 31 comments
Closed

regrtest: log "CPU usage" on Windows #78241

vstinner opened this issue Jul 6, 2018 · 31 comments
Labels
3.8 only security fixes OS-windows tests Tests in the Lib/test dir

Comments

@vstinner
Copy link
Member

vstinner commented Jul 6, 2018

BPO 34060
Nosy @pfmoore, @vstinner, @giampaolo, @tjguk, @jkloth, @zware, @zooba, @ammaraskar, @csabella
PRs
  • bpo-34060: Report system load when running test suite for Windows #8287
  • bpo-34060: Report system load when running test suite for Windows #8357
  • Files
  • benchmark.PNG
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2019-04-11.11:00:27.525>
    created_at = <Date 2018-07-06.16:01:09.489>
    labels = ['3.8', 'tests', 'OS-windows']
    title = 'regrtest: log "CPU usage" on Windows'
    updated_at = <Date 2019-04-26.10:16:34.571>
    user = 'https://github.com/vstinner'

    bugs.python.org fields:

    activity = <Date 2019-04-26.10:16:34.571>
    actor = 'vstinner'
    assignee = 'none'
    closed = True
    closed_date = <Date 2019-04-11.11:00:27.525>
    closer = 'vstinner'
    components = ['Tests', 'Windows']
    creation = <Date 2018-07-06.16:01:09.489>
    creator = 'vstinner'
    dependencies = []
    files = ['47694']
    hgrepos = []
    issue_num = 34060
    keywords = ['patch']
    message_count = 31.0
    messages = ['321177', '321646', '321647', '321648', '321660', '321672', '321673', '321687', '321688', '321690', '321691', '321692', '321694', '321699', '321700', '321715', '321724', '321726', '321869', '321886', '321902', '321903', '321904', '321906', '321911', '321912', '322033', '322322', '339737', '339739', '340900']
    nosy_count = 9.0
    nosy_names = ['paul.moore', 'vstinner', 'giampaolo.rodola', 'tim.golden', 'jkloth', 'zach.ware', 'steve.dower', 'ammar2', 'cheryl.sabella']
    pr_nums = ['8287', '8357']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = None
    url = 'https://bugs.python.org/issue34060'
    versions = ['Python 3.8']

    @vstinner
    Copy link
    Member Author

    vstinner commented Jul 6, 2018

    I would help to debug race conditions on Windows to log the "CPU usage" on regrtest, as we do on other platforms (using os.getloadavg()).

    Links:

    @vstinner vstinner added 3.8 only security fixes tests Tests in the Lib/test dir OS-windows labels Jul 6, 2018
    @ammaraskar
    Copy link
    Member

    Annoyingly, it looks like Windows does not provide an API that gives an average value. There is a counter exposed called "System \ Processor Queue Length" which does what the equivalent of unix's load

    https://blogs.technet.microsoft.com/askperf/2008/01/15/an-overview-of-processor-bottlenecks/

    But we're gonna have to average it ourselves if we want this information.

    @ammaraskar
    Copy link
    Member

    Taking a shot at this, should take a day or so.

    @giampaolo
    Copy link
    Contributor

    There is an old ticket for this in psutil with some (possible useful) references in it:
    giampaolo/psutil#604

    @jkloth
    Copy link
    Contributor

    jkloth commented Jul 14, 2018

    Also prior conversation:

    https://bugs.python.org/issue30263#msg296311

    @ammaraskar
    Copy link
    Member

    Thanks a lot for that link Jeremy, it was really helpful. After reading up on it, my take is that winapi is the most appropriate place for this, it is a non public api that's used in the stdlib.

    I've used Windows APIs in a way that we don't need to manually start up a thread and call a calc_load function, instead using a callback invoked by windows. Internally this uses a thread pool, but it means we don't have to worry about managing the thread ourselves.

    The load is stored as a global but the winapi module is already marked as "-1" indicating it has global state, so that shouldn't be a problem. https://docs.python.org/3/c-api/module.html#c.PyModuleDef.m_size

    Like Jeremy noted, using WMI does add a 5mb overhead or so to the calling process. One more caveat is that the PdhAddEnglishCounterW function is only available in Vista+. I'm not sure if we still support Windows XP, but the alternative is to use PdhAddCounter, which breaks if the system language is not english because the counter paths are localized.

    @ammaraskar
    Copy link
    Member

    Opened up a PR with a proof of concept to get feedback on if this approach is reasonable.

    Giampaolo, on your psutil issue you specifically said, "(possibly without using WMI)"

    Is there any particular problem with using WMI?

    @giampaolo
    Copy link
    Contributor

    Giampaolo, on your psutil issue you specifically said, "(possibly without using WMI)" Is there any particular problem with using WMI?

    Performance. In general WMI is (a lot) slower than the Windows API counterpart (psutil never uses WMI except in unit tests). Don't know if this matters for this specific issue though, or whether a correspondent Windows API for doing the same thing exists.

    @ammaraskar
    Copy link
    Member

    Aah, yeah I don't think there's a good way of doing it purely from the windows API. There might be a way to enumerate through all the processes and see if they're queued up but I didn't look into it.

    In this case it should be fine, we just pay a bit of WMI cost to initialize the query, all the updating and retrieval is done asynchronously to the python code.

    @jkloth
    Copy link
    Contributor

    jkloth commented Jul 15, 2018

    Not that it matters all that much, but from a terminology standpoint, WMI != PDH != Performance Counters.

    Performance counters (the objects, not the topic) are provided by DLLs registered in the HKLM\SYSTEM\CurrentControlSet\Services key. Their data is accessed via registry API functions using the HKEY_PERFORMANCE_DATA root key.

    PDH (Performance Data Helper) provides an abstraction layer that can access those values among other things like a GUI or writing to log files.

    WMI (Windows Management Instrumentation) is yet another layer on top of PDH and raw Performance Counters.

    In this case of the "System" performance counter object, it is provided by a performance DLL (perfos.dll in the case of Win10 1803). If overhead (memory and/or CPU) is a concern, then accessing the counter data via the registry is the way to go.

    @giampaolo
    Copy link
    Contributor

    I suppose there is no way to emulate os.getloadavg() on Windows because that would necessarily imply using a thread to call the necessary routine (WMI, PDH, whatever...) every X secs or something, correct?

    @jkloth
    Copy link
    Contributor

    jkloth commented Jul 15, 2018

    Correct. Windows provides the building blocks for implementing getloadavg(), but does not provide an interface that does the averaging. That is deferred to a per application basis. The best that an application can do for that is to use thread pools. You can think of thread pools as kernel-managed threads (different from user-managed threads via CreateThread()).

    As of Win10 1703, any process linked with DLLs automatically have thread pools created for them (to parallel-ize the loading of said DLLs). Leveraging that feature would minimize the costs incurred to do the running average.

    @ammaraskar
    Copy link
    Member

    Is the function I used for the callback, RegisterWaitForSingleObject https://docs.microsoft.com/en-us/windows/desktop/api/winbase/nf-winbase-registerwaitforsingleobject
    using the thread pooling system you're talking about underneath? The documentation seems to suggest so, but I'll wait for your opinion on it.

    @jkloth
    Copy link
    Contributor

    jkloth commented Jul 16, 2018

    The RegisterWaitForSingleObject() function does use the thread pool API:

    https://docs.microsoft.com/en-us/windows/desktop/ProcThread/thread-pool-api

    However, PdhCollectQueryDataEx() also creates a user-space thread to handle its work of setting the event whenever the timeout elapses.

    I would think that it should be preferable to use a single thread pool worker to retrieve the queue length and calculate the average.

    @ammaraskar
    Copy link
    Member

    Roger, personally I don't think its worth it to complicate the code in order to use the thread pool API. Especially considering this is just a private API and the only consumer will be the test suite runner. Let's see what the core devs think when this gets reviewed.

    @vstinner
    Copy link
    Member Author

    Logging the current value can be an acceptable compromise.

    When we run tests in subprocesses (python3 -m test -jN), we can run a test every N milliseconds with no thread. It's ok if the average is not accurate.

    Most buildbots use -jN for performance but also to isolate the main processes from crashes and hard timeouts.

    @vstinner
    Copy link
    Member Author

    https://docs.microsoft.com/en-us/sql/relational-databases/performance-monitor/monitor-cpu-usage?view=sql-server-2017

    Processor Queue Length:

    Corresponds to the number of threads waiting for processor time. A processor bottleneck develops when threads of a process require more processor cycles than are available. If more than a few processes attempt to utilize the processor's time, you might need to install a faster processor. Or, if you have a multiprocessor system, you could add a processor.

    When you examine processor usage, consider the type of work that the instance of SQL Server performs. If SQL Server performs many calculations, such as queries involving aggregates or memory-bound queries that require no disk I/O, 100 percent of the processor's time can be used. If this causes the performance of other applications to suffer, try changing the workload. For example, dedicate the computer to running the instance of SQL Server.

    Usage rates around 100 percent, where many client requests are being processed, may indicate that processes are queuing up, waiting for processor time, and causing a bottleneck. Resolve the problem by adding faster processors.

    --

    Is it exactly the same thing on Unix (load average)? If not, I would prefer to use a different name in regrtest and "loadavg". Maybe "PQL avg"?

    What is the impact of the number of CPUs on this value?

    @ammaraskar
    Copy link
    Member

    I don't think taking instantaneous values instead of averaging will work out too well. For reference I've attached a screenshot. It has sampled values at every second on an unloaded computer and then with running prime95 for cpu stress testing. The load tends to peak and fall.

    Is it exactly the same thing on Unix (load average)?

    Indeed it is: https://en.wikipedia.org/wiki/Load_(computing)#Unix-style_load_calculation

    "An idle computer has a load number of 0 (the idle process isn't counted). Each process using or waiting for CPU (the ready queue or run queue) increments the load number by 1."

    From what I can tell, the number of processors are dealt with the same way as on Linux, that is, a single core processor is overloaded when the load is >1 and a quad core processor is overloaded when the load is >4

    @vstinner
    Copy link
    Member Author

    I found a command to get the CPU usage in percent *per* CPU. Here with 2 CPUs:

    vstinner@WIN C:\vstinner\python\master>wmic cpu get loadpercentage
    LoadPercentage
    100
    100

    @zooba
    Copy link
    Member

    zooba commented Jul 18, 2018

    wmic cpu ...

    This is the WMI solution we are trying to avoid.

    But then again, if it's solely for our tests, perhaps the best way to approach this is to start a Python thread that periodically runs this command?

    I also haven't seen it suggested, but perhaps GetProcessTimes (https://docs.microsoft.com/en-us/windows/desktop/api/processthreadsapi/nf-processthreadsapi-getprocesstimes) (or GetThreadTimes) would provide enough information to detect the same information?

    @vstinner
    Copy link
    Member Author

    I also haven't seen it suggested, but perhaps GetProcessTimes (https://docs.microsoft.com/en-us/windows/desktop/api/processthreadsapi/nf-processthreadsapi-getprocesstimes) (or GetThreadTimes) would provide enough information to detect the same information?

    My intent is to get an idea if the whole system is busy. Not if the current Python process is busy. Most buildbots run tests with multiple worker processes (at least 2).

    @giampaolo
    Copy link
    Contributor

    psutil exposes this functionality as "psutil.cpu_percent()":

    https://github.com/giampaolo/psutil/blob/ac9dccab6b038835b5d612f92cf4804ec2662c2e/psutil/_psutil_windows.c#L992

    https://github.com/giampaolo/psutil/blob/ac9dccab6b038835b5d612f92cf4804ec2662c2e/psutil/__init__.py#L1675

    I'm not sure if it's worth it to copy all that stuff into Modules/_winapi.c and test/libregrtest/main.py though. It would probably be simpler to change the policy and allow (at least some) some third party libs in cPython's test suite. =)

    @ammaraskar
    Copy link
    Member

    But then again, if it's solely for our tests, perhaps the best way to approach this is to start a Python thread that periodically runs this command?

    This sounds like a very good solution to me, it avoids adding the complexity of the C code. We actually have two options here, to keep the results consistent with the unix load, we can use typeperf "\System\Processor Queue Length"

    To get cpu usage, we can use the command Victor posted. I'll make an alternative PR with that today just so we can contrast the two approaches.

    @zooba
    Copy link
    Member

    zooba commented Jul 18, 2018

    It would probably be simpler to change the policy and allow (at least some) some third party libs in cPython's test suite. =)

    I'm actually totally okay with this, as I'd really like to have JUnit XML output from the test suite, which is easiest to do with the existing third-party libraries.

    Can we formalize a way by which optional third-party libraries are allowed? Provided they aren't critical for the overall pass/fail state of the test suite (or the more strict alternative: pass/fail state of *each* test), I don't see any particular harm in certain site packages being used.

    (This is probably a discussion for python-dev, assuming the policy is written down somewhere.)

    @vstinner
    Copy link
    Member Author

    Processor Queue Length seems simpler and easier to read. I don't want to log 24 numbers per regrtest output line if a machine has 24 CPUs... The load average is a "raw" value to give the idea if the system is "loaded" or not. More precise metrics can be used later to debug a test failure, but manually.

    @vstinner
    Copy link
    Member Author

    PR 8287 seems short to me and it seems like psutils doesn't expose Processor Queue Length, so I'm not sure why we are talking about depending on psutils?

    @ammaraskar
    Copy link
    Member

    But then again, if it's solely for our tests, perhaps the best way to approach this is to start a Python thread that periodically runs this command?

    I opened up #8357 with this strategy, in my opinion its a lot nicer and doesn't clutter up _winapi with half baked, extremely specialized functions. Its a bit more involved than running a thread, details are on the PR.

    @giampaolo
    Copy link
    Contributor

    PR 8287 seems short to me and it seems like psutils doesn't expose Processor Queue Length, so I'm not sure why we are talking about depending on psutils?

    I'm not sure if you're strictly interested in getting system load or if CPU utilization is also fine. FWIW with psutil you would be able to get the system-wide CPU utilization occurred in a given period of time:

    >>> import psutil, time
    >>> psutil.cpu_percent(interval=None)  # non-blocking
    0.0
    >>> time.sleep(60)
    >>> psutil.cpu_percent(interval=None)  # average of the last 60 secs
    23.6
    >>>

    ...and you can do the same for the current process too (psutil.Process().cpu_percent()).

    @csabella
    Copy link
    Contributor

    csabella commented Apr 9, 2019

    New changeset e16467a by Cheryl Sabella (Ammar Askar) in branch 'master':
    bpo-34060: Report system load when running test suite for Windows (GH-8357)
    e16467a

    @csabella
    Copy link
    Contributor

    csabella commented Apr 9, 2019

    I've merged PR8357. I believe PR8287 can be closed now with PR8357 as the superseder?

    Thank you to everyone for your contributions to this discussion!

    @vstinner
    Copy link
    Member Author

    New changeset 1069d38 by Victor Stinner in branch '3.7':
    [3.7] bpo-36719: sync regrtest with master branch (GH-12967)
    1069d38

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.8 only security fixes OS-windows tests Tests in the Lib/test dir
    Projects
    None yet
    Development

    No branches or pull requests

    6 participants