classification
Title: multiprocessing.cpu_count() should use hw.availcpu on Mac OS X
Type: behavior Stage: resolved
Components: Library (Lib) Versions: Python 3.2, Python 3.3, Python 3.4, Python 2.7
process
Status: closed Resolution: out of date
Dependencies: Superseder:
Assigned To: Nosy List: jszakmeister, ned.deily, pitrou, ronaldoussoren, sbt, trent, vstinner, yselivanov
Priority: normal Keywords: patch

Created on 2013-03-17 11:16 by jszakmeister, last changed 2014-10-04 21:22 by pitrou. This issue is now closed.

Files
File name Uploaded Description Edit
use-availcpu.patch jszakmeister, 2013-03-17 11:16 review
use-activecpu.patch jszakmeister, 2013-03-18 16:01 Updated patch to use hw.activecpu review
Messages (11)
msg184367 - (view) Author: John Szakmeister (jszakmeister) * Date: 2013-03-17 11:16
While trying to test a fix for Nose, I discovered that multiprocessing is picking up the CPU count incorrectly.  It should be using hw.availcpu instead of hw.ncpu.  The latter is the number of CPUs installed in the system, but the former is the number that are available for processing.  The processor pane lets you adjust the available CPUs, which is handy for testing and troubleshooting.
msg184459 - (view) Author: Ronald Oussoren (ronaldoussoren) * (Python committer) Date: 2013-03-18 15:18
I'm not sure if hw.availcpu is the right value to use, as it is not documented at all (neither in a manpage nor in a header file).

hw.activecpu seems to be the one that should be used: it is documented as "The number of processors currently available for executing threads." in the sysctl.h header file, and that comment also mentions that it should be used to determine the number of threads to start in an SMP application.
msg184463 - (view) Author: John Szakmeister (jszakmeister) * Date: 2013-03-18 16:01
Ronald: it is mentioned in some books (a Google search can turn them up), but they don't really offer much description behind the intent.  When I looked into this several years ago, it was very unclear what `hw.activecpu` was intended for.  It sounded more like a report about how many processors are active, versus targeting your SMP-aware application to that number.

But since you've turned up some information in sysctl.h, I think we should follow that advice and use hw.activecpu.  I've attached a new patch with the change.
msg184581 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2013-03-19 02:36
Here is an interesting, but old (2007), email on darwin-dev:
http://lists.apple.com/archives/darwin-dev/2007/Jun/msg00088.html

"This can all change in the future, but currently:

hw.ncpu is a wart; consider it to be deprecated.
hw.physicalcpu is the number of physical CPUs
hw.logicalcpu is the number of logical CPUs; this is for SMT, which we don't support (maybe T1s?)
hw.availcpu are the number logical CPUs currently online

These interfaces are evolving, however, you are unlikely to get a description of these. They are intended for internal library use, and not for use by applications, since applications should use the library abstractions rather than trying to use this information directly themselves."

By the way, multiprocessing should use subprocess directly, not os.popen() (but this is a different issue).
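
A minimal sketch of how the sysctl keys quoted above could be read side by side, using subprocess rather than os.popen(). The helper name read_sysctl_int is hypothetical and is not part of multiprocessing or of either attached patch. It assumes the sysctl command-line tool is on PATH and returns None for any key the platform does not provide (for example, on non-Darwin systems).

```python
import subprocess

# Keys discussed in this issue; availability differs between OS X releases.
SYSCTL_KEYS = ("hw.ncpu", "hw.physicalcpu", "hw.logicalcpu",
               "hw.availcpu", "hw.activecpu")


def read_sysctl_int(name):
    """Return the integer value of a sysctl key, or None if unavailable.

    Hypothetical helper, for illustration only.
    """
    try:
        out = subprocess.check_output(["sysctl", "-n", name],
                                      stderr=subprocess.DEVNULL)
    except (OSError, subprocess.CalledProcessError):
        # sysctl is missing, or the key is unknown on this platform.
        return None
    try:
        return int(out.decode().strip())
    except ValueError:
        # The key exists but does not hold a plain integer.
        return None


if __name__ == "__main__":
    for key in SYSCTL_KEYS:
        print(key, read_sysctl_int(key))
```

Running this before and after disabling processors in the processor pane would show which keys track the change on a given OS X release.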
msg184585 - (view) Author: Trent Nelson (trent) * (Python committer) Date: 2013-03-19 03:50
I remember looking at what multiprocessing did and not really liking it; I ended up writing a C version that works across a wider range of platforms, accessible via posixmodule.c:posix_cpu_count() (os.cpu_count()):

http://hg.python.org/sandbox/trent/file/dd1c2fd3aa31/Modules/posixmodule.c#l10213
msg184615 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2013-03-19 07:54
I like the idea of a new function in the os module because I don't
like having to import the multiprocessing module just to know the
number of CPUs. I'm using such a function to set the MAKEFLAGS environment
variable on Linux: -j8, for example.
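
As a rough illustration of that use case, the snippet below builds a MAKEFLAGS value from os.cpu_count() (available since Python 3.4). The helper name make_jobs_flag is hypothetical, and the fallback to 1 when the count is unknown is an assumption, not something specified in this issue.

```python
import os


def make_jobs_flag(default=1):
    """Return a '-jN' flag sized to the reported CPU count.

    os.cpu_count() may return None when the count cannot be determined,
    in which case the (assumed) default is used instead.
    """
    count = os.cpu_count() or default
    return "-j%d" % count


if __name__ == "__main__":
    # Copy the environment rather than mutating os.environ in place.
    env = dict(os.environ, MAKEFLAGS=make_jobs_flag())
    print(env["MAKEFLAGS"])
```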

msg184617 - (view) Author: Ronald Oussoren (ronaldoussoren) * (Python committer) Date: 2013-03-19 08:07
I also like the os.cpu_count() function: the information is sometimes useful outside of multiprocessing, and calling out to external scripts to gather the information (as multiprocessing currently does) feels yucky.

That should probably be a new issue. The change in this issue fixes a real problem (the CPU count code in multiprocessing can overestimate the usable CPU count on OS X) and is a bugfix that should be backported to the stable branches.

BTW, Trent's os.cpu_count implementation also uses hw.ncpu and is therefore also broken on OS X.
msg184623 - (view) Author: John Szakmeister (jszakmeister) * Date: 2013-03-19 08:58
Actually, Trent's version looks at hw.logicalcpu and then falls back to hw.ncpu, if there was an error.  Given the state of the documentation on these parameters, it's hard to say whether it's right or wrong, but at least hw.logicalcpu scales correctly if I disable some of the processors.
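
The lookup order described here (hw.logicalcpu first, hw.ncpu only on failure) can be sketched in Python as follows. This is not Trent's C code: the function names and the injected `lookup` callable are hypothetical, chosen so that the fallback logic can be exercised without a Mac.

```python
import subprocess


def sysctl_lookup(name):
    """Query one sysctl key via the command-line tool; None on failure."""
    try:
        out = subprocess.check_output(["sysctl", "-n", name],
                                      stderr=subprocess.DEVNULL)
        return int(out.decode().strip())
    except (OSError, subprocess.CalledProcessError, ValueError):
        return None


def cpu_count_with_fallback(lookup=sysctl_lookup,
                            keys=("hw.logicalcpu", "hw.ncpu")):
    """Return the first positive value among `keys`, or None."""
    for key in keys:
        value = lookup(key)
        if value is not None and value > 0:
            return value
    return None


if __name__ == "__main__":
    # Simulate a system where hw.logicalcpu is unavailable.
    fake = {"hw.ncpu": 4}
    print(cpu_count_with_fallback(lookup=fake.get))  # prints 4
```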
msg184661 - (view) Author: Trent Nelson (trent) * (Python committer) Date: 2013-03-19 18:50
    That's pretty much the rationale I used.  I tested the fallback on
    OS X manually (i.e. the _bsd_cpu_count()), and that works, and the
    hw.logicalcpu definitely works in the first place, so, I figured it
    was good enough.

    I'll raise a new issue for os.cpu_count().
msg209841 - (view) Author: Yury Selivanov (yselivanov) * (Python committer) Date: 2014-01-31 22:27
bump?
msg209905 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2014-02-01 18:21
The current os.cpu_count implementation calls sysconf(_SC_NPROCESSORS_ONLN), which is apparently defined under OS X, and returns the number of online CPUs (logical?):
https://developer.apple.com/library/mac/documentation/Darwin/Reference/ManPages/man3/sysconf.3.html

multiprocessing has been modified to re-use os.cpu_count(), so I suggest closing this issue as out-of-date.
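
A quick way to look at the two values on a given machine is sketched below. It assumes a POSIX platform where os.sysconf exposes SC_NPROCESSORS_ONLN; the helper name online_cpus is hypothetical and returns None where the name is not supported.

```python
import os


def online_cpus():
    """Return sysconf(SC_NPROCESSORS_ONLN), or None if unsupported."""
    if not hasattr(os, "sysconf"):
        return None  # e.g. Windows
    try:
        value = os.sysconf("SC_NPROCESSORS_ONLN")
    except (ValueError, OSError):
        # Name not known to this platform, or the query failed.
        return None
    return value if value > 0 else None


if __name__ == "__main__":
    print("sysconf online CPUs:", online_cpus())
    print("os.cpu_count():     ", os.cpu_count())
```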
History
Date User Action Args
2014-10-04 21:22:41 pitrou set status: open -> closed; resolution: out of date; stage: patch review -> resolved
2014-02-01 18:21:25 pitrou set nosy: + pitrou; messages: + msg209905
2014-01-31 22:27:27 yselivanov set nosy: + yselivanov; messages: + msg209841
2013-03-19 18:50:02 trent set messages: + msg184661
2013-03-19 08:58:59 jszakmeister set messages: + msg184623
2013-03-19 08:07:46 ronaldoussoren set messages: + msg184617
2013-03-19 07:54:54 vstinner set messages: + msg184615
2013-03-19 03:50:09 trent set nosy: + trent; messages: + msg184585
2013-03-19 02:36:42 vstinner set nosy: + vstinner; messages: + msg184581
2013-03-18 16:01:49 jszakmeister set files: + use-activecpu.patch; messages: + msg184463
2013-03-18 15:18:35 ronaldoussoren set messages: + msg184459
2013-03-17 18:29:29 ned.deily set nosy: + ronaldoussoren, ned.deily
2013-03-17 13:56:50 pitrou set stage: patch review; versions: + Python 2.7, Python 3.2, Python 3.3, Python 3.4
2013-03-17 12:40:55 sbt set nosy: + sbt
2013-03-17 11:16:10 jszakmeister create