Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

multiprocessing.cpu_count() should use hw.availcpu on Mac OS X #61646

Closed
jszakmeister mannequin opened this issue Mar 17, 2013 · 11 comments
Closed

multiprocessing.cpu_count() should use hw.availcpu on Mac OS X #61646

jszakmeister mannequin opened this issue Mar 17, 2013 · 11 comments
Labels
stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error

Comments

@jszakmeister
Copy link
Mannequin

jszakmeister mannequin commented Mar 17, 2013

BPO 17444
Nosy @ronaldoussoren, @pitrou, @vstinner, @jszakmeister, @ned-deily, @tpn, @1st1
Files
  • use-availcpu.patch
  • use-activecpu.patch: Updated patch to use hw.activecpu
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2014-10-04.21:22:41.933>
    created_at = <Date 2013-03-17.11:16:10.388>
    labels = ['type-bug', 'library']
    title = 'multiprocessing.cpu_count() should use hw.availcpu on Mac OS X'
    updated_at = <Date 2014-10-04.21:22:41.932>
    user = 'https://github.com/jszakmeister'

    bugs.python.org fields:

    activity = <Date 2014-10-04.21:22:41.932>
    actor = 'pitrou'
    assignee = 'none'
    closed = True
    closed_date = <Date 2014-10-04.21:22:41.933>
    closer = 'pitrou'
    components = ['Library (Lib)']
    creation = <Date 2013-03-17.11:16:10.388>
    creator = 'jszakmeister'
    dependencies = []
    files = ['29430', '29442']
    hgrepos = []
    issue_num = 17444
    keywords = ['patch']
    message_count = 11.0
    messages = ['184367', '184459', '184463', '184581', '184585', '184615', '184617', '184623', '184661', '209841', '209905']
    nosy_count = 8.0
    nosy_names = ['ronaldoussoren', 'pitrou', 'vstinner', 'jszakmeister', 'ned.deily', 'trent', 'sbt', 'yselivanov']
    pr_nums = []
    priority = 'normal'
    resolution = 'out of date'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue17444'
    versions = ['Python 2.7', 'Python 3.2', 'Python 3.3', 'Python 3.4']

    @jszakmeister
    Copy link
    Mannequin Author

    jszakmeister mannequin commented Mar 17, 2013

    While trying to test a fix for Nose, I discovered that multiprocessing is picking up the CPU count incorrectly. It should be using hw.availcpu instead of hw.ncpu. The latter is the number of cpus installed in the system, but the former is the number that are available for processing. The processor pane let's you adjust the available CPUs, which is handy for testing and troubleshooting.

    @jszakmeister jszakmeister mannequin added stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error labels Mar 17, 2013
    @ronaldoussoren
    Copy link
    Contributor

    I'm not sure if hw.availcpu is the right value to use as it is not documented at all (neither in a manpage, nor in a headerfile).

    hw.activecpu seems to be the one that should be used: it is documented as "The number of processors currently available for executing threads." in the sysctl.h header file and that comment also mentions that it should be used to determine the amount of threads to start in an SMP application.

    @jszakmeister
    Copy link
    Mannequin Author

    jszakmeister mannequin commented Mar 18, 2013

    Ronald: it is mentioned in some books (a Google search can turn them up), but they don't really offer much description behind the intent. When I looked into this several years ago, it was very unclear what hw.activecpu was intended for. It sounded more like a report about how many processors are active, versus targetting your SMP aware application to that number.

    But since you've turned some information in sysctl.h, I think we should follow that advice and use hw.activecpu. I've attached a new patch with the change.

    @vstinner
    Copy link
    Member

    Here is an interesting, but old (2007), email on darwin-dev:
    http://lists.apple.com/archives/darwin-dev/2007/Jun/msg00088.html

    "This can all change in the future, but currently:

    hw.ncpu is a wart; consider it to be deprecated.
    hw.physicalcpu is the number of physical CPUs
    hw.logicalcpu is the number of logical CPUs; this is for SMT, which we don't support (maybe T1s?)
    hw.availcpu are the number logical CPUs currently online

    These interfaces are evolving, however, you are unlikely to get a description of these. They are intended for internal ibrary use, and not for use by applications, since applications should use the library abstractions rather than trying to use this information directly themselves."

    By the way, multiprocessing should use subprocessing directly, not os.popen() (but this is a different issue).

    @tpn
    Copy link
    Member

    tpn commented Mar 19, 2013

    I remember looking at what multiprocessing did and not really liking it; I ended up writing a C version that works across a wider range of platforms, accessible via posixmodule.c:posix_cpu_count() (os.cpu_count()):

    http://hg.python.org/sandbox/trent/file/dd1c2fd3aa31/Modules/posixmodule.c#l10213

    @vstinner
    Copy link
    Member

    I like the idea of a new function in the os module because I don't
    like having to import the multiprocessing module just to known the
    number of CPUs. I'm using such function to set MAKEFLAGS envrionment
    variable on Linux: -j8 par example.

    2013/3/19 Trent Nelson <report@bugs.python.org>:

    Trent Nelson added the comment:

    I remember looking at what multiprocessing did and not really liking it; I ended up writing a C version that works across a wider range of platforms, accessible via posixmodule.c:posix_cpu_count() (os.cpu_count()):

    http://hg.python.org/sandbox/trent/file/dd1c2fd3aa31/Modules/posixmodule.c#l10213

    ----------
    nosy: +trent


    Python tracker <report@bugs.python.org>
    <http://bugs.python.org/issue17444\>


    @ronaldoussoren
    Copy link
    Contributor

    I also like the os.cpu_count() function, the information is useful sometimes outside of multiprocessing, and calling out to external scripts to gather the information (as multiprocessing currently does) feels yucky.

    That should probably be a new issue, the change in this issue fixes a real problem (the cpu count code in multiprocessing can overestimate the usable CPU count on OSX) and is a bugfix that should be backported to the stable branches.

    BTW. Trent's os.cpu_count implementation also uses hw.ncpu and is therefore also broken on OSX.

    @jszakmeister
    Copy link
    Mannequin Author

    jszakmeister mannequin commented Mar 19, 2013

    Actually, Trent's version looks at hw.logicalcpu and then falls back to hw.ncpu, if there was an error. Given the state of the documentation on these parameters, it's hard to say whether it's right or wrong, but at least hw.logicalcpu scales correctly if I disable some of the processors.

    @tpn
    Copy link
    Member

    tpn commented Mar 19, 2013

    On Tue, Mar 19, 2013 at 01:58:59AM -0700, John Szakmeister wrote:

    John Szakmeister added the comment:

    Actually, Trent's version looks at hw.logicalcpu and then falls back
    to hw.ncpu, if there was an error. Given the state of the
    documentation on these parameters, it's hard to say whether it's right
    or wrong, but at least hw.logicalcpu scales correctly if I disable
    some of the processors.

    That's pretty much the rationale I used.  I tested the fallback on
    OS X manually (i.e. the _bsd_cpu_count()), and that works, and the
    hw.logicalcpu definitely works in the first place, so, I figured it
    was good enough.
    
    I'll raise a new issue for os.cpu_count().
    

    @1st1
    Copy link
    Member

    1st1 commented Jan 31, 2014

    bump?

    @pitrou
    Copy link
    Member

    pitrou commented Feb 1, 2014

    The current os.cpu_count implementation calls sysconf(_SC_NPROCESSORS_ONLN), which is apparently defined under OS X, and returns the number of online CPUs (logical?):
    https://developer.apple.com/library/mac/documentation/Darwin/Reference/ManPages/man3/sysconf.3.html

    multiprocessing has been modified to re-use os.cpu_count(), so I suggest closing this issue as out-of-date.

    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error
    Projects
    None yet
    Development

    No branches or pull requests

    5 participants