Title: multiprocessing.dummy.Pool does not accept maxtasksperchild argument
Type: behavior Stage:
Components: Library (Lib) Versions: Python 3.4, Python 3.5, Python 2.7
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: Noah.Yetter, jnoller, josh.r, sbt
Priority: normal Keywords:

Created on 2013-02-04 18:28 by Noah.Yetter, last changed 2019-04-26 20:01 by BreamoreBoy.

Messages (4)
msg181365 - Author: Noah Yetter (Noah.Yetter) Date: 2013-02-04 18:28
Python 2.7.3 (default, Apr 10 2012, 23:31:26) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.

The docs claim that "multiprocessing.dummy replicates the API of multiprocessing but is no more than a wrapper around the threading module." However, dummy's Pool function does not replicate the API of multiprocessing's Pool function:

>>> import inspect
>>> import multiprocessing
>>> inspect.getargspec(multiprocessing.Pool)
ArgSpec(args=['processes', 'initializer', 'initargs', 'maxtasksperchild'], varargs=None, keywords=None, defaults=(None, None, (), None))
>>> import multiprocessing.dummy
>>> inspect.getargspec(multiprocessing.dummy.Pool)
ArgSpec(args=['processes', 'initializer', 'initargs'], varargs=None, keywords=None, defaults=(None, None, ()))

Thus when attempting to downshift from multiprocessing to threading like so...

import multiprocessing.dummy as multiprocessing

...code that supplies the maxtasksperchild argument to Pool() fails with a TypeError instead of running.
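A minimal sketch of one workaround, assuming Python 3.3+ (for `inspect.signature` and pools usable in a `with` statement) and that you control the call site. `make_pool` is a hypothetical helper, not part of multiprocessing: it forwards `maxtasksperchild` only when the chosen Pool factory's signature accepts it, so with the thread-based pool the argument is simply dropped.

```python
import inspect
import multiprocessing
import multiprocessing.dummy


def make_pool(use_threads=False, processes=None, initializer=None,
              initargs=(), maxtasksperchild=None):
    """Hypothetical helper: return a process pool or a thread pool.

    maxtasksperchild is forwarded only if the chosen Pool factory accepts it,
    so switching to multiprocessing.dummy does not raise TypeError.
    """
    factory = multiprocessing.dummy.Pool if use_threads else multiprocessing.Pool
    kwargs = {"processes": processes,
              "initializer": initializer,
              "initargs": initargs}
    # Inspect the factory instead of hard-coding which one lacks the argument.
    if "maxtasksperchild" in inspect.signature(factory).parameters:
        kwargs["maxtasksperchild"] = maxtasksperchild
    return factory(**kwargs)


if __name__ == "__main__":
    # Thread-based pool: maxtasksperchild is accepted here but not forwarded.
    with make_pool(use_threads=True, processes=4, maxtasksperchild=10) as pool:
        print(pool.map(abs, [-1, -2, 3]))  # [1, 2, 3]
```

Checking the signature rather than branching on `use_threads` means the helper would keep working unchanged if a later release added the parameter to the thread pool. The `__main__` guard matters for the process-pool branch on platforms that use the spawn start method.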
msg223364 - Author: Mark Lawrence (BreamoreBoy) * Date: 2014-07-17 19:48
I've confirmed that the behaviour is identical in 3.4.1 on Windows.
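`inspect.getargspec`, used in the original report, was deprecated throughout Python 3 and removed in 3.11. A minimal sketch of the same comparison using `inspect.signature` (Python 3.3+); `param_names` is a hypothetical helper name. On versions affected by this issue, the last line printed lists `maxtasksperchild`.

```python
import inspect
import multiprocessing
import multiprocessing.dummy


def param_names(func):
    """Return the parameter names of a callable, in declaration order."""
    return list(inspect.signature(func).parameters)


process_params = param_names(multiprocessing.Pool)
thread_params = param_names(multiprocessing.dummy.Pool)

print("multiprocessing.Pool:      ", process_params)
print("multiprocessing.dummy.Pool:", thread_params)
# Parameters the process pool accepts but the thread pool does not.
print("missing from dummy:", [p for p in process_params if p not in thread_params])
```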
msg223394 - Author: Josh Rosenberg (josh.r) * (Python triager) Date: 2014-07-18 00:18
Note: To my knowledge there is little or no benefit to using maxtasksperchild when the pool is implemented with threads. Periodically replacing worker processes guarantees that their memory, handles, etc., are returned to the OS when each process exits. But memory and handles allocated in a thread are not freed when the thread exits (with the exception of explicitly thread-local stuff, which isn't common); all threads use the same heap, and memory allocated by thread 1 is indistinguishable from memory allocated by thread 2.

It's not a bad idea to keep the interface consistent, but I'm not sure it's a good idea to offer and implement a behavior that doesn't actually accomplish anything. Does anyone else have any thoughts?
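The premise of this message, that the workers of a `multiprocessing.dummy` pool are threads in a single process sharing one heap, can be checked directly. A minimal sketch: every task reports the caller's PID, and a list mutated by the workers is the very object the main thread reads afterwards. The names `shared`, `shared_lock`, and `record` are illustrative only, and the sketch does not itself measure whether memory is returned to the OS.

```python
import os
import threading
from multiprocessing.dummy import Pool

shared = []                     # one list object, visible to every worker thread
shared_lock = threading.Lock()


def record(i):
    """Append to the shared list and report which process/thread ran the task."""
    with shared_lock:
        shared.append(i)
    return os.getpid(), threading.get_ident()


with Pool(4) as pool:
    results = pool.map(record, range(20))

pids = {pid for pid, _ in results}
threads = {tid for _, tid in results}

print("all tasks in this process:", pids == {os.getpid()})  # True
print("worker threads used:", len(threads))                  # at most 4
print("main thread sees every item:", sorted(shared) == list(range(20)))  # True
```

With `multiprocessing.Pool` instead, each worker would be a separate process with its own address space, which is why retiring it actually releases its resources.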
msg223395 - Author: Josh Rosenberg (josh.r) * (Python triager) Date: 2014-07-18 00:22
Actually, now that I think about it, most thread-local stuff wouldn't be freed automatically either, since it's still allocated from a common pool of memory, and interleaved allocations would still prevent memory blocks from being returned to the OS. As far as behavior goes, assuming you aren't explicitly checking thread IDs (which should be meaningless to worker threads anyway, since tasks are assigned arbitrarily), clearing out the thread-local storage when a thread exits would be equivalent to ending one worker thread and starting a new one.
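A minimal sketch of the equivalence described in this message, assuming per-task state lives in a `threading.local()`: each worker discards its thread-local state after a fixed number of tasks, so later tasks on that thread see fresh state, as if a new worker had been started. `MAX_TASKS_PER_THREAD`, `_fresh_state`, and `run_task` are hypothetical names. As the message notes, this only reproduces the behavioral effect of `maxtasksperchild`; it does not return memory to the OS.

```python
import threading
from multiprocessing.dummy import Pool

MAX_TASKS_PER_THREAD = 3      # hypothetical stand-in for maxtasksperchild
_local = threading.local()    # each worker thread sees its own attributes


def _fresh_state():
    """Per-thread state that a newly started worker would begin with."""
    return {"tasks_done": 0, "cache": {}}


def run_task(x):
    state = getattr(_local, "state", None)
    if state is None or state["tasks_done"] >= MAX_TASKS_PER_THREAD:
        # Drop the old thread-local state: from the task's point of view this
        # is the same as retiring the worker and starting a new one.
        state = _local.state = _fresh_state()
    state["tasks_done"] += 1
    state["cache"][x] = x * x
    # Inputs are distinct, so the cache never exceeds MAX_TASKS_PER_THREAD entries.
    return len(state["cache"])


with Pool(2) as pool:
    sizes = pool.map(run_task, range(10))

print(sizes)
```

The exact list printed depends on how tasks are distributed between the two threads, but every value stays between 1 and `MAX_TASKS_PER_THREAD`.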
Date                 User         Action  Args
2019-04-26 20:01:33  BreamoreBoy  set     nosy: - BreamoreBoy
2014-07-18 00:22:25  josh.r       set     messages: + msg223395
2014-07-18 00:19:00  josh.r       set     nosy: + josh.r
                                          messages: + msg223394
2014-07-17 19:48:26  BreamoreBoy  set     nosy: + BreamoreBoy, jnoller, sbt
                                          messages: + msg223364
                                          versions: + Python 3.4, Python 3.5, - Python 3.3
2013-02-04 18:28:33  Noah.Yetter  create