classification
Title: Support different contexts in multiprocessing
Type: enhancement Stage: resolved
Components: Library (Lib) Versions: Python 3.4
process
Status: open Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: Olivier.Grisel, jnoller, larsmans, python-dev, sbt
Priority: normal Keywords: patch

Created on 2013-09-10 14:53 by larsmans, last changed 2014-03-10 22:11 by python-dev.

Files
File name Uploaded Description Edit
mp_getset_start_method.patch larsmans, 2013-09-10 15:39 review
context.patch sbt, 2013-10-10 14:35 review
Messages (25)
msg197444 - (view) Author: Lars Buitinck (larsmans) Date: 2013-09-10 14:53
The new multiprocessing based on forkserver (issue8713) looks great, but it has two problems. The first:

"set_start_method() should not be used more than once in the program."

The documentation does not explain what the effect of calling it twice would be. Judging from the documentation, it should be possible to do

start_method = get_start_method()
if start_method is None:
    set_start_method('forkserver')

but this code shows the second problem: it always succeeds with the (undocumented!) side-effect of setting the start method in get_start_method, to the system default, if it hasn't been set already.

I was just going to forge a patch for joblib (http://pythonhosted.org/joblib/) to set the start method to forkserver at import time. But in the current state of affairs, it would be impossible for the user to safely override the start method before importing joblib, because joblib can't figure out if it's been set already without setting it.

The enclosed patch solves the problem by making the new functions more robust:

* get_start_method no longer sets anything, but returns None if the start method has not been set already;
* set_start_method raises a RuntimeError (for want of a better exception type) when resetting the start method is attempted.

Unfortunately, I had to hack up the tests a bit, because they were violating the set_start_method contract. There is a test for the new set_start_method behavior, though, and all {fork,forkserver,spawn} tests pass on Linux.
msg197447 - (view) Author: Olivier Grisel (Olivier.Grisel) * Date: 2013-09-10 15:13
Related question: is there any good reason that would prevent to pass a custom `start_method` kwarg to the `Pool` constructor to make it use an alternative `Popen` instance (that is an instance different from the  `multiprocessing._Popen` singleton)?

This would allow libraries such as joblib to keep minimal side effect by try to impact the default multiprocessing runtime as low as possible.
msg197450 - (view) Author: Lars Buitinck (larsmans) Date: 2013-09-10 15:39
Cleaned up the patch.
msg197453 - (view) Author: Richard Oudkerk (sbt) * (Python committer) Date: 2013-09-10 16:04
With your patch, I think if you call get_start_method() without later calling set_start_method() then the helper process(es) will never be started.

With the current code, popen.Popen() automatically starts the helper processes if they have not already been started.

Also, set_start_method() can have the side-effect of starting helper process(es).  I do not really approve of new processes being started as a side-effect of importing a library.  But it is reasonable for a library to want a specific start method unless the user demands otherwise.

I will have to think this over.


BTW, the reason for discouraging using set_start_method() more than once is because some shared resources are created differently depending on what the current start method is.

For instance using the fork method semaphores are created and then immediately unlinked.  But with the other start methods we must not unlink the semaphore until we are finished with it (while being paranoid about cleanup).

Maybe it would be better to have separate contexts for each start method.  That way joblib could use the forkserver context without interfering with the rest of the user's program.

    from multiprocessing import forkserver_context as mp
    l = mp.Lock()
    p = mp.Process(...)
    with mp.Pool() as pool:
        ...
msg197467 - (view) Author: Lars Buitinck (larsmans) Date: 2013-09-10 21:15
In my patched version, the private popen.get_start_method gets a kwarg set_if_needed=True. popen.Popen calls that as before, so its behavior should not change, while the public get_start_method sets the kwarg to False.

I realise now that this has the side effect that get_start_method's output changes when multiprocessing has first been used, but then that reflects how the library works. Maybe this should be documented.

As for the contexts, those would be great.
msg197468 - (view) Author: Richard Oudkerk (sbt) * (Python committer) Date: 2013-09-10 21:48
> In my patched version, the private popen.get_start_method gets a kwarg 
> set_if_needed=True. popen.Popen calls that as before, so its behavior 
> should not change, while the public get_start_method sets the kwarg to 
> False.

My mistake.
msg197486 - (view) Author: Olivier Grisel (Olivier.Grisel) * Date: 2013-09-11 11:02
> Maybe it would be better to have separate contexts for each start method.  That way joblib could use the forkserver context without interfering with the rest of the user's program.

Yes in general it would be great if libraries could customize the multiprocessing default options without impacting any of the module singletons. That also include the ForkingPickler registry for custom: now it's a class attribute. It would be great to be able to scope custom reducers registration to a given multiprocessing.Pool or multiprocessing.Process instance.

Now how to implement that kind of isolation: it could either be done by adding new constructor parameters or new public methods to the Process and Pool classes to be able to customize their behavior while sticking to the OOP paradigm if possible or by using a context manager as you suggested.

I am not sure which option is best. Prototyping both is probably the best way to feel the tradeoffs.
msg197518 - (view) Author: Lars Buitinck (larsmans) Date: 2013-09-12 10:51
I don't really see the benefit of a context manager over an argument. It's a power user feature anyway, and context managers (at least to me) signal cleanup actions, rather than construction options.
msg197523 - (view) Author: Richard Oudkerk (sbt) * (Python committer) Date: 2013-09-12 12:37
By "context" I did not really mean a context manager.  I just meant an object (possibly a singleton or module) which implements the same interface as multiprocessing.

(However, it may be a good idea to also make it a context manager whose __enter__() method starts the helper processes, and whose __exit__() method shuts them down.)
msg197524 - (view) Author: Olivier Grisel (Olivier.Grisel) * Date: 2013-09-12 12:54
The process pool executor [1] from the concurrent futures API would be suitable to explicitly start and stop the helper process for the `forkserver` mode.

[1] http://docs.python.org/3.4/library/concurrent.futures.html#concurrent.futures.ProcessPoolExecutor

The point would be to have as few state as possible encoded in the multiprocessing module (and its singletons) and move that state information to be directly managed by multiprocessing Process and Pool class instances so that libraries could customize the behavior (start_method, executable, ForkingPIckler reducers registry and so on) without mutating the state of the multiprocessing module singletons.
msg197532 - (view) Author: Richard Oudkerk (sbt) * (Python committer) Date: 2013-09-12 15:10
There are lots of things that behave differently depending on the currently set start method: Lock(), Semaphore(), Queue(), Value(), ...  It is not just when creating a Process or Pool that you need to know the start method.

Passing a context or start_method argument to all of these constructors would be very awkward, which is why I think it is better to treat the context as an object with methods Process(), Pool(), Lock(), Semaphore(), etc.

Unfortunately, I do not have time to work on this just now...
msg197534 - (view) Author: Olivier Grisel (Olivier.Grisel) * Date: 2013-09-12 15:16
Richard Oudkerk: thanks for the clarification, that makes sense. I don't have the time either in the coming month, maybe later.
msg197562 - (view) Author: Lars Buitinck (larsmans) Date: 2013-09-13 08:53
Ok. Do you (or jnoller?) have time to review my proposed patch, at least before 3.4 is released? I didn't see it in the release schedule, so it's probably not planned soon, but I wouldn't want the API to change *again* in 3.5.
msg197669 - (view) Author: Richard Oudkerk (sbt) * (Python committer) Date: 2013-09-13 21:46
I'll review the patch.  (According to http://www.python.org/dev/peps/pep-0429/ feature freeze is expected in late November, so there is not too much of rush.)
msg199389 - (view) Author: Richard Oudkerk (sbt) * (Python committer) Date: 2013-10-10 14:35
Attached is a patch which allows the use of separate contexts.  For example

    try:
        ctx = multiprocessing.get_context('forkserver')
    except ValueError:
        ctx = multiprocessing.get_context('spawn')

    q = ctx.Queue()
    p = ctx.Process(target=foo, args=(q,))
    p.start()
    ...

Also, get_start_method(allow_none=True) will return None if the start method has not yet been fixed.
msg199390 - (view) Author: Richard Oudkerk (sbt) * (Python committer) Date: 2013-10-10 14:44
BTW, the context objects are singletons.

I could not see a sensible way to make ctx.Process be a picklable class (rather than a method) if there can be multiple instances of a context type.  This means that the helper processes survive until the program closes down.
msg199706 - (view) Author: Lars Buitinck (larsmans) Date: 2013-10-13 14:12
> BTW, the context objects are singletons.

I haven't read all of your patch yet, but does this mean a forkserver will be started regardless of whether it is later used?

That would be a good thing, since starting the fork server after reading in large data sets would mean the fork server would hold on to large swaths of memory even when the data set is deallocated in the master process.
msg199709 - (view) Author: Richard Oudkerk (sbt) * (Python committer) Date: 2013-10-13 14:38
> I haven't read all of your patch yet, but does this mean a forkserver 
> will be started regardless of whether it is later used?

No, it is started on demand.  But since it is started using _posixsbuprocess.fork_exec(), nothing is inherited from the main process.
msg199710 - (view) Author: Lars Buitinck (larsmans) Date: 2013-10-13 14:40
Ok, great.
msg200060 - (view) Author: Roundup Robot (python-dev) Date: 2013-10-16 15:44
New changeset 72a5ac909c7a by Richard Oudkerk in branch 'default':
Issue #18999: Make multiprocessing use context objects.
http://hg.python.org/cpython/rev/72a5ac909c7a
msg200061 - (view) Author: Lars Buitinck (larsmans) Date: 2013-10-16 16:37
Thanks, much better than my solution!
msg200497 - (view) Author: Lars Buitinck (larsmans) Date: 2013-10-19 21:07
Strange, I can't actually get it to work:

>>> from multiprocessing import Pool, get_context
>>> forkserver = get_context('forkserver')
>>> Pool(context=forkserver)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: Pool() got an unexpected keyword argument 'context'
msg200499 - (view) Author: Lars Buitinck (larsmans) Date: 2013-10-19 21:12
I also tried

from multiprocessing.pool import Pool

but that died with

ImportError: cannot import name get_context
msg200507 - (view) Author: Richard Oudkerk (sbt) * (Python committer) Date: 2013-10-19 22:28
I guess this should be clarified in the docs, but multiprocessing.pool.Pool is a *class* whose constructor takes a context argument, where as multiprocessing.Pool() is a *bound method* of the default context.  (In previous versions multiprocessing.Pool was a *function*.)

The only reason you might need the context argument is if you have subclassed multiprocessing.pool.Pool.

>>> from multiprocessing import pool, get_context
>>> forkserver = get_context('forkserver')
>>> p = forkserver.Pool()
>>> q = pool.Pool(context=forkserver)
>>> p, q
(<multiprocessing.pool.Pool object at 0xb71f3eec>, <multiprocessing.pool.Pool object at 0xb6edb06c>)

I suppose we could just make the bound methods accept a context argument which (if not None) is used instead of self.
msg213100 - (view) Author: Roundup Robot (python-dev) Date: 2014-03-10 22:11
New changeset b941a320601a by R David Murray in branch 'default':
whatsnew: multiprocessing start methods and context (#8713 and #18999)
http://hg.python.org/cpython/rev/b941a320601a
History
Date User Action Args
2014-03-10 22:11:24python-devsetmessages: + msg213100
2013-10-19 22:28:50sbtsetstatus: closed -> open

messages: + msg200507
2013-10-19 21:12:50larsmanssetmessages: + msg200499
2013-10-19 21:07:03larsmanssetmessages: + msg200497
2013-10-16 17:20:09sbtsetstatus: open -> closed
2013-10-16 16:37:46larsmanssetstatus: pending -> open

messages: + msg200061
2013-10-16 15:48:23sbtsetstatus: open -> pending
type: behavior -> enhancement
title: Robustness issues in multiprocessing.{get,set}_start_method -> Support different contexts in multiprocessing
resolution: fixed
stage: resolved
2013-10-16 15:44:24python-devsetnosy: + python-dev
messages: + msg200060
2013-10-13 14:40:48larsmanssetmessages: + msg199710
2013-10-13 14:38:12sbtsetmessages: + msg199709
2013-10-13 14:12:15larsmanssetmessages: + msg199706
2013-10-10 14:44:47sbtsetmessages: + msg199390
2013-10-10 14:35:26sbtsetfiles: + context.patch

messages: + msg199389
2013-09-13 21:46:34sbtsetmessages: + msg197669
2013-09-13 08:53:03larsmanssetmessages: + msg197562
2013-09-12 15:16:57Olivier.Griselsetmessages: + msg197534
2013-09-12 15:10:24sbtsetmessages: + msg197532
2013-09-12 12:54:11Olivier.Griselsetmessages: + msg197524
2013-09-12 12:37:02sbtsetmessages: + msg197523
2013-09-12 10:51:26larsmanssetmessages: + msg197518
2013-09-12 10:33:26larsmanssetnosy: + jnoller
2013-09-11 11:02:26Olivier.Griselsetmessages: + msg197486
2013-09-10 21:48:54sbtsetmessages: + msg197468
2013-09-10 21:15:33larsmanssetmessages: + msg197467
2013-09-10 16:04:40sbtsetmessages: + msg197453
2013-09-10 15:39:13larsmanssetfiles: + mp_getset_start_method.patch

messages: + msg197450
2013-09-10 15:38:48larsmanssetfiles: - mp_getset_start_method.patch
2013-09-10 15:13:25Olivier.Griselsetnosy: + Olivier.Grisel
messages: + msg197447
2013-09-10 15:05:56larsmanssettitle: Allow multiple calls to multiprocessing.set_start_method -> Robustness issues in multiprocessing.{get,set}_start_method
2013-09-10 14:59:57larsmanssetnosy: + sbt
2013-09-10 14:53:41larsmanscreate