Author nirai
Recipients Giovanni.Bajo, avian, bobbyi, gregory.p.smith, neologix, nirai, pitrou, sdaoden, vstinner
Date 2011-07-12.20:57:35
SpamBayes Score 2.498e-15
Marked as misclassified No
Message-id <1310504257.51.0.297152901145.issue6721@psf.upfronthosting.co.za>
In-reply-to
Content
Well, my brain did not deadlock, but after spinning on the problem for a while longer, it now thinks Tomaž Šolc and Steffen are right.

We should try to fix the multiprocessing module so it does not deadlock single-thread programs and deprecate fork in multi-threaded programs.

Here is the longer version, which is a summary of what people said here in various forms, observations from diving into the code and Googling around:


1) The rabbit hole

a) In a multi-threaded program, fork() may be called while another thread is in a critical section. That thread will not exist in the child and the critical section will remain locked. Any attempt to enter that critical section will deadlock the child.

b) POSIX included the pthread_atfork() API in an attempt to deal with the problem:
http://pubs.opengroup.org/onlinepubs/009695399/functions/pthread_atfork.html

c) But it turns out atfork handlers are limited to calling async-signal-safe functions since fork may be called from a signal handler:
http://download.oracle.com/docs/cd/E19963-01/html/821-1601/gen-61908.html#gen-95948

This means atfork handlers may not actually acquire or release locks. See opinion by David Butenhof who was involved in the standardization effort of POSIX threads:
http://groups.google.com/group/comp.programming.threads/msg/3a43122820983fde

d) One consequence is that we can not assume third party libraries are safe to fork in multi-threaded program. It is likely their developers consider this scenario broken.

e) It seems the general consensus across the WWW concerning this problem is that it has no solution and that a fork should be followed by exec as soon as possible.

Some references:
http://pubs.opengroup.org/onlinepubs/009695399/functions/fork.html
http://austingroupbugs.net/view.php?id=62
http://sourceware.org/bugzilla/show_bug.cgi?id=4737
http://www.linuxprogrammingblog.com/threads-and-fork-think-twice-before-using-them


2) Python’s killer rabbit

The standard library multiprocessing module does two things that force us into the rabbit hole; it creates worker threads and forks without exec.

Therefore, any program that uses the multiprocessing module is a multi-threading forking program.

One immediate problem is that a multiprocessing.Pool may fork from its worker thread in Pool._handle_workers(). This puts the forked child at risk of deadlock with any code that was run by the parent’s main thread (the entire program logic).

More problems may be found with a code review.

Other modules to look at are concurrent.futures.process (creates a worker thread and uses multiprocessing) and socketserver (ForkingMixIn forks without exec).


3) God bless the GIL

a) Python signal handlers run synchronously in the interpreter loop of the main thread, so os.fork() will never be called from a POSIX signal handler.

This means Python atfork prepare and parent handlers may run any code. The code run at the child is still restricted though and may deadlock into any acquired locks left behind by dead threads in the standard library or lower level third party libraries.

b) Turns out the GIL also helps by synchronizing threads.

Any lock held for the duration of a function call while the GIL is held will be released by the time os.fork() is called. But if a thread in the program calls a function that yields the GIL we are in la la land again and need to watch out step.


4) Landing gently at the bottom

a) I think we should try to review and sanitize the worker threads of the multiprocessing module and other implicit worker threads in the standard library.

Once we do (and since os.fork() is never run from a POSIX signal handler) the multiprocessing library should be safe to use in programs that do not start other threads.

b) Then we should declare the user scenario of mixing the threading and multiprocessing modules as broken by design.

c) Finally, optionally provide atfork API

The atfork API can be used to refactor existing fork handlers in the standard library, provide handlers for modules such as the logging module that will reduce the risk of deadlock in existing programs, and can be used by developers who insist on mixing threading and forking in their programs.


5) Sanitizing worker threads in the multiprocessing module

TODO :) 

(will try to post some ideas soon)
History
Date User Action Args
2011-07-12 20:57:37niraisetrecipients: + nirai, gregory.p.smith, pitrou, vstinner, bobbyi, neologix, Giovanni.Bajo, sdaoden, avian
2011-07-12 20:57:37niraisetmessageid: <1310504257.51.0.297152901145.issue6721@psf.upfronthosting.co.za>
2011-07-12 20:57:36nirailinkissue6721 messages
2011-07-12 20:57:35niraicreate