classification
Title: Add an 'atfork' module
Type: enhancement
Stage:
Components: Extension Modules, Interpreter Core
Versions: Python 3.4
process
Status: open
Resolution:
Dependencies:
Superseder:
Assigned To: christian.heimes
Nosy List: Arfrever, amaury.forgeotdarc, asvetlov, christian.heimes, georg.brandl, grahamd, gregory.p.smith, haypo, jcea, lemburg, pitrou, sbt, twouters
Priority: normal
Keywords: needs review, patch

Created on 2012-11-18 15:20 by christian.heimes, last changed 2013-01-14 16:06 by pitrou.

Files
File name Uploaded Description
pure-python-atfork.patch sbt, 2012-11-19 22:43
Messages (19)
msg175878 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2012-11-18 15:20
I propose the addition of an 'afterfork' module. The module shall fulfill a similar task as the 'atexit' module except that it handles process forks instead of process shutdown.

The 'afterfork' module shall allow libraries to register callbacks that are executed on fork() inside the child process and as soon as possible. Python already has a function that must be called by C code: PyOS_AfterFork(). The 'afterfork' callbacks are called as the last step in PyOS_AfterFork().

Use case example:
The tempfile module has a specialized RNG that re-initializes the RNG after fork() by comparing os.getpid() to an instance variable every time the RNG is accessed. The check can be replaced with an afterfork callback.
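
A minimal sketch of the PID-check pattern described above. The class name `PidCheckedRNG` and the injectable `getpid` argument are hypothetical and only serve the illustration; this is not a copy of the tempfile code.

```python
import os
import random


class PidCheckedRNG:
    """Hypothetical helper: rebuilds its RNG whenever the PID changes."""

    def __init__(self, getpid=os.getpid):
        self._getpid = getpid  # injectable so the check can be exercised without fork()
        self._pid = None
        self._rng = None

    @property
    def rng(self):
        # The PID comparison runs on every access; this is the per-call
        # cost that an afterfork callback would make unnecessary.
        pid = self._getpid()
        if pid != self._pid:
            self._rng = random.Random()  # seeded from OS entropy
            self._pid = pid
        return self._rng
```

With an afterfork callback, the property body would shrink to a plain attribute read and the re-initialization would move into the callback.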

Open questions:
How should the afterfork module handle exceptions that are raised by callbacks?

Implementation:
I'm going to use as much code from atexitmodule.c as possible. I'm going to copy common code to a template file and include the template from atexitmodule.c and afterforkmodule.c with some preprocessor tricks.
msg175892 - (view) Author: Richard Oudkerk (sbt) * (Python committer) Date: 2012-11-18 17:34
pthread_atfork() allows the registering of three types of callbacks:

1) prepare callbacks which are called before the fork,
2) parent callbacks which are called in the parent after the fork
3) child callbacks which are called in the child after the fork.

I think all three should be supported.

I also think that a recursive "fork lock" should be introduced which is held during the fork.  This can be acquired around critical sections during which forks must not occur.
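
A rough sketch of the "fork lock" idea, assuming a module-level `threading.RLock`. The names `fork_lock`, `locked_fork` and `critical_section` are hypothetical, and the `fork` argument exists only so the behaviour can be tried without creating a process.

```python
import os
import threading

# Recursive, so a thread that already holds the lock may still fork.
fork_lock = threading.RLock()


def locked_fork(fork=os.fork):
    """Call fork() while holding the fork lock."""
    with fork_lock:
        return fork()


def critical_section(work):
    """Run work() while no other thread can fork through locked_fork()."""
    with fork_lock:
        return work()
```

This only protects against forks that go through `locked_fork()`; code that calls fork() directly bypasses the lock.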

This is more or less a duplicate of #6923.  See also #6721.
msg175967 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2012-11-19 20:55
Thanks Richard!

My first reaction was YAGNI, but after reading the two tickets I now understand the need for three different hooks. I suggest that we implement our own hooks modeled on the http://linux.die.net/man/3/pthread_atfork function, especially the order of function calls:

The parent and child fork handlers shall be called in the order in which they were established by calls to pthread_atfork().  The prepare fork handlers shall be called in the opposite order.

I'd like to focus on the three hooks and the Python API, and leave the usage of the hooks to other developers.

Proposal:
* Introduce a new module called atfork (Modules/atforkmodule.c) that is built into the core.
* Move PyOS_AfterFork to Modules/atforkmodule.c.
* Add PyOS_BeforeFork() (or PyOS_PrepareFork()?) and PyOS_AfterForkParent().
* Call the two new functions around the calls to fork() in the stdlib.

I'm not yet sure how to implement the Python API. I could either implement six methods:

  atfork.register_before_fork(callable, *args, **kwargs)
  atfork.register_after_fork_child(callable, *args, **kwargs)
  atfork.register_after_fork_parent(callable, *args, **kwargs)
  atfork.unregister_before_fork(callable)
  atfork.unregister_after_fork_child(callable)
  atfork.unregister_after_fork_parent(callable)

or two:

  atfork.register(prepare=None, parent=None, child=None, *args, **kwargs)
  atfork.unregister(prepare=None, parent=None, child=None)
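
A minimal sketch of the two-function variant combined with the pthread_atfork() ordering quoted above. The class `AtForkRegistry` and its `before_fork`/`after_fork_*` methods are hypothetical names, and the `*args, **kwargs` forwarding from the proposal is left out to keep the example short.

```python
class AtForkRegistry:
    """Hypothetical registry holding the three kinds of fork hooks."""

    def __init__(self):
        self._prepare = []
        self._parent = []
        self._child = []

    def _pairs(self, prepare, parent, child):
        return ((self._prepare, prepare),
                (self._parent, parent),
                (self._child, child))

    def register(self, prepare=None, parent=None, child=None):
        for hooks, func in self._pairs(prepare, parent, child):
            if func is not None:
                hooks.append(func)

    def unregister(self, prepare=None, parent=None, child=None):
        for hooks, func in self._pairs(prepare, parent, child):
            if func is not None and func in hooks:
                hooks.remove(func)

    def before_fork(self):
        # Prepare hooks run in the opposite order of registration.
        for func in reversed(self._prepare):
            func()

    def after_fork_parent(self):
        # Parent hooks run in registration order.
        for func in self._parent:
            func()

    def after_fork_child(self):
        # Child hooks run in registration order.
        for func in self._child:
            func()
```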
msg175972 - (view) Author: Richard Oudkerk (sbt) * (Python committer) Date: 2012-11-19 22:43
Note that Gregory P. Smith has written

    http://code.google.com/p/python-atfork/

I also started a pure Python patch but did not get round to posting it.  (It also implements the fork lock idea.)  I'll attach it here.

How do you intend to handle the propagation of exceptions?  I decided that after

    atfork.atfork(prepare1, parent1, child1)
    atfork.atfork(prepare2, parent2, child2)
    ...
    atfork.atfork(prepareN, parentN, childN)

calling "pid = os.fork()" should be equivalent to

    pid = None
    prepareN()
    try:
        ...
            prepare2()
            try:
                prepare1()
                try:
                    pid = posix.fork()
                finally:
                    parent1() if pid != 0 else child1()
            finally:
                parent2() if pid != 0 else child2()
        ...
    finally:
        parentN() if pid != 0 else childN()
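
The nested try/finally structure above can also be written as a loop. The following is only a sketch using `contextlib.ExitStack`, not the code from pure-python-atfork.patch; the function name `fork_with_handlers` and the injectable `fork` argument are hypothetical.

```python
import contextlib
import os


def fork_with_handlers(handlers, fork=os.fork):
    """handlers is a list of (prepare, parent, child) in registration order."""
    state = {"pid": None}

    def after(parent, child):
        # Mirrors "parentK() if pid != 0 else childK()"; pid stays None
        # when a prepare handler or fork() itself raised.
        if state["pid"] != 0:
            parent()
        else:
            child()

    with contextlib.ExitStack() as stack:
        # prepareN runs first and prepare1 last.
        for prepare, parent, child in reversed(handlers):
            prepare()
            # Callbacks run last-in first-out on exit, so handler 1's
            # after-hook runs first and handler N's runs last.
            stack.callback(after, parent, child)
        state["pid"] = fork()
    return state["pid"]
```

As in the nested version, an exception from a prepare handler propagates to the caller after the already-entered "finally" clauses have run.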
msg175973 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2012-11-19 23:59
I would not allow exceptions to propagate. No caller is expecting them.
msg175974 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2012-11-20 00:13
pthread_atfork() cannot be used to implement this. Another non-python
thread started by a C extension module or the C application that is
embedding Python within it is always free to call fork() on its own with
zero knowledge that Python even exists at all. It's guaranteed that fork
will be called while the Python GIL is held in this situation which would
cause any pre-fork thing registered by Python to deadlock.

At best, this can be implemented manually as we do with some of the before
and after fork stuff today but it must come with the caveat warning that it
cannot guarantee that these things are actually called before and after
fork() other than direct os.fork() calls from Python code or extremely
Python aware C extension modules that may call fork() (very rare, most C &
C++ libraries an extension module may be using assume that they've got the
run of the house).  That is, this problem is unsolvable unless you control 100%
of the code being used by your entire user application.

msg175975 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2012-11-20 00:52
Meh! Exception handling takes all the fun out of the API and is going to make it MUCH more complicated. pthread_atfork() ignores error handling for a good reason. It's going to be hard to get it right. :/

IFF we are going to walk the hard and rocky road of exception handling, then we are going to need at least four hooks and a register function that takes four callables as arguments: register(prepare, error, parent, child). Each prepare() call pushes an error handler onto a stack. In case of an exception in a prepare handler, the error stack is popped until all error handlers are called. This approach allows a prepare handler to actually prevent a fork() call from succeeding.

The parent and child hooks are always called no matter what. Exceptions are recorded and a warning is emitted when at least one hook fails. We might raise an exception, but it has to be a special exception that carries information about whether fork() has succeeded, whether the code runs in the child or the parent, and the child's PID.

I fear it's going to be *really* hard to get everything right.

Gregory made a good point, too. We can't rely on pthread_atfork() as we are unable to predict how third party code is using fork(): "Take cover, deadlocks ahead!" :) A cooperative design of the C API with three functions is my preferred way, too. PyOS_AfterForkParent() should take an argument to signal a failed fork() call.
msg175980 - (view) Author: Amaury Forgeot d'Arc (Amaury.Forgeot.d'Arc) * Date: 2012-11-20 09:11
2012/11/20 Christian Heimes <report@bugs.python.org>

> IFF we are going to walk the hard and rocky road of exception handling,
> then we are going to need at least four hooks and a register function that
> takes four callables as arguments: register(prepare, error, parent,
> child). Each prepare() call pushes an error handler onto a stack. In case
> of an exception in a prepare handler, the error stack is popped until all
> error handlers are called. This approach allows a prepare handler to
> actually prevent a fork() call from succeeding.
>

FWIW, PyPy already has a notion of fork hooks:
https://bitbucket.org/pypy/pypy/src/b4e4017909bac6c102fbc883ac8d2e42fa41553b/pypy/module/posix/interp_posix.py?at=default#cl-682

Various subsystems (threads cleanup, import lock, threading.local...)
register their hook functions.

You may want to experiment from there :-)
A new "atfork" module would be easy to implement.
msg175997 - (view) Author: Richard Oudkerk (sbt) * (Python committer) Date: 2012-11-20 15:44
> IFF we are going to walk the hard and rocky road of exception handling,
> then we are going to need at least four hooks and a register function that
> takes four callables as arguments: register(prepare, error, parent,
> child). Each prepare() call pushes an error handler onto a stack. In case
> of an exception in a prepare handler, the error stack is popped until all
> error handlers are called. This approach allows a prepare handler to
> actually prevent a fork() call from succeeding.

I think there are two main options if a prepare callback fails:

1) The fork should not occur and the exception should be raised
2) The fork should occur and the exception should only be printed

I favour option 1 since, if they want, users can always wrap their prepare callbacks with

  try:
    ...
  except:
    sys.excepthook(*sys.exc_info())

With option 1 I don't see why error callbacks are necessary.  Just unwind the stack of imaginary try...finally... clauses and let any exceptions propagate out using exception chaining if necessary.  This is what pure-python-atfork.patch does.  Note, however, that if the fork succeeds then any subsequent exception is only printed.
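
The wrapping mentioned for option 1 could be packaged as a decorator. This is only a sketch; the name `print_exceptions` is hypothetical, and it catches `Exception` rather than using a bare `except:` so that KeyboardInterrupt and SystemExit still propagate.

```python
import functools
import sys


def print_exceptions(func):
    """Wrap a callback so its exceptions are printed instead of raised."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except Exception:
            # Report through the normal hook, then carry on; the wrapper
            # returns None in this case.
            sys.excepthook(*sys.exc_info())

    return wrapper
```

A prepare callback decorated this way can no longer prevent the fork, which is the opt-in behaviour option 1 leaves to users.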
msg176002 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2012-11-20 16:20
Amaury:
PyPy doesn't handle exceptions in hooks. Is there a reason why PyPy goes for the simplistic approach?

Richard:
An error callback has the benefit that the API can notify the hooks that some error has occurred. We may not need it, though.

I can think of six exception scenarios that must be handled:

(1) exception in a prepare hook -> don't call the remaining prepare hooks, run all related parent hooks in FILO order, prevent fork() call
(2) exception in parent hook during the handling of (1) -> print exception, continue with next parent hook
(3) exception in fork() call -> run parent hooks in FILO order
(4) exception in parent hook during the handling of (3) -> print exception, continue with next parent hook
(5) exception in parent hook when fork() has succeeded -> print exception, continue with next parent hook
(6) exception in child hook when fork() has succeeded  -> print exception, continue with next child hook
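
One possible reading of the six scenarios, as a sketch. It assumes that "related parent hooks" means the parent hooks whose prepare hook completed, and that after a successful fork the hooks run in registration order; the names `fork_with_hooks` and `_run_all` and the injectable `fork` argument are hypothetical.

```python
import os
import sys


def _run_all(callbacks):
    """Run every callback; print exceptions and continue with the next."""
    for callback in callbacks:
        try:
            callback()
        except Exception:
            sys.excepthook(*sys.exc_info())


def fork_with_hooks(hooks, fork=os.fork):
    """hooks is a list of (prepare, parent, child) in registration order."""
    prepared = []  # (parent, child) pairs whose prepare hook succeeded
    try:
        for prepare, parent, child in reversed(hooks):
            prepare()
            prepared.append((parent, child))
        pid = fork()
    except Exception:
        # Scenarios 1-4: nothing was forked. Unwind with the parent hooks,
        # last prepared first, then re-raise the original error.
        _run_all(parent for parent, _ in reversed(prepared))
        raise
    # Scenarios 5 and 6: the fork succeeded, so failures are only printed.
    index = 1 if pid == 0 else 0
    _run_all(pair[index] for pair in reversed(prepared))
    return pid
```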

Do you agree?
msg176004 - (view) Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2012-11-20 16:29
> PyPy doesn't handle exceptions in hooks.
> Is there a reason why PyPy goes for the simplistic approach?

Probably because nobody thought about it.
At the moment, there is only one 'before', one 'parent' hook (so the FILO order is simple), and three 'child' hooks.
And if the _PyImport_ReleaseLock call fails, you'd better not ignore the error...
msg176019 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2012-11-20 19:33
I think you are solving a non-problem if you want to expose exceptions from
such hooks. Nobody needs it.
msg176020 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-11-20 19:49
> I think you are solving a non-problem if you want to expose exceptions from
> such hooks. Nobody needs it.

Agreed.
msg176022 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2012-11-20 20:04
Your suggestion is that the hooks are called as:

for hook in hooks:
    try:
        hook()
    except:
        try:
            sys.excepthook(*sys.exc_info())
        except:
            pass

That makes the implementation much easier. :)
msg179838 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2013-01-12 23:37
"The tempfile module has a specialized RNG that re-initializes the RNG after fork() by comparing os.getpid() to an instance variable every time the RNG is accessed. The check can be replaced with an afterfork callback."

By the way, OpenSSL expects that its PRNG is reseeded somehow (by calling RAND_add) after a fork. I wrote a patch for OpenSSL, but I don't remember if I sent it to OpenSSL.
https://bitbucket.org/haypo/hasard/src/4a1be69a47eb1b2ec7ca95a341d4ca953a77f8c6/patches/openssl_rand_fork.patch?at=default

Reseeding the tempfile PRNG after every fork is wasted work (it spends CPU and memory, and might even hang until we have enough entropy?) if tempfile is not used after the fork. I like the current approach.

--

I'm not saying that a new atfork module would not help, just that the specific case of tempfile should be discussed :-) I like the idea of a generic module to call code after fork.
msg179888 - (view) Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2013-01-13 19:21
Might make sense to put this in atexit.atfork() to avoid small-module inflation?
msg179927 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2013-01-14 09:13
> Might make sense to put this in atexit.atfork() to avoid small-module inflation?

It sounds strange to mix "at exit" and "at fork" in the same module.
Both are very different.

msg179945 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2013-01-14 15:49
On 13.01.2013 00:37, STINNER Victor wrote:
> By the way, OpenSSL expects that its PRNG is reseeded somehow (by calling RAND_add) after a fork. I wrote a patch for OpenSSL, but I don't remember if I sent it to OpenSSL.
> https://bitbucket.org/haypo/hasard/src/4a1be69a47eb1b2ec7ca95a341d4ca953a77f8c6/patches/openssl_rand_fork.patch?at=default

Apparently not, and according to this thread, they don't think
this is an OpenSSL problem to solve:

http://openssl.6102.n7.nabble.com/recycled-pids-causes-PRNG-to-repeat-td41669.html

Note that you don't have to reseed the RNG; you just have to make sure
that the two forks use different sequences. Simply adding some extra data
in each process will suffice, e.g. by adding the PID of the new process
to the RNG pool. This is certainly doable without any major CPU
overhead :-)
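
To illustrate the point with a toy model: two copies of the same pool stop producing identical output as soon as one of them mixes in its PID. `ToyPool` and `after_fork_in_child` are hypothetical names; this is a hash-chain illustration, not OpenSSL's PRNG and not suitable for real cryptographic use.

```python
import hashlib
import os


class ToyPool:
    """Toy hash-based generator used only to illustrate state divergence."""

    def __init__(self, seed):
        self._state = hashlib.sha256(seed).digest()

    def add(self, data):
        # Mix extra data into the pool without discarding the old state.
        self._state = hashlib.sha256(self._state + data).digest()

    def next_bytes(self):
        self._state = hashlib.sha256(self._state + b"next").digest()
        return hashlib.sha256(self._state + b"out").digest()


def after_fork_in_child(pool, getpid=os.getpid):
    """What a child hook could do: stir the new PID into the pool."""
    pool.add(str(getpid()).encode("ascii"))
```

Mixing in the PID is a single hash update, so the cost in the child is small compared with a full reseed.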
msg179949 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-01-14 16:06
> It sounds strange to mix "at exit" and "at fork" in the same module.
> Both are very different.

That's true. The sys module would probably be the right place for both functionalities.
History
Date User Action Args
2013-01-14 16:06:08 pitrou set messages: + msg179949
2013-01-14 15:49:37 lemburg set nosy: + lemburg
    messages: + msg179945
2013-01-14 09:13:28 haypo set messages: + msg179927
2013-01-13 21:43:00 Arfrever set nosy: + Arfrever
2013-01-13 19:21:22 georg.brandl set nosy: + georg.brandl
    messages: + msg179888
2013-01-12 23:37:35 haypo set nosy: + haypo
    messages: + msg179838
    title: Add an 'afterfork' module -> Add an 'atfork' module
2012-11-27 05:56:18 grahamd set nosy: + grahamd
2012-11-24 00:35:44 jcea set nosy: + jcea
2012-11-20 20:04:30 christian.heimes set messages: + msg176022
2012-11-20 19:49:23 pitrou set nosy: + pitrou
    messages: + msg176020
2012-11-20 19:33:38 gregory.p.smith set messages: + msg176019
2012-11-20 16:29:10 amaury.forgeotdarc set messages: + msg176004
2012-11-20 16:20:42 christian.heimes set messages: + msg176002
2012-11-20 15:44:34 sbt set messages: + msg175997
2012-11-20 14:59:07 amaury.forgeotdarc set nosy: + amaury.forgeotdarc, - Amaury.Forgeot.d'Arc
2012-11-20 14:35:50 asvetlov set nosy: + asvetlov
2012-11-20 09:11:52 Amaury.Forgeot.d'Arc set nosy: + Amaury.Forgeot.d'Arc
    messages: + msg175980
2012-11-20 00:52:11 christian.heimes set messages: + msg175975
2012-11-20 00:13:31 gregory.p.smith set messages: + msg175974
2012-11-19 23:59:25 gregory.p.smith set messages: + msg175973
2012-11-19 22:43:50 sbt set files: + pure-python-atfork.patch
    keywords: + patch
    messages: + msg175972
2012-11-19 20:55:20 christian.heimes set nosy: + twouters, gregory.p.smith
    messages: + msg175967
2012-11-18 17:34:43 sbt set messages: + msg175892
2012-11-18 17:08:15 pitrou set nosy: + sbt
2012-11-18 15:20:05 christian.heimes create