classification
Title: garbage collection just after multiprocessing's fork causes exceptions
Type: behavior Stage: resolved
Components: Library (Lib) Versions: Python 3.2, Python 3.3, Python 2.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: nadeem.vawda, neologix, pitrou, python-dev, sbt
Priority: normal Keywords: patch

Created on 2012-04-11 16:32 by sbt, last changed 2014-01-23 00:13 by python-dev. This issue is now closed.

Files
File name Uploaded Description Edit
mp_disable_gc.patch sbt, 2012-04-11 17:50
mp_finalize_pid.patch sbt, 2012-04-12 18:22 review
Messages (14)
msg158049 - (view) Author: Richard Oudkerk (sbt) * (Python committer) Date: 2012-04-11 16:32
When running test_multiprocessing on Linux I occasionally see a stream of errors caused by ignored weakref callbacks:

  Exception AssertionError: AssertionError() in <Finalize object, dead> ignored

These do not cause the unittests to fail.

Finalizers from the parent process are supposed to be cleared after the fork.  But if a garbage collection before that then Finalizer callbacks can be run in the "wrong" process.

Disabling gc during fork seems to prevent the errors.  Or maybe the Finalizer should record the pid of the process which created it and only invoke the callback if it matches the current pid.

(Compare Issure 1336 conscerning subprocess.)
msg158052 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-04-11 16:40
> Disabling gc during fork seems to prevent the errors.

Sounds reasonable to me.
msg158064 - (view) Author: Richard Oudkerk (sbt) * (Python committer) Date: 2012-04-11 17:50
Patch to disable gc.
msg158071 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-04-11 20:27
Shouldn't there be a try..finally, in case os.fork() fails?
msg158080 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2012-04-11 21:40
Hmm...
I don't really like disabling GC, because it has a process-wide side
effect, and hence isn't thread-safe: if another thread forks() or
creates a subprocess right at the wrong time, it could end up with the
GC disabled for good...
msg158081 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-04-11 21:46
> Hmm...
> I don't really like disabling GC, because it has a process-wide side
> effect, and hence isn't thread-safe: if another thread forks() or
> creates a subprocess right at the wrong time, it could end up with the
> GC disabled for good...

That's a problem indeed. Perhaps we need a global "fork lock" shared
between subprocess and multiprocessing?
msg158087 - (view) Author: Richard Oudkerk (sbt) * (Python committer) Date: 2012-04-11 23:07
> That's a problem indeed. Perhaps we need a global "fork lock" shared
> between subprocess and multiprocessing?

I did an atfork patch which included a (recursive) fork lock.  See

    http://bugs.python.org/review/6721/show

The patch included changes to multiprocessing and subprocess.  (Being able to acquire the lock when doing fd manipulation is quite useful.  For instance, the creation of Process.sentinel currently has a race which can mean than another process inherits the write end of the pipe.  That would cause Process.join() to wait till both processes terminate.)

Actually, for Finalizers I think it would be easier to just record and check the pid.
msg158113 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2012-04-12 08:01
>> That's a problem indeed. Perhaps we need a global "fork lock" shared
>> between subprocess and multiprocessing?
>
> I did an atfork patch which included a (recursive) fork lock.  See
>
>    http://bugs.python.org/review/6721/show
>
> The patch included changes to multiprocessing and subprocess.  (Being able to acquire the lock when doing fd manipulation is quite useful.  For instance, the creation of Process.sentinel currently has a race which can mean than another process inherits the write end of the pipe.  That would cause Process.join() to wait till both processes terminate.)

Indeed, I had a look and it looked good.
I just had a couple minor comments, I'll try to get back to this later
today, or by the end of the week.

> Actually, for Finalizers I think it would be easier to just record and check the pid.

I'd prefer this too.
msg158159 - (view) Author: Richard Oudkerk (sbt) * (Python committer) Date: 2012-04-12 18:22
Alternative patch which records pid when Finalize object is created.  The callback does nothing if recorded pid does not match os.getpid().
msg158164 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-04-12 19:15
But what if Finalize is used to cleanup a resource that gets duplicated in children, like a file descriptor?
See e.g. forking.py, line 137 (in Popen.__init__())
or heap.py, line 244 (BufferWrapper.__init__()).
msg158180 - (view) Author: Richard Oudkerk (sbt) * (Python committer) Date: 2012-04-12 22:30
> But what if Finalize is used to cleanup a resource that gets 
> duplicated in children, like a file descriptor?
> See e.g. forking.py, line 137 (in Popen.__init__())
> or heap.py, line 244 (BufferWrapper.__init__()).

This was how Finalize objects already acted (or were supposed to).

In the case of BufferWrapper this is intended.  BufferWrapper objects do not have reference counting semantics.  Instead the memory is deallocated when the object is garbage collected in the process that created it.  (Garbage collection in a child process should *not* invalidate memory owned by the parent process.)  You can prevent the parent process from garbage collecting the object too early by following the advice below from the documentation:

  Explicitly pass resources to child processes

    On Unix a child process can make use of a shared resource created in a 
    parent process using a global resource. However, it is better to pass 
    the object as an argument to the constructor for the child process.

    Apart from making the code (potentially) compatible with Windows this 
    also ensures that as long as the child process is still alive the object 
    will not be garbage collected in the parent process. This might be important 
    if some resource is freed when the object is garbage collected in the parent process.

In the case of the sentinel in Popen.__init__(), it is harmless if this end of the pipe gets accidentally inherited by another process.  Since Process does not have a closefds argument like subprocess.Popen unintended leaking happens all the time.  And even without the pid check, I think this finalizer would very rarely be triggered in a child process.  (A Process object can only be garbage collected after it has been joined, and it can only be joined by it parent process.)
msg158567 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2012-04-17 20:17
> Alternative patch which records pid when Finalize object is created.

Looks good to me.
Note that it could eventually be rewritten to use an atfork() module, when we finally merge your patch for this other issue :-)
msg161580 - (view) Author: Roundup Robot (python-dev) Date: 2012-05-25 12:58
New changeset 59567c117b0e by Richard Oudkerk in branch 'default':
Issue #14548: Make multiprocessing finalizers check pid before running
http://hg.python.org/cpython/rev/59567c117b0e
msg208868 - (view) Author: Roundup Robot (python-dev) Date: 2014-01-23 00:13
New changeset 751371dd4d1c by Richard Oudkerk in branch '2.7':
Issue #14548: Make multiprocessing finalizers check pid before
http://hg.python.org/cpython/rev/751371dd4d1c
History
Date User Action Args
2014-01-23 00:13:24python-devsetmessages: + msg208868
2012-05-25 19:41:08sbtsetstatus: open -> closed
resolution: fixed
stage: commit review -> resolved
2012-05-25 12:58:21python-devsetnosy: + python-dev
messages: + msg161580
2012-04-17 20:17:44neologixsetmessages: + msg158567
stage: commit review
2012-04-12 22:30:10sbtsetmessages: + msg158180
2012-04-12 19:15:50pitrousetmessages: + msg158164
2012-04-12 18:22:36sbtsetfiles: + mp_finalize_pid.patch

messages: + msg158159
2012-04-12 08:01:00neologixsetmessages: + msg158113
2012-04-11 23:07:35sbtsetmessages: + msg158087
2012-04-11 21:46:47pitrousetmessages: + msg158081
2012-04-11 21:40:35neologixsetmessages: + msg158080
2012-04-11 20:27:40pitrousetmessages: + msg158071
2012-04-11 17:50:06sbtsetfiles: + mp_disable_gc.patch
keywords: + patch
messages: + msg158064
2012-04-11 17:07:17nadeem.vawdasetnosy: + nadeem.vawda
2012-04-11 16:40:48pitrousetversions: + Python 2.7, Python 3.2, Python 3.3
nosy: + neologix, pitrou

messages: + msg158052

components: + Library (Lib)
type: behavior
2012-04-11 16:32:31sbtcreate