classification
Title: race in SocketServer.ForkingMixIn.collect_children
Type: crash Stage:
Components: Library (Lib) Versions: Python 2.4, Python 2.6, Python 2.5
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: dripton, jyasskin, schmir
Priority: normal Keywords: patch

Created on 2007-09-20 19:35 by dripton, last changed 2008-02-29 05:26 by jyasskin. This issue is now closed.

Messages (6)
msg56065 - (view) Author: David Ripton (dripton) Date: 2007-09-20 19:35
CentOS Linux 5, Python 2.4.3  (but code appears unchanged in 2.5 and
trunk, so I don't believe this bug has already been fixed)

We have an xmlrpc server that subclasses DocXMLRPCServer.DocXMLRPCServer
and SocketServer.ForkingMixIn.  Under load, it sometimes crashes with an
error in SocketServer.ForkingMixIn.collect_children

The bug is that collect_children calls os.waitpid with pid 0, so it
waits for any child.  But then it assumes that the pid found was in the
list self.active_children, and attempts to remove it from that list
without a try block.  However, another call to collect_children could
have already removed it, so we get "ValueError: list.remove(x): x not in
list"

The fix is just adding a try/except block around the attempt to remove
pid from self.active children.

diff -u SocketServer.py /tmp/SocketServer.py
--- SocketServer.py     2007-08-27 10:52:24.000000000 -0400
+++ /tmp/SocketServer.py        2007-09-20 15:34:00.000000000 -0400
@@ -421,7 +421,10 @@
             except os.error:
                 pid = None
             if not pid: break
-            self.active_children.remove(pid)
+            try:
+                self.active_children.remove(pid)
+            except ValueError:
+                pass

     def process_request(self, request, client_address):
         """Fork a new subprocess to process the request."""
msg56892 - (view) Author: Ralf Schmitt (schmir) Date: 2007-10-28 19:22
I've had the exact same error - but only when I used a subclass of
XMLRPCServer, which installed signal handlers for SIGCHLD (which then
called collect_children). Does your code install such a signal handler?
(I found mine somewhere on the web).
Do you start any subprocesses?
msg56903 - (view) Author: David Ripton (dripton) Date: 2007-10-29 12:59
No signal handler.  Yes, we run subprocesses.

I don't believe either is necessary to trigger the race condition, though.
msg62042 - (view) Author: David Ripton (dripton) Date: 2008-02-04 15:20
Just noticed that this is a partial duplicate of issue 1540386.
msg63100 - (view) Author: Jeffrey Yasskin (jyasskin) * (Python committer) Date: 2008-02-28 18:23
Hmm. I think the race can only happen if you call collect_children()
concurrently from multiple threads or from a signal handler. The
waidpid(0) bug (which affected anyone who spawned subprocesses from
anything other than ForkingMixIn) is partly fixed by r61106, but I don't
intend to make ForkingMixIn thread- or signal-safe. Let me know if this
is enough for you. :)
msg63124 - (view) Author: David Ripton (dripton) Date: 2008-02-29 03:24
The "if pid not in self.active_children: continue" check that was added
in r61106 appears to fix the bug, so I'm happy.  Thanks.
History
Date User Action Args
2008-02-29 05:26:30jyasskinsetstatus: pending -> closed
2008-02-29 03:24:28driptonsetmessages: + msg63124
2008-02-28 18:23:08jyasskinsetstatus: open -> pending
resolution: fixed
messages: + msg63100
nosy: + jyasskin
versions: + Python 2.6
2008-02-04 15:20:42driptonsetmessages: + msg62042
versions: + Python 2.5
2007-10-29 12:59:31driptonsetmessages: + msg56903
2007-10-28 19:22:15schmirsetnosy: + schmir
messages: + msg56892
2007-09-21 02:16:16jafosetpriority: normal
keywords: + patch
2007-09-20 19:35:42driptoncreate