classification
Title: subprocess module has race condition with SIGCHLD handlers
Type: Stage:
Components: Library (Lib) Versions: Python 2.6
process
Status: closed Resolution: not a bug
Dependencies: Superseder:
Assigned To: Nosy List: joseph.h.garvin, loewis, rosslagerwall
Priority: normal Keywords:

Created on 2010-12-06 00:19 by joseph.h.garvin, last changed 2011-06-23 16:50 by rosslagerwall. This issue is now closed.

Messages (4)
msg123445 - (view) Author: (joseph.h.garvin) Date: 2010-12-06 00:19
The following code will result in a traceback 99% of the time, though it may take two runs (sometimes the first run won't trigger it, I think due to the changing in timing from genrating the .pyc file). It spawns an instance of /bin/echo, chosen because it's a very quick to finish program. Any program that executes faster than subprocess.Popen can return will work as a substitute though:

import signal
import subprocess

to_launch = None

def sig_chld_handler(signum, frame):
    global to_launch
    # Crashes here.
    # 'NoneType' object has no attribute 'poll'
    to_launch.poll()

    print to_launch.returncode

signal.signal(signal.SIGCHLD, sig_chld_handler)

to_launch = subprocess.Popen("/bin/echo")


And the traceback:


Traceback (most recent call last):
  File "/tmp/sigchld.py", line 15, in <module>
    to_launch = subprocess.Popen("/bin/echo")
  File "/usr/lib/python2.6/subprocess.py", line 623, in __init__
    errread, errwrite)
  File "/usr/lib/python2.6/subprocess.py", line 1130, in _execute_child
    data = _eintr_retry_call(os.read, errpipe_read, 1048576)
  File "/usr/lib/python2.6/subprocess.py", line 455, in _eintr_retry_call
    return func(*args)
  File "/tmp/sigchld.py", line 9, in sig_chld_handler
    to_launch.poll()
AttributeError: 'NoneType' object has no attribute 'poll'


I believe the problem is that the process completes before Popen can return, which means the assignment of to_launch hasn't happened yet, so it's not defined when we get into sig_chld_handler.

I tried to work around this issue by setting preexec_fn to signal.pause and sending the child process a signal after the assignment, but then ran into another bug: http://bugs.python.org/issue10635

If when it caught SIGCHLD python pushed an event onto its internal event loop to execute the handler, I think that would make sure it's deferred until after the assignment. There might be other consequences of that, but I'm not familiar with the interpreter internals. Alternatively it could be fixed with an API change -- let Popen return an object before it actually launches the process, and have a separate start() method.
msg123447 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2010-12-06 00:33
> If when it caught SIGCHLD python pushed an event onto its internal
> event loop to execute the handler, I think that would make sure it's
> deferred until after the assignment.

This is not a reasonable request. How long would you want to postpone
this? Suppose somebody writes

def foo():
    local_to_launch = subprocess.Popen("/bin/echo")
    return local_to_launch

local_to_launch = foo()

then deferring the signal until after the assignment would not help,
so the semantics of your proposed change are fuzzy.

> There might be other
> consequences of that, but I'm not familiar with the interpreter
> internals. Alternatively it could be fixed with an API change -- let
> Popen return an object before it actually launches the process, and
> have a separate start() method.

The right approach is to use sigblock/sigsetmask before creating the
process, and then again after creating it. Unfortunately, these aren't
exposed from the signal module.
msg123448 - (view) Author: (joseph.h.garvin) Date: 2010-12-06 01:16
Sorry I wasn't trying to make a request, just suggesting one potential 'fix' (I agree that it isn't really though) to make things more intutive.

Unless the app is delayed from launching until after the assignment finishes though I think a workaround is required. When it's deferred until after the assignment you at least have the ability to get it stored into a global. When it's not deferred I think you're required to something like disable SIGCHLD like you say, spawn the child, enable, and then manually poll each of the subprocess popen objects you've opened before to see if they died while the signal was down.

I realize now this isn't really a bug. Principle of least surprise is probably too much to hope for with signals anyway. Feel free to close unless you want to leave it open as a bug to get sigblock/sigsetmask in.
msg138870 - (view) Author: Ross Lagerwall (rosslagerwall) (Python committer) Date: 2011-06-23 16:50
> The right approach is to use sigblock/sigsetmask before creating the
> process, and then again after creating it. Unfortunately, these aren't
> exposed from the signal module.

Since 3.3, pthread_sigmask is exposed so the right approach can now be taken ;-)

Closing as invalid.
History
Date User Action Args
2011-06-23 16:50:43rosslagerwallsetstatus: open -> closed

nosy: + rosslagerwall
messages: + msg138870

resolution: not a bug
2010-12-06 01:16:06joseph.h.garvinsetmessages: + msg123448
2010-12-06 00:33:54loewissetnosy: + loewis
title: subprocess module has race condition with SIGCHLD handlers -> subprocess module has race condition with SIGCHLD handlers
messages: + msg123447
2010-12-06 00:19:55joseph.h.garvincreate