classification
Title: subprocess.Popen hangs when child writes to stderr
Type: crash Stage:
Components: Library (Lib) Versions: Python 2.6, Python 2.5, Python 2.4
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: gregory.p.smith Nosy List: gagenellina, gregory.p.smith, gvanrossum, jba, loewis, therve (6)
Priority: Keywords: patch

Created on 2007-10-26 14:04 by jba, last changed 2008-01-22 03:04 by gvanrossum.

Files
File name Uploaded Description Edit Remove
subprocess_nogc.py jba, 2007-12-05 17:34
diff-against-subprocess-2.4.1.txt gregory.p.smith, 2008-01-17 00:22 (using python 2.4.1 sources) diff --unified=4 subprocess.py subprocess_nogc.py
diff-subprocess-trunk-1336-01.txt gregory.p.smith, 2008-01-17 00:51 updated patch against trunk
Messages (15)
msg56790 - (view) Author: Jonathan Amsterdam (jba) Date: 2007-10-26 14:04
This is under Linux (2.6).

I occasionally see subprocess.Popen() fail to return, and I have
finally figured out roughly what's going on. It involves the GC and
stderr.

1. os.fork()

2. Parent blocks reading from errpipe_read (subprocess.py:982)

3. In child, a GC occurs before the exec.

4. The GC attempts to free a file descriptor, calling file_dealloc.

5. That function gets an error closing the descriptor ("close failed:
[Errno 9] Bad file descriptor\n," is the string I extracted via gdb).

6. It attempts to write the error to stderr and blocks. Since it never
execs or writes to errpipe_write, both child and parent are hung.

Here is the pstack output on the child:

#0  0x006587a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
#1  0x0072f11b in __write_nocancel () from /lib/tls/libc.so.6
#2  0x006d409f in _IO_new_file_write () from /lib/tls/libc.so.6
#3  0x006d42ec in _IO_new_file_xsputn () from /lib/tls/libc.so.6
#4  0x006afce9 in buffered_vfprintf () from /lib/tls/libc.so.6
#5  0x006afe8b in vfprintf () from /lib/tls/libc.so.6
#6  0x080dd4af in mywrite ()
#7  0x080dd505 in PySys_WriteStderr ()
#8  0x08064cd0 in file_dealloc ()
#9  0x08079c88 in dict_dealloc ()
#10 0x0808931d in subtype_dealloc ()
#11 0x08079af3 in PyDict_Clear ()
#12 0x0807bb6a in dict_tp_clear ()
#13 0x080e0ead in collect ()
#14 0x080e1912 in _PyObject_GC_New ()
#15 0x0805e50b in PyInstance_NewRaw ()
#16 0x0805e5d7 in PyInstance_New ()
#17 0x0805bdc0 in PyObject_Call ()
#18 0x080aecef in PyEval_CallObjectWithKeywords ()
#19 0x080ca975 in PyErr_NormalizeException ()
#20 0x080b492c in PyEval_EvalFrame ()
#21 0x080b5eb2 in PyEval_EvalCodeEx ()
#22 0x080b3c83 in PyEval_EvalFrame ()
#23 0x080b5734 in PyEval_EvalFrame ()
#24 0x080b5eb2 in PyEval_EvalCodeEx ()
#25 0x080fce92 in function_call ()
#26 0x0805bdc0 in PyObject_Call ()
#27 0x080641e5 in instancemethod_call ()
#28 0x0805bdc0 in PyObject_Call ()
#29 0x0808ffce in slot_tp_init ()
#30 0x08088b3a in type_call ()
#31 0x0805bdc0 in PyObject_Call ()
#32 0x080b0f05 in PyEval_EvalFrame ()
#33 0x080b5eb2 in PyEval_EvalCodeEx ()
#34 0x080fce92 in function_call ()
#35 0x0805bdc0 in PyObject_Call ()
#36 0x080641e5 in instancemethod_call ()
#37 0x0805bdc0 in PyObject_Call ()
#38 0x0808ffce in slot_tp_init ()
#39 0x08088b3a in type_call ()
#40 0x0805bdc0 in PyObject_Call ()
#41 0x080b0f05 in PyEval_EvalFrame ()
#42 0x080b5734 in PyEval_EvalFrame ()
#43 0x080b5eb2 in PyEval_EvalCodeEx ()
#44 0x080fce92 in function_call ()
#45 0x0805bdc0 in PyObject_Call ()
#46 0x080641e5 in instancemethod_call ()
#47 0x0805bdc0 in PyObject_Call ()
#48 0x0808ffce in slot_tp_init ()
#49 0x08088b3a in type_call ()
#50 0x0805bdc0 in PyObject_Call ()
#51 0x080b0f05 in PyEval_EvalFrame ()
#52 0x080b5eb2 in PyEval_EvalCodeEx ()
#53 0x080fce92 in function_call ()
#54 0x0805bdc0 in PyObject_Call ()
#55 0x080b075f in PyEval_EvalFrame ()
#56 0x080b5734 in PyEval_EvalFrame ()
#57 0x080b5734 in PyEval_EvalFrame ()
#58 0x080b5734 in PyEval_EvalFrame ()
#59 0x080b5eb2 in PyEval_EvalCodeEx ()
#60 0x080b3c83 in PyEval_EvalFrame ()
#61 0x080b5734 in PyEval_EvalFrame ()
#62 0x080b5734 in PyEval_EvalFrame ()
#63 0x080b5eb2 in PyEval_EvalCodeEx ()
#64 0x080b3c83 in PyEval_EvalFrame ()
#65 0x080b5eb2 in PyEval_EvalCodeEx ()
#66 0x080b3c83 in PyEval_EvalFrame ()
#67 0x080b5eb2 in PyEval_EvalCodeEx ()
#68 0x080b3c83 in PyEval_EvalFrame ()
#69 0x080b5734 in PyEval_EvalFrame ()
#70 0x080b5734 in PyEval_EvalFrame ()
#71 0x080b5734 in PyEval_EvalFrame ()
#72 0x080b5734 in PyEval_EvalFrame ()
#73 0x080b5734 in PyEval_EvalFrame ()
#74 0x080b5eb2 in PyEval_EvalCodeEx ()
#75 0x080fce92 in function_call ()
#76 0x0805bdc0 in PyObject_Call ()
#77 0x080b075f in PyEval_EvalFrame ()
#78 0x080b5eb2 in PyEval_EvalCodeEx ()
#79 0x080b3c83 in PyEval_EvalFrame ()
#80 0x080b5eb2 in PyEval_EvalCodeEx ()
#81 0x080b3c83 in PyEval_EvalFrame ()
#82 0x080b5eb2 in PyEval_EvalCodeEx ()
#83 0x080b3c83 in PyEval_EvalFrame ()
#84 0x080b5734 in PyEval_EvalFrame ()
#85 0x080b5734 in PyEval_EvalFrame ()
#86 0x080b5eb2 in PyEval_EvalCodeEx ()
#87 0x080b601a in PyEval_EvalCode ()
#88 0x080d9ff4 in PyRun_FileExFlags ()
#89 0x080da8d6 in PyRun_SimpleFileExFlags ()
#90 0x08055617 in Py_Main ()
#91 0x08054e3f in main ()
msg56805 - (view) Author: Guido van Rossum (gvanrossum) Date: 2007-10-26 18:00
I don't think we can prevent GC from occurring between fork and exec --
it's legal to just call os.fork() and execute Python code in the
subprocess forever.  I think the right solution might be to ignore
errors in file_close().  Can you try to whip up a patch for that and
test it?
msg56966 - (view) Author: Jonathan Amsterdam (jba) Date: 2007-10-30 17:52
It's an honor to hear from you, BDFL.

I'm just a Python user, not a developer, so the time it would take me
to check out, compile, test, etc. would dwarf the change itself
(removing two lines from file_dealloc).

There is another solution, following Java: file_dealloc could raise an
exception. Then the error would appear in normal contexts, but
(assuming the GC ignores exceptions in finalization) would be ignored
in GC.

There is also a simpler, less invasive change you could consider:
disabling GC in the child inside subprocess.Popen.

On 10/26/07, Guido van Rossum <report@bugs.python.org> wrote:
>
> Guido van Rossum added the comment:
>
> I don't think we can prevent GC from occurring between fork and exec --
> it's legal to just call os.fork() and execute Python code in the
> subprocess forever.  I think the right solution might be to ignore
> errors in file_close().  Can you try to whip up a patch for that and
> test it?
>
> ----------
> nosy: +gvanrossum
>
> __________________________________
> Tracker <report@bugs.python.org>
> <http://bugs.python.org/issue1336>
> __________________________________
>
msg56976 - (view) Author: Guido van Rossum (gvanrossum) Date: 2007-10-30 18:43
In Python, you can't raise an exception from file_dealloc. (Note how
it is a void function -- there's no way to return an error status.)

Disabling gc however makes sense. I'll think about that. Since you
seem to have the only reproducible occurrence of this bug, try this:
right after the "# Child" comment add a call to gc.disable(). You
should import gc at the top of the module with the rest of the
imports.
msg58222 - (view) Author: Jonathan Amsterdam (jba) Date: 2007-12-05 17:34
Sorry for the long delay -- I had to put the patch into production to
get the volume of work needed to reveal the problem. There has been no
occurrence for about two weeks now, so I am declaring victory.
(Previously it would happen about once a day.) Attached is
subprocess_nogc.py, which is subprocess.py with exactly the changes
you suggested.

On Oct 30, 2007 1:43 PM, Guido van Rossum <report@bugs.python.org> wrote:
>
> Guido van Rossum added the comment:
>
> In Python, you can't raise an exception from file_dealloc. (Note how
> it is a void function -- there's no way to return an error status.)
>
> Disabling gc however makes sense. I'll think about that. Since you
> seem to have the only reproducible occurrence of this bug, try this:
> right after the "# Child" comment add a call to gc.disable(). You
> should import gc at the top of the module with the rest of the
> imports.
>
>
> __________________________________
> Tracker <report@bugs.python.org>
> <http://bugs.python.org/issue1336>
> __________________________________
>
msg58223 - (view) Author: Guido van Rossum (gvanrossum) Date: 2007-12-05 17:39
I'm sorry to ask you to do more work, but could you do me a favor and
send this in the form of a svn diff?  That file has evolved quite a bit
and it's unclear what version you used as a baseline.
msg58239 - (view) Author: Thomas Herve (therve) Date: 2007-12-06 10:42
FWIW, we encountered roughly the same problem in Twisted. However, we
fixed it by disabling gc *before* fork, because the gc can occur just
after the fork. See http://twistedmatrix.com/trac/ticket/2483 for the
bug report, and http://twistedmatrix.com/trac/changeset/21686 for the
solution.
msg60010 - (view) Author: Gregory P. Smith (gregory.p.smith) Date: 2008-01-17 00:22
The uploaded file appears to be derives from Python 2.4.1 (tags/r241).

i've uploaded an actual diff between the two.

regardless, Thomas's point about disabling it -before- the fork() is true.
msg60011 - (view) Author: Gregory P. Smith (gregory.p.smith) Date: 2008-01-17 00:51
I've just attached a diff against trunk that I believe fixes this (based
off of jba's patch).  Unit tests pass but that doesn't say a whole lot
given that this is a hard to reproduce timing/scheduling related bug.

Would someone please code review it?
msg60234 - (view) Author: Gregory P. Smith (gregory.p.smith) Date: 2008-01-19 21:05
fixed in trunk r60104.  i'm backporting it to 2.5 and 2.4.
msg60238 - (view) Author: Gregory P. Smith (gregory.p.smith) Date: 2008-01-19 22:25
r60111 undid part of r60104 that added an unnecessary else and
indentation.  svn diff -r60103:60111 of Lib/subprocess.py is a nice
clean patch.
msg60239 - (view) Author: Gregory P. Smith (gregory.p.smith) Date: 2008-01-19 22:35
r60113 in release25-maint, r60114 in release24-maint.
msg61477 - (view) Author: Guido van Rossum (gvanrossum) Date: 2008-01-22 02:00
Was this a security fix? We aren't supposed to backport non-security
fixes to 2.4, and Martin is actually planning to *reverse* the
non-security patches applied to 2.4 since 2.4.4 was released, to make it
easier to release a "pure security" source tree.
msg61480 - (view) Author: Gregory P. Smith (gregory.p.smith) Date: 2008-01-22 02:22
oh, whoops.  not a security issue.  should i undo it or let Martin?
msg61481 - (view) Author: Guido van Rossum (gvanrossum) Date: 2008-01-22 03:04
Martin can do it all in one fell sweep.
History
Date User Action Args
2008-01-22 03:04:22gvanrossumsetnosy: + loewis
messages: + msg61481
2008-01-22 02:22:44gregory.p.smithsetmessages: + msg61480
2008-01-22 02:00:08gvanrossumsetmessages: + msg61477
2008-01-19 22:35:26gregory.p.smithsetstatus: open -> closed
messages: + msg60239
2008-01-19 22:25:15gregory.p.smithsetmessages: + msg60238
2008-01-19 21:05:12gregory.p.smithsetresolution: fixed
messages: + msg60234
versions: + Python 2.6, Python 2.5
2008-01-17 00:51:01gregory.p.smithsetfiles: + diff-subprocess-trunk-1336-01.txt
assignee: gregory.p.smith
messages: + msg60011
2008-01-17 00:22:50gregory.p.smithsetfiles: + diff-against-subprocess-2.4.1.txt
messages: + msg60010
2008-01-16 23:43:54gregory.p.smithsetnosy: + gregory.p.smith
2007-12-06 10:42:22thervesetnosy: + therve
messages: + msg58239
2007-12-05 17:39:34gvanrossumsetkeywords: + patch
messages: + msg58223
2007-12-05 17:34:20jbasetfiles: + subprocess_nogc.py
messages: + msg58222
2007-10-30 18:43:34gvanrossumsetmessages: + msg56976
2007-10-30 17:52:43jbasetmessages: + msg56966
2007-10-27 23:53:49gagenellinasetnosy: + gagenellina
2007-10-26 18:00:47gvanrossumsetnosy: + gvanrossum
messages: + msg56805
2007-10-26 14:04:12jbacreate