Issue1336
Created on 2007-10-26 14:04 by jba, last changed 2008-01-22 03:04 by gvanrossum.
|
msg56790 - (view) |
Author: Jonathan Amsterdam (jba) |
Date: 2007-10-26 14:04 |
|
This is under Linux (2.6).
I occasionally see subprocess.Popen() fail to return, and I have
finally figured out roughly what's going on. It involves the GC and
stderr.
1. os.fork()
2. Parent blocks reading from errpipe_read (subprocess.py:982)
3. In child, a GC occurs before the exec.
4. The GC attempts to free a file descriptor, calling file_dealloc.
5. That function gets an error closing the descriptor ("close failed:
[Errno 9] Bad file descriptor\n," is the string I extracted via gdb).
6. It attempts to write the error to stderr and blocks. Since it never
execs or writes to errpipe_write, both child and parent are hung.
Here is the pstack output on the child:
#0 0x006587a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
#1 0x0072f11b in __write_nocancel () from /lib/tls/libc.so.6
#2 0x006d409f in _IO_new_file_write () from /lib/tls/libc.so.6
#3 0x006d42ec in _IO_new_file_xsputn () from /lib/tls/libc.so.6
#4 0x006afce9 in buffered_vfprintf () from /lib/tls/libc.so.6
#5 0x006afe8b in vfprintf () from /lib/tls/libc.so.6
#6 0x080dd4af in mywrite ()
#7 0x080dd505 in PySys_WriteStderr ()
#8 0x08064cd0 in file_dealloc ()
#9 0x08079c88 in dict_dealloc ()
#10 0x0808931d in subtype_dealloc ()
#11 0x08079af3 in PyDict_Clear ()
#12 0x0807bb6a in dict_tp_clear ()
#13 0x080e0ead in collect ()
#14 0x080e1912 in _PyObject_GC_New ()
#15 0x0805e50b in PyInstance_NewRaw ()
#16 0x0805e5d7 in PyInstance_New ()
#17 0x0805bdc0 in PyObject_Call ()
#18 0x080aecef in PyEval_CallObjectWithKeywords ()
#19 0x080ca975 in PyErr_NormalizeException ()
#20 0x080b492c in PyEval_EvalFrame ()
#21 0x080b5eb2 in PyEval_EvalCodeEx ()
#22 0x080b3c83 in PyEval_EvalFrame ()
#23 0x080b5734 in PyEval_EvalFrame ()
#24 0x080b5eb2 in PyEval_EvalCodeEx ()
#25 0x080fce92 in function_call ()
#26 0x0805bdc0 in PyObject_Call ()
#27 0x080641e5 in instancemethod_call ()
#28 0x0805bdc0 in PyObject_Call ()
#29 0x0808ffce in slot_tp_init ()
#30 0x08088b3a in type_call ()
#31 0x0805bdc0 in PyObject_Call ()
#32 0x080b0f05 in PyEval_EvalFrame ()
#33 0x080b5eb2 in PyEval_EvalCodeEx ()
#34 0x080fce92 in function_call ()
#35 0x0805bdc0 in PyObject_Call ()
#36 0x080641e5 in instancemethod_call ()
#37 0x0805bdc0 in PyObject_Call ()
#38 0x0808ffce in slot_tp_init ()
#39 0x08088b3a in type_call ()
#40 0x0805bdc0 in PyObject_Call ()
#41 0x080b0f05 in PyEval_EvalFrame ()
#42 0x080b5734 in PyEval_EvalFrame ()
#43 0x080b5eb2 in PyEval_EvalCodeEx ()
#44 0x080fce92 in function_call ()
#45 0x0805bdc0 in PyObject_Call ()
#46 0x080641e5 in instancemethod_call ()
#47 0x0805bdc0 in PyObject_Call ()
#48 0x0808ffce in slot_tp_init ()
#49 0x08088b3a in type_call ()
#50 0x0805bdc0 in PyObject_Call ()
#51 0x080b0f05 in PyEval_EvalFrame ()
#52 0x080b5eb2 in PyEval_EvalCodeEx ()
#53 0x080fce92 in function_call ()
#54 0x0805bdc0 in PyObject_Call ()
#55 0x080b075f in PyEval_EvalFrame ()
#56 0x080b5734 in PyEval_EvalFrame ()
#57 0x080b5734 in PyEval_EvalFrame ()
#58 0x080b5734 in PyEval_EvalFrame ()
#59 0x080b5eb2 in PyEval_EvalCodeEx ()
#60 0x080b3c83 in PyEval_EvalFrame ()
#61 0x080b5734 in PyEval_EvalFrame ()
#62 0x080b5734 in PyEval_EvalFrame ()
#63 0x080b5eb2 in PyEval_EvalCodeEx ()
#64 0x080b3c83 in PyEval_EvalFrame ()
#65 0x080b5eb2 in PyEval_EvalCodeEx ()
#66 0x080b3c83 in PyEval_EvalFrame ()
#67 0x080b5eb2 in PyEval_EvalCodeEx ()
#68 0x080b3c83 in PyEval_EvalFrame ()
#69 0x080b5734 in PyEval_EvalFrame ()
#70 0x080b5734 in PyEval_EvalFrame ()
#71 0x080b5734 in PyEval_EvalFrame ()
#72 0x080b5734 in PyEval_EvalFrame ()
#73 0x080b5734 in PyEval_EvalFrame ()
#74 0x080b5eb2 in PyEval_EvalCodeEx ()
#75 0x080fce92 in function_call ()
#76 0x0805bdc0 in PyObject_Call ()
#77 0x080b075f in PyEval_EvalFrame ()
#78 0x080b5eb2 in PyEval_EvalCodeEx ()
#79 0x080b3c83 in PyEval_EvalFrame ()
#80 0x080b5eb2 in PyEval_EvalCodeEx ()
#81 0x080b3c83 in PyEval_EvalFrame ()
#82 0x080b5eb2 in PyEval_EvalCodeEx ()
#83 0x080b3c83 in PyEval_EvalFrame ()
#84 0x080b5734 in PyEval_EvalFrame ()
#85 0x080b5734 in PyEval_EvalFrame ()
#86 0x080b5eb2 in PyEval_EvalCodeEx ()
#87 0x080b601a in PyEval_EvalCode ()
#88 0x080d9ff4 in PyRun_FileExFlags ()
#89 0x080da8d6 in PyRun_SimpleFileExFlags ()
#90 0x08055617 in Py_Main ()
#91 0x08054e3f in main ()
|
|
msg56805 - (view) |
Author: Guido van Rossum (gvanrossum) |
Date: 2007-10-26 18:00 |
|
I don't think we can prevent GC from occurring between fork and exec --
it's legal to just call os.fork() and execute Python code in the
subprocess forever. I think the right solution might be to ignore
errors in file_close(). Can you try to whip up a patch for that and
test it?
|
|
msg56966 - (view) |
Author: Jonathan Amsterdam (jba) |
Date: 2007-10-30 17:52 |
|
It's an honor to hear from you, BDFL.
I'm just a Python user, not a developer, so the time it would take me
to check out, compile, test, etc. would dwarf the change itself
(removing two lines from file_dealloc).
There is another solution, following Java: file_dealloc could raise an
exception. Then the error would appear in normal contexts, but
(assuming the GC ignores exceptions in finalization) would be ignored
in GC.
There is also a simpler, less invasive change you could consider:
disabling GC in the child inside subprocess.Popen.
On 10/26/07, Guido van Rossum <report@bugs.python.org> wrote:
>
> Guido van Rossum added the comment:
>
> I don't think we can prevent GC from occurring between fork and exec --
> it's legal to just call os.fork() and execute Python code in the
> subprocess forever. I think the right solution might be to ignore
> errors in file_close(). Can you try to whip up a patch for that and
> test it?
>
> ----------
> nosy: +gvanrossum
>
> __________________________________
> Tracker <report@bugs.python.org>
> <http://bugs.python.org/issue1336>
> __________________________________
>
|
|
msg56976 - (view) |
Author: Guido van Rossum (gvanrossum) |
Date: 2007-10-30 18:43 |
|
In Python, you can't raise an exception from file_dealloc. (Note how
it is a void function -- there's no way to return an error status.)
Disabling gc however makes sense. I'll think about that. Since you
seem to have the only reproducible occurrence of this bug, try this:
right after the "# Child" comment add a call to gc.disable(). You
should import gc at the top of the module with the rest of the
imports.
|
|
msg58222 - (view) |
Author: Jonathan Amsterdam (jba) |
Date: 2007-12-05 17:34 |
|
Sorry for the long delay -- I had to put the patch into production to
get the volume of work needed to reveal the problem. There has been no
occurrence for about two weeks now, so I am declaring victory.
(Previously it would happen about once a day.) Attached is
subprocess_nogc.py, which is subprocess.py with exactly the changes
you suggested.
On Oct 30, 2007 1:43 PM, Guido van Rossum <report@bugs.python.org> wrote:
>
> Guido van Rossum added the comment:
>
> In Python, you can't raise an exception from file_dealloc. (Note how
> it is a void function -- there's no way to return an error status.)
>
> Disabling gc however makes sense. I'll think about that. Since you
> seem to have the only reproducible occurrence of this bug, try this:
> right after the "# Child" comment add a call to gc.disable(). You
> should import gc at the top of the module with the rest of the
> imports.
>
>
> __________________________________
> Tracker <report@bugs.python.org>
> <http://bugs.python.org/issue1336>
> __________________________________
>
|
|
msg58223 - (view) |
Author: Guido van Rossum (gvanrossum) |
Date: 2007-12-05 17:39 |
|
I'm sorry to ask you to do more work, but could you do me a favor and
send this in the form of a svn diff? That file has evolved quite a bit
and it's unclear what version you used as a baseline.
|
|
msg58239 - (view) |
Author: Thomas Herve (therve) |
Date: 2007-12-06 10:42 |
|
FWIW, we encountered roughly the same problem in Twisted. However, we
fixed it by disabling gc *before* fork, because the gc can occur just
after the fork. See http://twistedmatrix.com/trac/ticket/2483 for the
bug report, and http://twistedmatrix.com/trac/changeset/21686 for the
solution.
|
|
msg60010 - (view) |
Author: Gregory P. Smith (gregory.p.smith) |
Date: 2008-01-17 00:22 |
|
The uploaded file appears to be derives from Python 2.4.1 (tags/r241).
i've uploaded an actual diff between the two.
regardless, Thomas's point about disabling it -before- the fork() is true.
|
|
msg60011 - (view) |
Author: Gregory P. Smith (gregory.p.smith) |
Date: 2008-01-17 00:51 |
|
I've just attached a diff against trunk that I believe fixes this (based
off of jba's patch). Unit tests pass but that doesn't say a whole lot
given that this is a hard to reproduce timing/scheduling related bug.
Would someone please code review it?
|
|
msg60234 - (view) |
Author: Gregory P. Smith (gregory.p.smith) |
Date: 2008-01-19 21:05 |
|
fixed in trunk r60104. i'm backporting it to 2.5 and 2.4.
|
|
msg60238 - (view) |
Author: Gregory P. Smith (gregory.p.smith) |
Date: 2008-01-19 22:25 |
|
r60111 undid part of r60104 that added an unnecessary else and
indentation. svn diff -r60103:60111 of Lib/subprocess.py is a nice
clean patch.
|
|
msg60239 - (view) |
Author: Gregory P. Smith (gregory.p.smith) |
Date: 2008-01-19 22:35 |
|
r60113 in release25-maint, r60114 in release24-maint.
|
|
msg61477 - (view) |
Author: Guido van Rossum (gvanrossum) |
Date: 2008-01-22 02:00 |
|
Was this a security fix? We aren't supposed to backport non-security
fixes to 2.4, and Martin is actually planning to *reverse* the
non-security patches applied to 2.4 since 2.4.4 was released, to make it
easier to release a "pure security" source tree.
|
|
msg61480 - (view) |
Author: Gregory P. Smith (gregory.p.smith) |
Date: 2008-01-22 02:22 |
|
oh, whoops. not a security issue. should i undo it or let Martin?
|
|
msg61481 - (view) |
Author: Guido van Rossum (gvanrossum) |
Date: 2008-01-22 03:04 |
|
Martin can do it all in one fell sweep.
|
|
| Date |
User |
Action |
Args |
| 2008-01-22 03:04:22 | gvanrossum | set | nosy:
+ loewis messages:
+ msg61481 |
| 2008-01-22 02:22:44 | gregory.p.smith | set | messages:
+ msg61480 |
| 2008-01-22 02:00:08 | gvanrossum | set | messages:
+ msg61477 |
| 2008-01-19 22:35:26 | gregory.p.smith | set | status: open -> closed messages:
+ msg60239 |
| 2008-01-19 22:25:15 | gregory.p.smith | set | messages:
+ msg60238 |
| 2008-01-19 21:05:12 | gregory.p.smith | set | resolution: fixed messages:
+ msg60234 versions:
+ Python 2.6, Python 2.5 |
| 2008-01-17 00:51:01 | gregory.p.smith | set | files:
+ diff-subprocess-trunk-1336-01.txt assignee: gregory.p.smith messages:
+ msg60011 |
| 2008-01-17 00:22:50 | gregory.p.smith | set | files:
+ diff-against-subprocess-2.4.1.txt messages:
+ msg60010 |
| 2008-01-16 23:43:54 | gregory.p.smith | set | nosy:
+ gregory.p.smith |
| 2007-12-06 10:42:22 | therve | set | nosy:
+ therve messages:
+ msg58239 |
| 2007-12-05 17:39:34 | gvanrossum | set | keywords:
+ patch messages:
+ msg58223 |
| 2007-12-05 17:34:20 | jba | set | files:
+ subprocess_nogc.py messages:
+ msg58222 |
| 2007-10-30 18:43:34 | gvanrossum | set | messages:
+ msg56976 |
| 2007-10-30 17:52:43 | jba | set | messages:
+ msg56966 |
| 2007-10-27 23:53:49 | gagenellina | set | nosy:
+ gagenellina |
| 2007-10-26 18:00:47 | gvanrossum | set | nosy:
+ gvanrossum messages:
+ msg56805 |
| 2007-10-26 14:04:12 | jba | create | |
|