This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: subprocess.Popen hangs when child writes to stderr
Type: crash Stage:
Components: Library (Lib) Versions: Python 2.4, Python 2.6, Python 2.5
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: gregory.p.smith Nosy List: ggenellina, gregory.p.smith, gvanrossum, ita1024, jba, jcea, linxiaojun, loewis, therve, torsten
Priority: normal Keywords: patch

Created on 2007-10-26 14:04 by jba, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
subprocess_nogc.py jba, 2007-12-05 17:34
diff-against-subprocess-2.4.1.txt gregory.p.smith, 2008-01-17 00:22 (using python 2.4.1 sources) diff --unified=4 subprocess.py subprocess_nogc.py
diff-subprocess-trunk-1336-01.txt gregory.p.smith, 2008-01-17 00:51 updated patch against trunk
subprocess_demo.py torsten, 2014-06-17 06:53
Messages (20)
msg56790 - (view) Author: Jonathan Amsterdam (jba) Date: 2007-10-26 14:04
This is under Linux (2.6).

I occasionally see subprocess.Popen() fail to return, and I have
finally figured out roughly what's going on. It involves the GC and
stderr.

1. os.fork()

2. Parent blocks reading from errpipe_read (subprocess.py:982)

3. In child, a GC occurs before the exec.

4. The GC attempts to free a file descriptor, calling file_dealloc.

5. That function gets an error closing the descriptor ("close failed:
[Errno 9] Bad file descriptor\n," is the string I extracted via gdb).

6. It attempts to write the error to stderr and blocks. Since it never
execs or writes to errpipe_write, both child and parent are hung.

Here is the pstack output on the child:

#0  0x006587a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
#1  0x0072f11b in __write_nocancel () from /lib/tls/libc.so.6
#2  0x006d409f in _IO_new_file_write () from /lib/tls/libc.so.6
#3  0x006d42ec in _IO_new_file_xsputn () from /lib/tls/libc.so.6
#4  0x006afce9 in buffered_vfprintf () from /lib/tls/libc.so.6
#5  0x006afe8b in vfprintf () from /lib/tls/libc.so.6
#6  0x080dd4af in mywrite ()
#7  0x080dd505 in PySys_WriteStderr ()
#8  0x08064cd0 in file_dealloc ()
#9  0x08079c88 in dict_dealloc ()
#10 0x0808931d in subtype_dealloc ()
#11 0x08079af3 in PyDict_Clear ()
#12 0x0807bb6a in dict_tp_clear ()
#13 0x080e0ead in collect ()
#14 0x080e1912 in _PyObject_GC_New ()
#15 0x0805e50b in PyInstance_NewRaw ()
#16 0x0805e5d7 in PyInstance_New ()
#17 0x0805bdc0 in PyObject_Call ()
#18 0x080aecef in PyEval_CallObjectWithKeywords ()
#19 0x080ca975 in PyErr_NormalizeException ()
#20 0x080b492c in PyEval_EvalFrame ()
#21 0x080b5eb2 in PyEval_EvalCodeEx ()
#22 0x080b3c83 in PyEval_EvalFrame ()
#23 0x080b5734 in PyEval_EvalFrame ()
#24 0x080b5eb2 in PyEval_EvalCodeEx ()
#25 0x080fce92 in function_call ()
#26 0x0805bdc0 in PyObject_Call ()
#27 0x080641e5 in instancemethod_call ()
#28 0x0805bdc0 in PyObject_Call ()
#29 0x0808ffce in slot_tp_init ()
#30 0x08088b3a in type_call ()
#31 0x0805bdc0 in PyObject_Call ()
#32 0x080b0f05 in PyEval_EvalFrame ()
#33 0x080b5eb2 in PyEval_EvalCodeEx ()
#34 0x080fce92 in function_call ()
#35 0x0805bdc0 in PyObject_Call ()
#36 0x080641e5 in instancemethod_call ()
#37 0x0805bdc0 in PyObject_Call ()
#38 0x0808ffce in slot_tp_init ()
#39 0x08088b3a in type_call ()
#40 0x0805bdc0 in PyObject_Call ()
#41 0x080b0f05 in PyEval_EvalFrame ()
#42 0x080b5734 in PyEval_EvalFrame ()
#43 0x080b5eb2 in PyEval_EvalCodeEx ()
#44 0x080fce92 in function_call ()
#45 0x0805bdc0 in PyObject_Call ()
#46 0x080641e5 in instancemethod_call ()
#47 0x0805bdc0 in PyObject_Call ()
#48 0x0808ffce in slot_tp_init ()
#49 0x08088b3a in type_call ()
#50 0x0805bdc0 in PyObject_Call ()
#51 0x080b0f05 in PyEval_EvalFrame ()
#52 0x080b5eb2 in PyEval_EvalCodeEx ()
#53 0x080fce92 in function_call ()
#54 0x0805bdc0 in PyObject_Call ()
#55 0x080b075f in PyEval_EvalFrame ()
#56 0x080b5734 in PyEval_EvalFrame ()
#57 0x080b5734 in PyEval_EvalFrame ()
#58 0x080b5734 in PyEval_EvalFrame ()
#59 0x080b5eb2 in PyEval_EvalCodeEx ()
#60 0x080b3c83 in PyEval_EvalFrame ()
#61 0x080b5734 in PyEval_EvalFrame ()
#62 0x080b5734 in PyEval_EvalFrame ()
#63 0x080b5eb2 in PyEval_EvalCodeEx ()
#64 0x080b3c83 in PyEval_EvalFrame ()
#65 0x080b5eb2 in PyEval_EvalCodeEx ()
#66 0x080b3c83 in PyEval_EvalFrame ()
#67 0x080b5eb2 in PyEval_EvalCodeEx ()
#68 0x080b3c83 in PyEval_EvalFrame ()
#69 0x080b5734 in PyEval_EvalFrame ()
#70 0x080b5734 in PyEval_EvalFrame ()
#71 0x080b5734 in PyEval_EvalFrame ()
#72 0x080b5734 in PyEval_EvalFrame ()
#73 0x080b5734 in PyEval_EvalFrame ()
#74 0x080b5eb2 in PyEval_EvalCodeEx ()
#75 0x080fce92 in function_call ()
#76 0x0805bdc0 in PyObject_Call ()
#77 0x080b075f in PyEval_EvalFrame ()
#78 0x080b5eb2 in PyEval_EvalCodeEx ()
#79 0x080b3c83 in PyEval_EvalFrame ()
#80 0x080b5eb2 in PyEval_EvalCodeEx ()
#81 0x080b3c83 in PyEval_EvalFrame ()
#82 0x080b5eb2 in PyEval_EvalCodeEx ()
#83 0x080b3c83 in PyEval_EvalFrame ()
#84 0x080b5734 in PyEval_EvalFrame ()
#85 0x080b5734 in PyEval_EvalFrame ()
#86 0x080b5eb2 in PyEval_EvalCodeEx ()
#87 0x080b601a in PyEval_EvalCode ()
#88 0x080d9ff4 in PyRun_FileExFlags ()
#89 0x080da8d6 in PyRun_SimpleFileExFlags ()
#90 0x08055617 in Py_Main ()
#91 0x08054e3f in main ()
msg56805 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2007-10-26 18:00
I don't think we can prevent GC from occurring between fork and exec --
it's legal to just call os.fork() and execute Python code in the
subprocess forever.  I think the right solution might be to ignore
errors in file_close().  Can you try to whip up a patch for that and
test it?
msg56966 - (view) Author: Jonathan Amsterdam (jba) Date: 2007-10-30 17:52
It's an honor to hear from you, BDFL.

I'm just a Python user, not a developer, so the time it would take me
to check out, compile, test, etc. would dwarf the change itself
(removing two lines from file_dealloc).

There is another solution, following Java: file_dealloc could raise an
exception. Then the error would appear in normal contexts, but
(assuming the GC ignores exceptions in finalization) would be ignored
in GC.

There is also a simpler, less invasive change you could consider:
disabling GC in the child inside subprocess.Popen.

On 10/26/07, Guido van Rossum <report@bugs.python.org> wrote:
>
> Guido van Rossum added the comment:
>
> I don't think we can prevent GC from occurring between fork and exec --
> it's legal to just call os.fork() and execute Python code in the
> subprocess forever.  I think the right solution might be to ignore
> errors in file_close().  Can you try to whip up a patch for that and
> test it?
>
> ----------
> nosy: +gvanrossum
>
> __________________________________
> Tracker <report@bugs.python.org>
> <http://bugs.python.org/issue1336>
> __________________________________
>
msg56976 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2007-10-30 18:43
In Python, you can't raise an exception from file_dealloc. (Note how
it is a void function -- there's no way to return an error status.)

Disabling gc however makes sense. I'll think about that. Since you
seem to have the only reproducible occurrence of this bug, try this:
right after the "# Child" comment add a call to gc.disable(). You
should import gc at the top of the module with the rest of the
imports.
msg58222 - (view) Author: Jonathan Amsterdam (jba) Date: 2007-12-05 17:34
Sorry for the long delay -- I had to put the patch into production to
get the volume of work needed to reveal the problem. There has been no
occurrence for about two weeks now, so I am declaring victory.
(Previously it would happen about once a day.) Attached is
subprocess_nogc.py, which is subprocess.py with exactly the changes
you suggested.

On Oct 30, 2007 1:43 PM, Guido van Rossum <report@bugs.python.org> wrote:
>
> Guido van Rossum added the comment:
>
> In Python, you can't raise an exception from file_dealloc. (Note how
> it is a void function -- there's no way to return an error status.)
>
> Disabling gc however makes sense. I'll think about that. Since you
> seem to have the only reproducible occurrence of this bug, try this:
> right after the "# Child" comment add a call to gc.disable(). You
> should import gc at the top of the module with the rest of the
> imports.
>
>
> __________________________________
> Tracker <report@bugs.python.org>
> <http://bugs.python.org/issue1336>
> __________________________________
>
msg58223 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2007-12-05 17:39
I'm sorry to ask you to do more work, but could you do me a favor and
send this in the form of a svn diff?  That file has evolved quite a bit
and it's unclear what version you used as a baseline.
msg58239 - (view) Author: Thomas Herve (therve) * Date: 2007-12-06 10:42
FWIW, we encountered roughly the same problem in Twisted. However, we
fixed it by disabling gc *before* fork, because the gc can occur just
after the fork. See http://twistedmatrix.com/trac/ticket/2483 for the
bug report, and http://twistedmatrix.com/trac/changeset/21686 for the
solution.
msg60010 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2008-01-17 00:22
The uploaded file appears to be derives from Python 2.4.1 (tags/r241).

i've uploaded an actual diff between the two.

regardless, Thomas's point about disabling it -before- the fork() is true.
msg60011 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2008-01-17 00:51
I've just attached a diff against trunk that I believe fixes this (based
off of jba's patch).  Unit tests pass but that doesn't say a whole lot
given that this is a hard to reproduce timing/scheduling related bug.

Would someone please code review it?
msg60234 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2008-01-19 21:05
fixed in trunk r60104.  i'm backporting it to 2.5 and 2.4.
msg60238 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2008-01-19 22:25
r60111 undid part of r60104 that added an unnecessary else and
indentation.  svn diff -r60103:60111 of Lib/subprocess.py is a nice
clean patch.
msg60239 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2008-01-19 22:35
r60113 in release25-maint, r60114 in release24-maint.
msg61477 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2008-01-22 02:00
Was this a security fix? We aren't supposed to backport non-security
fixes to 2.4, and Martin is actually planning to *reverse* the
non-security patches applied to 2.4 since 2.4.4 was released, to make it
easier to release a "pure security" source tree.
msg61480 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2008-01-22 02:22
oh, whoops.  not a security issue.  should i undo it or let Martin?
msg61481 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2008-01-22 03:04
Martin can do it all in one fell sweep.
msg146685 - (view) Author: (ita1024) Date: 2011-10-31 10:22
The following piece of code does not seem to be thread-safe:

gc.disable()
try:
    os.fork()
except:
    ...

If calling subprocess.Popen() is supposed to work from threads without any particular protection, then a lock is needed in subprocess.py:

try:
   spawn_lock.acquire()
   gc.disable()
   try:
       os.fork()
   except:
       ...
finally:
   spawn_lock.release()

Such issues are very difficult to reproduce and to track. The documentation should at least mention that calling gc.enable() may cause subprocess() to hang!

The following issue seems to be relevant: http://bugs.python.org/issue12739
msg146690 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2011-10-31 10:55
ita1024: please don't post to closed issues; your message here will be ignored.
msg220799 - (view) Author: Torsten Landschoff (torsten) * Date: 2014-06-17 06:53
> ita1024: please don't post to closed issues; your message here will be ignored.

Bugs for Python 2 will be ignored anyway so what can you do? I am currently fighting with the effects of using threads, subprocess.Popen and sqlite in Python2 and found this bug report.

Among other issues (like a hang for still unknown reasons when calling subprocess.Popen) I have also seen gc getting disabled for a long running process which has multiple threads and starts processes via subprocess.Popen from different threads.

The attached example subprocess_demo.py reproduces disabling GC for me in Python 2.7.6. I checked with Python 3.4 and it is fixed there. Looking at the source, Python 3 implements subprocess.Popen correctly by doing fork + execve at the C side, unless the caller really wants to run a preexec_fn and is prepared for the failure.

Another case for using Python 3...
msg276976 - (view) Author: Torsten Landschoff (torsten) * Date: 2016-09-19 13:07
Just got bitten by this problem again.

If anybody else still runs into this, the solution on python2.7 is to install the subprocess32 module from pypi.python.org and use that instead of subprocess.
msg404292 - (view) Author: EricLin (linxiaojun) Date: 2021-10-19 12:27
@torsten
Your comment here did a great help to me. I observed a long running python 2.7 process leaking memory. I did a lot of digging and eventually found that gc was disabled in this process.
But who disabled it? I really have no idea.
Luckily, I find this comment. The scenario of my process is exactly the same as you said, threading + subprocess. Finally...what a hard trip here.
Thanks a lot!
History
Date User Action Args
2022-04-11 14:56:27adminsetgithub: 45677
2021-10-19 18:49:22vstinnersetnosy: - vstinner
2021-10-19 12:27:07linxiaojunsetnosy: + linxiaojun
messages: + msg404292
2016-09-19 13:07:24torstensetmessages: + msg276976
2014-06-17 06:53:53torstensetfiles: + subprocess_demo.py
nosy: + torsten
messages: + msg220799

2012-10-26 00:04:45jceasetnosy: + jcea
2011-10-31 10:55:35loewissetmessages: + msg146690
2011-10-31 10:41:06vstinnersetnosy: + vstinner
2011-10-31 10:22:43ita1024setnosy: + ita1024
messages: + msg146685
2008-01-22 03:04:22gvanrossumsetnosy: + loewis
messages: + msg61481
2008-01-22 02:22:44gregory.p.smithsetmessages: + msg61480
2008-01-22 02:00:08gvanrossumsetmessages: + msg61477
2008-01-19 22:35:26gregory.p.smithsetstatus: open -> closed
messages: + msg60239
2008-01-19 22:25:15gregory.p.smithsetmessages: + msg60238
2008-01-19 21:05:12gregory.p.smithsetresolution: fixed
messages: + msg60234
versions: + Python 2.6, Python 2.5
2008-01-17 00:51:01gregory.p.smithsetfiles: + diff-subprocess-trunk-1336-01.txt
assignee: gregory.p.smith
messages: + msg60011
2008-01-17 00:22:50gregory.p.smithsetfiles: + diff-against-subprocess-2.4.1.txt
messages: + msg60010
2008-01-16 23:43:54gregory.p.smithsetnosy: + gregory.p.smith
2007-12-06 10:42:22thervesetnosy: + therve
messages: + msg58239
2007-12-05 17:39:34gvanrossumsetkeywords: + patch
messages: + msg58223
2007-12-05 17:34:20jbasetfiles: + subprocess_nogc.py
messages: + msg58222
2007-10-30 18:43:34gvanrossumsetmessages: + msg56976
2007-10-30 17:52:43jbasetmessages: + msg56966
2007-10-27 23:53:49ggenellinasetnosy: + ggenellina
2007-10-26 18:00:47gvanrossumsetnosy: + gvanrossum
messages: + msg56805
2007-10-26 14:04:12jbacreate