classification
Title: Doc: subprocess wait() may lead to dead lock
Type: behavior
Stage:
Components: Documentation
Versions: Python 3.0, Python 2.6

process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To: gregory.p.smith
Nosy List: christian.heimes, draghuram, gregory.p.smith, gvanrossum
Priority: normal
Keywords:

Created on 2007-12-13 02:04 by christian.heimes, last changed 2008-08-04 01:04 by gregory.p.smith. This issue is now closed.

Messages (9)
msg58514 - Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2007-12-13 02:04
The subprocess docs need a warning that code like

p = subprocess.Popen(..., stdout=subprocess.PIPE)
p.wait()
p.stdout.read()

can block indefinitely if the program fills the stdout pipe buffer. The
docs need an example of how to do it right, but I don't know the best way
to solve the problem.
msg58516 - Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2007-12-13 02:13
Why not simply reverse the wait() and read() calls?
msg58518 - Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2007-12-13 02:25
Guido van Rossum wrote:
> Why not simply reverse the wait() and read() calls?

I don't think it is sufficient if the user uses more than one pipe. The
subprocess.communicate() and _communicate() methods are using threads
(Windows) or select (Unix) when multiple pipes for stdin, stdout and
stderr are involved.

The only safe way with multiple pipes is communicate([input]) unless the
process returns *lots* of data. The subprocess module is buffering the
data in memory if the user uses PIPE or STDIN.
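For the "lots of data" case, one possible workaround (a sketch, not something proposed in this thread) is to point the child's stdout at a file instead of a pipe, so that nothing is buffered in memory and wait() cannot stall on a full pipe. The child command below is a hypothetical stand-in that just prints 5 MB.

```python
import os
import subprocess
import sys
import tempfile

# Hypothetical child: writes 5 MB to stdout.
child = [sys.executable, "-c", "import sys; sys.stdout.write('x' * 5000000)"]

with tempfile.TemporaryFile() as sink:
    # stdout goes straight to the file, so there is no pipe buffer to fill.
    p = subprocess.Popen(child, stdout=sink)
    returncode = p.wait()  # safe here: the child never blocks on a pipe
    size = os.fstat(sink.fileno()).st_size

print(returncode, size)
```

The output can then be processed from the file in chunks rather than held in one large string.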

The subprocess module also contains a XXX comment in the unix version of
_communicate()

   # XXX Rewrite these to use non-blocking I/O on the
   # file objects; they are no longer using C stdio!

Should I create another bug entry for it?

Christian
msg58538 - Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2007-12-13 17:49
> > Why not simply reverse the wait() and read() calls?
>
> I don't think it is sufficient if the user uses more than one pipe. The
> subprocess.communicate() and _communicate() methods are using threads
> (Windows) or select (Unix) when multiple pipes for stdin, stdout and
> stderr are involved.

That is done precisely to *avoid* blocking. I believe the only reason
your example blocks is because you wait before reading -- you should
do it the other way around, do all I/O first and *then* wait for the
process to exit.
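
A minimal sketch of that ordering with a single stdout pipe, reading to EOF first and waiting afterwards. The child command is a hypothetical stand-in that writes about 1 MB, which is more than a typical pipe buffer holds.

```python
import subprocess
import sys

# Hypothetical child: writes 1,000,000 bytes to stdout.
child = [sys.executable, "-c", "import sys; sys.stdout.write('x' * 1000000)"]

p = subprocess.Popen(child, stdout=subprocess.PIPE)
data = p.stdout.read()  # drain the pipe until the child closes it (EOF)
p.stdout.close()
returncode = p.wait()   # only now collect the exit status

print(len(data), returncode)
```

Calling p.wait() before the read() would risk the hang described in the original report, because the child could block on a full pipe and never exit.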

> The only safe way with multiple pipes is communicate([input]) unless the
> process returns *lots* of data. The subprocess module is buffering the
> data in memory if the user uses PIPE or STDIN.

I disagree. I don't believe it will block unless you make the mistake
of waiting for the process first.

> The subprocess module also contains a XXX comment in the unix version of
> _communicate()
>
>    # XXX Rewrite these to use non-blocking I/O on the
>    # file objects; they are no longer using C stdio!
>
> Should I create another bug entry for it?

No, we have too many bug entries already.
msg58589 - Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2007-12-13 21:14
Guido van Rossum wrote:
> That is done precisely to *avoid* blocking. I believe the only reason
> your example blocks is because you wait before reading -- you should
> do it the other way around, do all I/O first and *then* wait for the
> process to exit.

I believe so, too. The subprocess docs aren't warning about the problem.
I've seen a fair share of programmers who fall for the trap - including
me a few weeks ago.

> I disagree. I don't believe it will block unless you make the mistake
> of waiting for the process first.

Consider yet another example

>>> p = Popen(someprogram, stdin=PIPE, stdout=PIPE)
>>> p.stdin.write(payload)  # payload: 10MB of data

someprogram processes the incoming data in small blocks. Let's say it uses
1KB blocks and the stdin and stdout buffers hold 1MB each. It reads 1KB from
stdin and writes 1KB to stdout until the stdout buffer is full. The program
then stops and waits for Python to free the stdout buffer. However, the
Python code is still writing data to the limited stdin buffer, so it never
gets to the read:

>>> data = p.stdout.read()

Is the scenario realistic?

I tried it.

*** This works although it is slow
$ cat img_0948.jpg | convert - png:- >test

*** This example does not work. The test file is created but no data is
written to the file.

p = subprocess.Popen(["convert", "-",  "png:-"],
                     stdin=subprocess.PIPE, stdout=subprocess.PIPE)

img = open("img_0948.jpg", "rb")
p.stdin.write(img.read())
with open("test", "wb") as f:
    f.write(p.stdout.read())

*** It works with communicate:
with open("test", "wb") as f:
    out, err = p.communicate(img.read())
    f.write(out)

Christian
msg58591 - Author: Raghuram Devarakonda (draghuram) (Python triager) Date: 2007-12-13 21:32
See #1256 for a similar report. A doc change was suggested there as well.
msg58594 - Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2007-12-13 22:03
> I believe so, too. The subprocess docs aren't warning about the problem.
> I've seen a fair share of programmers who fall for the trap - including
> me a few weeks ago.

Yes, the docs should definitely address this.

> Consider yet another example
>
> >>> p = Popen(someprogram, stdin=PIPE, stdout=PIPE)
> >>> p.stdin.write(10MB of data)
>
> someprogram processes the incoming data in small blocks. Let's say 1KB
> and 1MB stdin and stdout buffer. It reads 1KB from stdin and writes 1KB
> to stdout until the stdout buffer is full. The program stops and waits
> for Python to free the stdout buffer. However the python code is
> still writing data to the limited stdin buffer.

Hm. I thought this would be handled using threads or select but it
doesn't seem to be quite the case. communicate() does the right thing
but if you use p.stdin.write() directly you may indeed hang.
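
To illustrate the difference, here is a sketch of two ways to feed a large input without hanging. The child is a hypothetical stand-in for someprogram that echoes stdin to stdout in 1KB blocks. The helper name feed() and the threading approach in the second variant are assumptions for illustration, not what the subprocess module necessarily does internally.

```python
import subprocess
import sys
import threading

# Hypothetical child: copies stdin to stdout in 1 KB blocks.
child_code = (
    "import sys\n"
    "while True:\n"
    "    block = sys.stdin.buffer.read(1024)\n"
    "    if not block:\n"
    "        break\n"
    "    sys.stdout.buffer.write(block)\n"
)
cmd = [sys.executable, "-c", child_code]
payload = b"x" * (10 * 1024 * 1024)  # 10 MB of input

# Variant 1: communicate() writes stdin and reads stdout concurrently.
p = subprocess.Popen(cmd, stdin=subprocess.PIPE, stdout=subprocess.PIPE)
out, _ = p.communicate(payload)

# Variant 2: write from a helper thread while the main thread reads.
def feed(proc, data):
    proc.stdin.write(data)
    proc.stdin.close()  # signal EOF so the child can finish

p2 = subprocess.Popen(cmd, stdin=subprocess.PIPE, stdout=subprocess.PIPE)
writer = threading.Thread(target=feed, args=(p2, payload))
writer.start()
out2 = p2.stdout.read()  # drain stdout while the writer thread feeds stdin
writer.join()
p2.stdout.close()
p2.wait()

print(len(out), len(out2))
```

Both variants keep stdout drained while stdin is being written, which is what a plain p.stdin.write() followed by p.stdout.read() fails to do.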
msg69396 - Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2008-07-07 20:47
I'll come up with something for the documentation on this.
msg70674 - Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2008-08-04 01:04
See the documentation update in trunk r65469.  It adds warnings about
both common pipe related pitfalls discussed in this bug.
History
Date                 User              Action  Args
2008-08-04 01:04:48  gregory.p.smith   set     status: open -> closed
                                               resolution: accepted -> fixed
                                               messages: + msg70674
2008-07-07 20:47:39  gregory.p.smith   set     nosy: + gregory.p.smith
                                               messages: + msg69396
                                               resolution: accepted
                                               assignee: gregory.p.smith
                                               type: behavior
2008-01-06 14:12:46  christian.heimes  link    issue1256 superseder
2007-12-13 22:03:36  gvanrossum        set     messages: + msg58594
2007-12-13 21:32:13  draghuram         set     nosy: + draghuram
                                               messages: + msg58591
2007-12-13 21:14:17  christian.heimes  set     messages: + msg58589
2007-12-13 17:49:53  gvanrossum        set     messages: + msg58538
2007-12-13 02:25:02  christian.heimes  set     messages: + msg58518
2007-12-13 02:13:04  gvanrossum        set     nosy: + gvanrossum
                                               messages: + msg58516
2007-12-13 02:04:38  christian.heimes  create