classification
Title: subprocess and deadlock avoidance
Type: enhancement Stage: needs patch
Components: Library (Lib) Versions: Python 3.5
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: akira, brandjon, martin.panter, ncoghlan, rosslagerwall, v+python
Priority: normal Keywords:

Created on 2010-11-21 06:16 by v+python, last changed 2019-03-15 21:59 by BreamoreBoy.

Messages (9)
msg121871 - (view) Author: Glenn Linderman (v+python) * Date: 2010-11-21 06:16
.communicate is a nice API for programs that produce little output, and can be buffered.  While that may cover a wide range of uses, it doesn't cover launching CGI programs, such as is done in http.server.  Now there are nice warnings about that issue in the http.server documentation.

However, while .communicate has the building blocks to provide more general solutions, it doesn't expose them to the user, nor does it separate them into building blocks, rather it is a monolith inside ._communicate.

For example, it would be nice to have an API that would "capture a stream using a thread" which could be used for either stdout or stdin, and is what ._communicate does under the covers for both of them.

It would also be nice to have an API that would "pump a bytestream to .stdin as a background thread.  ._communicate doesn't provide that one, but uses the foreground thread for that.  And, it requires that it be fully buffered.  It would be most useful for http.server if this API could connect a file handle and an optional maximum read count to .stdin, yet do it in a background thread.

That would leave the foreground thread able to process stdout.  It is correct (but not what http.server presently does, but I'll be entering that enhancement request soon) for http.server to read the first line from the CGI program, transform it, add a few more headers, and send that to the browser, and then hook up .stdout to the browser (shutil.copyfileobj can be used for the rest of the stream).  However, there is no deadlock free way of achieving this sort of solution, capturing the stderr to be logged, not needing to buffer a potentially large file upload, and transforming the stdout, with the facilities currently provided by subprocess.  Breaking apart some of the existing building blocks, and adding an additional one for .stdin processing would allow a real http.server implementation, as well as being more general for other complex uses.

You see, for http.server, the stdin
msg122264 - (view) Author: Glenn Linderman (v+python) * Date: 2010-11-24 07:56
So I've experimented a bit, and it looks like simply exposing ._readerthread as an external API would handle the buffered case for stdout or stderr.  For http.server CGI scripts, I think it is fine to buffer stderr, as it should not be a high-volume channel... but not both stderr and stdout, as stdout can be huge.  And not stdin, because it can be huge also.

For stdin, something like the following might work nicely for some cases, including http.server (with revisions):

    def _writerthread(self, fhr, fhw, length):
        while length > 0:
            buf = fhr.read( min( 8196, length ))
            fhw.write( buf )
            length -= len( buf )
        fhw.close()

When the stdin data is buffered, but the application wishes to be stdout centric instead of stdin centric (like the current ._communicate code), a variation could be made replacing fhr by a data buffer, and writing it gradually (or fully) to the pipe, but from a secondary thread.

Happily, this sort of code (the above is extracted from a test version of http.server) can be implemented in the server, but would be more usefully provided by subprocess, in my opinion.

To include the above code inside subprocess would just be a matter of tweaking references to class members instead of parameters.
msg123058 - (view) Author: Glenn Linderman (v+python) * Date: 2010-12-02 06:59
Here's an updated _writerthread idea that handles more cases:

    def _writerthread(self, fhr, fhw, length=None):
        if length is None:
            flag = True
            while flag:
                buf = fhr.read( 512 )
                fhw.write( buf )
                if len( buf ) == 0:
                    flag = False
        else:
            while length > 0:
                buf = fhr.read( min( 512, length ))
                fhw.write( buf )
                length -= len( buf )
        # throw away additional data [see bug #427345]
        while select.select([fhr._sock], [], [], 0)[0]:
            if not fhr._sock.recv(1):
                break
        fhw.close()
msg123059 - (view) Author: Glenn Linderman (v+python) * Date: 2010-12-02 07:02
Sorry, left some extraneous code in the last message, here is the right code:

    def _writerthread(self, fhr, fhw, length=None):
        if length is None:
            flag = True
            while flag:
                buf = fhr.read( 512 )
                fhw.write( buf )
                if len( buf ) == 0:
                    flag = False
        else:
            while length > 0:
                buf = fhr.read( min( 512, length ))
                fhw.write( buf )
                length -= len( buf )
        fhw.close()
msg123454 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2010-12-06 04:41
The general idea is sound. My work colleagues have certainly had to implement their own reader/writer thread equivalents to keep subprocess from blocking.

It makes sense to provide more robust public support for such techniques in process itself.
msg123517 - (view) Author: Glenn Linderman (v+python) * Date: 2010-12-07 03:01
Looking at the code the way I've used it in my modified server.py:

            stderr = []
            stderr_thread = threading.Thread(target=self._readerthread,
                                             args=(p.stderr, stderr))
            stderr_thread.daemon = True
            stderr_thread.start()

            self.log_message("writer: %s" % str( nbytes ))
            stdin_thread = threading.Thread(target=self._writerthread,
                                            args=(self.rfile, p.stdin, nbytes))
            stdin_thread.daemon = True
            stdin_thread.start()

and later

            stderr_thread.join()
            stdin_thread.join()

            p.stderr.close()
            p.stdout.close()

            if stderr:
                stderr = stderr[ 0 ].decode("UTF-8")

It seems like this sort of code (possibly passing in the encoding) could be bundled back inside subprocess (I borrowed it from there).

It also seems from recent discussion on npopdev that the cheat-sheet "how to replace" other sys and os popen functions would be better done as wrapper functions for the various cases.  Someone pointed out that the hard cases probably aren't cross-platform, but that currently the easy cases all get harder when using subprocess than when using the deprecated facilities.  They shouldn't.  The names may need to be a bit more verbose to separate the various use cases, but each use case should remain at least as simple as the prior function.

So perhaps instead of just  subprocess.PIPE  to select particular handling for stdin, stdout, and stderr, subprocess should implement some varieties to handle attaching  different types of reader and writer threads to the handles... of course, parameters need to come along for the ride too: maybe the the additional variations would be object references with parameters supplied, instead of just a manifest constant like .PIPE.
msg123521 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2010-12-07 04:01
Or various incarnations of functools.partial applied to subprocess.Popen.
msg222894 - (view) Author: Mark Lawrence (BreamoreBoy) * Date: 2014-07-12 22:53
@Glenn can you provide a formal patch so we can take this forward?
msg245560 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2015-06-20 12:50
Related: Issue 1260171, essentially proposing streaming readers and writers for communicate() instead of fixed buffers, but without using OS threads.
History
Date User Action Args
2019-03-15 21:59:23BreamoreBoysetnosy: - BreamoreBoy
2015-06-20 12:50:44martin.pantersetmessages: + msg245560
2015-03-22 15:11:03akirasetnosy: + akira
2015-03-22 07:19:35martin.pantersetnosy: + martin.panter
2014-07-12 22:53:20BreamoreBoysetnosy: + BreamoreBoy

messages: + msg222894
versions: + Python 3.5, - Python 3.4
2013-07-09 11:40:25christian.heimessetstage: needs patch
versions: + Python 3.4, - Python 3.3
2012-02-05 20:25:58brandjonsetnosy: + brandjon
2012-02-05 14:56:15rosslagerwallsetnosy: + rosslagerwall
2010-12-07 04:01:10ncoghlansetmessages: + msg123521
2010-12-07 03:01:38v+pythonsetmessages: + msg123517
2010-12-06 04:41:24ncoghlansetnosy: + ncoghlan

messages: + msg123454
versions: + Python 3.3, - Python 3.2
2010-12-02 07:02:46v+pythonsetmessages: + msg123059
2010-12-02 06:59:29v+pythonsetmessages: + msg123058
2010-11-24 07:56:19v+pythonsetmessages: + msg122264
2010-11-21 06:16:01v+pythoncreate