Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

subprocess and deadlock avoidance #54691

Closed
vpython mannequin opened this issue Nov 21, 2010 · 11 comments
Closed

subprocess and deadlock avoidance #54691

vpython mannequin opened this issue Nov 21, 2010 · 11 comments
Labels
stdlib Python modules in the Lib dir type-feature A feature request or enhancement

Comments

@vpython
Copy link
Mannequin

vpython mannequin commented Nov 21, 2010

BPO 10482
Nosy @ncoghlan, @4kir4, @vadmium

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

Show more details

GitHub fields:

assignee = None
closed_at = None
created_at = <Date 2010-11-21.06:16:01.170>
labels = ['type-feature', 'library']
title = 'subprocess and deadlock avoidance'
updated_at = <Date 2019-03-15.21:59:23.467>
user = 'https://bugs.python.org/vpython'

bugs.python.org fields:

activity = <Date 2019-03-15.21:59:23.467>
actor = 'BreamoreBoy'
assignee = 'none'
closed = False
closed_date = None
closer = None
components = ['Library (Lib)']
creation = <Date 2010-11-21.06:16:01.170>
creator = 'v+python'
dependencies = []
files = []
hgrepos = []
issue_num = 10482
keywords = []
message_count = 9.0
messages = ['121871', '122264', '123058', '123059', '123454', '123517', '123521', '222894', '245560']
nosy_count = 6.0
nosy_names = ['ncoghlan', 'v+python', 'akira', 'rosslagerwall', 'martin.panter', 'brandjon']
pr_nums = []
priority = 'normal'
resolution = None
stage = 'needs patch'
status = 'open'
superseder = None
type = 'enhancement'
url = 'https://bugs.python.org/issue10482'
versions = ['Python 3.5']

@vpython
Copy link
Mannequin Author

vpython mannequin commented Nov 21, 2010

.communicate is a nice API for programs that produce little output, and can be buffered. While that may cover a wide range of uses, it doesn't cover launching CGI programs, such as is done in http.server. Now there are nice warnings about that issue in the http.server documentation.

However, while .communicate has the building blocks to provide more general solutions, it doesn't expose them to the user, nor does it separate them into building blocks, rather it is a monolith inside ._communicate.

For example, it would be nice to have an API that would "capture a stream using a thread" which could be used for either stdout or stdin, and is what ._communicate does under the covers for both of them.

It would also be nice to have an API that would "pump a bytestream to .stdin as a background thread. ._communicate doesn't provide that one, but uses the foreground thread for that. And, it requires that it be fully buffered. It would be most useful for http.server if this API could connect a file handle and an optional maximum read count to .stdin, yet do it in a background thread.

That would leave the foreground thread able to process stdout. It is correct (but not what http.server presently does, but I'll be entering that enhancement request soon) for http.server to read the first line from the CGI program, transform it, add a few more headers, and send that to the browser, and then hook up .stdout to the browser (shutil.copyfileobj can be used for the rest of the stream). However, there is no deadlock free way of achieving this sort of solution, capturing the stderr to be logged, not needing to buffer a potentially large file upload, and transforming the stdout, with the facilities currently provided by subprocess. Breaking apart some of the existing building blocks, and adding an additional one for .stdin processing would allow a real http.server implementation, as well as being more general for other complex uses.

You see, for http.server, the stdin

@vpython vpython mannequin added stdlib Python modules in the Lib dir type-feature A feature request or enhancement labels Nov 21, 2010
@vpython
Copy link
Mannequin Author

vpython mannequin commented Nov 24, 2010

So I've experimented a bit, and it looks like simply exposing ._readerthread as an external API would handle the buffered case for stdout or stderr. For http.server CGI scripts, I think it is fine to buffer stderr, as it should not be a high-volume channel... but not both stderr and stdout, as stdout can be huge. And not stdin, because it can be huge also.

For stdin, something like the following might work nicely for some cases, including http.server (with revisions):

    def _writerthread(self, fhr, fhw, length):
        while length > 0:
            buf = fhr.read( min( 8196, length ))
            fhw.write( buf )
            length -= len( buf )
        fhw.close()

When the stdin data is buffered, but the application wishes to be stdout centric instead of stdin centric (like the current ._communicate code), a variation could be made replacing fhr by a data buffer, and writing it gradually (or fully) to the pipe, but from a secondary thread.

Happily, this sort of code (the above is extracted from a test version of http.server) can be implemented in the server, but would be more usefully provided by subprocess, in my opinion.

To include the above code inside subprocess would just be a matter of tweaking references to class members instead of parameters.

@vpython
Copy link
Mannequin Author

vpython mannequin commented Dec 2, 2010

Here's an updated _writerthread idea that handles more cases:

    def _writerthread(self, fhr, fhw, length=None):
        if length is None:
            flag = True
            while flag:
                buf = fhr.read( 512 )
                fhw.write( buf )
                if len( buf ) == 0:
                    flag = False
        else:
            while length > 0:
                buf = fhr.read( min( 512, length ))
                fhw.write( buf )
                length -= len( buf )
        # throw away additional data [see bug python/cpython#34546]
        while select.select([fhr._sock], [], [], 0)[0]:
            if not fhr._sock.recv(1):
                break
        fhw.close()

@vpython
Copy link
Mannequin Author

vpython mannequin commented Dec 2, 2010

Sorry, left some extraneous code in the last message, here is the right code:

    def _writerthread(self, fhr, fhw, length=None):
        if length is None:
            flag = True
            while flag:
                buf = fhr.read( 512 )
                fhw.write( buf )
                if len( buf ) == 0:
                    flag = False
        else:
            while length > 0:
                buf = fhr.read( min( 512, length ))
                fhw.write( buf )
                length -= len( buf )
        fhw.close()

@ncoghlan
Copy link
Contributor

ncoghlan commented Dec 6, 2010

The general idea is sound. My work colleagues have certainly had to implement their own reader/writer thread equivalents to keep subprocess from blocking.

It makes sense to provide more robust public support for such techniques in process itself.

@vpython
Copy link
Mannequin Author

vpython mannequin commented Dec 7, 2010

Looking at the code the way I've used it in my modified server.py:

            stderr = []
            stderr_thread = threading.Thread(target=self._readerthread,
                                             args=(p.stderr, stderr))
            stderr_thread.daemon = True
            stderr_thread.start()

            self.log_message("writer: %s" % str( nbytes ))
            stdin_thread = threading.Thread(target=self._writerthread,
                                            args=(self.rfile, p.stdin, nbytes))
            stdin_thread.daemon = True
            stdin_thread.start()

and later

            stderr_thread.join()
            stdin_thread.join()

            p.stderr.close()
            p.stdout.close()

            if stderr:
                stderr = stderr[ 0 ].decode("UTF-8")

It seems like this sort of code (possibly passing in the encoding) could be bundled back inside subprocess (I borrowed it from there).

It also seems from recent discussion on npopdev that the cheat-sheet "how to replace" other sys and os popen functions would be better done as wrapper functions for the various cases. Someone pointed out that the hard cases probably aren't cross-platform, but that currently the easy cases all get harder when using subprocess than when using the deprecated facilities. They shouldn't. The names may need to be a bit more verbose to separate the various use cases, but each use case should remain at least as simple as the prior function.

So perhaps instead of just subprocess.PIPE to select particular handling for stdin, stdout, and stderr, subprocess should implement some varieties to handle attaching different types of reader and writer threads to the handles... of course, parameters need to come along for the ride too: maybe the the additional variations would be object references with parameters supplied, instead of just a manifest constant like .PIPE.

@ncoghlan
Copy link
Contributor

ncoghlan commented Dec 7, 2010

Or various incarnations of functools.partial applied to subprocess.Popen.

@BreamoreBoy
Copy link
Mannequin

BreamoreBoy mannequin commented Jul 12, 2014

@Glenn can you provide a formal patch so we can take this forward?

@vadmium
Copy link
Member

vadmium commented Jun 20, 2015

Related: bpo-1260171, essentially proposing streaming readers and writers for communicate() instead of fixed buffers, but without using OS threads.

@ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
@iritkatriel
Copy link
Member

Related: bpo-1260171, essentially proposing streaming readers and writers for communicate() instead of fixed buffers, but without using OS threads.

Should this be closed on the same basis?

CC @gpshead

@gpshead
Copy link
Member

gpshead commented May 26, 2023

Should this be closed on the same basis?

Yep, thanks!

@gpshead gpshead closed this as not planned Won't fix, can't repro, duplicate, stale May 26, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
stdlib Python modules in the Lib dir type-feature A feature request or enhancement
Projects
None yet
Development

No branches or pull requests

4 participants