New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add non-blocking read and write methods to subprocess.Popen #41922
Comments
It would be terribly nice if the Popen class in the Why would this functionality be useful? Currently, Certainly one can pull the pipes out of the Popen There is a post by Paul Du Bois with an example using And a message from Neil Hodgeson stating that With this modification, creating Expect-like modules |
Logged In: YES More control of sub-processes would be great. Several times |
Logged In: YES I've got a version of subprocess that has this functionality About the only question is whether the functionality is Perhaps adding an optional argument 'wait_for_completion' to If the 'wait_for_completion' argument is untrue, then it How does that sound? |
Logged In: YES I suppose I should mention one side-effect of all this. It |
Logged In: YES How about if subprocesses have 3 new methods, send(input), send(input) would perform like socket.send(), sending as I currently have an implementation of the above on Windows |
Logged In: YES I've implemented this as a subclass of subprocess.Popen in Essentially this is a request for inclusion in the standard |
Logged In: YES I've implemented this as a subclass of subprocess.Popen in |
I would also like to see this feature. I'm using Josiah Carlson's recipe for the time being. I'm agnostic about whether the asynchronicity is a "mode" or just a case of using different functions. However, if the former is chosen, then one should be able to switch modes at will, because sometimes I want blocking and sometimes I don't. |
The way subprocess is currently written, if you were to use a blocking semantic, you would no longer be able to use an asynchronous semantic afterwards (it will read the result until there is nothing more to read). Note that this is the case whether it is mode or method based. |
If there were a blocking "read x bytes" type call, could you not do some non-blocking stuff afterwards? If you use the "read" method, with a byte limit, directly on the stdout member of Popen, that could return with your bytes, and then you do some non-blocking stuff afterwards. That's the external view of course, I suspect that you're saying there's some internal lower-level reason this is currently not possible. Thanks for your interim class BTW, it is proving to be very useful. |
Unless you use PeekNamedPipe on Windows, there is no guarantee that the pipe will return *any* data until the process ends, even if you say pipe.read(1) and there is 1k of data already sent. Note that the two async reading methods that I provide (recv() and recv_err()) take a 'maximum bytes' argument. Just like I have written a 'send_all()' utility function, I (or you) can easily write a 'recv_exact()' utility function that receives an exact number of bytes before returning. That functionality would be more or less required for one of the expected use-cases I specify in the recipe, "writing a multi-platform 'expect' module". Stick with async calls (with the utility recv_exact()) until you need to use the .communicate() method. |
Josiah: can this be closed? |
I don't believe this should be closed. The functionality is still |
Hello, I am working on this patch and some additional features for my |
Retargeting to 3.2 since 3.1 is now feature-frozen. |
PEP-3145 has been written in response to this request. |
What is the status? What is required to get this into 3.3? See also bpo-14872. |
Comments on Josiah's patch:
Additional comments on http://code.google.com/p/subprocdev
|
Personally, I would factor out the code for Popen.communicate() in to a Communicator class which wraps a Popen object and has a method
On Windows this would use threads, and on Unix, select. |
There's documentation, but you have to switch to the Python 3k branch -- As for the other criticisms, I'm sure there are plenty of things that need to |
How would this differ from the normal communicate()? It seems like there are two different ideas for why people want an "asynchronous subprocess": One is that they want to use communicate() but not be limited by memory issues. Another use case is for an expect-type interface where you read and write based on a timeout or some kind of delimiter like a newline. These should probably be addressed independently. See also bpo-10482. |
It would block until one of the following occurs:
The normal communicate() could be implemented by running this in a loop. The amount of data returned at once could be limited by an argument of the constructor. |
I started to review the patch 5: When I read unit tests, I realized that I don't like "write_nonblocking" name. It's too generic. A process has many files (more than just stdin, stdout, stderr: see pass_fds parameter of Popen). I would like an explicit "write_stdin_nonblocking" and "read_stdout_nonblocking". |
Victor, I addressed the majority of your comments except for a couple stylistic pieces. Your annoyance with the short poll time for Windows made me re-read the docs from MS, which made me realize that my interpretation was wrong. It also made me confirm Richard Oudkerk's earlier note about ReadFile on overlapped IOs. I left similar notes next to your comments. On the method naming side of things, note that you can only write to "stdin", and you can only read from "stdout" or "stderr". This is documented behavior, so I believe the method names are already quite reasonable (and BDFL approved ;)). |
If you use the short timeouts to make the wait interruptible then you can |
Richard: short timeouts are no longer an issue. Nothing to worry about :) |
First off, thank you everyone who has reviewed and commented so far. I very much appreciate your input and efforts. Does anyone have questions, comments, or concerns about the patch? If no one mentions anything in the next week or so, I'll ping the email thread. |
@Josiah: Modifications of the asyncio module should be done first in the Tulip project because we try to keep the same code base in Tulip, Python 3.4 and 3.5. You may duplicate the code the avoid this dependency? For the documentation, I don't think that you need ".. note" and ".. warning", just write paragraphs of text. |
I submitted an issue to the tulip/asyncio bug tracker: https://code.google.com/p/tulip/issues/detail?id=170 And I am uploading a new patch that only includes non-tulip/asyncio related changes, as tulip/asyncio changes will eventually be propagated to Python. |
FWIW while the Tulip changes should indeed go into the Tulip repo first, the rest should be committed to the CPython 3.5 tree first. Then I'll commit the Tulip changes, first to the Tulip repo, then to the CPython 3.4 branch (yes!) and then merge that into the 3.5 (default) branch. (Yes, I know it's complicated, but it beats other ways of keeping the Tulip and CPython trees in sync. And I have a script in the Tulip tree that automates most of the work.) |
It seems the current API doesn't distinguish between BlockingIOError (temporary error), BrokenPipeError (permanent error) and EOF (permanent |
First, with the use of Overlapped IO on Windows, BlockingIOError should never come up, and we don't even handle that exception. As for BrokenPipeError vs. EOF; while we do treat them more or less the same, we aren't killing the process or reporting that the process is dead, and we tell users to check the results of Popen.poll() in the docs. Could we bubble up BrokenPipeError? Sure. Does it make sense? In the case of Popen.communicate(), exposing the error and changing expected behavior doesn't make sense. I have implemented and would continue to lean towards continuing to hide BrokenPipeError on the additional API endpoints. Why? If you know that a non-blocking send or receive could result in BrokenPipeError, now you need to wrap all of your communication pieces in BrokenPipeError handlers. Does it give you more control? Yes. But the majority of consumers of this API (and most APIs in general) will experience failure and could lose critical information before they realize, "Oh wait, I need to wrap all of these communication pieces in exception handlers." I would be open to adding either a keyword-only argument to the methods and/or an instance attribute to enable/disable raising of BrokenPipeErrors during the non-blocking calls if others think that this added level of control. I would lean towards an instance attribute that is defaulted to False. |
Is it correct that the way to distinguish between "would block" and EOF
It is an argument to catch *all* OSErrors internally that is not a good idea. Popen.communicate() hides EPIPE in the current implementation therefore its behaviour |
FYI asyncio.Process.communicate() ignores BrokenPipeError and ConnectionResetError, whereas asyncio.Process.stdin.drain() (coroutine to wait until all bytes are written) raises a BrokenPipeError or ConnectionResetError if the child process exited. I think subprocess has the same design. (I modified recently asyncio to ignore BrokenPipeError in communicate(), it was a bug.) |
STINNER Victor <report@bugs.python.org> writes:
Do Popen.write_nonblocking() and Popen.read_nonblocking() methods |
IMO when you write directly to stdin and read from stdout/stderr of a If you forget to poll the process, your program may enter an unlimited |
Okay, I'm sorry for falling asleep at the wheel on this. At this point, I don't expect this to go in for 3.5 - but if it has a chance, I'll do what I need to do to get it there. Let me know. I agree with the .communicate() not raising on broken pipe, and read/write_nonblocking raising on broken pipe. I'll get a patch in with the comments on the existing code review, along with those changes on Monday or Tuesday. |
Regarding special cases on the reading side, I think there should be an easy way to distinguish them. The existing RawIOBase way is probably as good as any, unless you want to take the chance to invent a better API:
Beware that BlockingIOError is normally not raised by Python’s standard library, despite the BufferedIOBase documentation, and I think it should say that way (the implementation, that is). See bpo-13322. Regarding handling a broken pipe condition, it would be nice to have a way of signalling that to the caller, whether via an exception or other means. For example if the subprocess is a video player which gets closed early by the end user, it would be good to know to stop wasting time and bandwidth generating and piping video the the player. |
I'm going to be honest; seeing None being returned from a pipe read feels *really* broken to me. When I get None returned from an IO read operation, my first instinct is "there can't be anything else coming, why else would it return None?" After changing writing to raise BrokenPipeError on a closed pipe, I've also changed reading to do the same and added a test to verify the behavior. |
It is how it is done in similar cases (returning |
Returning None for non blocking I/O is standard in Python. |
Josiah’s code now seems to handle the four special cases like this:
|
Non-blocking IO returning a None on "no data currently available" is new to me. It is new to me simply because I don't recall it ever happening in Python 2.x with any socket IO that I ever dealt with, and socket IO is my primary source for non-blocking IO experience. From my experience, reading from a socket would always return a string, unless the connection was closed/broken, at which point you got a bad file descriptor exception. Writing would return 0 if the buffer was full, or a bad file descriptor exception if the connection was closed/broken. The implementation of asyncore/asynchat at least until 3.0 shows this in practice, and I doubt this has changed in the intervening years since I updated those libraries for 3.0. My choice to implement this based on BrokenPipeError was based on reading Victor(haypo)'s comments from 2014-07-23 and 2014-07-24, in which he stated that in asyncio, when it comes to process management, .communicate() hides exceptions so that communication can finish, but direct reading and writing surface those exceptions. I was attempting to mimic the asyncio interface simply because it made sense to me coming from the socket world, and because asyncio is where people are likely to go if the non-blocking subprocess support in subprocess is insufficient for their needs. Note that I also initially resisted raising an exception in those methods, but Victor(haypo)'s comments changed my mind.
Better is subjective when it comes to API, but different? Yes, it's already implemented in patch #8. BrokenPipeError exception when no more data is sendable/receivable. A 0 or b'' when writing to a full buffer, or reading from an empty buffer. This API tries to be the same as the asyncio.subprocess.Process() behavior when accessing stdin, stdout, and stderr directly.
In some contexts, yes. In others, no. The asyncio.StreamReader() coroutines .read(), .readline(), and .readexactly() return strings. Raw async or sync sockets don't return None on read. SSL-wrapped async or sync sockets don't return None on read. Asyncio low-level socket operations don't yield None on a (currently empty) read. In the context of the subprocess module as it exists in 3.4 (and 3.5 as it is unpatched), the only object that returns None on read when no data is available, but where data *might* be available in the future, is an underlying posix pipe (on posix platforms) - which isn't generally used directly. The purpose of this patch is to expose stdin, stdout, and stderr in a way that allows non-blocking reads and writes from the subprocess that also plays nicely with .communicate() as necessary. Directly exposing the pipes doesn't work due to API inconsistencies between Windows and posix, so we have to add a layer. I would argue that this layer of non-blocking (Windows and posix) pipe abstraction has much more in common with how asyncio deals with process IO, or how sockets deal with IO, than it does with pipes in general (the vast majority of which are blocking), so I would contend that it should have a similar API (no returning None). That said, considering that the expected use of non-blocking subprocess communication *does not* include using a multiplexer (select, poll, etc.), I have the sense that a few other higher-level methods might be warranted to ease use substantially, and which I would expect people would end up using much more often than the lower-level direct read/write operations (different names might make more sense here): |
Yes, I think the special return value of None is only standard to the “io” module, which was added in Python 2.6 and 3. And even there it is inconsistent. For what it’s worth, don’t have strong opinions on the special non-blocking and broken pipe cases, as long as they can be differentiated. So the current situation is fine by me. Just that potentially surprising differences with other APIs [e.g. read() -> b"" means empty buffer, not closed pipe] need documenting. The asyncio.StreamReader.read() etc methods are documented as coroutines, so the problem of differentiating an empty buffer from a closed pipe does not exist there. Just a thought: would it make sense for these methods to be decoupled from the subprocess class, and generalized to apply to any pipes, as a kind of Windows vs POSIX abstraction layer? Then you wouldn’t have to add duplicate methods like read_stderr_available() and read_stderr_exactly() all the time. You might also have a more portable way of doing non-blocking reads from sys.stdin, etc as well. |
Someone else can resurrect this concept and/ore patch if they care about this feature. Best of luck to future readers. |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: