add non-blocking read and write methods to subprocess.Popen #41922

Closed
josiahcarlson mannequin opened this issue Apr 28, 2005 · 64 comments
Labels
stdlib Python modules in the Lib dir topic-asyncio type-feature A feature request or enhancement

Comments

@josiahcarlson
Copy link
Mannequin

josiahcarlson mannequin commented Apr 28, 2005

BPO 1191964
Nosy @gvanrossum, @josiahcarlson, @vstinner, @giampaolo, @devdanzin, @bitdancer, @4kir4, @vadmium, @Janzert, @1st1, @eryksun
Files
  • sp_diff.txt
  • subprocess.patch: Add non-blocking subprocess access
  • subprocess_2.patch: replace select() with selectors
  • subprocess_3.patch: Remove looping write_nonblocking(), addresses comments
  • subprocess_4.patch: Fixes Windows writes, improves docs, removes universal newlines with non-blocking IO
  • subprocess_5.patch: Fixes a partial comment on a new test
  • subprocess_6.patch: Addressed comments, multi-platform cleanup
  • subprocess_7.patch: Removes tulip/asyncio related changes
  • subprocess_8.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = None
    closed_at = <Date 2019-05-02.04:12:38.587>
    created_at = <Date 2005-04-28.20:40:52.000>
    labels = ['type-feature', 'library', 'expert-asyncio']
    title = 'add non-blocking read and write methods to subprocess.Popen'
    updated_at = <Date 2019-05-02.13:03:55.336>
    user = 'https://github.com/josiahcarlson'

    bugs.python.org fields:

    activity = <Date 2019-05-02.13:03:55.336>
    actor = 'vstinner'
    assignee = 'none'
    closed = True
    closed_date = <Date 2019-05-02.04:12:38.587>
    closer = 'josiahcarlson'
    components = ['Library (Lib)', 'asyncio']
    creation = <Date 2005-04-28.20:40:52.000>
    creator = 'josiahcarlson'
    dependencies = []
    files = ['8252', '34743', '34774', '34794', '34851', '34861', '34941', '35407', '38691']
    hgrepos = []
    issue_num = 1191964
    keywords = ['patch']
    message_count = 64.0
    messages = ['54505', '54506', '54507', '54508', '54509', '54510', '54511', '54512', '54513', '54514', '54515', '84692', '84719', '89309', '89433', '114495', '161326', '161352', '161356', '161362', '161400', '161409', '177139', '215171', '215178', '215411', '215420', '215426', '215496', '215582', '215668', '215708', '215709', '215711', '215826', '215833', '215956', '215969', '215990', '216226', '216442', '216445', '216665', '216669', '216692', '218826', '218861', '219373', '219374', '219432', '219616', '219656', '223799', '223827', '223828', '238874', '238887', '239277', '239280', '239282', '239299', '239366', '239372', '341245']
    nosy_count = 21.0
    nosy_names = ['gvanrossum', 'josiahcarlson', 'astrand', 'parameter', 'vstinner', 'techtonik', 'giampaolo.rodola', 'ajaksu2', 'ooooooooo', 'v+python', 'r.david.murray', 'cvrebert', 'ericpruitt', 'akira', 'Andrew.Boettcher', 'rosslagerwall', 'sbt', 'martin.panter', 'janzert', 'yselivanov', 'eryksun']
    pr_nums = []
    priority = 'normal'
    resolution = 'out of date'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue1191964'
    versions = ['Python 3.5']

    @josiahcarlson
    Copy link
    Mannequin Author

    josiahcarlson mannequin commented Apr 28, 2005

    It would be terribly nice if the Popen class in the
    subprocess module would allow a programmer to easily
    say "send some data right now (if I have some to send)
    and receive whatever information is available right
    now". Essentially the equivalent of
    asyncore.loop(count=1), only that it returns the data
    received, instead of placing it in a buffer.

    Why would this functionality be useful? Currently,
    when using the subprocess module with pipes, the
    interaction with a pipe is limited to "send data if
    desired, close the subprocess' stdin, read all output
    from the subprocess' stdout, ...".

    Certainly one can pull the pipes out of the Popen
    instance, and perform the necessary functions on posix
    systems (with select or poll), but the additional magic
    on Windows is a little less obvious (but still possible).

    There is a post by Paul Du Bois with an example using
    win32pipe.PeekNamedPipe:
    http://groups-beta.google.com/group/comp.lang.python/msg/115e9332cc1ca09d?hl=en

    And a message from Neil Hodgeson stating that
    PeekNamedPipe works on anonymous pipes:
    http://mail.python.org/pipermail/python-dev/2000-May/004200.html

    With this modification, creating Expect-like modules
    for any platform, as well as embedded shells inside any
    GUI with a text input widget, becomes easy. Heck, even
    Idle could use it rather than a socket for its
    interactive interpreter.
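
    A minimal sketch of the behaviour being requested, assuming a POSIX system where select() works on pipes; the helper name read_available() is invented for illustration, and Windows would instead need win32pipe.PeekNamedPipe as described above.

    ```python
    # POSIX-only sketch: "read whatever is available right now" from a child's stdout.
    import os
    import select
    import subprocess

    proc = subprocess.Popen(["cat"], stdin=subprocess.PIPE, stdout=subprocess.PIPE)

    def read_available(pipe, maxlen=4096, timeout=0.0):
        """Return up to maxlen bytes if the pipe is readable within timeout, else b''."""
        readable, _, _ = select.select([pipe], [], [], timeout)
        if readable:
            return os.read(pipe.fileno(), maxlen)   # b'' here would mean EOF
        return b""

    proc.stdin.write(b"hello\n")
    proc.stdin.flush()
    print(read_available(proc.stdout, timeout=1.0))  # b'hello\n' once cat echoes it
    proc.stdin.close()
    proc.wait()
    ```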

    @josiahcarlson josiahcarlson mannequin assigned astrand Apr 28, 2005
    @josiahcarlson josiahcarlson mannequin added the type-feature A feature request or enhancement label Apr 28, 2005
    @parameter
    Copy link
    Mannequin

    parameter mannequin commented Apr 28, 2005

    More control of sub-processes would be great. Several times
    in the past few years I've done stuff with os.system,
    popenN, or spawn* and it would be useful to have even more
    cross-platform consistency and support in this area.
    Anyway, this is just a vote to encourage other developers.

    @josiahcarlson
    Copy link
    Mannequin Author

    josiahcarlson mannequin commented May 28, 2005

    I've got a version of subprocess that has this functionality
    with pywin32. Making it work on *nix systems with usable
    select is trivial.

    About the only question is whether the functionality is
    desirable, and if so, what kind of API is reasonable.

    Perhaps adding an optional argument 'wait_for_completion' to
    the communicate method, which defaults to True for executing
    what is currently in the body of communicate.

    If the 'wait_for_completion' argument is untrue, then it
    does non-waiting asynchronous IO, returning either a 2 or
    3-tuple on completion. If input is None, it is a 2-tuple of
    (stdout, stderr). If input is a string, it is a 3-tuple of
    (bytes_sent, stdout, stderr).

    How does that sound?
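
    A rough POSIX-only sketch of the proposed semantics: one non-waiting pass that sends what it can and returns whatever is currently readable. The helper name communicate_once() is invented here; the actual proposal was a wait_for_completion flag on Popen.communicate(), which was never adopted.

    ```python
    import os
    import select

    def communicate_once(proc, input_data=None):
        """One non-blocking pass; returns (bytes_sent, stdout_chunk, stderr_chunk)."""
        wlist = [proc.stdin] if (input_data and proc.stdin) else []
        rlist = [p for p in (proc.stdout, proc.stderr) if p is not None]
        readable, writable, _ = select.select(rlist, wlist, [], 0)

        sent = 0
        if writable:
            # A write of at most PIPE_BUF bytes will not block once select()
            # reports the pipe as writable.
            sent = os.write(proc.stdin.fileno(), input_data[:select.PIPE_BUF])

        chunks = {}
        for pipe in readable:
            chunks[pipe] = os.read(pipe.fileno(), 4096)
        return sent, chunks.get(proc.stdout, b""), chunks.get(proc.stderr, b"")
    ```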

    @josiahcarlson
    Copy link
    Mannequin Author

    josiahcarlson mannequin commented May 29, 2005

    I suppose I should mention one side-effect of all this. It
    requires three more functions from pywin32 be included in
    the _subprocess driver; win32file.ReadFile,
    win32file.WriteFile, and win32pipe.PeekNamedPipe . Read and
    Peek are for reading data from stdout and stderr, and Write
    is for the support for partial writes to stdin.

    @josiahcarlson
    Copy link
    Mannequin Author

    josiahcarlson mannequin commented Jun 26, 2005

    How about if subprocesses have 3 new methods, send(input),
    recv(maxlen), and recv_stderr(maxlen).

    send(input) would perform like socket.send(), sending as
    much as it currently can, returning the number of bytes sent.
    recv(maxlen) and recv_stderr(maxlen) would receive up to the
    provided number of bytes from the stdout or stderr pipes
    respectively.

    I currently have an implementation of the above on Windows
    and posix. I include the context diff against revision 1.20
    of subprocess.py in current CVS.
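
    A loose, POSIX-only sketch in the spirit of the send()/recv()/recv_stderr() methods proposed above; this is neither the patch nor the recipe code, the class name AsyncPopen is invented, and it assumes the pipes were requested with subprocess.PIPE. Windows would need PeekNamedPipe, ReadFile and WriteFile, as noted elsewhere in this thread.

    ```python
    import os
    import select
    import subprocess

    class AsyncPopen(subprocess.Popen):
        """Popen with socket-like non-blocking helpers (POSIX sketch only)."""

        def send(self, data):
            """Write as much of data as fits right now; return the bytes sent."""
            _, writable, _ = select.select([], [self.stdin], [], 0)
            if not writable:
                return 0
            return os.write(self.stdin.fileno(), data[:select.PIPE_BUF])

        def _recv(self, pipe, maxlen):
            readable, _, _ = select.select([pipe], [], [], 0)
            return os.read(pipe.fileno(), maxlen) if readable else b""

        def recv(self, maxlen=4096):
            """Read up to maxlen bytes from stdout without blocking."""
            return self._recv(self.stdout, maxlen)

        def recv_stderr(self, maxlen=4096):
            """Read up to maxlen bytes from stderr without blocking."""
            return self._recv(self.stderr, maxlen)
    ```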

    @josiahcarlson
    Copy link
    Mannequin Author

    josiahcarlson mannequin commented Sep 21, 2005

    I've implemented this as a subclass of subprocess.Popen in
    the Python cookbook, available here:
    http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/440554

    Essentially this is a request for inclusion in the standard
    library for Python 2.5.

    @ooooooooo
    Copy link
    Mannequin

    ooooooooo mannequin commented Dec 30, 2006

    I would also like to see this feature. I'm using Josiah Carlson's recipe for the time being. I'm agnostic about whether the asynchronicity is a "mode" or just a case of using different functions. However, if the former is chosen, then one should be able to switch modes at will, because sometimes I want blocking and sometimes I don't.

    @josiahcarlson
    Copy link
    Mannequin Author

    josiahcarlson mannequin commented Dec 30, 2006

    The way subprocess is currently written, if you were to use a blocking semantic, you would no longer be able to use an asynchronous semantic afterwards (it will read the result until there is nothing more to read). Note that this is the case whether it is mode or method based.

    @ooooooooo
    Copy link
    Mannequin

    ooooooooo mannequin commented Dec 31, 2006

    If there were a blocking "read x bytes" type call, could you not do some non-blocking stuff afterwards? If you use the "read" method, with a byte limit, directly on the stdout member of Popen, that could return with your bytes, and then you do some non-blocking stuff afterwards. That's the external view of course, I suspect that you're saying there's some internal lower-level reason this is currently not possible.

    Thanks for your interim class BTW, it is proving to be very useful.

    @josiahcarlson
    Copy link
    Mannequin Author

    josiahcarlson mannequin commented Dec 31, 2006

    Unless you use PeekNamedPipe on Windows, there is no guarantee that the pipe will return *any* data until the process ends, even if you say pipe.read(1) and there is 1k of data already sent.

    Note that the two async reading methods that I provide (recv() and recv_err()) take a 'maximum bytes' argument. Just like I have written a 'send_all()' utility function, I (or you) can easily write a 'recv_exact()' utility function that receives an exact number of bytes before returning. That functionality would be more or less required for one of the expected use-cases I specify in the recipe, "writing a multi-platform 'expect' module".

    Stick with async calls (with the utility recv_exact()) until you need to use the .communicate() method.
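
    A hedged sketch of the recv_exact() utility described above, written on top of the hypothetical AsyncPopen.recv() from the earlier sketch. An empty chunk there means "nothing available yet", so the process is polled to tell that apart from a dead child.

    ```python
    import time

    def recv_exact(proc, nbytes, timeout=None):
        """Collect exactly nbytes from proc's stdout, or raise on exit/timeout."""
        deadline = None if timeout is None else time.monotonic() + timeout
        buf = b""
        while len(buf) < nbytes:
            chunk = proc.recv(nbytes - len(buf))
            if chunk:
                buf += chunk
            elif proc.poll() is not None:
                raise EOFError("process exited with %d bytes still expected"
                               % (nbytes - len(buf)))
            elif deadline is not None and time.monotonic() > deadline:
                raise TimeoutError("timed out waiting for %d bytes" % nbytes)
            else:
                time.sleep(0.01)   # brief pause so the loop does not spin
        return buf
    ```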

    @devdanzin
    Copy link
    Mannequin

    devdanzin mannequin commented Mar 30, 2009

    Josiah: can this be closed?

    @devdanzin devdanzin mannequin added stdlib Python modules in the Lib dir labels Mar 30, 2009
    @josiahcarlson
    Copy link
    Mannequin Author

    josiahcarlson mannequin commented Mar 30, 2009

    I don't believe this should be closed. The functionality is still
    desired by me and others who have posted on and off since the patch was
    created. This patch definitely needs some love (tests mostly).

    @ericpruitt
    Copy link
    Mannequin

    ericpruitt mannequin commented Jun 13, 2009

    Hello, I am working on this patch and some additional features for my
    Google Summer of Code project (subdev.blogspot.com) and will eventually
    attempt to get the code committed to Python 3.1 or 3.2 and Python 2.7. I
    will have the unit tests completed shortly and will post a patch for
    Python 2.7 and / or Python 3.1rc1.

    @bitdancer
    Copy link
    Member

    Retargeting to 3.2 since 3.1 is now feature-frozen.

    @BreamoreBoy
    Copy link
    Mannequin

    BreamoreBoy mannequin commented Aug 21, 2010

    PEP-3145 has been written in response to this request.

    @techtonik
    Copy link
    Mannequin

    techtonik mannequin commented May 22, 2012

    What is the status? What is required to get this into 3.3? See also bpo-14872.

    @rosslagerwall rosslagerwall mannequin unassigned astrand May 22, 2012
    @sbt
    Copy link
    Mannequin

    sbt mannequin commented May 22, 2012

    Comments on Josiah's patch:

    • It uses pywin32 for PeekNamedPipe -- this is now available from _winapi.

    • I don't think send(), recv() and recv_exact() will work correctly if
      buffering is used -- an error should be raised in this case.

    • I think send(), recv(), recv_exact() should raise errors if the
      associated stream is None instead of returning 0 or ''.

    Additional comments on http://code.google.com/p/subprocdev

    • In a few places the code does "raise Exception(...)" -- some other
      exception class should be used.

    • It is not clear to me why TextIOWrapper should try to implement seek()
      and tell(). (The file-like wrapper for a socket does not.)

    • I don't like the hardwired timeouts for faking asynchronicity:
      Popen.asyncread(), Popen.asyncwrite() use 0.1 seconds, and
      TextIOWrapper.read() uses 1.25 seconds.

    • There is no binary counterpart to TextIOWrapper.

    • Using fcntl.fcntl() to switch back and forth between blocking and
      non-blocking is unnecessary. select() guarantees the read/write on a pipe
      will succeed without blocking. (The same is not necessarily true with
      a socket.)

    • __closecheck() should be renamed _closecheck().

    • FileWrapper is just a wrapper for TextIOWrapper, except that it has an
      ignored newlines argument.

    • I can't see any documentation.

    @sbt
    Copy link
    Mannequin

    sbt mannequin commented May 22, 2012

    Personally, I would factor out the code for Popen.communicate() into a Communicator class which wraps a Popen object and has a method

    communicate(input, timeout=None) -> (bytes_written, output, error)
    

    On Windows this would use threads, and on Unix, select.

    @ericpruitt
    Copy link
    Mannequin

    ericpruitt mannequin commented May 22, 2012

    There's documentation, but you have to switch to the Python 3k branch --
    http://code.google.com/p/subprocdev/source/browse/?name=python3k#hg%2Fdoc.

    As for the other criticisms, I'm sure there are plenty of things that need to
    be improved upon; I was not very experienced when I started the project.

    @rosslagerwall
    Copy link
    Mannequin

    rosslagerwall mannequin commented May 23, 2012

    > Personally, I would factor out the code for Popen.communicate() into a Communicator class which wraps a Popen object and has a method
    >
    >     communicate(input, timeout=None) -> (bytes_written, output, error)

    How would this differ from the normal communicate()?

    It seems like there are two different ideas for why people want an "asynchronous subprocess":

    One is that they want to use communicate() but not be limited by memory issues.
    I think a good API for this case is an asyncore style API or like the one from the patch in bpo-1260171.

    Another use case is for an expect-type interface where you read and write based on a timeout or some kind of delimiter like a newline.

    These should probably be addressed independently.

    See also bpo-10482.

    @sbt
    Copy link
    Mannequin

    sbt mannequin commented May 23, 2012

    How would this differ from the normal communicate()?

    It would block until one of the following occurs:

    • some data has been written to stdin,
    • some data has been read from stdout or stderr, or
    • timeout passes (if timeout is not None).

    The normal communicate() could be implemented by running this in a loop. The amount of data returned at once could be limited by an argument of the constructor.
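
    A minimal POSIX-only sketch of the Communicator idea from the two comments above; the class and its exact semantics are an assumption drawn from this thread, not an existing API. One call blocks until some input is written, some output or error data is read, or the timeout passes.

    ```python
    import os
    import select

    class Communicator:
        def __init__(self, proc, chunk_size=4096):
            self.proc = proc
            self.chunk_size = chunk_size

        def communicate(self, input=b"", timeout=None):
            """Wait once; return (bytes_written, output_chunk, error_chunk)."""
            wlist = [self.proc.stdin] if input else []
            rlist = [p for p in (self.proc.stdout, self.proc.stderr) if p]
            readable, writable, _ = select.select(rlist, wlist, [], timeout)

            written = 0
            if writable:
                # At most PIPE_BUF bytes are guaranteed not to block here.
                written = os.write(self.proc.stdin.fileno(), input[:select.PIPE_BUF])
            out = err = b""
            if self.proc.stdout in readable:
                out = os.read(self.proc.stdout.fileno(), self.chunk_size)
            if self.proc.stderr in readable:
                err = os.read(self.proc.stderr.fileno(), self.chunk_size)
            return written, out, err
    ```

    The ordinary blocking communicate() could then be a loop that keeps calling this, trimming the written prefix off the input and stopping once the input is exhausted and both output pipes have reported EOF (an os.read() of b"" after select() marked them readable).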

    @vstinner
    Copy link
    Member

    I started to review the patch 5:
    http://bugs.python.org/review/1191964/#ps11598

    When I read unit tests, I realized that I don't like "write_nonblocking" name. It's too generic. A process has many files (more than just stdin, stdout, stderr: see pass_fds parameter of Popen). I would like an explicit "write_stdin_nonblocking" and "read_stdout_nonblocking".

    @josiahcarlson
    Copy link
    Mannequin Author

    josiahcarlson mannequin commented Apr 17, 2014

    Victor, I addressed the majority of your comments except for a couple stylistic pieces. Your annoyance with the short poll time for Windows made me re-read the docs from MS, which made me realize that my interpretation was wrong. It also made me confirm Richard Oudkerk's earlier note about ReadFile on overlapped IOs.

    I left similar notes next to your comments.

    On the method naming side of things, note that you can only write to "stdin", and you can only read from "stdout" or "stderr". This is documented behavior, so I believe the method names are already quite reasonable (and BDFL approved ;)).

    @sbt
    Copy link
    Mannequin

    sbt mannequin commented Apr 17, 2014

    If you use the short timeouts to make the wait interruptible, then you can
    use WaitForMultipleObjects (which automatically waits on an extra event
    object) instead of WaitForSingleObject.

    @josiahcarlson
    Copy link
    Mannequin Author

    josiahcarlson mannequin commented Apr 17, 2014

    Richard: short timeouts are no longer an issue. Nothing to worry about :)

    @josiahcarlson
    Copy link
    Mannequin Author

    josiahcarlson mannequin commented May 20, 2014

    First off, thank you everyone who has reviewed and commented so far. I very much appreciate your input and efforts.

    Does anyone have questions, comments, or concerns about the patch? If no one mentions anything in the next week or so, I'll ping the email thread.

    @vstinner
    Copy link
    Member

    @Josiah: Modifications of the asyncio module should be done first in the Tulip project because we try to keep the same code base in Tulip, Python 3.4 and 3.5. You may duplicate the code to avoid this dependency?

    For the documentation, I don't think that you need ".. note" and ".. warning", just write paragraphs of text.

    @josiahcarlson
    Copy link
    Mannequin Author

    josiahcarlson mannequin commented May 30, 2014

    I submitted an issue to the tulip/asyncio bug tracker: https://code.google.com/p/tulip/issues/detail?id=170

    And I am uploading a new patch that only includes non-tulip/asyncio related changes, as tulip/asyncio changes will eventually be propagated to Python.

    @gvanrossum
    Copy link
    Member

    FWIW while the Tulip changes should indeed go into the Tulip repo first, the rest should be committed to the CPython 3.5 tree first. Then I'll commit the Tulip changes, first to the Tulip repo, then to the CPython 3.4 branch (yes!) and then merge that into the 3.5 (default) branch. (Yes, I know it's complicated, but it beats other ways of keeping the Tulip and CPython trees in sync. And I have a script in the Tulip tree that automates most of the work.)

    @4kir4
    Copy link
    Mannequin

    4kir4 mannequin commented May 31, 2014

    Does anyone have questions, comments, or concerns about the patch?

    It seems the current API doesn't distinguish between BlockingIOError (temporary error), BrokenPipeError (permanent error) and EOF (permanent
    non-error condition) -- everything is treated as EOF. Is it intentional?

    @josiahcarlson
    Copy link
    Mannequin Author

    josiahcarlson mannequin commented Jun 2, 2014

    First, with the use of Overlapped IO on Windows, BlockingIOError should never come up, and we don't even handle that exception.

    As for BrokenPipeError vs. EOF; while we do treat them more or less the same, we aren't killing the process or reporting that the process is dead, and we tell users to check the results of Popen.poll() in the docs.

    Could we bubble up BrokenPipeError? Sure. Does it make sense? In the case of Popen.communicate(), exposing the error and changing expected behavior doesn't make sense.

    I have implemented and would continue to lean towards continuing to hide BrokenPipeError on the additional API endpoints. Why? If you know that a non-blocking send or receive could result in BrokenPipeError, now you need to wrap all of your communication pieces in BrokenPipeError handlers. Does it give you more control? Yes. But the majority of consumers of this API (and most APIs in general) will experience failure and could lose critical information before they realize, "Oh wait, I need to wrap all of these communication pieces in exception handlers."

    I would be open to adding either a keyword-only argument to the methods and/or an instance attribute to enable/disable raising of BrokenPipeError during the non-blocking calls if others think that this added level of control is worthwhile. I would lean towards an instance attribute that defaults to False.

    @4kir4
    Copy link
    Mannequin

    4kir4 mannequin commented Jun 3, 2014

    First, with the use of Overlapped IO on Windows, BlockingIOError should never come up, and we don't even handle that exception.

    Is it correct that the way to distinguish between "would block" and EOF
    is Popen.poll()? Is it possible to detect it in _nonblocking() methods
    and raise BlockingIOError manually (or return None for read_..()) in the "would block" case?

    But the majority of consumers of this API (and most APIs in general) will experience failure and could lose critical information before they realize, "Oh wait, I need to wrap all of these communication pieces in exception handlers."

    That is an argument for catching *all* OSErrors internally, which is not a good idea.

    Popen.communicate() hides EPIPE in the current implementation, so its behaviour
    shouldn't change. But subprocess' code allows Popen.stdin.write() to raise "broken pipe".
    My questions are about the _nonblocking() methods: e.g., it is not obvious whether
    write_nonblocking() should mask EPIPE as EOF. If it does, then that should be documented.
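
    As background for the distinction being discussed, the three conditions already surface differently at the plain os/pipe level on POSIX (no subprocess involved): "would block", EOF, and a broken pipe.

    ```python
    import os

    r, w = os.pipe()
    os.set_blocking(r, False)

    try:
        os.read(r, 100)                 # empty but still-open pipe
    except BlockingIOError:
        print("would block: no data yet, more may come")

    os.write(w, b"x")
    print(os.read(r, 100))              # b'x' -- data was available

    os.close(w)
    print(os.read(r, 100))              # b'' -- EOF, nothing more is coming
    os.close(r)

    r2, w2 = os.pipe()
    os.close(r2)                        # no reader left on this pipe
    try:
        os.write(w2, b"x")
    except BrokenPipeError:
        print("broken pipe (EPIPE): permanent error")
    os.close(w2)
    ```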

    @vstinner vstinner changed the title asynchronous Subprocess add non-blocking read and write methods to subprocess.Popen Jun 19, 2014
    @vstinner
    Copy link
    Member

    I have implemented and would continue to lean towards continuing to hide BrokenPipeError on the additional API endpoints.

    FYI asyncio.Process.communicate() ignores BrokenPipeError and ConnectionResetError, whereas asyncio.Process.stdin.drain() (coroutine to wait until all bytes are written) raises a BrokenPipeError or ConnectionResetError if the child process exited. I think subprocess has the same design.

    (I recently modified asyncio to ignore BrokenPipeError in communicate(); not ignoring it was a bug.)

    @4kir4
    Copy link
    Mannequin

    4kir4 mannequin commented Jul 24, 2014

    STINNER Victor <report@bugs.python.org> writes:

    > > I have implemented and would continue to lean towards continuing to
    > > hide BrokenPipeError on the additional API endpoints.
    >
    > FYI asyncio.Process.communicate() ignores BrokenPipeError and
    > ConnectionResetError, whereas asyncio.Process.stdin.drain() (coroutine
    > to wait until all bytes are written) raises a BrokenPipeError or
    > ConnectionResetError if the child process exited. I think subprocess
    > has the same design.

    Do Popen.write_nonblocking() and Popen.read_nonblocking() methods
    belong to the second category? Should they raise BrokenPipeError?

    @vstinner
    Copy link
    Member

    Do Popen.write_nonblocking() and Popen.read_nonblocking() methods
    belong to the second category? Should they raise BrokenPipeError?

    IMO when you write directly to stdin and read from stdout/stderr of a
    child process, your code should be written to handle BrokenPipeError.
    You must decide how to handle it. Otherwise, you have to manually poll
    the child process using proc.poll(), which is less efficient.

    If you forget to poll the process, your program may enter an infinite
    loop which only occurs in some cases (e.g. a bug in the child process
    that makes it exit before reading the whole stdin input).
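
    A small sketch of the handling being suggested: feed a child's stdin and treat a broken pipe as "the child has gone away", instead of calling proc.poll() around every write. `head -c 10` here is just a convenient child that stops reading early; it is not part of the proposal.

    ```python
    import subprocess

    proc = subprocess.Popen(["head", "-c", "10"],
                            stdin=subprocess.PIPE, stdout=subprocess.PIPE)
    try:
        for _ in range(100):
            proc.stdin.write(b"x" * 4096)
            proc.stdin.flush()
    except BrokenPipeError:
        print("child stopped reading; dropping the remaining input")
    finally:
        try:
            proc.stdin.close()
        except BrokenPipeError:
            pass            # closing may flush buffered data into the dead pipe
        proc.wait()
    ```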

    @josiahcarlson
    Copy link
    Mannequin Author

    josiahcarlson mannequin commented Mar 22, 2015

    Okay, I'm sorry for falling asleep at the wheel on this. At this point, I don't expect this to go in for 3.5 - but if it has a chance, I'll do what I need to do to get it there. Let me know.

    I agree with the .communicate() not raising on broken pipe, and read/write_nonblocking raising on broken pipe. I'll get a patch in with the comments on the existing code review, along with those changes on Monday or Tuesday.

    @vadmium
    Copy link
    Member

    vadmium commented Mar 22, 2015

    Regarding special cases on the reading side, I think there should be an easy way to distinguish them. The existing RawIOBase way is probably as good as any, unless you want to take the chance to invent a better API:

    • EOF (pipe closed, no more data ever) should be indicated by returning b""
    • Empty pipe (still open, future data possible) should be indicated by returning None

    Beware that BlockingIOError is normally not raised by Python’s standard library, despite the BufferedIOBase documentation, and I think it should stay that way (the implementation, that is). See bpo-13322.

    Regarding handling a broken pipe condition, it would be nice to have a way of signalling that to the caller, whether via an exception or other means. For example, if the subprocess is a video player which gets closed early by the end user, it would be good to know to stop wasting time and bandwidth generating and piping video to the player.
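
    The RawIOBase convention mentioned above, demonstrated on a plain POSIX pipe wrapped in a raw (unbuffered) file object: None means "empty but still open", b"" means "closed, no more data ever".

    ```python
    import os

    r, w = os.pipe()
    os.set_blocking(r, False)
    raw = open(r, "rb", buffering=0)    # buffering=0 gives a raw FileIO object

    print(raw.read(100))                # None  -> would block; writer still open
    os.write(w, b"data")
    print(raw.read(100))                # b'data'
    os.close(w)
    print(raw.read(100))                # b''   -> EOF; nothing more is coming
    raw.close()
    ```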

    @josiahcarlson
    Copy link
    Mannequin Author

    josiahcarlson mannequin commented Mar 25, 2015

    I'm going to be honest; seeing None being returned from a pipe read feels *really* broken to me. When I get None returned from an IO read operation, my first instinct is "there can't be anything else coming, why else would it return None?"

    After changing writing to raise BrokenPipeError on a closed pipe, I've also changed reading to do the same and added a test to verify the behavior.

    @4kir4
    Copy link
    Mannequin

    4kir4 mannequin commented Mar 25, 2015

    I'm going to be honest; seeing None being returned from a pipe read feels *really* broken to me. When I get None returned from an IO read operation, my first instinct is "there can't be anything else coming, why else would it return None?"

    That is how it is done in similar cases (returning None to indicate
    "would block"). Do you know a better method that would allow one to
    distinguish between EOF (no future data, ever) and "would block"
    (future data possible) without calling process.poll()?

    @vstinner
    Copy link
    Member

    Returning None for non blocking I/O is standard in Python.

    @vadmium
    Copy link
    Member

    vadmium commented Mar 26, 2015

    Josiah’s code now seems to handle the four special cases like this:

    • Reading a closed pipe raises BrokenPipeError. This is novel to me, and bends the current definition of the exception. But it appears to be the way Windows works at a lower level, and does make a bit of sense to me. However it is inconsistent with how normal reading from files, pipes and sockets works, which return the special value b"" in the equivalent case. Also various communications protocols use an empty packet to signal EOF.

    • Reading an empty pipe returns b"". This is consistent with partial reads, which I like. However it could be surprising for someone used to b"" signalling EOF. Alternatives are returning None like RawIOBase.read(), and raising BlockingIOError like os.write() and socket.send().

    • Writing to a closed pipe raises BrokenPipeError. This is consistent with how other APIs work, but since it is an asynchronous condition, a naive program may work sometimes without handling it, and then see spurious tracebacks or worse in other situations. Maybe dropping the data and setting a flag could be an alternative.

    • Writing to a full pipe returns zero. This is consistent with partial writes, which I like. Alternatives are returning None like RawIOBase.write(), and raising BlockingIOError like os.write(), socket.send(), BufferedIOBase.write().
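
    For readers following along, this is roughly how a caller would drive the behaviour summarised above. It is written against the never-merged patch's proposed method names (write_nonblocking() and read_nonblocking(), mentioned earlier in the thread), so it is illustrative usage rather than something that runs on stock subprocess.

    ```python
    import time

    def pump(proc, data):
        """Push data to the child and collect output until both pipes close."""
        collected = b""
        writing, reading = bool(data), True
        while writing or reading:
            progressed = False
            if writing:
                try:
                    sent = proc.write_nonblocking(data)   # 0 means "pipe buffer full"
                    data = data[sent:]
                    progressed = sent > 0
                    writing = bool(data)
                except BrokenPipeError:
                    writing = False                       # child's stdin is gone
            if reading:
                try:
                    chunk = proc.read_nonblocking(4096)   # b'' means "nothing yet"
                    collected += chunk
                    progressed = progressed or bool(chunk)
                except BrokenPipeError:
                    reading = False                       # stdout closed: real EOF
            if not progressed:
                time.sleep(0.01)                          # avoid spinning
        return collected
    ```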

    @josiahcarlson
    Copy link
    Mannequin Author

    josiahcarlson mannequin commented Mar 26, 2015

    Non-blocking IO returning a None on "no data currently available" is new to me. It is new to me simply because I don't recall it ever happening in Python 2.x with any socket IO that I ever dealt with, and socket IO is my primary source for non-blocking IO experience.

    From my experience, reading from a socket would always return a string, unless the connection was closed/broken, at which point you got a bad file descriptor exception. Writing would return 0 if the buffer was full, or a bad file descriptor exception if the connection was closed/broken. The implementation of asyncore/asynchat at least until 3.0 shows this in practice, and I doubt this has changed in the intervening years since I updated those libraries for 3.0.

    My choice to implement this based on BrokenPipeError was based on reading Victor(haypo)'s comments from 2014-07-23 and 2014-07-24, in which he stated that in asyncio, when it comes to process management, .communicate() hides exceptions so that communication can finish, but direct reading and writing surface those exceptions.

    I was attempting to mimic the asyncio interface simply because it made sense to me coming from the socket world, and because asyncio is where people are likely to go if the non-blocking subprocess support in subprocess is insufficient for their needs.

    Note that I also initially resisted raising an exception in those methods, but Victor(haypo)'s comments changed my mind.

    Do you know a better method that would allow to distinguish between EOF (no future data, ever) and "would block" (future data possible) without calling process.poll()?

    Better is subjective when it comes to API, but different? Yes, it's already implemented in patch #8. BrokenPipeError exception when no more data is sendable/receivable. A 0 or b'' when writing to a full buffer, or reading from an empty buffer. This API tries to be the same as the asyncio.subprocess.Process() behavior when accessing stdin, stdout, and stderr directly.

    Returning None for non blocking I/O is standard in Python.

    In some contexts, yes. In others, no. The asyncio.StreamReader() coroutines .read(), .readline(), and .readexactly() return strings. Raw async or sync sockets don't return None on read. SSL-wrapped async or sync sockets don't return None on read. Asyncio low-level socket operations don't yield None on a (currently empty) read.

    In the context of the subprocess module as it exists in 3.4 (and 3.5 as it is unpatched), the only object that returns None on read when no data is available, but where data *might* be available in the future, is an underlying posix pipe (on posix platforms) - which isn't generally used directly.

    The purpose of this patch is to expose stdin, stdout, and stderr in a way that allows non-blocking reads and writes from the subprocess that also plays nicely with .communicate() as necessary. Directly exposing the pipes doesn't work due to API inconsistencies between Windows and posix, so we have to add a layer.

    I would argue that this layer of non-blocking (Windows and posix) pipe abstraction has much more in common with how asyncio deals with process IO, or how sockets deal with IO, than it does with pipes in general (the vast majority of which are blocking), so I would contend that it should have a similar API (no returning None).

    That said, considering that the expected use of non-blocking subprocess communication *does not* include using a multiplexer (select, poll, etc.), I have the sense that a few other higher-level methods might be warranted to ease use substantially, and I would expect people to end up using them much more often than the lower-level direct read/write operations (different names might make more sense here):
    .write_all(data, timeout=None)
    .read_available(bufsize=-1) (and .read_stderr_available)
    .read_exactly(bufsize, timeout=None) (and .read_stderr_exactly)
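
    A sketch of the proposed write_all(data, timeout=None) helper, layered on the hypothetical AsyncPopen.send() from the earlier sketch (a partial write returns the byte count, 0 when the pipe buffer is full); these helper names mirror the proposal and are not part of subprocess.

    ```python
    import time

    def write_all(proc, data, timeout=None):
        """Keep writing until all data is sent; raise if the child dies or time runs out."""
        deadline = None if timeout is None else time.monotonic() + timeout
        total = 0
        while total < len(data):
            sent = proc.send(data[total:])          # hypothetical non-blocking write
            total += sent
            if not sent:
                if proc.poll() is not None:
                    raise BrokenPipeError("child exited with %d bytes unwritten"
                                          % (len(data) - total))
                if deadline is not None and time.monotonic() > deadline:
                    raise TimeoutError("write_all timed out")
                time.sleep(0.01)                    # give the child a chance to drain the pipe
        return total
    ```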

    @vadmium
    Copy link
    Member

    vadmium commented Mar 27, 2015

    Yes, I think the special return value of None is only standard to the “io” module, which was added in Python 2.6 and 3. And even there it is inconsistent.

    For what it’s worth, I don’t have strong opinions on the special non-blocking and broken pipe cases, as long as they can be differentiated. So the current situation is fine by me. Just that potentially surprising differences with other APIs [e.g. read() -> b"" means empty buffer, not closed pipe] need documenting.

    The asyncio.StreamReader.read() etc methods are documented as coroutines, so the problem of differentiating an empty buffer from a closed pipe does not exist there.

    Just a thought: would it make sense for these methods to be decoupled from the subprocess class, and generalized to apply to any pipes, as a kind of Windows vs POSIX abstraction layer? Then you wouldn’t have to add duplicate methods like read_stderr_available() and read_stderr_exactly() all the time. You might also have a more portable way of doing non-blocking reads from sys.stdin, etc as well.

    @josiahcarlson
    Copy link
    Mannequin Author

    josiahcarlson mannequin commented May 2, 2019

    Someone else can resurrect this concept and/or patch if they care about this feature. Best of luck to future readers.

    @josiahcarlson josiahcarlson mannequin closed this as completed May 2, 2019
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 9, 2022