Title: io documentation unclear about flush() and close() semantics for wrapped streams
Type: enhancement Stage:
Components: Interpreter Core Versions: Python 3.8, Python 3.7, Python 3.6, Python 3.4, Python 3.5, Python 2.7
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: indygreg, josh.r
Priority: normal Keywords:

Created on 2019-02-26 21:47 by indygreg, last changed 2019-03-23 17:04 by indygreg.

Messages (4)
msg336715 - Author: Gregory Szorc (indygreg) * Date: 2019-02-26 21:47
As part of implementing io.RawIOBase/io.BufferedIOBase compatible stream types for python-zstandard, I became confused about the expected behavior of flush() and close() when a stream is wrapping another stream.

The documentation doesn't lay out explicitly when flush() and close() on the outer stream should be proxied to the inner stream, if ever.

Here are some specific questions (it would be great to have answers to these in the docs):

1. When flush() is called on the outer stream, should flush() be called on the inner stream as well? Or should we simply write() to the inner stream and not perform a flush() on that inner stream?

2. When close() is called on the outer stream, should close() be called on the inner stream as well?

3. If close() is not automatically called on the inner stream during an outer stream's close(), should the outer stream trigger a flush() or any other special behavior on the inner stream? Or should it just write() any lingering data and then go away?

4. Are any of the answers from 1-3 impacted by whether the stream is a reader or writer? (Obviously reader streams don't have a meaningful flush() implementation.)

5. Are any of the answers from 1-3 impacted by whether the stream is in blocking/non-blocking mode?

6. Do any of the answers from 1-3 vary depending on whether the behavior is triggered by the outer stream's __exit__?
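To make the questions concrete, here is a minimal wrapper written in the io style. The class name and the pass-through "compression" are purely illustrative (not python-zstandard's actual API); the comments mark exactly where the undocumented decisions arise:

```python
import io

class CompressionWriter(io.RawIOBase):
    """Hypothetical stream wrapper illustrating questions 1-3."""

    def __init__(self, inner):
        self._inner = inner  # the wrapped stream

    def writable(self):
        return True

    def write(self, data):
        # Real compression elided; pass bytes through unchanged.
        return self._inner.write(bytes(data))

    def flush(self):
        # Question 1: cascade the flush to the inner stream, or only
        # write() any pending data to it?  This sketch cascades.
        if not self.closed and not self._inner.closed:
            self._inner.flush()

    def close(self):
        # Questions 2/3: close the inner stream too, or merely flush it?
        # This sketch cascades.  Note IOBase.close() also calls
        # self.flush(), hence the closed-guard in flush() above.
        if not self.closed:
            try:
                self._inner.close()
            finally:
                super().close()
```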

(This issue was inspired by
msg336724 - (view) Author: Josh Rosenberg (josh.r) * (Python triager) Date: 2019-02-27 00:25
The general rule of thumb is to have the API behave as if no wrapping is occurring. The outermost layer should still adhere to the documented API requirements, so both flush and close should cascade (flush says it flushes the "buffers", plural; all user-mode buffers should be flushed, no matter the layer). And closing the outer layer without closing the inner makes it non-intuitive as to *how* you close the inner layers at all.

So the answers are:

1. flush all layers when told to flush
2. close all layers when told to close (exception for when you accept a fd, you can take a closefd argument and avoid closing the fd itself, but all Python layers should be closed)
3. N/A (close cascades)
4. No.
5. Non-blocking I/O doesn't seem to have a completely consistent story, but from the existing classes, I'm guessing outer layers should try to write to the underlying stream when told to do so, and raise BlockingIOError if they would block. The close story should be roughly the same: try to flush, raise BlockingIOError if it would block, then close (which shouldn't do much, since the buffers are flushed).
6. No.

#5 is weird, and we could probably use a more coherent story for how non-blocking I/O should be done (asyncio provides one story, selectors another, and the io module seems to sort of gloss over the design strategy for non-blocking I/O).

You can derive all but #5 from how the built-in open() behaves; it returns different types (FileIO, Buffered*, TextIOWrapper), but whatever it returns, you don't need to pay attention to the layered aspect if you don't want to (most don't). flush() works as expected (no need to flush each layer independently), close() works as expected (no leaked file handles from only closing the outer layer), and using it with a with statement is identical to explicitly calling close() in a try/finally block.
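For illustration, the layering that open() produces can be rebuilt by hand from io's own classes (a sketch, with BytesIO standing in for FileIO):

```python
import io

# Build the same layering open() produces, by hand:
raw = io.BytesIO()                        # stands in for a FileIO
buffered = io.BufferedWriter(raw)
text = io.TextIOWrapper(buffered, encoding="utf-8")

text.write("hello")
text.flush()                              # cascades: text -> buffered -> raw
assert raw.getvalue() == b"hello"

text.close()                              # closing the outer layer...
assert buffered.closed and raw.closed     # ...closes every inner layer
```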
msg338686 - Author: Gregory Szorc (indygreg) * Date: 2019-03-23 16:46
Thank you for the detailed reply, Josh.

I generally agree with what you are saying. However, I have some follow-ups.

In your answer to #2, you seem to distinguish between an "fd" (implying POSIX file descriptor) and "Python layers" (implying a file object). Is this correct? Should a "closefd" argument only apply when receiving an integer file descriptor? Or should a "closefd" argument also apply when receiving any io.RawIOBase object? If it doesn't apply for any generic file object, is there any recommended way to control whether close() cascades? (My opinion is that "closefd" should apply to any object with a close(), not just integer file descriptors.)

lzma.LZMAFile and gzip.GzipFile do *not* cascade close() when a file object (as opposed to a named file) is passed into the constructor. (Presumably bz2.BZ2File behaves the same way, but the documentation doesn't state this.) This is inconsistent with your answer that close() should always cascade. It's worth noting that the docs for GzipFile explicitly call out reasons why close() does not cascade.
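The non-cascading behavior is easy to observe with an in-memory file object (a sketch):

```python
import io
import lzma

fileobj = io.BytesIO()
with lzma.LZMAFile(fileobj, "wb") as f:
    f.write(b"data")

# LZMAFile.close() finalized the compressed stream but did NOT
# close the file object it was given:
assert not fileobj.closed
assert lzma.decompress(fileobj.getvalue()) == b"data"

fileobj.close()  # the caller remains responsible for this
```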

None of these compression *File constructors accept a "closefd" argument to control the behavior. If the goal is for close() to cascade by default, then it seems useful for these *File types to support close() cascading. Changing the default would be a backwards compatibility break, and I'm not sure that's feasible, but at least you'd have the ability to opt in to the behavior.

It's also worth calling out the behavior of io.BytesIO and io.StringIO. When you close() these streams, you cannot call .getvalue() to get the buffer content. This is consistent with the io.RawIOBase behavior of not allowing I/O after a close(). However, the implication is that if you wrap a BytesIO/StringIO and then a close() on the outer stream cascades into the BytesIO/StringIO, you won't be able to access the written data! In fact, I encountered this problem when writing python-zstandard's unit tests! I had to implement a custom type that allowed access to getvalue() post close() to work around it.
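A sketch of that kind of workaround: a hypothetical BytesIO subclass (the name is invented here, not python-zstandard's actual helper) that snapshots its contents before a cascaded close() destroys them:

```python
import io

class RecordingBytesIO(io.BytesIO):
    """Hypothetical helper: keeps getvalue() usable after close()."""

    def close(self):
        # Snapshot the buffer before the close invalidates it.
        self._saved = super().getvalue()
        super().close()

    def getvalue(self):
        if self.closed:
            return self._saved
        return super().getvalue()

inner = RecordingBytesIO()
wrapper = io.TextIOWrapper(inner, encoding="utf-8")
wrapper.write("hello")
wrapper.close()                       # cascades, closing `inner`...
assert inner.closed
assert inner.getvalue() == b"hello"   # ...but the data survives
```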

Assuming a "closefd" argument applies to all file objects (not just file descriptors) and that all stream wrappers should accept a "closefd" argument to control close() cascade, I think I have a path forward for python-zstandard: I simply add a "closefd" argument to the stream wrapper constructors. But if "closefd" doesn't apply to generic file objects, then I'm still not sure how to proceed, as I don't want to implement behavior that conflicts with the standard library's.
msg338690 - Author: Gregory Szorc (indygreg) * Date: 2019-03-23 17:04
It's also worth noting that if the wrapped stream close() cascading behavior should be configurable, then many additional types in the standard library need a constructor argument to control this behavior. e.g. io.TextIOWrapper, io.BufferedReader, io.BufferedWriter, codecs.StreamReader, codecs.StreamWriter, etc.

In my influenced-by-Rust mind, the problem is similar to "ownership." The question we seem to be dancing around is whether the stream wrapper "owns" the inner stream or just "borrows" it. If it "owns," then close() cascading absolutely makes sense. But if it just "borrows" the inner stream, then close() cascading should not occur. There are viable use cases for both scenarios on pretty much any stream wrapper.

Since existing stream wrappers automatically perform close() cascading and __del__ will call close(), I'd be a bit surprised if others weren't complaining about the inability to disable close() cascade on stream wrappers! e.g. it is perfectly viable to want to temporarily wrap a binary stream with an io.TextIOWrapper, but the forced close() cascading makes this difficult, since destruction of the outer stream will close() the inner stream. So you end up needing to keep references to outer streams alive to prevent this. Eww.
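The own-vs-borrow distinction can be sketched with io.TextIOWrapper (this assumes CPython, where refcounting finalizes the wrapper as soon as the last reference is dropped):

```python
import io

# "Owning" behavior: the wrapper's finalizer closes the inner stream.
binary = io.BytesIO(b"payload")
text = io.TextIOWrapper(binary, encoding="utf-8")
del text                    # CPython finalizes immediately...
assert binary.closed        # ...and close() cascaded into `binary`

# "Borrowing" requires keeping the wrapper alive, or giving up
# ownership explicitly with detach() before letting it go:
binary = io.BytesIO(b"payload")
text = io.TextIOWrapper(binary, encoding="utf-8")
assert text.read() == "payload"
text.detach()               # wrapper no longer owns `binary`
assert not binary.closed    # the binary stream survives
```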
Date User Action Args
2019-03-23 17:04:31indygregsetmessages: + msg338690
2019-03-23 16:46:04indygregsetmessages: + msg338686
2019-02-27 00:25:20josh.rsetnosy: + josh.r
messages: + msg336724
2019-02-26 21:47:13indygregcreate