This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

Title: Should compression file-like objects provide .fileno(), misleading subprocess?
Components: IO
Versions: Python 3.6, Python 3.4, Python 3.5, Python 2.7
Status: open
Nosy List: benjamin.peterson, jonash, josh.r, martin.panter, pitrou, stutzbach
Priority: normal

Created on 2015-06-02 01:03 by josh.r, last changed 2022-04-11 14:58 by admin.

Messages (7)
msg244626 - Author: Josh Rosenberg (josh.r) * (Python triager) Date: 2015-06-02 01:03
subprocess, when accepting objects for stdin, stdout, and stderr, assumes that possessing a .fileno() means the object is legitimate for use with the forked process, i.e. that the file descriptor is interchangeable with the object itself. But gzip, bz2 and lzma file-like objects all violate this rule: they provide .fileno(), but it returns the unadorned descriptor of the underlying compressed file. Providing .fileno() on these objects is misleading, since they produce the uncompressed data (likely useless), which causes subprocess to pass the "wrong" data to the subprocess, or to write uncompressed data from the process (the exception being processes that expect compressed data on stdin or write compressed data to stdout, but that usually just means the compressor utilities themselves).

Is subprocess's assumption about fileno() (that you can read equivalent data from it, modulo issues with flushing/seeking) intended? If so, should .fileno() be removed from the compressed file interfaces? If not, should subprocess attempt to perform further checking, document this wart, or something else?
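A minimal sketch of the mismatch described above, assuming a POSIX-like system and Python 3.5+ (for `subprocess.run` and `bytes.hex`). The child is a throwaway one-liner, not anything from the thread, that reports the first two bytes it reads from stdin. Because subprocess only calls `gz.fileno()`, the child sees the raw gzip stream rather than the decompressed text.

```python
import gzip
import os
import subprocess
import sys
import tempfile

# Child process: print the first two bytes it reads from stdin, as hex.
CHILD = "import sys; sys.stdout.write(sys.stdin.buffer.read(2).hex())"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.txt.gz")
    with gzip.open(path, "wb") as gz:
        gz.write(b"hello, world\n")

    with gzip.open(path, "rb") as gz:
        # subprocess only calls gz.fileno(), i.e. the descriptor of the
        # underlying compressed file, so decompression is bypassed.
        result = subprocess.run([sys.executable, "-c", CHILD], stdin=gz,
                                stdout=subprocess.PIPE, check=True)

seen = result.stdout.decode()
print(seen)  # 1f8b (the gzip magic number), not 6865 (b"he")
```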
msg244632 - Author: Josh Rosenberg (josh.r) * (Python triager) Date: 2015-06-02 01:39
Apparently a similar issue occurs when tarfile assumes a GzipFile can have its fileno() fstat-ed (see Issue 22468). An awful lot of libraries seem to assume that fileno() provides useful information about the data you'd read from the file-like object itself, but all the compressed file-like objects violate that expectation.
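To make the fstat() problem concrete, here is a small sketch (not tarfile's actual code) comparing what `os.fstat()` on a GzipFile's descriptor reports with what reading the object yields. The payload is an arbitrary, highly compressible byte string chosen so the two sizes differ by a wide margin.

```python
import gzip
import os
import tempfile

# Highly compressible payload, so on-disk and decompressed sizes differ a lot.
payload = b"x" * 100000

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.gz")
    with gzip.open(path, "wb") as gz:
        gz.write(payload)

    with gzip.open(path, "rb") as gz:
        # What a library sees if it trusts fstat() on the wrapper's fileno():
        # the size of the *compressed* file on disk.
        size_from_fstat = os.fstat(gz.fileno()).st_size
        # What a caller actually gets by reading through the wrapper.
        size_from_read = len(gz.read())

print(size_from_fstat, size_from_read)
```

Anything that derives a length from `st_size` (an archive member size, an HTTP Content-Length) would be wrong here, while fields such as `st_mtime` or `st_mode` still describe the file on disk.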
msg244652 - Author: Martin Panter (martin.panter) * (Python committer) Date: 2015-06-02 05:34
Also related: Issue 23740, where the HTTP client assumes it can use stat() on the fileno() to determine the Content-Length.

Providing fileno() on file wrapper objects like GzipFile is certainly not necessary, but it could be useful. For instance in the tarfile case, the modification time, file mode, owning user, etc. may still be useful even if the file size isn't.

On the other hand, fileno() is a low-level operation, so maybe it should only have been made available on lightweight RawIOBase objects or something. Even for a BufferedReader/Writer or TextIOWrapper, the data read from or written to the Python-level file object does not match the corresponding file descriptor operations. You could get deadlocks, data loss, etc. due to buffering. For example, see the commented-out code near the bottom of <>.
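The buffering hazard shows up even without compression. A sketch, assuming a POSIX system: the parent's write sits in the `BufferedWriter`'s buffer while the child writes straight to the shared descriptor, so the bytes land in the file in the opposite order from the source code. The child one-liner is illustrative only.

```python
import os
import subprocess
import sys
import tempfile

# Child process: write one line directly to its stdout descriptor.
CHILD = "import sys; sys.stdout.buffer.write(b'child\\n')"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "out.txt")
    with open(path, "wb") as f:      # f is an io.BufferedWriter
        f.write(b"parent\n")         # stays in f's buffer, not yet on disk
        # subprocess only takes f.fileno(); it does not flush f first.
        subprocess.run([sys.executable, "-c", CHILD], stdout=f, check=True)
        # Leaving the with-block closes f, which finally flushes b"parent\n".
    with open(path, "rb") as f:
        contents = f.read()

print(contents)  # b'child\nparent\n'
```

Calling `f.flush()` before starting the child avoids this particular reordering.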

The subprocess module documentation only says that the streams can be “existing file objects”. I think it should at least be clarified to say the file objects are taken to represent OS-level file descriptors.
msg244663 - Author: Josh Rosenberg (josh.r) * (Python triager) Date: 2015-06-02 10:32
Blech, typo in my first message; it should read "since they produce the *compressed* data (likely useless) when read as subprocess stdin". Context should make it obvious, but I'm trying to be clear.
msg265455 - Author: Jonas H. (jonash) * Date: 2016-05-13 08:48
I just hit this too. I'd say remove the fileno() method from wrapper objects like GzipFile. I'm happy to submit a patch.
msg265464 - Author: Martin Panter (martin.panter) * (Python committer) Date: 2016-05-13 11:36
Where would you draw the line though? At one extreme, BufferedWriter is a wrapper, but if you removed BufferedWriter.fileno() it would break all the code that does

with open(os.devnull, "wb") as null:
    proc = subprocess.Popen(..., stdout=null)

Would you remove it from HTTPResponse? Some HTTP responses can be chunked, so a child process given the response's fileno() would see the chunk headers. But other HTTP responses are more direct and would work smoothly with the subprocess module.
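Whatever happens to fileno(), a caller who wants the child to see the same bytes the Python object produces can route them through a pipe instead. A minimal sketch, assuming the whole payload fits in memory (`input=` buffers it); `feed_through_pipe` is a hypothetical helper name, and the child is an illustrative one-liner that echoes stdin back. The in-memory GzipFile stands in for any wrapping object, including a chunked HTTPResponse.

```python
import gzip
import io
import subprocess
import sys

# Child process: copy stdin to stdout unchanged.
CHILD = ("import sys, shutil; "
         "shutil.copyfileobj(sys.stdin.buffer, sys.stdout.buffer)")

def feed_through_pipe(fileobj, args):
    """Hypothetical helper: run args with fileobj's *decoded* bytes on stdin.

    Data is read through the Python-level object (so decompression,
    chunked decoding, etc. are applied) and written to a pipe, instead
    of letting subprocess grab fileobj.fileno().
    """
    result = subprocess.run(args, input=fileobj.read(),
                            stdout=subprocess.PIPE, check=True)
    return result.stdout

# In-memory gzip stream standing in for any wrapping file-like object.
compressed = io.BytesIO(gzip.compress(b"hello, world\n"))
with gzip.GzipFile(fileobj=compressed, mode="rb") as gz:
    echoed = feed_through_pipe(gz, [sys.executable, "-c", CHILD])

print(echoed)  # b'hello, world\n'
```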

Considering the compatibility problems and other possible uses of fileno(), I suspect removing it would be a bad idea. “Throwing the baby out with the bathwater” comes to mind. A less drastic change would be to require an explicit fileno() call and only passing the file descriptor to subprocess.
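The "require an explicit fileno() call" idea can be prototyped outside subprocess. The sketch below is a hypothetical wrapper, `popen_fds_only`, not an existing or proposed stdlib API: it accepts only `None` or an int for the three stream arguments (`subprocess.PIPE`, `DEVNULL` and `STDOUT` are ints), so passing a file object fails loudly instead of silently using its raw descriptor.

```python
import gzip
import io
import os
import subprocess
import sys

def popen_fds_only(args, **kwargs):
    """Hypothetical stricter front end to subprocess.Popen.

    stdin/stdout/stderr must be None or an int (a raw descriptor, or a
    constant such as subprocess.PIPE / subprocess.DEVNULL). File-like
    objects are rejected, so callers must call .fileno() themselves.
    """
    for name in ("stdin", "stdout", "stderr"):
        value = kwargs.get(name)
        if value is not None and not isinstance(value, int):
            raise TypeError(
                "%s must be None or a file descriptor, got %s; pass "
                "obj.fileno() explicitly if that is really intended"
                % (name, type(value).__name__))
    return subprocess.Popen(args, **kwargs)

# Explicit descriptor: accepted, mirroring the os.devnull example.
with open(os.devnull, "wb") as null:
    proc = popen_fds_only([sys.executable, "-c", "print('hi')"],
                          stdout=null.fileno())
    returncode = proc.wait()

# File-like wrapper: rejected instead of passing the raw descriptor.
with gzip.GzipFile(fileobj=io.BytesIO(gzip.compress(b"data")), mode="rb") as gz:
    try:
        popen_fds_only([sys.executable, "-c", "pass"], stdin=gz)
        rejected = None
    except TypeError as exc:
        rejected = str(exc)
```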

There is already a bug open for improving the documentation: Issue 19992
msg265523 - Author: Martin Panter (martin.panter) * (Python committer) Date: 2016-05-14 11:53
Also stumbled upon Issue 1705393: confusion with select() and buffered files
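The select() confusion is the read-side version of the same mismatch. A POSIX-only sketch (select() on pipes does not work on Windows): a buffered readline() pulls both lines out of the pipe, so select() on the object's descriptor reports "not readable" even though the next line is already sitting in Python's buffer.

```python
import os
import select

r, w = os.pipe()
os.write(w, b"one\ntwo\n")       # both lines are now in the OS pipe buffer

reader = open(r, "rb")           # io.BufferedReader over the read end
first = reader.readline()        # the buffered read pulls in BOTH lines

# The OS pipe is now empty, so select() reports "not readable" ...
readable, _, _ = select.select([reader], [], [], 0)
# ... even though the next line is already waiting in Python's buffer.
second = reader.readline()

print(first, readable, second)   # b'one\n' [] b'two\n'

reader.close()
os.close(w)
```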
Date                 User              Action  Args
2022-04-11 14:58:17  admin             set     github: 68546
2016-05-14 11:53:23  martin.panter     set     messages: + msg265523
2016-05-13 11:36:25  martin.panter     set     messages: + msg265464
2016-05-13 08:48:56  jonash            set     nosy: + jonash
                                               messages: + msg265455
2015-06-02 10:32:32  josh.r            set     messages: + msg244663
2015-06-02 08:39:10  serhiy.storchaka  set     nosy: + pitrou, benjamin.peterson, stutzbach
2015-06-02 05:34:16  martin.panter     set     nosy: + martin.panter
                                               messages: + msg244652
2015-06-02 01:39:33  josh.r            set     messages: + msg244632
2015-06-02 01:03:01  josh.r            create