Issue 24358: Should compression file-like objects provide .fileno(), misleading subprocess?



This issue has been migrated to GitHub: https://github.com/python/cpython/issues/68546

classification

Title:	Should compression file-like objects provide .fileno(), misleading subprocess?
Type:		Stage:
Components:	IO	Versions:	Python 3.6, Python 3.4, Python 3.5, Python 2.7

process

Status:	open	Resolution:
Dependencies:		Superseder:
Assigned To:		Nosy List:	benjamin.peterson, jonash, josh.r, martin.panter, pitrou, stutzbach
Priority:	normal	Keywords:

Created on 2015-06-02 01:03 by josh.r, last changed 2022-04-11 14:58 by admin.

Messages (7)
msg244626 - (view)	Author: Josh Rosenberg (josh.r) *	Date: 2015-06-02 01:03
subprocess, when accepting objects for stdin, stdout, and stderr, assumes that possessing a .fileno() means it's a legitimate object for use with the forked process; that the file descriptor is interchangeable with the object itself. But gzip, bz2 and lzma file-like objects all violate this rule; they provide .fileno(), but it's unadorned. Providing .fileno() on these objects is misleading, since they produce the uncompressed data (likely useless) which causes subprocess to pass the "wrong" data to the subprocess, or write uncompressed data from the process (the exception being processes that expect compressed data from stdin or write compressed data to stdout, but that usually just means the compressor utilities themselves). Is subprocess's assumption about fileno() (that you can read equivalent data from it, modulo issues with flushing/seeking) intended? If so, should .fileno() be removed from the compressed file interfaces? If not, should subprocess attempt to perform further checking, document this wart, or something else?
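A minimal sketch of the behaviour described in this message, not taken from the report itself. The helper name `child_views` and the echoing child command `ECHO` are made up for illustration. The child copies its stdin to stdout, so the caller can compare what the child receives when handed the GzipFile object directly with what it receives when the decompressed bytes are piped in explicitly.

```python
import gzip
import os
import subprocess
import sys
import tempfile

# Hypothetical child command: copies its stdin to stdout unchanged.
ECHO = [sys.executable, "-c",
        "import sys; sys.stdout.buffer.write(sys.stdin.buffer.read())"]


def child_views(payload: bytes):
    """Return (bytes the child sees via fileno(), bytes it sees via a pipe)."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "data.gz")
        with gzip.open(path, "wb") as f:
            f.write(payload)

        # subprocess calls gz.fileno() and hands the child the descriptor
        # of the underlying file on disk, bypassing decompression.
        with gzip.open(path, "rb") as gz:
            via_fileno = subprocess.run(
                ECHO, stdin=gz, stdout=subprocess.PIPE, check=True
            ).stdout

        # Decompress in Python and send the result through a pipe instead.
        # This reads everything into memory, which is fine for a small demo.
        with gzip.open(path, "rb") as gz:
            via_pipe = subprocess.run(
                ECHO, input=gz.read(), stdout=subprocess.PIPE, check=True
            ).stdout
    return via_fileno, via_pipe


if __name__ == "__main__":
    raw, decoded = child_views(b"hello from the parent\n")
    print("via fileno():", raw[:2])   # start of the gzip stream
    print("via pipe:    ", decoded)
```

The first two bytes the child sees through the descriptor are the gzip magic number, which is the quickest way to confirm that the child was handed the file on disk rather than the data the Python object would return from read().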
msg244632 - (view)	Author: Josh Rosenberg (josh.r) *	Date: 2015-06-02 01:39
Apparently a similar issue occurs when tarfile assumes a GzipFile can have its fileno() fstat-ed (see #22468). An awful lot of libraries seem to assume that fileno() will provide useful information about the data you'd read from the file-like object itself, but all the compressed file-like objects violate that expectation.
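The size mismatch can be seen without tarfile at all. The sketch below is an illustration only; `sizes` is a hypothetical helper, not code from tarfile or from #22468. It compares the size reported by fstat on the GzipFile's descriptor with the number of bytes actually read from the object.

```python
import gzip
import os
import tempfile


def sizes(payload: bytes):
    """Return (st_size from fstat on fileno(), length of the data read)."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "data.gz")
        with gzip.open(path, "wb") as f:
            f.write(payload)
        with gzip.open(path, "rb") as gz:
            # fstat describes the compressed file on disk ...
            on_disk = os.fstat(gz.fileno()).st_size
            # ... while read() returns the decompressed stream.
            logical = len(gz.read())
    return on_disk, logical


if __name__ == "__main__":
    on_disk, logical = sizes(b"a" * 10000)
    print("fstat size:", on_disk, "bytes read:", logical)
```

Any code that uses st_size as the length of the stream it is about to read will get the wrong answer for highly compressible input like this.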
msg244652 - (view)	Author: Martin Panter (martin.panter) *	Date: 2015-06-02 05:34
Also related: Issue 23740, where the HTTP client assumes it can use stat() on the fileno() to determine the Content-Length. Providing fileno() on file wrapper objects like GzipFile is certainly not necessary, but it could be useful. For instance in the tarfile case, the modification time, file mode, owner user, etc. may still be useful even if the file size isn’t. On the other hand, fileno() is a low level operation, so maybe it should only have been made available on light-weight RawIOBase objects or something. Even for a BufferedReader/Writer or TextIOWrapper, the data read from or written to the Python-level file object does not match the corresponding file descriptor operations. You could get deadlocks, data loss, etc., due to buffering. For example, see the commented-out code near the bottom of <https://bugs.python.org/review/1191964/patch/11993/43982#newcode1901>. The subprocess module documentation only says that the streams can be “existing file objects”. I think it should at least be clarified to say the file objects are taken to represent OS-level file descriptors.
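The point about buffered objects can be checked with an ordinary file. The following is a small sketch with a hypothetical helper name (`offsets_after_one_byte`). It reads a single byte through a BufferedReader and then asks the OS where the descriptor's offset is.

```python
import os
import tempfile


def offsets_after_one_byte(size: int = 4096):
    """Return (Python-level position, OS-level descriptor offset)."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "plain.bin")
        with open(path, "wb") as f:
            f.write(b"x" * size)
        with open(path, "rb") as f:  # open() returns a BufferedReader here
            f.read(1)
            python_pos = f.tell()
            # lseek with offset 0 from SEEK_CUR reports the current
            # descriptor offset without moving it.
            fd_pos = os.lseek(f.fileno(), 0, os.SEEK_CUR)
    return python_pos, fd_pos


if __name__ == "__main__":
    python_pos, fd_pos = offsets_after_one_byte()
    print("file object position:", python_pos)
    print("descriptor offset:   ", fd_pos)
```

The file object reports that one byte has been consumed, while the descriptor has already advanced past the data that was read ahead into the buffer. A child process inheriting the descriptor at this point would start reading from the descriptor offset, not from the Python-level position.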
msg244663 - (view)	Author: Josh Rosenberg (josh.r) *	Date: 2015-06-02 10:32
Blech, typo earlier "since they produce the compressed data (likely useless) when read as subprocess stdin". Context should make it obvious, but trying to be clear.
msg265455 - (view)	Author: Jonas H. (jonash) *	Date: 2016-05-13 08:48
I just hit this too. I'd say remove the fileno() method from wrapper objects like GzipFile. I'm happy to submit a patch.
msg265464 - (view)	Author: Martin Panter (martin.panter) *	Date: 2016-05-13 11:36
Where would you draw the line though? At one extreme, BufferedWriter is a wrapper, but if you removed BufferedWriter.fileno() it would break all the code that does

    with open(os.devnull, "wb") as null:
        proc = subprocess.Popen(..., stdout=null)

Would you remove it from HTTPResponse? Some HTTP responses can be chunked, so the child process would see the chunk headers using fileno(). But other HTTP responses are more direct and would work smoothly with the subprocess module.

Considering the compatibility problems and other possible uses of fileno(), I suspect removing it would be a bad idea. “Throwing the baby out with the bathwater” comes to mind. A less drastic change would be to require an explicit fileno() call and only pass the file descriptor to subprocess. There is already a bug open for improving the documentation: Issue 19992
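For reference, this is roughly what the explicit-fileno() style would look like in calling code today, since subprocess already accepts an integer file descriptor for its stream arguments. It is a sketch of the caller's side only, not a proposed change to subprocess, and `run_discarding_output` is a hypothetical helper name.

```python
import os
import subprocess
import sys


def run_discarding_output(cmd):
    """Run cmd with stdout sent to the null device; return its exit code."""
    with open(os.devnull, "wb") as null:
        # Passing the descriptor makes it explicit that the child writes
        # to the OS-level file, not through the Python file object.
        proc = subprocess.run(cmd, stdout=null.fileno(), check=False)
    return proc.returncode


if __name__ == "__main__":
    code = run_discarding_output([sys.executable, "-c", "print('discarded')"])
    print("child exit code:", code)
```

Written this way, a caller holding a GzipFile has to type .fileno() themselves, which at least makes the decision to bypass the wrapper visible at the call site.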
msg265523 - (view)	Author: Martin Panter (martin.panter) *	Date: 2016-05-14 11:53
Also stumbled upon Issue 1705393: confusion with select() and buffered files

History
Date	User	Action	Args
2022-04-11 14:58:17	admin	set	github: 68546
2016-05-14 11:53:23	martin.panter	set	messages: + msg265523
2016-05-13 11:36:25	martin.panter	set	messages: + msg265464
2016-05-13 08:48:56	jonash	set	nosy: + jonash messages: + msg265455
2015-06-02 10:32:32	josh.r	set	messages: + msg244663
2015-06-02 08:39:10	serhiy.storchaka	set	nosy: + pitrou, benjamin.peterson, stutzbach
2015-06-02 05:34:16	martin.panter	set	nosy: + martin.panter messages: + msg244652
2015-06-02 01:39:33	josh.r	set	messages: + msg244632
2015-06-02 01:03:01	josh.r	create