classification
Title: Cannot pipe GzipFile into subprocess
Type: behavior
Stage:
Components: IO, Library (Lib)
Versions: Python 3.7

process
Status: open
Resolution:
Dependencies:
Superseder:
Assigned To:
Nosy List: Nehal Patel, SilentGhost, giampaolo.rodola, mherrmann.at
Priority: normal
Keywords:

Created on 2020-06-06 00:56 by Nehal Patel, last changed 2022-04-11 14:59 by admin.

Messages (4)
msg370804 - Author: Nehal Patel (Nehal Patel) Date: 2020-06-06 00:56
The following code produces incorrect behavior:

import gzip
import subprocess

with gzip.open("foo.gz") as gz:
    res = subprocess.run("cat", stdin=gz, capture_output=True)

The contents of res.stdout are identical to the contents of "foo.gz", i.e. the compressed bytes rather than the decompressed data.

It seems the subprocess somehow gets a hold of the underlying file descriptor pointing to the compressed file, and ends up being fed the compressed bytes.
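
The guess above can be checked directly: subprocess asks the object passed as stdin for its fileno(), and GzipFile.fileno() is documented to invoke the underlying file object's fileno(). The following is a minimal sketch rather than part of the original report; the temporary file and the helper name gzip_shares_fd are assumptions made for illustration.

```python
import gzip
import os
import tempfile


def gzip_shares_fd(path):
    """Return True if the GzipFile reports the descriptor of the raw file beneath it."""
    with gzip.open(path, "rb") as gz:
        # gz.fileobj is the raw (compressed) file object that gzip.open created.
        return gz.fileno() == gz.fileobj.fileno()


# Illustrative setup: write a small gzip file into a temporary directory.
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "foo.gz")
    with gzip.open(path, "wb") as f:
        f.write(b"hello\n")
    shares_fd = gzip_shares_fd(path)

print(shares_fd)
```

Because the child process inherits that descriptor, it reads the file as stored on disk, and the decompression logic in the parent never runs.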
msg370815 - Author: SilentGhost (SilentGhost) * (Python triager) Date: 2020-06-06 09:09
> subprocess somehow gets a hold of the underlying file descriptor pointing to the compressed file, and ends up being fed the compressed bytes

That is exactly what happens, and I'd wager this is not going to change. You could easily pass the decoded bytes into the process using the input parameter of subprocess.run.
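
A minimal sketch of the input-parameter suggestion follows. It assumes the decompressed data fits in memory; CAT_CMD is a hypothetical, portable stand-in for "cat", and the helper name run_with_decompressed_input is illustrative only.

```python
import gzip
import os
import subprocess
import sys
import tempfile

# Stand-in for "cat" that works wherever Python does (an assumption for portability).
CAT_CMD = [sys.executable, "-c",
           "import sys; sys.stdout.buffer.write(sys.stdin.buffer.read())"]


def run_with_decompressed_input(path, cmd):
    """Decompress path fully in memory and send the bytes to cmd's stdin."""
    with gzip.open(path, "rb") as gz:
        data = gz.read()  # the whole decompressed payload is held in RAM
    return subprocess.run(cmd, input=data, capture_output=True)


# Illustrative setup: a small gzip file in a temporary directory.
payload = b"hello from gzip\n"
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "foo.gz")
    with gzip.open(path, "wb") as f:
        f.write(payload)
    res = run_with_decompressed_input(path, CAT_CMD)

print(res.stdout == payload)
```

This trades memory for simplicity, so it is unlikely to suit very large inputs.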
msg370820 - Author: Nehal Patel (Nehal Patel) Date: 2020-06-06 12:44
In my use case, I was actually trying to stream a large gzip file from the cloud directly into a subprocess without spilling onto disk or into RAM, i.e. the code actually looked more like:

import gzip
import os
import subprocess

r, w = os.pipe()
# ... launch a thread that writes the compressed bytes to w
with gzip.open(os.fdopen(r, 'rb')) as gz:
    res = subprocess.run("myexe", stdin=gz, capture_output=True)
# FYI: the expected output is tiny
 
(In my case, I could modify the executable to expect compressed input, so I chose that solution. Another possibility would have been to use subprocess.Popen twice, once with 'gzcat' and a second time with 'myexe'.)
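
The two-Popen idea might look roughly like the sketch below. DECOMPRESS_CMD and CONSUMER_CMD are hypothetical Python one-liners standing in for 'gzcat' and 'myexe', so that the example does not depend on external tools, and run_pipeline is an illustrative name.

```python
import gzip
import os
import subprocess
import sys
import tempfile

# Portable stand-ins (assumptions): DECOMPRESS_CMD plays the role of "gzcat",
# CONSUMER_CMD plays the role of "myexe" and simply echoes its input.
DECOMPRESS_CMD = [sys.executable, "-c",
                  "import gzip, shutil, sys; "
                  "shutil.copyfileobj("
                  "gzip.GzipFile(fileobj=sys.stdin.buffer, mode='rb'), "
                  "sys.stdout.buffer)"]
CONSUMER_CMD = [sys.executable, "-c",
                "import sys; sys.stdout.buffer.write(sys.stdin.buffer.read())"]


def run_pipeline(src, decompress_cmd, consumer_cmd):
    """Chain src -> decompress_cmd -> consumer_cmd; return (returncode, stdout)."""
    decomp = subprocess.Popen(decompress_cmd, stdin=src, stdout=subprocess.PIPE)
    consumer = subprocess.Popen(consumer_cmd, stdin=decomp.stdout,
                                stdout=subprocess.PIPE)
    decomp.stdout.close()  # let decomp see a broken pipe if consumer exits early
    out, _ = consumer.communicate()
    decomp.wait()
    return consumer.returncode, out


# Illustrative setup: compressed data in a temporary file.
payload = b"line\n" * 1000
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "foo.gz")
    with gzip.open(path, "wb") as f:
        f.write(payload)
    with open(path, "rb") as src:
        returncode, out = run_pipeline(src, DECOMPRESS_CMD, CONSUMER_CMD)

print(returncode, out == payload)
```

Here src only needs a real file descriptor, so the read end of an os.pipe() should work in place of the temporary file.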

I agree that given how libgz works, it would be difficult to fix the problem. I would suggest finding a way to alert the user about this issue, because the behavior is generally very confusing when it happens.
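
For the streaming use case described in this message, one in-process alternative is to decompress in a helper thread and write the chunks to the child's stdin pipe, so that neither the compressed nor the decompressed data is held in full. This is a sketch under assumptions, not something proposed in the thread: stream_gzip_to_process and CONSUMER_CMD are hypothetical names, and the BytesIO object stands in for the cloud stream.

```python
import gzip
import io
import shutil
import subprocess
import sys
import threading

# Hypothetical stand-in for "myexe": echoes stdin to stdout.
CONSUMER_CMD = [sys.executable, "-c",
                "import sys; sys.stdout.buffer.write(sys.stdin.buffer.read())"]


def stream_gzip_to_process(compressed_stream, cmd, chunk_size=64 * 1024):
    """Feed decompressed chunks to cmd's stdin; return (returncode, stdout)."""
    proc = subprocess.Popen(cmd, stdin=subprocess.PIPE, stdout=subprocess.PIPE)

    def feed():
        try:
            with gzip.GzipFile(fileobj=compressed_stream, mode="rb") as gz:
                # Copies chunk by chunk, so memory use stays bounded.
                shutil.copyfileobj(gz, proc.stdin, chunk_size)
        except BrokenPipeError:
            pass  # the child exited before reading all of its input
        finally:
            try:
                proc.stdin.close()  # signals end of input to the child
            except OSError:
                pass

    feeder = threading.Thread(target=feed, daemon=True)
    feeder.start()
    out = proc.stdout.read()  # read concurrently so the child cannot block on output
    feeder.join()
    proc.stdout.close()
    returncode = proc.wait()
    return returncode, out


# Illustrative input: an in-memory gzip stream of about 1 MB once decompressed.
payload = b"line\n" * 200000
returncode, out = stream_gzip_to_process(io.BytesIO(gzip.compress(payload)),
                                         CONSUMER_CMD)

print(returncode, len(out))
```

Reading stdout in the main thread while the feeder writes avoids the deadlock that can occur when both pipes fill up.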
msg404854 - Author: Michael Herrmann (mherrmann.at) Date: 2021-10-23 06:17
I just encountered what seems to be the inverse problem of this issue: #45585
History
Date                 User          Action  Args
2022-04-11 14:59:32  admin         set     github: 85062
2021-10-23 06:17:00  mherrmann.at  set     nosy: + mherrmann.at; messages: + msg404854
2020-06-06 12:44:48  Nehal Patel   set     messages: + msg370820
2020-06-06 09:09:24  SilentGhost   set     nosy: + SilentGhost, giampaolo.rodola; messages: + msg370815; components: + Library (Lib)
2020-06-06 00:56:53  Nehal Patel   create