Message 151203 - Python tracker


Author	gregory.p.smith
Recipients	gregory.p.smith
Date	2012-01-13.22:31:54
Message-id	<1326493916.44.0.574522946696.issue13781@psf.upfronthosting.co.za>
In-reply-to

Content

gzip.GzipFile accepts a fileobj parameter with an open file object.

Unfortunately the gzip file format has a header field for an embedded filename, and the gzip module code uses fileobj.name to fill it in.

On Python 2.x, this results in the fake "<fdopen>" name from posixmodule.c being embedded in the output gzipped file.  That causes problems when ungzipping these files with gzip -d or ungzip implementations that always rely on the embedded filename when writing their output file rather than stripping a suffix from the input filename: either they cannot open a file called "<fdopen>", or they can, and each successive ungzip overwrites the previous one.
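To make the embedded name concrete, here is a minimal sketch that writes a gzip stream and reads the name field back out of the header. The helper names make_gzip and read_embedded_name are made up for illustration, the header layout follows RFC 1952, and the explicit mode="wb" and filename arguments are there so the sketch does not depend on any .name attribute.

```python
import gzip
import io

FEXTRA = 0x04  # header flag: an "extra" field precedes the name
FNAME = 0x08   # header flag: a zero-terminated original filename is present


def make_gzip(payload, filename):
    """Compress payload in memory, embedding filename in the header."""
    buf = io.BytesIO()
    with gzip.GzipFile(filename=filename, mode="wb", fileobj=buf) as gz:
        gz.write(payload)
    return buf.getvalue()


def read_embedded_name(data):
    """Return the embedded filename as bytes, or None if there is none."""
    if data[:2] != b"\x1f\x8b":
        raise ValueError("not a gzip stream")
    flags = data[3]
    pos = 10  # fixed-size part of the header
    if flags & FEXTRA:
        xlen = int.from_bytes(data[pos:pos + 2], "little")
        pos += 2 + xlen
    if flags & FNAME:
        end = data.index(b"\x00", pos)
        return data[pos:end]
    return None


if __name__ == "__main__":
    print(read_embedded_name(make_gzip(b"hello", "example.txt")))
    print(read_embedded_name(make_gzip(b"hello", "")))
```

An empty filename leaves the name field out of the header entirely, which is the behaviour this report asks for when no real name is available.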


On Python 3.x the problem is different: the gzip module fails entirely when given an os.fdopen()'ed file object:


$ ./python gzip_fdopen_prob.py 
out_file <_io.BufferedWriter name='FOO.gz'>
out_fd 3
fd_out_file <_io.BufferedWriter name=3>
fd_out_file.name 3
Traceback (most recent call last):
  File "gzip_fdopen_prob.py", line 13, in <module>
    gz_out_file = gzip.GzipFile(fileobj=fd_out_file)
  File "/home/gps/oss/cpython/default/Lib/gzip.py", line 184, in __init__
    self._write_gzip_header()
  File "/home/gps/oss/cpython/default/Lib/gzip.py", line 221, in _write_gzip_header
    fname = os.path.basename(self.name)
  File "/home/gps/oss/cpython/default/Lib/posixpath.py", line 132, in basename
    i = p.rfind(sep) + 1
AttributeError: 'int' object has no attribute 'rfind'

(code attached)
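The attached script is not reproduced on this page. The following is only a guessed reconstruction of the same steps, not the original gzip_fdopen_prob.py: the function name reproduce, the use of os.dup (so the two file objects do not share one descriptor), and the temporary directory are all assumptions. On interpreters where this issue has since been fixed, the GzipFile call is expected to succeed, so the sketch returns any error rather than raising it.

```python
import gzip
import os
import tempfile


def reproduce(directory):
    """Open a file, wrap a duplicate of its descriptor with os.fdopen(),
    and hand that wrapper to GzipFile.

    Returns (name, fd, error): the wrapper's .name, the descriptor it
    wraps, and the AttributeError raised by GzipFile, if any.
    """
    path = os.path.join(directory, "FOO.gz")
    with open(path, "wb") as out_file:
        out_fd = os.dup(out_file.fileno())     # assumption: dup avoids a double close
        fd_out_file = os.fdopen(out_fd, "wb")
        name = fd_out_file.name                # the integer descriptor, not a path
        error = None
        try:
            gz_out_file = gzip.GzipFile(fileobj=fd_out_file)
            gz_out_file.write(b"payload")
            gz_out_file.close()
        except AttributeError as exc:          # the failure shown in the traceback
            error = exc
        finally:
            fd_out_file.close()
    return name, out_fd, error


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        name, fd, error = reproduce(tmp)
        print("fd_out_file.name", repr(name), "fd", fd, "error", error)
```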

The os.fdopen()'ed file object is kindly using the integer file descriptor as its .name attribute.  That might or might not be an issue, but regardless of that:

1) GzipFile should not fail in this case.
2) GzipFile should never embed a fake, made-up filename in its output.

Fixing the gzip module to catch the error above and fall back to an empty b'' filename in the gzip header is easy.
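One way that fallback could look is sketched below. This is not the patch that was applied to the gzip module: header_filename is a hypothetical helper, and treating any "<...>" name as fake is an assumption made here to cover the Python 2.x "<fdopen>" case.

```python
import os


def header_filename(name):
    """Return the bytes to embed as the gzip header filename.

    Falls back to b"" (no embedded name) whenever name is not a usable
    path, e.g. an integer file descriptor or a placeholder like "<fdopen>".
    """
    if not isinstance(name, (str, bytes)):
        return b""                      # e.g. os.fdopen() objects: name is an int
    base = os.path.basename(name)
    if isinstance(base, str):
        try:
            base = base.encode("latin-1")
        except UnicodeEncodeError:
            return b""                  # not representable in the header
    if base.startswith(b"<") and base.endswith(b">"):
        return b""                      # assumption: "<...>" marks a fake name
    if base.endswith(b".gz"):
        base = base[:-3]                # store the name without the .gz suffix
    return base


if __name__ == "__main__":
    for candidate in (3, "<fdopen>", "/tmp/FOO.gz"):
        print(repr(candidate), "->", header_filename(candidate))
```

Checking the type up front, rather than catching the AttributeError after the fact, keeps the fallback independent of how os.path.basename happens to fail.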

What should be done about the .name attribute on fake file objects?  I don't think it should exist at all.

(Another quick test shows that gzip in Python 3.x can't output to a BytesIO fileobj at all; it thinks it is read-only.)
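A likely explanation for the read-only behaviour, offered as an assumption rather than something this report states: when no mode is passed, GzipFile takes the mode from the fileobj, and a BytesIO has no mode attribute, so it defaults to reading. Passing mode="wb" explicitly works on current Python 3 releases. The helper name gzip_to_bytes is hypothetical.

```python
import gzip
import io


def gzip_to_bytes(payload):
    """Compress payload into an in-memory buffer and return the bytes."""
    buf = io.BytesIO()
    # An explicit mode is needed: BytesIO has no .mode for GzipFile to inherit.
    with gzip.GzipFile(fileobj=buf, mode="wb") as gz:
        gz.write(payload)
    # GzipFile does not close a fileobj it was handed, so buf is still usable.
    return buf.getvalue()


if __name__ == "__main__":
    compressed = gzip_to_bytes(b"hello world")
    print(gzip.decompress(compressed))
```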

History
Date	User	Action	Args
2012-01-13 22:31:56	gregory.p.smith	set	recipients: + gregory.p.smith
2012-01-13 22:31:56	gregory.p.smith	set	messageid: <1326493916.44.0.574522946696.issue13781@psf.upfronthosting.co.za>
2012-01-13 22:31:55	gregory.p.smith	link	issue13781 messages
2012-01-13 22:31:55	gregory.p.smith	create