Issue 914340: gzip.GzipFile to accept stream as fileobj.

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/40026

classification

Title:	gzip.GzipFile to accept stream as fileobj.
Type:		Stage:
Components:	Extension Modules	Versions:	Python 2.4

process

Status:	closed	Resolution:	out of date
Dependencies:		Superseder:
Assigned To:		Nosy List:	antialize, belyi, georg.brandl, loewis, lucas_malor
Priority:	normal	Keywords:	patch

Created on 2004-03-11 18:45 by belyi, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name	Uploaded	Description
gzip-stream.patch	belyi, 2004-03-11 20:04

Messages (8)
msg45495 - (view)	Author: Igor Belyi (belyi)	Date: 2004-03-11 18:45
When gzip.GzipFile is initialized with a fileobj that lacks tell() and seek() methods (a non-rewinding stream), it throws an exception. The interesting thing is that it doesn't have to. The following patch updates gzip.py so that any stream with just a read() method can be used. This is helpful if you want to do something like gzip.GzipFile(fileobj=urllib.urlopen("file:///README.gz")).readlines(), or use GzipFile with the sys.stdin stream. But keep in mind that the seek() and rewind() methods of GzipFile won't work for such a stream, even with the patch. Igor
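For illustration, here is a minimal sketch of the use case described above, written for Python 3 rather than the Python 2.4 this issue targets. It assumes the gzip module in current Python 3 releases reads from a fileobj that offers only read(). The NonSeekableStream class is a hypothetical stand-in for a socket, pipe, or URL response and is not part of the patch.

```python
import gzip
import io


class NonSeekableStream:
    """Hypothetical read-only source: no tell(), no seek()."""

    def __init__(self, data):
        self._buf = io.BytesIO(data)

    def read(self, size=-1):
        return self._buf.read(size)


# Build a small gzip payload in memory to play the role of README.gz.
payload = gzip.compress(b"first line\nsecond line\n")
stream = NonSeekableStream(payload)

# Only read() is available on the stream; readlines() still works
# because nothing here needs to rewind the underlying source.
with gzip.GzipFile(fileobj=stream, mode="rb") as gz:
    lines = gz.readlines()

print(lines)
```

Calling seek() backwards or rewind() on such a GzipFile would still need a seekable source, which matches the caveat in the message.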
msg45496 - (view)	Author: Igor Belyi (belyi)	Date: 2004-03-11 20:04
The previous revision of the patch does not work correctly with multiple compressed members in one stream. I've updated the patch file.
msg45497 - (view)	Author: Igor Belyi (belyi)	Date: 2004-03-19 04:27
I thought I should add a more detailed explanation of the changes...
The current implementation of GzipFile() uses tell() and seek() to reposition the data stream in the following 2 cases:
1. When EOF is reached and the last 8 bytes of the file contain the checksum and the uncompressed data size.
2. When some 'unused_data' is left after decompression, meaning that the stream may contain more than one compressed item.
My change introduces 2 helper buffers: 'inputbuf', which keeps data that has been read from the stream but not yet used, and 'last8', which keeps the last 8 'used' bytes. It also introduces a helper method, _read_internal(), which is used instead of direct calls to self.fileobj.read(). This method reads data from the stream as needed by calling self.fileobj.read() and maintains the correct values of 'inputbuf' and 'last8'.
When case 1 above happens, we use the 'last8' buffer to read the checksum and size. When case 2 above happens, we add the value of 'unused_data' to 'inputbuf'.
One more self.fileobj.seek() call remains, in the rewind() method, but it is used only when the rewind() or seek() methods of the GzipFile class are called. And it wouldn't be logical to expect those methods to work if the underlying fileobj does not support them.
Igor
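The buffering idea can be sketched roughly as follows. This is not the patch itself: it is a simplified Python 3 illustration that keeps an 'inputbuf' for case 2, but hands case 1 to zlib by asking it to parse the gzip header and trailer, so no 'last8' buffer is tracked. ReadOnlyStream and decompress_members are hypothetical names chosen for this example.

```python
import gzip
import io
import zlib


class ReadOnlyStream:
    """Hypothetical non-seekable source exposing read() only."""

    def __init__(self, data):
        self._buf = io.BytesIO(data)

    def read(self, size=-1):
        return self._buf.read(size)


def decompress_members(fileobj, chunk_size=4096):
    """Decompress all gzip members from fileobj without tell() or seek()."""
    gzip_wbits = 16 + zlib.MAX_WBITS  # ask zlib to parse gzip header/trailer
    decomp = zlib.decompressobj(gzip_wbits)
    inputbuf = b""       # read from the stream but not yet decompressed
    mid_member = False   # True while a member has started but not finished
    pieces = []
    while True:
        if not inputbuf:
            inputbuf = fileobj.read(chunk_size)
            if not inputbuf:
                break  # underlying stream is exhausted
        pieces.append(decomp.decompress(inputbuf))
        inputbuf = b""
        mid_member = True
        if decomp.eof:
            # Case 2: leftover bytes belong to the next member, so push
            # them back into inputbuf instead of seeking backwards.
            inputbuf = decomp.unused_data
            decomp = zlib.decompressobj(gzip_wbits)
            mid_member = False
    if mid_member:
        raise EOFError("stream ended before the end of a gzip member")
    return b"".join(pieces)


# Two members back to back, read in small chunks so that a member
# boundary is likely to fall in the middle of a chunk.
data = gzip.compress(b"hello ") + gzip.compress(b"world")
result = decompress_members(ReadOnlyStream(data), chunk_size=7)
print(result)
```

Keeping the leftover bytes in a buffer is what removes the need for seek(): data that was read too far ahead is replayed from memory rather than re-read from the stream.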
msg45498 - (view)	Author: Igor Belyi (belyi)	Date: 2004-03-19 14:14
I thought I should add a more detailed explanation of the changes...
The current implementation of GzipFile() uses tell() and seek() to reposition the data stream in the following 2 cases:
1. When EOF is reached and the last 8 bytes of the file contain the checksum and the uncompressed data size.
2. When some 'unused_data' is left after decompression, meaning that the stream may contain more than one compressed item.
My change introduces 2 helper buffers: 'inputbuf', which keeps data that has been read from the stream but not yet used, and 'last8', which keeps the last 8 'used' bytes. It also introduces a helper method, _read_internal(), which is used instead of direct calls to self.fileobj.read(). This method reads data from the stream as needed by calling self.fileobj.read() and maintains the correct values of 'inputbuf' and 'last8'.
When case 1 above happens, we use the 'last8' buffer to read the checksum and size. When case 2 above happens, we add the value of 'unused_data' to 'inputbuf'.
One more self.fileobj.seek() call remains, in the rewind() method, but it is used only when the rewind() or seek() methods of the GzipFile class are called. And it wouldn't be logical to expect those methods to work if the underlying fileobj does not support them.
Igor
msg45499 - (view)	Author: Jakob Truelsen (antialize)	Date: 2006-06-19 08:35
Is there any reason this patch has not been accepted? If it is accepted, I have a patch to urllib2 to (automatically) accept gzipped content, as described here: http://www.http-compression.com/#client_request. If there is some reason this patch is not acceptable, please give details so it can be fixed. I'm tired of using popen and gunzip :)
msg45500 - (view)	Author: Martin v. Löwis (loewis) *	Date: 2007-03-06 14:51
The patch in this form is incomplete: it lacks test suite changes. Can somebody please provide patches to Lib/test/test_gzip.py that exercise this new functionality?
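As a rough idea of what such tests might look like, here is a hedged sketch using unittest on Python 3. It is not taken from Lib/test/test_gzip.py or from the patch. ReadOnlyStream and NonSeekableGzipTest are hypothetical names, and the tests exercise the gzip module of whichever interpreter runs them, not the Python 2.4 code under discussion.

```python
import gzip
import io
import unittest


class ReadOnlyStream:
    """Hypothetical fileobj with read() but no tell() or seek()."""

    def __init__(self, data):
        self._buf = io.BytesIO(data)

    def read(self, size=-1):
        return self._buf.read(size)


class NonSeekableGzipTest(unittest.TestCase):
    def test_single_member(self):
        # One compressed member, read in full through GzipFile.
        data = b"spam and eggs\n" * 100
        stream = ReadOnlyStream(gzip.compress(data))
        with gzip.GzipFile(fileobj=stream, mode="rb") as gz:
            self.assertEqual(gz.read(), data)

    def test_multiple_members(self):
        # Two members concatenated in one stream, the case that the
        # first revision of the patch reportedly mishandled.
        first, second = b"first member\n", b"second member\n"
        stream = ReadOnlyStream(gzip.compress(first) + gzip.compress(second))
        with gzip.GzipFile(fileobj=stream, mode="rb") as gz:
            self.assertEqual(gz.read(), first + second)
```

Saved to a file, the tests can be run with `python -m unittest <file name>`.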
msg45501 - (view)	Author: Georg Brandl (georg.brandl) *	Date: 2007-03-08 20:59
It looks like Patch #1675951 provides the same feature, plus speedups.
msg45502 - (view)	Author: Lucas Malor (lucas_malor)	Date: 2007-03-15 15:16
There's a problem with this patch. If my code has previously read some bytes of the GzipFile object, _read_gzip_header raises IOError, 'Not a gzipped file', because it starts reading at the current position, not at the start. Unfortunately, seek() cannot be used on urllib objects. I don't see any possible workaround.
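One reading of this report is that bytes were consumed from the underlying stream before the gzip header was parsed. The Python 3 sketch below first reproduces that failure, then shows one possible approach that was not proposed in this thread: replaying the consumed bytes ahead of the rest of the stream. ReplayStream is a hypothetical helper, and the `mtime` argument to gzip.compress assumes Python 3.8 or later.

```python
import gzip
import io


class ReplayStream:
    """Hypothetical wrapper: serve already-consumed bytes, then the stream."""

    def __init__(self, consumed, fileobj):
        self._pending = consumed
        self._fileobj = fileobj

    def read(self, size=-1):
        if size is None or size < 0:
            data, self._pending = self._pending, b""
            return data + self._fileobj.read()
        data, self._pending = self._pending[:size], self._pending[size:]
        if len(data) < size:
            # Top up from the real stream so callers get a full read.
            data += self._fileobj.read(size - len(data))
        return data


# mtime=0 keeps the header bytes fixed so the demonstration is repeatable.
payload = gzip.compress(b"payload", mtime=0)

# Failure mode: four header bytes are consumed before GzipFile sees the stream.
source = io.BytesIO(payload)
peeked = source.read(4)
try:
    gzip.GzipFile(fileobj=source, mode="rb").read()
    header_error = False
except OSError:  # gzip.BadGzipFile on Python 3.8+, a subclass of OSError
    header_error = True

# Possible workaround: replay the consumed bytes ahead of the remainder.
source = io.BytesIO(payload)
peeked = source.read(4)
restored = gzip.GzipFile(fileobj=ReplayStream(peeked, source), mode="rb").read()

print(header_error, restored)
```

This only helps when the caller still holds the bytes it consumed. If they were discarded, the header cannot be reconstructed without a seekable source.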

History
Date	User	Action	Args
2022-04-11 14:56:03	admin	set	github: 40026
2011-03-20 21:28:29	ned.deily	link	issue11608 superseder
2004-03-11 18:45:17	belyi	create