classification
Title: GZipFile failure on large files
Type: behavior
Stage:
Components: Library (Lib)
Versions: Python 3.1, Python 3.2, Python 2.7

process
Status: closed
Resolution: not a bug
Dependencies:
Superseder:
Assigned To:
Nosy List: Robert.Rohde, georg.brandl, ned.deily, pitrou, serhiy.storchaka
Priority: normal
Keywords:

Created on 2010-10-07 01:57 by Robert.Rohde, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Messages (5)
msg118091 - Author: Robert Rohde (Robert.Rohde) Date: 2010-10-07 01:57
I attempted to use GZipFile to process a 1.93 GB file that expands to 18.8 GB.

This consistently produces the same corrupted output file that has approximately, but not exactly, the right output file size.

I bypassed GZipFile by calling the 7-Zip executable to open the compressed file.  This works correctly and consistently.

I haven't tried to figure out how GZipFile works, but I assume that this failure is probably related to the very large size of the files I am working with.  I've used GZipFile before on much smaller files with no apparent problems.  I have no idea what precisely goes wrong, or how to fix it, but I felt it was important to note that GZipFile isn't working for at least some very large files.
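The report does not include the code that was used, so the following is only a minimal sketch of how a large gzip file might be decompressed in a streaming fashion with the standard library. The function name `decompress_gzip` and the chunk size are illustrative assumptions, not taken from the report. Both files are opened in binary mode; on Windows, a text-mode output file would translate newline bytes and change the output size.

```python
import gzip
import shutil


def decompress_gzip(src_path, dst_path, chunk_size=1024 * 1024):
    """Stream-decompress src_path into dst_path; return the bytes written.

    Hypothetical helper for illustration only. Data is copied in fixed-size
    chunks, so memory use stays flat even for multi-gigabyte files.
    """
    # "rb"/"wb" keep the data byte-exact (no newline translation on Windows).
    with gzip.open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        shutil.copyfileobj(src, dst, chunk_size)
        return dst.tell()
```

Comparing the returned byte count with the size reported by another tool (7-Zip, in this report) would be one way to narrow down where the outputs diverge.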
msg118164 - Author: Ned Deily (ned.deily) (Python committer) Date: 2010-10-08 04:47
Since you mention 7-zip, does that mean you are seeing the problem on a Windows platform?  If so, exactly which version of Windows and what kind of system?  Also, unless someone recognizes this as a duplicate of an earlier issue, there may not be much action on it unless you can supply a test case to reproduce the problem.
msg118169 - Author: Robert Rohde (Robert.Rohde) Date: 2010-10-08 07:52
It's Windows 7 Ultimate (64-bit) on a very high end system.

I don't think it would be very practical to distribute a 2 GB test file, though I might be able to get it to a couple of people if someone really wanted to study the issue.

Though if it is an integer overflow (or something like that), then I would suspect that GZipFile would show corruption most of the time once the files got large enough.  For example, it might occur for all files expanding to larger than 2^32 bytes (4 GB).  (That's just speculation, I haven't tested it except to note that it failed the very first time I tried to use a file this large.)

Perhaps someone familiar with the code could look for places where integers might overflow?
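For context on the 2^32 speculation: the gzip format (RFC 1952) ends each member with a 4-byte little-endian ISIZE field holding the uncompressed size modulo 2^32, so for output larger than 4 GB the trailer only records the low 32 bits. The sketch below shows one way to compare a decompressed file against that field. The helper names are hypothetical, and the check assumes a single-member gzip file with no trailing padding; it says nothing about whether GzipFile itself overflows.

```python
import os
import struct


def gzip_isize(gz_path):
    """Return the ISIZE trailer field: uncompressed size modulo 2**32.

    Assumes a single-member gzip file with no trailing garbage.
    """
    with open(gz_path, "rb") as f:
        f.seek(-4, os.SEEK_END)  # ISIZE is the last 4 bytes of the member
        (isize,) = struct.unpack("<I", f.read(4))
    return isize


def size_matches_trailer(gz_path, out_path):
    """Check the decompressed file's size against ISIZE, modulo 2**32."""
    return os.path.getsize(out_path) % 2**32 == gzip_isize(gz_path)
```

A mismatch would suggest truncated or altered output, while a match only shows the size is consistent modulo 2^32, not that the contents are correct.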
msg118177 - Author: Antoine Pitrou (pitrou) (Python committer) Date: 2010-10-08 10:42
Can you show a snippet of the code (or describe it in detail) that "processes" the GzipFile? Right now it's not obvious which operations you are doing.
msg199753 - Author: Georg Brandl (georg.brandl) (Python committer) Date: 2013-10-13 18:26
Closing due to lack of feedback.
History
Date | User | Action | Args
2022-04-11 14:57:07 | admin | set | github: 54249
2013-10-13 18:26:22 | georg.brandl | set | status: pending -> closed; nosy: + georg.brandl; messages: + msg199753; resolution: not a bug
2012-12-04 10:10:49 | serhiy.storchaka | set | status: open -> pending
2012-12-02 22:56:50 | serhiy.storchaka | set | nosy: + serhiy.storchaka
2010-10-08 10:42:26 | pitrou | set | versions: + Python 3.1, Python 3.2; nosy: + pitrou; messages: + msg118177; components: + Library (Lib), - Windows; stage: test needed ->
2010-10-08 07:52:18 | Robert.Rohde | set | messages: + msg118169
2010-10-08 04:47:03 | ned.deily | set | nosy: + ned.deily; messages: + msg118164; components: + Windows, - Library (Lib); stage: test needed
2010-10-07 01:57:08 | Robert.Rohde | create |