classification
Title: zipfile has problem reading zip files over 2GB
Type: behavior Stage:
Components: Library (Lib) Versions: Python 3.1, Python 2.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: alanmcintyre, alonwas, amaury.forgeotdarc, loewis, pitrou
Priority: normal Keywords: needs review, patch

Created on 2008-08-10 09:27 by alonwas, last changed 2008-09-05 23:43 by pitrou. This issue is now closed.

Files
File name Uploaded Description
large.c alonwas, 2008-08-13 06:15
largezip.patch pitrou, 2008-08-17 12:19
Messages (14)
msg70968 - Author: (alonwas) Date: 2008-08-10 09:27
zipfile complains about "Bad magic number for central directory" when I
give it files over 2GB. I believe the problem is that the offset of the
central directory is read as a signed long when it should be read as an
unsigned long. Changing structEndArchive from "<4s4H2lH" to "<4s4H2LH"
(note the capital L) should probably fix it. When the offset is 2^31 or
more, it is read as a negative number and the code fails to find the
central directory. I'd appreciate it if someone more knowledgeable could
look at the problem and the suggested fix. Thanks, Alon
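A minimal Python sketch of the sign problem described above, assuming the 22-byte end-of-central-directory (EOCD) layout encoded by the structEndArchive format string quoted in the report. The record is built by hand with made-up field values, and the helper central_dir_offset is hypothetical, not part of zipfile:

```python
import struct

# EOCD layout as encoded by structEndArchive: signature, four 2-byte counts,
# central directory size, central directory offset, comment length (22 bytes).
SIGNED_FMT = "<4s4H2lH"    # original: 'l' is a signed 32-bit integer
UNSIGNED_FMT = "<4s4H2LH"  # proposed: 'L' is an unsigned 32-bit integer
assert struct.calcsize(SIGNED_FMT) == struct.calcsize(UNSIGNED_FMT) == 22

OFFSET_INDEX = 6  # position of the central directory offset in the tuple


def central_dir_offset(record: bytes, fmt: str) -> int:
    """Unpack an EOCD record and return the central directory offset field."""
    return struct.unpack(fmt, record)[OFFSET_INDEX]


# Hypothetical EOCD for an archive whose central directory starts near 3 GB
# (one entry, 46-byte central directory, no comment).
offset = 3000000000  # between 2**31 and 2**32
record = struct.pack(UNSIGNED_FMT, b"PK\x05\x06", 0, 0, 1, 1, 46, offset, 0)

print(central_dir_offset(record, SIGNED_FMT))    # -1294967296 (wrapped)
print(central_dir_offset(record, UNSIGNED_FMT))  # 3000000000
```

Offsets from 2^31 up to 2^32 - 1 wrap to negative values with 'l', which fits the report that the failure occurs for archives between 2GB and 4GB; offsets of 2^32 or more do not fit in this 32-bit field at all and need the Zip64 extension records.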
msg70987 - Author: Martin v. Löwis (loewis) * (Python committer) Date: 2008-08-10 17:51
What Python version exactly are you using? This might have been fixed in
2.5.2, with r60117.
msg71003 - (view) Author: (alonwas) Date: 2008-08-11 08:41
Hi,
I'm using 2.5.2 (r252:60911).
Thanks,
Alon
msg71025 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2008-08-11 17:29
Do you have a public URL for such a zip file?
msg71076 - Author: (alonwas) Date: 2008-08-13 06:15
Hi Antoine,
The problem happens for files between 2GB and 4GB. I can't really send
you a link to such a big file. To reproduce the problem, you can
generate one. I created (and have attached) a tiny C program that helps
generate one. If you want to, you can run it, save its output to a file
and then add it to a zip file (it should compress around 12%). The
resulting zip file will fail to open in Python using the zipfile
package because of the bug I mentioned. Please let me know whether this
is enough information to reproduce the problem.
Thanks,
Alon
msg71101 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2008-08-13 22:11
> The problem happens for files between 2GB and 4GB. I can't really send
> you a link to such a big file. To reproduce the problem, you can
> generate one.

The problem is that the "zip" command fails to create a zip file larger
than 2GB (I get "zip I/O error: Invalid argument"). And even if it
didn't fail, the internal structure of the zip file might not be exactly
the same as with other compression tools. That's why I was asking you
for an existing file.

If I give you ssh/sftp access somewhere, would you be able to upload
such a file?
msg71265 - Author: (alonwas) Date: 2008-08-17 10:48
Antoine,
I had a similar problem with zip version 2.32, but this is fixed in
version 3.0 (or on 64-bit architectures). Would you be able to give it a
try with the newer version (which can be obtained from info-zip.org)?
Unfortunately, my upload bandwidth will not allow me to upload such a
big file.
Thanks,
Alon
msg71269 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2008-08-17 12:19
Alon, can you try the attached patch (largezip.patch)? It seems to fix it here.
msg72590 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2008-09-05 13:14
Alan, do you have an opinion on this?
msg72630 - Author: Alan McIntyre (alanmcintyre) * (Python committer) Date: 2008-09-05 21:25
Your patch seems like a better way to detect whether a file is written
as Zip64, and it seems to be able to properly handle extracting a >2GB
file from a >2GB archive, so I'd vote to include it.  

I tested it with r66233, using a file made from the output of large.c,
zipped with the built-in archiver on OS X 10.4.11.  All regression tests
pass, including test_zipfile64, on both Linux and OS X.
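For illustration, a hedged sketch of one way to detect a Zip64 archive; this is not the contents of largezip.patch. It assumes the layout from the PKWARE ZIP specification (APPNOTE), where a Zip64 archive places a 20-byte "zip64 end of central directory locator" (signature PK\x06\x07) immediately before the regular EOCD record. The function name zip64_eocd_offset and the synthetic byte strings are hypothetical:

```python
import io
import struct
import zipfile

EOCD_SIG = b"PK\x05\x06"           # end of central directory record
ZIP64_LOCATOR_SIG = b"PK\x06\x07"  # Zip64 end of central directory locator
# Locator layout: signature, disk number, offset of the Zip64 EOCD record,
# total number of disks (little-endian, no padding).
LOCATOR_FMT = "<4sLQL"
LOCATOR_SIZE = struct.calcsize(LOCATOR_FMT)  # 20 bytes


def zip64_eocd_offset(data: bytes, eocd_pos: int):
    """Return the Zip64 EOCD offset if a locator sits just before the EOCD, else None."""
    start = eocd_pos - LOCATOR_SIZE
    if start < 0:
        return None
    sig, _disk, offset, _total_disks = struct.unpack(LOCATOR_FMT, data[start:eocd_pos])
    if sig != ZIP64_LOCATOR_SIG:
        return None
    return offset


# An ordinary small archive has no locator before its EOCD.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("hello.txt", "hello")
plain = buf.getvalue()
print(zip64_eocd_offset(plain, plain.rfind(EOCD_SIG)))  # None

# Hypothetical tail of a Zip64 archive: locator followed by a minimal EOCD
# whose 32-bit fields are set to their 0xFFFF / 0xFFFFFFFF placeholders.
locator = struct.pack(LOCATOR_FMT, ZIP64_LOCATOR_SIG, 0, 123456, 1)
eocd = struct.pack("<4s4H2LH", EOCD_SIG, 0, 0, 0xFFFF, 0xFFFF,
                   0xFFFFFFFF, 0xFFFFFFFF, 0)
tail = locator + eocd
print(zip64_eocd_offset(tail, tail.rfind(EOCD_SIG)))  # 123456
```

Checking for the locator signature avoids inferring Zip64 from the value of a 32-bit field, which is the kind of guess that goes wrong when an offset between 2GB and 4GB is misread.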
msg72631 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2008-09-05 21:34
Alan, do you have commit access? Otherwise the patch needs approval from
another core developer.
msg72632 - Author: Alan McIntyre (alanmcintyre) * (Python committer) Date: 2008-09-05 21:40
No, I don't have commit access at the moment.
msg72649 - Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2008-09-05 23:21
I also agree with the patch. This seems the correct way to detect the
Zip64 format.
msg72651 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2008-09-05 23:43
Fixed in r66240, r66241. Thanks!
History
Date                 User                Action  Args
2008-09-05 23:43:33  pitrou              set     status: open -> closed
                                                 resolution: fixed
                                                 messages: + msg72651
2008-09-05 23:21:22  amaury.forgeotdarc  set     nosy: + amaury.forgeotdarc
                                                 messages: + msg72649
2008-09-05 21:53:24  pitrou              set     keywords: + needs review
2008-09-05 21:40:48  alanmcintyre        set     messages: + msg72632
2008-09-05 21:34:55  pitrou              set     messages: + msg72631
2008-09-05 21:25:28  alanmcintyre        set     messages: + msg72630
2008-09-05 13:14:05  pitrou              set     nosy: + alanmcintyre
                                                 messages: + msg72590
                                                 versions: + Python 3.1, Python 2.7, - Python 2.6, Python 3.0
2008-08-17 12:19:14  pitrou              set     files: + largezip.patch
                                                 priority: normal
                                                 messages: + msg71269
                                                 keywords: + patch
                                                 versions: + Python 2.6, Python 3.0, - Python 2.5
2008-08-17 10:48:20  alonwas             set     messages: + msg71265
2008-08-13 22:11:13  pitrou              set     messages: + msg71101
2008-08-13 06:15:33  alonwas             set     files: + large.c
                                                 messages: + msg71076
2008-08-11 17:30:00  pitrou              set     nosy: + pitrou
                                                 messages: + msg71025
2008-08-11 08:41:07  alonwas             set     messages: + msg71003
2008-08-10 17:51:13  loewis              set     nosy: + loewis
                                                 messages: + msg70987
2008-08-10 09:27:27  alonwas             create