classification
Title: zipfile.py - pack filesize as unsigned allows files > 2 gig
Type:
Stage:
Components: Library (Lib)
Versions: Python 2.2
process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To:
Nosy List: brett.cannon, laurelboa
Priority: normal
Keywords:

Created on 2003-02-04 01:54 by laurelboa, last changed 2022-04-10 16:06 by admin. This issue is now closed.

Messages (2)
msg14417 - Author: Jimmy Burgett (laurelboa) Date: 2003-02-04 01:54
Python 2.2.2 
Windows XP (all service packs installed)
Windows 2000 (all service packs installed)

The file size and compressed file size fields in the zip
header need to be packed with struct.pack as unsigned
ints, not signed ints. This change allows zipfile.py to
compress files larger than 2 gigabytes. Currently, an
attempt to compress such a large file gives this error:

Traceback (most recent call last):
  File "<interactive input>", line 1, in ?
  File "C:\Python22\lib\zipfile.py", line 426, in write
    zinfo.file_size))
OverflowError: long int too large to convert to int

where the line in question is:
self.fp.write(struct.pack("<lll", zinfo.CRC, zinfo.compress_size,
                          zinfo.file_size))
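A minimal sketch that reproduces the overflow with the same format strings, assuming a hypothetical ~3 GB entry (the CRC and sizes below are made-up values, not taken from the report). In Python 3, `struct` reports out-of-range values as `struct.error` rather than the `OverflowError` seen under 2.2.

```python
import struct

# Hypothetical values for a ~3 GB file (not from the report).
crc = 0x1234ABCD              # fits in a signed 32-bit 'l'
compress_size = 3_000_000_000
file_size = 3_000_000_000

def pack_sizes(fmt):
    """Try to pack the CRC and both sizes; return the bytes or the error."""
    try:
        return struct.pack(fmt, crc, compress_size, file_size)
    except struct.error as exc:
        return exc

old = pack_sizes("<lll")   # signed sizes: anything above 2**31 - 1 is rejected
new = pack_sizes("<lLL")   # unsigned sizes, as proposed in the report
print(isinstance(old, struct.error))  # True
print(len(new))                       # 12
```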

I believe that the four changes below are all that is
needed. They are against version 2.2.2, but zipfile.py in
2.3a1 still packs and unpacks the file sizes as signed
integers.
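To make the limits concrete, here is a small sketch that finds the largest value each 4-byte standard-size code accepts. The signed `l` tops out at 2**31 - 1 (just under 2 GiB) and the unsigned `L` at 2**32 - 1 (just under 4 GiB), so the unsigned packing raises the ceiling to 4 GiB but no further; entries of 4 GiB or more need the ZIP64 extensions, which zipfile only gained in a later release.

```python
import struct

limits = {}
for fmt, candidate in (("<l", 2**31 - 1), ("<L", 2**32 - 1)):
    struct.pack(fmt, candidate)          # the limit itself packs fine
    try:
        struct.pack(fmt, candidate + 1)  # one past the limit is rejected
    except struct.error:
        limits[fmt] = candidate
print(limits)  # {'<l': 2147483647, '<L': 4294967295}
```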

I have not tested whether the ziplib routines can seek 
past the 2 gig boundary in order to extract a file whose 
beginning is past the 2 gig boundary. My application 
requires compressing very large files one at a time and 
zipfile.py lets me use either WinZip or the built-in 
Windows "unzip" function for extraction.
These changes allow that use.

-------------- Change Line #28

# Here are some struct module formats for reading headers
structEndArchive = "<4s4H2lH"      # 9 items, end of archive, 22 bytes
stringEndArchive = "PK\005\006"    # magic number for end of archive record
structCentralDir = "<4s4B4H3l5H2l" # 19 items, central directory, 46 bytes

to

structCentralDir = "<4s4B4HlLL5H2L" # 19 items, central directory, 46 bytes
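A quick sanity check of the central directory change, as a sketch: both format strings describe the same 46-byte, 19-field record, so only the interpretation of the CRC/size and trailing fields changes. The record below is synthetic (hypothetical field values for a ~3 GB entry); reading it back with the old signed format turns the 3,000,000,000-byte size into a negative number.

```python
import struct

OLD_CENTRAL_DIR = "<4s4B4H3l5H2l"   # format before the patch
NEW_CENTRAL_DIR = "<4s4B4HlLL5H2L"  # format proposed in the report

# Same record size either way.
assert struct.calcsize(OLD_CENTRAL_DIR) == struct.calcsize(NEW_CENTRAL_DIR) == 46

# Synthetic record with hypothetical values.
fields = [b"PK\x01\x02",
          20, 0, 20, 0,                   # version made by / needed, split into bytes
          0, 8, 0, 0,                     # flags, compression method, time, date
          0x1234ABCD,                     # CRC-32
          3_000_000_000, 3_000_000_000,   # compressed / uncompressed size
          0, 0, 0, 0, 0,                  # name/extra/comment lengths, disk, internal attr
          0, 0]                           # external attr, local header offset

record = struct.pack(NEW_CENTRAL_DIR, *fields)
print(len(record))                                  # 46
print(struct.unpack(NEW_CENTRAL_DIR, record)[11])   # 3000000000
print(struct.unpack(OLD_CENTRAL_DIR, record)[11])   # -1294967296
```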

--------------- change line #306

    def printdir(self):
        """Print a table of contents for the zip file."""
        print "%-46s %19s %12s" % ("File Name", "Modified    ", "Size")
        for zinfo in self.filelist:
            date = "%d-%02d-%02d %02d:%02d:%02d" % zinfo.date_time
            print "%-46s %s %12d" % (zinfo.filename, date, zinfo.file_size)

to

            print "%-46s %s %12u" % (zinfo.filename, date, zinfo.file_size)
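The printdir row layout can be tried on its own with a sketch like the one below; `format_row` is a hypothetical helper and the sample entry is made up. In Python 3, `%u` is an obsolete alias for `%d` and both accept arbitrarily large ints, so the 12-column size field simply holds a 10-digit value.

```python
def format_row(filename, date_time, file_size):
    """Format one printdir()-style line (hypothetical helper)."""
    date = "%d-%02d-%02d %02d:%02d:%02d" % date_time
    return "%-46s %s %12u" % (filename, date, file_size)

row = format_row("backup.img", (2003, 2, 4, 1, 54, 0), 3_000_000_000)
print(row)
```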


---------------- change line #425

        # Seek backwards and write CRC and file sizes
        position = self.fp.tell()       # Preserve current position in file
        self.fp.seek(zinfo.header_offset + 14, 0)
        self.fp.write(struct.pack("<lll", zinfo.CRC, zinfo.compress_size,
              zinfo.file_size))

to

        self.fp.write(struct.pack("<lLL", zinfo.CRC, zinfo.compress_size,
              zinfo.file_size))
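The seek-back-and-patch step can be sketched against an in-memory file. This is an illustration under stated assumptions, not zipfile's code: `write_stored_entry` and `LOCAL_HEADER` are hypothetical names, the entry is stored uncompressed, and the CRC is packed as `L` too, because `zlib.crc32` always returns an unsigned value in Python 3 (Python 2's crc32 could return a negative value, which is why the patch keeps `l` for it). The offset 14 is where the CRC begins: 4 signature bytes plus five 2-byte fields.

```python
import io
import struct
import zlib

# Hypothetical 30-byte local header layout: signature, version, flags, method,
# time, date, CRC, compressed size, uncompressed size, name length, extra length.
LOCAL_HEADER = "<4s5H3L2H"

def write_stored_entry(fp, name, data):
    """Write one uncompressed entry, then seek back and patch CRC and sizes."""
    header_offset = fp.tell()
    fname = name.encode("ascii")
    # Placeholder CRC and sizes: the real values are known only after the data.
    fp.write(struct.pack(LOCAL_HEADER, b"PK\x03\x04", 20, 0, 0, 0, 0,
                         0, 0, 0, len(fname), 0))
    fp.write(fname)
    fp.write(data)
    crc = zlib.crc32(data)            # unsigned in Python 3
    position = fp.tell()              # preserve current position in file
    fp.seek(header_offset + 14, 0)    # CRC starts 14 bytes into the header
    fp.write(struct.pack("<LLL", crc, len(data), len(data)))
    fp.seek(position, 0)
    return header_offset

buf = io.BytesIO()
write_stored_entry(buf, "hello.txt", b"hello world")
header = struct.unpack(LOCAL_HEADER, buf.getvalue()[:30])
print(header[6:9])  # CRC, compressed size, uncompressed size
```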

---------------- change line #450

        if zinfo.flag_bits & 0x08:
            # Write CRC and file sizes after the file data
            self.fp.write(struct.pack("<lll", zinfo.CRC, zinfo.compress_size,
                  zinfo.file_size))

to

            self.fp.write(struct.pack("<lLL", zinfo.CRC, zinfo.compress_size,
                  zinfo.file_size))
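When general-purpose flag bit 3 (`0x08`) is set, the CRC and sizes follow the file data instead of being patched into the header. A minimal sketch of that trailing record, assuming a hypothetical helper `data_descriptor` and unsigned packing for all three fields, as in the previous example:

```python
import struct

FLAG_DATA_DESCRIPTOR = 0x08  # the bit tested in the patch above

def data_descriptor(flag_bits, crc, compress_size, file_size):
    """Return the trailing CRC/size record, or b"" if bit 3 is not set (sketch)."""
    if not flag_bits & FLAG_DATA_DESCRIPTOR:
        return b""
    # Unsigned fields, so sizes between 2**31 and 2**32 - 1 fit.
    return struct.pack("<LLL", crc, compress_size, file_size)

desc = data_descriptor(0x08, 0x1234ABCD, 3_000_000_000, 3_000_000_000)
print(len(desc))                    # 12
print(data_descriptor(0, 0, 0, 0))  # b''
```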
msg14418 - Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2004-07-10 19:20

Fixed in rev. 1.33 for Python 2.4a2 and rev. 1.31.8.1 for Python 2.3.5.
Thanks, Jimmy. I had to add to your fixes, but they did help.
History
Date                 User       Action  Args
2022-04-10 16:06:29  admin      set     github: 37903
2003-02-04 01:54:29  laurelboa  create