Message 123891 - Python tracker


Author	KevinH
Recipients	KevinH
Date	2010-12-13.18:57:48
Message-id	<1292266671.15.0.914079935044.issue10694@psf.upfronthosting.co.za>
In-reply-to

Content

The current version of zipfile.py is not robust to slight errors at the end of zip archives.  Many file servers **improperly** append a newline to files that do not already end with one when they are uploaded from a browser.  This bug ends up adding 0x0d 0x0a (CR LF) to the end of the zip archive.  This in turn makes zipfile.py eventually raise a "Not a zip file" exception, even though no other zip tools seem to have trouble with these archives.  Even unzip -t passes these "problem" zip archives with flying colours.
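
For anyone who wants to check a particular Python build, the situation can be reproduced in memory roughly as follows. This is only a sketch: `build_archive` and `can_open` are hypothetical helper names (not part of zipfile), and it assumes Python 3.2 or later, where the exception is spelled `zipfile.BadZipFile`. Whether the archive with the trailing CR LF opens depends on the zipfile version in use, so no result is claimed for it here.

```python
import io
import zipfile


def build_archive(trailing=b""):
    """Build a one-member zip archive in memory and append `trailing` bytes."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("hello.txt", "hello")
    # Simulate a server that tacks extra bytes onto the uploaded file.
    return buf.getvalue() + trailing


def can_open(data):
    """Return True if zipfile accepts `data` as an archive, False otherwise."""
    try:
        with zipfile.ZipFile(io.BytesIO(data)) as zf:
            zf.namelist()
        return True
    except zipfile.BadZipFile:
        return False


if __name__ == "__main__":
    print("clean archive opens:      ", can_open(build_archive()))
    print("archive + CR LF opens:    ", can_open(build_archive(b"\r\n")))
```

On a build that has the behaviour described in this report, the second line would be expected to print False while the first prints True.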

I hate to have to extract and create my own zipfile.py script just to be robust to zip archives that are commonly found on the net and that are handled more robustly by other software.

So please consider changing the code from _EndRecData below so that it simply ignores any trailing data once the proper stringEndArchive signature and structEndArchive record have been found, instead of treating everything after the record as the comment, checking that its length matches the recorded comment size, and rejecting the archive if it does not.  Ignoring the "comment" seems to be more robust in this case, as everything needed to unpack the zip archive has already been found.


    # Either this is not a ZIP file, or it is a ZIP file with an archive
    # comment.  Search the end of the file for the "end of central directory"
    # record signature. The comment is the last item in the ZIP file and may be
    # up to 64K long.  It is assumed that the "end of central directory" magic
    # number does not appear in the comment.
    maxCommentStart = max(filesize - (1 << 16) - sizeEndCentDir, 0)
    fpin.seek(maxCommentStart, 0)
    data = fpin.read()
    start = data.rfind(stringEndArchive)
    if start >= 0:
        # found the magic number; attempt to unpack and interpret
        recData = data[start:start+sizeEndCentDir]
        endrec = list(struct.unpack(structEndArchive, recData))
        comment = data[start+sizeEndCentDir:]
        # check that comment length is correct
        if endrec[_ECD_COMMENT_SIZE] == len(comment):
            # Append the archive comment and start offset
            endrec.append(comment)
            endrec.append(maxCommentStart + start)
            if endrec[_ECD_OFFSET] == 0xffffffff:
                # There is apparently a "Zip64 end of central directory"
                # structure present, so go look for it
                return _EndRecData64(fpin, start - filesize, endrec)
            return endrec


This will in turn make the Python implementation of zipfile.py more robust to data improperly appended when some zip archives are uploaded or downloaded (similar to how other zip tools handle this issue).
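
To make the request concrete, here is a minimal standalone sketch of the more tolerant lookup I have in mind. It is not a patch against zipfile.py: `find_end_record` is a hypothetical name, the constants are redefined locally to mirror the ones quoted above, the Zip64 branch is left out, and the comment is simply cut to the length recorded in the end record so that any bytes after it are ignored.

```python
import io
import struct
import zipfile

# Local copies of the names used in the quoted code; the layout is the
# standard 22-byte "end of central directory" record of the ZIP format.
structEndArchive = "<4s4H2LH"
stringEndArchive = b"PK\005\006"
sizeEndCentDir = struct.calcsize(structEndArchive)
_ECD_COMMENT_SIZE = 7  # index of the comment-length field


def find_end_record(data):
    """Locate the end record in `data`, ignoring bytes after the comment.

    Returns (fields, comment, offset) or None if no record is found.
    """
    # Only the tail of the file can hold the record plus a 64K comment.
    search_from = max(len(data) - (1 << 16) - sizeEndCentDir, 0)
    start = data.rfind(stringEndArchive, search_from)
    if start < 0:
        return None
    rec = data[start:start + sizeEndCentDir]
    if len(rec) != sizeEndCentDir:
        return None  # signature found, but the record itself is truncated
    fields = list(struct.unpack(structEndArchive, rec))
    # Take exactly as many comment bytes as the record declares, rather
    # than requiring the comment to run to the end of the file.
    comment_start = start + sizeEndCentDir
    comment = data[comment_start:comment_start + fields[_ECD_COMMENT_SIZE]]
    return fields, comment, start


if __name__ == "__main__":
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("hello.txt", "hello")
    damaged = buf.getvalue() + b"\r\n"  # trailing CR LF added by a server
    print(find_end_record(damaged))
```

The trade-off is that a comment shorter than its declared length is no longer detected here; a real change would probably want to decide separately how to treat that case.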

Thank you for your time and consideration.

History
Date	User	Action	Args
2010-12-13 18:57:51	KevinH	set	recipients: + KevinH
2010-12-13 18:57:51	KevinH	set	messageid: <1292266671.15.0.914079935044.issue10694@psf.upfronthosting.co.za>
2010-12-13 18:57:49	KevinH	link	issue10694 messages
2010-12-13 18:57:48	KevinH	create