Title: The zipfile module does not check files' CRCs, including in ZipFile.testzip
Type: behavior Stage: resolved
Components: Library (Lib) Versions: Python 2.6
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: pitrou Nosy List: BreamoreBoy, barry, dougturk, nirai, pitrou
Priority: Keywords: patch

Created on 2009-12-10 06:59 by dougturk, last changed 2010-08-13 19:13 by barry. This issue is now closed.

File name Uploaded Description Edit dougturk, 2009-12-10 06:59 Example corrupted zip file
fix_zipfile_crc_issue_7467.patch dougturk, 2010-07-13 13:41 Patch against trunk
zipcrc.patch pitrou, 2010-08-09 22:24
zipcrc2.patch pitrou, 2010-08-12 14:05
zipcrc.patch pitrou, 2010-08-12 22:58 2.6 patch
Messages (12)
msg96194 - (view) Author: Douglas Turk (dougturk) Date: 2009-12-10 06:59
The zipfile module does not calculate the CRC of files in a zipfile as
they are read as of Python 2.6. The old function in Python
2.5 used to do this, and ZipFile.testzip appears to rely on
to check the CRC. This means that ZipFile.testzip does not check the CRC
as it is documented to.

It would be useful if ZipExtFile could check the CRC once it had read
all the data, because this would mean that e.g. ZipFile.extract would
also automatically check the CRC. Perhaps could raise a
BadZipFile exception if it reaches EOF and the CRC is wrong.

Steps to reproduce:
-Create a zip file, then change a byte in a file's contents (easiest if
files are stored, not deflated).
-Run ZipFile.testzip on the edited file. Note that it does not report a
CRC failure, but other zip tools do.

I can provide a patch in a couple of days if that's useful.
msg109989 - (view) Author: Mark Lawrence (BreamoreBoy) * Date: 2010-07-11 10:59
Douglas could you please provide a patch.
msg113428 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-08-09 16:53
Patch looks good to me.
msg113477 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-08-09 21:14
Actually, there are two places where the internal buffer is trimmed from consumed data:

            self._readbuffer = self._readbuffer[self._offset:] + data
            self._offset = 0

At this point, it seems self._crc_offset should also be reset to zero, otherwise some data will be forgotten on the next read() call.
msg113493 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-08-09 22:24
This is an updated patch with fixed (hopefully) CRC logic in the face of buffering, and additional tests.
msg113494 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-08-09 22:27
Nir, want to take a look?
msg113669 - (view) Author: Nir Aides (nirai) (Python triager) Date: 2010-08-12 09:15
I think patch may be simplified. Instead of keeping track of CRC offset, invoke it directly on the 'data' variable being added to _readbuffer. 

Also the call to _update_crc() before the return from read1() looks redundant.

Finally, is it possible to determine end of file if length of 'data' (computed for crc) is 0?
msg113680 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-08-12 14:05
Ok, here is a new patch with a simpler logic. I've also added tests with a deflated zipfile entry with a bad CRC (in addition to the one with a stored zipfile entry).

> Finally, is it possible to determine end of file if length of 'data'
> (computed for crc) is 0?

I'm not sure I understand the question.
msg113682 - (view) Author: Nir Aides (nirai) (Python triager) Date: 2010-08-12 14:56
But you answered my question with code :) self._file_size is now unused and may be removed.
msg113684 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-08-12 15:31
Ah, indeed. I've committed an updated patch in r83959 (py3k), r83962 (3.1) and r83961 (2.7). Thank you!
msg113719 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-08-12 22:58
At Barry's request, here is a patch for potential backport to 2.6.
msg113817 - (view) Author: Barry A. Warsaw (barry) * (Python committer) Date: 2010-08-13 19:13
After chatting with __ap__ on irc, i'm going to reject this patch for 2.6.6.
Date User Action Args
2010-08-13 19:13:33barrysetstatus: open -> closed
priority: release blocker ->

nosy: + barry
messages: + msg113817
2010-08-12 22:58:18pitrousetfiles: + zipcrc.patch

messages: + msg113719
2010-08-12 17:32:01pitrousetstatus: closed -> open
2010-08-12 17:31:15barrysetpriority: normal -> release blocker
versions: + Python 2.6, - Python 3.1, Python 2.7, Python 3.2
2010-08-12 15:31:46pitrousetstatus: open -> closed
resolution: accepted -> fixed
messages: + msg113684

stage: patch review -> resolved
2010-08-12 14:56:33niraisetmessages: + msg113682
2010-08-12 14:05:52pitrousetfiles: + zipcrc2.patch

messages: + msg113680
2010-08-12 09:15:25niraisetmessages: + msg113669
2010-08-09 22:27:57pitrousetnosy: + nirai
messages: + msg113494
2010-08-09 22:24:52pitrousetfiles: + zipcrc.patch

messages: + msg113493
2010-08-09 21:14:47pitrousetmessages: + msg113477
2010-08-09 16:53:04pitrousetnosy: + pitrou
messages: + msg113428

assignee: pitrou
resolution: accepted
stage: needs patch -> patch review
2010-07-13 13:42:00dougturksetfiles: + fix_zipfile_crc_issue_7467.patch
keywords: + patch
2010-07-11 10:59:23BreamoreBoysetversions: + Python 3.1, Python 3.2, - Python 2.6
nosy: + BreamoreBoy

messages: + msg109989

stage: needs patch
2009-12-10 06:59:30dougturkcreate