Title: zipfile hangs on certain zip files
Type: behavior Stage:
Components: Library (Lib) Versions: Python 2.6, Python 2.5
Status: closed Resolution: accepted
Dependencies: Superseder:
Assigned To: loewis Nosy List: alanmcintyre, ehuss, gvanrossum, jafo, jcea, loewis
Priority: high Keywords: patch

Created on 2007-12-14 01:31 by ehuss, last changed 2008-07-03 12:53 by loewis. This issue is now closed.

File name Uploaded Description Edit
zipfile.patch ehuss, 2007-12-14 01:31
zipfile-unsigned-fixes.diff alanmcintyre, 2008-01-09 07:00
zipfile-unsigned-fixes2.diff alanmcintyre, 2008-01-12 10:24
zipfile-patch-1622.diff alanmcintyre, 2008-05-26 19:58
Messages (18)
msg58606 - (view) Author: Eric Huss (ehuss) Date: 2007-12-14 01:31
Creating a ZipFile object with a certain type of zip file can cause it
to go into an infinite loop.  The problem is the new extra field parsing
routine.  It unpacks integers as a signed value, which if they are
sufficiently large (over 32767), then it will loop forever.

There are many places in the zipfile module that incorrectly unpack
values as signed integers, but this is the only place that I can
determine that causes a serious problem.

Attached is a fix.
msg59405 - (view) Author: Alan McIntyre (alanmcintyre) * (Python committer) Date: 2008-01-06 20:55
Based on the ZIP spec (I'm using the one here:, I'm inclined
to agree. There's a general note that says "All fields unless otherwise
noted are unsigned and stored in Intel low-byte:high-byte,
low-word:high-word order." I can't find any explicit mention of any
fields that should be signed.

Since I find myself poking around in the ZIP file format today, I may
try changing all the format strings to unsigned, and writing up some
tests to check cases that can be run in a reasonable amount of time.  If
I get anything usable I'll post a patch.
msg59423 - (view) Author: Eric Huss (ehuss) Date: 2008-01-07 00:49
Some of this work has already been done, see issue 1189216.

You'll obviously need to keep the CRC unpack as signed because the
binascii module uses signed values.

Some header values are stored as 0xffffffff to denote the value is stored
in the 64-bit extended fields.  There are a few places in the zipfile
module that use -1 to achieve this.  This will cause a deprecation warning
when passed into the struct module:
DeprecationWarning: struct integer overflow masking is deprecated

The places that check for this appear to always check for both -1 and
0xfffffff, so that should be safe.  Could potentially even remove the -1
check, not sure why it's checking for both.  In general, you should
probably review the 64-bit support with a fine-toothed comb.

There are some other signed fields in the spec, but I don't think any of
them apply to the zipfile module.  Some of them are timestamps in the
extra fields (Extended Timestamp and Info-ZIP Mac Extra fields).

I'd also keep an eye on other zipfile bugs that appear to be sign-related
(which you appear to have already looked at), such as issues 1189216,
1060, 1760357, 1526.
msg59431 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2008-01-07 04:25
Alan and Eric, can I depend on you two to review each other's code and
let me know when you think the patch is ready to be submitted?
msg59470 - (view) Author: Alan McIntyre (alanmcintyre) * (Python committer) Date: 2008-01-07 17:45
Well I can't promise it will be swift, since my winter vacation ended
today, but I'll keep on top of it as best I can. :)
msg59581 - (view) Author: Alan McIntyre (alanmcintyre) * (Python committer) Date: 2008-01-09 07:00
Here's the first draft of a patch (zipfile-unsigned-fixes.diff) that
does a few things:

- Interpret fields as unsigned unless required (CRC, etc.)
- Corrects reading of ZIP files with large archive comments
- Replaces hard-coded structure sizes with module-level variables
(calculated using struct.calcsize)
- When writing a file in ZipFile.close(), if the ZipFile instance has a
comment value, this value is now written as the archive comment
- Renamed some of the module variables to better match the names used in
the ZIP file spec
- General cleanup & addition of comments to (hopefully) make the code a
little easier to follow

Test & doc changes are included, & this code currently passes regression
test suite with -uall on OS X.
msg59678 - (view) Author: Eric Huss (ehuss) Date: 2008-01-10 19:45
Alan, your changes look good to me, but it is missing my patch in this bug
that fixes the sign issue in _decodeExtra.  While you're there, you might
as well change the other 3 unpack lines to use a capital Q.

msg59683 - (view) Author: Alan McIntyre (alanmcintyre) * (Python committer) Date: 2008-01-10 22:24
Thanks for the reminder, Eric; I'll include that and post the updated patch.

As a side note, on OS X, running regrtest with -uall or -ulargefile
still skips test_zipfile4 for some reason.  I'll have a look at that
before submitting the next version of the patch as well.
msg59709 - (view) Author: Alan McIntyre (alanmcintyre) * (Python committer) Date: 2008-01-11 14:18
Currently the extra field data is only parsed when it contains Zip64
extended information.  It could probably be broken up into a list of
(id, data) pairs at a minimum for all data types, since the spec says
that's the structure that "should" be used.  

I don't know whether the results of that parse should be kept in a new
slot; putting it in the ZipInfo.extra slot would probably be bad for
2.6, since users might be counting on the current behavior of a raw
chunk of data still being there.  Does anybody else have any suggestions
for this?
msg59809 - (view) Author: Alan McIntyre (alanmcintyre) * (Python committer) Date: 2008-01-12 10:24
Here's an updated patch (zipfile-unsigned-fixes2.diff) that contains
Eric's fixes.  I also changed the structure for the Zip64 extension data
to be unsigned.  I think this should cover all the deficiencies caused
by the improper use of unsigned values.

Note: if this patch is committed, then issue 1746 should be closed since
it is fixed with this patch.
msg59810 - (view) Author: Alan McIntyre (alanmcintyre) * (Python committer) Date: 2008-01-12 10:35
I just noticed that my changes for issue 1526 are included in this
patch.  Eric, if you have time could you have a look at that issue and
see if you think I addressed it properly?  If not I can back them out
into a separate patch for that issue.
msg63854 - (view) Author: Sean Reifschneider (jafo) * (Python committer) Date: 2008-03-18 02:49
Eric: Can you review the latest version of this patch?
msg64576 - (view) Author: Eric Huss (ehuss) Date: 2008-03-27 02:12
Sorry for the long delay.  Yes, the latest patch looks very good to me.

msg65548 - (view) Author: Alan McIntyre (alanmcintyre) * (Python committer) Date: 2008-04-16 13:16
Is there anything else that needs to be addressed before this can be
committed?  At the moment I don't know of anything, just wanted to make
sure somebody wasn't waiting on me.  

As a reminder, issues #1526 and #1746 should be closed if this patch is
msg67244 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2008-05-23 14:58
Alan, it appears that the patch doesn't apply anymore, in r63553. Can
you please update it?
msg67245 - (view) Author: Alan McIntyre (alanmcintyre) * (Python committer) Date: 2008-05-23 15:00
Sure, I'll look at it later today or over the weekend.  I should
probably break out the parts that apply to other issues and update those
patches as well.
msg67388 - (view) Author: Alan McIntyre (alanmcintyre) * (Python committer) Date: 2008-05-26 19:58
Here's an updated patch (zipfile-patch-1622.diff).  I didn't get around
to separating the fixes for #1526 and #1746.

All the tests (including test_zipfile64) pass after applying this patch.
msg69193 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2008-07-03 12:53
Thanks for the patch. This is now committed as r64688
Date User Action Args
2008-07-03 12:53:33loewissetstatus: open -> closed
resolution: accepted
messages: + msg69193
2008-06-30 04:49:59loewissetpriority: normal -> high
2008-05-26 19:58:32alanmcintyresetfiles: + zipfile-patch-1622.diff
messages: + msg67388
2008-05-23 15:00:22alanmcintyresetmessages: + msg67245
2008-05-23 14:58:14loewissetmessages: + msg67244
2008-04-16 13:16:41alanmcintyresetmessages: + msg65548
2008-03-27 02:12:48ehusssetmessages: + msg64576
2008-03-18 02:49:52jafosetassignee: loewis
messages: + msg63854
nosy: + jafo, loewis
2008-01-20 20:04:03christian.heimessetpriority: normal
2008-01-16 02:36:00jceasetnosy: + jcea
2008-01-12 10:35:40alanmcintyresetmessages: + msg59810
2008-01-12 10:24:38alanmcintyresetfiles: + zipfile-unsigned-fixes2.diff
messages: + msg59809
2008-01-11 14:18:32alanmcintyresetmessages: + msg59709
2008-01-10 22:24:01alanmcintyresetmessages: + msg59683
2008-01-10 19:45:39ehusssetmessages: + msg59678
2008-01-09 07:00:09alanmcintyresetfiles: + zipfile-unsigned-fixes.diff
messages: + msg59581
2008-01-07 17:45:41alanmcintyresetmessages: + msg59470
2008-01-07 04:25:20gvanrossumsetmessages: + msg59431
2008-01-07 00:49:26ehusssetmessages: + msg59423
2008-01-06 20:55:39alanmcintyresetnosy: + alanmcintyre
messages: + msg59405
2007-12-14 01:43:07gvanrossumsetkeywords: + patch
nosy: + gvanrossum
2007-12-14 01:31:17ehusscreate