
Author belopolsky
Recipients belopolsky
Date 2013-12-21.20:58:58
Message-id <1387659539.36.0.629484655922.issue20048@psf.upfronthosting.co.za>
In-reply-to
Content
This problem happens when I unpack a file from a 200+ MB zip archive as follows:

import zipfile

with zipfile.ZipFile(archive) as z:
    data = b''
    # 'rU' opens the member in universal-newline mode
    with z.open(filename, 'rU') as f:
        for line in f:
            data += line


I cannot reduce it to a test case suitable for posting here, but the culprit is the following code in zipfile.py:

    def peek(self, n=1):
        """Returns buffered bytes without advancing the position."""
        if n > len(self._readbuffer) - self._offset:
            chunk = self.read(n)
            self._offset -= len(chunk)

See http://hg.python.org/cpython/file/81f8375e60ce/Lib/zipfile.py#l605

The problem occurs when peek() is called at the boundary of the decompression buffer and read() has to go through more than one readbuffer to satisfy the request. In that case self._offset ends up smaller than len(chunk), so the subtraction leaves a nonsensical negative self._offset upon return from peek().
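The failure mode can be illustrated with a toy model. This is only a sketch under assumptions: ToyReader, refill_size, _read1 and peek_buggy are hypothetical names, and the class is a simplified stand-in for a buffered reader, not the actual zipfile.py code. It assumes that each refill keeps the unread bytes and resets the offset to 0, so the offset only counts bytes of the current buffer.

```python
class ToyReader(object):
    """Hypothetical, simplified buffered reader (not zipfile's real code)."""

    def __init__(self, payload, refill_size):
        self._source = payload            # bytes not yet moved into the buffer
        self._refill_size = refill_size   # how many bytes one refill delivers
        self._readbuffer = b''
        self._offset = 0

    def _read1(self, n):
        # Refill when the buffer cannot satisfy the request. Unread bytes are
        # kept and the offset restarts at 0 (assumed behaviour of the model).
        if n > len(self._readbuffer) - self._offset and self._source:
            fresh = self._source[:self._refill_size]
            self._source = self._source[self._refill_size:]
            self._readbuffer = self._readbuffer[self._offset:] + fresh
            self._offset = 0
        data = self._readbuffer[self._offset:self._offset + n]
        self._offset += len(data)
        return data

    def read(self, n):
        # Keep calling _read1 until n bytes are collected or the data runs out.
        buf = b''
        while len(buf) < n:
            data = self._read1(n - len(buf))
            if not data:
                break
            buf += data
        return buf

    def peek_buggy(self, n=1):
        # Same rewind logic as the quoted peek(); returns the resulting offset.
        if n > len(self._readbuffer) - self._offset:
            chunk = self.read(n)
            self._offset -= len(chunk)
        return self._offset


if __name__ == "__main__":
    # One refill covers the request: rewinding by len(chunk) is harmless.
    print(ToyReader(b'abcdefgh', refill_size=8).peek_buggy(5))  # 0
    # Three refills are needed: the offset only reflects the last buffer.
    print(ToyReader(b'abcdefgh', refill_size=2).peek_buggy(5))  # -4
```

In the second call the chunk spans three buffers, so subtracting its full length from an offset that only counts the last buffer produces a negative value.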

This problem does not seem to appear in 3.x since changeset 028e8e0b03e8.
History
Date User Action Args
2013-12-21 20:58:59 belopolsky set recipients: + belopolsky
2013-12-21 20:58:59 belopolsky set messageid: <1387659539.36.0.629484655922.issue20048@psf.upfronthosting.co.za>
2013-12-21 20:58:59 belopolsky link issue20048 messages
2013-12-21 20:58:58 belopolsky create