classification
Title: zipfile's readline() drops data in universal newline mode
Type: behavior Stage: resolved
Components: Library (Lib) Versions: Python 2.7
process
Status: closed Resolution: fixed
Dependencies: 14371 Superseder:
Assigned To: Nosy List: alanmcintyre, belopolsky, python-dev, serhiy.storchaka
Priority: normal Keywords: patch

Created on 2013-12-21 20:58 by belopolsky, last changed 2013-12-21 21:55 by serhiy.storchaka. This issue is now closed.

Files
File name Uploaded Description Edit
zipfile_peek.patch serhiy.storchaka, 2013-12-21 21:24 review
Messages (5)
msg206779 - Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2013-12-21 20:58
This problem happens when I unpack a file from a 200+ MB zip archive as follows:

with zipfile.ZipFile(archive) as z:
    data = b''
    with z.open(filename, 'rU') as f:
        for line in f:
            data += line


I cannot reduce it to a test case suitable for posting here, but the culprit is the following code in zipfile.py:

    def peek(self, n=1):
        """Returns buffered bytes without advancing the position."""
        if n > len(self._readbuffer) - self._offset:
            chunk = self.read(n)
            self._offset -= len(chunk)

See http://hg.python.org/cpython/file/81f8375e60ce/Lib/zipfile.py#l605

The problem occurs when peek() is called at the boundary of the uncompress buffer and read() goes through more than one readbuffer. As a result, self._offset is smaller than len(chunk), so subtracting len(chunk) leaves a nonsensical negative self._offset upon return from peek().

This problem does not seem to appear in 3.x since 028e8e0b03e8.
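The failure mode described in this message can be illustrated with a small toy model. The sketch below is an assumption-laden stand-in, not zipfile's real code: the class name ToyReader, its chunk_size parameter, and the _refill() helper are hypothetical. Only the two-line rewind in peek() mirrors the quoted snippet.

```python
class ToyReader:
    """Hypothetical buffered reader; NOT zipfile's actual ZipExtFile."""

    def __init__(self, data, chunk_size):
        self._source = data            # bytes not yet loaded into the buffer
        self._chunk_size = chunk_size  # size of each buffer refill
        self._readbuffer = b''
        self._offset = 0

    def _refill(self):
        # Replace the exhausted buffer with the next chunk of source data.
        self._readbuffer = self._source[:self._chunk_size]
        self._source = self._source[self._chunk_size:]
        self._offset = 0

    def read(self, n):
        out = b''
        while len(out) < n:
            if self._offset >= len(self._readbuffer):
                if not self._source:
                    break
                self._refill()
            take = self._readbuffer[self._offset:self._offset + n - len(out)]
            self._offset += len(take)
            out += take
        return out

    def peek(self, n=1):
        # Same rewind logic as the snippet quoted in the report.
        if n > len(self._readbuffer) - self._offset:
            chunk = self.read(n)
            self._offset -= len(chunk)  # can go negative after a refill
        return self._readbuffer[self._offset:self._offset + 512]


r = ToyReader(b'abcdefgh', chunk_size=4)
first = r.read(3)            # consumes b'abc'; one byte left in the buffer
peeked = r.peek(2)           # read(2) crosses into the second buffer
offset_after_peek = r._offset

print(first, peeked, offset_after_peek)
```

In this toy run the peek should have shown data starting at b'd', but the offset ends up at -1 and the slice returns the tail of the new buffer instead, which is how bytes can appear to be dropped.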
msg206784 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-12-21 21:24
Does this patch fix the bug?
msg206785 - Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2013-12-21 21:29
It does!
msg206788 - Author: Roundup Robot (python-dev) (Python triager) Date: 2013-12-21 21:52
New changeset 8b097d07488d by Serhiy Storchaka in branch '2.7':
Issue #20048: Fixed ZipExtFile.peek() when it is called on the boundary of
http://hg.python.org/cpython/rev/8b097d07488d
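For readers who want to see how a boundary case of this kind could be guarded, here is a minimal sketch on a hypothetical toy reader. It is not the contents of the changeset, and the class name GuardedToyReader, chunk_size, and _refill() are all assumptions made for illustration. The idea shown is to splice the re-read chunk back in front of the unread buffer whenever a plain rewind would push the offset below zero.

```python
class GuardedToyReader:
    """Hypothetical buffered reader with a guarded peek(); not zipfile's code."""

    def __init__(self, data, chunk_size):
        self._source = data            # bytes not yet loaded into the buffer
        self._chunk_size = chunk_size  # size of each buffer refill
        self._readbuffer = b''
        self._offset = 0

    def _refill(self):
        # Replace the exhausted buffer with the next chunk of source data.
        self._readbuffer = self._source[:self._chunk_size]
        self._source = self._source[self._chunk_size:]
        self._offset = 0

    def read(self, n):
        out = b''
        while len(out) < n:
            if self._offset >= len(self._readbuffer):
                if not self._source:
                    break
                self._refill()
            take = self._readbuffer[self._offset:self._offset + n - len(out)]
            self._offset += len(take)
            out += take
        return out

    def peek(self, n=1):
        if n > len(self._readbuffer) - self._offset:
            chunk = self.read(n)
            if len(chunk) > self._offset:
                # A plain rewind would go negative: put the chunk back in
                # front of the still-unread part of the current buffer.
                self._readbuffer = chunk + self._readbuffer[self._offset:]
                self._offset = 0
            else:
                self._offset -= len(chunk)
        return self._readbuffer[self._offset:self._offset + 512]


r = GuardedToyReader(b'abcdefgh', chunk_size=4)
first = r.read(3)    # consumes b'abc'
peeked = r.peek(2)   # crosses the buffer boundary without losing position
rest = r.read(5)     # everything after b'abc' is still readable

print(first, peeked, rest)
```

With the guard in place, the peeked bytes start at the current position and a following read() still returns every remaining byte of the toy input.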
msg206789 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-12-21 21:55
Thank you for your report and irrefragable analysis.
History
Date User Action Args
2014-01-22 13:07:00 serhiy.storchaka link issue20343 superseder
2013-12-21 21:55:42 serhiy.storchaka set status: open -> closed; resolution: fixed; messages: + msg206789; stage: patch review -> resolved
2013-12-21 21:52:31 python-dev set nosy: + python-dev; messages: + msg206788
2013-12-21 21:29:03 belopolsky set messages: + msg206785
2013-12-21 21:24:48 serhiy.storchaka set files: + zipfile_peek.patch; keywords: + patch; messages: + msg206784; stage: patch review
2013-12-21 21:21:15 belopolsky set type: behavior; components: + Library (Lib)
2013-12-21 21:20:32 belopolsky set keywords: - 3.2regression
2013-12-21 21:17:24 belopolsky set keywords: + 3.2regression, - gsoc; nosy: + alanmcintyre
2013-12-21 21:00:23 belopolsky set keywords: + gsoc; nosy: + serhiy.storchaka; dependencies: + Add support for bzip2 compression to the zipfile module
2013-12-21 20:58:59 belopolsky create