classification
Title: zipfile truncates extracted files
Type: behavior
Stage: resolved
Components: Library (Lib)
Versions: Python 2.7

process
Status: closed
Resolution: out of date
Dependencies:
Superseder: zipfile's readline() drops data in universal newline mode (issue 20048)
Assigned To:
Nosy List: Christian.Pérez, serhiy.storchaka
Priority: normal
Keywords:

Created on 2014-01-22 11:05 by Christian.Pérez, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Messages (4)
msg208779 - Author: Christian Pérez (Christian.Pérez) Date: 2014-01-22 11:05
To reproduce the error:

$ wget http://dbnsfp.houstonbioinformatics.org/dbNSFPzip/dbNSFP2.0.zip

$ python
Python 2.7.4 (default, Sep 26 2013, 03:20:26) 
[GCC 4.7.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from zipfile import ZipFile
>>> 
>>> with ZipFile("dbNSFP2.0.zip", "r") as zf:
...   with zf.open("dbNSFP2.0_variant.chr20", "U") as f:
...     count = 0
...     for line in f:
...       count += 1
... 

>>> print count
964352

$ unzip -p dbNSFP2.0.zip dbNSFP2.0_variant.chr20 | wc -l
2161277

Might this be related to issue 6759?
msg208781 - Author: Serhiy Storchaka (serhiy.storchaka) (Python committer) Date: 2014-01-22 11:14
Could you please test with the in-development version, Python 2.7.6+? It is unlikely to be related to issue6759, but it may be related to issue20048.
msg208791 - Author: Christian Pérez (Christian.Pérez) Date: 2014-01-22 12:20
With 2.7.6+ it works fine.

Python 2.7.6+ (2.7:c28e07377b03, Jan 22 2014, 12:56:14) 
[GCC 4.7.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from zipfile import ZipFile
[54979 refs]
>>> 
[54979 refs]
>>> with ZipFile("dbNSFP2.0.zip", "r") as zf:
...     with zf.open("dbNSFP2.0_variant.chr20", "U") as f:
...             count = 0
...             for line in f:
...                     count += 1
... 
[55930 refs]
>>> print count
2161277
[55930 refs]

I also tested it with Python 3.3, and it works fine there too.
msg208800 - Author: Serhiy Storchaka (serhiy.storchaka) (Python committer) Date: 2014-01-22 13:10
Thank you for your report, Christian. This bug has already been fixed in issue20048. Please wait for the Python 2.7.7 bugfix release.
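
Note for readers on current Python: the "U" mode used in the session above was removed from ZipFile.open() in Python 3.6, and the documented replacement is to wrap the member in io.TextIOWrapper. The sketch below is not part of the original report. It shows one way to cross-check a streamed line count against the fully decompressed data, which is the same comparison the reporter made with `unzip -p ... | wc -l`. The helper names, the in-memory sample archive, and the latin-1 encoding are assumptions chosen for illustration.

```python
import io
import zipfile


def make_sample_zip(n_lines=3000):
    """Build a small in-memory archive with mixed \\n, \\r\\n and \\r endings.

    Hypothetical test data; it stands in for the large file in the report.
    """
    endings = [b"\n", b"\r\n", b"\r"]
    payload = b"".join(
        b"row %d\tvalue" % i + endings[i % 3] for i in range(n_lines)
    )
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("sample.txt", payload)
    return buf.getvalue()


def count_lines_streaming(zip_bytes, member):
    """Count lines by iterating the member with universal-newline translation."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        with zf.open(member) as raw:
            # newline=None makes \n, \r\n and \r all terminate a line.
            text = io.TextIOWrapper(raw, encoding="latin-1", newline=None)
            return sum(1 for _ in text)


def count_lines_reference(zip_bytes, member):
    """Count lines from the fully decompressed bytes, as a cross-check."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        data = zf.read(member).decode("latin-1")
    normalized = data.replace("\r\n", "\n").replace("\r", "\n")
    count = normalized.count("\n")
    # A final line without a terminator still counts as a line.
    if normalized and not normalized.endswith("\n"):
        count += 1
    return count


if __name__ == "__main__":
    archive = make_sample_zip()
    print(count_lines_streaming(archive, "sample.txt"))
    print(count_lines_reference(archive, "sample.txt"))
```

If the two counts differ for a given archive, the streaming path is dropping or merging lines, which is the symptom described in this issue.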
History
Date                 User              Action  Args
2022-04-11 14:57:57  admin             set     github: 64542
2014-01-22 13:10:17  serhiy.storchaka  set     messages: + msg208800
2014-01-22 13:07:00  serhiy.storchaka  set     status: open -> closed
                                               superseder: zipfile's readline() drops data in universal newline mode
                                               resolution: out of date
                                               stage: resolved
2014-01-22 12:20:12  Christian.Pérez   set     messages: + msg208791
2014-01-22 11:14:04  serhiy.storchaka  set     messages: + msg208781
2014-01-22 11:06:36  serhiy.storchaka  set     nosy: + serhiy.storchaka
2014-01-22 11:05:09  Christian.Pérez   create