classification
Title: zipfile: Allow reading duplicate filenames
Type: enhancement
Stage:
Components: Library (Lib)
Versions: Python 2.6

process
Status: closed
Resolution: accepted
Dependencies:
Superseder:
Assigned To:
Nosy List: georg.brandl, pysquared, scott.dial
Priority: normal
Keywords: patch

Created on 2007-08-15 22:37 by pysquared, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name Uploaded Description
zipfile_56308.diff pysquared, 2007-08-15 22:37
zipfile_62920.diff pysquared, 2008-05-19 23:49 Updated + doc -> reST
Messages (4)
msg53038 - Author: Graham Horler (pysquared) Date: 2007-08-15 22:37
Allow open() 'name' parameter to be a ZipInfo object, which allows opening archive members with duplicate filenames.  Also allow read() 'name' parameter to be a ZipInfo object, as it calls open() directly.

I got sent a zip file which had duplicate names in it, and the only way I could see to extract it using zipfile.py was to apply this patch.

The infolist() and namelist() methods will return information for duplicate filenames, but the open() method takes only a name.
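The gap described above can be illustrated with a minimal sketch, assuming a Python version that already includes this change (ZipInfo objects accepted by open() and read()) and Python 3 syntax. The helper names build_duplicate_archive, read_each_member, and read_by_name are hypothetical and exist only for this illustration; the archive is built in memory so no file is left behind.

```python
import io
import warnings
import zipfile


def build_duplicate_archive():
    """Return an in-memory zip with two members both named 'dupname' (illustrative helper)."""
    buf = io.BytesIO()
    with warnings.catch_warnings():
        # Python 3 warns about duplicate names on write; silence that here.
        warnings.simplefilter("ignore", UserWarning)
        with zipfile.ZipFile(buf, "w") as zf:
            zf.writestr("dupname", "Hello")
            zf.writestr("dupname", "World")
    buf.seek(0)
    return buf


def read_each_member(buf):
    """Read every member via its ZipInfo object, so duplicates stay distinct."""
    results = []
    with zipfile.ZipFile(buf, "r") as zf:
        for info in zf.infolist():
            # open() is given the ZipInfo itself rather than the (ambiguous) name.
            with zf.open(info) as member:
                results.append((info.filename, member.read()))
    return results


def read_by_name(buf, name):
    """Read by name only; a name lookup can resolve to just one of the duplicates."""
    with zipfile.ZipFile(buf, "r") as zf:
        return zf.read(name)


if __name__ == "__main__":
    print(read_each_member(build_duplicate_archive()))
    print(read_by_name(build_duplicate_archive(), "dupname"))
```

Passing the ZipInfo object reaches both members, whereas a lookup by name can return only one of them, which is the limitation the patch addresses.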

This patch also updated the docs for zipfile.py.

The zipfile.py module in Python 2.1 through 2.5 does not have an open() method, but it would be trivial to backport this patch to enhance the read() method instead.


# Test:
# write() optionally warns about, but still allows,
# creating duplicate file names:
import zipfile
zf = zipfile.ZipFile('dupzip.zip', 'w')
zf.debug = 1
zf.writestr('dupname', 'Hello')
zf.writestr('dupname', 'World')
zf.close()
# Prints 'Hello' then 'World' (shown as bytes literals under Python 3)
zfr = zipfile.ZipFile('dupzip.zip', 'r')
for inf in zfr.infolist():
    # Pass the ZipInfo object itself, not its name
    print(repr(zfr.read(inf)))
zfr.close()
msg67081 - Author: Scott Dial (scott.dial) Date: 2008-05-19 23:28
In the patch you commented "why is 'filepos' computed next? It's never
referenced." The answer is that back at r54152 (#1121142) the method was
rewritten, removing every reference to 'filepos', but the patch author
failed to remove that line. Please remove it.
msg67082 - Author: Graham Horler (pysquared) Date: 2008-05-19 23:49
Updated to latest revision, and converted documentation part of the 
patch to reST.

Removed the line that pointlessly computes 'filepos', as requested by 
Scott Dial.

(Please excuse my reST, I'm new to it and it's getting late).
msg67116 - Author: Georg Brandl (georg.brandl) (Python committer) Date: 2008-05-20 08:25
Thanks, reviewed, added tests and committed as r63499.
History
Date User Action Args
2022-04-11 14:56:26 admin set github: 45317
2008-05-20 08:26:00 georg.brandl set status: open -> closed; nosy: + georg.brandl; resolution: accepted; messages: + msg67116
2008-05-19 23:50:23 pysquared set files: + zipfile_62920.diff; messages: + msg67082
2008-05-19 23:29:04 scott.dial set nosy: + scott.dial; messages: + msg67081
2008-05-19 22:58:39 benjamin.peterson set type: enhancement; versions: + Python 2.6
2007-08-15 22:37:50 pysquared create