I think the built-in zipfile module (zipfile.py) does not correctly handle paths built with os.path.join().
For example, assume we have a zip file named Test.zip that contains a member xml/hello.xml.
To extract that member you have to pass 'xml/hello.xml'; if you pass os.path.join('xml','hello.xml') instead, you get an error.
Platform: Windows 7, Python 3.4
>>> import zipfile,os
>>> f=zipfile.ZipFile("Test.zip",'r')
>>> f.extract('xml/hello.xml','.') # OK.
>>> f.extract(os.path.join('xml','hello.xml'),'.') # Does not work: raises KeyError.
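The failure can be reproduced without a Test.zip on disk and without a Windows machine. The sketch below is only an illustration under a few assumptions: the archive is built in memory, and ntpath.join stands in for os.path.join, since ntpath is the module that os.path resolves to on Windows.

```python
import io
import ntpath
import zipfile

# Build a small archive in memory with a single member, xml/hello.xml.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("xml/hello.xml", "<hello/>")

# ntpath.join behaves like os.path.join on Windows and joins with a backslash.
windows_style = ntpath.join("xml", "hello.xml")

with zipfile.ZipFile(buf, "r") as zf:
    stored_names = zf.namelist()          # names as stored in the archive
    info = zf.getinfo("xml/hello.xml")    # forward-slash lookup succeeds
    try:
        zf.getinfo(windows_style)         # backslash lookup
        lookup_failed = False
    except KeyError:
        lookup_failed = True

print(stored_names, repr(windows_style), lookup_failed)
```

extract() looks the member up through getinfo(), so the same KeyError is what surfaces from the f.extract() call above.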
Suppose we patch zipfile.py, in the method getinfo(self, name) of class ZipFile:
before:
def getinfo(self, name):
    """Return the instance of ZipInfo given 'name'."""
    info = self.NameToInfo.get(name)
    if info is None:
        raise KeyError(
            'There is no item named %r in the archive' % name)
    return info
after:
def getinfo(self, name):
    """Return the instance of ZipInfo given 'name'."""
    if os.sep == '\\' and os.sep in name:
        name = name.replace('\\', '/')
    info = self.NameToInfo.get(name)
    if info is None:
        raise KeyError(
            'There is no item named %r in the archive' % name)
    return info
Then this line works:
>>> f.extract(os.path.join('xml','hello.xml'),'.') # OK!
Of course, this line still works too:
>>> f.extract('xml/hello.xml','.') # also OK!
Why do I think this is a bug?
Let's take a closer look at this line of the method:
    info = self.NameToInfo.get(name)
The keys of NameToInfo always have the form "xxx/yyy", because ZipInfo.__init__ normalizes the filename, as its own comment explains:
def __init__(self, filename="NoName", date_time=(1980,1,1,0,0,0)):
    # This is used to ensure paths in generated ZIP files always use
    # forward slashes as the directory separator, as required by the
    # ZIP format specification.
    if os.sep != "/" and os.sep in filename:
        filename = filename.replace(os.sep, "/")
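Hence, on Windows, the method getinfo(self, name) of class ZipFile always raises KeyError for a path built with os.path.join('xxx','yyy'), because the lookup key contains backslashes while the stored keys use forward slashes.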
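Until the library itself is changed, the same normalization can be applied on the caller's side. The following is only a sketch of that idea: to_member_name is a hypothetical helper, not part of zipfile, and its sep parameter exists only so the Windows case can be exercised on any platform.

```python
import io
import os
import zipfile


def to_member_name(path, sep=os.sep):
    """Hypothetical helper: turn an OS-style path into a ZIP member name.

    Mirrors the normalization ZipInfo.__init__ applies to stored names.
    """
    if sep != "/" and sep in path:
        path = path.replace(sep, "/")
    return path


# Build a small archive in memory with a single member, xml/hello.xml.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("xml/hello.xml", "<hello/>")

with zipfile.ZipFile(buf, "r") as zf:
    # Simulate the name os.path.join('xml', 'hello.xml') yields on Windows.
    member = to_member_name("xml\\hello.xml", sep="\\")
    data = zf.read(member)

print(member, data)
```

An alternative that avoids the helper entirely is to build member names with posixpath.join('xml', 'hello.xml'), which joins with a forward slash on every platform.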
Thanks for reading!
Bocheng.