Title: zipfile cannot handle a path built by os.path.join()
Messages (2)
msg259524 - Author: 張伯誠 (張伯誠) Date: 2016-02-04 02:19
I think the built-in zipfile library does not correctly handle a path built by os.path.join().

For example, assume we have a zip file which contains a member xml/hello.xml.
To extract that member you have to use 'xml/hello.xml'; if you use os.path.join('xml','hello.xml') instead, you will get an error.

Platform: Windows 7, Python 3.4

>>> import zipfile,os
>>> f=zipfile.ZipFile("",'r')
>>> f.extract('xml/hello.xml','.') # OK.
>>> f.extract(os.path.join('xml','hello.xml'),'.') # does not work.

If we fix this inside the method getinfo(self,name) of class ZipFile, changing the first version below into the second:

    def getinfo(self, name):
        """Return the instance of ZipInfo given 'name'."""      
        info = self.NameToInfo.get(name)

        if info is None:
            raise KeyError(
                'There is no item named %r in the archive' % name)

        return info

    def getinfo(self, name):
        """Return the instance of ZipInfo given 'name'."""
        if os.sep=='\\' and os.sep in name:
            name = name.replace(os.sep, '/')
        info = self.NameToInfo.get(name)
        if info is None:
            raise KeyError(
                'There is no item named %r in the archive' % name)

        return info

Then this line works!
>>> f.extract(os.path.join('xml','hello.xml'),'.') # OK!

Of course, this line also works:
>>> f.extract('xml/hello.xml','.') # also OK!
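
The same normalization can also be done in the calling code, without patching zipfile. This is only a minimal sketch: to_member_name is a hypothetical helper (it is not part of zipfile), and the archive is built in memory just to keep the example self-contained.

```python
import io
import os
import zipfile


def to_member_name(path):
    """Hypothetical helper: turn an OS-native relative path into a ZIP member name."""
    if os.sep != "/" and os.sep in path:
        path = path.replace(os.sep, "/")
    return path


# Build a small archive in memory so no file on disk is needed.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("xml/hello.xml", "<hello/>")

with zipfile.ZipFile(buf, "r") as zf:
    # os.path.join() gives 'xml\\hello.xml' on Windows and 'xml/hello.xml'
    # elsewhere; after normalization the lookup succeeds on both.
    info = zf.getinfo(to_member_name(os.path.join("xml", "hello.xml")))
    member_name = info.filename
```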

I think it is a bug. Why?
Let's take a closer look at this line of the method:

info = self.NameToInfo.get(name)

The keys of NameToInfo always have the form "xxx/yyy", according to the source of class ZipInfo:

    def __init__(self, filename="NoName", date_time=(1980,1,1,0,0,0)):
        # This is used to ensure paths in generated ZIP files always use
        # forward slashes as the directory separator, as required by the
        # ZIP format specification.
        if os.sep != "/" and os.sep in filename:
            filename = filename.replace(os.sep, "/")

Hence on Windows the method getinfo(self,name) of class ZipFile always raises KeyError for a path built with os.path.join('xxx','yyy').
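
The mismatch can be reproduced on any platform, not only on Windows. The sketch below assumes that ntpath.join() is an acceptable stand-in for what os.path.join() returns on Windows, and it builds the archive in memory instead of using the original file.

```python
import io
import ntpath
import zipfile

# Build a small archive in memory with one member under xml/.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("xml/hello.xml", "<hello/>")

# ntpath.join() joins with a backslash on every platform,
# which is what os.path.join() does on Windows.
windows_style = ntpath.join("xml", "hello.xml")

with zipfile.ZipFile(buf, "r") as zf:
    stored_names = zf.namelist()  # member names use forward slashes
    try:
        zf.getinfo(windows_style)
        lookup_failed = False
    except KeyError:
        # The backslash name is not a key of NameToInfo.
        lookup_failed = True
```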

Thanks for reading!
msg261987 - Author: Ezio Melotti (ezio.melotti) (Python committer) Date: 2016-03-18 18:59
I don't think this is a bug.  The ZIP format specification requires the use of forward slashes[0]:

   4.4.17 file name: (Variable) The name of the file, with optional relative path.
       The path stored MUST not contain a drive or
       device letter, or a leading slash.  All slashes
       MUST be forward slashes '/' as opposed to
       backwards slashes '\' for compatibility with Amiga
       and UNIX file systems etc.

os.path.join() will use different path separators depending on the system.  If you don't want to hardcode the slashes in a string literal, you can simply use '/'.join(...) instead of os.path.join().
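
A short sketch of building the member name portably. The '/'.join(...) form is the one suggested above; posixpath.join() is shown only as an assumed alternative from the standard library that also always joins with '/'. The in-memory archive is there just to make the example self-contained.

```python
import io
import posixpath
import zipfile

parts = ("xml", "hello.xml")

# Both forms produce forward slashes regardless of the operating system.
name_from_str_join = "/".join(parts)
name_from_posixpath = posixpath.join(*parts)

# Build a small archive in memory and read the member back by that name.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("xml/hello.xml", "<hello/>")

with zipfile.ZipFile(buf, "r") as zf:
    data = zf.read(name_from_str_join)
```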

