classification
Title: zipfile can not handle the path build by os.path.join()
Type: behavior Stage: resolved
Components: Library (Lib) Versions: Python 3.6
process
Status: closed Resolution: not a bug
Dependencies: Superseder:
Assigned To: Nosy List: alanmcintyre, ezio.melotti, serhiy.storchaka, twouters, 張伯誠
Priority: normal Keywords:

Created on 2016-02-04 02:19 by 張伯誠, last changed 2016-03-18 18:59 by ezio.melotti. This issue is now closed.

Messages (2)
msg259524 - Author: 張伯誠 (張伯誠) Date: 2016-02-04 02:19
I think the built-in library zipfile.py does not correctly handle paths built by os.path.join().

For example, assume we have a zip file named Test.zip which contains a member xml/hello.xml.
If you want to extract that member, you have to use 'xml/hello.xml'; if you use os.path.join('xml','hello.xml'), you will get an error.

Platform: Windows 7, Python 3.4

>>> import zipfile,os
>>> f=zipfile.ZipFile("Test.zip",'r')
>>> f.extract('xml/hello.xml','.') # OK.
>>> f.extract(os.path.join('xml','hello.xml'),'.') # does not work.

Suppose we change zipfile.py, inside the method getinfo(self, name) of class ZipFile:

before:
    def getinfo(self, name):
        """Return the instance of ZipInfo given 'name'."""      
        info = self.NameToInfo.get(name)

        if info is None:
            raise KeyError(
                'There is no item named %r in the archive' % name)

        return info

after:
    def getinfo(self, name):
        """Return the instance of ZipInfo given 'name'."""
        if os.sep == '\\' and os.sep in name:
            name = name.replace('\\', '/')

        info = self.NameToInfo.get(name)
        if info is None:
            raise KeyError(
                'There is no item named %r in the archive' % name)

        return info

Then this line works:
>>> f.extract(os.path.join('xml','hello.xml'),'.') # OK!

Of course, this line also still works:
>>> f.extract('xml/hello.xml','.') # also OK!
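
The same normalization can also be tried without editing zipfile.py. The following is only a minimal sketch: the subclass name BackslashTolerantZipFile and the helper make_archive are hypothetical, and the code assumes that a backslash in a member name is always meant as a separator (which is not true on POSIX, where a backslash is a legal file name character).

```python
import io
import zipfile


class BackslashTolerantZipFile(zipfile.ZipFile):
    """Hypothetical subclass: accept backslash-separated member names."""

    def getinfo(self, name):
        # Assumption: every backslash is a path separator, so it is
        # rewritten to the forward slash used for archive keys.
        return super().getinfo(name.replace('\\', '/'))


def make_archive():
    """Build a small in-memory archive with one member, xml/hello.xml."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, 'w') as zf:
        zf.writestr('xml/hello.xml', '<hello/>')
    buf.seek(0)
    return buf


with BackslashTolerantZipFile(make_archive()) as zf:
    # read() looks the member up through getinfo(), so the
    # backslash-separated name is found after normalization.
    data = zf.read('xml\\hello.xml')
```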


I think it is a bug. Why?
Let's take a closer look at this line of the method:

info = self.NameToInfo.get(name)

The keys of NameToInfo always have the form "xxx/yyy", as the source of class ZipInfo shows:

    def __init__(self, filename="NoName", date_time=(1980,1,1,0,0,0)):
 
        # This is used to ensure paths in generated ZIP files always use
        # forward slashes as the directory separator, as required by the
        # ZIP format specification.
        if os.sep != "/" and os.sep in filename:
            filename = filename.replace(os.sep, "/")

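Hence, on Windows, the method getinfo(self, name) of class ZipFile always raises KeyError for a path built with os.path.join('xxx','yyy'): the name contains a backslash, while the keys of NameToInfo use forward slashes.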
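
The mismatch can be reproduced on any platform, not only Windows. This is a small sketch under one assumption: ntpath.join() stands in for os.path.join() as it behaves on Windows, and the archive is built in memory instead of using Test.zip.

```python
import io
import ntpath
import zipfile

# Build an in-memory archive with a single member, xml/hello.xml.
buf = io.BytesIO()
with zipfile.ZipFile(buf, 'w') as zf:
    zf.writestr('xml/hello.xml', '<hello/>')

# ntpath.join() is what os.path.join() resolves to on Windows,
# so the result is separated by a backslash.
windows_name = ntpath.join('xml', 'hello.xml')

with zipfile.ZipFile(buf) as zf:
    stored_names = zf.namelist()   # names as stored in the archive
    try:
        zf.getinfo(windows_name)
        lookup_failed = False
    except KeyError:
        # The backslash-separated name is not a key of NameToInfo.
        lookup_failed = True

print(stored_names, repr(windows_name), lookup_failed)
```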


Thanks for reading!
Bocheng.
msg261987 - Author: Ezio Melotti (ezio.melotti) (Python committer) Date: 2016-03-18 18:59
I don't think this is a bug.  The ZIP format specification requires the use of forward slashes[0]:

   4.4.17 file name: (Variable)

       4.4.17.1 The name of the file, with optional relative path.
       The path stored MUST not contain a drive or
       device letter, or a leading slash.  All slashes
       MUST be forward slashes '/' as opposed to
       backwards slashes '\' for compatibility with Amiga
       and UNIX file systems etc.

os.path.join() will use different path separators depending on the system.  If you don't want to hardcode the slashes in a string literal, you can simply use '/'.join(...) instead of os.path.join().
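
For illustration, here are a few ways to build a member name that uses forward slashes regardless of the platform. This is only a sketch: the variable names are made up for the example, and posixpath.join() and PureWindowsPath.as_posix() are shown as possible alternatives to '/'.join(...), not as something zipfile requires.

```python
import posixpath
from pathlib import PureWindowsPath

parts = ('xml', 'hello.xml')

# Plain string join with the separator the ZIP format expects.
joined = '/'.join(parts)

# posixpath.join() always uses '/', whatever os.sep is.
via_posixpath = posixpath.join(*parts)

# An existing backslash-separated path can be converted explicitly.
converted = PureWindowsPath('xml\\hello.xml').as_posix()

print(joined, via_posixpath, converted)
```

Any of these names can then be passed to ZipFile.extract() or ZipFile.getinfo().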

[0]: https://pkware.cachefly.net/webdocs/casestudies/APPNOTE.TXT
History
Date                 User          Action  Args
2016-03-18 18:59:15  ezio.melotti  set     status: open -> closed; nosy: + ezio.melotti; messages: + msg261987; resolution: not a bug; stage: resolved
2016-02-27 10:33:01  anish.shah    set     nosy: - anish.shah
2016-02-04 15:43:00  SilentGhost   set     nosy: + twouters, alanmcintyre, serhiy.storchaka; versions: + Python 3.6, - Python 3.4
2016-02-04 07:24:37  anish.shah    set     nosy: + anish.shah
2016-02-04 02:19:28  張伯誠        create