Author Laurent.Mazuel
Recipients Laurent.Mazuel
Date 2014-01-21.15:05:53
Message-id <1390316753.93.0.767122614569.issue20329@psf.upfronthosting.co.za>
Content
Hello,

Given a zip file that contains UTF-8 filenames (such as the uploaded zip file), the following code fails when it is run from a shell that uses the POSIX locale.

>>> import zipfile
>>> with zipfile.ZipFile("test_ut8.zip") as fd:
...     fd.extractall()
... 
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "/opt/python/3.3/lib/python3.3/zipfile.py", line 1225, in extractall
    self.extract(zipinfo, path, pwd)
  File "/opt/python/3.3/lib/python3.3/zipfile.py", line 1213, in extract
    return self._extract_member(member, path, pwd)
  File "/opt/python/3.3/lib/python3.3/zipfile.py", line 1276, in _extract_member
    open(targetpath, "wb") as target:
UnicodeEncodeError: 'ascii' codec can't encode characters in position 10-14: ordinal not in range(128)

The locale of the shell is:
$ locale
LANG=POSIX
...
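
To try this without the uploaded archive, a comparable zip file can be built in memory. This is only a sketch: the helper name "make_utf8_zip" and the member name are made up for illustration and are not taken from the uploaded file. It also checks that zipfile marks the member name as UTF-8 (general purpose flag bit 11, 0x800). Newer Python versions may behave differently under the POSIX locale, so the traceback above is not guaranteed to reproduce everywhere.

```python
import io
import zipfile


def make_utf8_zip(member_name: str = "r\u00e9sum\u00e9_\u65e5\u672c\u8a9e.txt") -> bytes:
    """Return the bytes of a zip archive holding one member with a non-ASCII name.

    The default member name is a hypothetical example.
    """
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        # For a name that is not pure ASCII, zipfile stores it as UTF-8
        # and sets the language-encoding flag (bit 11).
        zf.writestr(member_name, b"hello")
    return buf.getvalue()


archive = make_utf8_zip()
with zipfile.ZipFile(io.BytesIO(archive)) as zf:
    info = zf.infolist()[0]
    is_utf8 = bool(info.flag_bits & 0x800)
```

Writing "archive" to disk and calling extractall() on it from a POSIX-locale shell is then the same scenario as in the report.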

But the filesystem does not depend on the encoding. On a Unix system, filenames are only bytes, so there is no reason to refuse to unzip a zip file (in fact, the "unzip" command line tool does not fail to unzip the file in a POSIX shell).

Since "open" accepts a "bytes" filename, changing line 1276 from
> open(targetpath, "wb")
to:
> open(targetpath.encode("utf-8"), "wb")

fixes the problem.
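
That "open" accepts a bytes filename can be checked independently of zipfile. The sketch below is an illustration, not the proposed patch itself: the helper "write_with_bytes_name" and the file name are hypothetical. The path is built entirely from bytes, so opening it does not need to encode a string with the locale's codec.

```python
import os
import tempfile


def write_with_bytes_name(directory: str, name: str, data: bytes) -> bytes:
    """Create a file whose name is passed to open() as UTF-8 bytes.

    Returns the bytes path that was written.
    """
    # os.fsencode converts the str directory to bytes with the filesystem
    # encoding; the file name itself is encoded explicitly as UTF-8.
    target = os.path.join(os.fsencode(directory), name.encode("utf-8"))
    with open(target, "wb") as fh:
        fh.write(data)
    return target


with tempfile.TemporaryDirectory() as tmp:
    path = write_with_bytes_name(tmp, "\u65e5\u672c\u8a9e.txt", b"hello")
    # Passing bytes to os.listdir returns the directory entries as bytes too.
    entries = os.listdir(os.fsencode(tmp))
```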

zipfile should not care about the encoding of the filename when extracting; it should use the filename bytes exactly as they are stored in the zip file. Having "ZipInfo.filename" as a string (and not bytes) is great for an API, but a string is not needed to open or write a file on disk. ZipInfo should therefore store the raw bytes of the filename in a "bytes_filename" field, and "extract" should pass that field to "open".
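
The idea can be approximated from outside zipfile. Since "bytes_filename" is only a proposed field and does not exist on ZipInfo, this sketch recovers the stored bytes by re-encoding "ZipInfo.filename": with UTF-8 when flag bit 11 (0x800) is set, and with cp437 otherwise, which assumes zipfile's default decoding for names without that flag. The function name "extract_with_bytes_names" is hypothetical, and a POSIX-style host is assumed.

```python
import io
import os
import tempfile
import zipfile


def extract_with_bytes_names(zf: zipfile.ZipFile, dest: str) -> list:
    """Extract regular members, building every target path as bytes.

    Simplified sketch: directory entries are skipped, and empty, "." and
    ".." path components are dropped so members stay under dest.
    """
    dest_b = os.fsencode(dest)
    written = []
    for info in zf.infolist():
        if info.filename.endswith("/"):
            continue
        # Re-encode the decoded name to get back the bytes stored in the archive.
        encoding = "utf-8" if info.flag_bits & 0x800 else "cp437"
        parts = [p for p in info.filename.encode(encoding).split(b"/")
                 if p not in (b"", b".", b"..")]
        if not parts:
            continue
        target = os.path.join(dest_b, *parts)
        os.makedirs(os.path.dirname(target), exist_ok=True)
        # The target is bytes, so open() does not encode it with the locale codec.
        with zf.open(info) as src, open(target, "wb") as out:
            out.write(src.read())
        written.append(target)
    return written


buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("dir/\u65e5\u672c\u8a9e.txt", b"payload")

with tempfile.TemporaryDirectory() as tmp:
    with zipfile.ZipFile(io.BytesIO(buf.getvalue())) as zf:
        paths = extract_with_bytes_names(zf, tmp)
    with open(paths[0], "rb") as fh:
        content = fh.read()
```

A change inside zipfile itself would not need the re-encoding step, because the raw name bytes are available while the central directory is parsed.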

In addition, if the patch from bug 10614 is applied, a better fix could use its new "ZipInfo.encoding" field:
> open(targetpath.encode(member.encoding), "wb")