Message208648
Hello,
Consider a zip file that contains UTF-8 filenames (such as the uploaded zip file). The following code fails when launched from a shell that uses the POSIX locale.
>>> import zipfile
>>> with zipfile.ZipFile("test_ut8.zip") as fd:
...     fd.extractall()
...
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "/opt/python/3.3/lib/python3.3/zipfile.py", line 1225, in extractall
    self.extract(zipinfo, path, pwd)
  File "/opt/python/3.3/lib/python3.3/zipfile.py", line 1213, in extract
    return self._extract_member(member, path, pwd)
  File "/opt/python/3.3/lib/python3.3/zipfile.py", line 1276, in _extract_member
    open(targetpath, "wb") as target:
UnicodeEncodeError: 'ascii' codec can't encode characters in position 10-14: ordinal not in range(128)
With the following shell locale:
$ locale
LANG=POSIX
...
But the filesystem is not encoding-dependent. On a Unix system, filenames are only bytes, so there is no reason to refuse to unzip a zip file (in fact, the "unzip" command-line tool extracts the file without error in a POSIX shell).
Since "open" accepts a "bytes" filename, changing line 1276 from
> open(targetpath)
to:
> open(targetpath.encode("utf-8"))
fixes the problem.
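To illustrate the point about "open" and bytes filenames, here is a minimal sketch, independent of zipfile. The sample filename and the temporary directory are only assumptions for the demonstration. Because the path is already bytes, Python has no need to encode it with the locale's codec.

```python
import os
import tempfile

# Hypothetical non-ASCII filename, encoded by hand instead of by the locale codec.
name = "r\u00e9sum\u00e9.txt"
raw_name = name.encode("utf-8")

with tempfile.TemporaryDirectory() as tmp:
    # Join bytes with bytes so the whole path stays a bytes object.
    target = os.path.join(os.fsencode(tmp), raw_name)
    with open(target, "wb") as fh:
        fh.write(b"data")
    # Check while the temporary directory still exists.
    exists = os.path.exists(target)

print(exists)
```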
zipfile should not care about the encoding of the filename; it should use the byte sequence of the filename exactly as read from the zip file. Having "ZipInfo.filename" as a string (and not bytes) is great for an API, but is not needed to open/write a file on disk. ZipInfo should therefore store the raw byte sequence of the filename in a "bytes_filename" field and use it in the "open" call of "extract".
In addition, considering the patch for bug 10614, the right fix could use the new "ZipInfo.encoding" field:
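Until something like that exists in zipfile, a user-side workaround could look like the sketch below. The helper name "extract_with_bytes_names" and its "encoding" parameter are hypothetical and not part of zipfile. It also assumes that re-encoding "ZipInfo.filename" as UTF-8 reproduces the bytes stored in the archive, which only holds when the archive really stored UTF-8 names.

```python
import os
import shutil
import zipfile


def extract_with_bytes_names(archive, dest, encoding="utf-8"):
    """Extract every member, writing to bytes paths (hypothetical helper)."""
    dest_b = os.fsencode(dest)
    written = []
    with zipfile.ZipFile(archive) as zf:
        for info in zf.infolist():
            # Drop empty, "." and ".." components so members stay under dest.
            parts = [p for p in info.filename.split("/")
                     if p not in ("", ".", "..")]
            if not parts:
                continue
            # Assumption: the stored names were UTF-8 (or `encoding`).
            target = os.path.join(dest_b, *(p.encode(encoding) for p in parts))
            if info.filename.endswith("/"):
                # Directory entry: create it and move on.
                os.makedirs(target, exist_ok=True)
                continue
            os.makedirs(os.path.dirname(target), exist_ok=True)
            # Bytes path: open() does not consult the locale encoding.
            with zf.open(info) as src, open(target, "wb") as dst:
                shutil.copyfileobj(src, dst)
            written.append(target)
    return written
```

With the archive from this report, the call would be extract_with_bytes_names("test_ut8.zip", "."), and the returned list holds the bytes paths that were written.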
> open(targetpath.encode(member.encoding))
Date | User | Action | Args
2014-01-21 15:05:54 | Laurent.Mazuel | set | recipients: + Laurent.Mazuel
2014-01-21 15:05:53 | Laurent.Mazuel | set | messageid: <1390316753.93.0.767122614569.issue20329@psf.upfronthosting.co.za>
2014-01-21 15:05:53 | Laurent.Mazuel | link | issue20329 messages
2014-01-21 15:05:53 | Laurent.Mazuel | create |