
classification
Title: Python 2.7.4 breaks ZipFile extraction of zip files with unicode member paths
Type: behavior Stage: resolved
Components: Windows Versions: Python 2.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: serhiy.storchaka Nosy List: Arfrever, Vhati, amaury.forgeotdarc, benjamin.peterson, catalin.iacob, christian.heimes, ezio.melotti, georg.brandl, gregory.p.smith, koobs, larry, loewis, ned.deily, neologix, pitrou, python-dev, r.david.murray, schmir, serhiy.storchaka, terry.reedy, twb, vstinner
Priority: release blocker Keywords: patch

Created on 2013-04-08 03:20 by Vhati, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name Uploaded Description
Kestrel Cruiser.zip Vhati, 2013-04-08 04:40
zipfile_extract_unicode.patch serhiy.storchaka, 2013-04-08 10:03
test_extract_unicode_filenames_skip.patch serhiy.storchaka, 2013-04-20 21:18 Skip test_extract_unicode_filenames
Messages (24)
msg186264 - (view) Author: Vhati (Vhati) Date: 2013-04-08 03:20
Python 2.7.4 fails while extracting zip files when 'member' is a unicode path.

---
Traceback (most recent call last):
  ...
    my_zip.extract(item, tmp_folder_path)
  File "D:\Apps\Python274\lib\zipfile.py", line 1024, in extract
    return self._extract_member(member, path, pwd)
  File "D:\Apps\Python274\lib\zipfile.py", line 1057, in _extract_member
    arcname = arcname.translate(table)
TypeError: character mapping must return integer, None or unicode
---
2.7.3 had no problems because the call to translate() is new.


The following, copied from zipfile.py, will recreate the error.
---
import string
illegal = ':<>|"?*'
table = string.maketrans(illegal, '_' * len(illegal))

arcname = "hi"
arcname = arcname.translate(table)
# ascii strings are fine

arcname = u"hi"
arcname = arcname.translate(table)
# unicode fails
# Traceback (most recent call last):
#   File "<stdin>", line 1, in <module>
# TypeError: character mapping must return integer, None or unicode
---

I tried using unicode literals for the illegal string and maketrans underscore arg, but that didn't work. Suggestions?


Here's a link to the doc for translate().
http://docs.python.org/2/library/stdtypes.html#str.translate
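
For reference, a minimal sketch of one way to translate a unicode name. It relies on the documented difference that unicode.translate() expects a mapping from code points to replacements rather than the 256-character table that string.maketrans() builds. The names ILLEGAL, UNICODE_TABLE and sanitize_arcname are hypothetical and chosen for illustration; this is not necessarily what the attached patch does, and it only handles unicode input.

```python
# Characters that are not allowed in Windows file names (same set as in the report).
ILLEGAL = u':<>|"?*'

# A unicode translate() table maps code points (integers) to replacements,
# unlike the 256-character string built by string.maketrans() in Python 2.
UNICODE_TABLE = dict((ord(ch), u'_') for ch in ILLEGAL)


def sanitize_arcname(arcname):
    """Replace characters from ILLEGAL with underscores in a unicode name.

    Hypothetical helper for illustration; byte strings are not handled here.
    """
    return arcname.translate(UNICODE_TABLE)


print(sanitize_arcname(u'Kestrel: Cruiser?.txt'))  # Kestrel_ Cruiser_.txt
```

A byte string would still need the string.maketrans() table, so real code would have to pick the table based on the type of the name.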
msg186265 - (view) Author: Vhati (Vhati) Date: 2013-04-08 03:37
Apparently namelist() can return either ascii or unicode strings for its members, depending on the archive. Obviously this'd apply to literal unicode strings as well.
msg186273 - (view) Author: Vhati (Vhati) Date: 2013-04-08 04:40
Oops, passing a unicode literal to extract()'s member arg wouldn't be sufficient.

The extract() method quietly converts strings to ZipInfo objects via getinfo(member_string). Then _extract_member() takes the filename attribute of that ZipInfo object, which causes problems when THAT is unicode.

So I guess this bug only applies to archives with unicode member paths.

Attached is one such file to aid in troubleshooting.
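
The lookup described above can be observed without the attached archive. The following is a rough sketch that builds a throwaway archive in memory; the member name u'\xf6.txt' and the variable names are assumptions made for illustration, and only public zipfile calls (writestr, getinfo) are used.

```python
import io
import zipfile

# Build a small archive in memory with a non-ASCII member name.
buf = io.BytesIO()
with zipfile.ZipFile(buf, 'w') as zf:
    zf.writestr(u'\xf6.txt', b'data')

# Reopen it and look the member up by name, which yields a ZipInfo object.
with zipfile.ZipFile(buf) as zf:
    info = zf.getinfo(u'\xf6.txt')
    # This attribute is the name that the extraction code goes on to sanitize.
    member_name = info.filename

print(repr(member_name))
```

On Python 2.7 the filename attribute read back this way is a unicode object, which is the case that reaches the translate() call.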
msg186275 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2013-04-08 04:46
It appears that this is a consequence of the changes in issue 6972, in particular change 4d1948689ee1.
msg186285 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-04-08 10:03
Yes, it's my fault. Here is a patch (with a test) which fixes this regression in 2.7. This is a 2.7-only issue; in Python 3, arcnames are always unicode. Please test on Windows.
msg186326 - (view) Author: Vhati (Vhati) Date: 2013-04-08 18:32
The 2013-04-08 patch worked on Windows XP.
msg186444 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-04-09 19:01
Perhaps this would deserve a 2.7.5?
msg186490 - (view) Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2013-04-10 13:11
Yes; I won't have time for a few days, though.
msg186493 - (view) Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2013-04-10 13:16
I guess I will join with 3.2 and 3.3 for #17666.
msg186494 - (view) Author: Ned Deily (ned.deily) * (Python committer) Date: 2013-04-10 13:18
Perhaps we should hold off for a week or two to see if any other critical problems show up.
msg186496 - (view) Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2013-04-10 14:04
Yes, although the new releases will get the standard rc period anyway.
msg186660 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2013-04-12 18:23
A week's notice to push any almost ready IDLE bugfixes before the .rc's would be nice. (I am assuming there are some, but would have to ask Roger.)
msg186703 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2013-04-13 09:29
New changeset d02507c9f973 by Serhiy Storchaka in branch '2.7':
Issue #17656: Fix extraction of zip files with unicode member paths.
http://hg.python.org/cpython/rev/d02507c9f973
msg187430 - (view) Author: Kubilay Kocak (koobs) (Python triager) Date: 2013-04-20 14:38
heads-up: Tests are still failing on FreeBSD (gcc & clang) buildbots:

http://buildbot.python.org/all/builders/AMD64%20FreeBSD%209.0%20dtrace%202.7/builds/472/steps/test/logs/stdio
http://buildbot.python.org/all/builders/AMD64%20FreeBSD%209.0%20dtrace%2Bclang%202.7/builds/468/steps/test/logs/stdio
msg187453 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2013-04-20 20:05
It seems like file() can't handle unicode file names on FreeBSD. The FS encoding is 'US-ASCII' on Snakebite's FreeBSD box.

> /home/cpython/users/christian.heimes/2.7/Lib/zipfile.py(1078)_extract_member()
-> with self.open(member, pwd=pwd) as source, \
(Pdb) self.open(member, pwd=pwd)
<zipfile.ZipExtFile object at 0x801eb5fd0>
(Pdb) n
> /home/cpython/users/christian.heimes/2.7/Lib/zipfile.py(1079)_extract_member()
-> file(targetpath, "wb") as target:
(Pdb) file(targetpath, "wb")
*** UnicodeEncodeError: 'ascii' codec can't encode characters in position 47-48: ordinal not in range(128)
(Pdb) sys.getfilesystemencoding()
'US-ASCII'
msg187461 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-04-20 21:18
Here is a patch which skips test_extract_unicode_filenames if the platform has no Unicode filesystem semantics.
msg187474 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2013-04-20 22:37
I guess that test_extract_unicode_filenames_skip.patch will not fix the failing test. The test fails because u"\xf6.txt" cannot be encoded to sys.getfilesystemencoding() (which is ASCII on the FreeBSD buildbot). You should test u"\xf6.txt". You should move the try/except inside the function.
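
A minimal sketch of the kind of check suggested here, with the try/except kept inside the function. The helper name can_encode_filename and its optional encoding parameter are hypothetical and exist only to make the example easy to try; this is not the code that was committed.

```python
import sys


def can_encode_filename(name, encoding=None):
    """Return True if the unicode name can be encoded with the given encoding.

    Defaults to the filesystem encoding reported by the interpreter.
    """
    if encoding is None:
        encoding = sys.getfilesystemencoding() or 'ascii'
    try:
        name.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


print(can_encode_filename(u'\xf6.txt', 'ascii'))  # False
print(can_encode_filename(u'\xf6.txt', 'utf-8'))  # True
```

A test could call such a helper with the exact file name it is about to extract and skip itself when the result is False.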
msg188583 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2013-05-06 20:19
The test is still failing:

http://buildbot.python.org/all/builders/AMD64 OpenIndiana 2.7/builds/1670/steps/test/logs/stdio

"""
======================================================================
ERROR: test_extract_unicode_filenames (test.test_zipfile.TestsWithSourceFile)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/export/home/buildbot/64bits/2.7.cea-indiana-amd64/build/Lib/test/test_zipfile.py", line 436, in test_extract_unicode_filenames
    writtenfile = zipfp.extract(fname)
  File "/export/home/buildbot/64bits/2.7.cea-indiana-amd64/build/Lib/zipfile.py", line 1024, in extract
    return self._extract_member(member, path, pwd)
  File "/export/home/buildbot/64bits/2.7.cea-indiana-amd64/build/Lib/zipfile.py", line 1079, in _extract_member
    file(targetpath, "wb") as target:
UnicodeEncodeError: 'ascii' codec can't encode characters in position 85-86: ordinal not in range(128)

"""
msg188730 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2013-05-08 18:53
New changeset 8952fa2c475f by Serhiy Storchaka in branch '2.7':
Issue #17656: Skip test_extract_unicode_filenames if the FS encoding
http://hg.python.org/cpython/rev/8952fa2c475f
msg188731 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-05-08 18:54
Sorry, I thought I had corrected this test.
msg188767 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-05-09 12:18
Shouldn't it be left open until the regression fix release has been released?
msg188769 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-05-09 12:25
I don't think so. The bug is fixed, and the fix will be in the release.
msg188779 - (view) Author: Arfrever Frehtes Taifersar Arahesis (Arfrever) * (Python triager) Date: 2013-05-09 14:38
http://mail.python.org/pipermail/python-dev/2013-April/125761.html asked to leave bugs open.
msg188780 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-05-09 15:10
Ah, fair enough.
History
Date User Action Args
2022-04-11 14:57:44 admin set github: 61856
2013-05-27 16:07:10 laci112 set components: + Windows, - Library (Lib), Unicode
2013-05-13 00:06:27 benjamin.peterson set status: open -> closed
2013-05-09 15:10:01 pitrou set status: closed -> open
    messages: + msg188780
2013-05-09 14:38:35 Arfrever set messages: + msg188779
2013-05-09 12:25:52 pitrou set messages: + msg188769
2013-05-09 12:18:31 serhiy.storchaka set messages: + msg188767
2013-05-09 00:34:05 pitrou set status: open -> closed
2013-05-08 20:40:20 serhiy.storchaka set resolution: fixed
2013-05-08 18:54:39 serhiy.storchaka set messages: + msg188731
2013-05-08 18:53:24 python-dev set messages: + msg188730
2013-05-06 20:19:40 neologix set nosy: + neologix
    messages: + msg188583
2013-04-30 19:05:15 serhiy.storchaka set assignee: serhiy.storchaka
2013-04-20 22:37:58 vstinner set messages: + msg187474
2013-04-20 21:18:05 serhiy.storchaka set files: + test_extract_unicode_filenames_skip.patch
    messages: + msg187461
2013-04-20 20:05:23 christian.heimes set messages: + msg187453
2013-04-20 19:26:41 serhiy.storchaka set nosy: + vstinner
2013-04-20 14:38:57 koobs set nosy: + koobs
    messages: + msg187430
2013-04-13 16:47:24 serhiy.storchaka set stage: patch review -> resolved
2013-04-13 09:29:03 python-dev set messages: + msg186703
2013-04-12 18:23:41 terry.reedy set nosy: + terry.reedy
    messages: + msg186660
2013-04-10 16:44:24 gregory.p.smith set priority: high -> release blocker
2013-04-10 14:04:01 georg.brandl set messages: + msg186496
2013-04-10 13:18:52 ned.deily set messages: + msg186494
2013-04-10 13:16:59 georg.brandl set messages: + msg186493
2013-04-10 13:11:14 benjamin.peterson set messages: + msg186490
2013-04-09 19:01:10 pitrou set nosy: + pitrou
    messages: + msg186444
2013-04-09 11:54:00 christian.heimes set nosy: + christian.heimes
2013-04-08 18:32:16 Vhati set messages: + msg186326
2013-04-08 10:03:01 serhiy.storchaka set files: + zipfile_extract_unicode.patch
    priority: normal -> high
    components: + Library (Lib)
    versions: - Python 3.2, Python 3.3, Python 3.4
    keywords: + patch
    type: crash -> behavior
    messages: + msg186285
    stage: patch review
2013-04-08 04:59:13 gregory.p.smith set title: Python 2.7.4 Breaks ZipFile Extraction -> Python 2.7.4 breaks ZipFile extraction of zip files with unicode member paths
    versions: + Python 3.2, Python 3.3, Python 3.4
2013-04-08 04:46:01 loewis set nosy: + loewis, georg.brandl, gregory.p.smith, amaury.forgeotdarc, larry, schmir, benjamin.peterson, ned.deily, Arfrever, r.david.murray, twb, catalin.iacob, python-dev, serhiy.storchaka
    messages: + msg186275
2013-04-08 04:40:49 Vhati set files: + Kestrel Cruiser.zip
    messages: + msg186273
2013-04-08 03:37:23 Vhati set messages: + msg186265
2013-04-08 03:20:59 Vhati create