msg186264 - (view) |
Author: Vhati (Vhati) |
Date: 2013-04-08 03:20 |
Python 2.7.4 fails while extracting zip files when 'member' is a unicode path.
---
Traceback (most recent call last):
...
my_zip.extract(item, tmp_folder_path)
File "D:\Apps\Python274\lib\zipfile.py", line 1024, in extract
return self._extract_member(member, path, pwd)
File "D:\Apps\Python274\lib\zipfile.py", line 1057, in _extract_member
arcname = arcname.translate(table)
TypeError: character mapping must return integer, None or unicode
---
2.7.3 had no problems because the call to translate() is new.
The following, copied from zipfile.py, will recreate the error.
---
import string
illegal = ':<>|"?*'
table = string.maketrans(illegal, '_' * len(illegal))
arcname = "hi"
arcname = arcname.translate(table)
# ascii strings are fine
arcname = u"hi"
arcname = arcname.translate(table)
# unicode fails
# Traceback (most recent call last):
# File "<stdin>", line 1, in <module>
# TypeError: character mapping must return integer, None or unicode
---
I tried using unicode literals for the illegal string and maketrans underscore arg, but that didn't work. Suggestions?
Here's a link to the doc for translate().
http://docs.python.org/2/library/stdtypes.html#str.translate
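One possible direction, going by the linked doc: the unicode form of translate() wants a mapping keyed by code point rather than the 256-character table that string.maketrans() builds. Below is a minimal sketch under that assumption. `make_table` and `sanitize` are hypothetical helper names, not part of zipfile, and this is not necessarily what a fix in zipfile.py would look like. It is written for Python 3, where str is always unicode; on 2.7 the same dict would presumably go to unicode.translate(), with the maketrans() table kept for byte strings.

```python
ILLEGAL = ':<>|"?*'  # same characters as in the snippet above


def make_table(illegal=ILLEGAL, replacement='_'):
    """Build a translate() mapping from code point to code point."""
    return {ord(ch): ord(replacement) for ch in illegal}


def sanitize(arcname, table=None):
    """Replace characters that are illegal in Windows file names."""
    if table is None:
        table = make_table()
    return arcname.translate(table)


print(sanitize('a:b?c.txt'))
```

Characters outside the mapping, including non-ASCII ones, pass through unchanged.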
|
msg186265 - (view) |
Author: Vhati (Vhati) |
Date: 2013-04-08 03:37 |
Apparently namelist() can return either ascii or unicode strings for its members, depending on the archive. Obviously this'd apply to literal unicode strings as well.
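A rough way to tell the two kinds of archive apart, assuming the difference comes from the ZIP format's general-purpose flag bit 11 (0x800), which marks a member name as UTF-8. The sketch is written for Python 3 and uses an in-memory archive; `uses_utf8_name` is a hypothetical helper, not a zipfile function.

```python
import io
import zipfile

UTF8_NAME_FLAG = 0x800  # general-purpose bit 11: member name is UTF-8


def uses_utf8_name(info):
    """Return True if the member's name is flagged as UTF-8."""
    return bool(info.flag_bits & UTF8_NAME_FLAG)


# Build a small archive in memory with one ASCII and one non-ASCII name.
buf = io.BytesIO()
with zipfile.ZipFile(buf, 'w') as zf:
    zf.writestr('plain.txt', b'ascii name')
    zf.writestr('\xf6.txt', b'non-ascii name')

# Read it back and record which members carry the flag.
with zipfile.ZipFile(buf) as zf:
    flags = {info.filename: uses_utf8_name(info) for info in zf.infolist()}
```

Only the member with the non-ASCII name should end up flagged.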
|
msg186273 - (view) |
Author: Vhati (Vhati) |
Date: 2013-04-08 04:40 |
Oops, passing a unicode literal to extract()'s member arg wouldn't be sufficient.
The extract() method quietly converts strings to ZipInfo objects via getinfo(member_string). Then _extract_member() takes the filename attribute of that ZipInfo object, which causes problems when THAT is unicode.
So I guess this bug only applies to archives with unicode member paths.
Attached is one such file to aid in troubleshooting.
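To illustrate the lookup described above, here is a small sketch (Python 3, in-memory archive, made-up member name) showing that the string handed to getinfo() and the filename attribute of the resulting ZipInfo are the same value. It only mirrors the string-to-ZipInfo step, it does not extract anything.

```python
import io
import zipfile

# Build an in-memory archive with a single non-ASCII member path.
buf = io.BytesIO()
with zipfile.ZipFile(buf, 'w') as zf:
    zf.writestr('dir/\xf6.txt', b'data')

with zipfile.ZipFile(buf) as zf:
    name = zf.namelist()[0]
    info = zf.getinfo(name)  # the conversion extract() applies to a string
    arcname = info.filename  # the value the extraction code then works on
```

So whatever type the archive's stored name has, that is what reaches the translate() call, regardless of what the caller passed in.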
|
msg186275 - (view) |
Author: Martin v. Löwis (loewis) *  |
Date: 2013-04-08 04:46 |
It appears that this is a consequence of the changes in issue 6972, in particular change 4d1948689ee1.
|
msg186285 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2013-04-08 10:03 |
Yes, it's my fault. Here is a patch (with test) which fixes this regression in 2.7. This is a 2.7-only issue; in Python 3 arcnames are always unicode. Please test on Windows.
|
msg186326 - (view) |
Author: Vhati (Vhati) |
Date: 2013-04-08 18:32 |
The 2013-04-08 patch worked on Windows XP.
|
msg186444 - (view) |
Author: Antoine Pitrou (pitrou) *  |
Date: 2013-04-09 19:01 |
Perhaps this would deserve a 2.7.5?
|
msg186490 - (view) |
Author: Benjamin Peterson (benjamin.peterson) *  |
Date: 2013-04-10 13:11 |
Yes; I won't have time for a few days, though.
|
msg186493 - (view) |
Author: Georg Brandl (georg.brandl) *  |
Date: 2013-04-10 13:16 |
I guess I will join with 3.2 and 3.3 for #17666.
|
msg186494 - (view) |
Author: Ned Deily (ned.deily) *  |
Date: 2013-04-10 13:18 |
Perhaps we should hold off for a week or two to see if any other critical problems show up.
|
msg186496 - (view) |
Author: Georg Brandl (georg.brandl) *  |
Date: 2013-04-10 14:04 |
Yes, although the new releases will get the standard rc period anyway.
|
msg186660 - (view) |
Author: Terry J. Reedy (terry.reedy) *  |
Date: 2013-04-12 18:23 |
A week's notice to push any almost ready IDLE bugfixes before the .rc's would be nice. (I am assuming there are some, but would have to ask Roger.)
|
msg186703 - (view) |
Author: Roundup Robot (python-dev)  |
Date: 2013-04-13 09:29 |
New changeset d02507c9f973 by Serhiy Storchaka in branch '2.7':
Issue #17656: Fix extraction of zip files with unicode member paths.
http://hg.python.org/cpython/rev/d02507c9f973
|
msg187430 - (view) |
Author: Kubilay Kocak (koobs)  |
Date: 2013-04-20 14:38 |
heads-up: Tests are still failing on FreeBSD (gcc & clang) buildbots:
http://buildbot.python.org/all/builders/AMD64%20FreeBSD%209.0%20dtrace%202.7/builds/472/steps/test/logs/stdio
http://buildbot.python.org/all/builders/AMD64%20FreeBSD%209.0%20dtrace%2Bclang%202.7/builds/468/steps/test/logs/stdio
|
msg187453 - (view) |
Author: Christian Heimes (christian.heimes) *  |
Date: 2013-04-20 20:05 |
It seems like file() can't handle unicode file names on FreeBSD. The FS encoding is 'US-ASCII' on Snakebite's FreeBSD box.
> /home/cpython/users/christian.heimes/2.7/Lib/zipfile.py(1078)_extract_member()
-> with self.open(member, pwd=pwd) as source, \
(Pdb) self.open(member, pwd=pwd)
<zipfile.ZipExtFile object at 0x801eb5fd0>
(Pdb) n
> /home/cpython/users/christian.heimes/2.7/Lib/zipfile.py(1079)_extract_member()
-> file(targetpath, "wb") as target:
(Pdb) file(targetpath, "wb")
*** UnicodeEncodeError: 'ascii' codec can't encode characters in position 47-48: ordinal not in range(128)
(Pdb) sys.getfilesystemencoding()
'US-ASCII'
|
msg187461 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2013-04-20 21:18 |
Here is a patch which skips test_extract_unicode_filenames if the platform has no Unicode filesystem semantics.
|
msg187474 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2013-04-20 22:37 |
I guess that test_extract_unicode_filenames_skip.patch will not fix the failing test. The test fails because u"\xf6.txt" cannot be encoded to sys.getfilesystemencoding() (which is ASCII on the FreeBSD buildbot). You should test u"\xf6.txt". You should move the try/except inside the function.
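A sketch of the kind of check meant here, assuming the goal is to skip the test when the specific file name cannot be encoded to the filesystem encoding. `fs_can_encode` and the test class are hypothetical and written for Python 3; this is not the code from the patch.

```python
import sys
import unittest

FNAME = '\xf6.txt'  # the name used by the failing test


def fs_can_encode(name, encoding=None):
    """Return True if `name` encodes cleanly to the given encoding.

    Defaults to sys.getfilesystemencoding(); the try/except lives
    inside the function so callers get a plain boolean.
    """
    if encoding is None:
        encoding = sys.getfilesystemencoding()
    try:
        name.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


class ExtractUnicodeSketch(unittest.TestCase):
    @unittest.skipUnless(fs_can_encode(FNAME),
                         'file name not encodable to the FS encoding')
    def test_name_is_encodable(self):
        self.assertTrue(fs_can_encode(FNAME))
```

With an ASCII filesystem encoding, as on the FreeBSD buildbot, the check returns False for this name and the test is skipped instead of erroring.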
|
msg188583 - (view) |
Author: Charles-François Natali (neologix) *  |
Date: 2013-05-06 20:19 |
The test is still failing:
http://buildbot.python.org/all/builders/AMD64%20OpenIndiana%202.7/builds/1670/steps/test/logs/stdio
"""
======================================================================
ERROR: test_extract_unicode_filenames (test.test_zipfile.TestsWithSourceFile)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/export/home/buildbot/64bits/2.7.cea-indiana-amd64/build/Lib/test/test_zipfile.py", line 436, in test_extract_unicode_filenames
writtenfile = zipfp.extract(fname)
File "/export/home/buildbot/64bits/2.7.cea-indiana-amd64/build/Lib/zipfile.py", line 1024, in extract
return self._extract_member(member, path, pwd)
File "/export/home/buildbot/64bits/2.7.cea-indiana-amd64/build/Lib/zipfile.py", line 1079, in _extract_member
file(targetpath, "wb") as target:
UnicodeEncodeError: 'ascii' codec can't encode characters in position 85-86: ordinal not in range(128)
"""
|
msg188730 - (view) |
Author: Roundup Robot (python-dev)  |
Date: 2013-05-08 18:53 |
New changeset 8952fa2c475f by Serhiy Storchaka in branch '2.7':
Issue #17656: Skip test_extract_unicode_filenames if the FS encoding
http://hg.python.org/cpython/rev/8952fa2c475f
|
msg188731 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2013-05-08 18:54 |
Sorry, I thought I had corrected this test.
|
msg188767 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2013-05-09 12:18 |
Shouldn't it be left open until the regression fix release is out?
|
msg188769 - (view) |
Author: Antoine Pitrou (pitrou) *  |
Date: 2013-05-09 12:25 |
I don't think so. The bug is fixed, and the fix will be in the release.
|
msg188779 - (view) |
Author: Arfrever Frehtes Taifersar Arahesis (Arfrever) *  |
Date: 2013-05-09 14:38 |
http://mail.python.org/pipermail/python-dev/2013-April/125761.html asked to leave bugs open.
|
msg188780 - (view) |
Author: Antoine Pitrou (pitrou) *  |
Date: 2013-05-09 15:10 |
Ah, fair enough.
|
|
Date | User | Action | Args |
2022-04-11 14:57:44 | admin | set | github: 61856 |
2013-05-27 16:07:10 | laci112 | set | components: + Windows, - Library (Lib), Unicode |
2013-05-13 00:06:27 | benjamin.peterson | set | status: open -> closed |
2013-05-09 15:10:01 | pitrou | set | status: closed -> open; messages: + msg188780 |
2013-05-09 14:38:35 | Arfrever | set | messages: + msg188779 |
2013-05-09 12:25:52 | pitrou | set | messages: + msg188769 |
2013-05-09 12:18:31 | serhiy.storchaka | set | messages: + msg188767 |
2013-05-09 00:34:05 | pitrou | set | status: open -> closed |
2013-05-08 20:40:20 | serhiy.storchaka | set | resolution: fixed |
2013-05-08 18:54:39 | serhiy.storchaka | set | messages: + msg188731 |
2013-05-08 18:53:24 | python-dev | set | messages: + msg188730 |
2013-05-06 20:19:40 | neologix | set | nosy: + neologix; messages: + msg188583 |
2013-04-30 19:05:15 | serhiy.storchaka | set | assignee: serhiy.storchaka |
2013-04-20 22:37:58 | vstinner | set | messages: + msg187474 |
2013-04-20 21:18:05 | serhiy.storchaka | set | files: + test_extract_unicode_filenames_skip.patch; messages: + msg187461 |
2013-04-20 20:05:23 | christian.heimes | set | messages: + msg187453 |
2013-04-20 19:26:41 | serhiy.storchaka | set | nosy: + vstinner |
2013-04-20 14:38:57 | koobs | set | nosy: + koobs; messages: + msg187430 |
2013-04-13 16:47:24 | serhiy.storchaka | set | stage: patch review -> resolved |
2013-04-13 09:29:03 | python-dev | set | messages: + msg186703 |
2013-04-12 18:23:41 | terry.reedy | set | nosy: + terry.reedy; messages: + msg186660 |
2013-04-10 16:44:24 | gregory.p.smith | set | priority: high -> release blocker |
2013-04-10 14:04:01 | georg.brandl | set | messages: + msg186496 |
2013-04-10 13:18:52 | ned.deily | set | messages: + msg186494 |
2013-04-10 13:16:59 | georg.brandl | set | messages: + msg186493 |
2013-04-10 13:11:14 | benjamin.peterson | set | messages: + msg186490 |
2013-04-09 19:01:10 | pitrou | set | nosy: + pitrou; messages: + msg186444 |
2013-04-09 11:54:00 | christian.heimes | set | nosy: + christian.heimes |
2013-04-08 18:32:16 | Vhati | set | messages: + msg186326 |
2013-04-08 10:03:01 | serhiy.storchaka | set | files: + zipfile_extract_unicode.patch; priority: normal -> high; components: + Library (Lib); versions: - Python 3.2, Python 3.3, Python 3.4; keywords: + patch; type: crash -> behavior; messages: + msg186285; stage: patch review |
2013-04-08 04:59:13 | gregory.p.smith | set | title: Python 2.7.4 Breaks ZipFile Extraction -> Python 2.7.4 breaks ZipFile extraction of zip files with unicode member paths; versions: + Python 3.2, Python 3.3, Python 3.4 |
2013-04-08 04:46:01 | loewis | set | nosy: + loewis, georg.brandl, gregory.p.smith, amaury.forgeotdarc, larry, schmir, benjamin.peterson, ned.deily, Arfrever, r.david.murray, twb, catalin.iacob, python-dev, serhiy.storchaka; messages: + msg186275 |
2013-04-08 04:40:49 | Vhati | set | files: + Kestrel Cruiser.zip; messages: + msg186273 |
2013-04-08 03:37:23 | Vhati | set | messages: + msg186265 |
2013-04-08 03:20:59 | Vhati | create | |