classification
Title: zipfile.ZipFile().extractall() header mismatch for non-ASCII characters
Type: behavior Stage:
Components: Library (Lib) Versions: Python 3.2
process
Status: closed Resolution: accepted
Dependencies: Superseder:
Assigned To: Nosy List: Arfrever, M..Z., amaury.forgeotdarc, eli.bendersky, georg.brandl, loewis, python-dev, r.david.murray, vstinner
Priority: normal Keywords: patch

Created on 2010-12-31 13:38 by M..Z., last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name | Uploaded | Description
bug_zipfile_extractall.zip | M..Z., 2010-12-31 13:38 | ZIP with three files that can reproduce the problem
zipfile.diff | loewis, 2010-12-31 14:25 |
issue10801_test.1.patch | eli.bendersky, 2011-01-01 07:56 |
Messages (18)
msg124964 - Author: M. Zilmer (M..Z.) Date: 2010-12-31 13:38
Trying to unpack a ZIP file in which some of the packed files have Danish letters in their names results in:

    zipfile.BadZipFile: File name in directory 'filename_with_æoå.txt'
    and header b'filename_with_\x91o\x86.txt' differ.

Using Py 3.2b2 on Win7.

Unpack the attached ZIP file and run the Python script it contains; the script demonstrates the problem using the two enclosed ZIP files.
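The bytes shown for the header are the CP437 encoding of the directory name: in code page 437, æ is 0x91 and å is 0x86, whereas UTF-8 uses two bytes for each of these letters. A quick check (an illustration only, not part of the attached script):

```python
name = "filename_with_æoå.txt"             # name as reported for the directory
header = b"filename_with_\x91o\x86.txt"    # raw bytes reported for the header

print(name.encode("cp437"))            # b'filename_with_\x91o\x86.txt'
print(name.encode("utf-8"))            # b'filename_with_\xc3\xa6o\xc3\xa5.txt'
print(header.decode("cp437") == name)  # True
```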
msg124966 - Author: Martin v. Löwis (loewis) * (Python committer) Date: 2010-12-31 14:25
The attached patch fixes it for me. No time to write tests right now.
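The thread does not reproduce zipfile.diff. The error message prints the header name as bytes, which suggests raw bytes were being compared with the directory name. The sketch below assumes the fix instead decodes the local-header name the same way a central-directory name is decoded (UTF-8 when general-purpose flag bit 0x800 is set, CP437 otherwise) and then compares text; `decode_zip_name` and `header_name_matches` are hypothetical names, not zipfile internals.

```python
UTF8_NAME_FLAG = 0x800  # general-purpose flag bit 11: file name is UTF-8


def decode_zip_name(raw_name: bytes, flag_bits: int) -> str:
    """Decode a raw ZIP file-name field according to its flag bits."""
    if flag_bits & UTF8_NAME_FLAG:
        return raw_name.decode("utf-8")
    # Without the UTF-8 flag, ZIP file names are stored in code page 437.
    return raw_name.decode("cp437")


def header_name_matches(local_raw_name: bytes, local_flag_bits: int,
                        directory_name: str) -> bool:
    """Compare the local header's name with the already decoded
    central-directory name as text rather than as raw bytes."""
    return decode_zip_name(local_raw_name, local_flag_bits) == directory_name


name = "filename_with_æoå.txt"
# CP437 header from the report, UTF-8 flag clear: matches once decoded.
print(header_name_matches(b"filename_with_\x91o\x86.txt", 0, name))     # True
# A UTF-8 header with flag 0x800 set matches as well.
print(header_name_matches(name.encode("utf-8"), UTF8_NAME_FLAG, name))  # True
```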
msg124978 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2010-12-31 21:34
FWIW, having just looked at related code in zipfile recently, this patch looks correct to me.
msg124991 - Author: Eli Bendersky (eli.bendersky) * (Python committer) Date: 2011-01-01 05:12
I'll try to produce a test in the next hour or two
msg124992 - Author: Eli Bendersky (eli.bendersky) * (Python committer) Date: 2011-01-01 07:54
I'm attaching a patch with a test for Martin's fix. I had trouble programmatically generating a "bad" zip for this bug, since it uses different encodings for the header and the file name (probably created by WinZip?). So I created a directory in test/, placed the problematic zip file that M.Z. submitted in there, and wrote an appropriate test in test_zipfile.py.

I verified the test fails on py3k trunk before Martin's fix, and succeeds after it, both by running the test file directly and through regrtest.

Note: Tested only on Ubuntu
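For reference, an archive with a CP437-encoded name can be assembled by hand with struct; current CPython's zipfile writer stores non-ASCII names as UTF-8 and sets flag bit 0x800, so it will not produce one directly. A minimal sketch, assuming a single stored (uncompressed) entry with no extra fields or comments; `build_cp437_zip` is a hypothetical helper, not part of test_zipfile.py:

```python
import io
import struct
import zipfile
import zlib


def build_cp437_zip(name: str, data: bytes) -> bytes:
    """Return a one-entry ZIP archive whose file name is stored as CP437.

    Flag bit 0x800 ("name is UTF-8") is left clear in both headers, so a
    reader is expected to decode the name bytes as CP437.
    """
    raw_name = name.encode("cp437")
    crc = zlib.crc32(data)
    dos_time = 0
    dos_date = (1 << 5) | 1  # 1980-01-01 in MS-DOS date format
    flags = 0                # bit 0x800 clear: name is not UTF-8
    method = 0               # 0 = stored (no compression)

    # Local file header (30 bytes), then the name, then the file data.
    local = struct.pack(
        "<4s5H3L2H", b"PK\x03\x04",
        20, flags, method, dos_time, dos_date,
        crc, len(data), len(data),
        len(raw_name), 0,
    ) + raw_name + data

    # Central directory header (46 bytes) pointing back at offset 0.
    central = struct.pack(
        "<4s6H3L5H2L", b"PK\x01\x02",
        20, 20, flags, method, dos_time, dos_date,
        crc, len(data), len(data),
        len(raw_name), 0, 0, 0, 0,
        0, 0,
    ) + raw_name

    # End of central directory record (22 bytes), no archive comment.
    end = struct.pack(
        "<4s4H2LH", b"PK\x05\x06",
        0, 0, 1, 1,
        len(central), len(local),
        0,
    )
    return local + central + end


blob = build_cp437_zip("filename_with_æoå.txt", b"hello")
with zipfile.ZipFile(io.BytesIO(blob)) as zf:
    print(zf.namelist())                       # ['filename_with_æoå.txt']
    print(zf.read("filename_with_æoå.txt"))    # b'hello'
```

Because both headers carry the same CP437 bytes with the UTF-8 flag clear, a reader that decodes the local-header name by its flag bits sees matching names.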
msg124994 - Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2011-01-01 10:09
Committed patch and test in r87604.
msg124995 - Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2011-01-01 10:35
OK, looks like there is a problem on some buildbots: 

http://www.python.org/dev/buildbot/all/builders/AMD64%20Gentoo%20Wide%203.x/builds/863/steps/test/logs/stdio
msg124998 - Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2011-01-01 12:16
OK, I think r87606 fixed it: the test no longer extracts the files and instead only calls open().
msg124999 - Author: Eli Bendersky (eli.bendersky) * (Python committer) Date: 2011-01-01 12:44
Georg, did you figure out the root cause of the problem on that buildbot? Since it fails in open(targetpath, "wb"), extracting the file might have failed if the bot had no write permission for the current directory, but an ASCII encoding error is not what I'd expect in that case.
msg125000 - Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2011-01-01 12:47
Well, it looks like the filesystem encoding is set to ASCII on these machines.
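A minimal sketch of both points, using hypothetical helper names: os.fsencode() shows whether a member name can be represented in the filesystem encoding at all (with an ASCII filesystem encoding, 'filename_with_æoå.txt' cannot), while reading members through ZipFile.open() streams data out of the archive without ever creating a file on disk, in the spirit of the r87606 change.

```python
import io
import os
import zipfile


def fs_can_represent(name: str) -> bool:
    """Return True if `name` can be encoded with the filesystem encoding.

    With an ASCII filesystem encoding, a name such as
    'filename_with_æoå.txt' cannot be encoded, so creating that file on
    disk (as extraction does) fails with an encoding error.
    """
    try:
        os.fsencode(name)
    except UnicodeEncodeError:
        return False
    return True


def read_members_in_memory(zip_bytes: bytes) -> dict:
    """Read every member via ZipFile.open() without creating any files.

    Member names are only used as keys inside the archive, so they never
    need to be encoded for the operating system.
    """
    contents = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        for name in zf.namelist():
            with zf.open(name) as member:
                contents[name] = member.read()
    return contents
```

A test built on the second helper behaves the same whatever the machine's filesystem encoding is.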
msg135838 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-05-12 14:35
Issue #12048 is a duplicate of this bug, but with Python 3.1. Should we backport the fix to Python 3.1?
msg136228 - Author: Roundup Robot (python-dev) (Python triager) Date: 2011-05-18 11:43
New changeset 1f0f0e317873 by Victor Stinner in branch '3.1':
Backport commit 33543b4e0e5d from Python 3.2: #10801: In zipfile, support
http://hg.python.org/cpython/rev/1f0f0e317873
msg136229 - Author: Roundup Robot (python-dev) (Python triager) Date: 2011-05-18 11:48
New changeset 243c78fbbb49 by Victor Stinner in branch '3.1':
Ooops, add the missing file of the backport of commit 33543b4e0e5d from Python
http://hg.python.org/cpython/rev/243c78fbbb49
msg136559 - Author: Arfrever Frehtes Taifersar Arahesis (Arfrever) * (Python triager) Date: 2011-05-22 18:31
These changes cause test failure on 3.1 branch when verbose mode is disabled:

# python3.1 -m test.regrtest test_zipfile
test_zipfile
test test_zipfile produced unexpected output:
**********************************************************************
*** line 2 of actual output doesn't appear in expected output after line 1:
+ /usr/lib64/python3.1/test/zip_cp437_header.zip
**********************************************************************
1 test failed:
    test_zipfile
msg136566 - Author: Roundup Robot (python-dev) (Python triager) Date: 2011-05-22 20:13
New changeset 9ef8fc5454cb by Victor Stinner in branch '3.1':
Issue #10801: Remove a debug print() from test_zipfile
http://hg.python.org/cpython/rev/9ef8fc5454cb
msg136567 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-05-22 20:13
> These changes cause test failure on 3.1 branch when verbose mode is disabled

What a shame! I committed a debug "print()":
    +        print(fname)

It should be fixed by my last commit.
msg136682 - Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2011-05-23 17:04
Victor: you should have a look at <http://bitbucket.org/birkenfeld/hgcodesmell>.
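hgcodesmell's own checks are not described in this thread. As a rough stand-in (not its actual implementation), a pre-commit check that scans only the added lines of a unified diff for leftover debugging statements might look like this; the pattern list is an assumption:

```python
import re

# Hypothetical patterns worth flagging in newly added lines.
SUSPICIOUS_PATTERNS = [
    re.compile(r"^\s*print\("),
    re.compile(r"\bpdb\.set_trace\(\)"),
    re.compile(r"^\s*import pdb\b"),
]


def find_smells(diff_text: str) -> list:
    """Return (diff line number, stripped text) for suspicious added lines.

    Only lines added by the diff ('+' but not the '+++' file header) are
    checked, so unchanged context lines are never reported.
    """
    hits = []
    for lineno, line in enumerate(diff_text.splitlines(), start=1):
        if not line.startswith("+") or line.startswith("+++"):
            continue
        added = line[1:]
        if any(p.search(added) for p in SUSPICIOUS_PATTERNS):
            hits.append((lineno, added.strip()))
    return hits
```

Run against the backport diff, a check like this would have reported the added `print(fname)` line before it was committed.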
msg138081 - Author: Roundup Robot (python-dev) (Python triager) Date: 2011-06-10 14:35
New changeset 33b7428e65b4 by Victor Stinner in branch '3.1':
Issue #10801: Fix test_unicode_filenames() of test_zipfile
http://hg.python.org/cpython/rev/33b7428e65b4
History
Date | User | Action | Args
2022-04-11 14:57:10 | admin | set | github: 55010
2011-06-10 14:35:35 | python-dev | set | messages: + msg138081
2011-05-23 17:04:30 | georg.brandl | set | messages: + msg136682
2011-05-22 20:13:48 | vstinner | set | messages: + msg136567
2011-05-22 20:13:12 | python-dev | set | messages: + msg136566
2011-05-22 18:31:36 | Arfrever | set | nosy: + Arfrever; messages: + msg136559
2011-05-18 11:48:43 | python-dev | set | messages: + msg136229
2011-05-18 11:43:29 | python-dev | set | nosy: + python-dev; messages: + msg136228
2011-05-12 14:35:47 | vstinner | set | messages: + msg135838
2011-01-01 12:47:39 | georg.brandl | set | nosy: loewis, georg.brandl, amaury.forgeotdarc, vstinner, r.david.murray, eli.bendersky, M..Z.; messages: + msg125000
2011-01-01 12:44:57 | eli.bendersky | set | nosy: loewis, georg.brandl, amaury.forgeotdarc, vstinner, r.david.murray, eli.bendersky, M..Z.; messages: + msg124999
2011-01-01 12:16:56 | georg.brandl | set | status: open -> closed; nosy: loewis, georg.brandl, amaury.forgeotdarc, vstinner, r.david.murray, eli.bendersky, M..Z.; messages: + msg124998
2011-01-01 10:35:58 | georg.brandl | set | status: closed -> open; nosy: loewis, georg.brandl, amaury.forgeotdarc, vstinner, r.david.murray, eli.bendersky, M..Z.; messages: + msg124995
2011-01-01 10:09:59 | georg.brandl | set | status: open -> closed; nosy: + georg.brandl; messages: + msg124994; resolution: accepted
2011-01-01 07:56:51 | eli.bendersky | set | files: + issue10801_test.1.patch; nosy: loewis, amaury.forgeotdarc, vstinner, r.david.murray, eli.bendersky, M..Z.
2011-01-01 07:56:35 | eli.bendersky | set | files: - issue10801_test.1.patch; nosy: loewis, amaury.forgeotdarc, vstinner, r.david.murray, eli.bendersky, M..Z.
2011-01-01 07:54:15 | eli.bendersky | set | files: + issue10801_test.1.patch; nosy: loewis, amaury.forgeotdarc, vstinner, r.david.murray, eli.bendersky, M..Z.; messages: + msg124992
2011-01-01 05:12:14 | eli.bendersky | set | nosy: + eli.bendersky; messages: + msg124991
2010-12-31 21:34:04 | r.david.murray | set | nosy: + r.david.murray; messages: + msg124978
2010-12-31 14:25:54 | loewis | set | files: + zipfile.diff; messages: + msg124966; keywords: + patch; nosy: loewis, amaury.forgeotdarc, vstinner, M..Z.
2010-12-31 14:03:42 | pitrou | set | nosy: + amaury.forgeotdarc, loewis, vstinner
2010-12-31 13:38:34 | M..Z. | create |