classification
Title: zipfile can't extract file
Type: behavior
Stage: patch review
Components: Library (Lib), Windows
Versions: Python 3.7, Python 3.6, Python 3.4, Python 3.5, Python 2.7
process
Status: open
Resolution:
Dependencies:
Superseder:
Assigned To:
Nosy List: Jim.Jewett, MorganRamsay, NewerCookie, Sean Goodwin, amaury.forgeotdarc, apolkosnik, berker.peksag, chuck, francismb, georg.brandl, gregory.p.smith, ncoghlan, ronaldoussoren, serhiy.storchaka, terry.reedy
Priority: normal
Keywords: patch

Created on 2009-09-04 19:56 by NewerCookie, last changed 2022-04-11 14:56 by admin.

Files
File name Uploaded Description
test.zip NewerCookie, 2009-09-04 19:56 mildly corrupt zipfile to test error handling
zlib_forward_slash.patch chuck, 2009-09-19 17:01 review
zipfile_276_filename_mismatch_v2.patch apolkosnik, 2014-04-30 16:10 patch with warnings against 2.7.6
zipfile_340_filename_mismatch_v3.patch apolkosnik, 2014-04-30 18:28 patch with warnings against 3.4.0
zipfile_276_filename_mismatch_v3.patch apolkosnik, 2014-04-30 19:11 patch with print against 2.7.6
Pull Requests
URL Status Linked
PR 14212 open python-dev, 2019-06-18 21:36
Messages (61)
msg92265 - (view) Author: Kim Kyung Don (NewerCookie) Date: 2009-09-04 19:57
The following exception occurred when I tried to extract on Windows.

"zipfile.BadZipfile: File name in directory "test\test2.txt" and header
"test/test2.txt" differ."

It seems to be a problem with the slashes.
I tested using zipfile Revision 72893.
msg92297 - (view) Author: Kim Kyung Don (NewerCookie) Date: 2009-09-06 04:02
P.S
I tested extraction by using 7-zip.
It works fine.
msg92309 - (view) Author: Ronald Oussoren (ronaldoussoren) * (Python committer) Date: 2009-09-06 12:40
The zipfile is technically incorrect: the zipfile specification prescribes 
that all filenames use '/' as the directory separator.

Even without that caveat the file is corrupt because the zipfile directory 
header and the per-file header don't agree on the name of the file.

That said: IMHO the current code in zipfile.ZipFile.open is too strict, it 
shouldn't raise an error when the two names aren't exactly the same 
because there are valid reasons for them to be different (such as renaming 
a file in the zipfile without rewriting the entire zipfile).
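
For reference, the mismatch described above can be reproduced without the attached test.zip. The following is a minimal sketch rather than code from this issue: `make_mismatched_zip` and `try_read` are hypothetical helper names, and the archive is built in memory by rewriting only the central directory entry so that it uses a backslash while the per-file header keeps the forward slash.

```python
import io
import zipfile


def make_mismatched_zip():
    """Build an in-memory archive whose two copies of the member name differ."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("test/test2.txt", b"hello")
    data = bytearray(buf.getvalue())
    # The name is stored twice: first in the per-file (local) header, then in
    # the central directory near the end. Patch only the last occurrence.
    old, new = b"test/test2.txt", b"test\\test2.txt"
    pos = data.rfind(old)
    data[pos:pos + len(old)] = new
    return bytes(data)


def try_read(data):
    """Return ("ok", payload) or ("error", message) for the first member."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        info = zf.infolist()[0]
        try:
            return "ok", zf.read(info)
        except zipfile.BadZipFile as exc:
            return "error", str(exc)


status, detail = try_read(make_mismatched_zip())
print(status, detail)
```

On an interpreter that still performs the strict comparison, this is expected to report the "differ" error; an interpreter without the check would return the payload instead.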
msg92326 - (view) Author: Alan McIntyre (alanmcintyre) * (Python committer) Date: 2009-09-06 18:58
FileRoller doesn't complain about the mismatched slashes either.  Where
did the ZIP come from, by the way?  I seem to recall that there have
been other instances in which ZIP applications were more "forgiving"
than the zipfile module.  How far should zipfile go in bending the
interpretation of the ZIP standard?  

As far as the renaming goes, it seems the standard says the header name
should be used if the two names are different.  If nobody else has time
to make a patch and tests I can take a stab at it in the next few days.
msg92330 - (view) Author: Ronald Oussoren (ronaldoussoren) * (Python committer) Date: 2009-09-06 20:41
alan: I don't quite understand which filename you want to use when the 
name in the per-file header and the central directory don't match. 

Where in the standard is this prescribed? I couldn't find anything in 
the PKWare zipfile appnote [1]

My preference would be to use the central directory as the canonical 
value because scanning the entire zipfile to read the per-file header 
would give a significant overhead. This might not be very noticeable with 
small zipfiles, but I regularly use zipfiles with over 100K files in 
them; in those files a scan of the zipfile is prohibitively expensive.

Furthermore, when the two are different the most reasonable explanation 
is that an in-place edit of the zipfile changed the directory without 
rewriting the entire zipfile (just like you can "delete" files from a 
zipfile by dumping them from the directory rather than completely 
rewriting the entire archive).

[1] 
APPNOTE.TXT - .ZIP File Format Specification Version: 6.3.2 
Revised: September 28, 2007 
Copyright (c) 1989 - 2007 PKWARE Inc., All Rights Reserved.
msg92335 - (view) Author: Alan McIntyre (alanmcintyre) * (Python committer) Date: 2009-09-06 21:26
Sorry about the confusion--I think I confused myself by looking at the
bit about CRC checksums in the "Info-ZIP Unicode Path Extra Field"
section before I posted.  I meant to say that the central directory name
looks preferred over the per-file header.

In section J, under "file name (Variable)" there's a bit that says:

"If input came from standard input, there is no file name field.  If
encrypting the central directory and general purpose bit flag 13 is set
indicating masking, the file name stored in the Local Header will not be
the actual file name.  A masking value consisting of a unique
hexadecimal value will be stored."

So in these cases the central directory name has to be used.  And, as
you pointed out, some operations like "deleting" a member from the
archive are implemented by editing the central directory, so it would
seem that the central directory should be used if there's a conflict.
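
To see which members of an archive carry two different names, the per-file header can be read directly and compared with the central directory entry. This is only a diagnostic sketch: `local_header_name` and `find_mismatches` are hypothetical names, the 30-byte layout is the fixed part of the local file header as described in APPNOTE.TXT, and the sample archive is patched in memory to stand in for test.zip.

```python
import io
import struct
import zipfile

# Fixed-size part of a local file header: signature, five 16-bit fields
# (version, flags, method, time, date), three 32-bit fields (CRC and the
# two sizes), then the lengths of the name and of the extra field.
LOCAL_HEADER = struct.Struct("<4s5H3L2H")


def local_header_name(fp, info):
    """Read the raw member name stored in the per-file header of `info`."""
    fp.seek(info.header_offset)
    fields = LOCAL_HEADER.unpack(fp.read(LOCAL_HEADER.size))
    if fields[0] != b"PK\x03\x04":
        raise ValueError("no local file header at offset %d" % info.header_offset)
    return fp.read(fields[9])  # fields[9] is the file name length


def find_mismatches(fp):
    """Return (directory_name, header_name) pairs that are not identical."""
    mismatches = []
    with zipfile.ZipFile(fp) as zf:
        for info in zf.infolist():
            raw = local_header_name(fp, info)
            # General purpose bit 11 (0x800) marks a UTF-8 encoded name.
            encoding = "utf-8" if info.flag_bits & 0x800 else "cp437"
            header_name = raw.decode(encoding)
            if header_name != info.orig_filename:
                mismatches.append((info.orig_filename, header_name))
    return mismatches


# Build a two-member archive, then rewrite one central directory name.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("test/test2.txt", b"hello")
    zf.writestr("test1.txt", b"world")
data = bytearray(buf.getvalue())
old, new = b"test/test2.txt", b"test\\test2.txt"
pos = data.rfind(old)  # the last occurrence is the central directory entry
data[pos:pos + len(old)] = new

result = find_mismatches(io.BytesIO(bytes(data)))
print(result)
```

Listing the members only touches the central directory; the per-file headers are read here purely for comparison, which is the extra cost discussed earlier for very large archives.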
msg92516 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2009-09-11 18:57
In the case at issue, the file name is the same (contrary to the error
message). The two representations of the *path* are different, but
equivalent. There is no ambiguity: the file should be put in directory
'test' and named 'test2.txt'. So I think zipfile should do what 7zip
does and do just that.

An actual filename difference might be argued differently.
msg92874 - (view) Author: Jan (chuck) * Date: 2009-09-19 17:01
I added a patch to replace back slashes by forward slashes in three 
places, only one of them actually relevant to the errors in the attached 
.zip file.

I kept the exception for mismatching filenames, but if you think it is 
appropriate to remove it I could do that as well.
msg116384 - (view) Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2010-09-14 11:26
I agree with the change, but the code should be factored out into a function (normalize_filename, for example).
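
A helper of the kind suggested here might look like the sketch below. It is not the content of zlib_forward_slash.patch: `normalize_filename` takes its name from the suggestion above, `names_match` is a hypothetical companion, and both assume that only the slash direction should be ignored.

```python
def normalize_filename(name):
    """Return `name` with backslashes replaced by forward slashes."""
    return name.replace("\\", "/")


def names_match(directory_name, header_name):
    """Compare two stored member names, ignoring the slash direction only."""
    return normalize_filename(directory_name) == normalize_filename(header_name)


same_path = names_match("test\\test2.txt", "test/test2.txt")
with_drive = names_match("Windows\\TEMP\\test.tmp", "C:\\Windows\\TEMP\\test.tmp")
print(same_path, with_drive)
```

With this definition a drive-letter prefix still counts as a real difference, so the second comparison does not match.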
msg116385 - (view) Author: Ronald Oussoren (ronaldoussoren) * (Python committer) Date: 2010-09-14 11:35
I'd prefer if the code no longer checked if the filename in the directory matches the name in the per-file header.

The reason for that is that the two don't have to match: it is relatively cheap to rename a file in the zipfile by rewriting the directory while rewriting the entire zipfile can be pretty expensive when zipfiles get large.

It's probably worthwhile to test what other zipfile tools do in this respect (e.g., create a zipfile where the filename in the header doesn't match the name in the directory and extract that zip using a number of popular tools).


(I have a slightly odd perspective on this because I regularly deal with zipfiles containing over 100K files and over 10GByte of data).
msg200165 - (view) Author: Adam Polkosnik (apolkosnik) * Date: 2013-10-17 20:42
I got bitten by a different variation of this bug.

In my case the issue can be summarized by:
zipfile.BadZipfile: File name in directory "Windows\TEMP\test.tmp" and header "C:\Windows\TEMP\test.tmp" differ.

Attached is a patch for Python27/lib/zipfile.py. I understand that it might not be the best approach, but at least we just compare the filenames without caring much about those pesky paths preceding them.
msg201842 - (view) Author: Adam Polkosnik (apolkosnik) * Date: 2013-10-31 18:56
Just tested my patch on mac, and it appears that it didn't work on OSX (and likely on other unix platforms too).

Conclusion... os.path.basename() will not do anything to windows paths when running on unix.

I'm thinking that instead of bailing at 'File name in directory "%s" and header "%s" differ.', the library should just print a warning, and continue.
msg208970 - (view) Author: Adam Polkosnik (apolkosnik) * Date: 2014-01-23 17:59
I'm in a similar situation, my test file raises this:

File name in directory "windows\TEMP\\test123.txt" and header "C:\windows\TEMP\\test123.txt" differ.

It turns out that I can't find any cross-platform procedure for processing paths from a different platform in a portable way; neither os.path.split() nor os.path.basename() will touch Windows paths on un*x, etc...

So, I'd like to propose an easy way: just allow the process to extract the files (and print a warning message) rather than raising an exception (raise BadZipfile,...) and stopping the extraction altogether.
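
As a side note on the portability point: `os.path` is an alias for either `posixpath` or `ntpath`, and both modules can be imported explicitly on any platform. The sketch below uses the path from the error message above (written with single backslashes, which is an assumption about the stored name) to show the difference; `PureWindowsPath` from pathlib requires Python 3.4 or newer.

```python
import ntpath
import posixpath
from pathlib import PureWindowsPath

header_name = r"C:\windows\TEMP\test123.txt"
directory_name = r"windows\TEMP\test123.txt"

# posixpath only knows "/" as a separator, so the whole string is returned.
posix_base = posixpath.basename(header_name)
# ntpath applies Windows rules regardless of the host operating system.
nt_base = ntpath.basename(header_name)
# splitdrive separates the drive letter from the rest of the path.
drive, rest = ntpath.splitdrive(header_name)
# PureWindowsPath offers the same parsing without touching the file system.
pure_name = PureWindowsPath(directory_name).name

print(posix_base)
print(nt_base, drive, rest, pure_name)
```

Whether zipfile should apply Windows path rules to stored names at all is a separate question; this only shows that the parsing itself does not depend on the host platform.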
msg208973 - (view) Author: Adam Polkosnik (apolkosnik) * Date: 2014-01-23 18:20
This one has the parentheses for print, so that it works in python 3.x. Also, the default fallback behavior in this case is to use the filename from the zip's directory (the first path in the warning).
msg208975 - (view) Author: Ronald Oussoren (ronaldoussoren) * (Python committer) Date: 2014-01-23 18:34
As I wrote in msg116385 I'd prefer to drop the consistency check completely because updating data like the filename in the central directory is a cheap way to rename files without completely rewriting the zip file.
msg208982 - (view) Author: Adam Polkosnik (apolkosnik) * Date: 2014-01-23 20:04
Can we get this simple "fix" implemented in time for the next 2.7.x release?!

Thank you!
msg208983 - (view) Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2014-01-23 20:09
print() is not a good way to emit the warning; please use the warnings module.
msg208985 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-01-23 20:12
> As I wrote in msg116385 I'd prefer to drop the consistency check completely
> because updating data like the filename in the central directory is a cheap
> way to rename files without completely rewriting the zip file.

It should at least be left as a debugging print.

It can't be a warning, because it depends not on user's actions, but on 
external data. But user still should be able to investigate uncommon zipfiles 
by setting the debug attribute.
msg208987 - (view) Author: Adam Polkosnik (apolkosnik) * Date: 2014-01-23 20:22
Excellent, please see my third attempt.
msg209023 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2014-01-23 23:52
Adam this is not a security issue (2.6, 3.1, 3.2), nor a future issue that must wait for 3.5.
msg211562 - (view) Author: Adam Polkosnik (apolkosnik) * Date: 2014-02-18 21:54
It might not be a regular "security" issue, but it is not extracting some files that it should. There's a possible scenario, where it can be a security issue.
msg217533 - (view) Author: Adam Polkosnik (apolkosnik) * Date: 2014-04-29 17:58
Gentlemen,

Is there any way this fix can be included in any version?
Currently, the fact that the exception is thrown makes extracting some zip files impossible with this library, and rolling your own is a bit painful (either using a wrapper around 7zip to handle those, or providing cloned/patched versions for every major Python version).

This ridiculous behavior is really not consistent with other ZIP implementations (7zip just ignores the mismatch).

Thank you for your time and effort.
msg217546 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2014-04-29 20:41
Adam P, please don't screw around with the version headers. If you want to claim that this is a security issue of the type we care about (threats to the public internet) for patching old releases, and severe enough that we should do anything about it, send a detailed explanation with links to evidence to the security response team. Simply writing 'a possible scenario' is insufficient.
msg217551 - (view) Author: Adam Polkosnik (apolkosnik) * Date: 2014-04-29 21:08
For the version headers, I've added the versions featuring the broken behavior. That's all.

I'm not saying that this is 

I'm extracting malware from the Central Quarantine files, and the vendor's implementation is broken and is causing this issue for me on every single file inside the archive.

Let's say, I've got a wrapper script that feeds the contents of a zip file to be scanned with this, because of this behavior, the wrapper will error out... Customers will say your product sucks, etc.

Does this really take an act of god to fix this?
msg217554 - (view) Author: Adam Polkosnik (apolkosnik) * Date: 2014-04-29 21:15
Also, this behavior is present on all platforms and all versions of Python (zipfile Library), so maybe the headers should be adjusted there too.

I'm not saying that this is necessarily a big freaking hole, but by using this, one can prevent files from being extracted using this simple trick.
msg217556 - (view) Author: Adam Polkosnik (apolkosnik) * Date: 2014-04-29 21:21
If I got a file scanner in my mail gateway implemented with this, one can easily avoid getting the contents of zip-files scanned. Is that enough of a security impact?
msg217558 - (view) Author: Adam Polkosnik (apolkosnik) * Date: 2014-04-29 21:42
I've also tested with WinZip, and Windows Explorer, on windows. Both extract the contents of test.zip without a warning (just like 7zip on Windows did). This behavior counts as Denial Of Service if the zipfile Library is used to extract files, besides lots of formats use ZIP as an envelope; DOCX, APK, JAR, EPUB come to mind.
msg217561 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2014-04-29 23:12
Adam P. I politely asked you to leave the headers alone. Since you ignored that, LEAVE THE HEADERS ALONE! If you continue, you will eventually get banned. 

An issue gets dealt with when a volunteer core developer makes it his top priority. In the past 24 hours, patches were pushed for 16 different issues, but not this one. Sorry, but that is how it is.
msg217569 - (view) Author: Adam Polkosnik (apolkosnik) * Date: 2014-04-30 04:16
Terry, I apologize about the second change of headers, somehow I must have used the submission form to post the comment from a tab that had the old content, and the headers didn't refresh there. I assure you that it was not my intention to change them again.
msg217570 - (view) Author: Adam Polkosnik (apolkosnik) * Date: 2014-04-30 04:23
In any event, I think that zipfile_stupid3.patch would be the best trivial fix to this issue.
msg217571 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2014-04-30 04:34
The check can be simplified further to "if self.debug and fname != zinfo.orig_filename:", but the conversion to a debugging print seems reasonable to me.
msg217572 - (view) Author: Adam Polkosnik (apolkosnik) * Date: 2014-04-30 05:16
Patch against 2.7.6 attached.
msg217573 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2014-04-30 05:19
Nick, do you agree that this should be treated as a bug (apply to all 3 versions)?
Should debug messages be 'print'ed, sent to stderr, or go through the warnings module?
msg217574 - (view) Author: Adam Polkosnik (apolkosnik) * Date: 2014-04-30 05:23
Patch against zipfile 3.4.0 attached.
msg217575 - (view) Author: Adam Polkosnik (apolkosnik) * Date: 2014-04-30 05:27
update
msg217576 - (view) Author: Adam Polkosnik (apolkosnik) * Date: 2014-04-30 05:30
Once again patch against 2.7.6
msg217578 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2014-04-30 07:23
Don't use print (to stdout) or sys.stderr directly.  There are already many other uses of warnings.warn within the zipfile module.  Be consistent with those.

Existing zipfile warnings seem to favor lazily importing warnings when it's needed rather than a top level 'import warnings'.  While I find that annoying, there are sometimes reasons to do it and the minimally invasive change that is consistent with the rest of the existing code is to do the same thing here.

something similar to:

+            if self.debug and fname != zinfo.orig_filename:
+                import warnings
+                warnings.warn(
+                        'Warning: Filename in directory "%s" and header "%s" differ.' % (
+                            zinfo.orig_filename, fname))
msg217616 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2014-04-30 12:54
As Greg suggested, the important thing is to follow the precedent set by
other debug messages in the module.
msg217624 - (view) Author: Adam Polkosnik (apolkosnik) * Date: 2014-04-30 15:51
Attached is a patch with warnings against 2.7.6
msg217625 - (view) Author: Adam Polkosnik (apolkosnik) * Date: 2014-04-30 15:52
Attached is a patch with warnings against 3.4.0
msg217627 - (view) Author: Adam Polkosnik (apolkosnik) * Date: 2014-04-30 16:10
Attached is a patch with warnings against 2.7.6 (this one should be good to go)
msg217634 - (view) Author: Berker Peksag (berker.peksag) * (Python committer) Date: 2014-04-30 18:01
--- a/zipfile.py	Wed Apr 30 11:27:16 2014
+++ b/zipfile.py	Wed Apr 30 11:27:01 2014
@@ -1174,8 +1174,9 @@
             else:
                 fname_str = fname.decode("cp437")
 
-            if fname_str != zinfo.orig_filename:
-                raise BadZipFile(
+            if self.debug and fname_str != zinfo.orig_filename:
+                import warnings
+                warnings.warn(
                     'File name in directory %r and header %r differ.'
                     % (zinfo.orig_filename, fname))

Also, you need to add ``stacklevel=2`` to warnings.warn().
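
The effect of gating the message on the debug attribute and passing ``stacklevel=2`` can be tried in isolation. In this sketch `check_member_name` is a hypothetical stand-in for the comparison inside ZipFile.open(), with a `debug` argument playing the role of `self.debug`; the import sits at module level here only to keep the example short.

```python
import warnings


def check_member_name(directory_name, header_name, debug=0):
    """Hypothetical stand-in for the check inside ZipFile.open()."""
    if debug and header_name != directory_name:
        warnings.warn(
            "File name in directory %r and header %r differ."
            % (directory_name, header_name),
            stacklevel=2,  # attribute the warning to the caller's line
        )


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    check_member_name("test\\test2.txt", "test/test2.txt", debug=0)  # silent
    check_member_name("test\\test2.txt", "test/test2.txt", debug=1)  # warns

print(len(caught), caught[0].category.__name__)
```

With ``stacklevel=2`` the reported location is the line that called the function rather than the ``warnings.warn`` line itself, which is usually the more useful place to look.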
msg217635 - (view) Author: Adam Polkosnik (apolkosnik) * Date: 2014-04-30 18:28
3.4.0 patch with stacklevel=2
msg217636 - (view) Author: Jim Jewett (Jim.Jewett) * (Python triager) Date: 2014-04-30 18:31
I'm leaving it as "needs patch" because it isn't clear exactly what a committer should do.  

I think the current intent is to make the changes listed in zipfile_???_filename_mismatch_v2.patch (which are not listed as reviewable -- but the changes are indeed sufficiently straightforward that the files -- if need be -- could be edited by hand as if they were made originally by the committer.)

This change is small enough (warning instead of raise) that a test case is probably not strictly required, but it would be helpful.

test.zip would presumably be useful data for a test case.

There is dispute over whether this would be an enhancement (more generous with what to accept), a bug fix, or a security *regression* because it still allows old vulnerable files to stick around unreplaced (or to hide from a malware scanner), but no longer raises an Exception to get attention.  (warnings are often ignored)




zlib_forward_slash.patch would also be good (and might even be a security fix, by allowing the new versions to be installed), but is not ready to be committed, as 
(A) it repeats the logic inline instead of using the newly defined helper method
(B) it doesn't have a test case (test1.zip should help when creating one)
(C) it has neither a doc change nor an explicit (and dubious) statement that this is just a bug fix and wouldn't need to be listed in the versionchanged. 


There is also a question of how general the filename correction should be, particularly with respect to windows drives and capitalization.  The one in this patch seems to be the minimal change, and is explicitly supported by the zip spec.
msg217638 - (view) Author: Jim Jewett (Jim.Jewett) * (Python triager) Date: 2014-04-30 18:33
Presumably the stacklevel applies to all versions; verifying that it warns about the right code location is important enough to require a test case.
msg217641 - (view) Author: Adam Polkosnik (apolkosnik) * Date: 2014-04-30 19:05
I just looked through the 2.7.6 version of zipfile, and the error handling there is done either with raise or with print(). So, in line with the guidance provided for 2.7.6, perhaps we should stick with print() instead of warnings.warn(). I'll post that a bit later.

test.zip up there is the test case for this change. Is there any other test case needed?
msg217642 - (view) Author: Ethan Furman (ethan.furman) * (Python committer) Date: 2014-04-30 19:08
Adam, please stop deleting the files.  It makes for a lot of noise to those on the nosy list, and is unnecessary.

Just make sure you increment the version number on the files you upload and it will be fine.

Thanks.
msg217643 - (view) Author: Adam Polkosnik (apolkosnik) * Date: 2014-04-30 19:11
Jim, 

I've got some test cases where the zlib_forward_slash.patch doesn't cut it. That was the reason for trying a broader approach with filename_mismatch patches.
msg217647 - (view) Author: Francis MB (francismb) * Date: 2014-04-30 20:13
A small question related to: "zipfile_276_filename_mismatch_v3.patch"

--- a/zipfile.py	Wed Apr 30 11:44:38 2014
+++ b/zipfile.py	Wed Apr 30 15:10:38 2014
@@ -970,10 +970,10 @@
             if fheader[_FH_EXTRA_FIELD_LENGTH]:
                 zef_file.read(fheader[_FH_EXTRA_FIELD_LENGTH])
 
-            if fname != zinfo.orig_filename:
-                raise BadZipfile, \
+            if self.debug and fname != zinfo.orig_filename:
+                print( \
                         'File name in directory "%s" and header "%s" differ.' % (
-                            zinfo.orig_filename, fname)
+                            zinfo.orig_filename, fname))

Shouldn't a change from raising an exception to a print be somewhere documented?

Thanks
msg217648 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2014-04-30 20:42
The bug was that BadZipFile was being raised when it shouldn't be so I wouldn't worry about documenting the behavior change other than in the Misc/NEWS entry that the ultimate committer writes up.
msg217659 - (view) Author: Adam Polkosnik (apolkosnik) * Date: 2014-04-30 21:51
Is there anything else that you need me to provide?
msg217660 - (view) Author: Jim Jewett (Jim.Jewett) * (Python triager) Date: 2014-04-30 22:00
On Wed, Apr 30, 2014 at 3:05 PM, Adam Polkosnik wrote:

> test.zip up there is the test case for this change. Is there any other test case needed?

ah; I see the confusion.  test.zip is test *data*.  When I asked for a
test *case*, I meant something that ensures the data will be used to
actually run the test automatically.

Typically, that would involve adding something to
Lib/test/test_zipfile.py.  I'm guessing it would be easiest to add a
new class inheriting from unittest.TestCase and opening test.zip in
the setUp, then using a bunch of assert* methods to verify that the
file was read and interpreted correctly.

-jJ
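
A test case along these lines might be sketched as follows. Several things here are assumptions: the class and helper names are hypothetical, the archive is generated in memory instead of loading test.zip, and the read test reports a skip when the interpreter still raises, so that the sketch runs on both patched and unpatched versions.

```python
import io
import unittest
import zipfile


def make_mismatched_zip():
    """Stand-in for test.zip: the two stored names differ in slash direction."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("test/test2.txt", b"hello")
    data = bytearray(buf.getvalue())
    old, new = b"test/test2.txt", b"test\\test2.txt"
    pos = data.rfind(old)  # last occurrence is the central directory entry
    data[pos:pos + len(old)] = new
    return bytes(data)


class MismatchedNameTests(unittest.TestCase):
    def setUp(self):
        self.archive = zipfile.ZipFile(io.BytesIO(make_mismatched_zip()))
        self.addCleanup(self.archive.close)

    def test_directory_name_is_listed(self):
        names = [info.orig_filename for info in self.archive.infolist()]
        self.assertEqual(names, ["test\\test2.txt"])

    def test_member_is_readable(self):
        info = self.archive.infolist()[0]
        try:
            payload = self.archive.read(info)
        except zipfile.BadZipFile as exc:
            # Unpatched interpreters still raise here; report it as a skip.
            self.skipTest("consistency check still raises: %s" % exc)
        self.assertEqual(payload, b"hello")
```

Saved to a file, the class can be run with `python -m unittest`; a version intended for Lib/test/test_zipfile.py would presumably drop the skip and assert the agreed behavior directly.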
msg217661 - (view) Author: Jim Jewett (Jim.Jewett) * (Python triager) Date: 2014-04-30 22:13
On Wed, Apr 30, 2014 at 3:11 PM, Adam Polkosnik

> I've got some test cases where the zlib_forward_slash.patch doesn't cut it.

My recommendation (and I could be convinced otherwise) would be to replace

    if fname_str != zinfo.orig_filename:
        raise ...

with something more like

    self.filename_check(fname_str,  zinfo.orig_filename)

and a default implementation of filename_check that does nothing if
they're equal; calls the slash replace (since the standard supports
that correction); does nothing else if they're now equal; emits a
warning (or prints, in 2.7.6) otherwise.

In 2.7.6, you would have to keep the new methods private, but in 3.5,
users could override filename_check to handle the windows path
normalization, or whatever other problems you have documented.
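
One possible shape for the hook described above is sketched here. It is detached from ZipFile to stay self-contained: `NameChecker` and `DriveTolerantChecker` are hypothetical classes, the boolean return value is an addition made for illustration, and only the method name `filename_check` comes from this message.

```python
import ntpath
import warnings


class NameChecker:
    """Hypothetical holder for the proposed filename_check hook."""

    def filename_check(self, header_name, directory_name):
        """Return True if the names agree, warning when they do not."""
        if header_name == directory_name:
            return True
        # The ZIP format expects "/" as separator, so treat "\\" as an alias.
        if header_name.replace("\\", "/") == directory_name.replace("\\", "/"):
            return True
        warnings.warn(
            "File name in directory %r and header %r differ."
            % (directory_name, header_name),
            stacklevel=2,
        )
        return False


class DriveTolerantChecker(NameChecker):
    """Example override that also ignores a Windows drive letter."""

    def filename_check(self, header_name, directory_name):
        stripped = [ntpath.splitdrive(name)[1].lstrip("\\/")
                    for name in (header_name, directory_name)]
        return super().filename_check(*stripped)


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    slash_only = NameChecker().filename_check(
        "test/test2.txt", "test\\test2.txt")
    drive_default = NameChecker().filename_check(
        "C:\\windows\\temp\\test.txt", "windows\\temp\\test.txt")
    drive_override = DriveTolerantChecker().filename_check(
        "C:\\windows\\temp\\test.txt", "windows\\temp\\test.txt")

print(slash_only, drive_default, drive_override, len(caught))
```

Keeping the default conservative and leaving drive-letter handling to an override mirrors the split suggested above between what the standard supports and what needs local knowledge.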
msg217743 - (view) Author: Adam Polkosnik (apolkosnik) * Date: 2014-05-02 05:14
Jim,

The problems documented here are related to two cases (both apparently arriving from world of windows): 
1. two relative paths with inverted slash in one of them (test\test2.txt vs test/test2.txt)
2. relative path vs absolute path (windows\temp\test.txt vs c:\windows\temp\test.txt)

The extraction part seems to be doing a good job at writing the files into sane locations.

IMHO, there's no point in trying to replace slashes or otherwise "normalize", as this would fix the cases where the presence of inverted slashes should be noted in debug output. 
By the same token, stripping the drive letter from the absolute path part would just deprive us of noticing such intricacies in these special zip files.
msg217753 - (view) Author: Jim Jewett (Jim.Jewett) * (Python triager) Date: 2014-05-02 14:55
On Fri, May 2, 2014 at 1:14 AM, Adam Polkosnik
> The problems documented here are related to two cases (both apparently arriving from world of windows):

Good!  I had thought you had even more!

> 1. two relative paths with inverted slash in one of them (test\test2.txt vs test/test2.txt)

My understanding from earlier -- and I may have been reading too much
into some of the comments -- is that the standard defined \filename as
an inferior alias for /filename and supported the fix.

Notably, if you're extracting on windows with windows conventions,
then windows will treat them identically anyhow.

If you're extracting a windows file to a unix environment, then \t
really should be translated to /t.

> 2. relative path vs absolute path (windows\temp\test.txt vs c:\windows\temp\test.txt)

These really are different, as leaving off the "C:" should mean
"current drive", which will often (but not always) be C:

This (and differing capitalization) are among the reasons to do the
filename fix in a separate method, so that subclasses with more local
knowledge can more easily do the right thing.

Note that for python 3.4 and newer, pathlib <URL:
https://docs.python.org/3/library/pathlib.html> may be helpful.  It
would probably even be possible to backport the essential parts as an
implementation detail. But I'm not sure if that could be done
compatibly with maintenance releases, or how much work it would take.

> The extraction part seems to be doing a good job at writing the files into sane locations.
> IMHO, there's no point in trying to replace slashes or otherwise "normalize", as this would fix the cases where the presence of inverted slashes should be noted in debug output.

My understanding had been that it was failing to extract entirely.  So
exactly what is the problem?
msg217754 - (view) Author: Adam Polkosnik (apolkosnik) * Date: 2014-05-02 15:19
Extraction works fine, the issue was that raise() was creating an exception, and stopping the whole extraction process. When replaced with a warning, everything works fine.
msg217756 - (view) Author: Ethan Furman (ethan.furman) * (Python committer) Date: 2014-05-02 15:53
Adam Polkosnik said:
--------------------
> Extraction works fine, the issue was that raise() was creating an exception, and
> stopping the whole extraction process.

That doesn't make sense.  If an exception was "stopping the whole extraction process" then extraction was not working fine.

Questions:

  - Are the names with '\' in them in the central directory, or the per-file header?

  - If in the central directory (which is the name we are going to use, yes?) how do
    we tell if the '\' should be a '/' or an escape? (such as '\t')
msg217778 - (view) Author: Adam Polkosnik (apolkosnik) * Date: 2014-05-02 19:30
Ethan,
I'd refer you to msg92309...

And
When testing with WinZip it looks like this: 
No errors detected in compressed data of C:\Downloads\test.zip.
Testing ...
Testing test\                    OK
Testing test\test2.txt           OK
Testing test1.txt                OK

Then in python:
Python 3.4.0 (v3.4.0:04f714765c13, Mar 16 2014, 19:25:23) [MSC v.1600 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import zipfile
>>> zf =  zipfile.ZipFile('test.zip')
>>> namelist = zf.namelist()
>>> namelist
['test/', 'test/test2.txt', 'test1.txt']
>>> for af in namelist:
...     zf.read(af)
...
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "c:\Python34\lib\zipfile.py", line 1117, in read
    with self.open(name, "r", pwd) as fp:
  File "c:\Python34\lib\zipfile.py", line 1180, in open
    % (zinfo.orig_filename, fname))
zipfile.BadZipFile: File name in directory 'test\\' and header b'test/' differ.

So, based on that everything is already converted to forward slashes for the extraction.
msg217932 - (view) Author: Ethan Furman (ethan.furman) * (Python committer) Date: 2014-05-05 16:35
Ah, so when you (Adam) said "extraction works fine", what you meant was "extraction works fine *in other programs*".  Okay.
msg217933 - (view) Author: Adam Polkosnik (apolkosnik) * Date: 2014-05-05 16:40
Both. Other programs, and in python scripts when raise() is removed in zipfile.py. Unless your results are different.
msg345999 - (view) Author: Morgan Ramsay (MorganRamsay) * Date: 2019-06-18 16:41
The encoding test in ZipFile.open() is highly opinionated and has no purpose beyond itself. Testing for encoding issues should be done outside this library in the user's own code.

Using the 3.7.2 version of ZipFile, this is my proposal:

https://gist.github.com/MorganRamsay/696e89450e0f172c16ac8dfc016eb79f/revisions?diff=unified

Currently, I'm subclassing ZipFile with this patch and I've had no issues with extracting thousands of different ZIP files on Windows. I can't attest to this solution's applicability on other platforms.
History
Date User Action Args
2022-04-11 14:56:52 admin set github: 51088
2019-06-18 21:36:54 python-dev set stage: needs patch -> patch review
    pull_requests: + pull_request14050
2019-06-18 16:41:11 MorganRamsay set nosy: + MorganRamsay
    messages: + msg345999
    versions: + Python 3.6, Python 3.7
2015-07-21 07:09:20 ethan.furman set nosy: - ethan.furman
2015-06-18 19:59:54 Sean Goodwin set nosy: + Sean Goodwin
2014-05-05 16:40:07 apolkosnik set messages: + msg217933
2014-05-05 16:35:24 ethan.furman set messages: + msg217932
2014-05-02 19:30:24 apolkosnik set messages: + msg217778
2014-05-02 15:53:17 ethan.furman set messages: + msg217756
2014-05-02 15:19:29 apolkosnik set messages: + msg217754
2014-05-02 14:55:38 Jim.Jewett set messages: + msg217753
2014-05-02 05:14:04 apolkosnik set messages: + msg217743
2014-05-01 00:05:52 alanmcintyre set nosy: - alanmcintyre
2014-04-30 22:13:45 Jim.Jewett set messages: + msg217661
2014-04-30 22:00:30 Jim.Jewett set messages: + msg217660
2014-04-30 21:51:58 apolkosnik set messages: + msg217659
2014-04-30 20:42:23 gregory.p.smith set messages: + msg217648
2014-04-30 20:13:09 francismb set nosy: + francismb
    messages: + msg217647
2014-04-30 19:11:50 apolkosnik set files: + zipfile_276_filename_mismatch_v3.patch
    messages: + msg217643
2014-04-30 19:08:59 ethan.furman set messages: + msg217642
2014-04-30 19:05:26 apolkosnik set messages: + msg217641
2014-04-30 18:48:28 apolkosnik set files: - zipfile_340_filename_mismatch_v2.patch
2014-04-30 18:33:23 Jim.Jewett set messages: + msg217638
2014-04-30 18:31:41 Jim.Jewett set nosy: + Jim.Jewett
    messages: + msg217636
2014-04-30 18:28:56 apolkosnik set files: + zipfile_340_filename_mismatch_v3.patch
    messages: + msg217635
2014-04-30 18:01:52 berker.peksag set nosy: + berker.peksag
    messages: + msg217634
2014-04-30 16:10:33 apolkosnik set files: + zipfile_276_filename_mismatch_v2.patch
    messages: + msg217627
2014-04-30 15:53:40 apolkosnik set files: - zipfile_276_filename_mismatch_v2.patch
2014-04-30 15:52:37 apolkosnik set files: + zipfile_340_filename_mismatch_v2.patch
    messages: + msg217625
2014-04-30 15:51:57 apolkosnik set files: + zipfile_276_filename_mismatch_v2.patch
    messages: + msg217624
2014-04-30 15:50:50 apolkosnik set files: - zipfile_276_filename_mismatch.patch
2014-04-30 15:50:42 apolkosnik set files: - zipfile_stupid3.patch
2014-04-30 15:50:33 apolkosnik set files: - zipfile_340_filename_mismatch.patch
2014-04-30 12:54:14 ncoghlan set messages: + msg217616
2014-04-30 07:23:25 gregory.p.smith set nosy: + gregory.p.smith
    messages: + msg217578
2014-04-30 05:30:35 apolkosnik set files: + zipfile_276_filename_mismatch.patch
    messages: + msg217576
2014-04-30 05:27:23 apolkosnik set files: + zipfile_340_filename_mismatch.patch
    messages: + msg217575
2014-04-30 05:26:12 apolkosnik set files: - zipfile_276_filename_mismatch.patch
2014-04-30 05:25:59 apolkosnik set files: - zipfile_340_filename_mismatch.patch
2014-04-30 05:23:54 apolkosnik set files: + zipfile_340_filename_mismatch.patch
    messages: + msg217574
2014-04-30 05:19:50 terry.reedy set messages: + msg217573
2014-04-30 05:16:10 apolkosnik set files: + zipfile_276_filename_mismatch.patch
    messages: + msg217572
2014-04-30 04:34:06 ncoghlan set nosy: + ncoghlan
    messages: + msg217571
2014-04-30 04:23:10 apolkosnik set messages: + msg217570
2014-04-30 04:16:52 apolkosnik set messages: + msg217569
2014-04-30 04:12:23 ethan.furman set nosy: + ethan.furman
2014-04-29 23:12:25 terry.reedy set messages: + msg217561
    versions: - Python 3.1, Python 3.2, Python 3.3
2014-04-29 21:42:14 apolkosnik set messages: + msg217558
2014-04-29 21:21:07 apolkosnik set messages: + msg217556
2014-04-29 21:15:57 apolkosnik set messages: + msg217554
2014-04-29 21:08:29 apolkosnik set messages: + msg217551
    versions: + Python 3.1, Python 3.2, Python 3.3
2014-04-29 20:41:34 terry.reedy set messages: + msg217546
    versions: - Python 3.1, Python 3.2, Python 3.3
2014-04-29 17:58:44 apolkosnik set messages: + msg217533
    versions: + Python 3.1, Python 3.2, Python 3.5
2014-02-18 21:54:32 apolkosnik set messages: + msg211562
2014-01-23 23:52:27 terry.reedy set messages: + msg209023
    versions: - Python 2.6, Python 3.1, Python 3.2, Python 3.5
2014-01-23 20:23:28 apolkosnik set files: - zipfile_stupid.patch
2014-01-23 20:23:23 apolkosnik set files: - zipfile_stupid2.patch
2014-01-23 20:22:51 apolkosnik set files: + zipfile_stupid3.patch
    messages: + msg208987
2014-01-23 20:12:46 serhiy.storchaka set messages: + msg208985
2014-01-23 20:09:18 georg.brandl set nosy: + georg.brandl
    messages: + msg208983
2014-01-23 20:04:33 apolkosnik set messages: + msg208982
2014-01-23 18:34:28 ronaldoussoren set messages: + msg208975
2014-01-23 18:20:51 apolkosnik set files: + zipfile_stupid2.patch
    messages: + msg208973
2014-01-23 17:59:11 apolkosnik set files: + zipfile_stupid.patch
    messages: + msg208970
    versions: + Python 3.1, Python 3.2, Python 3.3, Python 3.4, Python 3.5
2014-01-23 17:37:39 apolkosnik set files: - zipfile.py.patch
2013-10-31 18:56:35 apolkosnik set messages: + msg201842
2013-10-24 09:56:32 tim.golden set nosy: - tim.golden
2013-10-17 20:42:49 apolkosnik set files: + zipfile.py.patch
    versions: + Python 2.7
    nosy: + apolkosnik
    messages: + msg200165
2012-09-28 14:01:24 tim.golden set assignee: tim.golden ->
2012-04-07 19:11:36 serhiy.storchaka set nosy: + serhiy.storchaka
2010-09-14 11:35:01 ronaldoussoren set messages: + msg116385
2010-09-14 11:26:24 amaury.forgeotdarc set nosy: + amaury.forgeotdarc
    messages: + msg116384
2010-08-06 15:35:16 tim.golden set assignee: tim.golden
    nosy: + tim.golden
2009-09-19 17:01:06 chuck set files: + zlib_forward_slash.patch
    nosy: + chuck
    messages: + msg92874
    keywords: + patch
2009-09-11 21:01:53 amaury.forgeotdarc set stage: needs patch
2009-09-11 18:57:17 terry.reedy set nosy: + terry.reedy
    messages: + msg92516
2009-09-06 21:26:51 alanmcintyre set messages: + msg92335
2009-09-06 20:41:54 ronaldoussoren set messages: + msg92330
2009-09-06 18:58:40 alanmcintyre set nosy: + alanmcintyre
    messages: + msg92326
2009-09-06 12:40:05 ronaldoussoren set nosy: + ronaldoussoren
    messages: + msg92309
2009-09-06 04:02:01 NewerCookie set messages: + msg92297
2009-09-04 19:57:57 NewerCookie set messages: + msg92265
2009-09-04 19:56:50 NewerCookie create