Title: zipfile - extract truncates (existing) file when bad password provided (zip encryption weakness)
Type: behavior Stage: patch review
Components: Library (Lib) Versions: Python 3.8
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: CristiFati, alanmcintyre, ned.deily, serhiy.storchaka, twouters
Priority: normal Keywords: patch

Created on 2019-03-08 22:56 by CristiFati, last changed 2019-03-18 02:02 by ned.deily.

Pull Requests
URL Status Linked Edit
PR 12242 closed CristiFati, 2019-03-08 23:14
Messages (5)
msg337543 - (view) Author: Cristi Fati (CristiFati) * Date: 2019-03-08 22:56
PKWARE encryption password pre check algorithm (relying on an 8 bits value to differentiate passwords) is insanely short.

Most of the wrong passwords are filtered out by the check, but some of them aren't. For the ones in the latter category, when trying to extract an archive member, a 0 lengthed file with its name will be created on the FS (overwriting any previous version).


1. Extract an archive member using the good password. File extracted
2. Extract the same member using a wrong password:
    2.1 For most of the passwords, they will be detected and the operation cancelled
    2.2 But some of them, they won't be detected (false positives), but the decryption itself will fail overwriting the file (from #1.) on FS but leaving it with 0 bytes content

This is the about #2.2.

More details on [[SO]: zipfile.BadZipFile: Bad CRC-32 when extracting a password protected .zip & .zip goes corrupt on extract (@CristiFati's answer)](
msg337545 - (view) Author: Cristi Fati (CristiFati) * Date: 2019-03-08 23:14

As a note, it applies to any Python version.
msg337570 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2019-03-09 11:24
When you pass an incorrect password, it is not possible to guarantee that you will get an exception and the destination file will be kept unchanged. It is possible that you will get an incorrectly deciphered file without noticing.

I do not think that the zipfile code needs to be changed. If you want to keep the content of the target file in case of error, just extract the file into other place.
msg337610 - (view) Author: Cristi Fati (CristiFati) * Date: 2019-03-10 11:27
Hm, I assumed that a bad password, will raise an exception (at some point). but, if it doesn't, the destination file will be overwritten (with the messed up content), which also happens now (so, no behavior change).

This is trying to make wrong passwords behavior (when an exception is raised) consistent.

What I can think of is that when some bytes were already extracted when the exception occurs, overwrite the existing file (if any), so at the end the faulty content will be kept (although I don't know haw this could help anyone), but in any case a 0 lengthed file is not a good option (from my PoV).

Of course specifying another dir is an option, but I only see it as a workaround.
msg338155 - (view) Author: Ned Deily (ned.deily) * (Python committer) Date: 2019-03-18 02:02
I'll let it up to other coredevs to decide whether this is behavior that should be changed in master for the next feature release and/or whether a doc change is needed.  But we are not going to change the behavior of 3.6 for sure so I am closing PR 12242.
Date User Action Args
2019-03-18 02:02:48ned.deilysetversions: + Python 3.8
nosy: + ned.deily

messages: + msg338155

resolution: not a bug ->
2019-03-10 11:27:48CristiFatisetmessages: + msg337610
versions: - Python 3.6
2019-03-09 11:24:14serhiy.storchakasetresolution: not a bug
messages: + msg337570
2019-03-09 07:30:16xtreaksetnosy: + twouters, alanmcintyre, serhiy.storchaka
2019-03-08 23:14:32CristiFatisetkeywords: + patch

stage: patch review
messages: + msg337545
pull_requests: + pull_request12230
2019-03-08 22:56:50CristiFaticreate