classification
Title: behavioral differences between shutil.unpack_archive and ZipFile.extractall
Type: behavior Stage: needs patch
Components: Documentation, Library (Lib) Versions: Python 3.3, Python 3.4
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: docs@python Nosy List: docs@python, eric.araujo, peter@psantoro.net, pitrou, r.david.murray
Priority: normal Keywords: patch

Created on 2014-03-13 09:46 by peter@psantoro.net, last changed 2014-03-20 12:45 by r.david.murray.

Files
File name Uploaded Description Edit
shutil.diff peter@psantoro.net, 2014-03-13 09:46 patch
test_unpack.zip peter@psantoro.net, 2014-03-13 23:12 test script and test zips
Messages (7)
msg213374 - (view) Author: Peter Santoro (peter@psantoro.net) * Date: 2014-03-13 09:46
Since Python 3.3.1, ZipFile.extractall was enhanced to better handle absolute paths and illegal characters.  The associated logic within shutil._unpack_zipfile essentially skips zip members with these issues.

If a zip file contains all absolute paths, ZipFile.extractall works as expected (i.e. the zip file is unpacked), but shutil._unpack_zipfile (normally called indirectly via shutil.unpack_archive) appears to do nothing (i.e. it silently fails to unpack the zip file).

The attached patch attempts to unify the behavior of extracting zip files between shutil.unpack_archive with ZipFile.extractall.
msg213494 - (view) Author: Peter Santoro (peter@psantoro.net) * Date: 2014-03-13 23:12
I've attached a zip file which contains a test script and test zip files for the previously submitted Python 3.3.5 patch.  See the included README.txt for more information.  To view the contents of the included bad.zip file, use the following command:

> unzip -l bad.zip
msg214082 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2014-03-19 11:40
Well, you have to assume the code you're removing is there for a reason (e.g. perhaps this is meant to protect from attacks when opening a zip file uploaded by a user). 

I'd like to hear from Éric on this.
msg214093 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2014-03-19 13:19
First step would be to get rid of the warning in the zipfile docs and replace it with the info that the absolute path '/' and any relative path are stripped silently before the file is extracted.

It would also be worth adding an enhancement to zipfile to optionally not do it silently.

I hope the same considerations apply to tarfile, but I haven't checked.

In other words, I do think that code is a holdover from when zipfile *wasn't* safe, but since I didn't write it I don't know for sure.
msg214127 - (view) Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2014-03-19 19:24
shutil.unpack_archive was extracted from distutils by Tarek.  I can do some Mercurial archaelogy to find more about the behaviour.
msg214196 - (view) Author: Peter Santoro (peter@psantoro.net) * Date: 2014-03-20 09:56
It seems clear to me that the logic in shutil._unpack_zipfile that silently skips paths that start with '/' (indicates absolute path) or that contain references to the parent directory ('..') was added to prevent malicious zip files from making potential malicious/unwanted modifications to the filesystem (perhaps at a time when zipfile did not itself contain such logic).  This conservative approach works, but it can have unexpected results.  For example, if all entries in a zip file contain these invalid characters, then shutil._unpack_zipfile appears to do nothing (i.e. the zip file is not unpacked).  This is good (except for the silent part), if the zip file is truly malicious.  However, I recently had to deal with thousands of zip files created by well known software vendors where hundreds of the zip files were created incorrectly and contained these invalid characters.  These files were not malicious, but they were created improperly. Note that shutil._unpack_zipfile silently failed to unzip these files, but by using ZipFile.extractall I could unzip them.

It appears that most unzipping software today either either ignores (sometimes silently) potentially malicious zip entries (e.g. Windows 7 Explorer displays an invalid zip file error) or it attempts to filter out/replace known bad characters so that the zip entries can be extracted (e.g. WinZip, gnu unzip).  I created this issue because the Python library uses both approaches, which may need rethinking.

The newer logic in ZipFile._extract_member, which is used by ZipFile.extractall, takes a different approach.  Instead of silently ignoring potentially malicious zip entries, it attempts to filter out or replace known invalid characters before extracting the zip entries.

From the Python zipfile docs:
---
If a member filename is an absolute path, a drive/UNC sharepoint and leading (back)slashes will be stripped, e.g.: ///foo/bar becomes foo/bar on Unix, and C:\foo\bar becomes foo\bar on Windows. And all ".." components in a member filename will be removed, e.g.: ../../foo../../ba..r becomes foo../ba..r. On Windows illegal characters (:, <, >, |, ", ?, and *) replaced by underscore (_).
---

As ZipFile._extract_member filters out/replaces more invalid characters than shutil._unpack_zipfile handles, one could argue that the (apparent older) approach used by shutil._unpack_zipfile is less safe.

The approaches used by shutil._unpack_zipfile and ZipFile.extractall to deal with potentially malicious zip file entries are different.  This issue could be closed if not deemed important by the Python core developers or it could be handled by documentation and/or coding changes.
msg214214 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2014-03-20 12:45
Note that unix unzip does exactly the same thing as zipfile extractall (except that it does issue warnings), and I believe this is considered "best practice" these days for extraction tools: strip out absolute/relative path components and extract to the destination directory.
History
Date User Action Args
2014-03-20 12:45:25r.david.murraysetmessages: + msg214214
2014-03-20 09:56:39peter@psantoro.netsetmessages: + msg214196
2014-03-19 19:24:30eric.araujosetmessages: + msg214127
2014-03-19 13:19:05r.david.murraysetnosy: + r.david.murray, docs@python
messages: + msg214093

assignee: docs@python
components: + Documentation
stage: test needed -> needs patch
2014-03-19 11:40:54pitrousetnosy: + eric.araujo, pitrou
messages: + msg214082
2014-03-13 23:12:18peter@psantoro.netsetfiles: + test_unpack.zip

messages: + msg213494
2014-03-13 09:51:20eric.araujosetstage: test needed
2014-03-13 09:46:05peter@psantoro.netcreate