classification
Title: zipfile.ZipFile overwrites files outside destination path
Type: security Stage:
Components: Library (Lib) Versions: Python 3.2, Python 2.7, Python 2.6
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: gregory.p.smith Nosy List: amaury.forgeotdarc, gregory.p.smith, ned.deily, r.david.murray, schmir, twb
Priority: normal Keywords: patch

Created on 2009-09-22 22:10 by schmir, last changed 2009-10-02 00:55 by gregory.p.smith.

Files
File name Uploaded Description Edit
extract-doc.diff twb, 2009-09-29 22:36 Update to extract() method documentation review
zipfile-6972-test.diff twb, 2009-09-30 06:14 test for directory escape review
zipfile-6972-patch-2.diff twb, 2009-09-30 19:29 patch to zipfile module for directory escape, with fix for case sensitive file systems review
Messages (24)
msg93021 - (view) Author: Ralf Schmitt (schmir) Date: 2009-09-22 22:10
ZipFile.extractall happily overwrites any file on the filesystem. One
can put files with a name like "//etc/password" in a zip file and
extractall will overwrite /etc/password (with sufficient rights).

The docs say:

ZipFile.extractall([path[, members[, pwd]]])

    Extract all members from the archive to the current working
directory. path specifies a different directory to extract to. members
is optional and must be a subset of the list returned by namelist(). pwd
is the password used for encrypted files.


I read that as: it will put all files into path or a subdirectory.
Using names like "../../../etc/password" also leads to files being
written outside that path directory.
msg93251 - (view) Author: Thomas W. Barr (twb) Date: 2009-09-29 04:19
Do people have an opinion as to whether this should be fixed with a 
docfix, fixed as default (with option to allow path traversal) or fixed as 
a non-default option? The same issue exists in ZipFile.extract, but in 
that case you're presumably passing a path you've already vetted.

Either way, I'll write a test case for it tomorrow.
msg93268 - (view) Author: Ralf Schmitt (schmir) Date: 2009-09-29 09:46
I think this should clearly be fixed in the code. The current code tries
to handle absolute paths by removing the first slash (unfortunately not
the second), so it looks like it tries to be safe and only write to the
destination directory. That should be the default operation.
I even think that there should be *no* option to allow overriding files
outside the destination path (on unix one can always use / as
destination if he feels like overwriting his /etc/passwd)
The documentation should also mention that it's unsafe to use this
method in python <2.6.2.
msg93269 - (view) Author: Ralf Schmitt (schmir) Date: 2009-09-29 09:48
The documentation should also mention that it's unsafe to use this
method in python <= 2.6.2. 2.6.2 is also unsafe.
msg93278 - (view) Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2009-09-29 11:02
The tarfile module solved this issue with a documentation warning:
http://docs.python.org/library/tarfile.html#tarfile.TarFile.extractall
msg93305 - (view) Author: Thomas W. Barr (twb) Date: 2009-09-29 17:20
My working solution is to iterate through members, and ensuring that
os.path.abspath(os.path.join(path, member)) always .startswith(path).
This seems like a better solution than trying to trap on a pattern in
the string. Presumably the same fix can be made to tarfile.

For what it's worth, OS X's BOMArchiveManager will place a file stored
as '../foo.txt' in the extract path, not the directory right outside it.

While we're on the topic, there may also be a bug in this, or the
tarfile package that would allow a malicious archive to extract a
symlink to an existing directory somewhere on the target machine, and
files extracted to that symlink. I haven't really thought that through,
but I'm sure that my fix won't correct that possible issue.
msg93330 - (view) Author: Thomas W. Barr (twb) Date: 2009-09-29 20:51
Uploading test.
msg93331 - (view) Author: Thomas W. Barr (twb) Date: 2009-09-29 20:53
Uploading patch. This actually should fix my theoretical symlink bug
since realpath() properly follows symlinks. The only thing that I
haven't been able to test is the behavior of realpath() on
case-insensitive operating systems. This should do the right thing, the
path should be normalized, but can someone confirm this?
msg93332 - (view) Author: Thomas W. Barr (twb) Date: 2009-09-29 21:10
As for the documentation, it might be a wise idea to up date the current
documentation to mention this issue, until the next release. I'm not
really sure what the process is for doing that, though...
msg93333 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2009-09-29 21:35
Patches to the docs, just like patches to the code (the docs are in the
Doc subdirectory).  Once committed, they get auto-generated and uploaded.
msg93334 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2009-09-29 21:58
Documentation note added (copied from tarfile) in trunk r75149, 
release26-maint r75150 (hopefully in time for 2.6.3 but thats up to 
Barry).
msg93336 - (view) Author: Thomas W. Barr (twb) Date: 2009-09-29 22:36
zf.extract() is unsafe for the same reason. My patch fixes this issue,
but we should mention the possible bug in the documentation there as
well. They do this for the similar bug in tarfile.

I've copy/pasted the mention in tarfile.extract() to zipfile.extract()
into the diff.
msg93349 - (view) Author: Thomas W. Barr (twb) Date: 2009-09-30 06:14
My apologies, I clicked the wrong button and deleted my test. There is no 
change in the newly uploaded one.
msg93372 - (view) Author: Thomas W. Barr (twb) Date: 2009-09-30 19:29
os.path.realpath() doesn't normalize case, so this could have issues on
Windows. The new patch should not.

The Mac version of os.path.normpath doesn't change the path, as per the
posix version, which isn't correct on HFS+, which is not case sensitive.
That's another bug for another ticket, though.
msg93374 - (view) Author: Ned Deily (ned.deily) * (Python committer) Date: 2009-09-30 20:03
"The Mac version of os.path.normpath doesn't change the path, as per the
posix version, which isn't correct on HFS+, which is not case sensitive."

Not so. Case-sensitive vs case-insensitive behavior is chosen when 
initializing an HFS+ file system (since OS X 10.3).
msg93375 - (view) Author: Thomas W. Barr (twb) Date: 2009-09-30 20:11
Good point, I'd forgotten that case-sensitive file systems are an
option. I do know that it's not the default, though, and that as shipped
from Apple, at least the consumer machines are case-insensitive. Things
may be different in server-land.

For what it's worth, NTFS isn't entirely case-insensitive, either.
(http://support.microsoft.com/kb/100625)

Should the os.path.normcase() method be made smarter, and actually query
the filesystem about such things? (If I should start a new ticket for
this, somebody please let me know.)
msg93376 - (view) Author: Ned Deily (ned.deily) * (Python committer) Date: 2009-09-30 21:29
Yes, as shipped from the factory, the default "root" file system is 
still case-insensitive but the user can change that.  There there are 
file systems on attached disk images and NFS-mounted file systems, etc 
etc.  More to the point, it's not a system attribute, rather it's a 
file-system attribute and, since a file system mount point can be almost 
anywhere in a directory structure, in general, you can't predict where 
you're going to encounter insensitive vs -sensitive behavior; it could 
vary from directory to directory.  But isn't dealing with case-
insensitive behavior just a special case of the general case of what 
extract/extractall do about overwriting existing files?  I don't see 
that addressed in the current docs.
msg93379 - (view) Author: Thomas W. Barr (twb) Date: 2009-09-30 22:41
A fair point. I was thinking that we could query the OS about whatever
filesystem the path is on, but this wouldn't work for a file that hasn't
been created yet.

The issue with extractall() isn't just that it can extract over existing
files, it's that it can write files anywhere on the filesystem, both by
exploiting symlinks and through path manipulation. The more I think
about it, though, the more I think the case sensitivity is a non-issue
here, since the trailing part of the extraction paths is built out of
the base path, which I then compare against. The capitalization will
therefore be consistent, and I don't need to worry about this. Does this
seem right?
msg93380 - (view) Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2009-09-30 22:50
The patch won't work if the target file already exists as a symlink.

I think that such a check is not a good idea. Using symlinks to extract files 
to somewhere else may be a feature, after all.  Specially if the symlink 
already exists before the operation. Some real-case example:
  /home/xxx/bin   --> symlink to /someotherpath/bin
  /home/xxx/lib   --> symlink to /someotherpath/lib
Now I want to extract "lib/libXXX.so" into "/home/xxx"

I suggest to only update the documentation with a warning, similar to the one 
for the tarfile module.
msg93381 - (view) Author: Ralf Schmitt (schmir) Date: 2009-09-30 23:16
Adding a warning to the documentation is wrong. The intention of the
code clearly is to only create files in the destination directory (or
why remove the first slash then?) and that is also the impression I get
from reading the documentation.
msg93382 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2009-09-30 23:37
Adding a warning to the documentation is not wrong, it is the only thing 
that is possible for the 2.6.3 release.  Its too late in the current 
release process to change code.
msg93396 - (view) Author: Ralf Schmitt (schmir) Date: 2009-10-01 10:11
I'd rather have an extractall version which just throws a RuntimeError
than one which overwrites any file with any content on my filesystem if
I'm trying to unzip a zip file.  Then I at least know that I have to
write my own version. Adding a warning to the documentation instead of
fixing that bug was wrong for the tarfile module and is wrong for the
zipfile module. The "release process" disagrees with me here, but I'd
still call that wrong.
msg93420 - (view) Author: Thomas W. Barr (twb) Date: 2009-10-01 20:13
Even if we can't fix things for this release, presumably it's not too
late to fix things for 2.7, right?

Yes, there certainly are cases where you might want to have creative
usage of symlinks and stored paths to allow overwriting existing files,
and placing files outside of the extraction path, but that doesn't seem
like the default case to me. Would it be a decent compromise to add a
extract_under_path=True default option to extract() and extractall() for
2.7?
msg93426 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2009-10-02 00:55
yes this will be fixed in 2.7/3.2.

as for creative uses where someone might want the out of supplied path 
overwriting behavior?  those people are insane and should be made to jump 
through extra hoops to get it. ;)
History
Date User Action Args
2009-10-02 00:55:44gregory.p.smithsetversions: + Python 2.7, Python 3.2
2009-10-02 00:55:32gregory.p.smithsetmessages: + msg93426
2009-10-01 20:13:20twbsetmessages: + msg93420
2009-10-01 10:11:11schmirsetmessages: + msg93396
2009-09-30 23:37:04gregory.p.smithsetmessages: + msg93382
2009-09-30 23:16:41schmirsetmessages: + msg93381
2009-09-30 22:50:48amaury.forgeotdarcsetmessages: + msg93380
2009-09-30 22:41:26twbsetmessages: + msg93379
2009-09-30 21:29:41ned.deilysetmessages: + msg93376
2009-09-30 20:11:35twbsetmessages: + msg93375
2009-09-30 20:03:52ned.deilysetnosy: + ned.deily
messages: + msg93374
2009-09-30 19:30:02twbsetfiles: - zipfile-6972-patch.diff
2009-09-30 19:29:50twbsetfiles: + zipfile-6972-patch-2.diff

messages: + msg93372
2009-09-30 06:14:01twbsetfiles: + zipfile-6972-test.diff

messages: + msg93349
2009-09-30 06:11:11twbsetfiles: - zipfile-6972-test.diff
2009-09-29 22:36:23twbsetfiles: + extract-doc.diff

messages: + msg93336
2009-09-29 21:58:53gregory.p.smithsetmessages: + msg93334
2009-09-29 21:35:15r.david.murraysetnosy: + r.david.murray
messages: + msg93333
2009-09-29 21:10:11twbsetmessages: + msg93332
2009-09-29 20:59:19gregory.p.smithsetpriority: normal
assignee: gregory.p.smith

nosy: + gregory.p.smith
2009-09-29 20:53:16twbsetfiles: + zipfile-6972-patch.diff

messages: + msg93331
2009-09-29 20:51:33twbsetfiles: + zipfile-6972-test.diff
keywords: + patch
messages: + msg93330
2009-09-29 17:20:08twbsetmessages: + msg93305
2009-09-29 11:02:03amaury.forgeotdarcsetnosy: + amaury.forgeotdarc
messages: + msg93278
2009-09-29 09:48:02schmirsetmessages: + msg93269
2009-09-29 09:46:02schmirsetmessages: + msg93268
2009-09-29 04:19:37twbsetmessages: + msg93251
2009-09-29 00:19:59twbsetnosy: + twb
2009-09-22 22:11:34schmirsettitle: zipfile.ZipFile -> zipfile.ZipFile overwrites files outside destination path
2009-09-22 22:10:50schmircreate