classification
Title: zipfile.extractall is safe by now
Type: security Stage:
Components: Documentation, Library (Lib) Versions: Python 3.10, Python 3.9, Python 3.8, Python 3.7, Python 3.6, Python 3.5
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: docs@python Nosy List: VA, amaajemyfren, docs@python
Priority: normal Keywords:

Created on 2020-05-25 07:18 by VA, last changed 2020-05-27 09:25 by amaajemyfren.

Messages (4)
msg369854 - (view) Author: Va (VA) Date: 2020-05-25 07:18
In documentation of all Python 3 versions, [ZipFile.extractall](https://docs.python.org/3/library/zipfile.html#zipfile.ZipFile.extractall) states with a big red warning:

> Warning
> Never extract archives from untrusted sources without prior inspection. It is possible that files are created outside of path, e.g. members that have absolute filenames starting with "/" or filenames with two dots "..". This module attempts to prevent that. See extract() note.

However, when looking at the implementation, it calls _extract_member() which seems to sanitize filenames. So the warning might not be relevant anymore.

Furthermore, when looking at [Python 2](https://docs.python.org/2/library/zipfile.html#zipfile.ZipFile.extractall) documentation, we can see the same warning, along with a change note:

> Changed in version 2.7.4: The zipfile module attempts to prevent that. See extract() note.

So, the big red warning in Python 3 documentation might be relevant only for Python < 2.7.4, not for any Python 3 version.
msg369900 - (view) Author: Ama Aje My Fren (amaajemyfren) * Date: 2020-05-25 17:46
Hi,

On Mon, May 25, 2020 at 10:18 AM Va <report@bugs.python.org> wrote:

>
> So, the big red warning in Python 3 documentation might be relevant only for Python < 2.7.4, not for any Python 3 version.
>

You may be on to something. It does appear to be what was discussed in
msg181646 on issue6972.
What I see is that from CPython 3.4
(https://docs.python.org/3.4/library/zipfile.html#zipfile.ZipFile.extractall)
while the security warning is still there they add the following line in it:

> This module attempts to prevent that. See extract() note.

The extract() note goes into some detail to explain what and how they
attempt to prevent it.

It is not obvious to me that zipfile._extract_member() together with
(for windows) zipfile._sanitize_windows_name() have handled everything
that could happen.
May I suggest that out of caution we leave it as it is?
msg369971 - (view) Author: Va (VA) Date: 2020-05-26 11:47
> It is not obvious to me that zipfile._extract_member() together with
(for windows) zipfile._sanitize_windows_name() have handled everything
that could happen.

What hasn't been handled then?
What is the safe way to use it?

I think documenting "this function is unsafe" without suggesting a replacement or a safe way to use it isn't very constructive: as a developer, I want to extract a zip archive, but the only function supposed to do the job tells me "this is unsafe". Ok, so what am I supposed to do to be safe?

That's what documentation should tell me, not let me puzzled with doubt.

> May I suggest that out of caution we leave it as it is?

I don't think the situation should stay like this.

- either the documentation should be more precise on what are the problems that can occur, and how to handle those problems
- or better, the function should be fixed and made fully safe, so all programs using it are safe (and the warning can be removed)
msg370059 - (view) Author: Ama Aje My Fren (amaajemyfren) * Date: 2020-05-27 09:25
On Tue, May 26, 2020 at 2:47 PM Va <report@bugs.python.org> wrote:
>
> What hasn't been handled then?
>

The rules for naming files in Windows is long
(https://docs.microsoft.com/en-us/windows/win32/fileio/naming-a-file).
It is e.g. possible to create files under WSL within Windows that
break these rules. In my case it was to add the colon (:) to a file
name. In Python for windows this would fail because the underlying
system API would stop it from happening (and in zip, it will be
changed to an underscore (_)) - but it is unclear what would actually
happen if you do so. In the old days just trying to open C:\Con\Con
(which did not exist) caused a BSOD.

>
> What is the safe way to use it?
>

The Security message suggests _with care_ - to wit - "Never extract
archives from untrusted sources without prior inspection."
There may be no absolutely safe way if the zipfile was crafted
maliciously. Just like there are inherent vulnerabilities in using XML
... (https://docs.python.org/3/library/xml.html#xml-vulnerabilities).
If a zipped file had a tree starting at C:\ and replaced a dll in
C:\Windows (and was running as Admin), a lot of caveats I know, but it
could be a problem.

> I think documenting "this function is unsafe" without suggesting a replacement or a safe way to use it isn't very constructive: as a developer, I want to extract a zip archive, but the only function supposed to do the job tells me "this is unsafe". Ok, so what am I supposed to do to be safe?

Does it say that unzipping a file is unsafe? It looks to me like it
says that in special conditions the extraction of a zipped file tree
may be unsafe and it is important to use caution. It is the case in a
lot of programming, is it not, that there are instances of security
vulnerabilities entering ordinary looking code? It happens in sql
(https://xkcd.com/327/) and many places within Python's Standard
Library (https://hackernoon.com/10-common-security-gotchas-in-python-and-how-to-avoid-them-e19fbe265e03)
even something as innocuous as using the new-style string format
(https://lucumr.pocoo.org/2016/12/29/careful-with-str-format/).

>
> That's what documentation should tell me, not let me puzzled with doubt.
>

This is an interesting point. What is the scope of Python Library
Documentation? I disagree with your view on scope. In my view the
Library Documentation should focus on what is exposed in the library
for ordinary use. So e.g. implementation details may not be expected
to be shown in the Documentation (like there is no documentation for
zipfile._extract_member()). It does have a duty of care - especially
to well known gotchas - but it is _not_ security documentation. I
think (this is my view, it is not god given) that in many cases it is
fair to assume that if one told a developer to be careful with her
code it is enough in so far as library documentation is concerned.

Thanks.
History
Date User Action Args
2020-05-27 09:25:52amaajemyfrensetmessages: + msg370059
title: zipfile.extractall is safe by now? -> zipfile.extractall is safe by now
2020-05-26 11:51:57VAsettype: behavior -> security
components: + Library (Lib)
title: zipfile.extractall is safe by now -> zipfile.extractall is safe by now?
2020-05-26 11:47:41VAsetmessages: + msg369971
2020-05-25 17:46:47amaajemyfrensetnosy: + amaajemyfren
messages: + msg369900
2020-05-25 07:18:15VAcreate