classification
Title: tarfile: Traversal attack vulnerability
Type: security Stage: test needed
Components: Library (Lib) Versions: Python 3.7, Python 3.6
process
Status: open Resolution:
Dependencies: 17102 29788 Superseder:
Assigned To: lars.gustaebel Nosy List: Arfrever, Daniel.Garcia, benjamin.peterson, christian.heimes, edulix, georg.brandl, haypo, jcea, jwilk, lars.gustaebel, martin.panter, ned.deily, r.david.murray, serhiy.storchaka
Priority: high Keywords: patch, security_issue

Created on 2014-03-31 08:14 by Daniel.Garcia, last changed 2017-03-11 05:29 by martin.panter.

Files
File name Uploaded Description
prevent-tar-traversal-attack.diff Daniel.Garcia, 2014-03-31 08:14 patch to prevent
safetarfile-1.diff lars.gustaebel, 2014-05-01 12:12 New SafeTarFile class and documentation
Messages (16)
msg215222 - (view) Author: Daniel Garcia (Daniel.Garcia) * Date: 2014-03-31 08:14
The application does not validate the filenames inside the tar archive, which allows files to be extracted to arbitrary paths. An attacker can craft a tar file that overwrites files.

I've seen this vulnerability in libtar:
http://lwn.net/Vulnerabilities/587141/
I've checked that Python's tarfile doesn't validate the filenames, so it is vulnerable to this attack.
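
A minimal sketch of the observation above, using only the standard library: it builds a one-member archive in memory and reads the member name back, which shows that tarfile reports a name containing ".." unchanged. Nothing is extracted or written to disk. The helper names build_archive and list_names are illustrative and not part of tarfile.

```python
import io
import tarfile


def build_archive(member_name, payload=b"demo"):
    """Return an in-memory tar archive holding one member with the given name."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        info = tarfile.TarInfo(name=member_name)
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


def list_names(archive):
    """List member names as tarfile reports them; nothing is extracted."""
    with tarfile.open(fileobj=io.BytesIO(archive), mode="r") as tar:
        return tar.getnames()


# The parent-directory reference survives the round trip untouched.
names = list_names(build_archive("../outside.txt"))
```

Whether such a name is dangerous depends on what the caller then does with it; the sketch only shows that the library itself passes it through.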
msg215223 - (view) Author: Daniel Garcia (Daniel.Garcia) * Date: 2014-03-31 08:23
The solution in the patch is based on GNU tar's solution to this: removing the prefix when extracting and adding.
msg215224 - (view) Author: Ned Deily (ned.deily) * (Python committer) Date: 2014-03-31 08:25
Setting as release blocker pending evaluation.
msg215225 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2014-03-31 08:31
It's a known and well-documented behavior of the tar module:

https://docs.python.org/2.7/library/tarfile.html#tarfile.TarFile.extractall
msg215226 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2014-03-31 09:03
> It's a known and well-documented behavior of the tar module

Would it be possible to disable this behaviour by default, and only enable it explicitly? The tar command line program has for example the -P / --absolute-paths option.
msg215237 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-03-31 13:26
Yes, this behavior is documented, but it is still desirable to fix it. The tar utility has a lot of switches which control extraction, and by default it prevents three kinds of attack (absolute names, '..' and symlinks), but there are other possible attacks. This is a complex issue and I'm working on it. See also issue19974.

In any case we should be very careful, because every protection against an attack changes behavior (which can be safe if you know what you are doing), so perhaps we should add parameters which control the behavior. This is possible only in a new Python version.
msg215239 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2014-03-31 13:59
Note that any issues here should also be considered for zipfile and shutil.  (Well, shutil can just use the other two once the security is available.)  See issue 20907.
msg215242 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2014-03-31 14:42
Don't forget about SUID and SGID, too.
msg215656 - (view) Author: Lars Gustäbel (lars.gustaebel) * (Python committer) Date: 2014-04-06 13:24
In the past, our answer to these kinds of bug reports has always been that you must not extract an archive from an untrusted source without making sure that it has no malicious contents, and that tarfile conforms to the POSIX specifications with respect to extraction of files and pathname resolution.  That's why we put this prominent warning in the documentation, and I think its advice still holds.

I don't think that this issue should be marked as a release blocker, because the way tarfile currently works was a conscious decision, not an accident.  tarfile does what it is designed to do: it processes a sequence of instructions to store a number of files in the filesystem. So the attack that is described by Daniel Garcia exploits neither a bug in tarfile nor a loophole in the tar archive format. A necessary condition for this attack to work is that the attacker has to trick the user into extracting the malicious archive first. After that, tarfile interprets the contained instructions word-for-word but still only within the boundaries defined by the user's privileges.

I think it is obvious that it is potentially dangerous to extract tar archives we didn't create ourselves, because we actually give another person direct access to our filesystem. tarfile could mitigate some of the adverse effects, but this will not change the fact that it remains unsafe to use tarfile to a certain degree unless you use it with your own data or take reasonable precautions.

Anyway, if we come to the conclusion that we want to eliminate this kind of attack, we must be aware that there is a lot more to do than that. tarfile as it is today is vulnerable to all known attacks against tar programs, and maybe even a few more that rely on its specific implementation.


1. Path traversal:

    The archive contains file names such as /etc/passwd or ../etc/passwd.

2. Symlink file attack:

    foo links to /etc/passwd.
    Another member named foo follows, its data overwrites the target file's data.

3. Symlink directory attack:

    foo links to /etc.
    The following member foo/passwd overwrites /etc/passwd.

4. Hardlink attack:

    Hardlink member foo links to /etc/passwd.
    tarfile creates the hardlink to /etc/passwd because it cannot find it inside the archive and falls back to the one in the filesystem.
    Another file named foo follows, its data overwrites /etc/passwd's data.

5. Permission manipulation:

    The archive contains an executable that is placed somewhere in PATH with its setuid flag set, so that an unprivileged user is able to gain root privileges.

6. Device file attacks:

    The archive contains a device node foo with the same major and minor numbers as an attached device.
    Another member named foo follows, its data is written to the device.

7. Huge zero file attacks:

    Bzip2 and lzma make it possible to store huge blobs of repetitive data in tiny archives. When unpacked this data may fill up an entire filesystem.

8. Excessive memory usage:

    tarfile saves one TarInfo object per member it finds in an archive. If the archive contains several millions of members, this may fill up the memory.

9. Saving a huge sparse file:

    tarfile is unable to detect holes in sparse files and thus cannot store them efficiently. Archiving a huge sparse file can take very long and may lead to a very big archive that fills up the filesystem.
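
Some of the conditions in this list (items 5 and 6, and coarse limits for items 7 and 8) can be spotted from the member headers alone. The following is a minimal sketch under stated assumptions, not a complete defence: audit_members is a hypothetical helper, the two limits are arbitrary illustrative defaults, and the size check relies on the sizes declared in the headers.

```python
import io
import stat
import tarfile


def audit_members(tar, max_members=10_000, max_total_size=100 * 1024 * 1024):
    """Yield (member_name, problem) pairs without extracting anything.

    The limits are arbitrary illustrative defaults, not recommendations.
    """
    count = 0
    total = 0
    for member in tar:  # reads member headers in archive order
        count += 1
        total += member.size
        if member.mode & (stat.S_ISUID | stat.S_ISGID):
            yield member.name, "setuid/setgid bit set"
        if member.ischr() or member.isblk():
            yield member.name, "device node"
        if count > max_members:
            yield member.name, "too many members"
            return
        if total > max_total_size:
            yield member.name, "unpacked size limit exceeded"
            return


# Demo archive: one setuid file and one character device node.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as out:
    tool = tarfile.TarInfo("tool")
    tool.mode = 0o4755
    out.addfile(tool)
    dev = tarfile.TarInfo("dev")
    dev.type = tarfile.CHRTYPE
    out.addfile(dev)

with tarfile.open(fileobj=io.BytesIO(buf.getvalue()), mode="r") as tar:
    findings = list(audit_members(tar))
```

The sketch reports problems instead of silently skipping members, so the caller decides whether to reject the archive.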


Additionally, there are more issues mentioned in the GNU tar manual:

  https://www.gnu.org/software/tar/manual/html_node/Security.html


In conclusion, I'd like to emphasize that tarfile is a library; it is no replacement for GNU tar. As a library it has a different focus: it is merely a building block for an application, and has to be used with a little bit of responsibility. And even if we start to implement all possible checks, I'm afraid we can never do without a warning in the documentation that reminds everyone to keep an eye on what they're doing.
msg215658 - (view) Author: Larry Hastings (larry) * (Python committer) Date: 2014-04-06 14:51
Thank you Lars for your thorough reply.

While I agree that this isn't a release blocker, as it was clearly designed to behave this way... it seems to me that it wouldn't take much to make the tarfile module a lot safer.  Specifically:

  * Don't allow creating files whose absolute path is not under the
    destination.
  * Don't allow creating links (hard or soft) which link to a path
    outside of the destination.
  * Don't create device nodes.

This would fix your listed attacks 1-6.  The remaining attacks you cite are denial-of-service attacks; while they're undesirable, they shouldn't compromise the security of the machine.  (I suppose we could even address those, adding "reasonable" quotas for disk space and number of files.)
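
The three rules above can be sketched as a per-member check. This is an illustration under assumptions, not a proposed patch: violates_rules is a hypothetical helper, it assumes symlink targets are interpreted relative to the directory holding the link and hard link targets relative to the archive root, and it would have to be called immediately before each member is extracted so that os.path.realpath sees symlinks created by earlier members.

```python
import os
import tarfile


def violates_rules(member, dest):
    """Return a reason string if the member breaks one of the three rules, else None."""
    root = os.path.realpath(dest)

    def inside(path):
        # realpath also resolves symlinks that already exist on disk.
        resolved = os.path.realpath(path)
        return os.path.commonpath([root, resolved]) == root

    target = os.path.join(root, member.name)
    if not inside(target):
        return "path escapes destination"
    if member.issym():
        # Assumed: a symlink target is relative to the directory holding the link.
        link = os.path.join(os.path.dirname(target), member.linkname)
        if not inside(link):
            return "symlink points outside destination"
    elif member.islnk():
        # Assumed: a hard link target is relative to the archive root.
        if not inside(os.path.join(root, member.linkname)):
            return "hard link points outside destination"
    if member.ischr() or member.isblk():
        return "device node"
    return None
```

A caller could skip or reject any member for which this returns a reason. The denial-of-service cases are out of scope for this check.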

I doubt that would make tarfile secure.  But maybe "practicality beats purity"?
msg216675 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2014-04-17 08:41
Seems like shutil._unpack_tarfile() is affected. I guess it could at least do with one of those warnings in the documentation for make_archive().

The patch for this bug looks a bit overenthusiastic; for example, skip_prefixes("blaua../stuff") would incorrectly strip the first bit and just return "stuff".
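
To illustrate the difference between substring matching and component matching: naive_strip below is a hypothetical stand-in that merely reproduces the reported symptom; it is not the code of prevent-tar-traversal-attack.diff. has_parent_component is likewise an illustrative name. It splits the name on "/" first, so a directory that merely ends in ".." is left alone.

```python
def naive_strip(name):
    """Hypothetical stand-in: drop everything up to the last '../' substring."""
    idx = name.rfind("../")
    return name[idx + 3:] if idx != -1 else name


def has_parent_component(name):
    """True only if some path component is exactly '..'."""
    return ".." in name.split("/")


# 'blaua..' is an ordinary directory name, yet the substring match eats it.
stripped = naive_strip("blaua../stuff")
flagged = has_parent_component("blaua../stuff")
```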

It seems there might already be plenty of existing code to check for bad paths. Examples that come to mind:

* http.server.SimpleHTTPRequestHandler.translate_path()
* zipfile.ZipFile._extract_member()
* shutil._unpack_zipfile()

This code either ignores the bad path elements, or ignores the whole path. Perhaps some of it could be recycled into a common function somewhere, rather than implementing it all over again for tar files.

I have written my own function joinpath() to do this sort of checking, which you are welcome to use:

https://bitbucket.org/vadmium/pyrescene/src/34264f6/rescene/utility.py#cl-217

You would call it with something like joinpath(tarpath.split("/"), osdir).
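
A function with that call shape might look like the sketch below. It is a hypothetical re-creation written for illustration, not the implementation at the link above. It takes the "ignore the bad path elements" approach, which is an assumption about the desired behaviour.

```python
import os


def joinpath(components, base):
    """Join archive path components under base, skipping unsafe ones.

    Hypothetical sketch of the call shape joinpath(tarpath.split("/"), osdir).
    """
    safe = []
    for part in components:
        if part in ("", os.curdir, os.pardir):
            continue  # drops the leading "/" as well as "." and ".." elements
        if os.sep in part or (os.altsep and os.altsep in part):
            # Can only trigger where the OS separator differs from "/".
            raise ValueError("separator inside component: %r" % part)
        safe.append(part)
    return os.path.join(base, *safe)


result = joinpath("../../etc/passwd".split("/"), "out")
```

Because every ".." element is dropped rather than resolved, the result never climbs above base. Symlinks already present under base are not considered here.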
msg217188 - (view) Author: Eduardo Robles Elvira (edulix) * Date: 2014-04-26 11:34
Do we have any final decision on what's the best approach to solve this? I see some possibilities:

a) Leave the issue to the library user. I think that's not a good solution security-wise, as many will be unaware of the problem, and it promotes code duplication for the fix. On the other hand, this does not change default behavior.

b) Fix the problem as proposed in the patch sent by Daniel. This makes tarfile secure against this kind of attack. It does change the behavior and doesn't allow extracting to arbitrary paths, though.

c) Fix the problem so that extracting to arbitrary paths is not allowed by default, but can be enabled optionally. This way we change the default behavior but provide an easy fix for those who depend on that functionality.

d) Do not change the default, but provide a well-documented and easy way to activate the safety checks that fix this kind of attack. The advantage is that it doesn't change the default behavior; the disadvantage is that many people will have to modify their code to be secure, and that the default is not very secure.

For what it's worth, I believe either b or c should be chosen to fix this issue.
msg217189 - (view) Author: Eduardo Robles Elvira (edulix) * Date: 2014-04-26 11:51
Also, I guess this patch solves and is closely related to #1044 which was, at the time (2007), considered "not a bug".
msg217690 - (view) Author: Lars Gustäbel (lars.gustaebel) * (Python committer) Date: 2014-05-01 12:12
Let me present for discussion a proposal (and a patch with documentation) with an approach that is a little different, but in my opinion the most effective. I hope that it will appeal to all involved.

My proposal consists of a new class SafeTarFile, that is a subclass and drop-in replacement for the TarFile class and can be employed whenever the user feels the necessity.  It can be used the same way as TarFile, with the difference that SafeTarFile is equipped with a wide range of tests and as soon as it detects anything bad it interrupts the current operation with a SecurityError exception. This way damage is effectively averted, and it is up to the developer to decide whether he rejects the archive altogether (which is the obvious and recommended measure) or he wants to continue to process it in a subsequent step (on his own responsibility).

To simplify a few common operations, SafeTarFile has three more methods: analyze(), filter() and is_safe(). These methods will allow access to the archive without SecurityError exceptions being raised. The analyze() method is a kind of low-level iterator that produces each TarInfo object together with a list of warnings (if the member is bad) as a tuple. This gives a developer access to all the information he needs to implement his own more differentiated way of handling bad archives. The filter() method is a convenience method that provides an iterator over all the "good" members of an archive leaving out all the "bad" ones. It can be used as an argument to SafeTarFile.extractall() for example. is_safe() is a high-level shortcut method that reduces the result of the analysis to a simple True or False.
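
The shape of these three methods can be sketched as follows. This is a rough illustration only: SketchSafeTarFile and its two placeholder checks are assumptions made for this example and do not reproduce the class or the tests in safetarfile-1.diff. The SecurityError behaviour is left out.

```python
import io
import tarfile


class SketchSafeTarFile(tarfile.TarFile):
    """Illustrative subclass; the checks are placeholders, not the patch's tests."""

    def _warnings_for(self, member):
        warnings = []
        if member.name.startswith("/") or ".." in member.name.split("/"):
            warnings.append("bad pathname")
        if member.ischr() or member.isblk():
            warnings.append("device node")
        return warnings

    def analyze(self):
        """Yield (TarInfo, list_of_warnings) for every member."""
        for member in self.getmembers():
            yield member, self._warnings_for(member)

    def filter(self):
        """Yield only the members that produced no warnings."""
        for member, warnings in self.analyze():
            if not warnings:
                yield member

    def is_safe(self):
        """True when no member produced a warning."""
        return all(not warnings for _, warnings in self.analyze())


# Demo archive with one harmless and one suspicious member name.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as out:
    for name in ("good.txt", "../bad.txt"):
        out.addfile(tarfile.TarInfo(name))

demo = SketchSafeTarFile.open(fileobj=io.BytesIO(buf.getvalue()), mode="r")
report = [(m.name, w) for m, w in demo.analyze()]
good = [m.name for m in demo.filter()]
safe = demo.is_safe()
```

With this sketch, the result of filter() could be passed as the members argument of extractall(), in line with the usage described above.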

SafeTarFile has a variety of checks that test e.g. for bad pathnames, bad permissions and duplicate files. Also, to prevent denial-of-service scenarios, it enforces user-defined limits upon the archive, such as a maximum number of files or a maximum size of unpacked data.

The main advantage of this approach is the higher degree of security. The practice of rewriting paths (e.g. like in Daniel.Garcia's patch) is error-prone, has side-effects and is hard to maintain because of its tendency towards regression. It just adds another layer of complexity to an already complex and delicate problem.

SafeTarFile (or whatever it will be called) is backward compatible and easy to maintain, because it is an isolated addition to the tarfile module. It is easily subclassable to add more tests. It can be used as a standalone tool to check for bad archives and possible denial-of-service scenarios. Its analyze() method tells the user exactly what's wrong with the archive instead of keeping it away from him. Instead of silently extracting files to locations they weren't expected to be stored (i.e. after "fixing" their pathnames), SafeTarFile simply refuses to extract them at all. This way it is far more transparent and understandable to the user what happens.

Feedback is welcome.
msg277339 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2016-09-24 21:57
Issue 17102 is open about the specific problem of escaping the destination directory. Maybe it is a duplicate, but this bug also discusses other problems.
msg289438 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2017-03-11 05:29
Issue 29788 proposes an option to disable the vulnerability in the CLI.
History
Date User Action Args
2017-03-11 05:29:40 martin.panter set dependencies: + tarfile: Add absolute_path option to tarfile, disabled by default
    messages: + msg289438
2016-09-24 23:04:58 larry set nosy: - larry
2016-09-24 21:57:22 martin.panter set dependencies: + tarfile extract can write files outside the destination path
    messages: + msg277339
2016-09-24 18:59:29 christian.heimes set priority: normal -> high
    versions: + Python 3.6, Python 3.7, - Python 3.5
2014-06-12 00:55:09 jcea set nosy: + jcea
2014-05-01 12:12:17 lars.gustaebel set files: + safetarfile-1.diff
    priority: release blocker -> normal
    versions: - Python 3.1, Python 2.7, Python 3.2, Python 3.3, Python 3.4
    messages: + msg217690
    assignee: lars.gustaebel
2014-04-26 11:51:36 edulix set messages: + msg217189
2014-04-26 11:34:06 edulix set nosy: + edulix
    messages: + msg217188
2014-04-23 12:01:22 jwilk set nosy: + jwilk
2014-04-17 22:11:41 Arfrever set nosy: + Arfrever
2014-04-17 08:41:36 martin.panter set nosy: + martin.panter
    messages: + msg216675
2014-04-06 14:51:56 larry set messages: + msg215658
2014-04-06 13:24:25 lars.gustaebel set messages: + msg215656
2014-03-31 14:42:34 christian.heimes set messages: + msg215242
2014-03-31 13:59:08 r.david.murray set nosy: + r.david.murray
    messages: + msg215239
2014-03-31 13:26:45 serhiy.storchaka set nosy: + serhiy.storchaka
    messages: + msg215237
2014-03-31 09:03:37 haypo set nosy: + haypo
    messages: + msg215226
2014-03-31 08:31:30 christian.heimes set nosy: + christian.heimes
    messages: + msg215225
2014-03-31 08:25:43 ned.deily set priority: normal -> release blocker
    versions: + Python 3.1, Python 2.7, Python 3.2, Python 3.3, Python 3.4
    keywords: + security_issue
    nosy: + larry, lars.gustaebel, benjamin.peterson, georg.brandl, ned.deily
    messages: + msg215224
    stage: test needed
2014-03-31 08:23:03 Daniel.Garcia set messages: + msg215223
2014-03-31 08:14:19 Daniel.Garcia create