Message 215656 - Python tracker


Author	lars.gustaebel
Recipients	Daniel.Garcia, benjamin.peterson, christian.heimes, georg.brandl, larry, lars.gustaebel, ned.deily, r.david.murray, serhiy.storchaka, vstinner
Date	2014-04-06.13:24:22
Message-id	<1396790665.52.0.303901486831.issue21109@psf.upfronthosting.co.za>
In-reply-to

Content

In the past, our answer to these kinds of bug reports has always been that you must not extract an archive from an untrusted source without making sure that it has no malicious contents, and that tarfile conforms to the POSIX specifications with respect to extraction of files and pathname resolution. That's why we put this prominent warning in the documentation, and I think its advice still holds.

I don't think that this issue should be marked as a release blocker, because the way tarfile currently works was a conscious decision, not an accident. tarfile does what it is designed to do: it processes a sequence of instructions to store a number of files in the filesystem. So the attack that is described by Daniel Garcia exploits neither a bug in tarfile nor a loophole in the tar archive format. A necessary condition for this attack to work is that the attacker has to trick the user into extracting the malicious archive first. After that, tarfile interprets the contained instructions word-for-word but still only within the boundaries defined by the user's privileges.
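To illustrate the point that an archive is a sequence of instructions, the members can be listed without extracting anything. The following is only a minimal sketch: describe_members and build_demo_archive are hypothetical helper names, and the demo archive is built in memory purely for illustration.

```python
import io
import tarfile


def describe_members(tar):
    """Return one (kind, name, linkname) tuple per member, without extracting."""
    described = []
    for member in tar:
        if member.issym():
            kind = "symlink"
        elif member.islnk():
            kind = "hardlink"
        elif member.isdir():
            kind = "directory"
        elif member.isreg():
            kind = "file"
        elif member.isdev():
            kind = "device"
        else:
            kind = "other"
        described.append((kind, member.name, member.linkname))
    return described


def build_demo_archive():
    """Build a small in-memory archive (hypothetical demo data)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        # A symlink member: the "instruction" is to create foo -> /etc/passwd.
        link = tarfile.TarInfo("foo")
        link.type = tarfile.SYMTYPE
        link.linkname = "/etc/passwd"
        tar.addfile(link)
        # A regular file member with a few bytes of data.
        data = b"hello\n"
        regular = tarfile.TarInfo("bar.txt")
        regular.size = len(data)
        tar.addfile(regular, io.BytesIO(data))
    buf.seek(0)
    return buf


with tarfile.open(fileobj=build_demo_archive(), mode="r") as tar:
    for kind, name, linkname in describe_members(tar):
        print(kind, name, linkname)
```

Nothing is written to the filesystem here; the loop only reads the member headers that extraction would otherwise act upon.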

I think it is obvious that it is potentially dangerous to extract tar archives we didn't create ourselves, because we actually give another person direct access to our filesystem. tarfile could mitigate some of the adverse effects, but this will not change the fact that it remains unsafe to use tarfile to a certain degree unless you use it with your own data or take reasonable precautions.

Anyway, if we come to the conclusion that we want to eliminate this kind of attack, we must be aware that there is a lot more to do than that. tarfile as it is today is vulnerable to all known attacks against tar programs, and maybe even a few more that rely on its specific implementation.

1. Path traversal:

The archive contains file names such as /etc/passwd or ../etc/passwd.

2. Symlink file attack:

foo links to /etc/passwd.
Another member named foo follows, and its data overwrites the target file's data.

3. Symlink directory attack:

foo links to /etc.
The following member foo/passwd overwrites /etc/passwd.

4. Hardlink attack:

Hardlink member foo links to /etc/passwd.
tarfile creates the hardlink to /etc/passwd because it cannot find it inside the archive and falls back to the one in the filesystem.
Another file named foo follows, and its data overwrites /etc/passwd's data.

5. Permission manipulation:

The archive contains an executable that is placed somewhere in PATH with its setuid flag set, so that an unprivileged user is able to gain root privileges.

6. Device file attacks:

The archive contains a device node foo with the same major and minor numbers as an attached device.
Another member named foo follows, and its data is written to the device.

7. Huge zero file attacks:

Bzip2 and lzma make it possible to store huge blobs of repetitive data in tiny archives. When unpacked, this data may fill up an entire filesystem.

8. Excessive memory usage:

tarfile saves one TarInfo object per member it finds in an archive. If the archive contains several million members, this may fill up the memory.

9. Saving a huge sparse file:

tarfile is unable to detect holes in sparse files and thus cannot store them efficiently. Archiving a huge sparse file can take a very long time and may lead to a very big archive that fills up the filesystem.
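As an illustration of the kind of precaution an application could take against attacks 1 to 8, the sketch below audits the member headers before anything is extracted. It is not part of tarfile: audit_archive, build_hostile_archive, MAX_MEMBERS and MAX_TOTAL_SIZE are hypothetical names and values, the path check assumes a POSIX filesystem, and rejecting every link member is a deliberately conservative choice that will also flag harmless archives.

```python
import io
import os
import stat
import tarfile

MAX_MEMBERS = 10_000           # hypothetical cap on the number of members
MAX_TOTAL_SIZE = 100 * 2**20   # hypothetical cap on declared data (100 MiB)


def audit_archive(tar, dest):
    """Return (member name, reason) pairs for members that look unsafe."""
    dest = os.path.realpath(dest)
    problems = []
    total_size = 0
    for count, member in enumerate(tar, start=1):
        # Attack 8: stop reading headers once there are too many members.
        if count > MAX_MEMBERS:
            problems.append((member.name, "too many members"))
            break
        # Attack 1: absolute names, or names that climb out of dest.
        target = os.path.realpath(os.path.join(dest, member.name))
        if os.path.isabs(member.name) or os.path.commonpath([dest, target]) != dest:
            problems.append((member.name, "path escapes destination"))
        # Attacks 2-4: symlinks and hardlinks are rejected outright.
        if member.issym() or member.islnk():
            problems.append((member.name, "link member"))
        # Attack 6: character devices, block devices and FIFOs.
        if member.isdev():
            problems.append((member.name, "device node"))
        # Attack 5: setuid or setgid bits in the stored mode.
        if member.mode & (stat.S_ISUID | stat.S_ISGID):
            problems.append((member.name, "setuid or setgid bit set"))
        # Attack 7: add up the sizes declared in the headers.
        total_size += member.size
        if total_size > MAX_TOTAL_SIZE:
            problems.append((member.name, "archive exceeds size budget"))
            break
    return problems


def build_hostile_archive():
    """Build an in-memory archive with suspicious members (demo data only)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        tar.addfile(tarfile.TarInfo("../evil.txt"))
        link = tarfile.TarInfo("foo")
        link.type = tarfile.SYMTYPE
        link.linkname = "/etc/passwd"
        tar.addfile(link)
        tool = tarfile.TarInfo("tool")
        tool.mode = 0o4755  # setuid bit set
        tar.addfile(tool)
        tar.addfile(tarfile.TarInfo("ok.txt"))
    buf.seek(0)
    return buf


with tarfile.open(fileobj=build_hostile_archive(), mode="r") as tar:
    for name, reason in audit_archive(tar, "unpack"):
        print(name, reason)
```

An application would extract only if the returned list is empty, and would have to audit and extract the very same file. For a compressed archive, reading the headers still requires decompressing the stream, so the size budget protects the disk rather than CPU time. Attack 9 concerns creating archives and is not covered.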

Additionally, there are more issues mentioned in the GNU tar manual:

https://www.gnu.org/software/tar/manual/html_node/Security.html

In conclusion, I would like to emphasize that tarfile is a library, not a replacement for GNU tar. As a library it has a different focus: it is merely a building block for an application, and it has to be used with a little bit of responsibility. And even if we start to implement all possible checks, I'm afraid we can never do without a warning in the documentation that reminds everyone to keep an eye on what they're doing.

History
Date	User	Action	Args
2014-04-06 13:24:26	lars.gustaebel	set	recipients: + lars.gustaebel, georg.brandl, vstinner, larry, christian.heimes, benjamin.peterson, ned.deily, r.david.murray, serhiy.storchaka, Daniel.Garcia
2014-04-06 13:24:25	lars.gustaebel	set	messageid: <1396790665.52.0.303901486831.issue21109@psf.upfronthosting.co.za>
2014-04-06 13:24:25	lars.gustaebel	link	issue21109 messages
2014-04-06 13:24:22	lars.gustaebel	create