This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author lars.gustaebel
Recipients Arfrever, Daniel.Garcia, benjamin.peterson, christian.heimes, edulix, georg.brandl, jwilk, larry, lars.gustaebel, martin.panter, ned.deily, r.david.murray, serhiy.storchaka, vstinner
Date 2014-05-01.12:12:08
SpamBayes Score -1.0
Marked as misclassified Yes
Message-id <1398946337.15.0.22919198614.issue21109@psf.upfronthosting.co.za>
In-reply-to
Content
Let me present for discussion a proposal (and a patch with documentation) with an approach that is a little different, but in my opinion the most effective. I hope that it will appeal to all involved.

My proposal consists of a new class SafeTarFile, that is a subclass and drop-in replacement for the TarFile class and can be employed whenever the user feels the necessity.  It can be used the same way as TarFile, with the difference that SafeTarFile is equipped with a wide range of tests and as soon as it detects anything bad it interrupts the current operation with a SecurityError exception. This way damage is effectively averted, and it is up to the developer to decide whether he rejects the archive altogether (which is the obvious and recommended measure) or he wants to continue to process it in a subsequent step (on his own responsibility).

To simplify a few common operations, SafeTarFile has three more methods: analyze(), filter() and is_safe(). These methods will allow access to the archive without SecurityError exceptions being raised. The analyze() method is a kind of low-level iterator that produces each TarInfo object together with a list of warnings (if the member is bad) as a tuple. This gives a developer access to all the information he needs to implement his own more differentiated way of handling bad archives. The filter() method is a convenience method that provides an iterator over all the "good" members of an archive leaving out all the "bad" ones. It can be used as an argument to SafeTarFile.extractall() for example. is_safe() is a high-level shortcut method that reduces the result of the analysis to a simple True or False.

SafeTarFile has a variety of checks that test e.g. for bad pathnames, bad permissions and duplicate files. Also, to prevent denial-of-service scenarios, it enforces user-defined limits upon the archive, such as a maximum number of files or a maxmimum size of unpacked data.

The main advantage of this approach is the higher degree of security. The practice of rewriting paths (e.g. like in Daniel.Garcia's patch) is error-prone, has side-effects and is hard to maintain because of its tendency towards regression. It just adds another layer of complexity to an already complex and delicate problem.

SafeTarFile (or whatever it will be called) is backward compatible and easy to maintain, because it is an isolated addition to the tarfile module. It is easily subclassable to add more tests. It can be used as a standalone tool to check for bad archives and possible denial-of-service scenarios. Its analyze() method tells the user exactly what's wrong with the archive instead of keeping it away from him. Instead of silently extracting files to locations they weren't expected to be stored (i.e. after "fixing" their pathnames), SafeTarFile simply refuses to extract them at all. This way it is far more transparent and understandable to the user what happens.

Feedback is welcome.
History
Date User Action Args
2014-05-01 12:12:17lars.gustaebelsetrecipients: + lars.gustaebel, georg.brandl, vstinner, larry, christian.heimes, benjamin.peterson, jwilk, ned.deily, Arfrever, r.david.murray, martin.panter, serhiy.storchaka, edulix, Daniel.Garcia
2014-05-01 12:12:17lars.gustaebelsetmessageid: <1398946337.15.0.22919198614.issue21109@psf.upfronthosting.co.za>
2014-05-01 12:12:17lars.gustaebellinkissue21109 messages
2014-05-01 12:12:14lars.gustaebelcreate