classification
Title: tarfile would fail to extract tarballs with files under R/O directories (twice)
Type: Stage:
Components: IO Versions: Python 3.6, Python 3.3, Python 3.4, Python 3.5, Python 2.7
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: Catherine.Devlin, Yaroslav.Halchenko, lars.gustaebel
Priority: normal Keywords: patch

Created on 2017-05-23 03:49 by Yaroslav.Halchenko, last changed 2022-04-11 14:58 by admin.

Files
File name Uploaded Description
tarfilero.py Yaroslav.Halchenko, 2017-05-23 03:49 simple script to demonstrate inability to extract a tarball with a file under RO directory
Pull Requests
URL Status Linked
PR 1768 closed Catherine.Devlin, 2017-05-23 19:22
PR 1808 open Catherine.Devlin, 2017-05-25 02:56
Messages (7)
msg294217 - (view) Author: Yaroslav Halchenko (Yaroslav.Halchenko) Date: 2017-05-23 03:49
If a tar archive contains a file under a directory that has no write permission, extractall() fails, because the directory is chmod'ed as soon as it is "extracted".

Please find attached a quick-and-dirty script that demonstrates the problem in Python code.  The issue is not just of academic interest: git-annex uses read-only permissions to safeguard against manual deletion of content. So a tarball of any git-annex repository carrying content for at least one file would not be extractable with Python's tarfile module (it works fine with plain tar; I verified that extraction still fails with Python v3.6.1-228-g1398b1bc7d from http://github.com/python/cpython).
msg294278 - (view) Author: Catherine Devlin (Catherine.Devlin) * Date: 2017-05-23 19:22
I confirmed the error, and that doing the corresponding tar/untar cycle with the command-line `tar` utility succeeds.

issue_30438_test.patch adds a unittest version of Yaroslav's demo file to test_tarfile.py.  (It's irrelevant if the PR is merged.)

This doesn't actually include a fix.  Issue should remain open.
msg294323 - (view) Author: Catherine Devlin (Catherine.Devlin) * Date: 2017-05-24 05:36
I apologize, I retract my earlier comment - I believe both Yaroslav and I were confused about the nature of the problem.  I think it's not related to permissions at all, but to adding a file to the tarfile twice.

To see this, use `tarfilero.py` as provided by Yaroslav, but comment out the `tar.add('sample/rodir/file')` line (line 16) and run it - everything works normally, and the `rodir/file` is present.

It appears that adding the directory to the tarfile also adds the file within the directory, and adding the file individually creates a second reference to the file.  When expanding, `tarfile` attempts to create the file twice, and the second attempt fails because a file by that name already exists.
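
A minimal sketch of that double addition, assuming a throwaway `sample/rodir/file` layout created in a temporary directory (the helper name is made up for illustration and is not taken from `tarfilero.py`):

```python
import os
import tarfile
import tempfile


def member_names_after_double_add():
    """Add a directory, then one of its files again; return the member names."""
    with tempfile.TemporaryDirectory() as tmp:
        rodir = os.path.join(tmp, "sample", "rodir")
        os.makedirs(rodir)
        with open(os.path.join(rodir, "file"), "w") as f:
            f.write("data")

        archive = os.path.join(tmp, "sample.tar")
        with tarfile.open(archive, "w") as tar:
            # add() is recursive by default, so this already stores the file.
            tar.add(os.path.join(tmp, "sample"), arcname="sample")
            # Adding the file directly stores a second member with the same name.
            tar.add(os.path.join(rodir, "file"), arcname="sample/rodir/file")

        with tarfile.open(archive) as tar:
            return tar.getnames()


names = member_names_after_double_add()
print(names)
print(names.count("sample/rodir/file"))
```

The archive ends up with two members named `sample/rodir/file`, which is why extraction visits that path twice.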

I still think this is a bug. Perhaps re-adding a file already present in a tarfile should raise an error, or silently do nothing without adding a second reference to the file; or, at the very least, expanding a file into a path that is blocked by an existing file should produce a more informative error message.  I will look for existing tickets along those lines.
msg294353 - (view) Author: Yaroslav Halchenko (Yaroslav.Halchenko) Date: 2017-05-24 12:48
Dear Catherine,

Thank you very much for looking into it!! And sorry that I missed the fact that pointing to a directory adds its contents recursively.  Indeed though, tar handles that case a bit more gracefully.

BUT I feel somewhat dumb, since I am afraid that the original issue I observed may simply have happened because I already had that archive extracted and tried to extract it a second time, overwriting existing files.  That leads to the failure I think I was trying to chase down (example with a tiny real sample annex repo):

$> wget -q http://onerussian.com/tmp/sample.tar ; python -c 'import tarfile; tarfile.open("sample.tar").extractall()'                                
$> python -c 'import tarfile; tarfile.open("sample.tar").extractall()'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/lib/python2.7/tarfile.py", line 2081, in extractall
    self.extract(tarinfo, path)
  File "/usr/lib/python2.7/tarfile.py", line 2118, in extract
    self._extract_member(tarinfo, os.path.join(path, tarinfo.name))
  File "/usr/lib/python2.7/tarfile.py", line 2194, in _extract_member
    self.makefile(tarinfo, targetpath)
  File "/usr/lib/python2.7/tarfile.py", line 2234, in makefile
    with bltn_open(targetpath, "wb") as target:
IOError: [Errno 13] Permission denied: './sample/.git/annex/objects/G6/qW/SHA256E-s4--181210f8f9c779c26da1d9b2075bde0127302ee0e3fca38c9a83f5b1dd8e5d3b/SHA256E-s4--181210f8f9c779c26da1d9b2075bde0127302ee0e3fca38c9a83f5b1dd8e5d3b'

$> tar -xf sample.tar && echo "extracted ok"
extracted ok


But I wouldn't even consider it a failure; in my case I would take it as a feature (stuff is read-only for a reason!).

Altogether, I no longer have an earth-shaking problem, so if you feel that the issue needs retitling or closing, feel free to do so.
msg294427 - (view) Author: Catherine Devlin (Catherine.Devlin) * Date: 2017-05-25 02:56
Okay, the problem is a little more specific than my last message suggested, but also a little less specific than the original report.

A "PermissionError: [Errno 13] Permission denied" is thrown when expanding a tarfile to which a file had been added more than once, *if* the file is not writeable.  (tarfile expands the file twice, but the second time finds a non-writeable file in the way.)

This was true for Yaroslav's case because his file was added once as its directory was added, and again when the file was added directly.  The permission of the parent directory does not matter after all.
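
A small sketch for checking that behaviour on a given interpreter. Assumptions: the helper name and the `out/file` member name are made up for illustration, and `filter="fully_trusted"` is passed only where extraction filters exist, so that the stored read-only mode is kept as-is. The outcome may differ by Python version, platform, or when running as root, so the function reports what happened rather than asserting one result:

```python
import os
import stat
import tarfile
import tempfile


def extract_duplicate_readonly_member():
    """Return the name of the exception raised by extractall(), or None."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "file")
        with open(src, "w") as f:
            f.write("data")
        os.chmod(src, stat.S_IRUSR)  # readable, but not writable

        archive = os.path.join(tmp, "dup.tar")
        with tarfile.open(archive, "w") as tar:
            tar.add(src, arcname="out/file")
            tar.add(src, arcname="out/file")  # second member, same name

        # Keep stored permissions untouched where extraction filters exist.
        has_filters = hasattr(tarfile, "fully_trusted_filter")
        kwargs = {"filter": "fully_trusted"} if has_filters else {}
        dest = os.path.join(tmp, "dest")
        try:
            with tarfile.open(archive) as tar:
                tar.extractall(dest, **kwargs)
        except OSError as exc:
            return type(exc).__name__
        return None


print(extract_duplicate_readonly_member())
```

Note that the parent directory `out` is never made read-only here, which matches the observation that the directory permission does not matter.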
msg294440 - (view) Author: Catherine Devlin (Catherine.Devlin) * Date: 2017-05-25 06:54
My last commit to the PR includes a fix: in .extractall(), setting permissions is now delayed for all files, not just for directories.

It might be better to catch the problem during .add instead, preventing tarring multiple copies, but I found subtle difficulties with that approach.  (Do we have a foolproof way to establish that a second addition is a duplicate?  What if the filename and path are the same, but the file has been changed since the last time it was added to the tar?)
 
This solution uses code that's already present.
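
To illustrate the idea only (this is not the code from the PR), here is a sketch that defers permissions using the public tarfile API. The function names are hypothetical, and `filter="fully_trusted"` is an assumption that is appropriate for trusted archives only:

```python
import copy
import os
import stat
import tarfile
import tempfile


def extractall_deferred_modes(archive_path, dest):
    """Extract members with owner-write access, then apply the stored modes."""
    # "fully_trusted" keeps stored modes untouched; use it on trusted archives only.
    has_filters = hasattr(tarfile, "fully_trusted_filter")
    kwargs = {"filter": "fully_trusted"} if has_filters else {}
    deferred = []
    with tarfile.open(archive_path) as tar:
        for member in tar:
            if member.isfile() or member.isdir():
                # Extract a copy that stays writable so a duplicate can overwrite it.
                writable = copy.copy(member)
                writable.mode = member.mode | stat.S_IWUSR
                tar.extract(writable, dest, **kwargs)
                deferred.append((os.path.join(dest, member.name), member.mode))
            else:
                tar.extract(member, dest, **kwargs)
    # Apply the real modes last, in reverse archive order.
    for path, mode in reversed(deferred):
        os.chmod(path, mode)


def demo():
    """Extract an archive that lists a read-only file twice."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "file")
        with open(src, "w") as f:
            f.write("data")
        os.chmod(src, stat.S_IRUSR)
        archive = os.path.join(tmp, "dup.tar")
        with tarfile.open(archive, "w") as tar:
            tar.add(src, arcname="out/file")
            tar.add(src, arcname="out/file")
        dest = os.path.join(tmp, "dest")
        extractall_deferred_modes(archive, dest)
        target = os.path.join(dest, "out", "file")
        with open(target) as f:
            content = f.read()
        owner_writable = bool(os.stat(target).st_mode & stat.S_IWUSR)
        return content, owner_writable


print(demo())
```

The second extraction of `out/file` succeeds because the first copy is still writable at that point; the read-only mode is applied only after every member has been written.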
msg294443 - (view) Author: Lars Gustäbel (lars.gustaebel) * (Python committer) Date: 2017-05-25 07:32
Actually, it is not prohibited to add the same file to the same archive more than once.
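
For context, the tarfile documentation for getmember() says that when a member occurs more than once, its last occurrence is assumed to be the most up-to-date version. A short in-memory sketch (the member name `notes.txt` is made up for illustration):

```python
import io
import tarfile


def last_occurrence_wins():
    """Store the same name twice and read it back by name."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for payload in (b"first", b"second"):
            info = tarfile.TarInfo(name="notes.txt")
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))

    buf.seek(0)
    with tarfile.open(fileobj=buf) as tar:
        names = tar.getnames()
        # Lookup by name resolves to the last member with that name.
        content = tar.extractfile("notes.txt").read()
    return names, content


names, content = last_occurrence_wins()
print(names, content)
```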
History
Date User Action Args
2022-04-11 14:58:46 admin set github: 74623
2017-05-25 07:32:17 lars.gustaebel set nosy: + lars.gustaebel; messages: + msg294443
2017-05-25 06:54:29 Catherine.Devlin set messages: + msg294440
2017-05-25 02:56:43 Catherine.Devlin set messages: + msg294427
2017-05-25 02:56:03 Catherine.Devlin set pull_requests: + pull_request1892
2017-05-24 12:49:25 Yaroslav.Halchenko set title: tarfile would fail to extract tarballs with files under R/O directories -> tarfile would fail to extract tarballs with files under R/O directories (twice)
2017-05-24 12:48:49 Yaroslav.Halchenko set messages: + msg294353
2017-05-24 05:36:29 Catherine.Devlin set messages: + msg294323
2017-05-24 05:30:16 Catherine.Devlin set files: - issue_30438_test.patch
2017-05-23 19:22:20 Catherine.Devlin set files: + issue_30438_test.patch; nosy: + Catherine.Devlin; messages: + msg294278; pull_requests: + pull_request1850; keywords: + patch
2017-05-23 03:49:43 Yaroslav.Halchenko create