Issue6196
This issue tracker has been migrated to GitHub,
and is currently read-only.
For more information,
see the GitHub FAQs in the Python's Developer Guide.
Created on 2009-06-04 21:45 by srid, last changed 2022-04-11 14:56 by admin. This issue is now closed.
Messages (10) | |||
---|---|---|---|
msg88908 - (view) | Author: Sridhar Ratnakumar (srid) | Date: 2009-06-04 21:45 | |
If a tarball has a-x perms set on its root directory, one cannot access its contents. $ tar zxf generator_tools-0.3.5.tar.gz. $ ls generator_tools-0.3.5/ ls: cannot access generator_tools-0.3.5/README.txt: Permission denied ... sridharr@double:/tmp/i$ This is fine for GNU tar (the user can always do a chmod +x later). But for the tarfile library, it would be better to have a flag such as readaccess=True that will force ``extractall`` to enforce *minimum* permissions required for the basic read access. This means, tarfile would ignore u-x on directories and u-r on files. The reason I make this feature request (instead of working around the issue myself in a verbose way) is that the very reason to write a program to extract tarball (instead of doing it manually) is to automate it .. which automation is more effective and simple if ``extractall`` had a flag such as readaccess=True. |
|||
msg88909 - (view) | Author: Sridhar Ratnakumar (srid) | Date: 2009-06-04 21:46 | |
Here's a test data from PyPI: http://pypi.python.org/packages/source/g/generator_tools/generator_tools-0.3.5.tar.gz |
|||
msg88911 - (view) | Author: Sridhar Ratnakumar (srid) | Date: 2009-06-04 21:52 | |
Considering this bug where tarfile fails to set g+s, https://bugs.launchpad.net/pyopenssl/+bug/236190 a more general approach could be: tarfile.extractall(safe_perms=True) where if safe_perms is set, tarfile can 1) ignore +/-s on all files, 2) ignore u-x on directories and 3) ignore u-r on files. |
|||
msg88912 - (view) | Author: R. David Murray (r.david.murray) * | Date: 2009-06-04 22:12 | |
I don't see why the tarfile case should be different from the tar case. You can "always chmod it later" in python, too (with os.walk and os.chmod). Perhaps the real need is for a recursive chmod in shutil? |
|||
msg88915 - (view) | Author: Sridhar Ratnakumar (srid) | Date: 2009-06-04 22:28 | |
[David] I don't see why the tarfile case should be different from the tar case. (...) As I explained, Viz: [quote]'(...)the very reason to write a program to extract tarball (instead of doing it manually) is to automate it .. which automation is *more effective and simple* if ``extractall`` had a flag such as readaccess=True'[endquote] (emphasis added) [David] You can "always chmod it later" in python, too (with os.walk and os.chmod). (...) Of course, I can. Or: EXECUTE = 0100 READ = 0400 dir_perm = EXECUTE file_perm = EXECUTE | READ for tarinfo in f.getmembers(): tarinfo.mode |= (dir_perm if tarinfo.isdir() else file_perm) As you can see, for a tarfile with huge list of files.. this can be a performance issue. [David] (...) Perhaps the real need is for a recursive chmod in shutil? The real need is to fix the weird permissions on some tarballs (such as generator_tools-0.3.5.tar.gz in PyPI and the above mentioned pyopenssl tarball). This need usually leads to designing workarounds. I just think it is not simple (as in, keeping the code off from such hacks that are tangential to the problem being solved) and effective (as in, not having to deal with potential unintended side effects like bugs in the post-fix chmoding or in the pre-fix tarinfo mode modifications). Hence the feature request. |
|||
msg88934 - (view) | Author: Lars Gustäbel (lars.gustaebel) * | Date: 2009-06-05 11:00 | |
I am still not convinced why tarfile needs this kind of a work-around built in. We talk about a very small number of cases here and the generator_tools-0.3.5.tar.gz is really broken beyond repair. It is the only thing that should be fixed here IMO ;-) I agree with David here. It is easy to manipulate the tarfile in advance, as you have shown yourself. The performance argument does not convince me either. |
|||
msg88956 - (view) | Author: Sridhar Ratnakumar (srid) | Date: 2009-06-05 17:45 | |
[Lars] (...) We talk about a very small number of cases here and the generator_tools-0.3.5.tar.gz is really broken beyond repair. It is the only thing that should be fixed here IMO ;-) Sure, that is what the pyopenssl folks did - fix their tarball. However, it is reasonable expect certain tarballs to be 'broken beyond repair' when you are running tarfile.extracall over a huge number of tarballs such as the ones in PyPI. Indeed, the tarfile module already has several fixes for such 'broken' cases, Viz: [quote]'If ignore_zeros is False, treat an empty block as the end of the archive. If it is True, skip empty (and invalid) blocks and try to get as many members as possible. This is only useful for reading concatenated or **damaged** archives.'[endquote] [emphasis added] [quote]'(...)Directory information like owner, modification time and permissions are set after all members have been extracted. This is done to work around two problems: A directory’s modification time is reset each time a file is created in it. And, if a directory’s **permissions do not allow writing**, extracting files to it will fail'[endquote] [emphasis added] [Lars] I agree with David here. It is easy to manipulate the tarfile in advance, as you have shown yourself. The performance argument does not convince me either. Ok. Can you comment on this argument? [quote]'(...)the very reason to write a program to extract tarball (instead of doing it manually) is to automate it .. which automation is *more effective and simple* if ``extractall`` had a flag such as readaccess=True'[endquote] (emphasis added) [quote]'I just think it is not simple (as in, keeping the code off from such hacks that are tangential to the problem being solved) and effective (as in, not having to deal with potential unintended side effects like bugs in the post-fix chmoding or in the pre-fix tarinfo mode modifications).'[endquote] |
|||
msg88967 - (view) | Author: Lars Gustäbel (lars.gustaebel) * | Date: 2009-06-05 19:18 | |
Sure, tarfile contains numerous work-arounds for quirky and buggy archives. Otherwise, it would not be usable in real-life. But we should not mix up different issues here. tarfile reads and extracts your generator_tools.tar just fine. Formally, the data is okay. It's the stored information that is useless and that you are not happy with. But as we both agree it is rather simple to fix this information in advance: import tarfile tar = tarfile.open("generator_tools-0.3.5.tar.gz") for t in tar: if t.isdir(): t.mode = 0755 else: t.mode = 0644 tar.extractall() tar.close() Sure, there is some functionality in extractall() that addresses issues with inappropriate permissions, but without this functionality the archive would not even *extract* cleanly. That is very different from your problem. In my opinion, the code above illustrates quite well, that tarfile was designed to be high-level and flexible at the same time. Make use of that. I honestly think that extractall() can do well without a readaccess argument. |
|||
msg88985 - (view) | Author: Sridhar Ratnakumar (srid) | Date: 2009-06-06 00:26 | |
[Lars] Sure, there is some functionality in extractall() that addresses issues with inappropriate permissions, but without this functionality the archive would not even *extract* cleanly. That is very different from your problem. Fair enough. 'tis time to creating a pypi package out of my high-level wrapper: http://gist.github.com/124597 |
|||
msg88990 - (view) | Author: Lars Gustäbel (lars.gustaebel) * | Date: 2009-06-06 07:09 | |
I close this issue then. |
History | |||
---|---|---|---|
Date | User | Action | Args |
2022-04-11 14:56:49 | admin | set | github: 50446 |
2009-06-06 07:09:26 | lars.gustaebel | set | status: open -> closed resolution: rejected messages: + msg88990 |
2009-06-06 00:26:24 | srid | set | messages: + msg88985 |
2009-06-05 19:18:28 | lars.gustaebel | set | messages: + msg88967 |
2009-06-05 17:45:07 | srid | set | messages: + msg88956 |
2009-06-05 11:00:17 | lars.gustaebel | set | assignee: lars.gustaebel messages: + msg88934 nosy: + lars.gustaebel |
2009-06-04 22:28:16 | srid | set | messages: + msg88915 |
2009-06-04 22:12:06 | r.david.murray | set | priority: low nosy: + r.david.murray messages: + msg88912 |
2009-06-04 21:52:02 | srid | set | messages: + msg88911 |
2009-06-04 21:46:30 | srid | set | messages: + msg88909 |
2009-06-04 21:45:51 | srid | create |