classification
Title: Please support "numeric_owner" in tarfile
Type: enhancement Stage: resolved
Components: Library (Lib) Versions: Python 3.5
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: eric.smith Nosy List: berker.peksag, eric.smith, lars.gustaebel, mvo, python-dev, r.david.murray, serhiy.storchaka
Priority: normal Keywords: patch

Created on 2015-01-08 14:42 by mvo, last changed 2015-05-13 05:01 by python-dev. This issue is now closed.

Files
File name Uploaded Description Edit
tarfile-numeric-owner.diff mvo, 2015-01-08 14:42 Patch (without test!) that adds a new "numeric_owner" kwarg to extract() review
tarfile-numeric-owner-with-tests.diff mvo, 2015-01-21 08:43 review
tarfile-numeric-owner-with-tests-1.diff eric.smith, 2015-04-13 14:13 review
tarfile-numeric-owner-with-tests-2.diff eric.smith, 2015-04-13 20:19
tarfile-numeric-owner-with-tests-3.diff eric.smith, 2015-04-14 03:22
tarfile-numeric-owner-with-tests-4.diff eric.smith, 2015-04-14 17:35 review
Messages (19)
msg233663 - (view) Author: Michael Vogt (mvo) * Date: 2015-01-08 14:42
Please consider adding a option to extract a tarfile with the uid/gid instead of the lookup for uname/gname in the tarheader (i.e. what tar --numeric-owner provides).

One use-case is if you unpack a chroot tarfile that contains a /etc/{passwd,group} file with different uid/gid for user/groups like zope that may be present in both host and chroot but have different numbers. With the current approach files owned by this user will get the host uid/gid instead of the uid/gid of the chroot.

Attached is a patch to outline what I have in mind - if there is a chance that this patch goes in I'm happy to write the required test (mocking os.chown()) for this to go in.

Thanks for your consideration,
 Michael
msg233672 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2015-01-08 16:56
"(...) if there is a chance that this patch goes in I'm happy to write the required test (mocking os.chown()) for this to go in."

We don't accept changes without test. So you must write a test.

Implementing the feature in Python makes sense.
msg233755 - (view) Author: Berker Peksag (berker.peksag) * (Python committer) Date: 2015-01-09 14:20
The patch also needs documentation update for TarFile.extract():

    https://docs.python.org/3/library/tarfile.html#tarfile.TarFile.extract

The tarfile documentation is located at Doc/library/tarfile.rst.
msg233883 - (view) Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2015-01-12 13:24
I think Michael is asking if the proposed change would ever be accepted. If the answer is "no, not even if you write the tests and update the documentation", then there's no sense putting the work into this. That seems like a reasonable question to me.

I think the proposed change is reasonable, but I'm no tarfile expert.

But since this functionality is available in the tar command as --numeric-owner, I think the feature request itself is a good idea.
msg233884 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2015-01-12 14:30
I concur that this is a reasonable feature request, and it is not one that can be satisfied without modifying the tarfile module (that is, you can't write a simple wrapper to tarfile to get the functionality desired without cutting and pasting the entire chown method).
msg233915 - (view) Author: Lars Gustäbel (lars.gustaebel) * (Python committer) Date: 2015-01-13 09:11
I would argue that a serious alternative to this patch is to simply override the TarFile.chown() method in a subclass. However, I'm not sure if this expects too much of the user.
msg233935 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2015-01-13 13:10
If it weren't for the fact that this feature is something that the tar command provides, I'd agree (the chown method is relatively short).  However, since tar *does* provide this feature, it seems reasonable for us to support it as well.  Call me +0.5 :)
msg233936 - (view) Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2015-01-13 13:36
I don't think we want to encourage the type of coupling that arises from subclassing, especially when when overriding an undocumented method. I'm +1 on the change. I'll review the patch. Michael: can you write the tests, and hopefully docs?
msg233939 - (view) Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2015-01-13 14:52
Ignore my review comment on pwd and grp being None. I see that there is a test for it (at least grp), and they're not available on Windows.
msg234427 - (view) Author: Michael Vogt (mvo) * Date: 2015-01-21 08:43
Thanks everyone for the comments and feedback!

Attached is a updated patch with tests and a documentation update. 

Feedback is very welcome. I decided to skip the test on systems where root is not uid,gid=0. I could also mock that of course if you prefer it this way.

Thanks,
 Michael
msg240608 - (view) Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2015-04-13 14:13
Updated patch with a few minor doc tweaks.

The one substantive change I did make was to add numeric_owner to the call to self.chown() when setting directory owners. I believe this is correct, but I need to convince myself and to write a test.
msg240642 - (view) Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2015-04-13 16:22
Note that this change will break code that subclasses TarFile and overrides chown(), as suggested in msg233915. I'm not too concerned about that, since chown() is not documented. Ideally it would be renamed to _chown(), but that's probably a separate issue.
msg240729 - (view) Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2015-04-13 20:19
I added numeric_owner to the self.chown() call when adding directories. I'm reasonably sure this is correct.

I added tests for dirs, although they need some cleaning up to be simpler and cleaner. I'll do that cleanup shortly, but I want to check this in before I get on a plane.

I also changed the tests to use different numbers for .gid and .uid, in order to pick up if there's a transposition error somewhere.

If anyone can test this on Windows, that would be helpful.
msg240837 - (view) Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2015-04-14 03:22
Other than Misc/NEWS, I think this is the final version of this patch.
msg240838 - (view) Author: Berker Peksag (berker.peksag) * (Python committer) Date: 2015-04-14 04:33
* +.. method:: TarFile.extractall(path=".", members=None, numeric_owner=False)

  numeric_owner can be a keyword-only argument.

* TarFile.extract and TarFile.extractall docs need a versionchanged directive.

* It would be nice to add an entry to Doc/whatsnew/3.5.rst

* +        filename_1 = fname
  +        dirname_1 = dirname
  +        filename_2 = os.path.join(dirname, fname)

  I'd just yield fname, dirname, os.path.join(dirname, fname) here.

* +            for name, uid, gid, typ, contents in [(fname, 99, 98, tarfile.REGTYPE, fobj),
  +                                                  (dirname, 77, 76, tarfile.DIRTYPE, None),
  +                                                  (os.path.join(dirname, fname), 88, 87, tarfile.REGTYPE, fobj),
  +                                                  ]:

  Moving the list to a new variable would be more readable.

* Typo: # ceate -> # create

* +def root_is_uid_gid_0():

  Question: Can't we use something like root_in_posix in test_os here?

* +        with tarfile.open(tar_filename) as r:

  Nitpick: What does "r" mean here? "tar" or "tarobj" looks more readable to me.

* Nitpick: I'd prefer ``None`` over :const:`True`. However, the current style is just "true" in the tarfile documentation.
msg240963 - (view) Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2015-04-14 17:35
Thanks for your review, Berker. I've updated the code with most of your suggestions, although some of them were mooted by some restructuring I did.

A couple of questions/issues:
- I'm not sure where we stand on keyword-only arguments. I certainly agree that I'd never specify numeric_only unless I named it. However, I don't see a lot of that style in new or modified APIs. I'll ask over on python-dev and see what they say.

- test_extract_without_numeric_owner only works if the user named 'root' has uid = 0 (same for gid). This is a different test than what test_os.root_in_posix is doing. I think it's best to leave it as-is, although I've added a comment and reduced the scope of the skip to just this one test.

- I reformatted the tests to stay within 80 characters, and I think it also made them more legible.

- I'm not sure what you mean by your last point. I believe I use True as it is elsewhere in that module and its documentation. And None doesn't make any sense to me as a parameter value for this.
msg241102 - (view) Author: Roundup Robot (python-dev) Date: 2015-04-15 14:27
New changeset 6b70f16d585a by Eric V. Smith in branch 'default':
Issue 23193: Add numeric_owner to tarfile.TarFile.extract() and tarfile.TarFile.extractall().
https://hg.python.org/cpython/rev/6b70f16d585a
msg241104 - (view) Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2015-04-15 14:29
Thanks everyone for their help, especially Michael for the original patch.
msg243034 - (view) Author: Roundup Robot (python-dev) Date: 2015-05-13 05:01
New changeset e5a53d75dc19 by Zachary Ware in branch 'default':
Issue #23193: Skip numeric_owner tests on platforms where they don't make sense
https://hg.python.org/cpython/rev/e5a53d75dc19
History
Date User Action Args
2015-05-13 05:01:59python-devsetmessages: + msg243034
2015-04-15 14:29:57eric.smithsetstatus: open -> closed
resolution: fixed
messages: + msg241104

stage: patch review -> resolved
2015-04-15 14:27:53python-devsetnosy: + python-dev
messages: + msg241102
2015-04-14 17:35:29eric.smithsetfiles: + tarfile-numeric-owner-with-tests-4.diff

messages: + msg240963
2015-04-14 08:14:31hayposetnosy: - haypo
2015-04-14 04:33:22berker.peksagsetmessages: + msg240838
2015-04-14 03:22:49eric.smithsetfiles: + tarfile-numeric-owner-with-tests-3.diff

messages: + msg240837
2015-04-13 20:19:34eric.smithsetfiles: + tarfile-numeric-owner-with-tests-2.diff

messages: + msg240729
2015-04-13 16:22:04eric.smithsetmessages: + msg240642
2015-04-13 14:13:38eric.smithsetfiles: + tarfile-numeric-owner-with-tests-1.diff

messages: + msg240608
2015-01-28 13:27:24eric.smithsetstage: test needed -> patch review
2015-01-21 08:43:15mvosetfiles: + tarfile-numeric-owner-with-tests.diff

messages: + msg234427
2015-01-13 14:52:15eric.smithsetmessages: + msg233939
2015-01-13 14:26:06eric.smithsetassignee: eric.smith
2015-01-13 13:36:00eric.smithsetmessages: + msg233936
stage: patch review -> test needed
2015-01-13 13:10:38r.david.murraysetmessages: + msg233935
2015-01-13 09:11:55lars.gustaebelsetmessages: + msg233915
2015-01-12 14:30:12r.david.murraysetnosy: + r.david.murray
messages: + msg233884
2015-01-12 13:24:42eric.smithsetnosy: + eric.smith
messages: + msg233883
2015-01-09 14:20:22berker.peksagsetnosy: + berker.peksag
messages: + msg233755
2015-01-08 16:56:48hayposetnosy: + haypo
messages: + msg233672
2015-01-08 14:49:23serhiy.storchakasetnosy: + lars.gustaebel, serhiy.storchaka
stage: patch review

versions: + Python 3.5
2015-01-08 14:42:24mvocreate