Please support "numeric_owner" in tarfile #67382

Closed
mvo mannequin opened this issue Jan 8, 2015 · 19 comments
Assignees
Labels
stdlib Python modules in the Lib dir type-feature A feature request or enhancement

Comments

@mvo
Mannequin

mvo mannequin commented Jan 8, 2015

BPO 23193
Nosy @gustaebel, @ericvsmith, @bitdancer, @berkerpeksag, @serhiy-storchaka
Files
  • tarfile-numeric-owner.diff: Patch (without test!) that adds a new "numeric_owner" kwarg to extract()
  • tarfile-numeric-owner-with-tests.diff
  • tarfile-numeric-owner-with-tests-1.diff
  • tarfile-numeric-owner-with-tests-2.diff
  • tarfile-numeric-owner-with-tests-3.diff
  • tarfile-numeric-owner-with-tests-4.diff
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = 'https://github.com/ericvsmith'
    closed_at = <Date 2015-04-15.14:29:57.889>
    created_at = <Date 2015-01-08.14:42:24.546>
    labels = ['type-feature', 'library']
    title = 'Please support "numeric_owner" in tarfile'
    updated_at = <Date 2015-05-13.05:01:59.994>
    user = 'https://bugs.python.org/mvo'

    bugs.python.org fields:

    activity = <Date 2015-05-13.05:01:59.994>
    actor = 'python-dev'
    assignee = 'eric.smith'
    closed = True
    closed_date = <Date 2015-04-15.14:29:57.889>
    closer = 'eric.smith'
    components = ['Library (Lib)']
    creation = <Date 2015-01-08.14:42:24.546>
    creator = 'mvo'
    dependencies = []
    files = ['37645', '37803', '38914', '38953', '38981', '38998']
    hgrepos = []
    issue_num = 23193
    keywords = ['patch']
    message_count = 19.0
    messages = ['233663', '233672', '233755', '233883', '233884', '233915', '233935', '233936', '233939', '234427', '240608', '240642', '240729', '240837', '240838', '240963', '241102', '241104', '243034']
    nosy_count = 7.0
    nosy_names = ['lars.gustaebel', 'eric.smith', 'mvo', 'r.david.murray', 'python-dev', 'berker.peksag', 'serhiy.storchaka']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue23193'
    versions = ['Python 3.5']

    @mvo
    Mannequin Author

    mvo mannequin commented Jan 8, 2015

    Please consider adding an option to extract a tarfile using the numeric uid/gid instead of looking up the uname/gname from the tar header (i.e. what tar --numeric-owner provides).

    One use-case is if you unpack a chroot tarfile that contains a /etc/{passwd,group} file with different uid/gid for user/groups like zope that may be present in both host and chroot but have different numbers. With the current approach files owned by this user will get the host uid/gid instead of the uid/gid of the chroot.

    Attached is a patch to outline what I have in mind - if there is a chance that this patch goes in I'm happy to write the required test (mocking os.chown()) for this to go in.

    Thanks for your consideration,
    Michael
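
    A minimal sketch of the owner resolution being requested, for illustration only. `resolve_owner`, the lookup tables, and the ids (1001/1002 in the archive, 500/501 on the host) are hypothetical and not part of tarfile; they only mirror the uname/gname lookup versus numeric uid/gid choice described above.

```python
import tarfile


def resolve_owner(tarinfo, numeric_owner, uid_for_name, gid_for_name):
    """Return the (uid, gid) an extractor would hand to chown.

    uid_for_name / gid_for_name are callables that map a name to an id
    and raise KeyError for unknown names (like pwd.getpwnam/grp.getgrnam).
    """
    uid, gid = tarinfo.uid, tarinfo.gid
    if numeric_owner:
        # Trust the numbers stored in the archive, ignore the names.
        return uid, gid
    try:
        uid = uid_for_name(tarinfo.uname)
    except KeyError:
        pass  # unknown user on the host: fall back to the stored uid
    try:
        gid = gid_for_name(tarinfo.gname)
    except KeyError:
        pass  # unknown group on the host: fall back to the stored gid
    return uid, gid


# Hypothetical host databases: "zope" exists, but with different numbers
# than inside the chroot archive.
host_users = {"zope": 500}
host_groups = {"zope": 501}

info = tarfile.TarInfo("var/lib/zope/data")
info.uid, info.gid = 1001, 1002
info.uname = info.gname = "zope"

by_name = resolve_owner(info, False, host_users.__getitem__, host_groups.__getitem__)
by_number = resolve_owner(info, True, host_users.__getitem__, host_groups.__getitem__)
print(by_name)    # (500, 501): host ids win, which is wrong for a chroot
print(by_number)  # (1001, 1002): ids from the archive are preserved
```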

    @mvo mvo mannequin added stdlib Python modules in the Lib dir type-feature A feature request or enhancement labels Jan 8, 2015
    @vstinner
    Member

    vstinner commented Jan 8, 2015

    "(...) if there is a chance that this patch goes in I'm happy to write the required test (mocking os.chown()) for this to go in."

    We don't accept changes without tests, so you must write one.

    Implementing the feature in Python makes sense.

    @berkerpeksag
    Member

    The patch also needs a documentation update for TarFile.extract():

    https://docs.python.org/3/library/tarfile.html#tarfile.TarFile.extract
    

    The tarfile documentation is located at Doc/library/tarfile.rst.

    @ericvsmith
    Member

    I think Michael is asking if the proposed change would ever be accepted. If the answer is "no, not even if you write the tests and update the documentation", then there's no sense putting the work into this. That seems like a reasonable question to me.

    I think the proposed change is reasonable, but I'm no tarfile expert.

    But since this functionality is available in the tar command as --numeric-owner, I think the feature request itself is a good idea.

    @bitdancer
    Member

    I concur that this is a reasonable feature request, and it is not one that can be satisfied without modifying the tarfile module (that is, you can't write a simple wrapper to tarfile to get the functionality desired without cutting and pasting the entire chown method).

    @gustaebel
    Mannequin

    gustaebel mannequin commented Jan 13, 2015

    I would argue that a serious alternative to this patch is to simply override the TarFile.chown() method in a subclass. However, I'm not sure if this expects too much of the user.
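
    For reference, a rough sketch of what such a subclass might look like. The class name `NumericOwnerTarFile` and the helper `numeric_ids` are hypothetical; the body is an approximation of what a chown override has to do, not a copy of the tarfile implementation, and it relies on the undocumented chown() hook.

```python
import os
import tarfile


class NumericOwnerTarFile(tarfile.TarFile):
    """TarFile variant that always chowns to the numeric ids in the header."""

    @staticmethod
    def numeric_ids(tarinfo):
        # -1 tells os.chown() to leave that id unchanged.
        uid = -1 if tarinfo.uid is None else tarinfo.uid
        gid = -1 if tarinfo.gid is None else tarinfo.gid
        return uid, gid

    def chown(self, tarinfo, targetpath, *args, **kwargs):
        # Extra arguments are accepted and ignored, so the override does not
        # depend on how many arguments the caller passes.
        if not (hasattr(os, "geteuid") and os.geteuid() == 0):
            return  # only root can change ownership
        uid, gid = self.numeric_ids(tarinfo)
        try:
            if tarinfo.issym() and hasattr(os, "lchown"):
                os.lchown(targetpath, uid, gid)
            else:
                os.chown(targetpath, uid, gid)
        except OSError as exc:
            raise tarfile.ExtractError("could not change owner") from exc
```

    It would be used like the base class, e.g. `NumericOwnerTarFile.open("chroot.tar").extractall("dest")` (the file name is a placeholder). The amount of copied logic is the main argument against asking users to do this themselves.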

    @bitdancer
    Member

    If it weren't for the fact that this feature is something that the tar command provides, I'd agree (the chown method is relatively short). However, since tar *does* provide this feature, it seems reasonable for us to support it as well. Call me +0.5 :)

    @ericvsmith
    Member

    I don't think we want to encourage the type of coupling that arises from subclassing, especially when overriding an undocumented method. I'm +1 on the change. I'll review the patch. Michael: can you write the tests, and hopefully docs?

    @ericvsmith ericvsmith self-assigned this Jan 13, 2015
    @ericvsmith
    Member

    Ignore my review comment on pwd and grp being None. I see that there is a test for it (at least grp), and they're not available on Windows.

    @mvo
    Mannequin Author

    mvo mannequin commented Jan 21, 2015

    Thanks everyone for the comments and feedback!

    Attached is an updated patch with tests and a documentation update.

    Feedback is very welcome. I decided to skip the test on systems where root is not uid,gid=0. I could also mock that of course if you prefer it this way.

    Thanks,
    Michael

    @ericvsmith
    Member

    Updated patch with a few minor doc tweaks.

    The one substantive change I did make was to add numeric_owner to the call to self.chown() when setting directory owners. I believe this is correct, but I need to convince myself and to write a test.

    @ericvsmith
    Member

    Note that this change will break code that subclasses TarFile and overrides chown(), as suggested in msg233915. I'm not too concerned about that, since chown() is not documented. Ideally it would be renamed to _chown(), but that's probably a separate issue.

    @ericvsmith
    Member

    I added numeric_owner to the self.chown() call when adding directories. I'm reasonably sure this is correct.

    I added tests for dirs, although they need some cleaning up to be simpler and cleaner. I'll do that cleanup shortly, but I want to check this in before I get on a plane.

    I also changed the tests to use different numbers for .gid and .uid, in order to pick up if there's a transposition error somewhere.

    If anyone can test this on Windows, that would be helpful.
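
    A sketch of the testing idea described here: mock os.chown() and os.geteuid(), and give the member different uid and gid values so that a transposition would show up. The helper names and the file name `hello.txt` are made up for illustration, and this is not the test from the patch. On Python versions that have extraction filters, the sketch passes `filter="fully_trusted"`, on the assumption that stricter filters may discard ownership information.

```python
import io
import os
import tarfile
import tempfile
from unittest import mock


def build_archive():
    """Return the bytes of a tar archive with one file whose uid != gid."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        data = b"hello"
        info = tarfile.TarInfo("hello.txt")
        info.size = len(data)
        info.uid, info.gid = 99, 98          # deliberately different numbers
        info.uname = info.gname = "root"
        tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def extract_and_record_chown(numeric_owner):
    """Extract the member as a pretend root and return the (uid, gid) chowned."""
    extra = {}
    if hasattr(tarfile, "fully_trusted_filter"):
        # Assumption: keep ownership info on versions with extraction filters.
        extra["filter"] = "fully_trusted"
    with tempfile.TemporaryDirectory() as dest:
        with mock.patch("os.geteuid", return_value=0, create=True), \
             mock.patch("os.chown", create=True) as fake_chown:
            with tarfile.open(fileobj=io.BytesIO(build_archive())) as tar:
                tar.extract("hello.txt", path=dest,
                            numeric_owner=numeric_owner, **extra)
        (_path, uid, gid), _kwargs = fake_chown.call_args
    return uid, gid


uid, gid = extract_and_record_chown(numeric_owner=True)
assert (uid, gid) == (99, 98), "uid/gid were changed or transposed"
```

    With `numeric_owner=False` the recorded ids depend on what the name 'root' maps to on the host, which is why that case is harder to test portably.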

    @ericvsmith
    Member

    Other than Misc/NEWS, I think this is the final version of this patch.

    @berkerpeksag
    Member

    • +.. method:: TarFile.extractall(path=".", members=None, numeric_owner=False)

      numeric_owner can be a keyword-only argument.

    • TarFile.extract and TarFile.extractall docs need a versionchanged directive.

    • It would be nice to add an entry to Doc/whatsnew/3.5.rst

    • + filename_1 = fname
      + dirname_1 = dirname
      + filename_2 = os.path.join(dirname, fname)

      I'd just yield fname, dirname, os.path.join(dirname, fname) here.

    • + for name, uid, gid, typ, contents in [(fname, 99, 98, tarfile.REGTYPE, fobj),
      + (dirname, 77, 76, tarfile.DIRTYPE, None),
      + (os.path.join(dirname, fname), 88, 87, tarfile.REGTYPE, fobj),
      + ]:

      Moving the list to a new variable would be more readable.

    • Typo: # ceate -> # create

    • +def root_is_uid_gid_0():

      Question: Can't we use something like root_in_posix in test_os here?

    • + with tarfile.open(tar_filename) as r:

      Nitpick: What does "r" mean here? "tar" or "tarobj" looks more readable to me.

    • Nitpick: I'd prefer ``None`` over :const:`True`. However, the current style is just "true" in the tarfile documentation.

    @ericvsmith
    Member

    Thanks for your review, Berker. I've updated the code with most of your suggestions, although some of them were mooted by some restructuring I did.

    A couple of questions/issues:

    • I'm not sure where we stand on keyword-only arguments. I certainly agree that I'd never specify numeric_owner unless I named it. However, I don't see a lot of that style in new or modified APIs. I'll ask over on python-dev and see what they say.

    • test_extract_without_numeric_owner only works if the user named 'root' has uid = 0 (same for gid). This is a different test than what test_os.root_in_posix is doing. I think it's best to leave it as-is, although I've added a comment and reduced the scope of the skip to just this one test.

    • I reformatted the tests to stay within 80 characters, and I think it also made them more legible.

    • I'm not sure what you mean by your last point. I believe I use True as it is elsewhere in that module and its documentation. And None doesn't make any sense to me as a parameter value for this.
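
    To illustrate the keyword-only question, a small stand-in function with the signature under discussion. This `extractall` is a hypothetical stub, not the tarfile method; it only shows what the bare `*` in the parameter list changes for callers.

```python
def extractall(path=".", members=None, *, numeric_owner=False):
    """Stub with the proposed signature; returns the flag it received."""
    return numeric_owner


# Naming the argument works.
named = extractall("dest", None, numeric_owner=True)

# Passing it positionally is rejected, so a bare True can never be
# mistaken for some other parameter.
try:
    extractall("dest", None, True)
    positional_rejected = False
except TypeError:
    positional_rejected = True
```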

    @python-dev
    Mannequin

    python-dev mannequin commented Apr 15, 2015

    New changeset 6b70f16d585a by Eric V. Smith in branch 'default':
    bpo-23193: Add numeric_owner to tarfile.TarFile.extract() and tarfile.TarFile.extractall().
    https://hg.python.org/cpython/rev/6b70f16d585a

    @ericvsmith
    Member

    Thanks everyone for their help, especially Michael for the original patch.

    @python-dev
    Mannequin

    python-dev mannequin commented May 13, 2015

    New changeset e5a53d75dc19 by Zachary Ware in branch 'default':
    Issue bpo-23193: Skip numeric_owner tests on platforms where they don't make sense
    https://hg.python.org/cpython/rev/e5a53d75dc19

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022