
tarfile module should have a command line #57686

Closed
brandon-rhodes mannequin opened this issue Nov 25, 2011 · 36 comments
Labels
stdlib (Python modules in the Lib dir), type-feature (A feature request or enhancement)

Comments

@brandon-rhodes
Mannequin

brandon-rhodes mannequin commented Nov 25, 2011

BPO 13477
Nosy @rhettinger, @gustaebel, @pitrou, @vstinner, @larryhastings, @ezio-melotti, @merwok, @berkerpeksag, @serhiy-storchaka
Files
  • issue_13477
  • issue_13477_v2
  • issue13477_v3.diff
  • issue13477_v4.diff
  • tarcli.patch
  • issue13477_v5.diff
  • issue13477_v6.diff
  • tarfile_cli.patch
  Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = None
    closed_at = <Date 2014-03-10.23:54:20.738>
    created_at = <Date 2011-11-25.03:11:05.173>
    labels = ['type-feature', 'library']
    title = 'tarfile module should have a command line'
    updated_at = <Date 2014-03-10.23:54:20.737>
    user = 'https://bugs.python.org/brandon-rhodes'

    bugs.python.org fields:

    activity = <Date 2014-03-10.23:54:20.737>
    actor = 'pitrou'
    assignee = 'none'
    closed = True
    closed_date = <Date 2014-03-10.23:54:20.738>
    closer = 'pitrou'
    components = ['Library (Lib)']
    creation = <Date 2011-11-25.03:11:05.173>
    creator = 'brandon-rhodes'
    dependencies = []
    files = ['29294', '29337', '29347', '29686', '31305', '31879', '32725', '32795']
    hgrepos = []
    issue_num = 13477
    keywords = ['patch', 'needs review']
    message_count = 36.0
    messages = ['148300', '148301', '148389', '183354', '183356', '183363', '183507', '183656', '183677', '183682', '183684', '183688', '183722', '183724', '183725', '183739', '183749', '183752', '184626', '184628', '184644', '184729', '184753', '184758', '186212', '195287', '198450', '199496', '199501', '203062', '203510', '203993', '204134', '204136', '204233', '213007']
    nosy_count = 13.0
    nosy_names = ['rhettinger', 'lars.gustaebel', 'pitrou', 'vstinner', 'larry', 'ezio.melotti', 'eric.araujo', 'kyle', 'brandon-rhodes', 'python-dev', 'berker.peksag', 'serhiy.storchaka', 'Ankur.Ankan']
    pr_nums = []
    priority = 'low'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue13477'
    versions = ['Python 3.4']

    @brandon-rhodes
    Mannequin Author

    brandon-rhodes mannequin commented Nov 25, 2011

    The tarfile module should have a simple command line that allows it to be executed with "-m". Even if its only ability were to take a filename and extract it to the current directory, it could be a lifesaver on Windows machines where Python has been installed but nothing else. Would such a patch be welcome if I could write one up?

    @ezio-melotti
    Member

    The feature request seems reasonable to me, but this can only go in 3.3.
    If you want to propose a patch, you might want to check the devguide and what other modules like zipfile do.

    @ezio-melotti ezio-melotti added the type-feature A feature request or enhancement label Nov 25, 2011
    @gustaebel
    Mannequin

    gustaebel mannequin commented Nov 26, 2011

    This is not a bad idea. I recommend keeping it as simple as possible. I would definitely not be supportive of a full tar clone. List, extract and create should be enough. There are two possible command line choices: do what the zipfile module does or emulate tar. I am in favor of the latter.
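The three operations Lars lists map directly onto the public tarfile API. A minimal sketch under that assumption; the helper names list_members, extract_all and create are hypothetical and not taken from any of the patches attached to this issue:

```python
import os
import tarfile


def list_members(path):
    """Return the member names of a (possibly compressed) tar archive."""
    # "r:*" lets tarfile detect gzip/bz2/xz/uncompressed transparently.
    with tarfile.open(path, "r:*") as tf:
        return tf.getnames()


def extract_all(path, dest="."):
    """Extract every member into *dest* (the current directory by default)."""
    with tarfile.open(path, "r:*") as tf:
        if hasattr(tarfile, "data_filter"):
            # Newer Pythons support extraction filters; "data" rejects
            # absolute paths, members escaping *dest*, and similar surprises.
            tf.extractall(dest, filter="data")
        else:
            tf.extractall(dest)


def create(path, sources, mode="w"):
    """Create an archive at *path* containing each file in *sources*."""
    with tarfile.open(path, mode) as tf:
        for src in sources:
            # Store each entry under its base name rather than its full path.
            tf.add(src, arcname=os.path.basename(src))
```

Everything beyond these three calls (option parsing, messages, exit codes) is presentation, which is where the rest of the thread's debate lies.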

    @gustaebel gustaebel mannequin self-assigned this Nov 26, 2011
    @merwok
    Member

    merwok commented Mar 3, 2013

    Patch looks good! Some minor comments on Rietveld.

    Could you add tests?

    @rhettinger
    Contributor

    +1 for adding a CLI and +1 for keeping it minimal.

    @AnkurAnkan
    Mannequin

    AnkurAnkan mannequin commented Mar 3, 2013

    I was also working on this issue, so I thought I should submit my patch too.
    It has a few extra features compared to berker.peksag's patch:

    1. The names of the files to be extracted can be specified.
    2. An output directory can be specified for the extracted files.

    @berkerpeksag
    Member

    > Patch looks good! Some minor comments on Rietveld.

    Thanks for the review, Éric.

    > Could you add tests?

    Done.

    Here's the new patch with Éric's comments addressed.

    @AnkurAnkan
    Mannequin

    AnkurAnkan mannequin commented Mar 7, 2013

    Thanks for your comments Serhiy.
    I have improved the patch according to your comments. Please have a look.

    And I am writing tests.

    @serhiy-storchaka
    Member

    It would be good if Berker and Ankur merged their patches. Ankur's patch has some very useful features, but Berker's patch looks more mature.

    I prefer to emulate a subset of the tar utility interface too.

    @merwok
    Member

    merwok commented Mar 7, 2013

    I am more in favor of having something simple and similar to zipfile, like Lars, rather than following tar.

    @serhiy-storchaka
    Member

    This can confuse users. Note that even jar (which works with zip-like files) honors the tar interface.

    @merwok
    Member

    merwok commented Mar 7, 2013

    Yeah, that’s always the discussion when writing a Python utility that has a unix equivalent: do you want to be familiar to Python users or to the unix tool users?

    I don’t have a strong opinion. I think unix users would have no reason to use python -m tarfile, and windows users won’t have the expectation that the interface is the same as tar—unless they are unix people who are using a windows machine for whatever reason. If it were me, I’d just start with python -m tarfile --help, so I’d have no expectations :)

    @vstinner
    Member

    vstinner commented Mar 8, 2013

    + parser.add_argument('--gz', '--gunzip', '--gzip', '--tgz', '-z',
    +                     '--ungzip', action = 'store_true',
    +                     help = 'gz compression')
    + parser.add_argument('--bz2', '--bzip2', '--tbz2', '--tbz', '--tb2',
    +                     action = 'store_true', help = 'bz2 compression')
    + parser.add_argument('--xz', '--lzma', action = 'store_true',
    +                     help = 'xz compression')

    Do we really need so many names for the same option? Where do these names come from?

    --

    main() should exit after extract and create, so that it performs only one operation and does not always display the usage.

    It would be better not to duplicate the list of options and to use parser.print_help() instead of sys.stdout.write(doc).

    Some consistency tests on exclusive options (bzip/gzip/lzma and list/create/extract) would be nice.
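Victor's three suggestions (one operation per run, generated help via parser.print_help() instead of a hand-maintained usage string, and enforced exclusivity between operations) can all be expressed with argparse. A sketch, assuming zipfile-style short options (-l, -e, -c); build_parser and parse_command are hypothetical names, and the option set is illustrative rather than what any patch here implemented:

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(
        prog="tarcli", description="A minimal tar archive CLI (sketch).")
    # At most one operation per invocation; argparse itself rejects
    # combinations such as "-l a.tar -c b.tar x" with exit status 2.
    group = parser.add_mutually_exclusive_group()
    group.add_argument("-l", "--list", metavar="<tarfile>",
                       help="list the members of a tarfile")
    group.add_argument("-e", "--extract", nargs="+", metavar="<arg>",
                       help="<tarfile> [<output_dir>]: extract a tarfile")
    group.add_argument("-c", "--create", nargs="+", metavar="<arg>",
                       help="<name> <file> ...: create a tarfile")
    return parser


def parse_command(argv=None):
    """Return (operation, operands) for the single requested operation."""
    parser = build_parser()
    args = parser.parse_args(argv)
    if args.list is not None:
        return "list", [args.list]
    if args.extract is not None:
        if len(args.extract) > 2:
            parser.error("--extract takes <tarfile> [<output_dir>]")
        return "extract", args.extract
    if args.create is not None:
        if len(args.create) < 2:
            parser.error("--create takes <name> <file> [<file> ...]")
        return "create", args.create
    # No operation given: show the generated help (never a hand-maintained
    # copy of the option list) and exit with a usage error.
    parser.print_help()
    parser.exit(2)
```

Because exclusivity lives in the parser, the "consistency tests" Victor asks for reduce to checking that conflicting combinations raise SystemExit with status 2.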

    --

    tar options on Linux:

       -c, --create
       -t, --list
       -x, --extract, --get
       -z, --gzip, --ungzip
       -j, -I, --bzip
       -C, --directory DIRECTORY
    

    For tarfile, I propose to have a shorter list, and try to stay somehow compatible with tar:

       -c, --create
       -t, --list
       -x, --extract
       -z, --gzip
       -j, --bzip
       -C, --directory DIRECTORY
    

    Users of the TAR format usually come from UNIX, so using the same command line options should not be so surprising.

    I don't like the idea of an optional argument for --extract: "--extract file1 file2" is usually understood/read as "--extract=filename archive.tar". If you really think that we need to support "only extract some files", it should be a different option. Linux tar command has no such option. I propose to drop this feature (always extract all files).

    @berkerpeksag
    Member

    New patch (issue13477_v3.diff) attached.

    Changes:

    • Addressed comments from Serhiy
    • Added "output" parameter to --extract option (from Ankur's patch)
    • Updated tests and documentation

    The current docstring of the tarfile module does not give much
    information (it just prints "Read from and write to tar format
    archives."), so I skipped the -d option.

    @AnkurAnkan
    Mannequin

    AnkurAnkan mannequin commented Mar 8, 2013

    > + parser.add_argument('--gz', '--gunzip', '--gzip', '--tgz', '-z',
    > +                     '--ungzip', action = 'store_true',
    > +                     help = 'gz compression')
    > + parser.add_argument('--bz2', '--bzip2', '--tbz2', '--tbz', '--tb2',
    > +                     action = 'store_true', help = 'bz2 compression')
    > + parser.add_argument('--xz', '--lzma', action = 'store_true',
    > +                     help = 'xz compression')
    >
    > Do we really need so many names for the same option? Where do these names come from?

    I was trying to implement all the formats mentioned in Serhiy's review (and also the different names for each format).

    @merwok
    Member

    merwok commented Mar 8, 2013

    > Users of the TAR format usually come from UNIX,
    > so using the same command line options should not be so surprising.

    Not sure about that: they could be Python users wanting to unpack a tarball sdist. That said, there is no harm in being compatible, and I like your small list of options.

    FTR Lars said that he preferred compat with the zipfile CLI, which is:

    Usage:
    zipfile.py -l zipfile.zip # Show listing of a zipfile
    zipfile.py -t zipfile.zip # Test if a zipfile is valid
    zipfile.py -e zipfile.zip target # Extract zipfile into target dir
    zipfile.py -c zipfile.zip src ... # Create zipfile from sources

    @merwok
    Member

    merwok commented Mar 8, 2013

    Did you get all the review comments? Some of them were made on older versions of the patch, and don’t seem to be addressed in the latest version. Thanks.

    Ankur, could you submit a contributor agreement? http://www.python.org/psf/contrib/contrib-form/

    @AnkurAnkan
    Mannequin

    AnkurAnkan mannequin commented Mar 8, 2013

    I am still unclear about the outcomes of the discussion. I am confused which features need to be kept and which are to be removed.

    > Ankur, could you submit a contributor agreement?

    I will submit it today.

    @larryhastings
    Contributor

    Modern tar programs don't need to be told the compression method--they infer it. If they can do it in C, we can do it in Python. So we should simply omit the "-bz2" stuff.

    As for what the interface should look like, I'm definitely in favor of it looking like tar. unzip has the same interface on different platforms; so does 7zip, so does unrar. I think it's reasonable to expect that tar would take the same interface on different platforms. We don't need to coddle Windows users here. We're already expecting them to be sophisticated enough to handle the EOL conversion we're not doing for them.

    @serhiy-storchaka
    Member

    Note that --create command should support --directory option too.

    > Modern tar programs don't need to be told the compression method--they infer it. If they can do it in C, we can do it in Python. So we should simply omit the "-bz2" stuff.

    An archive may have no extension or have a nonstandard extension. And stdin/stdout does not have a name.
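On the reading side, tarfile already sniffs the data itself: mode "r:*" (equivalent to the default "r") detects gzip, bz2 and xz compression from the bytes, so an archive with no extension or a nonstandard one, or a file object with no name at all, can still be read without a compression flag. A sketch using an in-memory archive; make_gz_tar_bytes and member_names are hypothetical helpers:

```python
import io
import tarfile


def make_gz_tar_bytes(name="hello.txt", data=b"hello\n"):
    """Build a one-member, gzip-compressed tar archive entirely in memory."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tf:
        info = tarfile.TarInfo(name)
        info.size = len(data)
        tf.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def member_names(raw):
    """List the members of *raw* without being told how it is compressed."""
    # The BytesIO has no name, so there is no extension to look at;
    # "r:*" detects the compression from the bytes themselves.
    with tarfile.open(fileobj=io.BytesIO(raw), mode="r:*") as tf:
        return tf.getnames()
```

Writing is different: there is nothing to sniff yet, so a compression choice still has to come from somewhere (an option or the output file name).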

    @larryhastings
    Contributor

    Huh. tar *can* infer it from the data itself. On the other hand, it chooses explicitly not to.

    % cat ~/Downloads/Python-3.3.0.tar.bz2| tar xvf -
    tar: Archive is compressed. Use -j option
    tar: Error is not recoverable: exiting now

    % cat ~/Downloads/Python-3.3.0.tgz| tar xvf -
    tar: Archive is compressed. Use -z option
    tar: Error is not recoverable: exiting now

    I guess "tar" knows explicit is better than implicit too ;-)

    @brandon-rhodes
    Mannequin Author

    brandon-rhodes mannequin commented Mar 20, 2013

    Larry Hastings <report@bugs.python.org> writes:

    > Huh. tar *can* infer it from the data itself. On the other hand, it
    > chooses explicitly not to. I guess "tar" knows explicit is better
    > than implicit too ;-)

    I am told that the refusal of "tar" to introspect the data is because:

    (a) Tar runs "gunzip -c" (for example) as an external program; it does
    not actually compile against libz.

    (b) Streams in UNIX cannot be rewound. Tar cannot look at the first
    block of an input pipe and then "put the block back" so that the same
    input can be fed directly to "gunzip" as its input.

    (c) Given (a) and (b), tar could only support data introspection of
    input from a pipe if it were willing to be a pass-through that, after
    reading and introspecting the first block, then fired up "gunzip" and
    sent ALL of the blocks through. Which would require multiprocessing,
    threading, or async I/O so that tar could both read and write, which
    would make tar more complicated.

    (d) Therefore, tar refuses to even look.

    Since Python does bundle compression in its standard library, it can
    quite trivially step forward and actually do the data introspection that
    tar insists on not doing; the first few bytes of a tar archive are quite
    demonstrably different from the first bytes of a gzip stream, if I
    recall.
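The recollection above holds for the compressed formats: each starts with a fixed signature, while an uncompressed tar archive starts with a member name. A sketch of such a check, assuming only the first few bytes are available; sniff_compression is a hypothetical helper (tarfile performs its own detection internally and this is not its code):

```python
# Signatures at the very start of each compressed stream format.
_MAGIC = (
    (b"\x1f\x8b", "gz"),          # gzip
    (b"BZh", "bz2"),              # bzip2
    (b"\xfd7zXZ\x00", "xz"),      # xz container format
)


def sniff_compression(head):
    """Guess a tar stream's compression from its first bytes.

    Returns "gz", "bz2" or "xz", or "" when none of these signatures match
    (for example, an uncompressed tar archive)."""
    for magic, name in _MAGIC:
        if head.startswith(magic):
            return name
    return ""
```

For a pipe that cannot be rewound, the inspected head still has to be replayed to the decompressor, which is the bookkeeping described in (b) and (c); in Python, io.BufferedReader.peek() is one way to look at the head without consuming it.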

    @vstinner
    Member

    I don't think that we need to support compressing/decompressing using
    the standard input/output.

    @gustaebel
    Mannequin

    gustaebel mannequin commented Mar 20, 2013

    I'd like to re-emphasize that it is best to keep the whole thing as simple and straightforward as possible. Offer some basic operations and that's it.

    Although I am pretty accustomed to the original tar command line, I think we should copy zipfile's interface. It makes more sense to offer some kind of unified "Python" command line approach for archive access than keeping to old traditions.

    I agree with Victor that we don't really need support for stdin/stdout. It only complicates matters.

    If everybody still votes for stdin/stdout, I'd like to point out that tarfile supports compression detection for streams. It would be best to use mode="r|*" throughout because it works for both normal files and stdin. Use mode="w|(compression)" for writing to files and stdout accordingly.
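The stream modes mentioned here can be exercised without touching the file system. A sketch, with io.BytesIO standing in for stdin/stdout (a real CLI would presumably pass sys.stdin.buffer or sys.stdout.buffer instead); write_stream and read_stream are hypothetical names:

```python
import io
import tarfile


def write_stream(out, files, compression="gz"):
    """Write (name, bytes) pairs to *out* using stream mode "w|<compression>"."""
    with tarfile.open(fileobj=out, mode="w|" + compression) as tf:
        for name, data in files:
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tf.addfile(info, io.BytesIO(data))


def read_stream(inp):
    """Read members sequentially with "r|*" (stream mode, autodetected compression)."""
    result = {}
    with tarfile.open(fileobj=inp, mode="r|*") as tf:
        for member in tf:
            f = tf.extractfile(member)
            # Stream mode never seeks backwards, so each member's data
            # must be consumed before advancing to the next member.
            result[member.name] = f.read() if f is not None else None
    return result
```

The "|" modes never seek, which is what makes them usable on pipes; the price is strictly sequential access.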

    If we do not support stdin/stdout we no longer need all these compression options because for reading we do autodetection and for writing we could deduce the compression from the file extension (which is just some kind of autodetection too).
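Deducing the write compression from the output file name could look like the following sketch. The extension table is an assumption based on common naming conventions, and write_mode_for is a hypothetical helper:

```python
import os

# Assumed mapping from file extension to tarfile compression name.
_EXT_TO_COMPRESSION = {
    ".gz": "gz", ".tgz": "gz",
    ".bz2": "bz2", ".tbz": "bz2", ".tbz2": "bz2", ".tb2": "bz2",
    ".xz": "xz", ".txz": "xz",
}


def write_mode_for(path):
    """Return a tarfile write mode such as "w:gz", deduced from *path*."""
    ext = os.path.splitext(path)[1].lower()
    compression = _EXT_TO_COMPRESSION.get(ext)
    # Unknown or missing extensions fall back to an uncompressed archive.
    return "w:" + compression if compression else "w"
```

Names with no recognized extension, including dotless ones, simply produce an uncompressed archive.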

    Another side note: We should be aware of the effects discussed in bpo-17102 and bpo-1044. In my opinion tarfile as a library is obligated to behave like that, but maybe that's not acceptable for a command line tool.

    @serhiy-storchaka
    Member

    Then I propose to add an alternative tarfile command-line interface as Tools/scripts/tar.py for those who prefer a well-known and well-tested traditional interface.

    @pitrou
    Member

    pitrou commented Aug 15, 2013

    Regenerated patch against latest default (fixing conflicts).

    @berkerpeksag
    Member

    Thanks for the rebase, Antoine.

    Here is an updated patch:

    • Addressed Serhiy's comments. I didn't add a directory parameter to the
      create command to keep the CLI simple.
    • Added a test for dotless files
    • Returned proper exit codes

    @berkerpeksag berkerpeksag added the stdlib Python modules in the Lib dir label Sep 26, 2013
    @pitrou
    Member

    pitrou commented Oct 11, 2013

    From a quick glance, the patch looks ok. Serhiy, do you want to review it any further?

    @serhiy-storchaka
    Member

    Yes, this is in my plans.

    @serhiy-storchaka
    Member

    I have added comments on Rietveld.

    @berkerpeksag
    Member

    Attached an updated patch that addresses Serhiy's comments. Thanks!

    @serhiy-storchaka
    Member

    I think Berker has misunderstood me. Here is a patch based on issue13477_v5.diff with some cherry-picked changes from issue13477_v6.diff and several other changes:

    • --create, --extract, --list, and --test options are now mutually exclusive.
    • --test now tests a tarfile for integrity (as in the zipfile module).
    • File names in output are now printed with repr().
    • The tarfile CLI is now silent by default. Added the -v (--verbose) option to print more verbose output, as in issue13477_v5.diff.
    • Added help text for arguments.
    • Fixed and enhanced tests.

    I'm going to commit this patch shortly.

    Known bugs:

    • Help for --extract shows "--extract <tarfile> [<output_dir> ...]" instead of "--extract <tarfile> [<output_dir>]". --extract accepts only 1 to 2 arguments.
    • --list fails with a tarfile containing unencodable file names. In particular it fails with test tarfiles in the test suite.
    • Possible problems with unusual locales and file system encodings.
    • Corrupted tarfiles produce tracebacks.
    • Tests for --create should check that the created tarfile contains the correct files.
    • Tests for --create should check that correct files are extracted.
    • Tests are needed for non-ASCII file names.

    Apart from all this, I think the patch can be committed.
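One plausible way to implement an integrity check that reports corrupt archives as a message rather than a traceback, together with repr()-based name output, is sketched below. check_archive is a hypothetical helper, not the committed code; note that it only verifies that every member header can be parsed, not the file contents:

```python
import sys
import tarfile


def check_archive(path, verbose=False, out=None):
    """Return True if every member header of *path* can be read, else False.

    Reading all members makes tarfile parse each header, so a corrupt
    archive is reported as a message instead of a traceback."""
    out = sys.stdout if out is None else out
    try:
        with tarfile.open(path, "r:*") as tf:
            members = tf.getmembers()
    except (tarfile.TarError, OSError, EOFError) as exc:
        print(f"{path!r} is not a valid tar archive: {exc}", file=out)
        return False
    if verbose:
        for member in members:
            # repr() keeps unusual or undecodable names printable.
            print(repr(member.name), file=out)
    return True
```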

    @python-dev
    Mannequin

    python-dev mannequin commented Nov 23, 2013

    New changeset a5b6c8cbc473 by Serhiy Storchaka in branch 'default':
    Issue bpo-13477: Added command line interface to the tarfile module.
    http://hg.python.org/cpython/rev/a5b6c8cbc473
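Assuming Python 3.4 or newer, the committed interface can be driven from a script. The sketch below exercises -c, -l, -t and -e through subprocess; run_tarfile_cli and demo are hypothetical helpers, and `python -m tarfile --help` shows the options your particular version accepts:

```python
import os
import subprocess
import sys
import tempfile


def run_tarfile_cli(*args, cwd):
    """Invoke the stdlib CLI as "python -m tarfile ARGS" and return the result."""
    return subprocess.run([sys.executable, "-m", "tarfile", *args],
                          cwd=cwd, capture_output=True, text=True)


def demo():
    """Create, list, test and extract a small archive in a temporary directory."""
    with tempfile.TemporaryDirectory() as work:
        with open(os.path.join(work, "hello.txt"), "w") as f:
            f.write("hello\n")
        created = run_tarfile_cli("-c", "demo.tar.gz", "hello.txt", cwd=work)
        listed = run_tarfile_cli("-l", "demo.tar.gz", cwd=work)
        tested = run_tarfile_cli("-t", "demo.tar.gz", cwd=work)
        os.mkdir(os.path.join(work, "out"))
        extracted = run_tarfile_cli("-e", "demo.tar.gz", "out", cwd=work)
        with open(os.path.join(work, "out", "hello.txt")) as f:
            content = f.read()
        return created, listed, tested, extracted, content
```

Relative names and cwd are used so that the archive stores "hello.txt" rather than a full temporary path.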

    @python-dev
    Mannequin

    python-dev mannequin commented Nov 24, 2013

    New changeset 70b9d22b900a by Serhiy Storchaka in branch 'default':
    Build a list of supported test tarfiles dynamically for CLI "test" command
    http://hg.python.org/cpython/rev/70b9d22b900a

    @serhiy-storchaka
    Member

    changeset: 87476:a539c85aec51
    user: Antoine Pitrou <solipsis@pitrou.net>
    date: Sun Nov 24 01:55:05 2013 +0100
    summary:
    Try to fix test_tarfile under Windows

    Thank you Antoine.

    @python-dev
    Mannequin

    python-dev mannequin commented Mar 10, 2014

    New changeset 5b52db6fc7dc by R David Murray in branch 'default':
    whatsnew: tarfile cli (bpo-13477).
    http://hg.python.org/cpython/rev/5b52db6fc7dc

    @pitrou pitrou closed this as completed Mar 10, 2014
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022

    8 participants