Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add xz support to shutil #49661

Closed
proyvind mannequin opened this issue Mar 3, 2009 · 34 comments
Closed

Add xz support to shutil #49661

proyvind mannequin opened this issue Mar 3, 2009 · 34 comments
Assignees
Labels
stdlib Python modules in the Lib dir type-feature A feature request or enhancement

Comments

@proyvind
Copy link
Mannequin

proyvind mannequin commented Mar 3, 2009

BPO 5411
Nosy @pitrou, @tarekziade, @merwok, @4kir4, @berkerpeksag, @hynek, @serhiy-storchaka
Dependencies
  • bpo-5689: Support xz compression in tarfile module
  • Files
  • shutil-lzma_3.patch
  • docs-fix-sphinx-warning.patch: fix sphinx's whitespace/indentation warning
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/serhiy-storchaka'
    closed_at = <Date 2014-08-06.15:58:54.643>
    created_at = <Date 2009-03-03.11:56:24.716>
    labels = ['type-feature', 'library']
    title = 'Add xz support to shutil'
    updated_at = <Date 2014-08-11.15:11:21.692>
    user = 'https://bugs.python.org/proyvind'

    bugs.python.org fields:

    activity = <Date 2014-08-11.15:11:21.692>
    actor = 'berker.peksag'
    assignee = 'serhiy.storchaka'
    closed = True
    closed_date = <Date 2014-08-06.15:58:54.643>
    closer = 'serhiy.storchaka'
    components = ['Library (Lib)']
    creation = <Date 2009-03-03.11:56:24.716>
    creator = 'proyvind'
    dependencies = ['5689']
    files = ['32163', '36342']
    hgrepos = []
    issue_num = 5411
    keywords = ['patch']
    message_count = 34.0
    messages = ['83072', '83263', '83401', '83404', '111038', '115780', '148660', '152748', '152749', '153131', '153351', '153831', '153832', '153834', '153937', '153938', '154029', '173831', '173833', '174491', '174707', '199977', '199987', '200183', '224471', '224722', '224771', '224780', '224943', '224944', '224945', '224946', '225181', '225193']
    nosy_count = 11.0
    nosy_names = ['pitrou', 'nadeem.vawda', 'tarek', 'eric.araujo', 'Arfrever', 'proyvind', 'akira', 'python-dev', 'berker.peksag', 'hynek', 'serhiy.storchaka']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue5411'
    versions = ['Python 3.5']

    @proyvind
    Copy link
    Mannequin Author

    proyvind mannequin commented Mar 3, 2009

    @proyvind proyvind mannequin added the stdlib Python modules in the Lib dir label Mar 3, 2009
    @proyvind proyvind mannequin assigned tarekziade Mar 3, 2009
    @proyvind proyvind mannequin added the type-feature A feature request or enhancement label Mar 3, 2009
    @tarekziade
    Copy link
    Mannequin

    tarekziade mannequin commented Mar 7, 2009

    Good idea !

    are you able provide a unit test for the changes made into the two
    commands ?

    @proyvind
    Copy link
    Mannequin Author

    proyvind mannequin commented Mar 9, 2009

    hmm, I'm unsure about how this should be done..

    I guess such a test would belong in Lib/distutils/test_dist.py, but I'm
    uncertain about how it should be done, ie. should it be a test for doing
    'bdist', 'bdist_rpm' and 'sdist' for each of the formats supported? I
    cannot seem to find any tests for the currently supported formats and
    such tests would introduce dependencies on the tools used to compress
    with these formats..

    @tarekziade
    Copy link
    Mannequin

    tarekziade mannequin commented Mar 9, 2009

    I guess such a test would belong in Lib/distutils/test_dist.py

    no, rather in test_bdist_rpm, test_sdist and so on,
    but I can do it, it'll just take more time.

    @tarekziade tarekziade mannequin removed the stdlib Python modules in the Lib dir label Apr 9, 2010
    @merwok
    Copy link
    Member

    merwok commented Jul 21, 2010

    distutils2 uses tarfile now instead of external programs. Adding dependency on another bug.

    @merwok
    Copy link
    Member

    merwok commented Sep 7, 2010

    FTR, the distutils2 repo is http://bitbucket.org/tarek/distutils2. distutils2.tests.support contains helper to create temp directories and run commands; see docstrings and example uses in the tests for more info.

    @merwok
    Copy link
    Member

    merwok commented Nov 30, 2011

    distutils2/packaging now just uses the shutil functions. I’ll make a patch for shutil after tarfile is updated.

    @merwok merwok added the stdlib Python modules in the Lib dir label Nov 30, 2011
    @merwok merwok changed the title add xz compression support to distutils add xz compression support to shutil Nov 30, 2011
    @merwok merwok assigned merwok and unassigned tarekziade Nov 30, 2011
    @merwok
    Copy link
    Member

    merwok commented Feb 6, 2012

    This not-so-bad patch adds lzma compression support to the shutil functions, under the 'xztar' name. Please review.

    @merwok
    Copy link
    Member

    merwok commented Feb 6, 2012

    I should add that a doc update will be part of the next patch.

    In parallel, I’ve also started updating packaging to add lzma support to the bdist command, but the situation is very disappointing: tarfile knows what format it supports, shutil has its own list, and packaging has a few duplicates too! Ideally, tarfile should have an attribute to expose what compression formats are available, then shutil would reuse that, and packaging would just call shutil. (I’m not even talking about the duplication in the doc.) If you agree with the general idea, I’ll open reports to a) add the attribute to tarfile b) rework the (luckily internal) registries in shutil to stop being hard-coded c) remove packaging’s registries.

    @nadeemvawda
    Copy link
    Mannequin

    nadeemvawda mannequin commented Feb 11, 2012

    This not-so-bad patch adds lzma compression support to the shutil functions, under the 'xztar' name. Please review.

    Functionally, the patch looks good to me. There are some docstrings that
    should probably be updated, though:

    • _make_tarball (the 'compress' argument)
    • make_archive
    • _unpack_tarfile
    • unpack_archive

    Ideally, tarfile should have an attribute to expose what compression formats are available, then shutil would reuse that, and packaging would just call shutil.

    +1

    @merwok
    Copy link
    Member

    merwok commented Feb 14, 2012

    a) is bpo-14013; b) is bpo-14011.

    If Lars agrees to bpo-14013, I will withdraw this patch in favor of another one that would make shutil automatically support all compressors that tarfile supports (my b) point).

    Note that this is not going to be pain-free, as for example bzip2 compression is obtained with the strings “bz2”, “bzip2”, “bzip” or “bztar” depending on the modules; I think it’s worth it nonetheless, even if I have to add an ugly conversion in shutil for compat.

    @merwok merwok changed the title add xz compression support to shutil shutil should support all formats supported by tarfile automatically Feb 20, 2012
    @merwok
    Copy link
    Member

    merwok commented Feb 21, 2012

    I have a working updated shutil module, tests pass and the documentation is improved. I will make sure to make different commits for improving the tests, cleaning up some things, adding tarfile.compression_formats and removing duplication in shutil (see dep bugs), to be sure not to introduce regressions.

    Before I finish and post the patch, I’d like feedback on a choice I made. I don’t think it’s possible to have shutil automatically support all the formats that tarfile does, because of the spelling issue I mentioned. Here’s what the code would look like (let me know if I should post it elsewhere or as a file to let you get syntax coloring):

    _ARCHIVE_FORMATS = {}
    _UNPACK_FORMATS = {}

      for fmt in tarfile.compression_formats:
          code = fmt + 'tar'
          ext = '.' + fmt
          desc = "%s'ed tar file" % fmt
          _ARCHIVE_FORMATS[code] = (_make_tarball, [('compress', fmt)], desc)
          _UNPACK_FORMATS[code] = ([ext], _unpack_tarfile, [], desc)
    
      # kludge to add alternate extension
      if 'gztar' in _ARCHIVE_FORMATS:
          _UNPACK_FORMATS['gztar'][0].append('.tgz')
          # XXX desc should say "gzip'ed tar file", not "gz'ed"
    
      # rectify naming incompatibility
      if 'bz2tar' in _ARCHIVE_FORMATS:
          # XXX alternative: make 'bztar' alias for 'bz2tar', but would complicate
          # manipulating the registries
          del _ARCHIVE_FORMATS['bz2tar']
          del _UNPACK_FORMATS['bz2tar']
          desc = "bzip2'ed tar file"
          _ARCHIVE_FORMATS['bztar'] = (_make_tarball, [('compress', 'bz2')], desc)
          _UNPACK_FORMATS['bztar'] = (['.bz2'], _unpack_tarfile, [], desc)

    # now add uncompressed tar and zip file

    I really don’t like that code. Given that tarfile is not extensible at run time, it is not a big deal to have to update shutil whenever we add a compression format to tarfile. Therefore, I backtracked on my “automatic support” idea but kept a lot of cleanup in did in the code. Here’s what the code looks like:

      _ARCHIVE_FORMATS = {}
      _UNPACK_FORMATS = {}
    
      if 'xz' in tarfile.compression_formats:
          desc = "xz'ed tar file"
          # XXX '.xz' is not great, '.tar.xz' would be better
          _ARCHIVE_FORMATS['xztar'] = (_make_tarball, [('compress', 'xz')], desc)
          _UNPACK_FORMATS['xztar'] = (['.xz'], _unpack_tarfile, [], desc)
    
      if 'bz2' in tarfile.compression_formats:
          desc = "bzip2'ed tar file"
          _ARCHIVE_FORMATS['bztar'] = (_make_tarball, [('compress', 'bz2')], desc)
          _UNPACK_FORMATS['bztar'] = (['.bz2'], _unpack_tarfile, [], desc)
    
      if 'gz' in tarfile.compression_formats:
          desc = "gzip'ed tar file"
          _ARCHIVE_FORMATS['gztar'] = (_make_tarball, [('compress', 'gz')], desc)
          _UNPACK_FORMATS['gztar'] = (['.gz', '.tgz'], _unpack_tarfile, [], desc)

    So, do you agree that “not automated but not ugly” is better than “automated with ugly klutches”?

    @merwok merwok changed the title shutil should support all formats supported by tarfile automatically Add xz support to shutil Feb 21, 2012
    @merwok
    Copy link
    Member

    merwok commented Feb 21, 2012

    s/cleanup in did in the code/cleanup I did in the code/

    @merwok
    Copy link
    Member

    merwok commented Feb 21, 2012

    Note that there is a way to get fully automated support for tar formats: tarfile could expose, in addition to the list compression_formats, another structure with the descriptions (e.g. “gzip’ed tar file”) and file extensions (e.g. ['.gz', '.tgz'] —no, it’s not '.tar.gz', which is unfortunate, and could cause Lars to reject that idea). I’m just putting this here for reference, but my preference is still for the second idea I talk about in my precedent message.

    @nadeemvawda
    Copy link
    Mannequin

    nadeemvawda mannequin commented Feb 22, 2012

    So, do you agree that “not automated but not ugly” is better than “automated with ugly klutches”?

    Definitely. If we have to add special cases that are almost as long as
    the original code, the "automation" seems pointless.

    Note that there is a way to get fully automated support for tar formats: tarfile could expose, in addition to the list compression_formats, another structure with the descriptions (e.g. “gzip’ed tar file”) and file extensions (e.g. ['.gz', '.tgz'] —no, it’s not '.tar.gz', which is unfortunate, and could cause Lars to reject that idea). I’m just putting this here for reference, but my preference is still for the second idea I talk about in my precedent message.

    As much as it would be nice to have all of the information centralized in
    one place, I don't think this is practical - some of the data that would
    be stored in this structure really is specific to shutil, so I agree that
    your second option is better.

    I think we can restructure the code a bit to reduce the work involved in
    adding a new format, though. Maybe something like this:

        _ARCHIVE_FORMATS = {}
        _UNPACK_FORMATS = {}
    for name, tarname, progname, extensions, desc in [
            ("xz", "xztar", "xz", [".xz"], "xz'ed tar file"),
            ("bz2", "bztar", "bzip2", [".bz2"], "bzip2'ed tar file"),
            ("gz", "gztar", "gzip", [".gz", ".tgz"], "gzip'ed tar file")]:
        if name in tarfile.compression_formats:
            _ARCHIVE_FORMATS[tarname] = _make_tarball, [("compress", progname)], desc
            _UNPACK_FORMATS[tarname] = extensions, _unpack_tarfile, [], desc
    

    By the way, is there any reason why you've used ".gz" instead of
    ".tar.gz" as the extension for "gztar" in your proposed change? The
    current version of shutil.py uses ".tar.gz", but has ".bz2" for "bztar".

    Also, I notice that _make_tarball() will need to be updated to add the
    info for xz to the tar_compression and compress_ext dictionaries. It
    seems a bit ugly to duplicate this information when it's already present
    in the format dicts, so maybe it would be better to pass the file
    extension in directly instead? I assume that _make_tarball() is not part
    of shutil's public API.

    If we did this, there would be no need for a separate "progname" field
    in the initialization code I suggested above:

    for name, tarname, extensions, desc in [
            ("xz", "xztar", [".xz"], "xz'ed tar file"),
            ("bz2", "bztar", [".bz2"], "bzip2'ed tar file"),
            ("gz", "gztar", [".gz", ".tgz"], "gzip'ed tar file")]:
        if name in tarfile.compression_formats:
            _ARCHIVE_FORMATS[tarname] = _make_tarball, [("compress", name)], desc
            _UNPACK_FORMATS[tarname] = extensions, _unpack_tarfile, [], desc
    

    @nadeemvawda
    Copy link
    Mannequin

    nadeemvawda mannequin commented Feb 22, 2012

    For the "xztar" format, you should also perhaps recognize the ".txz"
    extension - I've seen this used by FreeBSD.

    @merwok
    Copy link
    Member

    merwok commented Feb 23, 2012

    I think we can restructure the code a bit to reduce the work involved in adding a new format, though.
    Maybe something like this: [snip]
    Thanks, I’ve applied your obvious refactor in my clone :) For clarity, I will split my patch into many patches for different reports (bug fixes and doc improvements for 3.2, then code cleanup in 3.3, then adding xz).

    By the way, is there any reason why you've used ".gz" instead of ".tar.gz" as the extension for "gztar"
    in your proposed change? The current version of shutil.py uses ".tar.gz", but has ".bz2" for "bztar".
    Yes, I confused the two formats. I reported the '.bz2' bug (should be '.tar.bz2') and will fix it.

    Also, I notice that _make_tarball() will need to be updated to add the info for xz to the tar_compression
    and compress_ext dictionaries.
    I ripped them off. I could have pasted my full patch to save you time, sorry!

    For the "xztar" format, you should also perhaps recognize the ".txz" extension
    Done. Google finds a number of matches. I won’t add '.tbz' though, as I don’t know if it’s bzip or bzip2, and as it does not seem used much.

    @serhiy-storchaka
    Copy link
    Member

    I have not seen this issue and was created a new bpo-16313 with almost same patch as Eric's one (only I have changed the documentation too). Here's the patch. I wonder that it was not committed yet to 3.3. I think it would be better first to add xz support and the code cleanup to do then (if it takes so much time).

    @serhiy-storchaka
    Copy link
    Member

    Oh, I see the '.bz2' bug. Patch updated.

    @hynek
    Copy link
    Member

    hynek commented Nov 2, 2012

    Éric, what’s your take on this approach (not code)? We have time enough till 3.4 but it seems this doesn't really move forward. Any thoughts how to get this moving? Unfortunately I'm not invested enough in this to make a educated decision.

    @merwok
    Copy link
    Member

    merwok commented Nov 3, 2012

    I will upload my patch and compare it with Serhiy’s. Now that 3.3.0 is released, there is no hurry.

    @serhiy-storchaka
    Copy link
    Member

    3.4 beta 1 will be soon.

    @pitrou
    Copy link
    Member

    pitrou commented Oct 15, 2013

    Serhiy's patch needs a "versionchanged" or "versionadded" tag in the Docs.

    @serhiy-storchaka
    Copy link
    Member

    Serhiy's patch needs a "versionchanged" or "versionadded" tag in the Docs.

    Done.

    @serhiy-storchaka
    Copy link
    Member

    Now 3.4 is released, but shutil still doesn't support the xz compression. I think we should hurry on to be in time for 3.5.

    @pitrou
    Copy link
    Member

    pitrou commented Aug 4, 2014

    I think we should hurry on to be in time for 3.5.

    Your patch looks ok to me. Just update the version number in the docs ;)

    @serhiy-storchaka
    Copy link
    Member

    Éric?

    @merwok
    Copy link
    Member

    merwok commented Aug 4, 2014

    I’m afraid I changed computers once or twice since I worked on that, so I can’t readily find my clone and get the latest patch. I know where the hard drives are but I can’t say when I will have time to search them.

    @merwok merwok removed their assignment Aug 4, 2014
    @serhiy-storchaka serhiy-storchaka self-assigned this Aug 6, 2014
    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Aug 6, 2014

    New changeset e57db221b6c4 by Serhiy Storchaka in branch 'default':
    Issue bpo-5411: Added support for the "xztar" format in the shutil module.
    http://hg.python.org/cpython/rev/e57db221b6c4

    @pitrou
    Copy link
    Member

    pitrou commented Aug 6, 2014

    Thank you Serhiy! You need to fix the version number in the "versionchanged".

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Aug 6, 2014

    New changeset a47996c10579 by Serhiy Storchaka in branch 'default':
    Issue bpo-5411: Fixed version number.
    http://hg.python.org/cpython/rev/a47996c10579

    @serhiy-storchaka
    Copy link
    Member

    Oh, my fault. Thank you Antoine!

    @4kir4
    Copy link
    Mannequin

    4kir4 mannequin commented Aug 11, 2014

    sphinx generates warning for the current docs introduced by this issue:

    WARNING: Explicit markup ends without a blank line; unexpected unindent.

    I've uploaded a documentation patch that fixes it.

    @berkerpeksag
    Copy link
    Member

    Fixed. Thanks for the patch, Akira.

    http://hg.python.org/cpython/rev/7fcfb634ccca

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    stdlib Python modules in the Lib dir type-feature A feature request or enhancement
    Projects
    None yet
    Development

    No branches or pull requests

    5 participants