Support tarfile.PAX_FORMAT in shutil.make_archive #74846

Closed
ncoghlan opened this issue Jun 14, 2017 · 9 comments
Labels
3.8 (only security fixes), docs (Documentation in the Doc dir), type-feature (A feature request or enhancement)

Comments

@ncoghlan
Contributor

BPO 30661
Nosy @ncoghlan, @gustaebel, @CAM-Gerlach
PRs
  • bpo-36268: Change default tar format to pax from GNU  #12355
  • bpo-30661: Improve doc for tarfile pax change and effect on shutil #12635
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = None
    closed_at = <Date 2019-04-07.04:50:17.291>
    created_at = <Date 2017-06-14.01:25:39.175>
    labels = ['type-feature', '3.8', 'docs']
    title = 'Support tarfile.PAX_FORMAT in shutil.make_archive'
    updated_at = <Date 2019-04-07.04:50:17.290>
    user = 'https://github.com/ncoghlan'

    bugs.python.org fields:

    activity = <Date 2019-04-07.04:50:17.290>
    actor = 'ncoghlan'
    assignee = 'docs@python'
    closed = True
    closed_date = <Date 2019-04-07.04:50:17.291>
    closer = 'ncoghlan'
    components = ['Documentation']
    creation = <Date 2017-06-14.01:25:39.175>
    creator = 'ncoghlan'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 30661
    keywords = ['patch']
    message_count = 9.0
    messages = ['295974', '295976', '338021', '339192', '339218', '339300', '339305', '339555', '339556']
    nosy_count = 4.0
    nosy_names = ['ncoghlan', 'lars.gustaebel', 'docs@python', 'CAM-Gerlach']
    pr_nums = ['12355', '12635']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue30661'
    versions = ['Python 3.8']

    @ncoghlan
    Contributor Author

    shutil.make_archive currently just uses the default tar format, which is GNU_FORMAT.

    This format doesn't ensure that all character paths are encoded as UTF-8, and hence may end up embedding platform specific encoding assumptions into the generated tarball.
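A rough illustration of the encoding concern, assuming a made-up member name `café.txt`: `TarInfo.tobuf()` returns the raw header bytes for a given format, so it shows how the same name is stored under different `encoding` settings. This is only a sketch of the difference, not what shutil does internally.

```python
import tarfile

# Hypothetical member name containing a non-ASCII character.
info = tarfile.TarInfo("café.txt")

# GNU_FORMAT writes the name using whatever encoding the writer happens to use.
gnu_latin1 = info.tobuf(tarfile.GNU_FORMAT, encoding="latin-1")
gnu_utf8 = info.tobuf(tarfile.GNU_FORMAT, encoding="utf-8")

# PAX_FORMAT stores non-ASCII names in an extended header record as UTF-8,
# regardless of the encoding argument.
pax_latin1 = info.tobuf(tarfile.PAX_FORMAT, encoding="latin-1")
pax_utf8 = info.tobuf(tarfile.PAX_FORMAT, encoding="utf-8")

print(b"caf\xe9.txt" in gnu_latin1)            # Latin-1 bytes in the GNU header
print(b"caf\xc3\xa9.txt" in gnu_utf8)          # UTF-8 bytes in the GNU header
print(b"path=caf\xc3\xa9.txt\n" in pax_latin1)  # UTF-8 pax record either way
print(b"path=caf\xc3\xa9.txt\n" in pax_utf8)
```

The GNU header bytes differ depending on the writer's encoding, while the pax record is the same in both cases.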

    I see a few possible ways of resolving this:

    1. Change the default tar format to PAX_FORMAT. It's been 16 years since that was defined, and Python itself has supported it since 2.6 was released in 2008, so perhaps we can rely on other tools supporting it now? (My main open question on that front would be: what happens if you specify "format=GNU_FORMAT" when attempting to read a PAX-formatted archive?)

    2. Add new shutil level "pax", "gzpax", "bzpax", "xzpax" format definitions to explicitly request PAX_FORMAT

    3. Add a mechanism to shutil.make_archive that allows format-dependent settings to be passed down to the underlying archive creation functions (e.g. "format=tarfile.PAX_FORMAT").

    @ncoghlan added the 3.7 (EOL) and type-feature labels on Jun 14, 2017
    @ncoghlan
    Contributor Author

    The main benefit I'd see to the last option is that it would also cover passing a "filter" option for tarfile.TarFile.add(). Dropping down to the lower level API for that isn't *hard*, it's just a bit fiddly (note: currently untested example code):

       import os
       import tarfile
       with tarfile.open(sdist_path, "w:gz", format=tarfile.PAX_FORMAT) as sdist:  # closes the archive on exit
           sdist.add(os.getcwd(), arcname=sdist_subdir, filter=_exclude_hidden_and_special_files)
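For anyone wanting to try the lower-level approach, here is a self-contained sketch. The filter body and the `build_sdist` helper are hypothetical: the comment above only names `_exclude_hidden_and_special_files`, so the exclusion rules below (dotfiles, plus anything that is not a regular file or directory) are an assumption.

```python
import os
import tarfile


def _exclude_hidden_and_special_files(tarinfo):
    """Hypothetical filter: return None to skip a member, or the TarInfo to keep it."""
    if os.path.basename(tarinfo.name).startswith("."):
        return None  # dotfile or dot-directory (a skipped directory is not recursed into)
    if not (tarinfo.isreg() or tarinfo.isdir()):
        return None  # symlinks, devices, FIFOs, etc.
    return tarinfo


def build_sdist(source_dir, sdist_path, sdist_subdir):
    """Write source_dir into a gzip-compressed pax-format tarball under sdist_subdir."""
    with tarfile.open(sdist_path, "w:gz", format=tarfile.PAX_FORMAT) as sdist:
        sdist.add(source_dir, arcname=sdist_subdir,
                  filter=_exclude_hidden_and_special_files)
    return sdist_path
```

Calling `build_sdist(os.getcwd(), "dist/pkg-1.0.tar.gz", "pkg-1.0")` (paths made up) would mirror the two-line example. Note that the filter sees the archive name, so an `arcname` that itself starts with a dot would exclude everything.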

    @CAM-Gerlach
    Member

    FYI, GH-12355 will implement pax as default, as discussed in bpo-36268, which should be equivalent to option 1 here, thus also resolving this issue. Could you confirm that this is the case, and do you have any other comments on the change? Thanks!

    @CAM-Gerlach added the stdlib and 3.8 labels and removed the 3.7 (EOL) label on Mar 15, 2019
    @ncoghlan
    Contributor Author

    Aye, I agree that changing the default resolves the feature request here. I've recategorised this as a documentation issue, as the initial PR only changed the tarfile documentation, so the impact on shutil isn't obvious.

    So the changes needed will be:

    • add a "What's New" entry for shutil, noting that shutil.make_archive inherited the change in default archive format from tarfile
    • corresponding "version changed" note in the shutil.make_archive documentation
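One way to observe the inherited default, sketched under a few assumptions: the helper name and file layout are made up, and a file name longer than 100 characters is used because such a name needs a pax `path` record when the archive is written in pax format, while a GNU-format archive carries no pax headers at all.

```python
import os
import shutil
import tarfile
import tempfile


def archive_uses_pax_headers(workdir):
    """Build a tarball with shutil.make_archive and report whether a long-named
    member carries a pax 'path' record (hypothetical helper for illustration)."""
    src = os.path.join(workdir, "src")
    os.makedirs(src)
    long_name = "n" * 120 + ".txt"  # too long for the classic 100-byte name field
    with open(os.path.join(src, long_name), "w") as f:
        f.write("payload")

    # No tar format can be passed here; make_archive relies on tarfile's default.
    archive_path = shutil.make_archive(os.path.join(workdir, "out"), "gztar",
                                       root_dir=src)

    with tarfile.open(archive_path) as tar:
        member = next(m for m in tar.getmembers() if m.name.endswith(long_name))
        return "path" in member.pax_headers


with tempfile.TemporaryDirectory() as workdir:
    uses_pax = archive_uses_pax_headers(workdir)

print(tarfile.DEFAULT_FORMAT == tarfile.PAX_FORMAT, uses_pax)
```

The two printed values should agree: both true where the tarfile default is pax, both false where it is still GNU.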

    An addition to the "Porting" section in What's New may also be needed, depending on how tarfile.TarFile behaves if you tell it to open a PAX_FORMAT archive using GNU_FORMAT or vice versa (tarfile.open and shutil.unpack_archive will be fine, since they query the file's own metadata to find out which format to use).

    @ncoghlan added the docs label and removed the stdlib label on Mar 30, 2019
    @CAM-Gerlach
    Member

    I opened a PR to implement both those changes, and also added some minor related clarifications and fixes to the format section of the tarfile docs.

    > how tarfile.TarFile behaves if you tell it to open a PAX_FORMAT archive using GNU_FORMAT or vice versa

    I tested tarfile.TarFile() and extractall() on the resulting object with several real-world pax- and GNU-format archives of simple to moderate complexity (including Unicode filenames), packed with different archivers, using both format=GNU_FORMAT and format=PAX_FORMAT for each one. With debug=3 and errorlevel=2 I got no warnings or errors, extraction succeeded and yielded identical results for either format argument, and no PAXHEADERS file was output in either case. Furthermore, tracing the code, it's not clear that TarFile() (with 'r'), extract(), etc. use the passed format at all.

    Even if so, in order to produce an error after this change but not before, all of the following would seem to have to be the case:

    • The tarfile being read would have to be in GNU format, i.e. from an external source or produced with an older version of Python
    • The tarfile would have to make use of specific extended/non-standard GNU tar features not tested above
    • The user would have to use TarFile() to open the tarfile, rather than one of the other, more common/higher-level methods
    • The user's call to TarFile() would have to have used DEFAULT_FORMAT rather than specifying the format explicitly, and so implicitly relied on DEFAULT_FORMAT == GNU_FORMAT

    Therefore, this seems like a very specific corner-case. However, if you think I should include it, I'll go ahead with it. Also, let me know if these doc changes should have a separate NEWS entry or the previous one adequately covers it.
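The read-side check described in this comment can be approximated with in-memory archives. This is a minimal sketch rather than the actual test that was run: the member names are invented, the archives are produced by tarfile itself rather than by external archivers, and `debug=3` is left out to keep stderr quiet.

```python
import io
import tarfile

# Invented member names: plain ASCII, non-ASCII, and longer than 100 characters.
NAMES = ("plain.txt", "unicodé/файл.txt", "d/" + "x" * 150 + ".txt")
FORMATS = (tarfile.GNU_FORMAT, tarfile.PAX_FORMAT)


def build_archive(fmt):
    """Return the bytes of an uncompressed archive written in the given format."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=fmt, encoding="utf-8") as tar:
        for name in NAMES:
            data = name.encode("utf-8")
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def read_archive(raw, fmt):
    """Open the archive via the TarFile constructor, passing a format argument."""
    with tarfile.TarFile(fileobj=io.BytesIO(raw), mode="r", format=fmt,
                         encoding="utf-8", errorlevel=2) as tar:
        return [(m.name, tar.extractfile(m).read()) for m in tar.getmembers()]


# Every (written format, format argument on read) combination.
results = {(w, r): read_archive(build_archive(w), r)
           for w in FORMATS for r in FORMATS}
print(all(members == results[FORMATS[0], FORMATS[0]]
          for members in results.values()))
```

If the format argument mattered when reading, the mismatched combinations would differ from the matched ones or raise under `errorlevel=2`.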

    @gustaebel
    Mannequin

    gustaebel mannequin commented Apr 1, 2019

    tarfile does not use the format argument for reading; the format is detected automatically. You can even mix different formats in one archive and tarfile will be fine with it.
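A small sketch of the mixed-format case, assuming hand-assembled archive bytes: each member's header is produced with `TarInfo.tobuf()` in a different format, the data is padded to the 512-byte block size, and two zero blocks mark the end. The member names and the `member_bytes` helper are invented for illustration.

```python
import io
import tarfile

BLOCK = 512  # tar block size


def member_bytes(name, data, fmt):
    """Header block(s) in the given format, followed by data padded to a full block."""
    info = tarfile.TarInfo(name)
    info.size = len(data)
    header = info.tobuf(fmt, "utf-8", "surrogateescape")
    padding = b"\0" * (-len(data) % BLOCK)
    return header + data + padding


long_name = "gnu/" + "g" * 150 + ".txt"  # forces a GNU long-name header
raw = (
    member_bytes(long_name, b"gnu data", tarfile.GNU_FORMAT)
    + member_bytes("pax/café.txt", b"pax data", tarfile.PAX_FORMAT)
    + member_bytes("ustar.txt", b"ustar data", tarfile.USTAR_FORMAT)
    + b"\0" * (2 * BLOCK)  # end-of-archive marker
)

with tarfile.open(fileobj=io.BytesIO(raw), mode="r:", encoding="utf-8") as tar:
    members = [(m.name, tar.extractfile(m).read()) for m in tar.getmembers()]

for name, data in members:
    print(name, data)
```

No format is given when opening, yet each member is parsed according to its own header type.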

    @CAM-Gerlach
    Member

    Thanks for the confirmation!

    @ncoghlan
    Contributor Author

    ncoghlan commented Apr 7, 2019

    New changeset 89a8944 by Nick Coghlan (CAM Gerlach) in branch 'master':
    bpo-30661: Improve docs for tarfile pax change and effect on shutil (GH-12635)

    @ncoghlan
    Contributor Author

    ncoghlan commented Apr 7, 2019

    Thanks for the technical clarification Lars, and for the docs update C.A.M.

    @ncoghlan ncoghlan closed this as completed Apr 7, 2019
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022