Tarfile using fstat on GZip file object #66658

Closed
bartolsthoorn mannequin opened this issue Sep 23, 2014 · 6 comments
Labels
docs (Documentation in the Doc dir), type-bug (An unexpected behavior, bug, or error)

Comments

@bartolsthoorn
Mannequin

bartolsthoorn mannequin commented Sep 23, 2014

BPO 22468
Nosy @gustaebel, @vadmium
Files
  • gettarinfo.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = None
    closed_at = <Date 2016-02-20.00:26:53.311>
    created_at = <Date 2014-09-23.08:49:52.603>
    labels = ['type-bug', 'docs']
    title = 'Tarfile using fstat on GZip file object'
    updated_at = <Date 2016-02-20.00:26:53.308>
    user = 'https://bugs.python.org/bartolsthoorn'

    bugs.python.org fields:

    activity = <Date 2016-02-20.00:26:53.308>
    actor = 'martin.panter'
    assignee = 'docs@python'
    closed = True
    closed_date = <Date 2016-02-20.00:26:53.311>
    closer = 'martin.panter'
    components = ['Documentation']
    creation = <Date 2014-09-23.08:49:52.603>
    creator = 'bartolsthoorn'
    dependencies = []
    files = ['39136']
    hgrepos = []
    issue_num = 22468
    keywords = ['patch']
    message_count = 6.0
    messages = ['227328', '238961', '238967', '241582', '260537', '260541']
    nosy_count = 6.0
    nosy_names = ['lars.gustaebel', 'docs@python', 'BreamoreBoy', 'python-dev', 'martin.panter', 'bartolsthoorn']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue22468'
    versions = ['Python 2.7', 'Python 3.5', 'Python 3.6']

    @bartolsthoorn
    Mannequin Author

    bartolsthoorn mannequin commented Sep 23, 2014

    CPython's tarfile gettarinfo() method uses fstat() to determine the size of a file when it is given a file object. When that file object was created with gzip.open() (so it is a GzipFile), fstat() returns the compressed size of the file. The addfile() method then reads the uncompressed data from the GzipFile but stops after that many bytes, which is too few, so the resulting tar contains truncated files.
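The size mismatch described above can be observed directly. The following is a minimal sketch, assuming a throwaway temporary file; `sizes_for` is a hypothetical helper written for this illustration and is not part of `tarfile` or `gzip`. It compares the size that `os.fstat()` reports for the `GzipFile` descriptor with the number of bytes actually produced by reading it.

```python
import gzip
import os
import tempfile


def sizes_for(payload):
    """Return (fstat_size, decompressed_size) for payload stored as gzip."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sample.txt.gz")
        with gzip.open(path, "wb") as out:
            out.write(payload)
        with gzip.open(path) as f:
            # GzipFile.fileno() is the descriptor of the compressed file on
            # disk, so fstat() reports the compressed size.
            fstat_size = os.fstat(f.fileno()).st_size
            # Reading through the GzipFile yields the uncompressed bytes.
            decompressed_size = len(f.read())
    return fstat_size, decompressed_size


fstat_size, decompressed_size = sizes_for(b"a" * 100000)
print(fstat_size, decompressed_size)
```

For highly compressible input like this, the first number should be much smaller than the second, which is why a tar member sized from fstat() ends up truncated.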

    I suggest checking the file object's class before using fstat to determine the size, and raising a warning if it is a GzipFile.
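One way such a check might look in user code is sketched below. This is only an illustration of the suggestion, not anything `tarfile` does: `gettarinfo_checked` is a hypothetical wrapper, and the choice of `RuntimeWarning` is an assumption.

```python
import gzip
import tarfile
import warnings


def gettarinfo_checked(tar, fileobj, arcname=None):
    """Call tar.gettarinfo(), warning when the fstat() size is unreliable.

    Hypothetical helper: tarfile itself performs no such check.
    """
    if isinstance(fileobj, gzip.GzipFile):
        warnings.warn(
            "fileobj is a GzipFile; TarInfo.size will be the compressed "
            "size and must be corrected before calling addfile()",
            RuntimeWarning,
            stacklevel=2,
        )
    return tar.gettarinfo(arcname=arcname, fileobj=fileobj)
```

The wrapper still returns the TarInfo, so the caller can keep the other attributes and fix up only the size.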

    To clarify, this only happens when adding a GzipFile object to a tar. I know it is not a common scenario, and the underlying problem is that the uncompressed size of a gzip file can only properly be determined by decompressing and reading it entirely, but I think it would be nice not to fail silently.

    Here is an example that fails:

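    import gzip
    import io
    import tarfile

    def make_tar():
        c = io.BytesIO()
        with tarfile.open(mode='w', fileobj=c) as tar:
            for textfile in ['1.txt.gz', '2.txt.gz']:
                with gzip.open(textfile) as f:
                    # gettarinfo() uses fstat(), so tarinfo.size is the compressed size
                    tarinfo = tar.gettarinfo(fileobj=f)
                    # addfile() copies only tarinfo.size bytes of uncompressed data
                    tar.addfile(tarinfo=tarinfo, fileobj=f)
        return c.getvalue()  # read after the archive has been closed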
    

    This version instead reads the correct file size and writes complete files to the tar:

    import gzip
    import io
    import tarfile

    def make_tar():
        c = io.BytesIO()
        with tarfile.open(mode='w', fileobj=c) as tar:
            for textfile in ['1.txt.gz', '2.txt.gz']:
                with gzip.open(textfile) as f:
                    buff = f.read()  # decompress fully to learn the real size
                    tarinfo = tarfile.TarInfo(name=f.name)
                    tarinfo.size = len(buff)
                    tar.addfile(tarinfo=tarinfo, fileobj=io.BytesIO(buff))
        return c.getvalue()  # read after the archive has been closed
    

    @bartolsthoorn bartolsthoorn mannequin added the type-bug An unexpected behavior, bug, or error label Sep 23, 2014
    @BreamoreBoy
    Mannequin

    BreamoreBoy mannequin commented Mar 23, 2015

    msg227328 states "it's not a really common scenario", but I believe we must still allow for it. What do others think?

    @vadmium
    Member

    vadmium commented Mar 23, 2015

    I think a warning in the documentation might be helpful.

    However, a special check in the code doesn’t seem right. Would you check for LZMAFile and BZ2File as well? Some of the other attributes (modification time, owner, etc.) may be useful even for a GzipFile, and the programmer can just overwrite the file size attribute if necessary.
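Overwriting the size attribute could look roughly like the sketch below. It keeps the metadata that gettarinfo() collects and replaces only `size`. The helpers `uncompressed_size` and `add_gzip_member` are hypothetical names for this illustration, and the approach assumes it is acceptable to decompress each file twice (once to count bytes, once to copy them) rather than hold it all in memory.

```python
import gzip
import io
import tarfile


def uncompressed_size(gz):
    """Count decompressed bytes by reading to EOF, then rewind to the start."""
    total = 0
    while True:
        chunk = gz.read(64 * 1024)
        if not chunk:
            break
        total += len(chunk)
    gz.seek(0)
    return total


def add_gzip_member(tar, path, arcname):
    """Add the decompressed contents of a gzip file to an open TarFile."""
    with gzip.open(path) as f:
        # Modification time, owner and mode come from fstat(); size is wrong.
        tarinfo = tar.gettarinfo(arcname=arcname, fileobj=f)
        # Replace the compressed size with the real uncompressed size.
        tarinfo.size = uncompressed_size(f)
        tar.addfile(tarinfo, fileobj=f)
    return tarinfo
```

A call such as `add_gzip_member(tar, '1.txt.gz', '1.txt')` inside a `with tarfile.open(mode='w', fileobj=buf) as tar:` block would then store the full decompressed file.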

    @vadmium
    Member

    vadmium commented Apr 20, 2015

    I am posting a documentation patch which I hope should clarify that objects like GzipFile won’t work automatically with gettarinfo(). It also has other modifications to address bpo-21996 (name must be text) and help with bpo-22208 (clarify non-OS files won’t work).

    @vadmium vadmium added the docs Documentation in the Doc dir label Apr 20, 2015
    @python-dev
    Mannequin

    python-dev mannequin commented Feb 20, 2016

    New changeset 94a94deaf06a by Martin Panter in branch '3.5':
    Issues bpo-22468, bpo-21996, bpo-22208: Clarify gettarinfo() and TarInfo usage
    https://hg.python.org/cpython/rev/94a94deaf06a

    New changeset e66c476b25ec by Martin Panter in branch 'default':
    Issue bpo-22468: Merge gettarinfo() doc from 3.5
    https://hg.python.org/cpython/rev/e66c476b25ec

    New changeset 9d5217aaea13 by Martin Panter in branch '2.7':
    Issues bpo-22468, bpo-21996, bpo-22208: Clarify gettarinfo() and TarInfo usage
    https://hg.python.org/cpython/rev/9d5217aaea13

    @vadmium
    Member

    vadmium commented Feb 20, 2016

    Hoping my clarification in the documentation is enough to call this fixed.

    @vadmium vadmium closed this as completed Feb 20, 2016
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022