gettarinfo method does not handle files without text string names #66195

Closed
vadmium opened this issue Jul 17, 2014 · 4 comments
Labels
docs Documentation in the Doc dir stdlib Python modules in the Lib dir

Comments

@vadmium
Member

vadmium commented Jul 17, 2014

BPO 21996
Nosy @bitdancer, @vadmium, @serhiy-storchaka
Dependencies
  • bpo-22468: Tarfile using fstat on GZip file object
  Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    GitHub fields:

    assignee = None
    closed_at = <Date 2016-02-20.00:27:16.450>
    created_at = <Date 2014-07-17.07:52:08.286>
    labels = ['library', 'docs']
    title = 'gettarinfo method does not handle files without text string names'
    updated_at = <Date 2016-02-20.00:27:16.448>
    user = 'https://github.com/vadmium'

    bugs.python.org fields:

    activity = <Date 2016-02-20.00:27:16.448>
    actor = 'martin.panter'
    assignee = 'docs@python'
    closed = True
    closed_date = <Date 2016-02-20.00:27:16.450>
    closer = 'martin.panter'
    components = ['Documentation', 'Library (Lib)']
    creation = <Date 2014-07-17.07:52:08.286>
    creator = 'martin.panter'
    dependencies = ['22468']
    files = []
    hgrepos = []
    issue_num = 21996
    keywords = []
    message_count = 4.0
    messages = ['223318', '223479', '241584', '260538']
    nosy_count = 5.0
    nosy_names = ['r.david.murray', 'docs@python', 'python-dev', 'martin.panter', 'serhiy.storchaka']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = None
    url = 'https://bugs.python.org/issue21996'
    versions = ['Python 2.7', 'Python 3.5', 'Python 3.6']

    @vadmium
    Member Author

    vadmium commented Jul 17, 2014

    It looks like if you pass a “fileobj” argument to “gettarinfo”, it assumes it can use the “name” as a text string.

    >>> import tarfile
    >>> with tarfile.open("/dev/null", "w") as tar, open("/bin/sh", "rb") as file: tar.gettarinfo(fileobj=file)
    ... 
    <TarInfo 'bin/sh' at 0x7f13cc937f20>
    >>> with tarfile.open("/dev/null", "w") as tar, open(b"/bin/sh", "rb") as file: tar.gettarinfo(fileobj=file)
    ... 
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/media/disk/home/proj/python/cpython/Lib/tarfile.py", line 1767, in gettarinfo
        arcname = arcname.replace(os.sep, "/")
    TypeError: expected bytes, bytearray or buffer compatible object
    >>> with tarfile.open("/dev/null", "w") as tar, open(0, "rb", closefd=False) as file: tar.gettarinfo(fileobj=file)
    ... 
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/media/disk/home/proj/python/cpython/Lib/tarfile.py", line 1766, in gettarinfo
        drv, arcname = os.path.splitdrive(arcname)
      File "Lib/posixpath.py", line 133, in splitdrive
        return p[:0], p
    TypeError: 'int' object is not subscriptable

    In my case, my code always sets the final TarInfo.name attribute later on, so the initial name does not matter. Perhaps at least the documentation should say that “fileobj.name” must be a real unencoded file name string unless “arcname” is also given. My workaround was to add a dummy arcname argument, a bit like this:

    # Explicit dummy name to avoid using file name of bytes
    tarinfo = self.tar.gettarinfo(fileobj=file, arcname="")
    # . . .
    tarinfo.name = "{}/{}".format(self.pkgname, name)
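The workaround above can be sketched as a self-contained script. This is only an illustration under stated assumptions: the helper `add_with_explicit_name`, the `demo` function, the placeholder arcname, and the member names `pkg/from_bytes` and `pkg/from_fd` are hypothetical and not part of the original report. It opens the same temporary file once by bytes path and once by file descriptor, so `fileobj.name` is a bytes object in the first case and an integer in the second, and passes an explicit `arcname` in both cases so that the name is not consulted.

```python
import io
import os
import tarfile
import tempfile


def add_with_explicit_name(tar, fileobj, member_name):
    """Add fileobj to tar under member_name without relying on fileobj.name."""
    # Hypothetical helper: the placeholder arcname is given so gettarinfo()
    # does not fall back to fileobj.name, which may be bytes or an integer.
    tarinfo = tar.gettarinfo(fileobj=fileobj, arcname="placeholder")
    tarinfo.name = member_name  # set the real member name afterwards
    tar.addfile(tarinfo, fileobj)
    return tarinfo


def demo():
    """Archive one file twice: opened by bytes path and by file descriptor."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "data.bin")
        with open(path, "wb") as f:
            f.write(b"hello")

        buffer = io.BytesIO()
        with tarfile.open(fileobj=buffer, mode="w") as tar:
            # Here f.name is a bytes object.
            with open(os.fsencode(path), "rb") as f:
                add_with_explicit_name(tar, f, "pkg/from_bytes")
            # Here f.name is an integer file descriptor.
            fd = os.open(path, os.O_RDONLY)
            with open(fd, "rb") as f:
                add_with_explicit_name(tar, f, "pkg/from_fd")

        # Read the archive back to check both members were stored.
        buffer.seek(0)
        with tarfile.open(fileobj=buffer, mode="r") as tar:
            return {m.name: tar.extractfile(m).read() for m in tar.getmembers()}


if __name__ == "__main__":
    print(demo())
```

Any text string works as the placeholder, including the empty string used in the snippet above, because the name is overwritten before the member is written.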

    @vadmium vadmium added the stdlib Python modules in the Lib dir label Jul 17, 2014
    @bitdancer
    Member

    Agreed, the documentation should be modified to say "(using os.fstat on its file descriptor, and its 'name' attribute if arcname is not specified)".
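To illustrate the behaviour that the proposed wording describes, here is a small sketch for the ordinary case where the file is opened with a text string path. The function names `expected_member_name` and `inspect_fileobj_metadata` are hypothetical. The name derivation is an assumption based on the `splitdrive` and `replace` calls visible in the tracebacks above and on the `'bin/sh'` result shown for `/bin/sh`; it is not a statement of the documented contract.

```python
import io
import os
import tarfile
import tempfile


def expected_member_name(path):
    """Assumed default name: drop the drive, use "/" and strip leading slashes."""
    _, tail = os.path.splitdrive(path)
    return tail.replace(os.sep, "/").lstrip("/")


def inspect_fileobj_metadata():
    """Compare the TarInfo fields with os.fstat() and the file object's name."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "example.txt")
        with open(path, "wb") as f:
            f.write(b"twelve bytes")

        with tarfile.open(fileobj=io.BytesIO(), mode="w") as tar:
            with open(path, "rb") as f:
                # No arcname given, so the name comes from f.name.
                tarinfo = tar.gettarinfo(fileobj=f)
                stat_size = os.fstat(f.fileno()).st_size

        return tarinfo.name, expected_member_name(path), tarinfo.size, stat_size


if __name__ == "__main__":
    print(inspect_fileobj_metadata())
```

The size reported in the TarInfo should match the `os.fstat` result, while the member name is taken from the `name` attribute only because no `arcname` was passed.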

    @bitdancer bitdancer added the docs Documentation in the Doc dir label Jul 19, 2014
    @vadmium
    Member Author

    vadmium commented Apr 20, 2015

    Over in bpo-22468, I posted a documentation patch which includes wording to address this bug.

    @python-dev
    Mannequin

    python-dev mannequin commented Feb 20, 2016

    New changeset 94a94deaf06a by Martin Panter in branch '3.5':
    Issues bpo-22468, bpo-21996, bpo-22208: Clarify gettarinfo() and TarInfo usage
    https://hg.python.org/cpython/rev/94a94deaf06a

    New changeset 9d5217aaea13 by Martin Panter in branch '2.7':
    Issues bpo-22468, bpo-21996, bpo-22208: Clarify gettarinfo() and TarInfo usage
    https://hg.python.org/cpython/rev/9d5217aaea13

    @vadmium vadmium closed this as completed Feb 20, 2016
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022