classification
Title: gettarinfo method does not handle files without text string names
Type:
Stage: resolved
Components: Documentation, Library (Lib)
Versions: Python 3.6, Python 3.5, Python 2.7
process
Status: closed
Resolution: fixed
Dependencies: 22468
Superseder:
Assigned To: docs@python
Nosy List: docs@python, martin.panter, python-dev, r.david.murray, serhiy.storchaka
Priority: normal
Keywords:

Created on 2014-07-17 07:52 by martin.panter, last changed 2016-02-20 00:27 by martin.panter. This issue is now closed.

Messages (4)
msg223318 - Author: Martin Panter (martin.panter) * (Python committer) Date: 2014-07-17 07:52
It looks like if you pass a “fileobj” argument to “gettarinfo”, it assumes it can use the “name” as a text string.

>>> import tarfile
>>> with tarfile.open("/dev/null", "w") as tar, open("/bin/sh", "rb") as file: tar.gettarinfo(fileobj=file)
... 
<TarInfo 'bin/sh' at 0x7f13cc937f20>
>>> with tarfile.open("/dev/null", "w") as tar, open(b"/bin/sh", "rb") as file: tar.gettarinfo(fileobj=file)
... 
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/media/disk/home/proj/python/cpython/Lib/tarfile.py", line 1767, in gettarinfo
    arcname = arcname.replace(os.sep, "/")
TypeError: expected bytes, bytearray or buffer compatible object
>>> with tarfile.open("/dev/null", "w") as tar, open(0, "rb", closefd=False) as file: tar.gettarinfo(fileobj=file)
... 
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/media/disk/home/proj/python/cpython/Lib/tarfile.py", line 1766, in gettarinfo
    drv, arcname = os.path.splitdrive(arcname)
  File "Lib/posixpath.py", line 133, in splitdrive
    return p[:0], p
TypeError: 'int' object is not subscriptable

In my case, my code always sets the final TarInfo.name attribute later on, so the initial name does not matter. Perhaps at least the documentation should say that “fileobj.name” must be a real unencoded file name string unless “arcname” is also given. My workaround was to add a dummy arcname argument, a bit like this:

# Explicit dummy arcname so that gettarinfo() does not try to use the bytes file name
tarinfo = self.tar.gettarinfo(fileobj=file, arcname="")
# . . .
tarinfo.name = "{}/{}".format(self.pkgname, name)
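The workaround above can be sketched as a self-contained script. This is only an illustration under stated assumptions: the helper name add_with_explicit_name, the "placeholder" arcname, and the member name "pkg/payload.bin" are all hypothetical and do not come from the original code.

```python
import io
import os
import tarfile
import tempfile


def add_with_explicit_name(tar, fileobj, member_name):
    """Add fileobj to tar as member_name (hypothetical helper).

    Passing any explicit arcname keeps gettarinfo() from falling back to
    fileobj.name, which may be bytes or an integer file descriptor.
    """
    tarinfo = tar.gettarinfo(fileobj=fileobj, arcname="placeholder")
    # The placeholder is overwritten with the real member name here.
    tarinfo.name = member_name
    tar.addfile(tarinfo, fileobj)
    return tarinfo


with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "payload.bin")
    with open(path, "wb") as f:
        f.write(b"hello")

    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as tar:
        # Opening by a bytes path gives a file object whose .name is bytes.
        with open(os.fsencode(path), "rb") as source:
            add_with_explicit_name(tar, source, "pkg/payload.bin")

    # Read the archive back to check the member name and contents.
    buffer.seek(0)
    with tarfile.open(fileobj=buffer, mode="r") as tar:
        member_names = tar.getnames()
        payload = tar.extractfile("pkg/payload.bin").read()
```

The file metadata (size, mode, timestamps) still comes from os.fstat on the file descriptor. Only the name lookup is bypassed.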
msg223479 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2014-07-19 21:06
Agreed, the documentation should be modified to say "(using os.fstat on its file descriptor, and its 'name' attribute if arcname is not specified)".
msg241584 - Author: Martin Panter (martin.panter) * (Python committer) Date: 2015-04-20 01:28
Over in Issue 22468, I posted a documentation patch which includes wording to address this bug.
msg260538 - Author: Roundup Robot (python-dev) (Python triager) Date: 2016-02-20 00:18
New changeset 94a94deaf06a by Martin Panter in branch '3.5':
Issues #22468, #21996, #22208: Clarify gettarinfo() and TarInfo usage
https://hg.python.org/cpython/rev/94a94deaf06a

New changeset 9d5217aaea13 by Martin Panter in branch '2.7':
Issues #22468, #21996, #22208: Clarify gettarinfo() and TarInfo usage
https://hg.python.org/cpython/rev/9d5217aaea13
History
Date | User | Action | Args
2016-02-20 00:27:16 | martin.panter | set | status: open -> closed; stage: needs patch -> resolved; resolution: fixed; versions: + Python 3.6, - Python 3.4
2016-02-20 00:18:58 | python-dev | set | nosy: + python-dev; messages: + msg260538
2016-02-09 23:04:35 | martin.panter | set | dependencies: + Tarfile using fstat on GZip file object
2015-04-20 01:28:31 | martin.panter | set | messages: + msg241584
2014-07-19 21:06:45 | r.david.murray | set | assignee: docs@python; components: + Documentation; versions: + Python 2.7; nosy: + r.david.murray, docs@python; messages: + msg223479; stage: needs patch
2014-07-17 16:26:26 | berker.peksag | set | nosy: + serhiy.storchaka; versions: + Python 3.5
2014-07-17 07:52:08 | martin.panter | create |