classification
Title: tarfile does not handle file .name being an int
Type: behavior Stage: resolved
Components: Library (Lib) Versions: Python 2.7
process
Status: closed Resolution: remind
Dependencies: Superseder:
Assigned To: serhiy.storchaka Nosy List: antoine.pietri, barry, eric.araujo, eryksun, lars.gustaebel, martin.panter, python-dev, r.david.murray, serhiy.storchaka, zach.ware
Priority: normal Keywords: patch

Created on 2014-03-24 01:32 by antoine.pietri, last changed 2017-06-01 07:22 by antoine.pietri. This issue is now closed.

Files
File name Uploaded Description Edit
tarfile-fileobjname.diff antoine.pietri, 2014-03-24 11:56 review
test_tarfile_fobj_int.diff antoine.pietri, 2014-03-24 18:53 Test case for Lib/test/tarfile.py
test_tarfile_all_modes.diff antoine.pietri, 2014-04-05 20:26
test_tarfile_fdopen.diff antoine.pietri, 2014-04-05 20:43
Messages (24)
msg214662 - (view) Author: Antoine Pietri (antoine.pietri) * Date: 2014-03-24 01:32
The fact that tempfile.TemporaryFile() has a "name" integer attribute  causes weird behavior when interacting with libraries that rely on this attribute being a valid string for file objects.
For instance, it led to this exception with the "tarfile" module, which I resolved by using a NamedTemporaryFile():

>>> tarfile.open(fileobj=tempfile.TemporaryFile(), mode='w')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.4/tarfile.py", line 1585, in open
    return cls.taropen(name, mode, fileobj, **kwargs)
  File "/usr/lib/python3.4/tarfile.py", line 1595, in taropen
    return cls(name, mode, fileobj, **kwargs)
  File "/usr/lib/python3.4/tarfile.py", line 1431, in __init__
    self.name = os.path.abspath(name) if name else None
  File "/usr/lib/python3.4/posixpath.py", line 360, in abspath
    if not isabs(path):
  File "/usr/lib/python3.4/posixpath.py", line 64, in isabs
    return s.startswith(sep)
AttributeError: 'int' object has no attribute 'startswith'

Which is caused by these lines in the "tarfile" module:

if name is None and hasattr(fileobj, "name"):
    name = fileobj.name

If TemporaryFile() didn't have a name attribute, tarfile, which doesn't really need the file name, would simply have continued without errors.

I am not aware of any place where this "name" integer attribute is actually useful, and, as a matter of fact, it is not even documented: http://docs.python.org/3.4/library/tempfile.html#tempfile.TemporaryFile
msg214663 - (view) Author: Antoine Pietri (antoine.pietri) * Date: 2014-03-24 01:54
Alternatively, if the "name" attribute can't be removed, I propose the following diff for the tarfile module:

-if name is None and hasattr(fileobj, "name"):
+if name is None and hasattr(fileobj, "name") and isinstance(fileobj.name, str):

:-)
msg214670 - (view) Author: Eryk Sun (eryksun) * Date: 2014-03-24 06:15
This name attribute is documented here:

http://docs.python.org/3/library/io#io.FileIO.name

3.4 Source:

http://hg.python.org/cpython/file/04f714765c13/Modules/_io/fileio.c#l432

In PY2, os.fdopen sets the name to '<fdopen>'. See the related issue 13781. Here's the workaround in gzip.py:

    filename = getattr(fileobj, 'name', '')
    if not isinstance(filename, (str, bytes)):
        filename = ''
msg214685 - (view) Author: Antoine Pietri (antoine.pietri) * Date: 2014-03-24 11:56
I attached a patch for tarfile with eryksun's suggestion.
msg214708 - (view) Author: Antoine Pietri (antoine.pietri) * Date: 2014-03-24 18:53
Here's the test case as requested by berkerpeksag in the patch review.
msg214754 - (view) Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2014-03-24 22:57
Isn’t the bug here really that TemporaryFile has a name attribute that’s not a string?  I don’t see how an integer name makes sense within the IO system.
msg214756 - (view) Author: Antoine Pietri (antoine.pietri) * Date: 2014-03-24 23:08
Yes, the bug report was originally titled like this ("TemporaryFile should'nt have a name attribute") but apparently this is a common and expected behavior for FileIO subclasses.
msg214757 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2014-03-24 23:08
See the documentation link in msg214670.  This isn't a characteristic of TemporaryFile, it's a characteristic of the Python IO system.  So you'd have to argue that the documented behavior of the io system is a bug.
msg214760 - (view) Author: Antoine Pietri (antoine.pietri) * Date: 2014-03-24 23:21
Well, this behavior seems pretty inconsistent to me, it means that every library has to check if the name attribute  is actually a string (which would correspond to a path) or an int (that would mean a fd), and I think a "fd" attribute would seem more consistent.
But as I don't think we could change this behavior that easily (it means breaking retrocompatibility after all...), we must at least make tarlib deal with that.
msg215216 - (view) Author: Antoine Pietri (antoine.pietri) * Date: 2014-03-31 00:40
Ping. The patch is just one line and there's a test case, if someone could review that, it would be great!

(For the record I signed the contributor agreement a week ago and my profile still hasn't been updated).
msg215230 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-03-31 11:48
Does tarfile work with bytes file names?

I think a test should explicitly test file objects with str, bytes and int name and file objects without name in all write modes ("w", "w:gz", "w|gz", etc).
msg215626 - (view) Author: Antoine Pietri (antoine.pietri) * Date: 2014-04-05 20:26
Well, that seems complicated: you can't overwrite a io.FileIO().name attribute, and doing so would be nonsensical for tarfile, which would try to perform IO operations on a random file descriptor... Also, I can't think of any case where a .name attribute could actually be bytes (I was just mirroring the code in msg214670).

Here's a patch that tries all combinations of encoding for writing, but I can't see a way to enforce manually the name attribute being an int, even for this test purposes.
msg215627 - (view) Author: Antoine Pietri (antoine.pietri) * Date: 2014-04-05 20:43
Actually, thinking about it, it seems safer to use os.open() + os.fdopen() than TemporaryFile(), like in the equivalent test for 'gzip'.

This new patch replaces last one.
msg215632 - (view) Author: Eryk Sun (eryksun) * Date: 2014-04-05 21:01
> you can't overwrite a io.FileIO().name attribute

A FileIO instance uses a dict for 'name' (msg214670): 

    >>> vars(sys.stdin.buffer.raw)
    {'name': '<stdin>'}

    >>> f = tempfile.TemporaryFile()
    >>> vars(f.raw)
    {'name': 3}

The name is optional meta-information. If it gets deleted, the repr falls back on using the file descriptor:

    >>> f.raw
    <_io.FileIO name=3 mode='rb+'>

    >>> del f.raw.name
    >>> f.raw
    <_io.FileIO fd=3 mode='rb+'>
msg216984 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2014-04-22 04:34
I ran into a related issue with the gettarinfo() method. Would that fall under the scope of this bug, or should I raise a separate one?

>>> with tarfile.open("/dev/null", "w") as tar:
...     with open(b"/bin/sh", "rb") as file:
...         tar.gettarinfo(fileobj=file)
... 
Traceback (most recent call last):
  File "<stdin>", line 3, in <module>
  File "/usr/lib/python3.4/tarfile.py", line 1768, in gettarinfo
    arcname = arcname.replace(os.sep, "/")
TypeError: expected bytes, bytearray or buffer compatible object

I realise that making TarInfo object with a byte string or integer as a file name is not a good idea. Perhaps the documentation should explicitly say that “fileobj.name” must be a real unencoded file name string unless “arcname” is also given. In my particular case I added arcname="", because my code generates the proper file name later on.
msg223275 - (view) Author: Roundup Robot (python-dev) Date: 2014-07-16 21:06
New changeset 308f3c1e36d3 by Serhiy Storchaka in branch '2.7':
Issue 21044: tarfile.open() now handles fileobj with an integer 'name'
http://hg.python.org/cpython/rev/308f3c1e36d3

New changeset d6b71971b228 by Serhiy Storchaka in branch '3.4':
Issue 21044: tarfile.open() now handles fileobj with an integer 'name'
http://hg.python.org/cpython/rev/d6b71971b228

New changeset 4c2f3240ad65 by Serhiy Storchaka in branch 'default':
Issue 21044: tarfile.open() now handles fileobj with an integer 'name'
http://hg.python.org/cpython/rev/4c2f3240ad65
msg223280 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-07-16 21:18
Committed with rewritten tests. Thank you for your contribution Martin.

> I ran into a related issue with the gettarinfo() method. Would that fall under the scope of this bug, or should I raise a separate one?

Yes, this looks as a separate bug.
msg223319 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2014-07-17 07:56
Opened Issue 21996 for the “gettarinfo” method.

Also, Serhiy, I think you may have got me mixed up with someone else. I don’t think I did any patches here, so I probably shouldn’t be credited for them :)
msg223335 - (view) Author: Antoine Pietri (antoine.pietri) * Date: 2014-07-17 14:17
> Also, Serhiy, I think you may have got me mixed up with someone else. I don’t think I did any patches here, so I probably shouldn’t be credited for them :)

Yeah, but I don't mind if I'm not in the ACKS file for a one-line patch though :P
msg223338 - (view) Author: Zachary Ware (zach.ware) * (Python committer) Date: 2014-07-17 14:32
This change appears to have broken 2.7 on Windows:

http://buildbot.python.org/all/builders/x86%20Windows7%202.7/builds/2707/steps/test/logs/stdio
msg223345 - (view) Author: Antoine Pietri (antoine.pietri) * Date: 2014-07-17 16:16
This change does not need to be merged on 2.7 anyway, as the os.fdopen sets the name attribute to '<fdopen>' and not to the fd, this check is not required prior to python 3.

Still, it would be interesting to investigate why this breaks 2.7 though.
msg223643 - (view) Author: Roundup Robot (python-dev) Date: 2014-07-22 08:05
New changeset 51699f5f5430 by Serhiy Storchaka in branch '2.7':
Backout 308f3c1e36d3.  This change (issue21044) does not need to be merged on
http://hg.python.org/cpython/rev/51699f5f5430
msg223645 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-07-22 08:33
Thanks Zachary for pointing to buildbot failure.

> Also, Serhiy, I think you may have got me mixed up with someone else.

Indeed, I either missed you with Antoine Pietri or missed this issue with issue19524. In any case thanks you for your activity on the tracker. And thanks Antoine Pietri for his contribution on this issue.

> Still, it would be interesting to investigate why this breaks 2.7 though.

Left open until will investigated this. Will open a new issue if there is a bug here.
msg223742 - (view) Author: Roundup Robot (python-dev) Date: 2014-07-23 15:42
New changeset 825137d0d4ca by Serhiy Storchaka in branch '3.4':
Correct issue #21044 patch author.
http://hg.python.org/cpython/rev/825137d0d4ca

New changeset 4fe27263f9d4 by Serhiy Storchaka in branch 'default':
Correct issue #21044 patch author.
http://hg.python.org/cpython/rev/4fe27263f9d4
History
Date User Action Args
2017-06-01 07:22:24antoine.pietrisetstatus: open -> closed
stage: test needed -> resolved
2014-09-27 21:48:50r.david.murraysetstage: commit review -> test needed
versions: + Python 2.7, - Python 3.4, Python 3.5
2014-07-23 15:42:11python-devsetmessages: + msg223742
2014-07-22 08:33:12serhiy.storchakasetresolution: remind
messages: + msg223645
2014-07-22 08:05:54python-devsetmessages: + msg223643
2014-07-17 16:16:53antoine.pietrisetmessages: + msg223345
versions: - Python 2.7
2014-07-17 14:32:01zach.waresetstatus: closed -> open

nosy: + zach.ware
messages: + msg223338

resolution: fixed -> (no value)
stage: resolved -> commit review
2014-07-17 14:17:18antoine.pietrisetmessages: + msg223335
2014-07-17 07:56:51martin.pantersetmessages: + msg223319
2014-07-16 21:18:49serhiy.storchakasetstatus: open -> closed
versions: + Python 2.7
messages: + msg223280

resolution: fixed
stage: test needed -> resolved
2014-07-16 21:06:59python-devsetnosy: + python-dev
messages: + msg223275
2014-07-16 19:38:04serhiy.storchakasetassignee: serhiy.storchaka
2014-07-10 17:43:34barrysetnosy: + barry
2014-04-22 04:34:31martin.pantersetmessages: + msg216984
2014-04-05 21:01:20eryksunsetmessages: + msg215632
2014-04-05 20:49:54ezio.melottisetfiles: + test_tarfile_fdopen.diff
2014-04-05 20:47:41ezio.melottisetmessages: + msg215627
2014-04-05 20:26:05antoine.pietrisetfiles: + test_tarfile_all_modes.diff

messages: + msg215626
2014-03-31 11:49:26serhiy.storchakasetstage: patch review -> test needed
2014-03-31 11:48:53serhiy.storchakasetnosy: + serhiy.storchaka
messages: + msg215230
2014-03-31 00:59:13ned.deilysetnosy: + lars.gustaebel
2014-03-31 00:40:05antoine.pietrisetmessages: + msg215216
2014-03-24 23:21:21antoine.pietrisetmessages: + msg214760
2014-03-24 23:08:22r.david.murraysetmessages: + msg214757
2014-03-24 23:08:09antoine.pietrisetmessages: + msg214756
2014-03-24 22:57:54eric.araujosetnosy: + eric.araujo
messages: + msg214754
2014-03-24 18:53:45antoine.pietrisetfiles: + test_tarfile_fobj_int.diff

messages: + msg214708
2014-03-24 13:10:10r.david.murraysettitle: tarfile does not handle file __name__ being an int -> tarfile does not handle file .name being an int
2014-03-24 13:09:48r.david.murraysetnosy: + r.david.murray
title: tempfile.TemporaryFile() shouldn't have a name attribute -> tarfile does not handle file __name__ being an int

stage: patch review
versions: + Python 3.5, - Python 3.3
2014-03-24 11:56:53antoine.pietrisetfiles: + tarfile-fileobjname.diff
keywords: + patch
messages: + msg214685
2014-03-24 06:24:16martin.pantersetnosy: + martin.panter
2014-03-24 06:15:32eryksunsetnosy: + eryksun
messages: + msg214670
2014-03-24 01:54:40antoine.pietrisetmessages: + msg214663
2014-03-24 01:32:57antoine.pietricreate