New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
tarfile does not handle file .name being an int #65243
Comments
The fact that tempfile.TemporaryFile() has a "name" integer attribute causes weird behavior when interacting with libraries that rely on this attribute being a valid string for file objects. >>> tarfile.open(fileobj=tempfile.TemporaryFile(), mode='w')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/python3.4/tarfile.py", line 1585, in open
return cls.taropen(name, mode, fileobj, **kwargs)
File "/usr/lib/python3.4/tarfile.py", line 1595, in taropen
return cls(name, mode, fileobj, **kwargs)
File "/usr/lib/python3.4/tarfile.py", line 1431, in __init__
self.name = os.path.abspath(name) if name else None
File "/usr/lib/python3.4/posixpath.py", line 360, in abspath
if not isabs(path):
File "/usr/lib/python3.4/posixpath.py", line 64, in isabs
return s.startswith(sep)
AttributeError: 'int' object has no attribute 'startswith' Which is caused by these lines in the "tarfile" module: if name is None and hasattr(fileobj, "name"):
name = fileobj.name If TemporaryFile() didn't have a name attribute, tarfile, which doesn't really need the file name, would simply have continued without errors. I am not aware of any place where this "name" integer attribute is actually useful, and, as a matter of fact, it is not even documented: http://docs.python.org/3.4/library/tempfile.html#tempfile.TemporaryFile |
Alternatively, if the "name" attribute can't be removed, I propose the following diff for the tarfile module: -if name is None and hasattr(fileobj, "name"): :-) |
This name attribute is documented here: http://docs.python.org/3/library/io#io.FileIO.name 3.4 Source: http://hg.python.org/cpython/file/04f714765c13/Modules/_io/fileio.c#l432 In PY2, os.fdopen sets the name to '<fdopen>'. See the related bpo-13781. Here's the workaround in gzip.py: filename = getattr(fileobj, 'name', '')
if not isinstance(filename, (str, bytes)):
filename = '' |
I attached a patch for tarfile with eryksun's suggestion. |
Here's the test case as requested by berkerpeksag in the patch review. |
Isn’t the bug here really that TemporaryFile has a name attribute that’s not a string? I don’t see how an integer name makes sense within the IO system. |
Yes, the bug report was originally titled like this ("TemporaryFile should'nt have a name attribute") but apparently this is a common and expected behavior for FileIO subclasses. |
See the documentation link in msg214670. This isn't a characteristic of TemporaryFile, it's a characteristic of the Python IO system. So you'd have to argue that the documented behavior of the io system is a bug. |
Well, this behavior seems pretty inconsistent to me, it means that every library has to check if the name attribute is actually a string (which would correspond to a path) or an int (that would mean a fd), and I think a "fd" attribute would seem more consistent. |
Ping. The patch is just one line and there's a test case, if someone could review that, it would be great! (For the record I signed the contributor agreement a week ago and my profile still hasn't been updated). |
Does tarfile work with bytes file names? I think a test should explicitly test file objects with str, bytes and int name and file objects without name in all write modes ("w", "w:gz", "w|gz", etc). |
Well, that seems complicated: you can't overwrite a io.FileIO().name attribute, and doing so would be nonsensical for tarfile, which would try to perform IO operations on a random file descriptor... Also, I can't think of any case where a .name attribute could actually be bytes (I was just mirroring the code in msg214670). Here's a patch that tries all combinations of encoding for writing, but I can't see a way to enforce manually the name attribute being an int, even for this test purposes. |
Actually, thinking about it, it seems safer to use os.open() + os.fdopen() than TemporaryFile(), like in the equivalent test for 'gzip'. This new patch replaces last one. |
A FileIO instance uses a dict for 'name' (msg214670): >>> vars(sys.stdin.buffer.raw)
{'name': '<stdin>'}
>>> f = tempfile.TemporaryFile()
>>> vars(f.raw)
{'name': 3} The name is optional meta-information. If it gets deleted, the repr falls back on using the file descriptor: >>> f.raw
<_io.FileIO name=3 mode='rb+'>
>>> del f.raw.name
>>> f.raw
<_io.FileIO fd=3 mode='rb+'> |
I ran into a related issue with the gettarinfo() method. Would that fall under the scope of this bug, or should I raise a separate one? >>> with tarfile.open("/dev/null", "w") as tar:
... with open(b"/bin/sh", "rb") as file:
... tar.gettarinfo(fileobj=file)
...
Traceback (most recent call last):
File "<stdin>", line 3, in <module>
File "/usr/lib/python3.4/tarfile.py", line 1768, in gettarinfo
arcname = arcname.replace(os.sep, "/")
TypeError: expected bytes, bytearray or buffer compatible object I realise that making TarInfo object with a byte string or integer as a file name is not a good idea. Perhaps the documentation should explicitly say that “fileobj.name” must be a real unencoded file name string unless “arcname” is also given. In my particular case I added arcname="", because my code generates the proper file name later on. |
New changeset 308f3c1e36d3 by Serhiy Storchaka in branch '2.7': New changeset d6b71971b228 by Serhiy Storchaka in branch '3.4': New changeset 4c2f3240ad65 by Serhiy Storchaka in branch 'default': |
Committed with rewritten tests. Thank you for your contribution Martin.
Yes, this looks as a separate bug. |
Opened bpo-21996 for the “gettarinfo” method. Also, Serhiy, I think you may have got me mixed up with someone else. I don’t think I did any patches here, so I probably shouldn’t be credited for them :) |
Yeah, but I don't mind if I'm not in the ACKS file for a one-line patch though :P |
This change appears to have broken 2.7 on Windows: http://buildbot.python.org/all/builders/x86%20Windows7%202.7/builds/2707/steps/test/logs/stdio |
This change does not need to be merged on 2.7 anyway, as the os.fdopen sets the name attribute to '<fdopen>' and not to the fd, this check is not required prior to python 3. Still, it would be interesting to investigate why this breaks 2.7 though. |
New changeset 51699f5f5430 by Serhiy Storchaka in branch '2.7': |
Thanks Zachary for pointing to buildbot failure.
Indeed, I either missed you with Antoine Pietri or missed this issue with bpo-19524. In any case thanks you for your activity on the tracker. And thanks Antoine Pietri for his contribution on this issue.
Left open until will investigated this. Will open a new issue if there is a bug here. |
New changeset 825137d0d4ca by Serhiy Storchaka in branch '3.4': New changeset 4fe27263f9d4 by Serhiy Storchaka in branch 'default': |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: