msg214662 - (view) |
Author: Antoine Pietri (antoine.pietri) * |
Date: 2014-03-24 01:32 |
The fact that tempfile.TemporaryFile() has a "name" integer attribute causes weird behavior when interacting with libraries that rely on this attribute being a valid string for file objects.
For instance, it led to this exception with the "tarfile" module, which I resolved by using a NamedTemporaryFile():
>>> tarfile.open(fileobj=tempfile.TemporaryFile(), mode='w')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/python3.4/tarfile.py", line 1585, in open
return cls.taropen(name, mode, fileobj, **kwargs)
File "/usr/lib/python3.4/tarfile.py", line 1595, in taropen
return cls(name, mode, fileobj, **kwargs)
File "/usr/lib/python3.4/tarfile.py", line 1431, in __init__
self.name = os.path.abspath(name) if name else None
File "/usr/lib/python3.4/posixpath.py", line 360, in abspath
if not isabs(path):
File "/usr/lib/python3.4/posixpath.py", line 64, in isabs
return s.startswith(sep)
AttributeError: 'int' object has no attribute 'startswith'
Which is caused by these lines in the "tarfile" module:
if name is None and hasattr(fileobj, "name"):
name = fileobj.name
If TemporaryFile() didn't have a name attribute, tarfile, which doesn't really need the file name, would simply have continued without errors.
I am not aware of any place where this "name" integer attribute is actually useful, and, as a matter of fact, it is not even documented: http://docs.python.org/3.4/library/tempfile.html#tempfile.TemporaryFile
|
msg214663 - (view) |
Author: Antoine Pietri (antoine.pietri) * |
Date: 2014-03-24 01:54 |
Alternatively, if the "name" attribute can't be removed, I propose the following diff for the tarfile module:
-if name is None and hasattr(fileobj, "name"):
+if name is None and hasattr(fileobj, "name") and isinstance(fileobj.name, str):
:-)
|
msg214670 - (view) |
Author: Eryk Sun (eryksun) * |
Date: 2014-03-24 06:15 |
This name attribute is documented here:
http://docs.python.org/3/library/io#io.FileIO.name
3.4 Source:
http://hg.python.org/cpython/file/04f714765c13/Modules/_io/fileio.c#l432
In PY2, os.fdopen sets the name to '<fdopen>'. See the related issue 13781. Here's the workaround in gzip.py:
filename = getattr(fileobj, 'name', '')
if not isinstance(filename, (str, bytes)):
filename = ''
|
msg214685 - (view) |
Author: Antoine Pietri (antoine.pietri) * |
Date: 2014-03-24 11:56 |
I attached a patch for tarfile with eryksun's suggestion.
|
msg214708 - (view) |
Author: Antoine Pietri (antoine.pietri) * |
Date: 2014-03-24 18:53 |
Here's the test case as requested by berkerpeksag in the patch review.
|
msg214754 - (view) |
Author: Éric Araujo (eric.araujo) * |
Date: 2014-03-24 22:57 |
Isn’t the bug here really that TemporaryFile has a name attribute that’s not a string? I don’t see how an integer name makes sense within the IO system.
|
msg214756 - (view) |
Author: Antoine Pietri (antoine.pietri) * |
Date: 2014-03-24 23:08 |
Yes, the bug report was originally titled like this ("TemporaryFile should'nt have a name attribute") but apparently this is a common and expected behavior for FileIO subclasses.
|
msg214757 - (view) |
Author: R. David Murray (r.david.murray) * |
Date: 2014-03-24 23:08 |
See the documentation link in msg214670. This isn't a characteristic of TemporaryFile, it's a characteristic of the Python IO system. So you'd have to argue that the documented behavior of the io system is a bug.
|
msg214760 - (view) |
Author: Antoine Pietri (antoine.pietri) * |
Date: 2014-03-24 23:21 |
Well, this behavior seems pretty inconsistent to me, it means that every library has to check if the name attribute is actually a string (which would correspond to a path) or an int (that would mean a fd), and I think a "fd" attribute would seem more consistent.
But as I don't think we could change this behavior that easily (it means breaking retrocompatibility after all...), we must at least make tarlib deal with that.
|
msg215216 - (view) |
Author: Antoine Pietri (antoine.pietri) * |
Date: 2014-03-31 00:40 |
Ping. The patch is just one line and there's a test case, if someone could review that, it would be great!
(For the record I signed the contributor agreement a week ago and my profile still hasn't been updated).
|
msg215230 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2014-03-31 11:48 |
Does tarfile work with bytes file names?
I think a test should explicitly test file objects with str, bytes and int name and file objects without name in all write modes ("w", "w:gz", "w|gz", etc).
|
msg215626 - (view) |
Author: Antoine Pietri (antoine.pietri) * |
Date: 2014-04-05 20:26 |
Well, that seems complicated: you can't overwrite a io.FileIO().name attribute, and doing so would be nonsensical for tarfile, which would try to perform IO operations on a random file descriptor... Also, I can't think of any case where a .name attribute could actually be bytes (I was just mirroring the code in msg214670).
Here's a patch that tries all combinations of encoding for writing, but I can't see a way to enforce manually the name attribute being an int, even for this test purposes.
|
msg215627 - (view) |
Author: Antoine Pietri (antoine.pietri) * |
Date: 2014-04-05 20:43 |
Actually, thinking about it, it seems safer to use os.open() + os.fdopen() than TemporaryFile(), like in the equivalent test for 'gzip'.
This new patch replaces last one.
|
msg215632 - (view) |
Author: Eryk Sun (eryksun) * |
Date: 2014-04-05 21:01 |
> you can't overwrite a io.FileIO().name attribute
A FileIO instance uses a dict for 'name' (msg214670):
>>> vars(sys.stdin.buffer.raw)
{'name': '<stdin>'}
>>> f = tempfile.TemporaryFile()
>>> vars(f.raw)
{'name': 3}
The name is optional meta-information. If it gets deleted, the repr falls back on using the file descriptor:
>>> f.raw
<_io.FileIO name=3 mode='rb+'>
>>> del f.raw.name
>>> f.raw
<_io.FileIO fd=3 mode='rb+'>
|
msg216984 - (view) |
Author: Martin Panter (martin.panter) * |
Date: 2014-04-22 04:34 |
I ran into a related issue with the gettarinfo() method. Would that fall under the scope of this bug, or should I raise a separate one?
>>> with tarfile.open("/dev/null", "w") as tar:
... with open(b"/bin/sh", "rb") as file:
... tar.gettarinfo(fileobj=file)
...
Traceback (most recent call last):
File "<stdin>", line 3, in <module>
File "/usr/lib/python3.4/tarfile.py", line 1768, in gettarinfo
arcname = arcname.replace(os.sep, "/")
TypeError: expected bytes, bytearray or buffer compatible object
I realise that making TarInfo object with a byte string or integer as a file name is not a good idea. Perhaps the documentation should explicitly say that “fileobj.name” must be a real unencoded file name string unless “arcname” is also given. In my particular case I added arcname="", because my code generates the proper file name later on.
|
msg223275 - (view) |
Author: Roundup Robot (python-dev) |
Date: 2014-07-16 21:06 |
New changeset 308f3c1e36d3 by Serhiy Storchaka in branch '2.7':
Issue 21044: tarfile.open() now handles fileobj with an integer 'name'
http://hg.python.org/cpython/rev/308f3c1e36d3
New changeset d6b71971b228 by Serhiy Storchaka in branch '3.4':
Issue 21044: tarfile.open() now handles fileobj with an integer 'name'
http://hg.python.org/cpython/rev/d6b71971b228
New changeset 4c2f3240ad65 by Serhiy Storchaka in branch 'default':
Issue 21044: tarfile.open() now handles fileobj with an integer 'name'
http://hg.python.org/cpython/rev/4c2f3240ad65
|
msg223280 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2014-07-16 21:18 |
Committed with rewritten tests. Thank you for your contribution Martin.
> I ran into a related issue with the gettarinfo() method. Would that fall under the scope of this bug, or should I raise a separate one?
Yes, this looks as a separate bug.
|
msg223319 - (view) |
Author: Martin Panter (martin.panter) * |
Date: 2014-07-17 07:56 |
Opened Issue 21996 for the “gettarinfo” method.
Also, Serhiy, I think you may have got me mixed up with someone else. I don’t think I did any patches here, so I probably shouldn’t be credited for them :)
|
msg223335 - (view) |
Author: Antoine Pietri (antoine.pietri) * |
Date: 2014-07-17 14:17 |
> Also, Serhiy, I think you may have got me mixed up with someone else. I don’t think I did any patches here, so I probably shouldn’t be credited for them :)
Yeah, but I don't mind if I'm not in the ACKS file for a one-line patch though :P
|
msg223338 - (view) |
Author: Zachary Ware (zach.ware) * |
Date: 2014-07-17 14:32 |
This change appears to have broken 2.7 on Windows:
http://buildbot.python.org/all/builders/x86%20Windows7%202.7/builds/2707/steps/test/logs/stdio
|
msg223345 - (view) |
Author: Antoine Pietri (antoine.pietri) * |
Date: 2014-07-17 16:16 |
This change does not need to be merged on 2.7 anyway, as the os.fdopen sets the name attribute to '<fdopen>' and not to the fd, this check is not required prior to python 3.
Still, it would be interesting to investigate why this breaks 2.7 though.
|
msg223643 - (view) |
Author: Roundup Robot (python-dev) |
Date: 2014-07-22 08:05 |
New changeset 51699f5f5430 by Serhiy Storchaka in branch '2.7':
Backout 308f3c1e36d3. This change (issue21044) does not need to be merged on
http://hg.python.org/cpython/rev/51699f5f5430
|
msg223645 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2014-07-22 08:33 |
Thanks Zachary for pointing to buildbot failure.
> Also, Serhiy, I think you may have got me mixed up with someone else.
Indeed, I either missed you with Antoine Pietri or missed this issue with issue19524. In any case thanks you for your activity on the tracker. And thanks Antoine Pietri for his contribution on this issue.
> Still, it would be interesting to investigate why this breaks 2.7 though.
Left open until will investigated this. Will open a new issue if there is a bug here.
|
msg223742 - (view) |
Author: Roundup Robot (python-dev) |
Date: 2014-07-23 15:42 |
New changeset 825137d0d4ca by Serhiy Storchaka in branch '3.4':
Correct issue #21044 patch author.
http://hg.python.org/cpython/rev/825137d0d4ca
New changeset 4fe27263f9d4 by Serhiy Storchaka in branch 'default':
Correct issue #21044 patch author.
http://hg.python.org/cpython/rev/4fe27263f9d4
|
|
Date |
User |
Action |
Args |
2022-04-11 14:58:00 | admin | set | github: 65243 |
2017-06-01 07:22:24 | antoine.pietri | set | status: open -> closed stage: test needed -> resolved |
2014-09-27 21:48:50 | r.david.murray | set | stage: commit review -> test needed versions:
+ Python 2.7, - Python 3.4, Python 3.5 |
2014-07-23 15:42:11 | python-dev | set | messages:
+ msg223742 |
2014-07-22 08:33:12 | serhiy.storchaka | set | resolution: remind messages:
+ msg223645 |
2014-07-22 08:05:54 | python-dev | set | messages:
+ msg223643 |
2014-07-17 16:16:53 | antoine.pietri | set | messages:
+ msg223345 versions:
- Python 2.7 |
2014-07-17 14:32:01 | zach.ware | set | status: closed -> open
nosy:
+ zach.ware messages:
+ msg223338
resolution: fixed -> (no value) stage: resolved -> commit review |
2014-07-17 14:17:18 | antoine.pietri | set | messages:
+ msg223335 |
2014-07-17 07:56:51 | martin.panter | set | messages:
+ msg223319 |
2014-07-16 21:18:49 | serhiy.storchaka | set | status: open -> closed versions:
+ Python 2.7 messages:
+ msg223280
resolution: fixed stage: test needed -> resolved |
2014-07-16 21:06:59 | python-dev | set | nosy:
+ python-dev messages:
+ msg223275
|
2014-07-16 19:38:04 | serhiy.storchaka | set | assignee: serhiy.storchaka |
2014-07-10 17:43:34 | barry | set | nosy:
+ barry
|
2014-04-22 04:34:31 | martin.panter | set | messages:
+ msg216984 |
2014-04-05 21:01:20 | eryksun | set | messages:
+ msg215632 |
2014-04-05 20:49:54 | ezio.melotti | set | files:
+ test_tarfile_fdopen.diff |
2014-04-05 20:47:41 | ezio.melotti | set | messages:
+ msg215627 |
2014-04-05 20:26:05 | antoine.pietri | set | files:
+ test_tarfile_all_modes.diff
messages:
+ msg215626 |
2014-03-31 11:49:26 | serhiy.storchaka | set | stage: patch review -> test needed |
2014-03-31 11:48:53 | serhiy.storchaka | set | nosy:
+ serhiy.storchaka messages:
+ msg215230
|
2014-03-31 00:59:13 | ned.deily | set | nosy:
+ lars.gustaebel
|
2014-03-31 00:40:05 | antoine.pietri | set | messages:
+ msg215216 |
2014-03-24 23:21:21 | antoine.pietri | set | messages:
+ msg214760 |
2014-03-24 23:08:22 | r.david.murray | set | messages:
+ msg214757 |
2014-03-24 23:08:09 | antoine.pietri | set | messages:
+ msg214756 |
2014-03-24 22:57:54 | eric.araujo | set | nosy:
+ eric.araujo messages:
+ msg214754
|
2014-03-24 18:53:45 | antoine.pietri | set | files:
+ test_tarfile_fobj_int.diff
messages:
+ msg214708 |
2014-03-24 13:10:10 | r.david.murray | set | title: tarfile does not handle file __name__ being an int -> tarfile does not handle file .name being an int |
2014-03-24 13:09:48 | r.david.murray | set | nosy:
+ r.david.murray title: tempfile.TemporaryFile() shouldn't have a name attribute -> tarfile does not handle file __name__ being an int
stage: patch review versions:
+ Python 3.5, - Python 3.3 |
2014-03-24 11:56:53 | antoine.pietri | set | files:
+ tarfile-fileobjname.diff keywords:
+ patch messages:
+ msg214685
|
2014-03-24 06:24:16 | martin.panter | set | nosy:
+ martin.panter
|
2014-03-24 06:15:32 | eryksun | set | nosy:
+ eryksun messages:
+ msg214670
|
2014-03-24 01:54:40 | antoine.pietri | set | messages:
+ msg214663 |
2014-03-24 01:32:57 | antoine.pietri | create | |