Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tarfile does not handle file .name being an int #65243

Closed
seirl mannequin opened this issue Mar 24, 2014 · 24 comments
Closed

tarfile does not handle file .name being an int #65243

seirl mannequin opened this issue Mar 24, 2014 · 24 comments
Assignees
Labels
stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error

Comments

@seirl
Copy link
Mannequin

seirl mannequin commented Mar 24, 2014

BPO 21044
Nosy @warsaw, @gustaebel, @merwok, @bitdancer, @vadmium, @zware, @serhiy-storchaka, @eryksun, @seirl
Files
  • tarfile-fileobjname.diff
  • test_tarfile_fobj_int.diff: Test case for Lib/test/tarfile.py
  • test_tarfile_all_modes.diff
  • test_tarfile_fdopen.diff
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/serhiy-storchaka'
    closed_at = <Date 2017-06-01.07:22:24.510>
    created_at = <Date 2014-03-24.01:32:57.313>
    labels = ['type-bug', 'library']
    title = 'tarfile does not handle file .name being an int'
    updated_at = <Date 2017-06-01.07:22:24.509>
    user = 'https://github.com/seirl'

    bugs.python.org fields:

    activity = <Date 2017-06-01.07:22:24.509>
    actor = 'antoine.pietri'
    assignee = 'serhiy.storchaka'
    closed = True
    closed_date = <Date 2017-06-01.07:22:24.510>
    closer = 'antoine.pietri'
    components = ['Library (Lib)']
    creation = <Date 2014-03-24.01:32:57.313>
    creator = 'antoine.pietri'
    dependencies = []
    files = ['34603', '34605', '34739', '34740']
    hgrepos = []
    issue_num = 21044
    keywords = ['patch']
    message_count = 24.0
    messages = ['214662', '214663', '214670', '214685', '214708', '214754', '214756', '214757', '214760', '215216', '215230', '215626', '215627', '215632', '216984', '223275', '223280', '223319', '223335', '223338', '223345', '223643', '223645', '223742']
    nosy_count = 10.0
    nosy_names = ['barry', 'lars.gustaebel', 'eric.araujo', 'r.david.murray', 'python-dev', 'martin.panter', 'zach.ware', 'serhiy.storchaka', 'eryksun', 'antoine.pietri']
    pr_nums = []
    priority = 'normal'
    resolution = 'remind'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue21044'
    versions = ['Python 2.7']

    @seirl
    Copy link
    Mannequin Author

    seirl mannequin commented Mar 24, 2014

    The fact that tempfile.TemporaryFile() has a "name" integer attribute causes weird behavior when interacting with libraries that rely on this attribute being a valid string for file objects.
    For instance, it led to this exception with the "tarfile" module, which I resolved by using a NamedTemporaryFile():

    >>> tarfile.open(fileobj=tempfile.TemporaryFile(), mode='w')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/lib/python3.4/tarfile.py", line 1585, in open
        return cls.taropen(name, mode, fileobj, **kwargs)
      File "/usr/lib/python3.4/tarfile.py", line 1595, in taropen
        return cls(name, mode, fileobj, **kwargs)
      File "/usr/lib/python3.4/tarfile.py", line 1431, in __init__
        self.name = os.path.abspath(name) if name else None
      File "/usr/lib/python3.4/posixpath.py", line 360, in abspath
        if not isabs(path):
      File "/usr/lib/python3.4/posixpath.py", line 64, in isabs
        return s.startswith(sep)
    AttributeError: 'int' object has no attribute 'startswith'

    Which is caused by these lines in the "tarfile" module:

    if name is None and hasattr(fileobj, "name"):
        name = fileobj.name

    If TemporaryFile() didn't have a name attribute, tarfile, which doesn't really need the file name, would simply have continued without errors.

    I am not aware of any place where this "name" integer attribute is actually useful, and, as a matter of fact, it is not even documented: http://docs.python.org/3.4/library/tempfile.html#tempfile.TemporaryFile

    @seirl seirl mannequin added stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error labels Mar 24, 2014
    @seirl
    Copy link
    Mannequin Author

    seirl mannequin commented Mar 24, 2014

    Alternatively, if the "name" attribute can't be removed, I propose the following diff for the tarfile module:

    -if name is None and hasattr(fileobj, "name"):
    +if name is None and hasattr(fileobj, "name") and isinstance(fileobj.name, str):

    :-)

    @eryksun
    Copy link
    Contributor

    eryksun commented Mar 24, 2014

    This name attribute is documented here:

    http://docs.python.org/3/library/io#io.FileIO.name

    3.4 Source:

    http://hg.python.org/cpython/file/04f714765c13/Modules/_io/fileio.c#l432

    In PY2, os.fdopen sets the name to '<fdopen>'. See the related bpo-13781. Here's the workaround in gzip.py:

        filename = getattr(fileobj, 'name', '')
        if not isinstance(filename, (str, bytes)):
            filename = ''

    @seirl
    Copy link
    Mannequin Author

    seirl mannequin commented Mar 24, 2014

    I attached a patch for tarfile with eryksun's suggestion.

    @bitdancer bitdancer changed the title tempfile.TemporaryFile() shouldn't have a name attribute tarfile does not handle file __name__ being an int Mar 24, 2014
    @bitdancer bitdancer changed the title tarfile does not handle file __name__ being an int tarfile does not handle file .name being an int Mar 24, 2014
    @seirl
    Copy link
    Mannequin Author

    seirl mannequin commented Mar 24, 2014

    Here's the test case as requested by berkerpeksag in the patch review.

    @merwok
    Copy link
    Member

    merwok commented Mar 24, 2014

    Isn’t the bug here really that TemporaryFile has a name attribute that’s not a string? I don’t see how an integer name makes sense within the IO system.

    @seirl
    Copy link
    Mannequin Author

    seirl mannequin commented Mar 24, 2014

    Yes, the bug report was originally titled like this ("TemporaryFile should'nt have a name attribute") but apparently this is a common and expected behavior for FileIO subclasses.

    @bitdancer
    Copy link
    Member

    See the documentation link in msg214670. This isn't a characteristic of TemporaryFile, it's a characteristic of the Python IO system. So you'd have to argue that the documented behavior of the io system is a bug.

    @seirl
    Copy link
    Mannequin Author

    seirl mannequin commented Mar 24, 2014

    Well, this behavior seems pretty inconsistent to me, it means that every library has to check if the name attribute is actually a string (which would correspond to a path) or an int (that would mean a fd), and I think a "fd" attribute would seem more consistent.
    But as I don't think we could change this behavior that easily (it means breaking retrocompatibility after all...), we must at least make tarlib deal with that.

    @seirl
    Copy link
    Mannequin Author

    seirl mannequin commented Mar 31, 2014

    Ping. The patch is just one line and there's a test case, if someone could review that, it would be great!

    (For the record I signed the contributor agreement a week ago and my profile still hasn't been updated).

    @serhiy-storchaka
    Copy link
    Member

    Does tarfile work with bytes file names?

    I think a test should explicitly test file objects with str, bytes and int name and file objects without name in all write modes ("w", "w:gz", "w|gz", etc).

    @seirl
    Copy link
    Mannequin Author

    seirl mannequin commented Apr 5, 2014

    Well, that seems complicated: you can't overwrite a io.FileIO().name attribute, and doing so would be nonsensical for tarfile, which would try to perform IO operations on a random file descriptor... Also, I can't think of any case where a .name attribute could actually be bytes (I was just mirroring the code in msg214670).

    Here's a patch that tries all combinations of encoding for writing, but I can't see a way to enforce manually the name attribute being an int, even for this test purposes.

    @seirl
    Copy link
    Mannequin Author

    seirl mannequin commented Apr 5, 2014

    Actually, thinking about it, it seems safer to use os.open() + os.fdopen() than TemporaryFile(), like in the equivalent test for 'gzip'.

    This new patch replaces last one.

    @eryksun
    Copy link
    Contributor

    eryksun commented Apr 5, 2014

    you can't overwrite a io.FileIO().name attribute

    A FileIO instance uses a dict for 'name' (msg214670):

        >>> vars(sys.stdin.buffer.raw)
        {'name': '<stdin>'}
    
        >>> f = tempfile.TemporaryFile()
        >>> vars(f.raw)
        {'name': 3}

    The name is optional meta-information. If it gets deleted, the repr falls back on using the file descriptor:

        >>> f.raw
        <_io.FileIO name=3 mode='rb+'>
    
        >>> del f.raw.name
        >>> f.raw
        <_io.FileIO fd=3 mode='rb+'>

    @vadmium
    Copy link
    Member

    vadmium commented Apr 22, 2014

    I ran into a related issue with the gettarinfo() method. Would that fall under the scope of this bug, or should I raise a separate one?

    >>> with tarfile.open("/dev/null", "w") as tar:
    ...     with open(b"/bin/sh", "rb") as file:
    ...         tar.gettarinfo(fileobj=file)
    ... 
    Traceback (most recent call last):
      File "<stdin>", line 3, in <module>
      File "/usr/lib/python3.4/tarfile.py", line 1768, in gettarinfo
        arcname = arcname.replace(os.sep, "/")
    TypeError: expected bytes, bytearray or buffer compatible object

    I realise that making TarInfo object with a byte string or integer as a file name is not a good idea. Perhaps the documentation should explicitly say that “fileobj.name” must be a real unencoded file name string unless “arcname” is also given. In my particular case I added arcname="", because my code generates the proper file name later on.

    @serhiy-storchaka serhiy-storchaka self-assigned this Jul 16, 2014
    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Jul 16, 2014

    New changeset 308f3c1e36d3 by Serhiy Storchaka in branch '2.7':
    bpo-21044: tarfile.open() now handles fileobj with an integer 'name'
    http://hg.python.org/cpython/rev/308f3c1e36d3

    New changeset d6b71971b228 by Serhiy Storchaka in branch '3.4':
    bpo-21044: tarfile.open() now handles fileobj with an integer 'name'
    http://hg.python.org/cpython/rev/d6b71971b228

    New changeset 4c2f3240ad65 by Serhiy Storchaka in branch 'default':
    bpo-21044: tarfile.open() now handles fileobj with an integer 'name'
    http://hg.python.org/cpython/rev/4c2f3240ad65

    @serhiy-storchaka
    Copy link
    Member

    Committed with rewritten tests. Thank you for your contribution Martin.

    I ran into a related issue with the gettarinfo() method. Would that fall under the scope of this bug, or should I raise a separate one?

    Yes, this looks as a separate bug.

    @vadmium
    Copy link
    Member

    vadmium commented Jul 17, 2014

    Opened bpo-21996 for the “gettarinfo” method.

    Also, Serhiy, I think you may have got me mixed up with someone else. I don’t think I did any patches here, so I probably shouldn’t be credited for them :)

    @seirl
    Copy link
    Mannequin Author

    seirl mannequin commented Jul 17, 2014

    Also, Serhiy, I think you may have got me mixed up with someone else. I don’t think I did any patches here, so I probably shouldn’t be credited for them :)

    Yeah, but I don't mind if I'm not in the ACKS file for a one-line patch though :P

    @zware
    Copy link
    Member

    zware commented Jul 17, 2014

    This change appears to have broken 2.7 on Windows:

    http://buildbot.python.org/all/builders/x86%20Windows7%202.7/builds/2707/steps/test/logs/stdio

    @zware zware reopened this Jul 17, 2014
    @seirl
    Copy link
    Mannequin Author

    seirl mannequin commented Jul 17, 2014

    This change does not need to be merged on 2.7 anyway, as the os.fdopen sets the name attribute to '<fdopen>' and not to the fd, this check is not required prior to python 3.

    Still, it would be interesting to investigate why this breaks 2.7 though.

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Jul 22, 2014

    New changeset 51699f5f5430 by Serhiy Storchaka in branch '2.7':
    Backout 308f3c1e36d3. This change (bpo-21044) does not need to be merged on
    http://hg.python.org/cpython/rev/51699f5f5430

    @serhiy-storchaka
    Copy link
    Member

    Thanks Zachary for pointing to buildbot failure.

    Also, Serhiy, I think you may have got me mixed up with someone else.

    Indeed, I either missed you with Antoine Pietri or missed this issue with bpo-19524. In any case thanks you for your activity on the tracker. And thanks Antoine Pietri for his contribution on this issue.

    Still, it would be interesting to investigate why this breaks 2.7 though.

    Left open until will investigated this. Will open a new issue if there is a bug here.

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Jul 23, 2014

    New changeset 825137d0d4ca by Serhiy Storchaka in branch '3.4':
    Correct issue bpo-21044 patch author.
    http://hg.python.org/cpython/rev/825137d0d4ca

    New changeset 4fe27263f9d4 by Serhiy Storchaka in branch 'default':
    Correct issue bpo-21044 patch author.
    http://hg.python.org/cpython/rev/4fe27263f9d4

    @seirl seirl mannequin closed this as completed Jun 1, 2017
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error
    Projects
    None yet
    Development

    No branches or pull requests

    6 participants