tarfile requires an actual file on disc; a file-like object is insufficient #54578

Closed
strombrg mannequin opened this issue Nov 8, 2010 · 2 comments
Assignees
Labels
stdlib Python modules in the Lib dir type-feature A feature request or enhancement

Comments

@strombrg
Mannequin

strombrg mannequin commented Nov 8, 2010

BPO 10369
Nosy @gustaebel, @merwok
Files
  • tarfile.diff: Patch to enable passing a stat result object to gettarinfo
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    GitHub fields:

    assignee = 'https://github.com/gustaebel'
    closed_at = <Date 2012-03-06.10:23:52.051>
    created_at = <Date 2010-11-08.23:42:04.116>
    labels = ['invalid', 'type-feature', 'library']
    title = 'tarfile requires an actual file on disc; a file-like object is insufficient'
    updated_at = <Date 2012-03-06.10:23:52.050>
    user = 'https://bugs.python.org/strombrg'

    bugs.python.org fields:

    activity = <Date 2012-03-06.10:23:52.050>
    actor = 'lars.gustaebel'
    assignee = 'lars.gustaebel'
    closed = True
    closed_date = <Date 2012-03-06.10:23:52.051>
    closer = 'lars.gustaebel'
    components = ['Library (Lib)']
    creation = <Date 2010-11-08.23:42:04.116>
    creator = 'strombrg'
    dependencies = []
    files = ['19549']
    hgrepos = []
    issue_num = 10369
    keywords = ['patch']
    message_count = 2.0
    messages = ['120822', '120858']
    nosy_count = 3.0
    nosy_names = ['lars.gustaebel', 'eric.araujo', 'strombrg']
    pr_nums = []
    priority = 'normal'
    resolution = 'not a bug'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue10369'
    versions = ['Python 3.2']

    @strombrg
    Mannequin Author

    strombrg mannequin commented Nov 8, 2010

    The tarfile module's gettarinfo callable insists on stat'ing the file in question, preventing one from dynamically generating file content by passing a file-like object for addfile's fileobj argument.

    I believe the attached patch fixes this issue. I generated the patch against 2.7 and tested it with 2.7, but it applies cleanly against 3.1 and "feels innocuous". I've also included my test code at the bottom of this comment.

    Why would you want to do this? Imagine you've stored a file in three smaller files (perhaps to save the pieces on small external media, or as part of a deduplication system), with the content divided up into thirds. To subsequently put this file as a whole into a tar archive, it'd be nice if you could just create a file-like object to emit the catenation, rather than having to create a temporary file holding that catenation.
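
The use case described above can be sketched roughly as follows. This is an illustration in Python 3 rather than the 2.7 used in this report, and the class name `ConcatReader` is hypothetical, not part of tarfile or of the attached patch. It presents several binary streams as one read-only stream and fills each `read(length)` request across piece boundaries, since a short read before the end of the data can be treated as truncation by the consumer.

```python
import io


class ConcatReader:
    """Read-only file-like object that presents several binary streams as one.

    Hypothetical helper for illustration; only read() is implemented.
    """

    def __init__(self, parts):
        self._parts = list(parts)  # streams not yet exhausted, in order

    def read(self, length=-1):
        if length is None:
            length = -1  # treat None like "read everything"
        chunks = []
        remaining = length
        while self._parts and remaining != 0:
            data = self._parts[0].read(remaining)
            if not data:
                # Current piece is exhausted; move on to the next one.
                self._parts.pop(0)
                continue
            chunks.append(data)
            if remaining > 0:
                remaining -= len(data)
        return b"".join(chunks)


# Three in-memory pieces stand in for the three smaller files.
pieces = [io.BytesIO(b"first-"), io.BytesIO(b"second-"), io.BytesIO(b"third")]
whole = ConcatReader(pieces)
```

An object like `whole` could then be passed as the fileobj argument of addfile, provided the accompanying TarInfo carries the combined size of the pieces.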

    It's occurred to me that this should be done in a more object-oriented style, but that feels a bit inconsistent given that fstat is in the os module, and not provided as an attribute of a file(-like) object. Comments?

    Here's the test code:

    #!/usr/local/cpython-2.7/bin/python

    import os
    import sys
    import copy
    import array
    import stat_tarfile
    
    def my_stat(filename):
            class mutable_stat:
                    pass
            readonly_statobj = os.lstat(filename)
            mutable_statobj = mutable_stat()
            for attribute in dir(readonly_statobj):
                    if not attribute.startswith('_'):
                            value = getattr(readonly_statobj, attribute)
                            setattr(mutable_statobj, attribute, value)
            return mutable_statobj
    
    class generate_file_content:
            def __init__(self, number):
                    self._multiplier = 100
                    self._multipleno = 0
                    self._number = str(number)
                    self._buffer = ''
    
            def read(self, length):
                    while self._multipleno < self._multiplier and len(self._buffer) < length:
                            self._buffer += self._number
                            self._multipleno += 1
                    if self._buffer == '':
                            return ''
                    else:
                            result = self._buffer[:length]
                            self._buffer = self._buffer[length:]
                            return result
    
    def main():
            with stat_tarfile.open(fileobj = sys.stdout, mode = "w|") as tar:
                    for number in xrange(100):
                            #string = str(number) * 100
                            fileobj = generate_file_content(number)
                            statobj = my_stat('/etc/passwd')
                            statobj.st_size = len(str(number)) * 100
                            filename = 'file-%d.txt' % number
                            tarinfo = tar.gettarinfo(filename, statobj = statobj)
                            tarinfo.uid = 1000
                            tarinfo.gid = 1000
                            tarinfo.uname = "dstromberg"
                            tarinfo.gname = "dstromberg"
                            tar.addfile(tarinfo, fileobj)
    
    main()

    @strombrg strombrg mannequin added the stdlib Python modules in the Lib dir label Nov 8, 2010
    @bitdancer bitdancer added the type-feature A feature request or enhancement label Nov 9, 2010
    @gustaebel
    Mannequin

    gustaebel mannequin commented Nov 9, 2010

    Hm, why don't you just do this:

    with stat_tarfile.open(fileobj = sys.stdout, mode = "w|") as tar:
        for number in xrange(100):
            fileobj = generate_file_content(number)
            tarinfo = tar.gettarinfo(fileobj=open("/etc/passwd")) 
            tarinfo.name = 'file-%d.txt' % number
            tarinfo.size = len(str(number)) * 100
            tarinfo.uid = 1000
            tarinfo.gid = 1000
            tarinfo.uname = "dstromberg"
            tarinfo.gname = "dstromberg"
            tar.addfile(tarinfo, fileobj)

    Wouldn't that work, too? Or am I missing something?
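
A related approach that avoids stat'ing any file at all is to construct the TarInfo object directly and fill in its fields by hand. The sketch below assumes Python 3 and an unpatched tarfile module; `RepeatedText` and `build_archive` are hypothetical names for illustration, and the archive is written to an in-memory buffer instead of sys.stdout so the result can be inspected.

```python
import io
import tarfile
import time


class RepeatedText:
    """File-like object that generates `text` repeated `count` times on demand."""

    def __init__(self, text, count):
        self._unit = text.encode("ascii")
        self._left = count      # repetitions not yet generated
        self._buffer = b""

    def read(self, length):
        # Generate just enough data to satisfy the request, if any is left.
        while self._left > 0 and len(self._buffer) < length:
            self._buffer += self._unit
            self._left -= 1
        result, self._buffer = self._buffer[:length], self._buffer[length:]
        return result


def build_archive(out, file_count=3, repeats=100):
    """Write `file_count` generated members to the binary stream `out`."""
    with tarfile.open(fileobj=out, mode="w|") as tar:
        for number in range(file_count):
            # No gettarinfo() call, so nothing on disk is stat'ed.
            tarinfo = tarfile.TarInfo(name="file-%d.txt" % number)
            tarinfo.size = len(str(number)) * repeats  # must match the data
            tarinfo.mtime = int(time.time())
            tarinfo.mode = 0o644
            tarinfo.uid = tarinfo.gid = 1000
            tarinfo.uname = tarinfo.gname = "dstromberg"
            tar.addfile(tarinfo, RepeatedText(str(number), repeats))


archive = io.BytesIO()
build_archive(archive)
```

The size field has to be known before addfile is called, because the header is written ahead of the data; that constraint applies to the snippets above as well.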

    @gustaebel gustaebel mannequin self-assigned this Nov 9, 2010
    @gustaebel gustaebel mannequin closed this as completed Mar 6, 2012
    @gustaebel gustaebel mannequin added the invalid label Mar 6, 2012
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022