tarfile requires an actual file on disc; a file-like object is insufficient #54578
Comments
The tarfile module's gettarinfo callable insists on stat'ing the file in question, preventing one from dynamically generating file content by passing a file-like object for addfile's fileobj argument.

I believe the attached patch fixes this issue. I generated the patch against 2.7 and tested it with 2.7, but it applies cleanly against 3.1 and "feels innocuous". I've also included my test code at the bottom of this comment.

Why would you want to do this? Imagine you've stored a file in three smaller files (perhaps to save the pieces on small external media, or as part of a deduplication system), with the content divided up into thirds. To subsequently put this file as a whole into a tar archive, it'd be nice if you could just create a file-like object to emit the catenation, rather than having to create a temporary file holding that catenation.

It's occurred to me that this should be done in a more object oriented style, but that feels a bit inconsistent given that fstat is in the os module, and not provided as an attribute of a file(-like) object.

Comments?

Here's the test code:

#!/usr/local/cpython-2.7/bin/python

import os
import sys
import copy
import array
import stat_tarfile

def my_stat(filename):
    class mutable_stat:
        pass
    readonly_statobj = os.lstat(filename)
    mutable_statobj = mutable_stat()
    for attribute in dir(readonly_statobj):
        if not attribute.startswith('_'):
            value = getattr(readonly_statobj, attribute)
            setattr(mutable_statobj, attribute, value)
    return mutable_statobj

class generate_file_content:
    def __init__(self, number):
        self._multiplier = 100
        self._multipleno = 0
        self._number = str(number)
        self._buffer = ''

    def read(self, length):
        while self._multipleno < self._multiplier and len(self._buffer) < length:
            self._buffer += self._number
            self._multipleno += 1
        if self._buffer == '':
            return ''
        else:
            result = self._buffer[:length]
            self._buffer = self._buffer[length:]
            return result

def main():
    with stat_tarfile.open(fileobj = sys.stdout, mode = "w|") as tar:
        for number in xrange(100):
            #string = str(number) * 100
            fileobj = generate_file_content(number)
            statobj = my_stat('/etc/passwd')
            statobj.st_size = len(str(number)) * 100
            filename = 'file-%d.txt' % number
            tarinfo = tar.gettarinfo(filename, statobj = statobj)
            tarinfo.uid = 1000
            tarinfo.gid = 1000
            tarinfo.uname = "dstromberg"
            tarinfo.gname = "dstromberg"
            tar.addfile(tarinfo, fileobj)

main()
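For illustration, the three-piece scenario described above might look roughly like the sketch below. It is written for Python 3 and works on bytes; `ConcatReader` is a hypothetical name, not something from the patch or from tarfile, and the pieces are assumed to be binary file-like objects opened in the right order.

```python
import io


class ConcatReader:
    """Hypothetical file-like object that presents several pieces as one stream."""

    def __init__(self, parts):
        # parts: binary file-like objects, in the order they should be emitted
        self._parts = list(parts)
        self._index = 0

    def read(self, length=-1):
        chunks = []
        remaining = length
        while self._index < len(self._parts) and remaining != 0:
            part = self._parts[self._index]
            data = part.read() if remaining < 0 else part.read(remaining)
            if data:
                chunks.append(data)
                if remaining > 0:
                    remaining -= len(data)
            else:
                # The current piece is exhausted; continue with the next one.
                self._index += 1
        return b"".join(chunks)


if __name__ == "__main__":
    # In-memory stand-ins for the three stored pieces.
    pieces = [io.BytesIO(b"first "), io.BytesIO(b"second "), io.BytesIO(b"third")]
    reader = ConcatReader(pieces)
    print(reader.read(8))
    print(reader.read())
```

An object like this only has a `read` method and no file descriptor, so there is nothing for `os.fstat` to inspect, which is the limitation this report is about.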
Hm, why don't you just do this:

with stat_tarfile.open(fileobj = sys.stdout, mode = "w|") as tar:
    for number in xrange(100):
        fileobj = generate_file_content(number)
        tarinfo = tar.gettarinfo(fileobj=open("/etc/passwd"))
        tarinfo.name = 'file-%d.txt' % number
        tarinfo.size = len(str(number)) * 100
        tarinfo.uid = 1000
        tarinfo.gid = 1000
        tarinfo.uname = "dstromberg"
        tarinfo.gname = "dstromberg"
        tar.addfile(tarinfo, fileobj)

Wouldn't that work, too? Or am I missing something?
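A variation on the suggestion above, sketched for Python 3 with the stock tarfile module: rather than borrowing metadata from an unrelated file through gettarinfo, a `tarfile.TarInfo` can be constructed directly and its fields set by hand, so nothing is stat'ed at all. `GeneratedContent` and `build_archive` are hypothetical names used only for this example, and the uid, gid, and user names are the placeholder values from the thread.

```python
import io
import tarfile
import time


class GeneratedContent:
    """Hypothetical bytes-producing counterpart of generate_file_content."""

    def __init__(self, number, repeats=100):
        self._unit = str(number).encode("ascii")
        self._left = repeats
        self._buffer = b""
        # The size must be known up front because it goes into the tar header.
        self.size = len(self._unit) * repeats

    def read(self, length):
        # Generate only as much content as the caller asks for.
        while self._left > 0 and len(self._buffer) < length:
            self._buffer += self._unit
            self._left -= 1
        result, self._buffer = self._buffer[:length], self._buffer[length:]
        return result


def build_archive(out, count=100):
    """Write `count` generated members to the binary stream `out`."""
    with tarfile.open(fileobj=out, mode="w|") as tar:
        for number in range(count):
            content = GeneratedContent(number)
            info = tarfile.TarInfo(name="file-%d.txt" % number)
            info.size = content.size
            info.mtime = int(time.time())
            info.uid = 1000
            info.gid = 1000
            info.uname = "dstromberg"
            info.gname = "dstromberg"
            tar.addfile(info, content)


if __name__ == "__main__":
    buffer = io.BytesIO()
    build_archive(buffer, count=3)
    print(len(buffer.getvalue()), "bytes written")
```

The trade-off is that every header field (size, mtime, ownership, mode) has to be supplied by the caller, since there is no real file to read them from.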