
Author David.Nesting
Recipients David.Nesting
Date 2010-11-16.18:17:42
Message-id <1289931464.79.0.618120236196.issue10436@psf.upfronthosting.co.za>
Content
When opening a tarfile with mode "r|" (streaming mode), extractfile("filename") and extractfile(mytarfile.getmembers()[0]) raise "tarfile.StreamError: seeking backwards is not allowed".  extractfile(mytarfile.next()) succeeds.  A more complete test case:

"""
import tarfile
import StringIO

# Create a simple tar file in memory.  This could easily be a real tar file
# though.
data = StringIO.StringIO()
tf = tarfile.open(fileobj=data, mode="w")
tarinfo = tarfile.TarInfo(name="testfile")
filedata = StringIO.StringIO("test data")
tarinfo.size = len(filedata.getvalue())
tf.addfile(tarinfo, fileobj=filedata)
tf.close()
data.seek(0)

# Open as an uncompressed stream
tf = tarfile.open(fileobj=data, mode="r|")

#f = tf.extractfile("testfile")
#print "%s: %s" % (f.name, f.read())
#
#Traceback (most recent call last):
#  File "./bug.py", line 19, in <module>
#    print "%s: %s" % (f.name, f.read())
#  File "/usr/lib/python2.7/tarfile.py", line 815, in read
#    buf += self.fileobj.read()
#  File "/usr/lib/python2.7/tarfile.py", line 735, in read
#    return self.readnormal(size)
#  File "/usr/lib/python2.7/tarfile.py", line 742, in readnormal
#    self.fileobj.seek(self.offset + self.position)
#  File "/usr/lib/python2.7/tarfile.py", line 554, in seek
#    raise StreamError("seeking backwards is not allowed")
#tarfile.StreamError: seeking backwards is not allowed

#for member in tf.getmembers():
#  f = tf.extractfile(member)
#  print "%s: %s" % (f.name, f.read())
#
# Same traceback

while True:
  member = tf.next()
  if member is None:
    break
  f = tf.extractfile(member)
  print "%s: %s" % (f.name, f.read())

# This works.
"""

It appears that extractfile("filename") invokes getmember("filename"), which invokes getmembers().  getmembers() scans the entire file before returning results, and by doing so it has read past and discarded the actual file data, which makes it impossible to extract that data afterwards.
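
The failure mode described above can be reproduced with a short Python 3 sketch (the original report targets Python 2.7). The `build_archive` and `read_after_full_scan` helpers are made up for illustration, and the exact exception may differ between versions, so the code catches the general tarfile.TarError plus KeyError rather than relying on one specific error.

```python
import io
import tarfile


def build_archive():
    """Return an in-memory tar archive holding one member, "testfile"."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tf:
        payload = b"test data"
        info = tarfile.TarInfo(name="testfile")
        info.size = len(payload)
        tf.addfile(info, fileobj=io.BytesIO(payload))
    buf.seek(0)
    return buf


def read_after_full_scan(buf):
    """Call getmembers() first, then try to read the member's data."""
    with tarfile.open(fileobj=buf, mode="r|") as tf:
        tf.getmembers()  # walks every header; the stream is now past the data
        try:
            return tf.extractfile("testfile").read()
        except (tarfile.TarError, KeyError) as exc:
            # Expected path: the data can no longer be reached.
            return exc


result = read_after_full_scan(build_archive())
print(type(result).__name__, result)
```

If the analysis above is right, `result` ends up being an exception object rather than the member's bytes.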

If this is accurate, this seems tricky to completely fix.  You could make getmembers() a generator that doesn't read too far ahead, so that each file's contents are still available when its entry is yielded.  getmember("filename") could just scan forward through the file until it hits a match, but you'd still lose the ability to call getmember("filename") for a file that has already been skipped over.
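
The "scan forward until it hits a match" idea can be approximated in caller code. The following is a minimal Python 3 sketch, not a proposed patch to tarfile; `read_member_from_stream` is a hypothetical helper name. It inherits the limitation noted above: a member that has already been passed cannot be retrieved from the same stream.

```python
import io
import tarfile


def read_member_from_stream(tf, name):
    """Scan forward through a streaming TarFile and return the bytes of the
    first regular-file member called `name`, or None if none is found.
    Members that were already skipped are not reachable again."""
    for member in tf:
        if member.name == name and member.isfile():
            # Read right away, before the iterator advances past the data.
            return tf.extractfile(member).read()
    return None


# Build a two-member archive in memory to exercise the helper.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as out:
    for member_name, payload in [("first", b"skipped"), ("testfile", b"test data")]:
        info = tarfile.TarInfo(name=member_name)
        info.size = len(payload)
        out.addfile(info, fileobj=io.BytesIO(payload))
buf.seek(0)

# The first member is skipped over; the second one matches and is read.
with tarfile.open(fileobj=buf, mode="r|") as tf:
    found = read_member_from_stream(tf, "testfile")

# A fresh stream with a name that is not in the archive.
buf.seek(0)
with tarfile.open(fileobj=buf, mode="r|") as tf:
    absent = read_member_from_stream(tf, "no-such-member")
```

Each lookup here uses a fresh stream on purpose, since a second scan over the same streaming TarFile cannot go back to earlier data.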

If nothing else, document that extractfile("filename"), getmember() and getmembers() won't work reliably in streaming mode, and possibly raise an exception whenever someone calls them in that mode, just to make the behavior consistent.
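
One way to picture the "raise an exception to keep behavior consistent" option is a thin wrapper that owns the streaming TarFile and only exposes forward iteration. This is a Python 3 sketch under assumptions: `ForwardOnlyTar` and its methods are hypothetical names, not part of tarfile, and reusing tarfile.StreamError for the refusal is just one possible choice.

```python
import io
import tarfile


class ForwardOnlyTar:
    """Hypothetical wrapper: opens an archive in "r|" mode and refuses
    lookups that would need to scan the whole stream."""

    def __init__(self, fileobj):
        self._tf = tarfile.open(fileobj=fileobj, mode="r|")

    def __iter__(self):
        # Yield each member with its file object while the data is still readable.
        for member in self._tf:
            data = self._tf.extractfile(member) if member.isfile() else None
            yield member, data

    def getmembers(self):
        raise tarfile.StreamError("getmembers() is not supported in streaming mode")

    def getmember(self, name):
        raise tarfile.StreamError("getmember() is not supported in streaming mode")

    def close(self):
        self._tf.close()


# Build a one-member archive in memory to exercise the wrapper.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as out:
    info = tarfile.TarInfo(name="testfile")
    info.size = len(b"test data")
    out.addfile(info, fileobj=io.BytesIO(b"test data"))
buf.seek(0)

archive = ForwardOnlyTar(buf)
# Each file object is read before the iterator moves on to the next member.
contents = {m.name: f.read() for m, f in archive if f is not None}
try:
    archive.getmembers()
    refused = False
except tarfile.StreamError:
    refused = True
archive.close()
```

The trade-off is that callers get a clear, immediate error instead of a seek failure that surfaces later inside read().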