
Author teamnoir
Recipients teamnoir
Date 2013-08-15.05:20:24
Message-id <1376544025.63.0.959353089302.issue18744@psf.upfronthosting.co.za>
Content
There's a problem with tarfile.  To reproduce it, write a program that traverses the contents of a modest-sized tar archive.  Make sure the archive is compressed.  Then read the archive with your program.

I'm finding that letting tarfile read a compressed archive directly costs me roughly a 60x performance penalty compared to opening the file with gzip and then passing the decompressed stream to tarfile.  Programs that could take a few minutes are literally taking a few hours when using tarfile.
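
For concreteness, here is a minimal sketch of the two ways of reading the archive that are being compared.  It is not the program from this report: the helper names (make_archive, read_direct, read_via_gzip) and the sample archive are made up for illustration, and the timings it prints will depend on the Python version and the archive, so no particular ratio should be expected.

```python
import gzip
import io
import os
import tarfile
import tempfile
import time


def make_archive(path, members=50, size=1024):
    """Build a small gzip-compressed tar archive (hypothetical test data)."""
    with tarfile.open(path, "w:gz") as tar:
        for i in range(members):
            data = bytes([i % 256]) * size
            info = tarfile.TarInfo(name="member%03d.bin" % i)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))


def read_direct(path):
    """Let tarfile detect and handle the compression itself."""
    total = 0
    with tarfile.open(path) as tar:
        for info in tar:
            if info.isfile():
                total += len(tar.extractfile(info).read())
    return total


def read_via_gzip(path):
    """Open the file with gzip first, then hand the stream to tarfile."""
    total = 0
    with gzip.open(path, "rb") as gz:
        # "r:" tells tarfile to expect an uncompressed tar stream.
        with tarfile.open(fileobj=gz, mode="r:") as tar:
            for info in tar:
                if info.isfile():
                    total += len(tar.extractfile(info).read())
    return total


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "sample.tar.gz")
    make_archive(path)
    for reader in (read_direct, read_via_gzip):
        start = time.perf_counter()
        nbytes = reader(path)
        elapsed = time.perf_counter() - start
        print("%s: %d bytes in %.4f s" % (reader.__name__, nbytes, elapsed))
```

Both functions read every regular member once, in archive order, so any difference in run time comes from how the compressed stream is handled rather than from the traversal itself.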

This seems stupid.  The tarfile library could do the same thing I'm doing manually.  In fact, I had assumed that it would, and I was surprised by the performance I was seeing, so I ran the program under the profiler and saw millions of decompression calls.  It's almost as though the tarfile library is decompressing the entire archive for every member extraction.

Note that you can get even worse performance if you sort the member names and then extract in that order.  I'm not sure whether this "should" matter, since the members of a tar file are stored sequentially.
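
The sketch below illustrates why extraction order might matter.  It is an illustration under assumptions, not a diagnosis: the helper names (build_archive, read_in_order) and the three member names are invented.  It records each member's data offset in the order the members are read.  In archive order the offsets only increase.  In name order they can jump backwards, and a backward seek on a gzip stream means rewinding and decompressing again from the start, which is one plausible source of the extra cost.

```python
import io
import tarfile


def build_archive(names):
    """Return the bytes of a gzip-compressed tar holding one tiny file per name."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name in names:
            data = name.encode("ascii")
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def read_in_order(raw, sort_by_name):
    """Read every member; return (name, data offset) pairs in read order."""
    seen = []
    with tarfile.open(fileobj=io.BytesIO(raw), mode="r:gz") as tar:
        members = tar.getmembers()
        if sort_by_name:
            members = sorted(members, key=lambda m: m.name)
        for info in members:
            seen.append((info.name, info.offset_data))
            tar.extractfile(info).read()
    return seen


if __name__ == "__main__":
    # Members are deliberately stored out of alphabetical order.
    raw = build_archive(["c.txt", "a.txt", "b.txt"])
    print("archive order:", read_in_order(raw, sort_by_name=False))
    print("sorted order: ", read_in_order(raw, sort_by_name=True))
```

If the sorted order happens to match the archive order, the two runs touch the stream identically, so the effect depends on how the archive was built.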
History
Date                 User      Action  Args
2013-08-15 05:20:25  teamnoir  set     recipients: + teamnoir
2013-08-15 05:20:25  teamnoir  set     messageid: <1376544025.63.0.959353089302.issue18744@psf.upfronthosting.co.za>
2013-08-15 05:20:25  teamnoir  link    issue18744 messages
2013-08-15 05:20:24  teamnoir  create