Title: 3.2: tarfile.getmembers causes 100% cpu usage on Windows
Created on 2011-02-16 18:12 by srid, last changed 2011-02-23 11:55 by lars.gustaebel. This issue is now closed.

msg128685 - (view) Author: Sridhar Ratnakumar (srid) Date: 2011-02-16 18:12
tarfile.getmembers has become extremely slow on Windows. The slowdown was introduced in r85916, committed by Lars Gustaebel on Oct 29, 2010 to "add read support for all missing variants of the GNU sparse extensions".

To reproduce, use this "tgz" file:

It contains another tgz file called "data.tar.gz". Run `.getmembers()` on data.tar.gz.
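
A minimal sketch of the steps above, for readers without the attachment. It builds a small nested archive in memory and calls `.getmembers()` on the inner `data.tar.gz`, reading it through the outer archive. The helper names `build_nested_tgz` and `list_inner_members` and the member `payload.bin` are hypothetical, and such a tiny synthetic archive is not guaranteed to show the slowdown seen with the real file.

```python
import io
import tarfile
import time


def build_nested_tgz():
    """Return an in-memory .tgz that contains another archive, data.tar.gz.

    Hypothetical stand-in for the attached file; payload.bin is made up.
    """
    inner_buf = io.BytesIO()
    with tarfile.open(fileobj=inner_buf, mode="w:gz") as inner:
        payload = b"x" * 4096
        info = tarfile.TarInfo(name="payload.bin")
        info.size = len(payload)
        inner.addfile(info, io.BytesIO(payload))
    inner_bytes = inner_buf.getvalue()

    outer_buf = io.BytesIO()
    with tarfile.open(fileobj=outer_buf, mode="w:gz") as outer:
        info = tarfile.TarInfo(name="data.tar.gz")
        info.size = len(inner_bytes)
        outer.addfile(info, io.BytesIO(inner_bytes))
    outer_buf.seek(0)
    return outer_buf


def list_inner_members(outer_fileobj):
    """Open data.tar.gz via the outer archive and time getmembers() on it."""
    with tarfile.open(fileobj=outer_fileobj, mode="r:gz") as outer:
        # extractfile() returns a file-like view into the outer archive,
        # so every read of the inner archive goes through tarfile itself.
        inner_fileobj = outer.extractfile("data.tar.gz")
        with tarfile.open(fileobj=inner_fileobj, mode="r:gz") as inner:
            start = time.time()
            members = inner.getmembers()
            elapsed = time.time() - start
    return [m.name for m in members], elapsed


if __name__ == "__main__":
    names, elapsed = list_inner_members(build_nested_tgz())
    print(names, "listed in %.3f seconds" % elapsed)
```

To test against the real archive, pass an open binary file object for the downloaded "tgz" file to `list_inner_members` instead of the synthetic one.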


This invokes the `read` method of `tarfile._FileInFile`, which seems to be the cause of the slowness (or rather, a hang).

I had to work around this issue by monkey-patching the above `read` function to revert the change:

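+import sys
+if sys.version_info[:2] >= (3, 2):
+    import tarfile
+    class _FileInFileNoSparse(tarfile._FileInFile):
+        # Plain read without the sparse-block handling added in r85916.
+        def read(self, size=None):
+            if size is None:
+                size = self.size - self.position
+            else:
+                size = min(size, self.size - self.position)
+            # Assumes the pre-r85916 behaviour: seek to the current position
+            # inside the member, then read from the underlying file object.
+            self.fileobj.seek(self.offset + self.position)
+            self.position += size
+            return self.fileobj.read(size)
+    tarfile._FileInFile = _FileInFileNoSparse
+    # Monkey patching `tarfile._FileInFile` to disable part of r85916 (py3k)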

We caught this bug while testing ActiveState PyPM on Python 3.2.

If you want the easiest way to reproduce this, I can send you (in private) an internal build of ActivePython-3.2 containing PyPM. Running "pypm install numpy" (with breakpoints in the `tarfile` code) is all that is required to reproduce it.
msg128840 - (view) Author: Lars Gustäbel (lars.gustaebel) * (Python committer) Date: 2011-02-19 10:42
`_FileInFile.read()` does lots of unnecessary seeking and reads the same block again and again. The attached patch fixes that. Please check whether it works for you.
msg128931 - (view) Author: Sridhar Ratnakumar (srid) Date: 2011-02-21 02:28
Lars, the attached patch fixes the issue. I'll add this to ActivePython 3.2. Thanks.
msg129178 - (view) Author: Lars Gustäbel (lars.gustaebel) * (Python committer) Date: 2011-02-23 11:55
Thanks for your great report. This is fixed now in r88528 (py3k) and r88529 (release32-maint).
