Author asnakelover
Recipients asnakelover, brian.curtin, pitrou
Date 2009-12-10.22:27:12
Message-id <1260484035.55.0.626629150168.issue7471@psf.upfronthosting.co.za>
Content
I hope this reply works right. The Python bug tracker interface is a bit confusing
for a newbie like me, since it doesn't say "Reply" anywhere. Sorry if it goes FUBAR.

I tried the splitlines() version you suggested. It thrashed my machine
so badly that after about a minute I pressed Alt+SysRq+F (which invokes
the kernel's OOM killer) so I wouldn't lose anything important. About
half a minute later the machine came back to life. In other words, the
splitlines() version used far too much memory, even more than building
a cStringIO from GzipFile.read().
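Why the splitlines() variant hurts so much: it needs the entire decompressed file in memory as one object, plus a separate object for every line, so peak usage is well above the uncompressed size. A minimal sketch, assuming Python 3 (where io.BytesIO replaces cStringIO); the function names are hypothetical and only illustrate the contrast with iterating the GzipFile directly:

```python
import gzip

def count_lines_splitlines(path):
    # Reads the whole decompressed file into a single bytes object, then
    # builds a list holding one more bytes object per line. Memory grows
    # with the uncompressed size of the file.
    with gzip.open(path, "rb") as f:
        return len(f.read().splitlines())

def count_lines_streaming(path):
    # Iterating a GzipFile yields one line at a time while decompressing
    # in chunks, so memory stays roughly bounded by the internal buffer
    # plus the longest line, regardless of total file size.
    count = 0
    with gzip.open(path, "rb") as f:
        for _line in f:
            count += 1
    return count
```

Both return the same count for ordinary "\n"-terminated files; only the streaming version keeps memory flat as the file grows.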

It's not just a GzipFile.readline() issue, either: c.py calls .read() and
tries to turn the result into a cStringIO, and that was the worst of my
three previous tests. I'm going to look at this purely from a consumer's
angle, without even looking at the gzip module source. From that angle,
zcat outperforms the gzip module by a factor of 10 when I use
.readline(), and by a good deal more when I read the whole gzip file
into a string and wrap it in a cStringIO to emulate, as closely as
possible, what happens when forking a zcat process. When I tried
splitlines() it was even worse. This is probably a RAM issue, but it
brings us back to the question: should the gzip module eat so much RAM
when shelling out to zcat uses far less?
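For comparison, the "shell out to zcat" approach can be driven from Python by reading the child process's stdout line by line. A sketch, assuming Python 3 and a `gzip` binary on PATH; it runs `gzip -dc`, which is what GNU zcat invokes, because some platforms' `zcat` only handles `.Z` files. The function name is hypothetical. Decompression happens in the child process, and Python only holds the pipe buffer and the current line:

```python
import subprocess

def count_lines_zcat(path):
    # "gzip -dc" decompresses to stdout, like GNU zcat.
    cmd = ["gzip", "-dc", path]
    count = 0
    # Reading proc.stdout incrementally never materialises the whole file.
    with subprocess.Popen(cmd, stdout=subprocess.PIPE) as proc:
        for _line in proc.stdout:
            count += 1
    # Leaving the with-block waits for the child, so returncode is set.
    if proc.returncode != 0:
        raise subprocess.CalledProcessError(proc.returncode, cmd)
    return count
```

The memory profile of this version is close to the streaming GzipFile loop; the difference is mainly that the decompression work runs in a separate process.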