Message 96217 - Python tracker

Author	asnakelover
Recipients	asnakelover, brian.curtin, pitrou
Date	2009-12-10.22:27:12
Message-id	<1260484035.55.0.626629150168.issue7471@psf.upfronthosting.co.za>
In-reply-to

Content
I hope this reply works right. The Python bug interface is a bit confusing
for this newbie (it doesn't say "Reply" anywhere), so sorry if it goes FUBAR.

I tried the splitlines() version you suggested. It thrashed my machine
so badly that after about a minute I pressed Alt+SysRq+F (which invokes
the kernel's oom_kill) so that I wouldn't lose anything important. About
half a minute later the machine came back to life. In other words, the
splitlines() version used far too much memory: even more than making a
cStringIO from the result of .read() on a GzipFile instance.

It's not just a GzipFile.readline() issue either: c.py calls .read() and
tries to turn the result into a cStringIO, and that was the worst of my
three previous tests. I'm going to look at this purely from a consumer's
angle, without even looking at the gzip module source. From that angle,
zcat outperforms the gzip module by a factor of 10 when the module is
used with .readline(), and by a good deal more when I read the whole
gzip file into a string and turn it into a cStringIO, which emulates as
closely as possible what happens when forking a zcat process. When I
tried splitlines() it was even worse. This is probably a RAM issue, but
that just brings us back to the question: should the gzip module eat so
much RAM when shelling out to zcat uses far less?
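For readers following along, the four approaches compared in this message can be sketched roughly as below. This is only an illustration and not the original a.py/b.py/c.py test scripts, which are not shown here. It assumes Python 3, so io.BytesIO stands in for cStringIO; the function names are made up for this sketch; and the last function assumes a zcat binary is available on PATH. Each function just counts lines, so the only thing that differs is how the decompressed data is obtained.

```python
import gzip
import io
import subprocess


def count_lines_readline(path):
    """Stream the file line by line through GzipFile."""
    count = 0
    with gzip.open(path, "rb") as f:
        for _line in f:
            count += 1
    return count


def count_lines_read_all(path):
    """Decompress everything into memory, then wrap it in a buffer.

    io.BytesIO is used here as the Python 3 stand-in for cStringIO.
    """
    with gzip.open(path, "rb") as f:
        buf = io.BytesIO(f.read())
    return sum(1 for _line in buf)


def count_lines_splitlines(path):
    """Decompress everything, then build a full list of lines.

    The whole decompressed payload and the list of lines are both
    held in memory at the same time.
    """
    with gzip.open(path, "rb") as f:
        return len(f.read().splitlines())


def count_lines_zcat(path):
    """Shell out to zcat and read its stdout as a stream.

    Assumes a zcat executable is on PATH.
    """
    proc = subprocess.Popen(["zcat", path], stdout=subprocess.PIPE)
    try:
        count = sum(1 for _line in proc.stdout)
    finally:
        proc.stdout.close()
        proc.wait()
    return count
```

Timing each function on the same large .gz file, and watching the process's memory use while it runs, would be one way to reproduce the comparison described above.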

History
Date	User	Action	Args
2009-12-10 22:27:16	asnakelover	set	recipients: + asnakelover, pitrou, brian.curtin
2009-12-10 22:27:15	asnakelover	set	messageid: <1260484035.55.0.626629150168.issue7471@psf.upfronthosting.co.za>
2009-12-10 22:27:14	asnakelover	link	issue7471 messages
2009-12-10 22:27:12	asnakelover	create