Author pitrou
Recipients asnakelover, brian.curtin, pitrou
Date 2009-12-10.23:03:20
Message-id <1260486235.3414.6.camel@localhost>
In-reply-to <1260485677.82.0.178769470321.issue7471@psf.upfronthosting.co.za>
Content
> The gz in question is 17mb compressed and 247mb uncompressed. Calling
> zcat the python process uses between 250 and 260 mb with the whole
> string in memory using zcat as a fork. Numbers for the gzip module
> aren't obtainable except for readline(), which doesn't use much memory
> but is very slow. Other methods thrash the machine to death.
> 
> The machine has 300mb free RAM from a total of 1024mb.

That would be the explanation. Reading the whole file at once and then
calling splitlines() on the result needs roughly twice the memory, since
the list of lines must be built while the original data is still around.
If you had more than 600 MB of free RAM, the splitlines() solution would
probably be adequate :-)

Doing repeated calls to splitlines() on chunks of limited size (say 1 MB)
would probably be fast enough without using too much memory. It would be
a bit less trivial to implement, though, and it seems you are OK with the
subprocess solution.
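
For reference, the chunked approach could look roughly like the sketch
below. It is untested against the file in question; the function name
iter_lines_chunked and the chunk_size parameter are made up for
illustration, and it assumes a Python version where gzip.open() works as
a context manager.

```python
import gzip


def iter_lines_chunked(path, chunk_size=1024 * 1024):
    """Yield lines (as bytes, line endings kept) from a gzip file.

    Hypothetical helper: reads chunk_size decompressed bytes at a time,
    so only about one chunk plus one partial line is held in memory.
    """
    tail = b""  # last, possibly incomplete, line of the previous chunk
    with gzip.open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            # keepends=True so that joining the pieces restores the data.
            # Note: bytes.splitlines() also splits on a lone "\r", unlike
            # readline(), which only splits on "\n".
            lines = (tail + chunk).splitlines(True)
            # The last piece may be cut mid-line (or between "\r" and
            # "\n"), so hold it back until the next chunk arrives.
            tail = lines.pop()
            for line in lines:
                yield line
    if tail:
        yield tail
```

Usage would be something like `for line in iter_lines_chunked("big.gz"):`
(the file name is a placeholder). A larger chunk_size means fewer
splitlines() calls at the cost of a higher peak memory use.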
History
Date User Action Args
2009-12-10 23:03:22 pitrou set recipients: + pitrou, brian.curtin, asnakelover
2009-12-10 23:03:21 pitrou link issue7471 messages
2009-12-10 23:03:20 pitrou create