Message96221
> The gz in question is 17 MB compressed and 247 MB uncompressed. Calling
> zcat as a fork, the Python process uses between 250 and 260 MB with the
> whole string in memory. Numbers for the gzip module aren't obtainable
> except for readline(), which doesn't use much memory but is very slow.
> Other methods thrash the machine to death.
>
> The machine has 300 MB of free RAM out of a total of 1024 MB.
That would be the explanation. Reading the whole file at once and then
calling splitlines() on the result consumes twice the memory, since a list
of lines must be constructed while the original data is still around. If
you had more than 600 MB of free RAM, the splitlines() solution would
probably be adequate :-)
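For illustration, the memory-hungry pattern being discussed looks roughly like this (a minimal sketch; the function name is made up for the example):

```python
import gzip

def read_all_lines(path):
    """The pattern under discussion: read everything, then splitlines().

    Peak memory is roughly twice the uncompressed size, because the raw
    bytes and the list of line objects exist at the same time.
    """
    with gzip.open(path, "rb") as f:
        data = f.read()           # first copy: all uncompressed bytes
    return data.splitlines()      # second copy: a list of line objects
```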
Doing repeated calls to splitlines() on chunks of limited size (say 1 MB)
would probably be fast enough without using too much memory. It would be
a bit less trivial to implement, though, and it seems you are OK with the
subprocess solution.
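The chunked approach suggested above could be sketched along these lines, in modern Python (the original thread predates Python 3; the function name and chunk size are illustrative). The only subtlety is carrying the partial last line of each chunk over to the next one:

```python
import gzip

def iter_lines(path, chunk_size=1024 * 1024):
    """Yield lines from a gzip file, decompressing in fixed-size chunks
    so only about one chunk's worth of data is held in memory at a time."""
    with gzip.open(path, "rb") as f:
        remainder = b""
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            lines = (remainder + chunk).splitlines(keepends=True)
            # The last element may be a partial line; carry it over
            # so it is completed by the next chunk.
            if not lines[-1].endswith(b"\n"):
                remainder = lines.pop()
            else:
                remainder = b""
            yield from lines
        if remainder:          # file did not end with a newline
            yield remainder
```

Usage would be `for line in iter_lines("dump.gz"): ...`, trading the single huge allocation for many small ones.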
History:
Date | User | Action | Args
2009-12-10 23:03:22 | pitrou | set | recipients: + pitrou, brian.curtin, asnakelover
2009-12-10 23:03:21 | pitrou | link | issue7471 messages
2009-12-10 23:03:20 | pitrou | create |