Message96272
I tried passing a size to readline to see if increasing the chunk size helps
(the test file was 120 MB with 700k lines). Values from 1k to 10k all took
around 30 seconds, a value of 100 took 80 seconds, and a value
of 100k ran for several minutes before I killed it. The default
starts at 100 and quickly maxes out at 512, which seems to be a sweet spot
(thanks to whoever figured that out!).
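For anyone who wants to repeat this kind of measurement, here is a rough sketch of a timing harness. The helper names (make_sample, time_readline) and the tiny generated file are made up for illustration; it is not the script I used. The gzip module has also been reworked since this message, so timings on a current Python will not match the numbers above.

```python
import gzip
import os
import tempfile
import time


def make_sample(path, n_lines=1000):
    """Write a small gzip file of short text lines (hypothetical test data)."""
    with gzip.open(path, "wb") as f:
        for i in range(n_lines):
            f.write(b"line %d of sample data\n" % i)


def time_readline(path, size=-1):
    """Read the whole file via readline(size); return (calls, seconds).

    size=-1 means "no limit". A positive size caps how many bytes a
    single call may return, so long lines can come back in pieces.
    """
    calls = 0
    start = time.perf_counter()
    with gzip.open(path, "rb") as f:
        while True:
            chunk = f.readline(size)
            if not chunk:  # empty bytes object signals end of file
                break
            calls += 1
    return calls, time.perf_counter() - start


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sample.gz")
        make_sample(path)
        for size in (-1, 100, 1024, 10240):
            calls, secs = time_readline(path, size)
            print("size=%6d  calls=%d  %.4fs" % (size, calls, secs))
```

Every sample line here is shorter than 100 bytes, so each call returns one full line regardless of the size passed. A real comparison needs a file closer to the 120 MB one described above.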
I profiled it and function call overhead seems to be the real killer. 30% of
the time is spent in readline(). The next() function does almost
nothing and consumes 1/4 the time of readline(). Ditto for read() and
_unread(). Even lowly len() consumes 1/3 the time of readline()
because it is called over 2 million times.
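A profile along these lines can be collected with cProfile from the standard library. This is only a sketch under assumptions: profile_line_iteration is a hypothetical helper, not the exact invocation I ran, and on a current Python the list of hot functions will look different because the gzip internals have changed.

```python
import cProfile
import gzip
import io
import os
import pstats
import tempfile


def profile_line_iteration(path, top=10):
    """Profile iterating over every line of a gzip file.

    Returns the pstats report as text, sorted by time spent inside each
    function itself (excluding callees), limited to `top` entries.
    """
    def consume():
        with gzip.open(path, "rb") as f:
            for _ in f:
                pass

    profiler = cProfile.Profile()
    profiler.runcall(consume)

    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    stats.sort_stats("tottime").print_stats(top)
    return out.getvalue()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sample.gz")
        # Small made-up input; substitute a real file for useful numbers.
        with gzip.open(path, "wb") as f:
            for i in range(10000):
                f.write(b"line %d\n" % i)
        print(profile_line_iteration(path))
```

The ncalls column in the report is where a call count like the 2 million len() calls shows up. Sorting by "tottime" shows the cost of each function itself rather than of whatever it calls.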
There doesn't seem to be any way to speed this up without rewriting the
whole thing as a C module. I'm closing the bug as WONTFIX.
Date | User | Action | Args
2009-12-11 20:33:47 | jackdied | set | recipients: + jackdied, pitrou, brian.curtin, asnakelover
2009-12-11 20:33:47 | jackdied | set | messageid: <1260563627.31.0.730723249837.issue7471@psf.upfronthosting.co.za>
2009-12-11 20:32:47 | jackdied | link | issue7471 messages
2009-12-11 20:32:46 | jackdied | create |