
Author: jackdied
Recipients: asnakelover, brian.curtin, jackdied, pitrou
Date: 2009-12-11.20:32:46
SpamBayes Score: 1.7624285e-10
Marked as misclassified: No
Message-id: <1260563627.31.0.730723249837.issue7471@psf.upfronthosting.co.za>
In-reply-to:
Content
I tried passing a size to readline() to see if increasing the chunk helps
(the test file was 120 MB with 700k lines).  For values of 1k-10k, all runs
took around 30 seconds; with a value of 100 it took 80 seconds, and with a
value of 100k it ran for several minutes before I killed it.  The default
starts at 100 and quickly maxes out at 512, which seems to be a sweet spot
(thanks to whoever figured that out!).
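A minimal sketch of that kind of measurement, for anyone wanting to repeat it: it generates its own small gzip file (a stand-in for the 120 MB / 700k-line test file, which isn't included here) and times readline() with different size hints. The filename and line counts are illustrative, and on modern Pythons the gzip readline path is implemented differently than the pure-Python GzipFile.readline discussed in this message, so the absolute numbers won't match.

```python
import gzip
import time

# Build a small compressed test file -- a hypothetical stand-in for the
# 120 MB, 700k-line file used in the original measurement.
with gzip.open("test.gz", "wt") as f:
    for i in range(10000):
        f.write(f"line {i}: some moderately long text payload\n")

# Time readline() with different size hints.  Each line here is shorter
# than the smallest hint, so every call returns exactly one full line.
for size in (100, 512, 4096):
    start = time.perf_counter()
    with gzip.open("test.gz", "rt") as f:
        lines = 0
        while f.readline(size):
            lines += 1
    elapsed = time.perf_counter() - start
    print(f"size={size}: {lines} lines in {elapsed:.3f}s")
```

Note that readline(size) caps the returned line length, so a hint smaller than the longest line would split lines into pieces and inflate the call count.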

I profiled it, and function-call overhead seems to be the real killer.  30% of
the time is spent in readline().  The next() function does almost
nothing yet consumes 1/4th the time of readline().  Ditto for read() and
_unread().  Even lowly len() consumes 1/3rd the time of readline()
because it is called over 2 million times.
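The profiling itself can be reproduced with cProfile along these lines. This is a hedged sketch, not the original harness: the in-memory stream and helper function are made up for the example, and on current Pythons most of the work happens in C-backed buffered I/O rather than the pure-Python methods (read, _unread, etc.) that dominated the 2009 profile.

```python
import cProfile
import gzip
import io
import pstats

# Build an in-memory gzip stream (a small, hypothetical stand-in for
# the real test file).
buf = io.BytesIO()
with gzip.open(buf, "wt") as f:
    for i in range(20000):
        f.write(f"row {i}\n")

def read_all_lines(stream):
    # Read the whole file one readline() call at a time, the access
    # pattern whose per-call overhead is being measured.
    stream.seek(0)
    count = 0
    with gzip.open(stream, "rt") as f:
        while f.readline():
            count += 1
    return count

profiler = cProfile.Profile()
n = profiler.runcall(read_all_lines, buf)

# Show where the time went, sorted by cumulative time.
pstats.Stats(profiler).sort_stats("cumulative").print_stats(10)
```

Sorting by "tottime" instead of "cumulative" makes the per-function overhead (as opposed to time spent in callees) easier to see.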

There doesn't seem to be any way to speed this up without rewriting the
whole thing as a C module.  I'm closing the bug as WONTFIX.
History
Date User Action Args
2009-12-11 20:33:47  jackdied  set   recipients: + jackdied, pitrou, brian.curtin, asnakelover
2009-12-11 20:33:47  jackdied  set   messageid: <1260563627.31.0.730723249837.issue7471@psf.upfronthosting.co.za>
2009-12-11 20:32:47  jackdied  link  issue7471 messages
2009-12-11 20:32:46  jackdied  create