
Author: jackdied
Recipients: asnakelover, brian.curtin, jackdied, pitrou
Date: 2009-12-11.20:32:46
SpamBayes Score: 1.7624285e-10
Marked as misclassified: No
Message-id: <1260563627.31.0.730723249837.issue7471@psf.upfronthosting.co.za>
In-reply-to:
Content
I tried passing a size to readline() to see if increasing the chunk helps
(the test file was 120 MB with 700k lines).  For values of 1k-10k, all runs
took around 30 seconds; with a value of 100 it took 80 seconds, and with a
value of 100k it ran for several minutes before I killed it.  The default
starts at 100 and quickly maxes out at 512, which seems to be a sweet spot
(thanks to whoever figured that out!).
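A minimal sketch of that kind of measurement, for anyone wanting to repeat it: it generates its own small gzip file (a stand-in for the 120 MB / 700k-line test file, which isn't included here) and times readline() with different size hints. The filename and line counts are illustrative, and on modern Pythons the gzip readline path is implemented differently than the pure-Python GzipFile.readline discussed in this message, so the absolute numbers won't match.

```python
import gzip
import time

# Build a small compressed test file -- a hypothetical stand-in for the
# 120 MB, 700k-line file used in the original measurement.
with gzip.open("test.gz", "wt") as f:
    for i in range(10000):
        f.write(f"line {i}: some moderately long text payload\n")

# Time readline() with different size hints.  Each line here is shorter
# than the smallest hint, so every call returns exactly one full line.
for size in (100, 512, 4096):
    start = time.perf_counter()
    with gzip.open("test.gz", "rt") as f:
        lines = 0
        while f.readline(size):
            lines += 1
    elapsed = time.perf_counter() - start
    print(f"size={size}: {lines} lines in {elapsed:.3f}s")
```

Note that readline(size) caps the returned line length, so a hint smaller than the longest line would split lines into pieces and inflate the call count.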

I profiled it, and function-call overhead seems to be the real killer.  30% of
the time is spent in readline().  The next() function does almost
nothing yet consumes 1/4th the time of readline().  Ditto for read() and
_unread().  Even lowly len() consumes 1/3rd the time of readline()
because it is called over 2 million times.
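The profiling itself can be reproduced with cProfile along these lines. This is a hedged sketch, not the original harness: the in-memory stream and helper function are made up for the example, and on current Pythons most of the work happens in C-backed buffered I/O rather than the pure-Python methods (read, _unread, etc.) that dominated the 2009 profile.

```python
import cProfile
import gzip
import io
import pstats

# Build an in-memory gzip stream (a small, hypothetical stand-in for
# the real test file).
buf = io.BytesIO()
with gzip.open(buf, "wt") as f:
    for i in range(20000):
        f.write(f"row {i}\n")

def read_all_lines(stream):
    # Read the whole file one readline() call at a time, the access
    # pattern whose per-call overhead is being measured.
    stream.seek(0)
    count = 0
    with gzip.open(stream, "rt") as f:
        while f.readline():
            count += 1
    return count

profiler = cProfile.Profile()
n = profiler.runcall(read_all_lines, buf)

# Show where the time went, sorted by cumulative time.
pstats.Stats(profiler).sort_stats("cumulative").print_stats(10)
```

Sorting by "tottime" instead of "cumulative" makes the per-function overhead (as opposed to time spent in callees) easier to see.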

There doesn't seem to be any way to speed this up without rewriting the
whole thing as a C module.  I'm closing the bug as WONTFIX.
History
Date User Action Args
2009-12-11 20:33:47  jackdied  set   recipients: + jackdied, pitrou, brian.curtin, asnakelover
2009-12-11 20:33:47  jackdied  set   messageid: <1260563627.31.0.730723249837.issue7471@psf.upfronthosting.co.za>
2009-12-11 20:32:47  jackdied  link  issue7471 messages
2009-12-11 20:32:46  jackdied  create