Author gvanrossum
Date 2002-03-20.03:43:06
Content

There are two forces at work here.

You want the most common case (a single "for line in file"
that consumes the whole file) to run blindingly fast.
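
For concreteness, the common case is just this (a minimal
sketch in present-day Python; the file name is made up):

    # The common case: a single loop that consumes the whole file.
    with open("data.txt") as f:
        for line in f:
            print(line.rstrip())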

And you want full generality, basically equating next() with
readline(), except that next() raises StopIteration at EOF
where readline() returns an empty string.
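
Here is a minimal sketch of what that generality means (in
present-day Python; the class name is made up): an iterator
whose next() is just readline() with the EOF convention changed:

    # Fully general iteration: next() is readline(), except that
    # it raises StopIteration at EOF instead of returning ''.
    class ReadlineIterator:
        def __init__(self, fileobj):
            self.fileobj = fileobj
        def __iter__(self):
            return self
        def __next__(self):
            line = self.fileobj.readline()
            if not line:            # readline() returns '' at EOF
                raise StopIteration
            return line

    # Usage: for line in ReadlineIterator(open("data.txt")): ...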

Unfortunately, the only way to go blindingly fast is to do
aggressive buffering, and that's what xreadlines does. Until
we rewrite the entire I/O system so we have total control
over buffering ourselves, it's not easy to mix xreadlines
with other operations (readline, seek, tell).
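
The kind of surprise this causes looks like the following
(a sketch of the 2.2-era behavior; the file name and marker
are made up, and modern file objects, which manage the buffer
themselves, no longer misbehave this way):

    f = open("data.txt")
    for line in f:
        if line.startswith("HEADER"):
            break
    # In 2.2, the xreadlines-based iterator had already read far
    # past "HEADER" into its private buffer, so readline(), seek()
    # and tell() operated on a file position well beyond the last
    # line the loop actually yielded:
    next_line = f.readline()   # could silently skip buffered lines
    f.close()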

We could make the default file iterator use readline, but
then it would become much slower, and we'd have to teach
people about xreadlines if they want speed. Or we could use
the current solution, where speed is the default, and you
have to be more careful when you use an unusual coding
style, like breaking out of the for loop and continuing in a
second for loop.
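
The two-loop case looks like this (again a sketch of the
2.2-era semantics, where each loop obtained a fresh xreadlines
object; names are made up):

    f = open("data.txt")
    for line in f:
        if line.startswith("HEADER"):
            break             # lines buffered past HEADER were never seen
    for line in f:            # 2.2: a fresh buffer starts at the raw file
        print(line.rstrip())  # position, silently skipping those lines
    f.close()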

I'm not sure which requirement is more common (speed or
generality), but since looping over all the lines of a (big)
file is such a common pattern, I would bet that speed is the
more important of the two.

In the past we've had a lot of flak about the slowness of the
general solution for looping over all lines in a file. The
xreadlines-based iterator is much faster, and I am reluctant to
change this back in Python 2.3. I'd rather document it carefully:
after all, "for line in file" is a new construct in Python 2.2,
and people have to be told about it anyway; we might as well tell
them about the limitations and how to work around them. One such
workaround is sketched below.
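
A workaround worth documenting (a sketch; it trades the buffered
iterator's speed for readline()'s generality) is to build the
loop from readline() directly, via iter() with a sentinel:

    # iter(callable, sentinel) calls f.readline() until it returns
    # '' (EOF), so nothing is buffered behind the file object's
    # back and readline()/seek()/tell() stay consistent with the loop.
    f = open("data.txt")
    for line in iter(f.readline, ''):
        if line.startswith("HEADER"):
            break
    rest = f.readline()   # continues exactly where the loop stopped
    f.close()
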
History
Date                 User   Action  Args
2007-08-23 13:59:33  admin  link    issue524804 messages
2007-08-23 13:59:33  admin  create