Author gvanrossum
Date 2002-03-20.03:43:06
Content

There are two forces at work here.

You want the most common case (a single "for line in file"
that consumes the whole file) to run blindingly fast.
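
For concreteness, the common case is just this (a minimal
sketch in present-day Python; the file name is made up):

    # The common case: a single loop that consumes the whole file.
    with open("data.txt") as f:
        for line in f:
            print(line.rstrip())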

And you want full generality, basically equating next() with
readline(), except that next() raises StopIteration at EOF
where readline() returns an empty string.
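
Here is a minimal sketch of what that generality means (in
present-day Python; the class name is made up): an iterator
whose next() is just readline() with the EOF convention changed:

    # Fully general iteration: next() is readline(), except that
    # it raises StopIteration at EOF instead of returning ''.
    class ReadlineIterator:
        def __init__(self, fileobj):
            self.fileobj = fileobj
        def __iter__(self):
            return self
        def __next__(self):
            line = self.fileobj.readline()
            if not line:            # readline() returns '' at EOF
                raise StopIteration
            return line

    # Usage: for line in ReadlineIterator(open("data.txt")): ...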

Unfortunately, the only way to go blindingly fast is to do
aggressive buffering, and that's what xreadlines does. Until
we rewrite the entire I/O system so we have total control
over buffering ourselves, it's not easy to mix xreadlines
with other operations (readline, seek, tell).
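
The kind of surprise this causes looks like the following
(a sketch of the 2.2-era behavior; the file name and marker
are made up, and modern file objects, which manage the buffer
themselves, no longer misbehave this way):

    f = open("data.txt")
    for line in f:
        if line.startswith("HEADER"):
            break
    # In 2.2, the xreadlines-based iterator had already read far
    # past "HEADER" into its private buffer, so readline(), seek()
    # and tell() operated on a file position well beyond the last
    # line the loop actually yielded:
    next_line = f.readline()   # could silently skip buffered lines
    f.close()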

We could make the default file iterator use readline, but
then it would become much slower, and we'd have to teach
people about xreadlines if they want speed. Or we could use
the current solution, where speed is the default, and you
have to be more careful when you use an unusual coding
style, like breaking out of the for loop and continuing in a
second for loop.
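
The two-loop case looks like this (again a sketch of the
2.2-era semantics, where each loop obtained a fresh xreadlines
object; names are made up):

    f = open("data.txt")
    for line in f:
        if line.startswith("HEADER"):
            break             # lines buffered past HEADER were never seen
    for line in f:            # 2.2: a fresh buffer starts at the raw file
        print(line.rstrip())  # position, silently skipping those lines
    f.close()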

I'm not sure which requirement is more common (speed or
generality), but since looping over all the lines of a (big)
file is such a common pattern, I would bet that speed is the
more important of the two.

In the past we've had a lot of flak about the slowness of the
general solution for looping over all lines in a file. The
xreadlines-based iterator is much faster, and I am reluctant to
change this back in Python 2.3. I'd rather document it carefully:
after all, "for line in file" is a new construct in Python 2.2,
and people have to be told about it anyway; we might as well tell
them about the limitations and how to work around them. One such
workaround is sketched below.
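
A workaround worth documenting (a sketch; it trades the buffered
iterator's speed for readline()'s generality) is to build the
loop from readline() directly, via iter() with a sentinel:

    # iter(callable, sentinel) calls f.readline() until it returns
    # '' (EOF), so nothing is buffered behind the file object's
    # back and readline()/seek()/tell() stay consistent with the loop.
    f = open("data.txt")
    for line in iter(f.readline, ''):
        if line.startswith("HEADER"):
            break
    rest = f.readline()   # continues exactly where the loop stopped
    f.close()
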
History
Date                 User   Action  Args
2007-08-23 13:59:33  admin  link    issue524804 messages
2007-08-23 13:59:33  admin  create