One of the principles of Python is that "There should be one-- and
preferably only one --obvious way to do it." It seems that the "for
line in file" idiom is The Way to iterate over the lines of a file, and
older, more explicit methods are deprecated. PEP 234 says that this:
    for line in file:
        ...
is equivalent to this:
    for line in iter(file.readline, ""):
        ...
or this:
    while 1:
        line = file.readline()
        if not line:
            break
        ...
However, "for line in file" does not behave the same as the other two if
the file is a named pipe. This is presumably due to the "hidden
read-ahead buffer" in the low-level implementation of the next() method
of the file iterator
(http://docs.python.org/lib/bltin-file-objects.html), meant to increase
the speed at which it reads regular physical files. Since not enough
data exists in the pipe to fill the buffer yet, the lines are only read
in a burst after the buffer has been filled or when the pipe is closed.
My application is monitoring a pipe for new lines from a logging
program, and I want each line read as soon as it is written. Sure,
there are other ways to get this functionality, but I don't see why "for
line in file" shouldn't behave the same way for any file-like object.
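
For illustration, here is a minimal sketch of the readline-based form applied to a pipe. It makes some assumptions: an anonymous os.pipe() stands in for the named pipe, the writer flushes after every line, and follow_lines is a hypothetical helper name, not part of any library. It only shows the explicit readline() form handing over each line while the write end is still open; it does not reproduce the delay itself.

```python
import os


def follow_lines(stream):
    """Yield each line as soon as readline() returns it, stopping at EOF."""
    # iter(callable, sentinel) keeps calling stream.readline() until it
    # returns the empty string, which signals end of file.
    for line in iter(stream.readline, ""):
        yield line


# Assumption for the demo: an anonymous pipe stands in for the named pipe.
read_fd, write_fd = os.pipe()
reader = os.fdopen(read_fd, "r")
writer = os.fdopen(write_fd, "w")

lines = follow_lines(reader)

# The writer plays the role of the logging program and flushes every line.
writer.write("first entry\n")
writer.flush()
first = next(lines)    # read while the write end is still open

writer.write("second entry\n")
writer.flush()
second = next(lines)

# Closing the write end produces EOF, which ends the generator.
writer.close()
rest = list(lines)
reader.close()
```

Whether the reader sees a line promptly also depends on the writing program flushing its own output, which this sketch takes for granted.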
I wonder if it can be made to internally use the read-ahead buffer for
closed physical files, and a different method for open named pipes. I
wonder if reading pipes character-by-character causes any significant
slowdown compared to the read-ahead buffer when the pipe resides in
memory instead of on a disk.
Forgive me if this is not really a bug, but it seems to my beginner eyes
that things are not working the way they should.
http://python-forum.org/pythonforum/viewtopic.php?t=9300
http://ubuntuforums.org/showthread.php?t=916518