classification
Title: "for line in file" doesn't work for pipes
Type: behavior Stage:
Components: Interpreter Core Versions: Python 2.7
process
Status: closed Resolution: works for me
Dependencies: Superseder:
Assigned To: Nosy List: amaury.forgeotdarc, endolith, fmoreau
Priority: normal Keywords:

Created on 2008-09-19 02:58 by endolith, last changed 2014-01-19 11:44 by fmoreau. This issue is now closed.

Messages (3)
msg73419 - Author: (endolith) Date: 2008-09-19 02:58
One of the principles of Python is that "There should be one-- and
preferably only one --obvious way to do it."  It seems that the "for
line in file" idiom is The Way to iterate over the lines of a file, and
older, more explicit methods are deprecated. PEP 234 says that this:

    for line in file:
        ...

is equivalent to this:

    for line in iter(file.readline, ""):
        ...

or this:

    while 1:
        line = file.readline()
        if not line:
            break
        ...
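The three forms quoted from PEP 234 can be checked against each other on an ordinary file. Here is a minimal sketch, assuming Python 3 and using io.StringIO as a stand-in for a regular file. The helper names are hypothetical. It only compares which lines each form yields, not when it yields them, and timing is exactly where pipes differ.

```python
import io

def lines_for(f):
    # Form 1: the file iterator ("for line in file").
    return [line for line in f]

def lines_iter_readline(f):
    # Form 2: iter() with a callable and a sentinel; stops when readline() returns "".
    return list(iter(f.readline, ""))

def lines_while(f):
    # Form 3: the explicit readline() loop.
    result = []
    while True:
        line = f.readline()
        if not line:
            break
        result.append(line)
    return result

text = "first\nsecond\nthird without newline"
# Each form gets its own fresh file object, since reading consumes it.
results = [fn(io.StringIO(text)) for fn in (lines_for, lines_iter_readline, lines_while)]
print(results[0])
```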

However, "for line in file" does not behave the same as the other two if
the file is a named pipe.  This is presumably due to the "hidden
read-ahead buffer" in the low-level implementation of the next() method
of the file iterator
(http://docs.python.org/lib/bltin-file-objects.html), meant to increase
the speed at which it reads regular physical files.  Since not enough
data exists in the pipe to fill the buffer yet, the lines are only read
in a burst after the buffer has been filled or when the pipe is closed.
My application is monitoring a pipe for new lines from a logging
program, and I want each line read as soon as it is written.  Sure,
there are other ways to get this functionality, but I don't see why "for
line in file" shouldn't behave the same way for any file-like object.

I wonder if it could be made to use the read-ahead buffer internally for
closed physical files, and a different method for open named pipes. I
also wonder whether reading pipes character by character causes any
significant slowdown compared to the read-ahead buffer, since a pipe
resides in memory rather than on disk.
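The character-by-character approach raised above can be sketched with raw os.read() calls on a file descriptor. This is an illustration under stated assumptions (Python 3, a POSIX pipe, and a hypothetical helper named read_lines_unbuffered), not how the file iterator is actually implemented. Each line is handed back as soon as its newline byte arrives. The cost is one system call per byte, which is the slowdown the question asks about.

```python
import os

def read_lines_unbuffered(fd):
    """Yield lines (as bytes) from a raw file descriptor, one byte per read.

    Hypothetical helper: a line is yielded as soon as its newline byte
    arrives, so nothing waits for a block-sized buffer to fill. The cost
    is one os.read() system call per byte.
    """
    line = bytearray()
    while True:
        ch = os.read(fd, 1)  # returns b"" only at EOF, once the write end is closed
        if not ch:
            break
        line += ch
        if ch == b"\n":
            yield bytes(line)
            line.clear()
    if line:  # trailing data with no final newline
        yield bytes(line)


r_fd, w_fd = os.pipe()
os.write(w_fd, b"alpha\nbeta\ngamma")  # small enough to fit in the pipe buffer
os.close(w_fd)
lines = list(read_lines_unbuffered(r_fd))
os.close(r_fd)
print(lines)
```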

Forgive me if this is not really a bug, but it seems to my beginner eyes
that things are not working the way they should.

http://python-forum.org/pythonforum/viewtopic.php?t=9300
http://ubuntuforums.org/showthread.php?t=916518
msg73423 - Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2008-09-19 08:35
Python 2.6 and 3.0 come with a completely new I/O implementation, which
correctly handles pipes in this regard (I just tested it).

http://docs.python.org/dev/library/io.html

With 3.0, the built-in open() is an alias for io.open;
with 2.6, you have to use io.open() explicitly.
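Here is a minimal sketch of the behaviour described above, assuming Python 3 on a POSIX system, where io.open is the built-in open. A line is written into an anonymous pipe while the write end stays open. next() on the file iterator returns it immediately instead of waiting for a full buffer or for EOF. The variable names are illustrative.

```python
import io
import os

r_fd, w_fd = os.pipe()

# In Python 3, io.open is the same object as the built-in open().
assert io.open is open

reader = io.open(r_fd, "r")   # text-mode wrapper around the pipe's read end
os.write(w_fd, b"line 1\n")   # the write end is still open, so there is no EOF yet

# With the io stack, the iterator returns the line as soon as it is available;
# a reader that insisted on filling a whole buffer first would block here.
first = next(reader)
print(repr(first))

os.write(w_fd, b"line 2\n")
os.close(w_fd)                # closing the write end produces EOF
rest = list(reader)
print(rest)
reader.close()
```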
msg208474 - Author: Francis Moreau (fmoreau) Date: 2014-01-19 11:44
Sorry for reopening this bug, but I agree with the OP, and I can still see exactly the same behaviour with Python 2.7.6 (Arch Linux).

At the very least, the documentation should clarify that "for line in file" is not strictly equivalent to the "readline" way with regard to the buffering policy used with pipes.

I'm also dubious about the buffering optimisation for the pipe case, but the readline() documentation should state that it will never use such a buffering mechanism, so that it can safely be used with pipes.

Thanks
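The readline()-based loop that this message wants documented as safe for pipes can be wrapped in a small generator. This sketch assumes Python 3 on a POSIX system. follow is a hypothetical helper name, and the two-argument iter(callable, sentinel) call is the form PEP 234 gives in the first message. The example writes a line while the write end is still open and reads it back before closing, so it would hang rather than finish if readline() waited for a full buffer.

```python
import os

def follow(f):
    """Yield lines from f, without trailing newlines, as soon as readline() returns them.

    iter(f.readline, "") calls f.readline() repeatedly and stops when it
    returns "" (EOF, i.e. the writer has closed its end of the pipe).
    """
    for line in iter(f.readline, ""):
        yield line.rstrip("\n")

r_fd, w_fd = os.pipe()
with os.fdopen(r_fd, "r") as pipe_in:
    lines = follow(pipe_in)
    os.write(w_fd, b"started\n")
    first = next(lines)          # returned while the write end is still open
    os.write(w_fd, b"finished\n")
    os.close(w_fd)               # EOF ends the generator
    remaining = list(lines)

print(first, remaining)
```

For a binary-mode file, the sentinel would be b"" instead of "".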
History
Date                 User                Action  Args
2014-01-19 11:44:27  fmoreau             set     nosy: + fmoreau; messages: + msg208474
2011-12-02 21:53:51  terry.reedy         set     versions: + Python 2.7, - Python 2.5
2008-09-19 08:35:12  amaury.forgeotdarc  set     status: open -> closed; resolution: works for me; messages: + msg73423; nosy: + amaury.forgeotdarc
2008-09-19 02:58:23  endolith            create