Created on 2008-09-19 02:58 by endolith, last changed 2014-01-19 11:44 by fmoreau. This issue is now closed.
|msg73419 - (view)||Author: (endolith)||Date: 2008-09-19 02:58|
One of the principles of Python is that "There should be one-- and preferably only one --obvious way to do it." It seems that the "for line in file" idiom is The Way to iterate over the lines of a file, and older more explicit methods are deprecated.

PEP 234 says that this:

    for line in file:
        ...

is equivalent to this:

    for line in iter(file.readline, ""):
        ...

or this:

    while 1:
        line = file.readline()
        if not line:
            break
        ...

However, "for line in file" does not behave the same as the other two if the file is a named pipe. This is presumably due to the "hidden read-ahead buffer" in the low-level implementation of the next() method of the file iterator (http://docs.python.org/lib/bltin-file-objects.html), meant to increase the speed at which it reads regular physical files. Since not enough data exists in the pipe to fill the buffer yet, the lines are only read in a burst after the buffer has been filled or when the pipe is closed.

My application is monitoring a pipe for new lines from a logging program, and I want each line read as soon as it is written. Sure, there are other ways to get this functionality, but I don't see why "for line in file" shouldn't behave the same way for any file-like object.

I wonder if it can be made to internally use the read-ahead buffer for closed physical files, and a different method for open named pipes. I wonder if reading pipes character-by-character causes any significant slowdown compared to the read-ahead buffer when the pipe resides in memory instead of a disk.

Forgive me if this is not really a bug, but it seems to my beginner eyes that things are not working the way they should.

http://python-forum.org/pythonforum/viewtopic.php?t=9300
http://ubuntuforums.org/showthread.php?t=916518
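For readers landing on this report, here is a minimal sketch of the readline-based form described above, applied to a pipe. It is not code from the original report: the names `follow_lines` and `demo` are hypothetical, and an anonymous `os.pipe()` stands in for the named pipe so that the example is self-contained. It only illustrates that each line can be consumed while the writing end is still open.

```python
import os


def follow_lines(stream):
    """Yield lines from `stream` one readline() call at a time until EOF.

    `follow_lines` is a hypothetical helper name, not part of any library.
    """
    # iter(callable, sentinel) keeps calling stream.readline() and stops
    # when it returns the empty string, which signals end of file.
    for line in iter(stream.readline, ""):
        yield line


def demo():
    """Write two lines into a pipe and read each one back."""
    # Assumption: an anonymous pipe replaces the named pipe from the report.
    read_fd, write_fd = os.pipe()
    reader = os.fdopen(read_fd, "r")
    writer = os.fdopen(write_fd, "w")

    received = []
    lines = follow_lines(reader)

    writer.write("first\n")
    writer.flush()
    # The writing end is still open here; the line is read anyway.
    received.append(next(lines))

    writer.write("second\n")
    writer.close()  # closing flushes the data and signals EOF to the reader
    received.extend(lines)

    reader.close()
    return received


if __name__ == "__main__":
    print(demo())
```

With a real named pipe, the same generator would be handed the file object returned by opening the FIFO path for reading.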
|msg73423 - (view)||Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) *||Date: 2008-09-19 08:35|
Python 2.6 and 3.0 come with a completely new I/O implementation, which correctly handles pipes in this regard (I just tested). http://docs.python.org/dev/library/io.html With the 3.0 version, the built-in open() is an alias for io.open; with 2.6, you have to use io.open() explicitly.
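A rough sketch of what the io-based approach might look like, assuming Python 3 semantics. The function name `demo_io_open` is made up for illustration, and an anonymous pipe is used in place of a named one so the snippet runs on its own. It iterates with a plain "for line in file" loop over an object returned by io.open() and takes the first line before the writing end has been closed.

```python
import io
import os


def demo_io_open():
    """Read one line with `for line in file` from an io.open() stream."""
    # Assumption: an anonymous pipe stands in for a named pipe.
    read_fd, write_fd = os.pipe()

    # io.open() accepts an integer file descriptor as well as a path.
    reader = io.open(read_fd, "r", encoding="utf-8")

    # Write a single line and leave the writing end open.
    os.write(write_fd, b"hello\n")

    first = None
    for line in reader:
        first = line
        break  # stop after the first line; the writer is still open

    os.close(write_fd)
    reader.close()
    return first


if __name__ == "__main__":
    print(repr(demo_io_open()))
```

On 2.6 and 2.7 the same io.open() call would be the explicit opt-in mentioned above; note that it returns unicode strings in text mode there.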
|msg208474 - (view)||Author: Francis Moreau (fmoreau)||Date: 2014-01-19 11:44|
Sorry for reopening this bug, but I agree with the OP, and I can still see the exact same behaviour on Python 2.7.6 (Arch Linux). At the very least, the documentation should clarify that "for line in file" is not strictly equivalent to the "readline" way with regard to the buffering policy used with pipes. I'm also dubious about the buffering optimisation for the pipe case, but the readline() documentation should state that it never uses such a buffering mechanism, so that we can safely use it when dealing with pipes. Thanks
messages: + msg208474
|2011-12-02 21:53:51||terry.reedy||set||versions: + Python 2.7, - Python 2.5|
|2008-09-19 08:35:12||amaury.forgeotdarc||set||status: open -> closed|
resolution: works for me
messages: + msg73423
nosy: + amaury.forgeotdarc