Issue 23455
This issue tracker has been migrated to GitHub and is currently read-only. For more information, see the GitHub FAQs in Python's Developer Guide.
Created on 2015-02-12 18:48 by dalke, last changed 2022-04-11 14:58 by admin.
Messages (1)
msg235850 | Author: Andrew Dalke (dalke) * | Date: 2015-02-12 18:48
The file iterator is "deemed broken". As I don't think it should be made non-broken, I suggest that the documentation be changed to point out when file iteration is broken. I also think the term 'broken' is a label with needlessly harsh connotations and should be softened.

The iterator documentation uses the term 'broken' like this (quoting here from https://docs.python.org/3.4/library/stdtypes.html):

    Once an iterator’s __next__() method raises StopIteration, it must continue to do so on subsequent calls. Implementations that do not obey this property are deemed broken.

(Older versions comment "This constraint was added in Python 2.3; in Python 2.2, various iterators are broken according to this rule.")

An IOBase is supposed to support the iterator protocol (says https://docs.python.org/3.4/library/io.html#io.IOBase ). However, it does not, nor does the documentation say that it's broken in the face of a changing file (e.g., when another process appends to a log file).

    % ./python.exe
    Python 3.5.0a1+ (default:4883f9046b10, Feb 11 2015, 04:30:46)
    [GCC 4.8.4] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> f = open("empty")
    >>> next(f)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    StopIteration
    >>>
    >>> ^Z
    Suspended
    % echo "Hello!" >> empty
    % fg
    ./python.exe
    >>> next(f)
    'Hello!\n'

This is apparently well-known behavior, as I've come across several references to it on various Python-related lists, including this one from Miles in 2008 ( https://mail.python.org/pipermail/python-list/2008-September/491920.html ):

    Strictly speaking, file objects are broken iterators:

Fredrik Lundh, in the same thread ( https://mail.python.org/pipermail/python-list/2008-September/521090.html ), says:

    it's a design guideline, not an absolute rule

The 7+ years of 'broken' behavior in Python suggests that /F is correct. But while 'broken' could be considered a meaningless label, it carries with it some rather negative connotations.
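The terminal session above can also be reproduced in one script, without suspending the interpreter. This is a minimal sketch that assumes a temporary file stands in for the log file and that the same process does the appending instead of a second one; the function name demo_resumed_iteration is hypothetical. It also exercises the seek(0) reset mentioned in this message.

```python
import os
import tempfile


def demo_resumed_iteration():
    """Hypothetical demo: a text file iterator yields lines after StopIteration."""
    # Create an empty file, standing in for a log file that is still being written.
    fd, path = tempfile.mkstemp(text=True)
    os.close(fd)
    try:
        with open(path) as reader:
            # The file is empty, so the iterator raises StopIteration (-> None).
            first = next(reader, None)
            # Simulate another process appending to the file.
            with open(path, "a") as writer:
                writer.write("Hello!\n")
            # The supposedly exhausted iterator now returns the new line.
            second = next(reader, None)
            # Rewinding with seek(0) also makes the iterator yield again.
            reader.seek(0)
            third = next(reader, None)
        return first, second, third
    finally:
        os.remove(path)


print(demo_resumed_iteration())
```

On CPython the first value is None, while the second and third are both 'Hello!\n': the same iterator that signalled exhaustion resumes once the file grows, and again after seek(0).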
The label makes it sound like developers are supposed to make every effort to avoid broken code, when that's not something Python itself does. It also means that my code can be called "broken" solely because it assumed Python file iterators are non-broken. I am not happy when people say my code is broken.

It is entirely reasonable that a seek(0) would reset the state and cause next(it) to stop raising StopIteration. However, errors can arise when using Python file objects as iterators to parse a log file or any other file that is appended to by another process.

Here's an example of code that can break. It extracts the first and last elements of an iterator; more specifically, the first and last lines of a file. If there are no lines, it returns None for both values; if there's only one line, it returns that line as both values.

    def get_first_and_last_elements(it):
        first = last = next(it, None)
        for last in it:
            pass
        return first, last

This code expects a non-broken iterator. If it is passed a file, and the file was 1) initially empty when next() was called, and 2) appended to by the time Python reaches the for loop, then the first value can be None while the last value is a string. This is unexpected, undocumented, and may lead to subtle errors.

There are work-arounds, like ensuring that StopIteration only occurs once:

    def get_first_and_last_elements(it):
        first = last = next(it, None)
        if last is not None:
            for last in it:
                pass
        return first, last

but much existing code expects non-broken iterators, such as the Python example implementation of itertools.dropwhile at https://docs.python.org/2/library/itertools.html#itertools.dropwhile . (I have a reproducible failure using it, a fork(), and a file iterator with a sleep(), if that would prove useful.)
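The failure described above depends on timing, but it can be made deterministic. This sketch assumes a hypothetical wrapper, AppendOnFirstStop, which appends a line to the file the first time the file iterator raises StopIteration, standing in for the other process writing at exactly the wrong moment. The work-around is renamed get_first_and_last_elements_safe here only so both versions can coexist, and run_with_racing_append is likewise a hypothetical helper.

```python
import os
import tempfile


def get_first_and_last_elements(it):
    # Version from the message: assumes StopIteration is final.
    first = last = next(it, None)
    for last in it:
        pass
    return first, last


def get_first_and_last_elements_safe(it):
    # Work-around from the message: only loop if next() found an element.
    first = last = next(it, None)
    if last is not None:
        for last in it:
            pass
    return first, last


class AppendOnFirstStop:
    """Hypothetical test helper: wraps a file and appends a line to it the
    first time the file iterator raises StopIteration."""

    def __init__(self, fileobj, path, text):
        self.fileobj = fileobj
        self.path = path
        self.text = text
        self.appended = False

    def __iter__(self):
        return self

    def __next__(self):
        try:
            return next(self.fileobj)
        except StopIteration:
            if not self.appended:
                # Simulate the other process appending right after EOF was seen.
                self.appended = True
                with open(self.path, "a") as writer:
                    writer.write(self.text)
            raise  # re-raise the original StopIteration


def run_with_racing_append(func):
    # Start each run from a fresh, empty file.
    fd, path = tempfile.mkstemp(text=True)
    os.close(fd)
    try:
        with open(path) as reader:
            return func(AppendOnFirstStop(reader, path, "Hello!\n"))
    finally:
        os.remove(path)


print(run_with_racing_append(get_first_and_last_elements))       # first is None, last is a string
print(run_with_racing_append(get_first_and_last_elements_safe))  # both are None
```

With the original version the first value is None while the last is 'Hello!\n'; the work-around returns (None, None) because it never touches the iterator again after the first StopIteration.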
Another option is to wrap file object iterators so that they keep raising StopIteration, like:

    def safe_iter(it):
        yield from it

    # -or-
    (line for line in file_iter)

but people need to know to do this with file iterators or other potentially broken iterators. The current documentation does not say when file iterators are broken, and I don't know which other iterators are also broken.

I realize this is a tricky issue. I don't think it's possible now to change the file's StopIteration behavior. I expect that there is code which depends on the current brokenness, the ability to seek() and re-iterate is useful, and the idea that next() returns text if and only if readline() is not empty is useful and well-entrenched. PyPy has the same behavior as CPython, so any change will take some time to propagate to the other implementations.

Instead, I'm fine with a documentation change in io.html. It currently says:

    IOBase (and its subclasses) support the iterator protocol, meaning that an IOBase object can be iterated over yielding the lines in a stream. Lines are defined slightly differently depending on whether the stream is a binary stream (yielding bytes), or a text stream (yielding unicode strings). See readline() below.

I suggest adding something like:

    The file iterator does not completely follow the iterator protocol. If new data is added to the file after the iterator raises StopIteration, then next(file) will resume returning lines. The safest way to iterate over lines in a log file or other changing file is to use a generator expression:

        (line for line in file)

    The iterator may also resume after using seek() to move the file position.

You'll note that I failed to use the term "broken". This should really start "The file iterator is broken." I find that term rather harsh, and since broken iterators are acceptable in Python, I suggest toning down or qualifying the use of "broken" in stdtypes.html. I have no suggestions for an improved version.
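The safe_iter wrapper works because a generator, once finished, keeps raising StopIteration on every later call. A minimal check, again assuming a temporary file as the changing file; demo_safe_iter is a hypothetical helper:

```python
import os
import tempfile


def safe_iter(it):
    # Once this generator finishes, every later next() raises StopIteration,
    # even if the underlying file iterator would resume.
    yield from it


def demo_safe_iter():
    fd, path = tempfile.mkstemp(text=True)
    os.close(fd)
    try:
        with open(path) as reader:
            lines = safe_iter(reader)
            before = next(lines, None)  # empty file: the generator finishes
            # Simulate another process appending to the file.
            with open(path, "a") as writer:
                writer.write("Hello!\n")
            after_wrapped = next(lines, None)  # the wrapper stays exhausted
            after_raw = next(reader, None)  # the raw file iterator resumes
        return before, after_wrapped, after_raw
    finally:
        os.remove(path)


print(demo_safe_iter())
```

This returns (None, None, 'Hello!\n'): the wrapped iterator honours the "once StopIteration, always StopIteration" rule, while the raw file iterator does not. The generator expression (line for line in file_iter) is also a generator, so it behaves the same way.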
History
Date | User | Action | Args
---|---|---|---
2022-04-11 14:58:12 | admin | set | github: 67643
2015-07-21 07:29:07 | ethan.furman | set | nosy: - ethan.furman
2015-03-02 07:43:48 | ezio.melotti | set | nosy: + pitrou; type: behavior
2015-02-12 18:55:28 | ethan.furman | set | nosy: + ethan.furman
2015-02-12 18:48:54 | dalke | create |