classification
Title: for line in sys.stdin: doesn't notice EOF the first time
Type: behavior Stage: test needed
Components: Library (Lib) Versions: Python 2.7
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: Don Hatch, Finkregh, benjamin.peterson, dankegel, doko, draghuram, eric.araujo, ggenellina, jary, marhar, mjpieters, nvetoshkin, quark, r_mosaic, ralph.corderoy, vstinner
Priority: normal Keywords:

Created on 2007-01-12 10:34 by doko, last changed 2016-11-15 16:32 by quark.

Messages (20)
msg30996 - Author: Matthias Klose (doko) * (Python committer) Date: 2007-01-12 10:34
[forwarded from http://bugs.debian.org/315888]

for line in sys.stdin: doesn't notice EOF the first time when reading from a tty.

The test program:

    import sys
    for line in sys.stdin:
            print line,
    print "eof"

A sample session:

    liw@esme$ python foo.py
    foo         <--- I pressed Enter and then Ctrl-D
    foo         <--- then this appeared, but not more
    eof         <--- this only came when I pressed Ctrl-D a second time
    liw@esme$

Seems to me that there is some buffering issue where Python needs to
read end-of-file twice to notice it on all levels. Once should be 
enough.

msg30997 - Author: Gabriel Genellina (ggenellina) Date: 2007-01-14 04:20
Same thing occurs on Windows. Even worse, if the line does not end with CR, Ctrl-Z (EOF in Windows, equivalent to Ctrl-D) has to be pressed 3 times:

D:\Temp>python foo.py
foo  <--- I pressed Enter
^Z   <--- I pressed Ctrl-Z and then Enter again
foo  <--- this appeared
^Z   <--- I pressed Ctrl-Z and then Enter again
<EOF>

D:\Temp>python foo.py
foo^Z   <--- I pressed Ctrl-Z and then Enter
^Z      <--- cursor stays here; I pressed Ctrl-Z and then Enter again
^Z      <--- cursor stays here; I pressed Ctrl-Z and then Enter again
foo <EOF>
msg30998 - Author: Raghuram Devarakonda (draghuram) (Python triager) Date: 2007-01-22 16:34

I am not entirely sure that this is a bug.

$ cat testfile
line1
line2

$ python foo.py < testfile

This command behaves as expected. The behaviour described above happens only when the input comes from a tty. That could be because of terminal settings that buffer characters until a newline is entered.

msg30999 - Author: Raghuram Devarakonda (draghuram) (Python triager) Date: 2007-01-22 16:34

I am not entirely sure that this is a bug.

$ cat testfile
line1
line2

$ python foo.py < testfile

This command behaves as expected. The behaviour described above happens only when the input comes from a tty. That could be because of terminal settings that buffer characters until a newline is entered.

msg31000 - Author: Raghuram Devarakonda (draghuram) (Python triager) Date: 2007-01-22 17:37

Sorry for my duplicate comment; it was a mistake. On closer examination, the OP's description does seem to indicate an issue. Please look at the (attached) stdin_noiter.py, which uses readline() directly and does not have the problem described here: it detects EOF on the first CTRL-D. This points to a problem in the iterator function fileobject.c:file_iternext(). I think the first CTRL-D might be getting lost somewhere in the read-ahead code, which comes into play only with the iterator.
msg31001 - Author: Raghuram Devarakonda (draghuram) (Python triager) Date: 2007-01-22 17:45

Ok. This may sound stupid but I couldn't find a way to attach a file to this bug report. So I am copying the code here:

    import sys

    line = sys.stdin.readline()
    while line:
        print line,
        line = sys.stdin.readline()

    print "eof"
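The same readline()-based loop is often written more compactly with the two-argument form of iter(), which keeps calling a function until it returns a sentinel value. Going by the analysis in this thread (only the file iterator uses the read-ahead buffer), this form should also need just one Ctrl-D on Python 2. A minimal sketch in Python 3 syntax; the iter() call itself is identical on Python 2, and the helper name `lines_via_readline` is hypothetical:

```python
import io


def lines_via_readline(stream, sentinel=""):
    """Yield lines by calling stream.readline() until it returns `sentinel`.

    Hypothetical helper: it avoids the file iterator (and, on Python 2, its
    read-ahead buffer). Pass sentinel=b"" for binary streams, otherwise the
    loop never terminates because b"" != "".
    """
    # iter(callable, sentinel) calls readline() repeatedly and stops as soon
    # as it returns the sentinel, i.e. the empty string that signals EOF.
    return iter(stream.readline, sentinel)


if __name__ == "__main__":
    # With a real terminal this would be lines_via_readline(sys.stdin).
    for line in lines_via_readline(io.StringIO("line1\nline2\n")):
        print(line, end="")
    print("eof")
```

The sentinel parameter exists only because text and binary streams signal EOF with different empty values.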
msg31002 - Author: Raghuram Devarakonda (draghuram) (Python triager) Date: 2007-01-24 17:20

I tested two kinds of input with the iter and noiter versions. The "noiter" code is the one I posted; the OP's code is the iter version.

1) For input without any newline (line1<CTRL-D><CTRL-D><CTRL-D>), both versions behave the same.
2) With "line1\n<CTRL-D>", the noiter version prints "eof", while the iter version requires an additional CTRL-D. This is because the iter version uses read-ahead, which is implemented with fread(). A simple C program using fread() behaves exactly the same way.

I tested on Linux, but I am sure the Windows behaviour (as posted by ggenellina) has the same cause. Since the issue is with platform's stdio library, I don't think Python should fix anything here. However, it may be worthwhile to mention something about this in the documentation. I will open a bug for this purpose.

msg31003 - Author: Raghuram Devarakonda (draghuram) (Python triager) Date: 2007-04-25 18:04

BTW, I opened bug 1643712 for the documentation change.
msg31004 - Author: Mark Harrison (marhar) Date: 2007-08-10 20:01
I think this should be considered a bug.  These
two command lines (on unix) should behave the same:

cat | ./foo.py
./foo.py

But they do not.  The first (using cat) behaves typically,
needing only one control-D.  The second needs the two
control-D's as noted.
msg113247 - Author: Mark Lawrence (BreamoreBoy) * Date: 2010-08-08 10:42
This is fixed in py3k but still exists in 2.7.
msg113264 - Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2010-08-08 13:48
Benjamin, is it too late to have this fixed in 2.7?
msg124124 - Author: Vetoshkin Nikita (nvetoshkin) Date: 2010-12-16 12:12
I guess http://bugs.python.org/issue1195 might be related
msg152176 - Author: Ralph Corderoy (ralph.corderoy) Date: 2012-01-28 18:31
This most certainly is a bug under Unix and an annoying one.  "Since the
issue is with platform's stdio library" is wrong;  stdio is being used
incorrectly.  It would be nice to see it fixed in the 2.x line.

I've two test programs.

    $ head -42 stdin2.6 stdin3.1
    ==> stdin2.6 <==
    #! /usr/bin/python2.6

    import sys

    for l in sys.stdin:
        print repr(l)
    print 'end'

    ==> stdin3.1 <==
    #! /usr/bin/python3.1

    import sys

    for l in sys.stdin:
        print(repr(l))
    print('end')

    $

For both of them I will type "1 Enter 2 Enter 3 Enter Ctrl-D" without
the spaces, Ctrl-D being my tty's EOF character (see stty -a).

    $ ./stdin2.6
    1
    2
    3
    '1\n'
    '2\n'
    '3\n'

On the EOF, the first iteration of sys.stdin returns, and then so do the
others with the buffered lines.  The loop doesn't terminate; a second
Ctrl-D is required, giving:

    end
    $

Next,

    $ ./stdin3.1
    1
    '1\n'
    2
    '2\n'
    3
    '3\n'
    end
    $

perfect output.  Only one Ctrl-D required and better still each line is
returned as it's entered.

ltrace shows python2.6 uses fread(3).  I'm assuming it treats only a
zero return as EOF whereas whenever the return value is less than the
number of requested elements, EOF could have been reached;  feof(3) must
be called afterwards to decide.  Really, ferror(3) should also be called
to see if, as well as returning some elements, an error was detected.

It's this lack of feof() that means the second fread() is required to
trigger the flawed `only 0 return is EOF' logic.
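To make this argument concrete, here is a small Python model of it. Everything in it is hypothetical: `FakeTTY` stands in for a terminal in canonical mode, where each read() returns one typed line, or zero bytes for a Ctrl-D on an empty line. `StdioFile.fread` mimics fread(3) as glibc behaved at the time: it retries read() until the request is filled or read() returns 0, and the next fread() calls read() again even though the EOF indicator is already set (glibc 2.28 later made EOF sticky). The two readers differ only in whether they consult the feof()-like flag after a short read. Because fread() does not return until it has 8192 bytes or sees EOF, the model also shows why no lines appear until Ctrl-D is pressed.

```python
class FakeTTY:
    """Hypothetical canonical-mode terminal driven by a script of events.

    Each event is what one read() call returns: a typed line such as b"1\n",
    or b"" for one press of Ctrl-D on an empty line.
    """

    def __init__(self, events):
        self.events = list(events)
        self.eof_presses = 0  # how many Ctrl-D presses have been consumed

    def read(self, size):
        if not self.events:
            # A real tty would block here, waiting for the user.
            raise RuntimeError("blocked: waiting for another Ctrl-D")
        event = self.events.pop(0)
        assert len(event) <= size  # the model only uses short lines
        if event == b"":
            self.eof_presses += 1
        return event


class StdioFile:
    """Hypothetical model of a FILE* with fread() and an EOF indicator."""

    def __init__(self, raw):
        self.raw = raw
        self.eof = False  # plays the role of feof()

    def fread(self, size):
        # Keep calling read() until `size` bytes arrive or read() returns 0,
        # without checking the EOF indicator first (pre-2.28 glibc behaviour).
        buf = b""
        while len(buf) < size:
            chunk = self.raw.read(size - len(buf))
            if not chunk:
                self.eof = True
                break
            buf += chunk
        return buf


def read_all_zero_only(f, size=8192):
    """Treat only a zero-length fread() as EOF (the flaw described above)."""
    data = b""
    while True:
        chunk = f.fread(size)
        if not chunk:
            return data
        data += chunk


def read_all_with_feof(f, size=8192):
    """Also stop after an fread() that set the EOF flag, as feof() allows."""
    data = b""
    while True:
        chunk = f.fread(size)
        data += chunk
        if not chunk or f.eof:
            return data


if __name__ == "__main__":
    typed = [b"1\n", b"2\n", b"3\n", b""]  # 1 Enter 2 Enter 3 Enter Ctrl-D
    tty = FakeTTY(typed)
    print(read_all_with_feof(StdioFile(tty)), tty.eof_presses)  # prints: b'1\n2\n3\n' 1

    tty = FakeTTY(typed + [b""])  # the user has to press Ctrl-D again
    print(read_all_zero_only(StdioFile(tty)), tty.eof_presses)  # prints: b'1\n2\n3\n' 2
```

With only one Ctrl-D scripted, `read_all_zero_only` raises the "blocked" error, which is the model's equivalent of the hang seen with `./stdin2.6`.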

Here's some C that shows stdio works fine if feof() and ferror() are
combined with fread().

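    #include <stdio.h>

    int main(void)
    {
        unsigned char s[8192], *p;
        size_t n;

        /* fread() returns 0 only when nothing at all could be read. */
        while ((n = fread(s, 1, sizeof s, stdin)) > 0) {
            printf("%zu", n);          /* byte count, then a hex dump */
            p = s;
            while (n--)
                printf(" %02hhx", *p++);
            putchar('\n');

            /* A short count alone doesn't say why; ask stdio whether
               EOF or an error was hit during this fread(). */
            if (feof(stdin)) {
                puts("end");
                break;
            }
            if (ferror(stdin)) {
                fputs("error\n", stderr);
                return 1;
            }
        }

        return 0;
    }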
msg257703 - Author: Dan Kegel (dankegel) Date: 2016-01-07 16:58
Still present in python 2.7.9, but fixed in python 3.4.3.
Also, in python 3.4.3, output is immediate, there seems to be no
input buffering.
msg259630 - Author: Don Hatch (Don Hatch) Date: 2016-02-05 04:56
I've reported the unfriendly input withholding that several people have
observed and mentioned here as a separate bug: http://bugs.python.org/issue26290 . The symptom is different but I suspect it has exactly the same underlying cause (incorrect use of stdio) and fix that Ralph Corderoy has described clearly here.
msg280848 - Author: Martijn Pieters (mjpieters) * Date: 2016-11-15 14:26
This bug affects all use of `file.__iter__` and interrupts (EINTR), not just sys.stdin.

You can reproduce the issue by reading from a (slow) pipe in a terminal window and resizing that window, for example; the interrupt is not handled and a future call ends up raising `IOError: [Errno 0] Error`, a rather confusing message.

The Mercurial community is switching away from direct iteration over files because of this bug; Jun's excellent analysis is included and enlightening:

   https://www.mercurial-scm.org/pipermail/mercurial-devel/2016-November/090522.html

The fix is to use

    interrupted = ferror(f->f_fp) && errno == EINTR;
    // ..
    if (interrupted) {
        clearerr(f->f_fp);
        if (PyErr_CheckSignals()) {
            Py_DECREF(v);
            return NULL;
        }
    }

and, after the Py_UniversalNewlineFread calls, check for interrupted == 0 in the chunksize == 0 case, as file_read does, for example. readahead doesn't do this, and its only public user is file_iternext.
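At the Python level, the same idea (treat EINTR as "try again", never as end of data) is what Python 2 callers can wrap around readline() while avoiding the file iterator. The sketch below is a hypothetical helper, not part of the proposed C patch; it assumes an interrupted readline() raises an error carrying errno EINTR without having consumed data. It only recognises errors that still carry EINTR, so it cannot help once the failure has turned into the `[Errno 0]` error described above. On Python 3.5+, PEP 475 already retries most system calls internally.

```python
import errno


def retry_on_eintr(func, *args):
    """Call func(*args), retrying for as long as it fails with EINTR.

    Hypothetical helper, mainly useful on Python 2.
    """
    while True:
        try:
            return func(*args)
        except (IOError, OSError) as exc:  # IOError is an alias of OSError on Python 3
            if exc.errno != errno.EINTR:
                raise
            # Interrupted by a signal whose handler returned: retry the call
            # instead of treating the interruption as end of data.


def iter_lines(stream):
    """Yield lines via readline(), retrying reads interrupted by signals."""
    while True:
        line = retry_on_eintr(stream.readline)
        if not line:  # an empty result is the only EOF signal here
            return
        yield line
```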
msg280851 - Author: Martijn Pieters (mjpieters) * Date: 2016-11-15 14:35
It looks like readahead was missed when http://bugs.python.org/issue12268 was fixed.
msg280854 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-11-15 14:45
Martijn Pieters: Sadly, Python 2 I/O are full of bugs in corner cases :-/

First of all, in most cases, Python 2 uses the libc for I/O, but the libc has known bugs including segfaults:
https://haypo-notes.readthedocs.io/python.html#bugs-in-the-c-stdio-used-by-the-python-i-o

Python 3 handles EINTR better. EINTR should now be "fully supported" in Python 3.5 thanks to PEP 475. I mean in the Python core; I don't expect any third party to implement PEP 475. Fortunately, most third-party modules don't implement syscall wrappers themselves, but reuse Python, which handles EINTR for them.

To come back to Python 2: yeah, we still have to fix issues to make the code more robust in corner cases, and enhance error reporting. It seems like fread() errors are not checked correctly in some places.
msg280856 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-11-15 14:53
I don't see any simple solution to get a 100% reliable I/O stack on Python 2.

Python 3.5 contains a pure Python implementation of the io module: _pyio.FileIO uses os.read() and os.write(). In Python 3.4 and older, _pyio still used io.FileIO (implemented in C). But remember Python 3.0, which had *very* bad I/O performance because its io module was fully implemented in pure Python!
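For comparison, a line reader built directly on os.read(), in the spirit of the os.read()-based _pyio.FileIO mentioned above, needs no feof() bookkeeping: a single zero-length read means EOF. On a terminal in canonical mode, each read() normally returns as soon as a line is entered, so lines also show up immediately. This is an illustrative sketch (the name `iter_lines_fd` is hypothetical), not how _pyio or io are actually implemented:

```python
import os


def iter_lines_fd(fd, chunk_size=8192):
    """Yield newline-terminated byte lines read from a raw file descriptor."""
    pending = b""
    while True:
        chunk = os.read(fd, chunk_size)
        if not chunk:
            # One zero-length read is enough: there is no stdio EOF flag
            # to consult and no second read to trigger.
            if pending:
                yield pending  # final line without a trailing newline
            return
        pending += chunk
        # Emit every complete line currently buffered.
        while True:
            end = pending.find(b"\n")
            if end < 0:
                break
            yield pending[:end + 1]
            pending = pending[end + 1:]
```

Usage would be `for line in iter_lines_fd(sys.stdin.fileno()): ...`. A Ctrl-D typed after a partial line still only hands that partial line to read(), so it takes a second Ctrl-D to end input there, which matches ordinary tty behaviour.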

The uvloop project proved that Python can be very efficient for (network) I/O using code written with Cython. But I also know that Mercurial cares about PyPy, which is not really Cython-friendly.

Even if fread() bugs are fixed in Python 2.7.x+1, you will still hit bugs on Python 2.7.x and older.

Maybe it can be a strong motivation to pursue your Python 3 efforts :-)
msg280865 - Author: Jun Wu (quark) Date: 2016-11-15 16:31
haypo: The file.__iter__ EINTR issue may be better discussed as a separate bug report. It's not related to stdin or EOF or Windows.

Since we have some EINTR fixes for Python 2.7.4, I think it's reasonable to fix the remaining EINTR issues for 2.7.13+.

If I have read fileobject.c correctly, readahead() is the only remaining place having the EINTR issue.

If you agree that we should fix readahead(), I can prepare the patch.
History
Date                 User            Action  Args
2016-11-15 16:32:00  quark           set     nosy: + quark; messages: + msg280865
2016-11-15 14:53:02  vstinner        set     messages: + msg280856
2016-11-15 14:45:45  vstinner        set     nosy: + vstinner; messages: + msg280854
2016-11-15 14:35:14  mjpieters       set     messages: + msg280851
2016-11-15 14:26:27  mjpieters       set     nosy: + mjpieters; messages: + msg280848
2016-06-21 12:58:01  martin.panter   link    issue26290 superseder
2016-02-05 04:56:10  Don Hatch       set     nosy: + Don Hatch; messages: + msg259630
2016-01-07 16:58:44  dankegel        set     nosy: + dankegel; messages: + msg257703
2014-02-03 19:04:33  BreamoreBoy     set     nosy: - BreamoreBoy
2013-01-13 15:38:06  jary            set     nosy: + jary
2012-01-28 18:31:19  ralph.corderoy  set     nosy: + ralph.corderoy; messages: + msg152176
2010-12-16 12:12:57  nvetoshkin      set     nosy: + nvetoshkin; messages: + msg124124
2010-12-15 08:46:59  Finkregh        set     nosy: + Finkregh
2010-08-08 13:48:05  eric.araujo     set     nosy: + benjamin.peterson, eric.araujo; messages: + msg113264
2010-08-08 10:42:01  BreamoreBoy     set     nosy: + BreamoreBoy; messages: + msg113247; versions: + Python 2.7, - Python 2.6
2009-08-05 02:18:00  r_mosaic        set     nosy: + r_mosaic
2009-03-30 19:05:01  ajaksu2         link    issue1643712 dependencies
2009-03-30 18:48:08  ajaksu2         set     title: for line in sys.stdin: doesn't notice EOF the first time -> for line in sys.stdin: doesn't notice EOF the first time; stage: test needed; type: behavior; versions: + Python 2.6, - Python 2.5
2007-01-12 10:34:13  doko            create