Classification
Title: file.write and file.read don't handle EINTR
Type: behavior Stage: resolved
Components: IO Versions: Python 3.2, Python 3.3, Python 2.7
Process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: pitrou Nosy List: amaury.forgeotdarc, eggy, neologix, pitrou
Priority: normal Keywords: patch

Created on 2011-01-20 14:12 by eggy, last changed 2011-02-25 21:36 by pitrou. This issue is now closed.

Files
File name Uploaded Description
fwrite.py eggy, 2011-01-20 14:12
fread.py eggy, 2011-01-20 14:13
fwrite.py eggy, 2011-01-20 15:08
eintr_io.patch pitrou, 2011-01-21 00:06
Messages (10)
msg126616 - (view) Author: Mark Florisson (eggy) * Date: 2011-01-20 14:12
In both Python versions EINTR is not handled properly in the file.write and file.read methods. 

------------------------- file.write -------------------------
In Python 2, file.write can perform a short write, and when it is interrupted
there is no way to tell how many bytes it actually wrote: Python 2 raises an
IOError with errno EINTR, whereas Python 3 simply stops writing and returns the
number of bytes written.

Here is the output of fwrite with Python 2.7 (see attached files). Note also how inconsistent the IOError vs OSError difference is:

python2.7 fwrite.py
Writing 100000 bytes, interrupt me with SIGQUIT (^\)
^\^\(3, <frame object at 0x9535ab4>)
Traceback (most recent call last):
  File "fwrite.py", line 16, in <module>
    print(write_file.write(b'a' * 100000))
IOError: [Errno 4] Interrupted system call
read 65536 bytes
^\(3, <frame object at 0x9535ab4>)
Traceback (most recent call last):
  File "fwrite.py", line 21, in <module>
    print('read %d bytes' % len(os.read(r, 100000)))
OSError: [Errno 4] Interrupted system call

Because os.read blocks on the second call to read, we know that only 65536 of
the 100000 bytes were written.
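For comparison, the unbuffered os.write call does return the byte count, so a caller can handle EINTR correctly by retrying until everything is written. A minimal sketch (the write_all helper below is hypothetical, not part of the attached scripts):

```python
import errno
import os

def write_all(fd, data):
    """Write all of data to fd, retrying when the call is interrupted.

    os.write returns the number of bytes written, so on a short write we
    can continue from where we left off; EINTR with nothing written is
    simply retried.
    """
    written = 0
    while written < len(data):
        try:
            written += os.write(fd, data[written:])
        except OSError as e:
            if e.errno != errno.EINTR:
                raise
            # Interrupted before any bytes were written: just retry.
    return written
```

This is exactly the bookkeeping a caller cannot do with file.write, because the partial byte count is lost when the exception is raised.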

------------------------- file.read -------------------------
When file.read is interrupted in Python 3, any bytes it has already read become
inaccessible: Python 2 returns those bytes, whereas Python 3 raises an IOError
with errno EINTR.

A demonstration:

$ python3.2 fread.py
Writing 7 bytes
Reading 20 bytes... interrupt me with SIGQUIT (^\)
^\(3, <frame object at 0x8e1d2d4>)
Traceback (most recent call last):
  File "fread.py", line 18, in <module>
    print('Read %d bytes using file.read' % len(read_file.read(20)))
IOError: [Errno 4] Interrupted system call
Reading any remaining bytes...
^\(3, <frame object at 0x8e1d2d4>)
Traceback (most recent call last):
  File "fread.py", line 23, in <module>
    print('reading: %r' % os.read(r, 4096))
OSError: [Errno 4] Interrupted system call

Note how Python 2 stops reading when interrupted and returns the bytes read so
far, while Python 3 raises IOError with no way to access the bytes it already
read.

So basically, this behaviour is just plain wrong: EINTR is not a real error,
and it makes it impossible for the caller to handle the situation correctly.

Here is how I think Python should behave: it should be possible to interrupt
both read and write calls; however, it should also be possible for the user to
handle these cases.

file.write, on EINTR, could decide to continue writing if no Python signal handler raised an exception.
Analogously, file.read could decide to keep on reading on EINTR if no Python signal handler raised an exception.

This way, it is possible for the programmer to write interruptible code while
at the same time getting proper file.write and file.read behaviour for code
that should not be interrupted.
KeyboardInterrupt would still interrupt read and write calls, because it
raises an exception. If the programmer decided that writes should finish
before allowing such an exception, the programmer could replace the default
signal handler for SIGINT. 
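That last suggestion — replacing the SIGINT handler so a write can finish before KeyboardInterrupt fires — could be sketched like this (a hypothetical helper, assuming POSIX signals and the main thread):

```python
import os
import signal

def run_uninterrupted(func):
    """Run func with SIGINT deferred; re-raise KeyboardInterrupt afterwards.

    While func runs, SIGINT is recorded instead of raising, so an ongoing
    write is not cut short. Any deferred interrupt is re-raised once func
    has finished and the original handler is restored.
    """
    pending = []
    old = signal.signal(signal.SIGINT, lambda signum, frame: pending.append(signum))
    try:
        return func()
    finally:
        signal.signal(signal.SIGINT, old)
        if pending:
            raise KeyboardInterrupt
```

This only works from the main thread (signal.signal raises ValueError elsewhere), which matches where KeyboardInterrupt is delivered anyway.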

So, in pseudo-code:
    
    bytes_written = 0

    while bytes_written < len(buf):
        result = write(buf)
        
        if result < 0:
            if errno == EINTR:
                if PyErr_CheckSignals() < 0:
                    # Propagate exception from signal handler
                    return NULL
                continue
            else:
                PyErr_SetFromErrno(PyExc_IOError)
                return NULL
        
        buf += result
        bytes_written += result

    return bytes_written

Similar code could be used for file.read with the obvious adjustments.
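At the Python level, those "obvious adjustments" for the read side might look like this, using os.read to stand in for the C-level call (a sketch; the retry_read helper is hypothetical):

```python
import errno
import os

def retry_read(fd, n):
    """Read up to n bytes from fd, retrying when interrupted.

    The kernel only fails a read with EINTR if no data had arrived yet;
    once any bytes are available, read returns them. So the retry loop
    never loses data.
    """
    while True:
        try:
            return os.read(fd, n)
        except OSError as e:
            if e.errno != errno.EINTR:
                raise
            # EINTR: a signal handler ran but did not raise; retry the read.
```

In the real C implementation the PyErr_CheckSignals() step from the write pseudo-code would sit where the comment is, so an exception raised by a Python signal handler still propagates.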

However, in case of an error (either from the write call or from a Python
signal handler), it would still be unclear how many bytes were actually
written. Maybe (I think this part would be bonus points) the number of bytes
written could be stored on the exception object in that case, or made
retrievable in some other thread-safe way.

For files whose file descriptors are in non-blocking mode (and perhaps other cases), a short byte count will still be returned.
msg126617 - (view) Author: Mark Florisson (eggy) * Date: 2011-01-20 14:13
Here is fread.py (why can you only attach one file at a time? :P)
msg126620 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-01-20 14:55
What behaviour would you expect instead?
msg126621 - (view) Author: Mark Florisson (eggy) * Date: 2011-01-20 14:56
I think this sums it up: 
file.write, on EINTR, could decide to continue writing if no Python signal handler raised an exception.
Analogously, file.read could decide to keep on reading on EINTR if no Python signal handler raised an exception.
msg126622 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-01-20 14:59
Oops, sorry, had missed the relevant part in your original message.
msg126623 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-01-20 15:03
> file.write, on EINTR, could decide to continue writing if no Python
> signal handler raised an exception.
> Analogously, file.read could decide to keep on reading on EINTR if no
> Python signal handler raised an exception.

Ok. This would only be done in buffered mode, though, so your fwrite.py example would have to be changed slightly (drop the ",0" in fdopen()).
msg126624 - (view) Author: Mark Florisson (eggy) * Date: 2011-01-20 15:08
> Ok. This would only be done in buffered mode, though, so your fwrite.py example would have to be changed slightly (drop the ",0" in fdopen()).

Indeed, good catch. So apparently file.write (in buffered mode) is also "incorrect" in Python 3.
msg126668 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-01-21 00:06
Here is a patch for Python 3.2+.
msg127961 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-02-05 01:06
I hadn't noticed that issue9504 is similar. The patch there does fewer things, although it also touches FileIO.readall().
msg129434 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-02-25 21:36
Committed in r88610 (3.3), r88611 (3.2) and r88612 (2.7).
History
Date                 User    Action  Args
2011-02-25 21:36:28  pitrou  set     status: open -> closed; messages: + msg129434; resolution: fixed; stage: patch review -> resolved
2011-02-05 01:13:11  pitrou  set     assignee: pitrou; nosy: + neologix
2011-02-05 01:12:58  pitrou  unlink  issue9504 superseder
2011-02-05 01:06:29  pitrou  set     messages: + msg127961
2011-02-05 01:05:51  pitrou  link    issue9504 superseder
2011-01-21 00:06:29  pitrou  set     files: + eintr_io.patch; nosy: + amaury.forgeotdarc; messages: + msg126668; keywords: + patch; stage: patch review
2011-01-20 15:08:00  eggy    set     files: + fwrite.py; messages: + msg126624
2011-01-20 15:03:09  pitrou  set     messages: + msg126623
2011-01-20 14:59:52  pitrou  set     messages: + msg126622
2011-01-20 14:56:57  eggy    set     messages: + msg126621
2011-01-20 14:55:25  pitrou  set     nosy: + pitrou; messages: + msg126620; versions: - Python 2.6, Python 2.5, Python 3.1
2011-01-20 14:13:11  eggy    set     files: + fread.py; messages: + msg126617
2011-01-20 14:12:26  eggy    create