Author eggy
Recipients eggy
Date 2011-01-20.14:12:26
Message-id <1295532751.32.0.53142512873.issue10956@psf.upfronthosting.co.za>
Content
In both Python 2 and Python 3, EINTR is not handled properly in the file.write and file.read methods.

------------------------- file.write -------------------------
In Python 2, file.write can write fewer bytes than requested, and when it is
interrupted there is no way to tell how many bytes it actually wrote. In
Python 2 it raises an IOError with EINTR, whereas in Python 3 it simply stops
writing and returns the number of bytes written.

Here is the output of fwrite.py with Python 2.7 (see attached files). Note also the inconsistency: file.write raises IOError, whereas os.read raises OSError for the same errno:

$ python2.7 fwrite.py
Writing 100000 bytes, interrupt me with SIGQUIT (^\)
^\^\(3, <frame object at 0x9535ab4>)
Traceback (most recent call last):
  File "fwrite.py", line 16, in <module>
    print(write_file.write(b'a' * 100000))
IOError: [Errno 4] Interrupted system call
read 65536 bytes
^\(3, <frame object at 0x9535ab4>)
Traceback (most recent call last):
  File "fwrite.py", line 21, in <module>
    print('read %d bytes' % len(os.read(r, 100000)))
OSError: [Errno 4] Interrupted system call

Because os.read blocks on the second call to read, we know that only 65536 of
the 100000 bytes were written.

------------------------- file.read -------------------------
When file.read is interrupted in Python 3, it may already have read bytes that
then become inaccessible. In Python 2 it returns the bytes, whereas in Python 3
it raises an IOError with EINTR.

A demonstration:

$ python3.2 fread.py
Writing 7 bytes
Reading 20 bytes... interrupt me with SIGQUIT (^\)
^\(3, <frame object at 0x8e1d2d4>)
Traceback (most recent call last):
  File "fread.py", line 18, in <module>
    print('Read %d bytes using file.read' % len(read_file.read(20)))
IOError: [Errno 4] Interrupted system call
Reading any remaining bytes...
^\(3, <frame object at 0x8e1d2d4>)
Traceback (most recent call last):
  File "fread.py", line 23, in <module>
    print('reading: %r' % os.read(r, 4096))
OSError: [Errno 4] Interrupted system call

Note how in Python 2 it stops reading when interrupted and it returns our
bytes, but in Python 3 it raises IOError while there is no way to access the
bytes that it read.

So basically, this behaviour is just plain wrong as EINTR is not an error, and
this behaviour makes it impossible for the caller to handle the situation
correctly.

Here is how I think Python should behave. It should be possible to interrupt
both read and write calls, but it should also be possible for the user to
handle these cases.

file.write, on EINTR, could decide to continue writing if no Python signal handler raised an exception.
Analogously, file.read could decide to keep on reading on EINTR if no Python signal handler raised an exception.

This way, it is possible for the programmer to write interruptible code while
at the same time having proper file.write and file.read behaviour in case code
should not be interrupted.
KeyboardInterrupt would still interrupt read and write calls, because it
raises an exception. If the programmer decided that writes should finish
before allowing such an exception, the programmer could replace the default
signal handler for SIGINT. 
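
As a rough illustration of that last point, the sketch below swaps in a SIGINT
handler that only records the signal, and raises KeyboardInterrupt once the
protected block has finished. deferred_sigint is a hypothetical helper name,
not something Python provides. The sketch assumes it runs in the main thread
(signal.signal requires that) and that an interrupted read or write carries on
when the handler returns normally, which is the behaviour proposed here (and,
as far as I know, what Python 3.5 and later do).

```python
import contextlib
import signal


@contextlib.contextmanager
def deferred_sigint():
    """Hold back SIGINT until the block exits (hypothetical helper)."""
    pending = []

    def remember(signum, frame):
        # Returning normally, without raising, means the handler gives
        # the I/O call no reason to abort.
        pending.append(signum)

    previous = signal.signal(signal.SIGINT, remember)
    try:
        yield pending
    finally:
        # Always restore whatever handler was installed before.
        signal.signal(signal.SIGINT, previous)
    if pending:
        # Deliver the interrupt now that the protected block is done.
        raise KeyboardInterrupt
```

A caller would wrap the write that must not be cut short, for example
"with deferred_sigint(): write_file.write(data)", and would see the
KeyboardInterrupt only after the write has returned.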

So, in pseudo-code:
    
    bytes_written = 0

    while bytes_written < len(buf):
        /* Write only the part of buf that has not been written yet */
        result = write(fd, buf + bytes_written, len(buf) - bytes_written)

        if result < 0:
            if errno == EINTR:
                if PyErr_CheckSignals() < 0:
                    /* Propagate exception from signal handler */
                    return NULL
                continue
            else:
                PyErr_SetFromErrno(PyExc_IOError)
                return NULL

        bytes_written += result

    return bytes_written

Similar code could be used for file.read with the obvious adjustments.
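
For callers who want this behaviour today, the same loop can be approximated in
pure Python on top of os.write and os.read. This is only a sketch: write_all
and read_exactly are hypothetical helper names, and the InterruptedError branch
only matters on Python versions where the os functions do not already retry on
EINTR themselves (InterruptedError itself needs Python 3.3 or later).

```python
import os


def write_all(fd, data):
    """Write all of data to fd, retrying on EINTR and on short writes."""
    view = memoryview(data)
    written = 0
    while written < len(view):
        try:
            # os.write may accept fewer bytes than offered.
            written += os.write(fd, view[written:])
        except InterruptedError:
            # EINTR: nothing was written by this call, so try again.
            continue
    return written


def read_exactly(fd, size):
    """Read size bytes from fd, stopping early only at end of file."""
    chunks = []
    remaining = size
    while remaining > 0:
        try:
            chunk = os.read(fd, remaining)
        except InterruptedError:
            # EINTR: keep the bytes collected so far and try again.
            continue
        if not chunk:
            break  # end of file
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)
```

Because the bytes already read are kept in chunks across retries, an
interruption never makes them inaccessible, which is the problem shown in the
file.read demonstration.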

However, in case of an error (either from the write call or from a Python
signal handler), it would still be unclear how many bytes were actually
written. Maybe (I think this part would be bonus points) we could put the
number of bytes written on the exception object in this case, or make it
retrievable in some other thread-safe way.
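
One way the "bonus points" idea could look from the caller's side is sketched
below. PartialWriteError and write_all_reporting are hypothetical names made up
for this illustration, and the sketch only covers OSError from the write call
itself, not exceptions raised by a signal handler. The write argument is any
callable that takes a bytes-like object and returns the number of bytes it
wrote, for example functools.partial(os.write, fd).

```python
class PartialWriteError(Exception):
    """Hypothetical exception recording how much was written before failing."""

    def __init__(self, bytes_written):
        super().__init__("write failed after %d bytes" % bytes_written)
        self.bytes_written = bytes_written


def write_all_reporting(write, data):
    """Write all of data, attaching the progress so far to any failure."""
    view = memoryview(data)
    written = 0
    while written < len(view):
        try:
            written += write(view[written:])
        except InterruptedError:
            # EINTR is not an error: retry.
            continue
        except OSError as exc:
            # A real error: report how far we got, keep the cause chained.
            raise PartialWriteError(written) from exc
    return written
```

The original OSError stays available as the __cause__ of the new exception, so
the errno is not lost.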

For files with file descriptors in nonblocking mode (and maybe other cases), a write will still return a short count.
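
The nonblocking case can be seen with a pipe. This sketch uses os.write rather
than file.write to keep buffering out of the picture, and it assumes a Unix
system, Python 3.5 or later (for os.set_blocking), and a pipe buffer smaller
than the 4 MiB payload.

```python
import os

r, w = os.pipe()
os.set_blocking(w, False)  # put the write end in nonblocking mode

# Assumed to be larger than the pipe buffer, so it cannot all fit.
payload = b"a" * (4 * 1024 * 1024)

# Returns as soon as the pipe buffer is full instead of blocking.
written = os.write(w, payload)
print("wrote %d of %d bytes" % (written, len(payload)))

os.close(r)
os.close(w)
```

Here the short count is the only way the caller learns how much was accepted,
so a retry loop must not hide it.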