classification
Title: buffered read() and write() does not raise BlockingIOError
Type: behavior Stage: patch review
Components: IO, Library (Lib) Versions: Python 3.4, Python 3.3, Python 3.2, Python 2.7
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: anacrolix, benjamin.peterson, neologix, petri.lehtinen, pitrou, python-dev, sbt, stutzbach, vadmium
Priority: normal Keywords: patch

Created on 2011-11-02 15:08 by sbt, last changed 2014-06-14 09:22 by vadmium.

Files
File name Uploaded Description Edit
blockingioerror.py sbt, 2011-11-02 15:08
blockingioerror.py sbt, 2011-11-02 22:28
write_blockingioerror.patch sbt, 2011-11-05 00:20 review
write_blockingioerror.patch sbt, 2011-11-07 17:52 review
write_blockingioerror.patch sbt, 2011-11-15 12:24 review
write_blockingioerror.patch sbt, 2011-11-18 14:35
Messages (28)
msg146841 - (view) Author: Richard Oudkerk (sbt) * (Python committer) Date: 2011-11-02 15:08
According to the the documentation, BufferedReader.read() and BufferedWriter.write() should raise io.BlockingIOError if the file is in non-blocking mode and the operation cannot succeed without blocking.

However, BufferedReader.read() returns None (which is what RawIOBase.read() is documented as doing), and BufferedWriter.write() raises IOError with a message like

    raw write() returned invalid length -1 (should have been 
    between 0 and 5904)

I tested this on Linux with Python 2.6, 2.7 and 3.x.

Attached is a unit test.
msg146878 - (view) Author: Richard Oudkerk (sbt) * (Python committer) Date: 2011-11-02 22:28
BufferedReader.readinto() should also raise BlockingIOError according to the docs.  Updated unittest checks for that also.

BTW, The documentation for BufferedIOBase.read() says that BlockingIOError should be raised if nothing can be read in non-blocking mode.  BufferedReader inherits from BufferedIOBase and overrides the read() method.  This is the documentation for BufferedReader.read():

    read([n])
        Read and return n bytes, or if n is not given or negative, 
        until EOF or if the read call would block in non-blocking mode.

This sentence is complete gobbledygook, and it makes no mention of what should happen if nothing can be read in non-blocking mode.   So I presume behaviour for BufferedReader.read() should match the documented behaviour for BufferedIOBase.read().
msg146936 - (view) Author: Richard Oudkerk (sbt) * (Python committer) Date: 2011-11-03 15:14
Wierdly, it looks like BlockingIO is not raised anywhere in the code for the C implementation of io.

Even more wierdly, in the Python implementation of io, BlockingIOError is only ever raised by except clauses which have already caught BlockingIOError.  So, of course, these clauses are dead code.

The only code in CPython which can ever succesfully raise BlockingIOError is MockNonBlockWriterIO.write() in test/test_io.py.

I don't know what the correct behaviour is for flush() and close() if you get EAGAIN.  I think flush() should raise an error rather than blocking, and that close() should delegate to self.raw.close() before raising the error.

The docs say that read(), readinto() and write() can raise BlockingIOError.  But what should readall() and readline() do?  Should we just try to emulate whatever Python's old libc IO system did (with BlockingIOError replacing IOError(EAGAIN))?
msg146940 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-11-03 15:49
> Wierdly, it looks like BlockingIO is not raised anywhere in the code
> for the C implementation of io.

That would explain why it isn't raised :)

This is a hairy issue: read(n) is documented as returning either n bytes or nothing. But what if less than n bytes are available non-blocking? Currently we return a partial read. readline() behaviour is especially problematic:

>>> fcntl.fcntl(r, fcntl.F_SETFL, os.O_NDELAY)
0
>>> rf = open(r, mode='rb')
>>> os.write(w, b'xy')
2
>>> rf.read(3)
b'xy'
>>> os.write(w, b'xy')
2
>>> rf.readline()
b'xy'

We should probably raise BlockingIOError in these cases, but that complicates the implementation quite a bit: where do we buffer the partial data? The internal (fixed size) buffer might not be large enough.

write() is a bit simpler, since BlockingIOError has a "characters_written" attribute which is meant to inform you of the partial success: we can just reuse that. That said, BlockingIOError could grow a "partial_read" attribute containing the read result...

Of course, we may also question whether it's useful to use buffered I/O objects around non-blocking file descriptors; if you do non-blocking I/O, you generally want to be in control, which means not having any implicit buffer between you and the OS.

(this may be a topic for python-dev)
msg146998 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2011-11-04 08:24
> This is a hairy issue

Indeed.

Performing partial read/write may sound imperfect, but using buffered I/O around non-blockind FD is definitely not a good idea.
Also, the advantage of the current approach is that at least, no data is ever lost (and changing the behavior to raise a BlockingIOError might break some code out there in the wild).

Note that Java's BufferedInputStream and ReadableByteChannel also return partial reads.

So I'm somewhat inclined to keep the current behavior (it would however probably be a good idea to update the documentation to warn about this limitation, though).
msg147003 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-11-04 14:03
> Also, the advantage of the current approach is that at least, no data
> is ever lost

But what about the buggy readline() behaviour?
msg147004 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-11-04 14:33
> Note that Java's BufferedInputStream and ReadableByteChannel also 
> return partial reads.

Apparently, they are specified to, even for blocking streams (which I find a bit weird, and the language in the docs seems deliberately vague). Python's buffered read(), though, is specified to return the requested number of bytes (unless EOF happens).
msg147009 - (view) Author: Richard Oudkerk (sbt) * (Python committer) Date: 2011-11-04 15:04
No one has suggested raising BlockingIOError and DISCARDING the data when a partial read has occurred.  The docs seem to imply that the partially read data should be returned since they only say that BlockingIOError should be raised if there is NOTHING to read.  Clearly this should all be spelt out properly.

That leaves the question of whether, when there is NOTHING to read, BlockingIOError should be raised (as the docs say) or None should be returned (as is done now).  I don't mind either way as long as the docs match reality.

The part which really needs addressing is partial writes.  Currently, if a write fails with EAGAIN then IOError is raised and there is no way to work out how much data was written/buffered.  The docs say that BlockingIOError should be raised with the e.args[2] set to indicate the number of bytes written/buffered.  This at least should be fixed.

I will work on a patch.
msg147011 - (view) Author: Richard Oudkerk (sbt) * (Python committer) Date: 2011-11-04 15:31
> But what about the buggy readline() behaviour?

Just tell people that if the return value is a string which does not end in '\n' then it might caused by EOF or EAGAIN.  They can just call readline() again to check which.
msg147012 - (view) Author: Richard Oudkerk (sbt) * (Python committer) Date: 2011-11-04 15:48
The third arg of BlockingIOError is used in two quite different ways.

In write(s) it indicates the number of bytes of s which have been "consumed" (ie written to the raw file or buffered).

But in flush() and flush_unlocked() (in _pyio) it indicates the number of bytes from the internal buffer which have been written to the raw file.

I think this explains the following comment in write():

                # We're full, so let's pre-flush the buffer
                try:
                    self._flush_unlocked()
                except BlockingIOError as e:
                    # We can't accept anything else.
                    # XXX Why not just let the exception pass through?
                    raise BlockingIOError(e.errno, e.strerror, 0)

I don't think flush() should try to tell us how many bytes were flushed: we only need to know whether we need to try again.
msg147013 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2011-11-04 16:25
> Apparently, they are specified to, even for blocking streams (which 
> I find a bit weird, and the language in the docs seems deliberately
> vague). 

"""
As an additional convenience, it attempts to read as many bytes as possible by repeatedly invoking the read method of the underlying stream. This iterated read continues until one of the following conditions becomes true:

The specified number of bytes have been read,
The read method of the underlying stream returns -1, indicating end-of-file, or
The available method of the underlying stream returns zero, indicating that further input requests would block.
"""

As I understand it, it will return the number of bytes asked, unless EOF or EAGAIN/EWOULDBLOCK. It would seem reasonable to me to add the same note for non-blocking FDs to Python's read().

>> But what about the buggy readline() behaviour?
> Just tell people that if the return value is a string which does not 
> end in '\n' then it might caused by EOF or EAGAIN.  They can just call 
> readline() again to check which.

Sounds reasonable.

> No one has suggested raising BlockingIOError and DISCARDING the data 
> when a partial read has occurred.

The problem is that if we raise BlockingIOError, we can only buffer a limited amount of data.

> The docs seem to imply that the partially read data should be returned 
> since they only say that BlockingIOError should be raised if there is 
> NOTHING to read.  Clearly this should all be spelt out properly.

Agreed.

> That leaves the question of whether, when there is NOTHING to 
> read, BlockingIOError should be raised (as the docs say) or None
> should be returned (as is done now).

I don't have a string feeling: if we don't raise BlockingIOError on partial reads, then it probably makes sense to keep None.
msg147023 - (view) Author: Richard Oudkerk (sbt) * (Python committer) Date: 2011-11-04 18:32
Currently a BlockingIOError exception raised by flush() sets
characters_written to the number of bytes fushed from the internal
buffer.  This is undocument (although there is a unit test which tests
for it) and causes confusion because characters_written has conflicting
meanings depending on whether the exception was raised by flush() or
write().  I would propose setting characters_written to zero on
BlockingIOError exceptions raised by flush().  Are there any reasons not
to make this change?

Also, the docs say that the raw file wrapped by
BufferedReader/BufferedWriter should implement RawIOBase.  This means
that self.raw.write() should return None instead of raising
BlockingIOError.  But the implementation tries to cope with
BlockingIOError coming from a raw write.  In fact, the
MockNonBlockWriterIO class in unit tests is used as a raw file, but its
write() method raises BlockingIOError.

It would simplify matters a lot to insist that raw files implement
RawIOBase properly.

BTW, when I try to change characters_written of an existing
BlockingIOError exception using the pointer returned by
_buffered_check_blocking_error(), it appears not to work: the exception
continues to have characters_written == 0 -- not sure why...
msg147027 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-11-04 19:38
> >> But what about the buggy readline() behaviour?
> > Just tell people that if the return value is a string which does not 
> > end in '\n' then it might caused by EOF or EAGAIN.  They can just call 
> > readline() again to check which.
> 
> Sounds reasonable.

But then what's the point of using buffered I/O at all? If it can't
offer anything more than raw I/O, I'd rather do something like raise a
RuntimeError("buffered I/O doesn't work with non-blocking streams") when
the raw stream returns None. Returning partial results on a buffered's
readline() is not something we should ever do.

(actually, raw I/O readline() is probably buggy as well)
msg147028 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2011-11-04 20:19
> But then what's the point of using buffered I/O at all? If it can't
> offer anything more than raw I/O, I'd rather do something like raise
> a RuntimeError("buffered I/O doesn't work with non-blocking streams")
> when the raw stream returns None.

Well, ideally it should be an UnsupportedOperation, but that's an option. The only think I didn't like about this is that we should ideally raise this error upon the first method call, not when - and if - we receive EAGAIN.
Another possibility would be that, since lines are usually reasonably sized, they should fit in the buffer (which is 8KB by default). So we could do the extra effort of buffering the data and return it once the line is complete: if the buffer fills up before we got the whole line, then we could raise a RuntimeError("Partial line read"). Note that I didn't check if it's easily feasible (i.e. we should avoid introducing kludges in the I/O layer just to handle thi corner case).

> Returning partial results on a buffered's readline() is not something
> we should ever do.

Yeah, I know.
Java made the choice of making readline() block, which is IMHO even worse (I mean, it defeats the whole point of non-blocking I/O...).
msg147048 - (view) Author: Richard Oudkerk (sbt) * (Python committer) Date: 2011-11-04 21:46
> Another possibility would be that, since lines are usually reasonably
> sized, they should fit in the buffer (which is 8KB by default). So we 
> could do the extra effort of buffering the data and return it once the 
> line is complete: if the buffer fills up before we got the whole line, 
> then we could raise a RuntimeError("Partial line read"). Note that I 
> didn't check if it's easily feasible (i.e. we should avoid introducing 
> kludges in the I/O layer just to handle thi corner case).

Discarding data rarely is worse than always throwing an exception.
msg147061 - (view) Author: Richard Oudkerk (sbt) * (Python committer) Date: 2011-11-05 00:20
The attached patch makes BufferedWrite.write() raise BlockingIOError when the raw file is non-blocking and the write would block.
msg147071 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2011-11-05 07:30
> write() is a bit simpler, since BlockingIOError has 
> a "characters_written" attribute which is meant to inform you of the 
> partial success: we can just reuse that. That said, BlockingIOError 
> could grow a "partial_read" attribute containing the read result...

Now that I think about it, it's probably the best solution:
always raise a BlockingIOError in case of partial write, with characters_written set correctly (sbt's patch).
And do the same thing on partial read/readline, and return the partially read data as an attribute of BlockingIOError (we could also return a characters_read that would indicate the exact number of bytes read: then the user could call read()/read_into() with exactly characters_read).
That could certainly break existing - sloppy - code, but this would be more much consistent than the current behavior.
msg147242 - (view) Author: Richard Oudkerk (sbt) * (Python committer) Date: 2011-11-07 17:52
Testing the patch a bit more thoroughly, I found that data received from the readable end of the pipe can be corrupted by the C implementation.  This seems to be because two of the previously dormant codepaths did not properly maintain the necessary invariants.

I got the failures to go away by adding

        self->pos += avail;

in two places.  However, I really do not know what all the attributes mean.  (Should self->raw_pos also be modified...?)  Someone familiar with the code would need to check whether things are being done properly.  This new patch adds some XXX comments in places  in bufferedio.c which I am unsure about.
msg147387 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-11-10 00:06
Hi,

> Testing the patch a bit more thoroughly, I found that data received
> from the readable end of the pipe can be corrupted by the C
> implementation.  This seems to be because two of the previously
> dormant codepaths did not properly maintain the necessary invariants.

Ouch. Were they only non-blocking codepaths?

> in two places.  However, I really do not know what all the attributes
> mean.  (Should self->raw_pos also be modified...?)

raw_pos is the position which the underlying raw stream is currently at.
It only needs to be modified when a successful write(), read() or seek()
is done on the raw stream.

Another comment: you set errno to EAGAIN, but it is not sure that was
the actual errno raised by the raw stream (although that's quite
likely). You might want to reflect the actual C errno (but you'd better
set it to 0 before the system call, then).
msg147405 - (view) Author: Richard Oudkerk (sbt) * (Python committer) Date: 2011-11-10 14:57
> Ouch. Were they only non-blocking codepaths?

Yes.

> raw_pos is the position which the underlying raw stream is currently
> at.  It only needs to be modified when a successful write(), read()
> or seek() is done on the raw stream.

Do you mean self->raw_pos should give the same answer as self.raw.tell()?  (But that seems to be the definition of self->abs_pos.)  Or is it the buffer offset which corresponds to self.raw.tell()?
msg147409 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-11-10 20:13
> Do you mean self->raw_pos should give the same answer as
> self.raw.tell()?  (But that seems to be the definition of
> self->abs_pos.)  Or is it the buffer offset which corresponds to
> self.raw.tell()?

The latter.
msg147664 - (view) Author: Richard Oudkerk (sbt) * (Python committer) Date: 2011-11-15 12:24
Here is an updated patch which uses the real errno.

It also gets rid of the restore_pos argument of _bufferedwriter_flush_unlocked() which is always set to false -- 
I guess buffered_flush_and_rewind_unlocked() is used instead.
msg147682 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-11-15 15:16
Thanks again. Just a nit: the tests should be in MiscIOTest, since they don't directly instantiate the individual classes. Also, perhaps it would be nice to check that the exception's "errno" attribute is EAGAIN.
msg147875 - (view) Author: Richard Oudkerk (sbt) * (Python committer) Date: 2011-11-18 14:35
> Thanks again. Just a nit: the tests should be in MiscIOTest, since 
> they don't directly instantiate the individual classes. Also, perhaps 
> it would be nice to check that the exception's "errno" attribute is 
> EAGAIN.


Done.
msg147916 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-11-18 19:26
Thanks. Who should I credit? "sbt"?
msg147923 - (view) Author: Richard Oudkerk (sbt) * (Python committer) Date: 2011-11-18 21:22
> Thanks. Who should I credit? "sbt"?

Yeah, thanks.
msg148074 - (view) Author: Roundup Robot (python-dev) Date: 2011-11-21 19:31
New changeset ac2c4c62b486 by Antoine Pitrou in branch '3.2':
Issue #13322: Fix BufferedWriter.write() to ensure that BlockingIOError is
http://hg.python.org/cpython/rev/ac2c4c62b486

New changeset 3cd1985ed04f by Antoine Pitrou in branch 'default':
Issue #13322: Fix BufferedWriter.write() to ensure that BlockingIOError is
http://hg.python.org/cpython/rev/3cd1985ed04f

New changeset e84e17643eeb by Antoine Pitrou in branch '2.7':
Issue #13322: Fix BufferedWriter.write() to ensure that BlockingIOError is
http://hg.python.org/cpython/rev/e84e17643eeb
msg151937 - (view) Author: Matt Joiner (anacrolix) Date: 2012-01-25 08:40
The patches only fix write? What about read?

http://bugs.python.org/issue13858
History
Date User Action Args
2014-06-14 09:22:01vadmiumsetnosy: + vadmium

versions: + Python 3.4
2012-01-25 08:40:06anacrolixsetnosy: + anacrolix
messages: + msg151937
2012-01-25 08:22:32neologixlinkissue13858 superseder
2011-11-21 19:31:25python-devsetnosy: + python-dev
messages: + msg148074
2011-11-18 21:22:35sbtsetmessages: + msg147923
2011-11-18 19:26:16pitrousetmessages: + msg147916
2011-11-18 14:35:45sbtsetfiles: + write_blockingioerror.patch

messages: + msg147875
2011-11-15 15:16:05pitrousetmessages: + msg147682
stage: needs patch -> patch review
2011-11-15 12:24:27sbtsetfiles: + write_blockingioerror.patch

messages: + msg147664
2011-11-10 20:13:48pitrousetmessages: + msg147409
2011-11-10 14:57:03sbtsetmessages: + msg147405
2011-11-10 00:06:08pitrousetmessages: + msg147387
2011-11-07 17:52:41sbtsetfiles: + write_blockingioerror.patch

messages: + msg147242
2011-11-05 07:30:22neologixsetmessages: + msg147071
2011-11-05 00:20:56sbtsetfiles: + write_blockingioerror.patch
keywords: + patch
messages: + msg147061
2011-11-04 21:46:28sbtsetmessages: + msg147048
2011-11-04 20:19:26neologixsetmessages: + msg147028
2011-11-04 19:38:37pitrousetmessages: + msg147027
2011-11-04 18:32:17sbtsetmessages: + msg147023
2011-11-04 16:25:03neologixsetmessages: + msg147013
2011-11-04 15:48:20sbtsetmessages: + msg147012
2011-11-04 15:31:12sbtsetmessages: + msg147011
2011-11-04 15:04:06sbtsetmessages: + msg147009
2011-11-04 14:33:39pitrousetmessages: + msg147004
2011-11-04 14:03:41pitrousetmessages: + msg147003
2011-11-04 08:24:29neologixsetmessages: + msg146998
2011-11-03 15:49:21pitrousetnosy: + benjamin.peterson, stutzbach, neologix

messages: + msg146940
stage: needs patch
2011-11-03 15:14:18sbtsetmessages: + msg146936
2011-11-02 22:28:30sbtsetfiles: + blockingioerror.py

messages: + msg146878
2011-11-02 19:23:24petri.lehtinensetnosy: + petri.lehtinen
2011-11-02 15:39:12pitrousetnosy: + pitrou

components: + Library (Lib), IO
versions: - Python 2.6, Python 3.1, Python 3.4
2011-11-02 15:12:32sbtsettype: behavior
versions: + Python 2.6, Python 3.1, Python 2.7, Python 3.2, Python 3.3, Python 3.4
2011-11-02 15:08:55sbtcreate