classification
Title: document expected/required behavior of 3.x io subsystem with respect to buffering
Type: behavior Stage:
Components: Library (Lib) Versions: Python 3.0, Python 3.1
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: amaury.forgeotdarc, pitrou, r.david.murray
Priority: normal Keywords:

Created on 2009-02-19 21:36 by r.david.murray, last changed 2009-03-06 01:34 by benjamin.peterson. This issue is now closed.

Messages (6)
msg82499 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2009-02-19 21:36
The python 3.x io library appears to allow mixing calls to __next__ and
calls to readline.  As another consequence, where previously it was
necessary to use 'readline' to read single lines from a pipe without
blocking waiting for a buffer fill, now one can apparently use 'for'.

Reading the code in io.py, it seems as though this is intentional:
readline and __next__ use the same buffer, and the method on
TextIOWrapper that fills the buffer calls 'read1', which in the one
place where it has a docstring says that only as much data as is
available is read, in contrast to 'read', which reads exactly n bytes
(to EOF).

Assuming this is the intended behavior, it should be documented, and if
they do not exist (I didn't see any) tests should be added to confirm
the behavior.  If this behavior is an implementation artifact, then this
should be documented so that code is less likely to depend on it.

Once I know whether or not this behavior _should_ be part of the test
suite, I'll be happy to work on doc and test patches.
msg82555 - (view) Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2009-02-20 23:05
Yes, this behavior is documented.

There is one test that mixes iteration and readline, in 
test.test_io.TextIOWrapperTest.testSeeking, but it shows a difference 
between the two methods: to be faster, __next__ disables the tell() 
function.
msg82559 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2009-02-21 05:01
Heh, I was reading the code instead of the documentation.  Silly me :).

The 'buffering' argument of open, and the 'line_buffering' argument of
TextIOWrapper do address my documentation concern about whether or not
readline can be used to read lines from a source without blocking as
long as at least one line is available.  However, I do not see any
documentation of the relationship between readline and next, except the
negative one that no restriction on mixing them is documented.  Since
there used to be such a restriction, noting that there isn't one is
probably worthwhile.  Or perhaps better, a specification that '__next__'
calls 'readline'.  (And the behavior difference with respect to tell,
which I don't fully understand, should also be documented.)

TextIOWrapper says it wraps a BufferedIOBase object.  The doc for that
class talks about read possibly doing readahead for a 'non-interactive'
file.  What about pipes?  They are not mentioned, and it would be good
to have short reads on pipes for the same reason that it is good to have
short reads on ttys.  The IOBase only specifies an 'isatty' method.

It took me a while to understand what was going on well enough to write
the above paragraphs, partly because I read the code first :).  I'm
finding the documentation confusing, so I might as well talk about the
issues I ran into.

The implementation of TextIOWrapper calls the buffer's 'read1' method in
_read_chunk.  But BufferedIOBase does not define read1, nor does its
base class, IOBase.  open does pass TextIOWrapper a BufferedReader
object which does define read1, which is why it works, but I don't see
any documentation saying that the buffer argument to TextIOWrapper must
provide a 'read1' method with certain semantics.  The docs indicate
BufferedIOBase is the minimum requirement.  Is the definition of 'read1'
missing from BufferedIOBase?

I also notice that while open passes TextIOWrapper an appropriate value
for line_buffering (assuming we ignore the pipe issue), TextIOWrapper
ignores it for reading.  I suppose that that is an implementation detail
only.

After staring at it for a while, I finally came to an understanding of
the statement in BufferedIOBase that "A typical implementation should
not inherit from a RawIOBase implementation, but wrap one like
BufferedWriter and BufferedReader."  I understand this to mean that
BufferedIOBase is really an ABC and should be concretized as a wrapper
rather than doing multiple inheritance with RawIOBase.  I don't think I
would have figured that out without reading the code, though.  I believe
that adding the word 'do' at the end of that sentence would make the
meaning clear.

A similarly confusing bit is the first sentence of the documentation of
TextIOWrapper.  Currently it says "A buffered text stream over a
BufferedIOBase raw stream..."  I think it would be clearer if it said
something like "A buffered text stream wrapper around a BufferedIOBase
derived wrapper around a raw stream..."  That's a bit unwieldy, but it
is nontheless (IMO) more comprehensible.

I will look at the tests more carefully and make sure that all my use
cases are in fact covered.

--RDM
msg82565 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2009-02-21 13:14
RDM, all the classes you mentioned should indeed be able to do "short
reads" on pipes, sockets and the like. That's how they are tested in
test_io.py: against mock raw i/o classes which only return a few bytes
at a time (e.g. only 5 bytes will be filled in a 4096-byte buffer).

However, I encourage you once again to *experiment* with the 3.x i.o
library and share your results with us. This is the best way for us to
know whether common use cases are really covered.

(Amaury, as for the `telling` flag in TextIOWrapper, we should so some
performance measurements, and then suppress it if there's no difference)
msg82568 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2009-02-21 14:43
On Sat, 21 Feb 2009 at 13:14, Antoine Pitrou wrote:
> Antoine Pitrou <pitrou@free.fr> added the comment:
>
> RDM, all the classes you mentioned should indeed be able to do "short
> reads" on pipes, sockets and the like. That's how they are tested in
> test_io.py: against mock raw i/o classes which only return a few bytes
> at a time (e.g. only 5 bytes will be filled in a 4096-byte buffer).

My questions in the last comment were directed at trying to clarify
the documentation.

I think my most important point there is whether or not 'read1' should
be added to the BufferedIOBase ABI.  I believe it should be, since
if a class derived from BufferedIOBase does not implement it and is
passed to TextIOWrapper, it will fail.

> However, I encourage you once again to *experiment* with the 3.x i.o
> library and share your results with us. This is the best way for us to
> know whether common use cases are really covered.

As I said, I plan to do so.  I needed to understand the intent first,
though, and reading the docs resulted in some doc questions.  Should I
be opening each point in a separate issue and/or providing a suggested
doc patch?  I'm new to trying to help out via the tracker, so best
practice pointers are welcome.

--RDM
msg83232 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2009-03-06 01:24
I have now had time to test this.  My use case (reading text from a pipe
and wanting the most recent complete line returned immediatly as it
is avaialable) is satisified by both the io.py code and the io c
code from the py3k branch.  I haven't been able to figure out how
to write an automated test, though :(.

I also had a fun time (I mean that literally) wandering through the code
and docs convincing myself that the 3.1a0 docs cover all the issues I
raised in this ticket.  I played around with a test case that helped
convince me the implementations match the 3.1a0 docs for the issues I
was concerned about.

So, IMO this ticket can be closed as fixed in 3.1a0.

--RDM
History
Date User Action Args
2009-03-06 01:34:49benjamin.petersonsetstatus: open -> closed
resolution: fixed
2009-03-06 01:24:06r.david.murraysetmessages: + msg83232
2009-02-21 14:43:16r.david.murraysetmessages: + msg82568
2009-02-21 13:14:13pitrousetnosy: + pitrou
messages: + msg82565
2009-02-21 05:01:46r.david.murraysetmessages: + msg82559
2009-02-20 23:05:54amaury.forgeotdarcsetnosy: + amaury.forgeotdarc
messages: + msg82555
2009-02-19 21:36:08r.david.murraycreate