io.BufferedReader.peek(): Documentation differs from Implementation #50061

trott · 2009-04-22T06:15:52Z

BPO	5811
Nosy	@ncoghlan, @pitrou, @benjaminp, @vadmium, @desbma
Files	peek.diff peek2.diff peek3.diff peek4.diff peek-one-byte.patch: Document ≥ 1 byte returned

^{Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.}

Show more details

GitHub fields:

assignee = None
closed_at = None
created_at = <Date 2009-04-22.06:15:51.564>
labels = ['type-bug', 'library', 'docs']
title = 'io.BufferedReader.peek(): Documentation differs from Implementation'
updated_at = <Date 2020-09-21.18:05:58.451>
user = 'https://bugs.python.org/trott'

bugs.python.org fields:

activity = <Date 2020-09-21.18:05:58.451>
actor = 'desbma'
assignee = 'docs@python'
closed = False
closed_date = None
closer = None
components = ['Documentation', 'Library (Lib)']
creation = <Date 2009-04-22.06:15:51.564>
creator = 'trott'
dependencies = []
files = ['14281', '14282', '14304', '14305', '37661']
hgrepos = []
issue_num = 5811
keywords = ['patch']
message_count = 25.0
messages = ['86274', '86275', '86276', '89311', '89312', '89313', '89314', '89315', '89324', '89326', '89327', '89328', '89329', '89340', '89349', '89366', '89388', '89389', '89399', '89400', '89401', '189599', '233750', '233801', '235024']
nosy_count = 8.0
nosy_names = ['ncoghlan', 'pitrou', 'benjamin.peterson', 'trott', 'conf', 'docs@python', 'martin.panter', 'desbma']
pr_nums = []
priority = 'normal'
resolution = None
stage = None
status = 'open'
superseder = None
type = 'behavior'
url = 'https://bugs.python.org/issue5811'
versions = ['Python 2.7', 'Python 3.2', 'Python 3.4']

trott · 2009-04-22T06:15:48Z

The documented behavior of io.BufferedReader.peek([n]) states:

peek([n])
Return 1 (or n if specified) bytes from a buffer without advancing the 
position.

Thereas the parameter n is the _max_ length of returned bytes.

Implementation is:

    def _peek_unlocked(self, n=0):
        want = min(n, self.buffer_size)
        have = len(self._read_buf) - self._read_pos
        if have < want:
            to_read = self.buffer_size - have
            current = self.raw.read(to_read)
            if current:
                self._read_buf = self._read_buf[self._read_pos:] + 
current
                self._read_pos = 0
        return self._read_buf[self._read_pos:]

Where you may see that the parameter n is the _min_ length
of returned bytes. So peek(1) will return _not_ just 1 Byte, but the
remaining bytes in the buffer.

trott · 2009-04-22T06:20:09Z

Note: this is also in Python 2.6

trott · 2009-04-22T06:29:14Z

Proposed patch to fix this:

set the default of n to 1 as stated by docs:

def _peek_unlocked(self, n=1):

return n bytes:

return self._read_buf[self._read_pos:self._read_pos+n]

ncoghlan · 2009-06-13T02:27:13Z

Assigned to Benjamin for assessment - this should be considered for rc2
since it's still broken in 3.1:

>>> f = open('setup.py', 'rb')
>>> len(f.peek(10))
4096
>>> len(f.peek(1))
4096
>>> len(f.peek(4095))
4096
>>> len(f.peek(10095))
4096

Brought up on python-dev in this thread:
http://mail.python.org/pipermail/python-dev/2009-June/089986.html

And previously here:
http://mail.python.org/pipermail/python-dev/2009-April/088229.html

The thread from April suggests the current behaviour may be intentional,
in which case it is the documentation that needs to be fixed, as it is
currently not just misleading but flat out wrong. Then again, Benjamin's
initial response to that thread was to support the idea of changing
peek() so that the argument actually was a cap.

The previous documentation that Alexandre quotes in the April was
changed to the current description in late April without any
corresponding change to the implementation:
http://svn.python.org/view/python/branches/py3k/Doc/library/io.rst?r1=62422&r2=62430

However, the old description was also wrong for the io-c implementation
since it just returns the current buffered data from peek, no matter
what argument you pass in.

benjaminp · 2009-06-13T03:38:36Z

I think the argument should be used as a upper bound; I will look at
this tomorrow.

conf · 2009-06-13T04:20:56Z

Hey guys, I did a patch about this one.
I didn't do many tests but I guess it is ok (it works like I think it
should).
What do you think?

conf · 2009-06-13T04:23:11Z

Oops I overlooked I minor flaw.
A second version.

conf · 2009-06-13T04:48:44Z

There's a problem with my patch... When the size of the data we want to
peek is too big ( > buffer_len - start ) the cursor will move, thus
there isn't a case where the peek function would work properly (except
when we want to peek() just 1 byte).
Couldn't we use a read() followed by a seek() instead?

pitrou · 2009-06-13T12:21:50Z

Lucas, it is indeed impossible for peek() to return more than the buffer
size and remain compatible with non-seekable raw streams. That's why it
/never/ returns more than the buffer size.

As for the fact that peek() doesn't behave as documented, I disagree.
Here is what the docstring says:

    """Returns buffered bytes without advancing the position.

    The argument indicates a desired minimal number of bytes; we
    do at most one raw read to satisfy it.  We never return more
    than self.buffer_size.
    """

Please note : "a desired /minimal/ number of bytes" (minimal, not
maximal). Furthermore, "We never return more than self.buffer_size." The
behaviour looks ok to me.

conf · 2009-06-13T12:37:38Z

We could fill the buffer while moving its start point to 0. I guess this
behavior would require a new function (or a new parameter to
Modules/_io/bufferedio.c:_bufferedreader_fill_buffer() ).
If you are ok with that I could write a patch.

pitrou · 2009-06-13T12:40:18Z

We could, however, enforce the passed argument and only return the whole
remaining buffer when no argument is given. This is a bit like Frederick
Reeve's proposal on python-dev, but less sophisticated and therefore
less tedious to implement.

In any case, I'm not sure it should be committed before the 3.1 release.
The second and last release candidate is supposed to be today.

pitrou · 2009-06-13T12:47:34Z

We could fill the buffer while moving its start point to 0. I guess this
behavior would require a new function (or a new parameter to
Modules/_io/bufferedio.c:_bufferedreader_fill_buffer() ).
If you are ok with that I could write a patch.

The buffer is used for both reading and writing and you have to be
careful when shifting it. Besides, the same change (or similar) should
also be done in the Python implementation (in _pyio.py).

If you come up with a patch, please add some tests and check the whole
regression suite passes.

benjaminp · 2009-06-13T12:57:05Z

I'm downgrading this because it can't be changed until after 3.1 is
released.

ncoghlan · 2009-06-14T01:12:26Z

It's not the docstring that is wrong for the current behaviour, it's the
IO.BufferedReader documentation:

"""
peek([n])
Return 1 (or n if specified) bytes from a buffer without advancing
the position. Only a single read on the raw stream is done to satisfy
the call. The number of bytes returned may be less than requested since
at most all the buffer’s bytes from the current position to the end are
returned.
"""

That gives absolutely no indication that the call might return more
bytes than expected, and the indication that leaving out the argument
will return only the next byte is flat out wrong.

benjaminp · 2009-06-14T14:38:08Z

I updated the documentation in r73429. Is that better?

pitrou · 2009-06-14T19:32:54Z

Rather than "only a single read on the raw stream", it should be "at
most a single read on the raw stream", IMHO.

conf · 2009-06-15T05:21:06Z

Here is a patch that passes all the tests (I had to change some of them
though, they were expecting erroneous behaviours IMHO).
The biggest problem was the read1 testing, I've tried to get the maximum
of bytes less than or equal to what the user wanted while executing at
most 1 raw_read()'s.

I have created a new test for peek()'ing a number of bytes bigger than
could possibly be stored on the buffer.

conf · 2009-06-15T05:22:02Z

Here, it's a patch that passes all the tests (I had to change some of them
though, they were expecting erroneous behaviours IMHO).
The biggest problem was the read1 testing, I've tried to get the maximum
of bytes less than or equal to what the user wanted while executing at
most 1 raw_read()'s.

I have created a new test for peek()'ing a number of bytes bigger than
could possibly be stored on the buffer.

pitrou · 2009-06-15T10:37:07Z

I haven't read the patch in detail but I don't think you should have
changed read1(). read1() is there for optimization purposes and its
current behaviour makes sense IMO.

ncoghlan · 2009-06-15T10:39:12Z

The doc revision definitely does a better job of characterising the
current underspecified behaviour :)

I agree with Antoine that "at most a single read" would be better wording.

conf · 2009-06-15T13:02:27Z

Ok
A new patch without read1() changes.
Only one test fails, a read1() test:

======================================================================
FAIL: test_read1 (test.test_io.PyBufferedRWPairTest)
----------------------------------------------------------------------

Traceback (most recent call last):
  File "/home/lucas/Codes/python-stuff/py3k/Lib/test/test_io.py", line
1139, in test_read1
    self.assertEqual(pair.read1(3), b"abc")
AssertionError: b'a' != b'abc'

Since I've changed peek_unlocked() (which is used once by read1()), I
guess there's a problem with read1() expectations about it.

BreamoreBoy · 2013-05-19T15:19:53Z

Looks like this has slipped under the radar. I'll leave working on it to the experts :)

vadmium · 2015-01-09T12:59:21Z

Is the current documentation as accurate as it can be?

“The number of bytes returned may be less or more than requested”

To me this has always made this method practically useless. A valid implementation could just always return b"". I noticed the BZ2File.peek() documentation (BZ2File is apparently trying to imitate BufferedReader) is slightly more useful:

“At least one byte of data will be returned (unless at EOF)”

That could be used for (say) peeking for a LF following a CR. But still the “size” parameter does not seem very useful. In fact, LZMAFile.peek() says the size is ignored.

vadmium · 2015-01-10T04:26:53Z

Here is a simple documentation patch to guarantee that at least one byte is normally returned. This would make the method much more useful, and compatible with the BZ2File and LZMAFile interfaces, allowing them to use BufferedReader, as I propose to do in bpo-15955.

Even if nobody is interested in Torsten’s patch to limit the return length, I suggest my patch be considered :)

vadmium · 2015-01-30T06:48:55Z

The non-blocking behaviour that I documented in my patch is under question in bpo-13322. I think it would be nice change the implementation to either return None or raise BlockingIOError.

trott mannequin added stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error labels Apr 22, 2009

ncoghlan added the release-blocker label Jun 13, 2009

ncoghlan assigned benjaminp Jun 13, 2009

benjaminp removed their assignment Jun 13, 2009

benjaminp removed the release-blocker label Jun 13, 2009

vadmium added the docs Documentation in the Doc dir label Jan 10, 2015

vadmium assigned docspython Jan 10, 2015

ezio-melotti transferred this issue from another repository Apr 10, 2022

serhiy-storchaka added the topic-IO label Feb 7, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

io.BufferedReader.peek(): Documentation differs from Implementation #50061

io.BufferedReader.peek(): Documentation differs from Implementation #50061

trott mannequin commented Apr 22, 2009

trott mannequin commented Apr 22, 2009

trott mannequin commented Apr 22, 2009

trott mannequin commented Apr 22, 2009

ncoghlan commented Jun 13, 2009

benjaminp commented Jun 13, 2009

conf mannequin commented Jun 13, 2009

conf mannequin commented Jun 13, 2009

conf mannequin commented Jun 13, 2009

pitrou commented Jun 13, 2009

conf mannequin commented Jun 13, 2009

pitrou commented Jun 13, 2009

pitrou commented Jun 13, 2009

benjaminp commented Jun 13, 2009

ncoghlan commented Jun 14, 2009

benjaminp commented Jun 14, 2009

pitrou commented Jun 14, 2009

conf mannequin commented Jun 15, 2009

conf mannequin commented Jun 15, 2009

pitrou commented Jun 15, 2009

ncoghlan commented Jun 15, 2009

conf mannequin commented Jun 15, 2009

BreamoreBoy mannequin commented May 19, 2013

vadmium commented Jan 9, 2015

vadmium commented Jan 10, 2015

vadmium commented Jan 30, 2015

io.BufferedReader.peek(): Documentation differs from Implementation #50061

io.BufferedReader.peek(): Documentation differs from Implementation #50061

Comments

trott mannequin commented Apr 22, 2009

trott mannequin commented Apr 22, 2009

trott mannequin commented Apr 22, 2009

trott mannequin commented Apr 22, 2009

ncoghlan commented Jun 13, 2009

benjaminp commented Jun 13, 2009

conf mannequin commented Jun 13, 2009

conf mannequin commented Jun 13, 2009

conf mannequin commented Jun 13, 2009

pitrou commented Jun 13, 2009

conf mannequin commented Jun 13, 2009

pitrou commented Jun 13, 2009

pitrou commented Jun 13, 2009

benjaminp commented Jun 13, 2009

ncoghlan commented Jun 14, 2009

benjaminp commented Jun 14, 2009

pitrou commented Jun 14, 2009

conf mannequin commented Jun 15, 2009

conf mannequin commented Jun 15, 2009

pitrou commented Jun 15, 2009

ncoghlan commented Jun 15, 2009

conf mannequin commented Jun 15, 2009

BreamoreBoy mannequin commented May 19, 2013

vadmium commented Jan 9, 2015

vadmium commented Jan 10, 2015

vadmium commented Jan 30, 2015