Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

io.BufferedReader.peek(): Documentation differs from Implementation #50061

Open
trott mannequin opened this issue Apr 22, 2009 · 25 comments
Open

io.BufferedReader.peek(): Documentation differs from Implementation #50061

trott mannequin opened this issue Apr 22, 2009 · 25 comments
Labels
docs Documentation in the Doc dir stdlib Python modules in the Lib dir topic-IO type-bug An unexpected behavior, bug, or error

Comments

@trott
Copy link
Mannequin

trott mannequin commented Apr 22, 2009

BPO 5811
Nosy @ncoghlan, @pitrou, @benjaminp, @vadmium, @desbma
Files
  • peek.diff
  • peek2.diff
  • peek3.diff
  • peek4.diff
  • peek-one-byte.patch: Document ≥ 1 byte returned
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = None
    created_at = <Date 2009-04-22.06:15:51.564>
    labels = ['type-bug', 'library', 'docs']
    title = 'io.BufferedReader.peek(): Documentation differs from Implementation'
    updated_at = <Date 2020-09-21.18:05:58.451>
    user = 'https://bugs.python.org/trott'

    bugs.python.org fields:

    activity = <Date 2020-09-21.18:05:58.451>
    actor = 'desbma'
    assignee = 'docs@python'
    closed = False
    closed_date = None
    closer = None
    components = ['Documentation', 'Library (Lib)']
    creation = <Date 2009-04-22.06:15:51.564>
    creator = 'trott'
    dependencies = []
    files = ['14281', '14282', '14304', '14305', '37661']
    hgrepos = []
    issue_num = 5811
    keywords = ['patch']
    message_count = 25.0
    messages = ['86274', '86275', '86276', '89311', '89312', '89313', '89314', '89315', '89324', '89326', '89327', '89328', '89329', '89340', '89349', '89366', '89388', '89389', '89399', '89400', '89401', '189599', '233750', '233801', '235024']
    nosy_count = 8.0
    nosy_names = ['ncoghlan', 'pitrou', 'benjamin.peterson', 'trott', 'conf', 'docs@python', 'martin.panter', 'desbma']
    pr_nums = []
    priority = 'normal'
    resolution = None
    stage = None
    status = 'open'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue5811'
    versions = ['Python 2.7', 'Python 3.2', 'Python 3.4']

    @trott
    Copy link
    Mannequin Author

    trott mannequin commented Apr 22, 2009

    The documented behavior of io.BufferedReader.peek([n]) states:

    peek([n])
    Return 1 (or n if specified) bytes from a buffer without advancing the 
    position.

    Thereas the parameter n is the _max_ length of returned bytes.

    Implementation is:

        def _peek_unlocked(self, n=0):
            want = min(n, self.buffer_size)
            have = len(self._read_buf) - self._read_pos
            if have < want:
                to_read = self.buffer_size - have
                current = self.raw.read(to_read)
                if current:
                    self._read_buf = self._read_buf[self._read_pos:] + 
    current
                    self._read_pos = 0
            return self._read_buf[self._read_pos:]

    Where you may see that the parameter n is the _min_ length
    of returned bytes. So peek(1) will return _not_ just 1 Byte, but the
    remaining bytes in the buffer.

    @trott trott mannequin added stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error labels Apr 22, 2009
    @trott
    Copy link
    Mannequin Author

    trott mannequin commented Apr 22, 2009

    Note: this is also in Python 2.6

    @trott
    Copy link
    Mannequin Author

    trott mannequin commented Apr 22, 2009

    Proposed patch to fix this:

    set the default of n to 1 as stated by docs:

    def _peek_unlocked(self, n=1):

    return n bytes:

    return self._read_buf[self._read_pos:self._read_pos+n]

    @ncoghlan
    Copy link
    Contributor

    Assigned to Benjamin for assessment - this should be considered for rc2
    since it's still broken in 3.1:

    >>> f = open('setup.py', 'rb')
    >>> len(f.peek(10))
    4096
    >>> len(f.peek(1))
    4096
    >>> len(f.peek(4095))
    4096
    >>> len(f.peek(10095))
    4096

    Brought up on python-dev in this thread:
    http://mail.python.org/pipermail/python-dev/2009-June/089986.html

    And previously here:
    http://mail.python.org/pipermail/python-dev/2009-April/088229.html

    The thread from April suggests the current behaviour may be intentional,
    in which case it is the documentation that needs to be fixed, as it is
    currently not just misleading but flat out wrong. Then again, Benjamin's
    initial response to that thread was to support the idea of changing
    peek() so that the argument actually was a cap.

    The previous documentation that Alexandre quotes in the April was
    changed to the current description in late April without any
    corresponding change to the implementation:
    http://svn.python.org/view/python/branches/py3k/Doc/library/io.rst?r1=62422&r2=62430

    However, the old description was also wrong for the io-c implementation
    since it just returns the current buffered data from peek, no matter
    what argument you pass in.

    @benjaminp
    Copy link
    Contributor

    I think the argument should be used as a upper bound; I will look at
    this tomorrow.

    @conf
    Copy link
    Mannequin

    conf mannequin commented Jun 13, 2009

    Hey guys, I did a patch about this one.
    I didn't do many tests but I guess it is ok (it works like I think it
    should).
    What do you think?

    @conf
    Copy link
    Mannequin

    conf mannequin commented Jun 13, 2009

    Oops I overlooked I minor flaw.
    A second version.

    @conf
    Copy link
    Mannequin

    conf mannequin commented Jun 13, 2009

    There's a problem with my patch... When the size of the data we want to
    peek is too big ( > buffer_len - start ) the cursor will move, thus
    there isn't a case where the peek function would work properly (except
    when we want to peek() just 1 byte).
    Couldn't we use a read() followed by a seek() instead?

    @pitrou
    Copy link
    Member

    pitrou commented Jun 13, 2009

    Lucas, it is indeed impossible for peek() to return more than the buffer
    size and remain compatible with non-seekable raw streams. That's why it
    /never/ returns more than the buffer size.

    As for the fact that peek() doesn't behave as documented, I disagree.
    Here is what the docstring says:

        """Returns buffered bytes without advancing the position.
    
        The argument indicates a desired minimal number of bytes; we
        do at most one raw read to satisfy it.  We never return more
        than self.buffer_size.
        """
    

    Please note : "a desired /minimal/ number of bytes" (minimal, not
    maximal). Furthermore, "We never return more than self.buffer_size." The
    behaviour looks ok to me.

    @conf
    Copy link
    Mannequin

    conf mannequin commented Jun 13, 2009

    We could fill the buffer while moving its start point to 0. I guess this
    behavior would require a new function (or a new parameter to
    Modules/_io/bufferedio.c:_bufferedreader_fill_buffer() ).
    If you are ok with that I could write a patch.

    @pitrou
    Copy link
    Member

    pitrou commented Jun 13, 2009

    We could, however, enforce the passed argument and only return the whole
    remaining buffer when no argument is given. This is a bit like Frederick
    Reeve's proposal on python-dev, but less sophisticated and therefore
    less tedious to implement.

    In any case, I'm not sure it should be committed before the 3.1 release.
    The second and last release candidate is supposed to be today.

    @pitrou
    Copy link
    Member

    pitrou commented Jun 13, 2009

    We could fill the buffer while moving its start point to 0. I guess this
    behavior would require a new function (or a new parameter to
    Modules/_io/bufferedio.c:_bufferedreader_fill_buffer() ).
    If you are ok with that I could write a patch.

    The buffer is used for both reading and writing and you have to be
    careful when shifting it. Besides, the same change (or similar) should
    also be done in the Python implementation (in _pyio.py).

    If you come up with a patch, please add some tests and check the whole
    regression suite passes.

    @benjaminp
    Copy link
    Contributor

    I'm downgrading this because it can't be changed until after 3.1 is
    released.

    @benjaminp benjaminp removed their assignment Jun 13, 2009
    @ncoghlan
    Copy link
    Contributor

    It's not the docstring that is wrong for the current behaviour, it's the
    IO.BufferedReader documentation:

    """
    peek([n])
    Return 1 (or n if specified) bytes from a buffer without advancing
    the position. Only a single read on the raw stream is done to satisfy
    the call. The number of bytes returned may be less than requested since
    at most all the buffer’s bytes from the current position to the end are
    returned.
    """

    That gives absolutely no indication that the call might return more
    bytes than expected, and the indication that leaving out the argument
    will return only the next byte is flat out wrong.

    @benjaminp
    Copy link
    Contributor

    I updated the documentation in r73429. Is that better?

    @pitrou
    Copy link
    Member

    pitrou commented Jun 14, 2009

    Rather than "only a single read on the raw stream", it should be "at
    most a single read on the raw stream", IMHO.

    @conf
    Copy link
    Mannequin

    conf mannequin commented Jun 15, 2009

    Here is a patch that passes all the tests (I had to change some of them
    though, they were expecting erroneous behaviours IMHO).
    The biggest problem was the read1 testing, I've tried to get the maximum
    of bytes less than or equal to what the user wanted while executing at
    most 1 raw_read()'s.

    I have created a new test for peek()'ing a number of bytes bigger than
    could possibly be stored on the buffer.

    @conf
    Copy link
    Mannequin

    conf mannequin commented Jun 15, 2009

    Here, it's a patch that passes all the tests (I had to change some of them
    though, they were expecting erroneous behaviours IMHO).
    The biggest problem was the read1 testing, I've tried to get the maximum
    of bytes less than or equal to what the user wanted while executing at
    most 1 raw_read()'s.

    I have created a new test for peek()'ing a number of bytes bigger than
    could possibly be stored on the buffer.

    @pitrou
    Copy link
    Member

    pitrou commented Jun 15, 2009

    I haven't read the patch in detail but I don't think you should have
    changed read1(). read1() is there for optimization purposes and its
    current behaviour makes sense IMO.

    @ncoghlan
    Copy link
    Contributor

    The doc revision definitely does a better job of characterising the
    current underspecified behaviour :)

    I agree with Antoine that "at most a single read" would be better wording.

    @conf
    Copy link
    Mannequin

    conf mannequin commented Jun 15, 2009

    Ok
    A new patch without read1() changes.
    Only one test fails, a read1() test:

    ======================================================================
    FAIL: test_read1 (test.test_io.PyBufferedRWPairTest)
    ----------------------------------------------------------------------

    Traceback (most recent call last):
      File "/home/lucas/Codes/python-stuff/py3k/Lib/test/test_io.py", line
    1139, in test_read1
        self.assertEqual(pair.read1(3), b"abc")
    AssertionError: b'a' != b'abc'

    Since I've changed peek_unlocked() (which is used once by read1()), I
    guess there's a problem with read1() expectations about it.

    @BreamoreBoy
    Copy link
    Mannequin

    BreamoreBoy mannequin commented May 19, 2013

    Looks like this has slipped under the radar. I'll leave working on it to the experts :)

    @vadmium
    Copy link
    Member

    vadmium commented Jan 9, 2015

    Is the current documentation as accurate as it can be?

    “The number of bytes returned may be less or more than requested”

    To me this has always made this method practically useless. A valid implementation could just always return b"". I noticed the BZ2File.peek() documentation (BZ2File is apparently trying to imitate BufferedReader) is slightly more useful:

    “At least one byte of data will be returned (unless at EOF)”

    That could be used for (say) peeking for a LF following a CR. But still the “size” parameter does not seem very useful. In fact, LZMAFile.peek() says the size is ignored.

    @vadmium
    Copy link
    Member

    vadmium commented Jan 10, 2015

    Here is a simple documentation patch to guarantee that at least one byte is normally returned. This would make the method much more useful, and compatible with the BZ2File and LZMAFile interfaces, allowing them to use BufferedReader, as I propose to do in bpo-15955.

    Even if nobody is interested in Torsten’s patch to limit the return length, I suggest my patch be considered :)

    @vadmium vadmium added the docs Documentation in the Doc dir label Jan 10, 2015
    @vadmium
    Copy link
    Member

    vadmium commented Jan 30, 2015

    The non-blocking behaviour that I documented in my patch is under question in bpo-13322. I think it would be nice change the implementation to either return None or raise BlockingIOError.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    docs Documentation in the Doc dir stdlib Python modules in the Lib dir topic-IO type-bug An unexpected behavior, bug, or error
    Projects
    None yet
    Development

    No branches or pull requests

    5 participants