New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
io.BufferedReader.peek(): Documentation differs from Implementation #50061
Comments
The documented behavior of io.BufferedReader.peek([n]) states: peek([n])
Return 1 (or n if specified) bytes from a buffer without advancing the
position. Thereas the parameter n is the _max_ length of returned bytes. Implementation is: def _peek_unlocked(self, n=0):
want = min(n, self.buffer_size)
have = len(self._read_buf) - self._read_pos
if have < want:
to_read = self.buffer_size - have
current = self.raw.read(to_read)
if current:
self._read_buf = self._read_buf[self._read_pos:] +
current
self._read_pos = 0
return self._read_buf[self._read_pos:] Where you may see that the parameter n is the _min_ length |
Note: this is also in Python 2.6 |
Proposed patch to fix this: set the default of n to 1 as stated by docs: def _peek_unlocked(self, n=1): return n bytes: return self._read_buf[self._read_pos:self._read_pos+n] |
Assigned to Benjamin for assessment - this should be considered for rc2 >>> f = open('setup.py', 'rb')
>>> len(f.peek(10))
4096
>>> len(f.peek(1))
4096
>>> len(f.peek(4095))
4096
>>> len(f.peek(10095))
4096 Brought up on python-dev in this thread: And previously here: The thread from April suggests the current behaviour may be intentional, The previous documentation that Alexandre quotes in the April was However, the old description was also wrong for the io-c implementation |
I think the argument should be used as a upper bound; I will look at |
Hey guys, I did a patch about this one. |
Oops I overlooked I minor flaw. |
There's a problem with my patch... When the size of the data we want to |
Lucas, it is indeed impossible for peek() to return more than the buffer As for the fact that peek() doesn't behave as documented, I disagree.
Please note : "a desired /minimal/ number of bytes" (minimal, not |
We could fill the buffer while moving its start point to 0. I guess this |
We could, however, enforce the passed argument and only return the whole In any case, I'm not sure it should be committed before the 3.1 release. |
The buffer is used for both reading and writing and you have to be If you come up with a patch, please add some tests and check the whole |
I'm downgrading this because it can't be changed until after 3.1 is |
It's not the docstring that is wrong for the current behaviour, it's the """ That gives absolutely no indication that the call might return more |
I updated the documentation in r73429. Is that better? |
Rather than "only a single read on the raw stream", it should be "at |
Here is a patch that passes all the tests (I had to change some of them I have created a new test for peek()'ing a number of bytes bigger than |
Here, it's a patch that passes all the tests (I had to change some of them I have created a new test for peek()'ing a number of bytes bigger than |
I haven't read the patch in detail but I don't think you should have |
The doc revision definitely does a better job of characterising the I agree with Antoine that "at most a single read" would be better wording. |
Ok ====================================================================== Traceback (most recent call last):
File "/home/lucas/Codes/python-stuff/py3k/Lib/test/test_io.py", line
1139, in test_read1
self.assertEqual(pair.read1(3), b"abc")
AssertionError: b'a' != b'abc' Since I've changed peek_unlocked() (which is used once by read1()), I |
Looks like this has slipped under the radar. I'll leave working on it to the experts :) |
Is the current documentation as accurate as it can be? “The number of bytes returned may be less or more than requested” To me this has always made this method practically useless. A valid implementation could just always return b"". I noticed the BZ2File.peek() documentation (BZ2File is apparently trying to imitate BufferedReader) is slightly more useful: “At least one byte of data will be returned (unless at EOF)” That could be used for (say) peeking for a LF following a CR. But still the “size” parameter does not seem very useful. In fact, LZMAFile.peek() says the size is ignored. |
Here is a simple documentation patch to guarantee that at least one byte is normally returned. This would make the method much more useful, and compatible with the BZ2File and LZMAFile interfaces, allowing them to use BufferedReader, as I propose to do in bpo-15955. Even if nobody is interested in Torsten’s patch to limit the return length, I suggest my patch be considered :) |
The non-blocking behaviour that I documented in my patch is under question in bpo-13322. I think it would be nice change the implementation to either return None or raise BlockingIOError. |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: