This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: BufferedReader.read1() documentation/implementation difference
Type: Stage:
Components: IO Versions: Python 3.3
process
Status: closed Resolution: duplicate
Dependencies: Superseder: Improve BufferedReader.read1()
View: 13393
Assigned To: Nosy List: benjamin.peterson, hynek, nikratio, pitrou, r.david.murray, serhiy.storchaka, stutzbach
Priority: normal Keywords:

Created on 2013-07-22 04:56 by nikratio, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Messages (11)
msg193496 - (view) Author: Nikolaus Rath (nikratio) * Date: 2013-07-22 04:56
The read1() docstring says: "Reads up to n bytes, with at most one read() system call."

However, in the implementation read1() calls peek() to refill the buffer if necessary, and then returns whatever is available in the buffer.

This means that read1() will only return up to n bytes if n is smaller than the buffer size, otherwise it will return at most <buffer-size> bytes.


With the current implementation, running a loop that calls obj.read1(n) for large n is therefore much less performant than a loop that calls obj.raw.read(n). 

I think a call to read1(n) with empty buffer should always attempt to read n bytes from the raw stream, i.e. the implementation should be changed to match the documentation.
msg193647 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2013-07-24 14:00
Unless I'm misunderstanding something, the current implementation does match the current documentation: if <buffer-size> < n, returning <buffer-size> bytes is returning "up to n".  "Up to" means it could be "less than".

So you are advocating a change in behavior...but I believe there is a reason read1 is implemented the way it is, so I suspect that isn't going to happen.  Probably what you want to do is change the buffer size when you open the file so that it is greater than your 'n'.
msg193650 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-07-24 14:35
> This means that read1() will only return up to n bytes if n is smaller
> than the buffer size, otherwise it will return at most <buffer-size>
> bytes.

Did you actually observe such behaviour? If so, this is a bug.
msg193666 - (view) Author: Nikolaus Rath (nikratio) * Date: 2013-07-24 18:24
On 07/24/2013 07:35 AM, Antoine Pitrou wrote:
> 
> Antoine Pitrou added the comment:
> 
>> This means that read1() will only return up to n bytes if n is smaller
>> than the buffer size, otherwise it will return at most <buffer-size>
>> bytes.
> 
> Did you actually observe such behaviour? If so, this is a bug.

Yes, that's what I see:

$ python3 bug.py
Read 20 bytes using read()
Read 10 bytes using read1()

$ cat bug.py
#!/usr/bin/env python3

import io

raw = io.BytesIO(bytes(200))
buffered = io.BufferedReader(raw, 10)

data = buffered.read(20)
print('Read %d bytes using read()' % len(data))

data = buffered.read1(20)
print('Read %d bytes using read1()' % len(data))

There should be no problem reading 20 bytes with a single call to
BytesIO.read(), yet the buffered reader returns only 10 bytes.
msg193674 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-07-24 21:28
Le mercredi 24 juillet 2013 à 18:24 +0000, Nikolaus Rath a écrit :
> There should be no problem reading 20 bytes with a single call to
> BytesIO.read(), yet the buffered reader returns only 10 bytes.

Er. Your bug report seems to imply that read1(n) can return *more* than
n bytes, which would be a bug.
That it may return less than n bytes is correct, though.
msg193675 - (view) Author: Nikolaus Rath (nikratio) * Date: 2013-07-24 21:45
On 07/24/2013 02:28 PM, Antoine Pitrou wrote:
> 
> Antoine Pitrou added the comment:
> 
> Le mercredi 24 juillet 2013 à 18:24 +0000, Nikolaus Rath a écrit :
>> There should be no problem reading 20 bytes with a single call to
>> BytesIO.read(), yet the buffered reader returns only 10 bytes.
> 
> Er. Your bug report seems to imply that read1(n) can return *more* than
> n bytes, which would be a bug.
> That it may return less than n bytes is correct, though.

I admit the bug title may not have been chosen carefully enough.

The documentation is correct that read1(n) never returns more than n bytes.

My complaint is that the actual bound is stricter than this, band
read1(n) will never return more than min(n, bufsize) bytes.

To me, this seems unnecessary and counterintuitive. Why am I restricted
by the buffer size if the underlying raw device could provide the
requested amount with one call?

Looking at the source, this restriction seems caused by read1() calling
peek(). I think it would be straightforward to just delegate a read1(n)
call to raw stream's read(n) method.

This would improve performance, and also make the documentation more
accurate.

Did I miss something?

Best,
Nikolaus
msg193676 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-07-24 21:51
Le mercredi 24 juillet 2013 à 21:45 +0000, Nikolaus Rath a écrit :
> The documentation is correct that read1(n) never returns more than n bytes.
> 
> My complaint is that the actual bound is stricter than this, band
> read1(n) will never return more than min(n, bufsize) bytes.

That's not really true. If the buffer is empty, read1(n) will try to
read at most n bytes from the raw stream. So:
- if the buffer is not empty, the whole buffer is returned and 0 raw I/O
call is issued
- if the buffer is empty, 1 raw I/O call is issued and at most 1 byte is
returned

Therefore, if you call read1(n) in a loop, all calls but the first will
really try to read *n* bytes from the raw stream (because the first call
will have emptied the buffer).

(one implicit goal is to minimize the number of memory copies when
returning data to the application; a well-known consumer of read1()
calls is TextIOWrapper when it fills its internal buffer)
msg193677 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-07-24 21:52
Le mercredi 24 juillet 2013 à 21:51 +0000, Antoine Pitrou a écrit :
> - if the buffer is empty, 1 raw I/O call is issued and at most 1 byte is
> returned

Make that "at most n bytes are returned", of course.
msg193678 - (view) Author: Nikolaus Rath (nikratio) * Date: 2013-07-24 21:58
On 07/24/2013 02:51 PM, Antoine Pitrou wrote:
> 
> Antoine Pitrou added the comment:
> 
> Le mercredi 24 juillet 2013 à 21:45 +0000, Nikolaus Rath a écrit :
>> The documentation is correct that read1(n) never returns more than n bytes.
>>
>> My complaint is that the actual bound is stricter than this, band
>> read1(n) will never return more than min(n, bufsize) bytes.
> 
> That's not really true. If the buffer is empty, read1(n) will try to
> read at most n bytes from the raw stream. So:
> - if the buffer is not empty, the whole buffer is returned and 0 raw I/O
> call is issued
> - if the buffer is empty, 1 raw I/O call is issued and at most 1 byte is
> returned
> 
> Therefore, if you call read1(n) in a loop, all calls but the first will
> really try to read *n* bytes from the raw stream (because the first call
> will have emptied the buffer).

I agree that this is how it *should* work. But that's not what's
happening in practice:

> $ python3
Python 3.2.3 (default, Feb 20 2013, 14:44:27)
[GCC 4.7.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import io
>>> raw = io.BytesIO(bytes(200))
>>> buffered = io.BufferedReader(raw, 10)
>>> while True:
...     buf = buffered.read1(20)
...     print('Read %d bytes' % len(buf))
...     if not buf:
...        break
...
Read 10 bytes
Read 10 bytes
Read 10 bytes
Read 10 bytes
Read 10 bytes
Read 10 bytes
Read 10 bytes
Read 10 bytes
[...]
msg193679 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-07-24 22:04
Ah, well. This is already fixed, then:

Python 3.4.0a0 (default:ae769deb45b2, Jul 20 2013, 19:28:41) 
[GCC 4.7.3] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import io
>>> raw = io.BytesIO(bytes(200))
>>> buffered = io.BufferedReader(raw, 10)
>>> while True:
...   buf = buffered.read1(20)
...   print("Got %d bytes" % len(buf))
...   if not buf:
...     break
... 
Got 20 bytes
Got 20 bytes
Got 20 bytes
Got 20 bytes
Got 20 bytes
Got 20 bytes
Got 20 bytes
Got 20 bytes
Got 20 bytes
Got 20 bytes
Got 0 bytes


However, 3.2 didn't get that improvement, sorry. See changeset 27bf3d0b8e5f and issue #13393.
msg193680 - (view) Author: Nikolaus Rath (nikratio) * Date: 2013-07-24 22:17
On 07/24/2013 03:04 PM, Antoine Pitrou wrote:
> However, 3.2 didn't get that improvement, sorry. See changeset 27bf3d0b8e5f and issue #13393.
> 
> ----------
> resolution:  -> duplicate
> status: open -> closed

Duh. So much back and forth just to find it's already fixed. Python is
moving too fast these days... :-).

-Nikolaus
History
Date User Action Args
2022-04-11 14:57:48adminsetgithub: 62724
2013-07-24 22:17:50nikratiosetmessages: + msg193680
2013-07-24 22:06:01pitrousetsuperseder: Improve BufferedReader.read1()
2013-07-24 22:04:19pitrousetstatus: open -> closed
resolution: duplicate
messages: + msg193679
2013-07-24 21:58:51nikratiosetmessages: + msg193678
2013-07-24 21:52:30pitrousetmessages: + msg193677
2013-07-24 21:51:31pitrousetmessages: + msg193676
2013-07-24 21:45:32nikratiosetmessages: + msg193675
2013-07-24 21:28:38pitrousetmessages: + msg193674
2013-07-24 18:24:52nikratiosetmessages: + msg193666
2013-07-24 14:35:28pitrousetmessages: + msg193650
2013-07-24 14:00:58r.david.murraysetnosy: + r.david.murray
messages: + msg193647
2013-07-22 07:44:17serhiy.storchakasetnosy: + pitrou, benjamin.peterson, stutzbach, hynek, serhiy.storchaka
2013-07-22 04:56:15nikratiocreate