Issue18524
This issue tracker has been migrated to GitHub,
and is currently read-only.
For more information,
see the GitHub FAQs in the Python's Developer Guide.
Created on 2013-07-22 04:56 by nikratio, last changed 2022-04-11 14:57 by admin. This issue is now closed.
Messages (11) | |||
---|---|---|---|
msg193496 - (view) | Author: Nikolaus Rath (nikratio) * | Date: 2013-07-22 04:56 | |
The read1() docstring says: "Reads up to n bytes, with at most one read() system call." However, in the implementation read1() calls peek() to refill the buffer if necessary, and then returns whatever is available in the buffer. This means that read1() will only return up to n bytes if n is smaller than the buffer size, otherwise it will return at most <buffer-size> bytes. With the current implementation, running a loop that calls obj.read1(n) for large n is therefore much less performant than a loop that calls obj.raw.read(n). I think a call to read1(n) with empty buffer should always attempt to read n bytes from the raw stream, i.e. the implementation should be changed to match the documentation. |
|||
msg193647 - (view) | Author: R. David Murray (r.david.murray) * | Date: 2013-07-24 14:00 | |
Unless I'm misunderstanding something, the current implementation does match the current documentation: if <buffer-size> < n, returning <buffer-size> bytes is returning "up to n". "Up to" means it could be "less than". So you are advocating a change in behavior...but I believe there is a reason read1 is implemented the way it is, so I suspect that isn't going to happen. Probably what you want to do is change the buffer size when you open the file so that it is greater than your 'n'. |
|||
msg193650 - (view) | Author: Antoine Pitrou (pitrou) * | Date: 2013-07-24 14:35 | |
> This means that read1() will only return up to n bytes if n is smaller > than the buffer size, otherwise it will return at most <buffer-size> > bytes. Did you actually observe such behaviour? If so, this is a bug. |
|||
msg193666 - (view) | Author: Nikolaus Rath (nikratio) * | Date: 2013-07-24 18:24 | |
On 07/24/2013 07:35 AM, Antoine Pitrou wrote: > > Antoine Pitrou added the comment: > >> This means that read1() will only return up to n bytes if n is smaller >> than the buffer size, otherwise it will return at most <buffer-size> >> bytes. > > Did you actually observe such behaviour? If so, this is a bug. Yes, that's what I see: $ python3 bug.py Read 20 bytes using read() Read 10 bytes using read1() $ cat bug.py #!/usr/bin/env python3 import io raw = io.BytesIO(bytes(200)) buffered = io.BufferedReader(raw, 10) data = buffered.read(20) print('Read %d bytes using read()' % len(data)) data = buffered.read1(20) print('Read %d bytes using read1()' % len(data)) There should be no problem reading 20 bytes with a single call to BytesIO.read(), yet the buffered reader returns only 10 bytes. |
|||
msg193674 - (view) | Author: Antoine Pitrou (pitrou) * | Date: 2013-07-24 21:28 | |
Le mercredi 24 juillet 2013 à 18:24 +0000, Nikolaus Rath a écrit : > There should be no problem reading 20 bytes with a single call to > BytesIO.read(), yet the buffered reader returns only 10 bytes. Er. Your bug report seems to imply that read1(n) can return *more* than n bytes, which would be a bug. That it may return less than n bytes is correct, though. |
|||
msg193675 - (view) | Author: Nikolaus Rath (nikratio) * | Date: 2013-07-24 21:45 | |
On 07/24/2013 02:28 PM, Antoine Pitrou wrote: > > Antoine Pitrou added the comment: > > Le mercredi 24 juillet 2013 à 18:24 +0000, Nikolaus Rath a écrit : >> There should be no problem reading 20 bytes with a single call to >> BytesIO.read(), yet the buffered reader returns only 10 bytes. > > Er. Your bug report seems to imply that read1(n) can return *more* than > n bytes, which would be a bug. > That it may return less than n bytes is correct, though. I admit the bug title may not have been chosen carefully enough. The documentation is correct that read1(n) never returns more than n bytes. My complaint is that the actual bound is stricter than this, band read1(n) will never return more than min(n, bufsize) bytes. To me, this seems unnecessary and counterintuitive. Why am I restricted by the buffer size if the underlying raw device could provide the requested amount with one call? Looking at the source, this restriction seems caused by read1() calling peek(). I think it would be straightforward to just delegate a read1(n) call to raw stream's read(n) method. This would improve performance, and also make the documentation more accurate. Did I miss something? Best, Nikolaus |
|||
msg193676 - (view) | Author: Antoine Pitrou (pitrou) * | Date: 2013-07-24 21:51 | |
Le mercredi 24 juillet 2013 à 21:45 +0000, Nikolaus Rath a écrit : > The documentation is correct that read1(n) never returns more than n bytes. > > My complaint is that the actual bound is stricter than this, band > read1(n) will never return more than min(n, bufsize) bytes. That's not really true. If the buffer is empty, read1(n) will try to read at most n bytes from the raw stream. So: - if the buffer is not empty, the whole buffer is returned and 0 raw I/O call is issued - if the buffer is empty, 1 raw I/O call is issued and at most 1 byte is returned Therefore, if you call read1(n) in a loop, all calls but the first will really try to read *n* bytes from the raw stream (because the first call will have emptied the buffer). (one implicit goal is to minimize the number of memory copies when returning data to the application; a well-known consumer of read1() calls is TextIOWrapper when it fills its internal buffer) |
|||
msg193677 - (view) | Author: Antoine Pitrou (pitrou) * | Date: 2013-07-24 21:52 | |
Le mercredi 24 juillet 2013 à 21:51 +0000, Antoine Pitrou a écrit : > - if the buffer is empty, 1 raw I/O call is issued and at most 1 byte is > returned Make that "at most n bytes are returned", of course. |
|||
msg193678 - (view) | Author: Nikolaus Rath (nikratio) * | Date: 2013-07-24 21:58 | |
On 07/24/2013 02:51 PM, Antoine Pitrou wrote: > > Antoine Pitrou added the comment: > > Le mercredi 24 juillet 2013 à 21:45 +0000, Nikolaus Rath a écrit : >> The documentation is correct that read1(n) never returns more than n bytes. >> >> My complaint is that the actual bound is stricter than this, band >> read1(n) will never return more than min(n, bufsize) bytes. > > That's not really true. If the buffer is empty, read1(n) will try to > read at most n bytes from the raw stream. So: > - if the buffer is not empty, the whole buffer is returned and 0 raw I/O > call is issued > - if the buffer is empty, 1 raw I/O call is issued and at most 1 byte is > returned > > Therefore, if you call read1(n) in a loop, all calls but the first will > really try to read *n* bytes from the raw stream (because the first call > will have emptied the buffer). I agree that this is how it *should* work. But that's not what's happening in practice: > $ python3 Python 3.2.3 (default, Feb 20 2013, 14:44:27) [GCC 4.7.2] on linux2 Type "help", "copyright", "credits" or "license" for more information. >>> import io >>> raw = io.BytesIO(bytes(200)) >>> buffered = io.BufferedReader(raw, 10) >>> while True: ... buf = buffered.read1(20) ... print('Read %d bytes' % len(buf)) ... if not buf: ... break ... Read 10 bytes Read 10 bytes Read 10 bytes Read 10 bytes Read 10 bytes Read 10 bytes Read 10 bytes Read 10 bytes [...] |
|||
msg193679 - (view) | Author: Antoine Pitrou (pitrou) * | Date: 2013-07-24 22:04 | |
Ah, well. This is already fixed, then: Python 3.4.0a0 (default:ae769deb45b2, Jul 20 2013, 19:28:41) [GCC 4.7.3] on linux Type "help", "copyright", "credits" or "license" for more information. >>> import io >>> raw = io.BytesIO(bytes(200)) >>> buffered = io.BufferedReader(raw, 10) >>> while True: ... buf = buffered.read1(20) ... print("Got %d bytes" % len(buf)) ... if not buf: ... break ... Got 20 bytes Got 20 bytes Got 20 bytes Got 20 bytes Got 20 bytes Got 20 bytes Got 20 bytes Got 20 bytes Got 20 bytes Got 20 bytes Got 0 bytes However, 3.2 didn't get that improvement, sorry. See changeset 27bf3d0b8e5f and issue #13393. |
|||
msg193680 - (view) | Author: Nikolaus Rath (nikratio) * | Date: 2013-07-24 22:17 | |
On 07/24/2013 03:04 PM, Antoine Pitrou wrote: > However, 3.2 didn't get that improvement, sorry. See changeset 27bf3d0b8e5f and issue #13393. > > ---------- > resolution: -> duplicate > status: open -> closed Duh. So much back and forth just to find it's already fixed. Python is moving too fast these days... :-). -Nikolaus |
History | |||
---|---|---|---|
Date | User | Action | Args |
2022-04-11 14:57:48 | admin | set | github: 62724 |
2013-07-24 22:17:50 | nikratio | set | messages: + msg193680 |
2013-07-24 22:06:01 | pitrou | set | superseder: Improve BufferedReader.read1() |
2013-07-24 22:04:19 | pitrou | set | status: open -> closed resolution: duplicate messages: + msg193679 |
2013-07-24 21:58:51 | nikratio | set | messages: + msg193678 |
2013-07-24 21:52:30 | pitrou | set | messages: + msg193677 |
2013-07-24 21:51:31 | pitrou | set | messages: + msg193676 |
2013-07-24 21:45:32 | nikratio | set | messages: + msg193675 |
2013-07-24 21:28:38 | pitrou | set | messages: + msg193674 |
2013-07-24 18:24:52 | nikratio | set | messages: + msg193666 |
2013-07-24 14:35:28 | pitrou | set | messages: + msg193650 |
2013-07-24 14:00:58 | r.david.murray | set | nosy:
+ r.david.murray messages: + msg193647 |
2013-07-22 07:44:17 | serhiy.storchaka | set | nosy:
+ pitrou, benjamin.peterson, stutzbach, hynek, serhiy.storchaka |
2013-07-22 04:56:15 | nikratio | create |