Issue 20578: BufferedIOBase.readinto1 is missing

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/64777

classification

Title:	BufferedIOBase.readinto1 is missing
Type:	enhancement	Stage:	resolved
Components:	Library (Lib)	Versions:	Python 3.5

process

Status:	closed	Resolution:	fixed
Dependencies:		Superseder:
Assigned To:		Nosy List:	benjamin.peterson, hynek, loewis, nikratio, pitrou, python-dev, stutzbach, vstinner
Priority:	normal	Keywords:	patch

Created on 2014-02-10 00:13 by nikratio, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name	Uploaded	Description	Edit
issue20578.diff	nikratio, 2014-03-27 03:35		review
issue20578_r2.diff	nikratio, 2014-04-12 23:25		review
benchmark.py	nikratio, 2014-04-14 01:58
benchmark.py	nikratio, 2014-04-14 02:26
issue20578_r3.diff	nikratio, 2014-04-15 01:31		review
benchmark_r3.py	nikratio, 2014-04-15 01:32
issue20578_r4.diff	nikratio, 2014-06-08 22:14		review
issue20578_r5.diff	nikratio, 2014-06-15 18:42
issue20578_r6.diff	nikratio, 2014-06-15 19:08		review

Messages (22)
msg210794 - (view)	Author: Nikolaus Rath (nikratio) *	Date: 2014-02-10 00:13
It would be nice to have a readinto1 method to complement the existing read, readinto, and read1 methods of io.BufferedIOBase.
msg210795 - (view)	Author: Nikolaus Rath (nikratio) *	Date: 2014-02-10 00:41
(I'll work on a patch for this myself, this bug is just to prevent duplicate work)
msg214930 - (view)	Author: Nikolaus Rath (nikratio) *	Date: 2014-03-27 03:35
I have attached a patch that adds readinto1() to BufferedReader and BufferedRWPair. An example use case for this method is receiving a large stream over a protocol like HTTP. You want to use a buffered reader so you can efficiently parse the header, but after that you want to stream the data as it comes in, i.e. you want to use read1 or, for improved performance, readinto1. Feedback is welcome.
msg215263 - (view)	Author: Antoine Pitrou (pitrou) *	Date: 2014-03-31 20:06
The IO APIs are a visible part of the stdlib, so I'd suggest asking on the python-dev mailing-list first.
msg215979 - (view)	Author: Nikolaus Rath (nikratio) *	Date: 2014-04-12 18:51
Seems as if no one has an opinion on this at all: https://mail.python.org/pipermail/python-dev/2014-April/133739.html What next?
msg215984 - (view)	Author: Martin v. Löwis (loewis) *	Date: 2014-04-12 22:01
I put some review into rietveld. In addition, please avoid code duplication of more than three subsequent lines of code.
msg215988 - (view)	Author: Nikolaus Rath (nikratio) *	Date: 2014-04-12 23:25
Thanks for the review! Attached is a new patch. I was actually pretty careful to avoid any code duplication.. are you refering to the readinto1() implementations for BytesIO and BufferedReader in Lib/_pyio.py?
msg215989 - (view)	Author: Martin v. Löwis (loewis) *	Date: 2014-04-13 00:23
Put up a new review.
msg216049 - (view)	Author: Nikolaus Rath (nikratio) *	Date: 2014-04-14 01:58
Here's a little script to estimate the performance difference between using read1 and readinto1 to read large amounts of data. On my system, I get: C readinto1: 4.960e-01 seconds C read1: 4.055e-01 seconds Python readinto1: 1.066e+00 seconds Python read1: 2.565e+00 seconds In other words, _pyio.BufferedReader.readinto1 is more than a factor of 2 faster than _pyio.BufferedReader.read1 and io.readinto1 is faster than io.read1 by about 20%. On its own, I think this would justify keeping an implementation of readinto1 in _pyio.BufferedReader instead of falling back on the default (that is implemented using read1). However, I believe that people who need performance are probably not using _pyio but io, so my argument for keeping it implemented in _pyio is to keep the implementations similiar. I found studying _pyio very helpful to understand the C code in io. If we implement BufferedReader.readinto1 in io, but not in _pyio.BufferedReader, this advantage would be reduced. That said, I am primary interested in getting readinto1 into io. So I'm happy to either extend the patch to also provide a fast readinto implementation for _pyio (to align it with io), or to remove the readinto1 implementation in _pyio.
msg216053 - (view)	Author: Martin v. Löwis (loewis) *	Date: 2014-04-14 02:16
Can you please extend your benchmark to also measure read and readinto? I'm puzzled why you are treating readinto1 differently from readinto.
msg216055 - (view)	Author: Nikolaus Rath (nikratio) *	Date: 2014-04-14 02:19
(Rietveld is giving me errors, so I'm replying here) On 2014/04/13 02:22:23, loewis wrote: >>> Again, why a separate implementation here? >> >> For performance reasons. Relying on the default implementation >> would fall back to using read1(), which means a new bytes object >> is created first. > > Hmm. > a) if performance was relevant, it should apply to readinto() as well. I didn't even notice the readinto implementation was missing. But I agree, if we keep readinto1(), we should also add readinto(). > b) if the code is duplicated for performance only, a comment should > state that explicitly. I'm very sorry, but I still don't see which code in readinto1() is duplicate. You don't mean duplicate between io and _pyio, do you? > c) to justify a performance argument, you should really provide hard > numbers that demonstrate a performance gain justifying the code > duplication. I posted a small benchmark to the issue tracker. Personally, I think the more important argument is to keep the Python and C implementations similar. It's really nice if you can look at _pyio to find out at least roughly what happens in io. (Yes, I did put performance first in my last reply, but only because I thought you were asking about readinto1 in general, rather than the additional Python implementation in _pyio.)
msg216056 - (view)	Author: Nikolaus Rath (nikratio) *	Date: 2014-04-14 02:26
> Can you please extend your benchmark to also measure read and readinto? Yes - but I don't quite understand why it matters (if you need read1/readinto1, you cannot just use read/readinto instead). C readinto1: 4.638e-01 seconds C read1: 4.026e-01 seconds C readinto: 4.655e-01 seconds C read: 4.028e-01 seconds Python readinto1: 1.056e+00 seconds Python read1: 2.429e+00 seconds Python readinto: 1.895e+00 seconds Python read: 1.218e+00 seconds That shows that the Python readinto is definetely not up-to-par and could use improvement as well. Is that what you're getting at? > I'm puzzled why you are treating readinto1 differently from readinto. Maybe this is why we seem to be talking past each other :-). I did not look or work on readinto at all. All I noticed is that there is a read1, but no readinto1. So I implemented a readinto1 as well as I could.
msg216059 - (view)	Author: Martin v. Löwis (loewis) *	Date: 2014-04-14 02:55
> I didn't even notice the readinto implementation was missing. But I > agree, if we keep readinto1(), we should also add readinto(). [...] > Maybe this is why we seem to be talking past each other :-). I did not > look or work on readinto at all. All I noticed is that there is a read1, > but no readinto1. So I implemented a readinto1 as well as I could. I see. It's not actually true that there is no readinto - it's inherited from the base class. I think it is more important that the implementation is consistent than that it is performant (but achieving both should be possible). Whether or not _pyio needs to be performant, I don't know. Having it consistent with _io would be desirable, but might not be possible.
msg216266 - (view)	Author: Nikolaus Rath (nikratio) *	Date: 2014-04-15 01:31
Attached is an updated patch that - removes the code duplication in _pyio.BufferedIOBase - adds an internal _readinto helper method to _pyio.BufferedReader that makes the implementation similar to io.BufferedReader. - implements _pyio.BuffereadReader.{readinto,readinto1} in terms of the new helper method and, as a side effect, also increases their performance. Performance of the _pyio implementation on my system is: pre-patch: readinto: 5.130e+00 seconds readinto1 not available post-patch: readinto: 2.039e+00 seconds readinto1: 1.995e+00 seconds
msg220012 - (view)	Author: Roundup Robot (python-dev)	Date: 2014-06-08 03:07
New changeset 0fb7789b5eeb by Benjamin Peterson in branch 'default': add BufferedIOBase.readinto1 (closes #20578) http://hg.python.org/cpython/rev/0fb7789b5eeb
msg220015 - (view)	Author: Benjamin Peterson (benjamin.peterson) *	Date: 2014-06-08 03:18
Looks like test_file is unhappy: http://buildbot.python.org/all/builders/AMD64%20FreeBSD%209.x%203.x/builds/1758/steps/test/logs/stdio
msg220018 - (view)	Author: Roundup Robot (python-dev)	Date: 2014-06-08 06:18
New changeset b1e99b4ec374 by Benjamin Peterson in branch 'default': backout 0fb7789b5eeb for test breakage (#20578) http://hg.python.org/cpython/rev/b1e99b4ec374
msg220061 - (view)	Author: Nikolaus Rath (nikratio) *	Date: 2014-06-08 22:14
Thanks for taking the time, and apologies about the test failure. I was probably too eager and ran only the test_io suite instead of everything. I looked at the failure, and the problem is that the default Python BufferedIOBase.readinto implementation is semi-broken. It should work with any object implementing the memoryview protocol (like the C implementation), but it really only works with bytearray objects. The testIteration test only worked (prior to the patch) because there is a special case for array objects with format 'b': try: b[:n] = data except TypeError as err: import array if not isinstance(b, array.array): raise err b[:n] = array.array('b', data) In other words, trying to read into any other object has always failed. In particular, even format code 'B' fails: >>> import _pyio >>> from array import array >>> buf = array('b', b'x' * 10) >>> _pyio.open('/dev/zero', 'rb').readinto(buf) 10 >>> buf = array('B', b'x' * 10) >>> _pyio.open('/dev/zero', 'rb').readinto(buf) Traceback (most recent call last): File "<stdin>", line 1, in <module> File "/home/nikratio/clones/cpython/Lib/_pyio.py", line 1096, in readinto buf[:len_] = array.array('b', buf2) TypeError: bad argument type for built-in operation The readline implementation that my patch adds for BufferedReader does not contain this special case, and therefore with the patch even the test with a 'b'-array fails. For now, I've added the same special casing of 'b'-type arrays to the _readline() implementation in BufferedReader. This fixes the immediate problem (and this time I definitely ran the entire testsuite). However, the fix is certainly not what I would consider a good solution.. but I guess that would better be addressed by a separate patch that also fixes the same issue in BufferedIOBase?
msg220062 - (view)	Author: Nikolaus Rath (nikratio) *	Date: 2014-06-08 22:17
I used the wrong interpreter when cutting and pasting the example above, here's the correct version to avoid confusion with the traceback: >>> import _pyio >>> from array import array >>> buf = array('b', b'x' * 10) >>> _pyio.open('/dev/zero', 'rb').readinto(buf) 10 >>> buf = array('B', b'x' * 10) >>> _pyio.open('/dev/zero', 'rb').readinto(buf) Traceback (most recent call last): File "/home/nikratio/clones/cpython/Lib/_pyio.py", line 662, in readinto b[:n] = data TypeError: can only assign array (not "bytes") to array slice During handling of the above exception, another exception occurred: Traceback (most recent call last): File "<stdin>", line 1, in <module> File "/home/nikratio/clones/cpython/Lib/_pyio.py", line 667, in readinto b[:n] = array.array('b', data) TypeError: bad argument type for built-in operation
msg220662 - (view)	Author: Nikolaus Rath (nikratio) *	Date: 2014-06-15 18:42
As discussed on python-devel, I'm attaching a new patch that uses memoryview.cast to ensure that the pure-Python readinto() now works with any object implementing the buffer protocol.
msg220665 - (view)	Author: Nikolaus Rath (nikratio) *	Date: 2014-06-15 19:08
(refreshed patch, no changes)
msg221313 - (view)	Author: Roundup Robot (python-dev)	Date: 2014-06-22 21:17
New changeset 213490268be4 by Benjamin Peterson in branch 'default': add BufferedIOBase.readinto1 (closes #20578) http://hg.python.org/cpython/rev/213490268be4

History
Date	User	Action	Args
2022-04-11 14:57:58	admin	set	github: 64777
2014-06-22 21:17:55	python-dev	set	status: open -> closed resolution: fixed messages: + msg221313
2014-06-15 19:08:21	nikratio	set	files: + issue20578_r6.diff messages: + msg220665
2014-06-15 18:42:44	nikratio	set	files: + issue20578_r5.diff messages: + msg220662
2014-06-08 22:17:01	nikratio	set	messages: + msg220062
2014-06-08 22:14:25	nikratio	set	files: + issue20578_r4.diff messages: + msg220061
2014-06-08 06:32:17	benjamin.peterson	set	status: closed -> open resolution: fixed -> (no value)
2014-06-08 06:18:20	python-dev	set	messages: + msg220018
2014-06-08 03:18:42	benjamin.peterson	set	messages: + msg220015
2014-06-08 03:07:36	python-dev	set	status: open -> closed nosy: + python-dev messages: + msg220012 resolution: fixed stage: resolved
2014-04-15 01:32:18	nikratio	set	files: + benchmark_r3.py
2014-04-15 01:31:56	nikratio	set	files: + issue20578_r3.diff messages: + msg216266
2014-04-14 02:55:24	loewis	set	messages: + msg216059
2014-04-14 02:26:37	nikratio	set	files: + benchmark.py messages: + msg216056
2014-04-14 02:19:52	nikratio	set	messages: + msg216055
2014-04-14 02:16:51	loewis	set	messages: + msg216053
2014-04-14 01:58:19	nikratio	set	files: + benchmark.py messages: + msg216049
2014-04-13 00:23:07	loewis	set	messages: + msg215989
2014-04-12 23:25:09	nikratio	set	files: + issue20578_r2.diff messages: + msg215988
2014-04-12 22:01:55	loewis	set	nosy: + loewis messages: + msg215984
2014-04-12 18:51:48	nikratio	set	messages: + msg215979
2014-04-01 00:53:26	vstinner	set	nosy: + vstinner
2014-03-31 20:06:56	pitrou	set	messages: + msg215263
2014-03-27 03:36:25	nikratio	set	nosy: + benjamin.peterson, stutzbach, hynek
2014-03-27 03:35:46	nikratio	set	files: + issue20578.diff keywords: + patch messages: + msg214930
2014-02-10 00:41:55	nikratio	set	messages: + msg210795
2014-02-10 00:17:57	vstinner	set	nosy: + pitrou
2014-02-10 00:13:49	nikratio	create