classification
Title: Add prefetch() for Buffered IO (experiment)
Type: enhancement
Stage:
Components: IO
Versions: Python 3.4, Python 3.3
process
Status: open
Resolution:
Dependencies:
Superseder:
Assigned To:
Nosy List: benjamin.peterson, haypo, jcon, nadeem.vawda, pitrou, serhiy.storchaka, stutzbach
Priority: normal
Keywords: patch

Created on 2011-05-10 19:37 by jcon, last changed 2013-09-19 15:26 by serhiy.storchaka.

Files
File name               Uploaded                Description
issue12053-pyio.patch   jcon, 2011-05-28 19:17  _pyio prefetch() implementation
issue12053-tests.patch  jcon, 2011-05-28 19:17  test cases
prefetch.patch          jcon, 2011-10-04 01:14  C impl + pyio + tests
Messages (4)
msg135731 - (view) Author: John O'Connor (jcon) Date: 2011-05-10 19:37
A prefetch() method for Buffered IO could greatly help third-party buffering layers, among other gains. If nothing else, it is worth experimenting with.

Discussion on the topic is here: http://mail.python.org/pipermail/python-ideas/2010-September/008180.html

A summary of the method proposed (by Antoine Pitrou):

prefetch(self, buffer, skip, minread)

Skip `skip` bytes from the stream.  Then, try to read at
least `minread` bytes and write them into `buffer`. The file
pointer is advanced by at most `skip + minread`, or less if
the end of file was reached. The total number of bytes written
in `buffer` is returned, which can be more than `minread`
if additional bytes could be prefetched (but, of course,
cannot be more than `len(buffer)`).

Arguments:
- `buffer`: a writable buffer (e.g. bytearray)
- `skip`: number of bytes to skip (must be >= 0)
- `minread`: number of bytes to read (must be >= 0 and <= len(buffer))
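
For illustration only, the semantics summarized above might be sketched in pure Python roughly as follows. This is not the attached patch: writing `prefetch` as a free function that takes a `stream` argument, using `peek()` to gather the optional extra bytes, and returning 0 when the skip cannot be completed are all assumptions made for the sketch.

```python
import io


def prefetch(stream, buffer, skip, minread):
    """Unoptimized sketch of the proposed semantics (hypothetical helper)."""
    if skip < 0:
        raise ValueError("skip must be >= 0")
    if not 0 <= minread <= len(buffer):
        raise ValueError("minread must be >= 0 and <= len(buffer)")

    # Skip by reading and discarding, so unseekable streams work too.
    remaining = skip
    while remaining:
        chunk = stream.read(remaining)
        if not chunk:  # EOF, or no data available right now
            return 0   # assumption: report nothing read in this case
        remaining -= len(chunk)

    view = memoryview(buffer)
    written = 0
    # Consume up to `minread` bytes; stop early only if the stream runs dry.
    while written < minread:
        n = stream.readinto(view[written:minread])
        if not n:
            break
        written += n

    # Copy extra bytes without advancing the file pointer, when possible.
    room = len(buffer) - written
    if written == minread and room > 0 and hasattr(stream, "peek"):
        extra = stream.peek(room)[:room]
        view[written:written + len(extra)] = extra
        written += len(extra)
    return written
```

With an `io.BufferedReader` as the stream, the return value can exceed `minread` while `tell()` only moves by `skip + minread`, which is the property the summary describes.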
msg137143 - (view) Author: John O'Connor (jcon) Date: 2011-05-28 19:17
I started a draft in Python. I am attaching the _pyio version along with tests. I will continue work on the C implementation, and eventually documentation, if this is well received. It seems straightforward; I am interested to see what you guys think.

Also, there are now 2 places which use hasattr(self, "peek"). I was wondering if it would make sense to add peek() to BufferedIOBase and raise UnsupportedOperation or return b"".
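
As a rough illustration of the first option (raising UnsupportedOperation), a base-class default could look like the sketch below. The class name `BufferedBaseWithPeek` and the helper `peek_bytes` are hypothetical and exist only for this example; nothing here is part of the attached patch.

```python
import io


class BufferedBaseWithPeek(io.BufferedIOBase):
    """Hypothetical base class with a default peek() that is unsupported."""

    def peek(self, size=0):
        raise io.UnsupportedOperation("peek")


def peek_bytes(stream, size=0):
    """Caller-side code that no longer needs hasattr(stream, "peek")."""
    try:
        return stream.peek(size)
    except io.UnsupportedOperation:
        # Treat "cannot peek" the same as "nothing buffered".
        return b""
```

The second option (returning b"" from the default) would make the try/except unnecessary, at the cost of hiding whether peeking is supported at all.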

Some benchmarks:

$ ./python -m timeit -s "from _pyio import open;f = open('LICENSE', 'rb'); b=bytearray(128)" 'while f.prefetch(b, 4, 4): pass'
_pyio.BufferedIOBase.prefetch:
100000 loops, best of 3: 10.6 usec per loop
_pyio.BufferedReader.prefetch:
100000 loops, best of 3: 6 usec per loop

$ ./python -m timeit -s "from _pyio import open;f = open('LICENSE', 'rb');b=bytearray(4);" 'while f.read(4): f.readinto(b)'
100000 loops, best of 3: 5.07 usec per loop
msg138118 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-06-10 17:17
> I started a draft in python. I am attaching the _pyio version along
> with tests. I will continue work on the C implementation and
> eventually documentation if this is well received. It seems
> straightforward, I am interested to see what you guys think.

Thank you. I think performance measurements are premature until we
have an optimized C implementation anyway.

I think ultimately we also want a default implementation of read(),
peek() and read1() which uses prefetch(), so that BufferedReader
implementations only have to implement prefetch().
(care must be taken to avoid infinite loops)
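
A minimal sketch of that idea, under stated assumptions, might look like this. `PrefetchOnlyReader` and `BytesPrefetcher` are hypothetical names, and treating `minread == 0` as "peek without moving the pointer" is one reading of the proposal, not something settled in this issue. The base `prefetch()` deliberately does not fall back to `read()`, which is one way to avoid the infinite loop mentioned above.

```python
import io


class PrefetchOnlyReader:
    """Hypothetical base class: subclasses implement only prefetch()."""

    def prefetch(self, buffer, skip, minread):
        # Not implemented in terms of read(): that would recurse forever.
        raise io.UnsupportedOperation("prefetch")

    def read(self, size):
        buf = bytearray(size)
        n = self.prefetch(buf, 0, size)  # len(buf) == size, so n <= size
        return bytes(buf[:n])

    def peek(self, size=1):
        buf = bytearray(max(size, 1))
        n = self.prefetch(buf, 0, 0)  # minread == 0: pointer does not move
        return bytes(buf[:n])

    def read1(self, size):
        data = self.peek(size)[:size]
        # Consume exactly what was peeked by skipping over it.
        self.prefetch(bytearray(0), len(data), 0)
        return data


class BytesPrefetcher(PrefetchOnlyReader):
    """Toy in-memory implementation, used only to exercise the defaults."""

    def __init__(self, data):
        self._data = data
        self._pos = 0

    def prefetch(self, buffer, skip, minread):
        self._pos = min(self._pos + skip, len(self._data))
        chunk = self._data[self._pos:self._pos + len(buffer)]
        buffer[:len(chunk)] = chunk
        # Advance by at most `minread`; the rest is only prefetched.
        self._pos += min(minread, len(chunk))
        return len(chunk)
```

For example, on `BytesPrefetcher(b"0123456789")`, calling `read(4)` consumes four bytes, a following `peek(3)` returns the next three without consuming them, and `read1(2)` consumes two of those.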

That said, I think the python-dev mailing-list needs to be convinced of
the usefulness of prefetch() (if it was only me, there wouldn't be any
problem :-)). Perhaps you want to run another discussion there.
msg144848 - (view) Author: John O'Connor (jcon) Date: 2011-10-04 01:14
Here is an update with the C implementation. I think a working prototype will be helpful before another round on python-dev. 


I'm not sure how to handle unseekable, non-blocking streams where the read returns before `skip` bytes are exhausted. If prefetch() returns 0, the caller would have to use tell() to ensure subsequent reads are sane. In other words, it seems prefetch() will leave the stream in an unpredictable state. Antoine, what are your thoughts?
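
To make the concern concrete, here is a small sketch of a skip that is interrupted partway through. `ChoppyRaw` and `skip_bytes` are hypothetical names invented for this illustration, and returning the number of bytes actually skipped is shown only to expose the problem, not as a proposed change to prefetch().

```python
import io


class ChoppyRaw(io.RawIOBase):
    """Hypothetical unseekable raw stream; a None chunk means 'would block'."""

    def __init__(self, chunks):
        self._chunks = list(chunks)

    def readable(self):
        return True

    def readinto(self, b):
        if not self._chunks:
            return 0  # EOF
        chunk = self._chunks[0]
        if chunk is None:
            self._chunks.pop(0)
            return None  # no data available right now
        n = min(len(b), len(chunk))
        b[:n] = chunk[:n]
        if n < len(chunk):
            self._chunks[0] = chunk[n:]  # keep the unread remainder
        else:
            self._chunks.pop(0)
        return n


def skip_bytes(raw, skip):
    """Discard up to `skip` bytes; return how many were actually discarded."""
    skipped = 0
    while skipped < skip:
        chunk = raw.read(skip - skipped)
        if not chunk:  # None (would block) or b"" (EOF)
            break
        skipped += len(chunk)
    return skipped
```

With chunks `[b"ab", None, b"cdef"]`, a request to skip 4 bytes stops after 2. If prefetch() simply returned 0 at that point, the caller could not tell that 2 bytes had already been consumed, and tell() is not available on an unseekable stream.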
History
Date                 User              Action  Args
2013-09-19 15:26:22  serhiy.storchaka  set     nosy: + serhiy.storchaka
2011-10-04 01:14:46  jcon              set     files: + prefetch.patch; messages: + msg144848
2011-06-10 17:17:58  pitrou            set     messages: + msg138118
2011-05-28 19:17:32  jcon              set     files: + issue12053-tests.patch
2011-05-28 19:17:13  jcon              set     files: + issue12053-pyio.patch; keywords: + patch; messages: + msg137143
2011-05-10 19:50:41  nadeem.vawda      set     nosy: + nadeem.vawda
2011-05-10 19:37:24  jcon              create