Title: Provide access to buffer of asyncio.StreamReader
Type: enhancement
Stage: resolved
Components: asyncio
Versions: Python 3.8
Status: closed
Resolution: fixed
Dependencies: 32251
Superseder:
Assigned To:
Nosy List: Bruce Merry, asvetlov, bmerry, yselivanov
Priority: normal
Keywords:

Created on 2017-11-16 19:09 by Bruce Merry, last changed 2019-06-03 09:18 by bmerry. This issue is now closed.

Messages (8)
msg306397 - (view) Author: Bruce Merry (Bruce Merry) Date: 2017-11-16 19:09
While asyncio.StreamReader.readuntil is an improvement on only having readline, it is still quite limited: for example, you cannot specify multiple possible terminators. The real problem is that it's not possible to roll your own without accessing _underscore fields (other than by reading one byte at a time, which I'm guessing would be bad for performance). I'm not sure exactly what a public API to assist would look like, but I think the following would be a good start:

1. A get_buffer method, that returns (self._buffer, self._eof); the caller must treat the buffer as readonly.
2. A wait_for_data method to wait for the return value of get_buffer to change (basically like current _wait_for_data)
3. Access to the _limit attribute.

With that available, I think readuntil or more complex variants of it could be implemented externally using only the public interface (consumption of data from the buffer would be via readexactly rather than by messing with the buffer array directly).
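For reference, the byte-at-a-time fallback mentioned above can be sketched with the public StreamReader API alone. This is only an illustration: the helper name read_until_any and its limit parameter are made up for this example and are not part of asyncio, and the per-byte awaits are exactly the performance concern raised in this message.

```python
import asyncio


async def read_until_any(reader, separators, limit=2 ** 16):
    """Read until the data ends with any of `separators` (hypothetical helper).

    `separators` is a tuple of bytes objects. Only public StreamReader
    methods are used, at the cost of one await per byte.
    """
    data = bytearray()
    while True:
        chunk = await reader.read(1)
        if not chunk:
            # EOF arrived before any separator was seen.
            raise asyncio.IncompleteReadError(bytes(data), None)
        data += chunk
        # bytearray.endswith accepts a tuple of candidate suffixes.
        if data.endswith(separators):
            return bytes(data)
        if len(data) > limit:
            raise asyncio.LimitOverrunError(
                "separator not found within limit", len(data))


async def demo():
    # Feed a StreamReader by hand so the sketch runs without a socket.
    reader = asyncio.StreamReader()
    reader.feed_data(b"first\r\nsecond\nrest")
    reader.feed_eof()
    first = await read_until_any(reader, (b"\r\n", b"\n"))
    second = await read_until_any(reader, (b"\r\n", b"\n"))
    return first, second


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

The returned data includes the terminator, mirroring what readuntil does for a single separator.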
msg308770 - (view) Author: Andrew Svetlov (asvetlov) * (Python committer) Date: 2017-12-20 18:48
If the problem is in the readuntil() functionality, let's discuss improving that function (in a separate issue).

Exposing stream internals is an antipattern and a very bad idea.
I suggest closing the issue.

Yury, what is your opinion?
msg308772 - (view) Author: Yury Selivanov (yselivanov) * (Python committer) Date: 2017-12-20 18:50
I'd be more comfortable with the idea of exposing the buffer when we have BufferedProtocol.  Let's wait on this one.
msg327613 - (view) Author: Yury Selivanov (yselivanov) * (Python committer) Date: 2018-10-12 21:56
So we have BufferedProtocol in 3.7; now we need to re-implement asyncio streams on top of it.  But even after doing that I'm not that sure we want to expose the low-level buffer.
msg327648 - (view) Author: Andrew Svetlov (asvetlov) * (Python committer) Date: 2018-10-13 10:21
Exposing the internal buffer means committing to a new API contract forever.

I feel a need for a richer read*() API, but I'm pretty sure that making the internal buffer public is a bad idea. With BufferedProtocol it could be even worse: SLAB allocators can split a buffer into several separate chunks.

`str.startswith()` supports a tuple of prefixes; maybe we can do the same for separators in the streaming API.
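To make the tuple-of-separators idea concrete, here is a minimal sketch of the search step such an API would need: finding the earliest terminator among several candidates in already-buffered data. The function name find_first_separator and its tie-breaking rule are assumptions made for illustration, not an existing asyncio API.

```python
def find_first_separator(buffer, separators, start=0):
    """Return (end_index, separator) for the match that ends earliest.

    Hypothetical helper: `buffer` is bytes or bytearray, `separators` is a
    tuple of non-empty bytes objects. Returns None if nothing matches.
    If two matches end at the same index, the longer one wins.
    """
    best = None
    for sep in separators:
        if not sep:
            raise ValueError("separators must be non-empty")
        pos = buffer.find(sep, start)
        if pos == -1:
            continue
        # Order candidates by where they end, then by where they start.
        candidate = (pos + len(sep), pos, sep)
        if best is None or candidate[:2] < best[:2]:
            best = candidate
    if best is None:
        return None
    end, _, sep = best
    return end, sep


if __name__ == "__main__":
    data = b"GET /\nHost: x\r\n"
    print(find_first_separator(data, (b"\r\n", b"\n")))
```

A reader built on this would return buffer[:end_index] and keep the remainder buffered for the next call.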
msg327657 - (view) Author: Bruce Merry (Bruce Merry) Date: 2018-10-13 16:13
A sequence of possible terminators would cover my immediate use case and certainly be an improvement.

To facilitate more general use cases without exposing implementation details, would it be practical and maintainable to have a "putback" method that prepends data to the buffer? It might not be fast in all cases (e.g. it might have to make a copy of what's still in the buffer), but possibly BufferedReader could detect the common case (putting back a suffix of what's just been read) and adjust its offsets into its internal buffer (although I'm not at all familiar with BufferedReader, so feel free to tell me I'm talking nonsense).
msg344273 - (view) Author: Andrew Svetlov (asvetlov) * (Python committer) Date: 2019-06-02 11:09
1. Access to an internal buffer is not an option.
2. Pushing data back to the stream after fetching is not an option either: it kills almost any possible optimization and makes the code overcomplicated. aiohttp used to have an "unread_data()" API, but we deprecated it and are going to remove the method entirely. A wrapper around the stream with putback functionality is an option, though, as long as it doesn't touch the underlying stream implementation.
3. Extending a set of reader operations is a good idea, please make a separate issue with a concrete proposal if needed.
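The wrapper mentioned in point 2 could look roughly like the sketch below. It keeps returned data in its own side buffer and only ever calls the public read() method of the wrapped reader. The class name PushbackReader and its putback() method are hypothetical and chosen for this example; they are not part of asyncio or aiohttp.

```python
import asyncio


class PushbackReader:
    """Hypothetical wrapper that adds putback() on top of a StreamReader."""

    def __init__(self, reader):
        self._reader = reader
        self._pushed = bytearray()  # data handed back by the caller

    def putback(self, data):
        # Prepend, so the most recently returned data is read first.
        self._pushed[:0] = data

    async def read(self, n=-1):
        if n == 0:
            return b""
        if self._pushed:
            if n < 0:
                # Read-until-EOF: drain the side buffer, then the stream.
                out = bytes(self._pushed)
                self._pushed.clear()
                return out + await self._reader.read()
            # Like StreamReader.read, this may return fewer than n bytes.
            out = bytes(self._pushed[:n])
            del self._pushed[:n]
            return out
        return await self._reader.read(n)


async def demo():
    stream = asyncio.StreamReader()
    stream.feed_data(b"HEADERbody")
    stream.feed_eof()
    reader = PushbackReader(stream)
    chunk = await reader.read(8)   # over-reads past the 6-byte header
    reader.putback(chunk[6:])      # hand the extra bytes back
    rest = await reader.read()     # pushed-back bytes, then the stream
    return chunk[:6], rest


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Because the side buffer lives entirely in the wrapper, the underlying stream implementation stays untouched, at the cost of an extra copy for data that is pushed back.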
msg344393 - (view) Author: Bruce Merry (bmerry) * Date: 2019-06-03 09:18
Ok, I'll open a separate issue to allow a tuple of possible separators.
Date                 User            Action  Args
2019-06-03 09:18:31  bmerry          set     nosy: + bmerry; messages: + msg344393
2019-06-02 11:09:41  asvetlov        set     status: open -> closed; resolution: fixed; messages: + msg344273; stage: resolved
2018-10-13 16:13:59  Bruce Merry     set     messages: + msg327657
2018-10-13 10:21:57  asvetlov        set     messages: + msg327648
2018-10-12 21:56:22  yselivanov      set     messages: + msg327613; stage: needs patch -> (no value)
2018-10-12 21:53:18  cheryl.sabella  set     dependencies: + Add asyncio.BufferedProtocol; stage: needs patch; versions: + Python 3.8, - Python 3.7
2017-12-20 18:50:31  yselivanov      set     messages: + msg308772
2017-12-20 18:48:17  asvetlov        set     nosy: + asvetlov; messages: + msg308770
2017-11-16 19:09:48  Bruce Merry     create