This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer Guide.

classification
Title: asyncio.StreamReader.readuntil is not general enough
Type: enhancement
Stage: resolved
Components: asyncio
Versions: Python 3.7

process
Status: closed
Resolution: duplicate
Dependencies:
Superseder: Allow multiple separators in Stream.readuntil (issue 37141)
Assigned To:
Nosy List: Bruce Merry, asvetlov, socketpair, xtreak, yselivanov
Priority: normal
Keywords:

Created on 2017-12-21 05:52 by Bruce Merry, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Messages (4)
msg308852 - Author: Bruce Merry (Bruce Merry) Date: 2017-12-21 05:52
I'd proposed one specific solution in Issue 32052 which asvetlov didn't like, so as requested I'm filing a bug about the problem rather than the solution.

The specific case I have is reading a protocol in which either \r or \n can be used to terminate lines. With StreamReader.readuntil, it's only possible to specify one separator, so it can't easily be used (*).

Some nice-to-have features, from specific to general:
1. Specify multiple alternate separators.
2. Specify a regex for a separator.
3. Specify a regex for the line.
4. Specify a callback that takes a string and returns the position of the end of the line, if any.

Of course, some of these risk quadratic-time behaviour if they have to check the whole buffer every time the buffer is extended, so that would need to be considered in the design. In the last case, the callback could take care of it itself by maintaining internal state.
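
To make the quadratic-time concern concrete, here is a minimal sketch of feature 1 (multiple alternate separators) that avoids rescanning the whole buffer after every extension. It is not asyncio's implementation: the class name `SeparatorScanner` and its methods are hypothetical, and picking the match that ends earliest is an assumption of this sketch. The idea is to remember how much of the buffer has already been searched and to back up only far enough to catch a separator split across two chunks.

```python
class SeparatorScanner:
    """Hypothetical helper: split a growing byte buffer at any of several separators."""

    def __init__(self, separators):
        self._separators = tuple(separators)
        if not self._separators or any(len(s) == 0 for s in self._separators):
            raise ValueError("need at least one non-empty separator")
        self._max_len = max(len(s) for s in self._separators)
        self._buffer = bytearray()
        self._scanned = 0  # length of the prefix already searched without a match

    def feed(self, data):
        """Append newly received bytes to the buffer."""
        self._buffer.extend(data)

    def next_line(self):
        """Return the next line including its separator, or None if incomplete."""
        # Back up by max_len - 1 bytes so that a separator straddling the
        # boundary between old and new data is still found.
        start = max(0, self._scanned - (self._max_len - 1))
        best_end = None
        for sep in self._separators:
            pos = self._buffer.find(sep, start)
            if pos != -1:
                end = pos + len(sep)
                if best_end is None or end < best_end:
                    best_end = end  # assumption: prefer the match that ends earliest
        if best_end is None:
            # Nothing found: remember how far we looked so the next call
            # only searches the newly added tail.
            self._scanned = len(self._buffer)
            return None
        line = bytes(self._buffer[:best_end])
        del self._buffer[:best_end]
        self._scanned = 0
        return line
```

A caller would invoke `feed()` for each chunk received and then call `next_line()` until it returns `None`. Each byte is searched at most once per separator, apart from the small overlap, so the total work stays linear in the amount of data rather than quadratic.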


(*) I actually have a solution for this case (https://github.com/ska-sa/aiokatcp/blob/bd8263cefe213003a218fac0dd8c5207cc76aeef/aiokatcp/connection.py#L44-L52), but it only works because \r and \n are semantically equivalent in the particular protocol I'm parsing.
msg308853 - Author: Andrew Svetlov (asvetlov) * (Python committer) Date: 2017-12-21 06:06
Supporting multiple separators looks easy; I don't expect any performance impact.
We already have this for strings: s.startswith(('\n', '\r'))

Regexps are more expensive, and callbacks are kind of evil.

Let's add a patch for multiple separators first; maybe it covers 99.9% of use cases.
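
For illustration, the tuple idiom mentioned above carries over to bytes: `bytes.endswith` and `bytearray.endswith` also accept a tuple of suffixes. The sketch below uses that to emulate a multi-separator read on top of the existing public `StreamReader.read` method. The coroutine name `readuntil_any` is hypothetical, and reading one byte at a time is chosen for simplicity, not speed; a real patch would presumably search the reader's internal buffer instead.

```python
import asyncio


async def readuntil_any(reader, separators):
    """Hypothetical helper: read until the data ends with any of the separators."""
    separators = tuple(separators)
    if not separators or any(len(s) == 0 for s in separators):
        raise ValueError("need at least one non-empty separator")
    chunk = bytearray()
    # endswith accepts a tuple, just like str.startswith does.
    while not chunk.endswith(separators):
        byte = await reader.read(1)
        if not byte:
            # EOF before any separator: mirror readuntil by raising
            # IncompleteReadError with the partial data attached.
            raise asyncio.IncompleteReadError(bytes(chunk), None)
        chunk += byte
    return bytes(chunk)


async def demo():
    # Feed a StreamReader by hand instead of opening a real connection.
    reader = asyncio.StreamReader()
    reader.feed_data(b"one\rtwo\nthree")
    reader.feed_eof()
    return [await readuntil_any(reader, (b"\r", b"\n")) for _ in range(2)]
```

Running `asyncio.run(demo())` reads two lines, one terminated by `\r` and one by `\n`; a third call would raise `IncompleteReadError` because `three` is never terminated.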
msg352078 - Author: Karthikeyan Singaravelan (xtreak) * (Python committer) Date: 2019-09-12 09:37
I think this is a duplicate of issue37141, where multiple separators are requested for readuntil. I guess we can close one of them as a duplicate.
msg352080 - Author: Andrew Svetlov (asvetlov) * (Python committer) Date: 2019-09-12 09:39
Agree
History
Date                 User         Action  Args
2022-04-11 14:58:55  admin        set     github: 76576
2019-09-12 09:39:28  asvetlov     set     status: open -> closed; superseder: Allow multiple separators in Stream.readuntil; messages: + msg352080; resolution: duplicate; stage: resolved
2019-09-12 09:37:13  xtreak       set     nosy: + xtreak; messages: + msg352078
2018-07-26 19:48:50  apatrushev   set     nosy: + socketpair
2017-12-21 06:06:37  asvetlov     set     messages: + msg308853
2017-12-21 06:00:11  yselivanov   set     nosy: + asvetlov
2017-12-21 05:52:54  Bruce Merry  create