Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow multiple separators in Stream.readuntil #81322

Closed
bmerry mannequin opened this issue Jun 3, 2019 · 26 comments
Closed

Allow multiple separators in Stream.readuntil #81322

bmerry mannequin opened this issue Jun 3, 2019 · 26 comments
Labels
topic-asyncio type-feature A feature request or enhancement

Comments

@bmerry
Copy link
Mannequin

bmerry mannequin commented Jun 3, 2019

BPO 37141
Nosy @asvetlov, @1st1, @bmerry, @websurfer5
PRs
  • gh-81322: support multiple separators in Stream.readuntil #16429
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = None
    created_at = <Date 2019-06-03.09:32:30.680>
    labels = ['type-feature', '3.9', 'expert-asyncio']
    title = 'Allow multiple separators in Stream.readuntil'
    updated_at = <Date 2019-09-26.19:48:53.882>
    user = 'https://github.com/bmerry'

    bugs.python.org fields:

    activity = <Date 2019-09-26.19:48:53.882>
    actor = 'yselivanov'
    assignee = 'none'
    closed = False
    closed_date = None
    closer = None
    components = ['asyncio']
    creation = <Date 2019-06-03.09:32:30.680>
    creator = 'bmerry'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 37141
    keywords = ['patch']
    message_count = 11.0
    messages = ['344397', '344398', '344399', '344403', '344404', '344405', '344409', '352089', '352179', '353323', '353335']
    nosy_count = 4.0
    nosy_names = ['asvetlov', 'yselivanov', 'bmerry', 'Jeffrey.Kintscher']
    pr_nums = ['16429']
    priority = 'normal'
    resolution = 'later'
    stage = 'patch review'
    status = 'pending'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue37141'
    versions = ['Python 3.9']

    @bmerry
    Copy link
    Mannequin Author

    bmerry mannequin commented Jun 3, 2019

    Text-based protocols sometimes allow a choice of newline separator - I work with one that allows either \r or \n. Unfortunately that doesn't work with StreamReader.readuntil, which only accepts a single separator, so I've had to do some hacky things to obtain lines without having to

    From discussion in bpo-32052, it sounded like extending StreamReader.readuntil to support a tuple of separators would be feasible.

    @bmerry bmerry mannequin added 3.8 only security fixes topic-asyncio type-feature A feature request or enhancement labels Jun 3, 2019
    @asvetlov
    Copy link
    Contributor

    asvetlov commented Jun 3, 2019

    Would you make a PR?
    I guess to modify Stream.readuntil()
    StreamReader is deprecated, I'm not sure if we should add new functionality to this class.

    @bmerry
    Copy link
    Mannequin Author

    bmerry mannequin commented Jun 3, 2019

    I wasn't aware of that deprecation - it doesn't seem to be mentioned at https://docs.python.org/3.8/library/asyncio-stream.html. What is the replacement?

    @asvetlov
    Copy link
    Contributor

    asvetlov commented Jun 3, 2019

    Docs will be updated soon.
    The change has landed a week ago, I had no time for docs update before 3.8 beta. Sorry for that.

    The idea is: StreamReader and StreamWriter are merged into just Stream, open_connection() is replaced with connect() etc.

    @bmerry
    Copy link
    Mannequin Author

    bmerry mannequin commented Jun 3, 2019

    Ok. Does the new Stream still have a similar interface for readuntil i.e. is this still a relevant request against the new API? I'm happy to let deprecated APIs stay as-is.

    @asvetlov
    Copy link
    Contributor

    asvetlov commented Jun 3, 2019

    Yes, Stream supports all StreamReader and StreamWriter methods

    @bmerry
    Copy link
    Mannequin Author

    bmerry mannequin commented Jun 3, 2019

    Ok, I've changed the issue title to refer to Stream. Since this would be a new feature, I assume it's off the table for 3.8, but I'll see if I get time to implement a PR in time for 3.9 (and get someone at work to sign off on the contributor agreement, which might be the harder part).

    Thanks for the quick and helpful responses.

    @bmerry bmerry mannequin changed the title Allow multiple separators in StreamReader.readuntil Allow multiple separators in Stream.readuntil Jun 3, 2019
    @terryjreedy terryjreedy added 3.9 only security fixes and removed 3.8 only security fixes labels Jun 7, 2019
    @bmerry
    Copy link
    Mannequin Author

    bmerry mannequin commented Sep 12, 2019

    I finally have permission from my employer to sign the contributors agreement, so I'll take a stab at this when I have some free time (unless nobody else gets to it first).

    @asvetlov
    Copy link
    Contributor

    please do

    @bmerry
    Copy link
    Mannequin Author

    bmerry mannequin commented Sep 26, 2019

    I've submitted a PR: #16429

    @1st1
    Copy link
    Member

    1st1 commented Sep 26, 2019

    I already mentioned that in the PR, but we'll have to hit a pause on this. We decided to revert the latest streams implementation from 3.8, see https://bugs.python.org/issue38242. The upshot is that what we have in 3.9 should be more amenable for things like this one.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    @kumaraditya303
    Copy link
    Contributor

    So this is stuck because of the new Stream implementation which was supposed to be implemented in 3.8 but today is 3.12. Unless we have a plan to reinvent the streams, this would have to be added to the existing StreamReader. I doubt that we can ever remove the existing streams implementation since it is so widely used.

    @kumaraditya303 kumaraditya303 removed the 3.9 only security fixes label Oct 27, 2022
    @vgrebenschikov
    Copy link

    Worth to ressurect the initiative? ... reimplementing custom reader when multiple separators required ... is tough

    @gvanrossum
    Copy link
    Member

    The PR looks reasonable. Maybe someone can champion it and make it up to date for release in 3.13? I'm willing to release. Assuming the OP (@bmerry) has moved on, it would be easiest for someone else to check out their branch into their own clone of the cpython repo and merge or rebase main on it, etc. Alternatively, @bmerry, if you're still around, please have at it!

    @bmerry
    Copy link
    Contributor

    bmerry commented Apr 2, 2024

    The PR looks reasonable. Maybe someone can champion it and make it up to date for release in 3.13? I'm willing to release. Assuming the OP (@bmerry) has moved on, it would be easiest for someone else to check out their branch into their own clone of the cpython repo and merge or rebase main on it, etc. Alternatively, @bmerry, if you're still around, please have at it!

    I'm still around - I stopped trying to keep the PR up to date with main because it wasn't clear if it would ever reach the front of the priority queue for the async devs. It sounds like @gvanrossum is happy to review an updated version now, so I'm happy to dust it off and update it when I find some free time.

    @gvanrossum
    Copy link
    Member

    Yup let’s do that! Feature freeze is late May, but don’t wait till the last minute.

    @bmerry
    Copy link
    Contributor

    bmerry commented Apr 7, 2024

    @gvanrossum I didn't get any merge conflicts when I merged in main and the tests still pass, so I think this is ready for review.

    @gvanrossum
    Copy link
    Member

    Thanks again!

    There's one detail in a different repo that you may want to get ahead of. There's a declaration for this method in typeshed; ideally we should update this conditionally with a check for sys.version_info.

    @bmerry
    Copy link
    Contributor

    bmerry commented Apr 8, 2024

    Thanks again!

    There's one detail in a different repo that you may want to get ahead of. There's a declaration for this method in typeshed; ideally we should update this conditionally with a check for sys.version_info.

    Thanks for the reminder - I'd meant to ask about typeshed but forgot. I'll sort out a PR there when I get a chance.

    @bmerry
    Copy link
    Contributor

    bmerry commented Apr 8, 2024

    Looking at the typeshed entry for readuntil is giving me second thoughts about this PR. In typeshed, it's written like this:

        # Can be any buffer that supports len(); consider changing to a Protocol if PEP 688 is accepted
        async def readuntil(self, separator: bytes | bytearray | memoryview = b"\n") -> bytes: ...

    On the other hand, in this PR the new use case is detected like this:

            if isinstance(separator, bytes):
                separator = [separator]
            else:
                # Makes sure shortest matches wins, and supports arbitrary iterables
                separator = sorted(separator, key=len)

    So I think this PR will actually break the existing use case of passing a bytearray (or memoryview), because it would be incorrectly interpreted as an iterable of separators rather than a single separator.

    @gvanrossum any thoughts on how to better decide whether we've been passed a single separator or an iterable of them? Perhaps instead of isinstance(separator, bytes), one could test hasattr(separator, "__buffer__") isinstance(separator, collections.abc.Buffer)? Or would it be better to introduce a new parameter (or new method) to make the user tell us the intention instead of guessing?

    bmerry added a commit to bmerry/cpython that referenced this issue Apr 8, 2024
    PR python#16429 introduced support for an iterable of separators in
    Stream.readuntil. Since bytes-like types are themselves iterable, this
    can introduce ambiguities in deciding whether the argument is an
    iterator of separators or a singleton separator. In python#16429, only 'bytes'
    was considered a singleton, but this will break code that passes other
    buffer object types.
    
    The Python library docs don't indicate what separator types were
    permitted in Python <=3.12, but comments in typeshed indicate that it
    would work with types that implement the buffer protocol and provide a
    len(). To keep those cases working the way they did before, I've changed
    the detection logic to consider any instance of collections.abc.Buffer
    as a singleton separator.
    
    There may still be corner cases where this doesn't do what the user
    wants e.g. a numpy array of byte strings will implement the buffer
    protocol and hence be treated as a singleton; but at least those corner
    cases should behave the same in 3.13 as they did in 3.12.
    
    Relates to python#81322.
    @bmerry
    Copy link
    Contributor

    bmerry commented Apr 8, 2024

    I've opened #117653 to use collections.abc.Buffer instead of bytes to decide the behaviour.

    @gvanrossum
    Copy link
    Member

    @gvanrossum any thoughts on how to better decide whether we've been passed a single separator or an iterable of them? Perhaps instead of isinstance(separator, bytes), one could test hasattr(separator, "__buffer__") isinstance(separator, collections.abc.Buffer)? Or would it be better to introduce a new parameter (or new method) to make the user tell us the intention instead of guessing?

    Well, my intuition would be to only allow tuples -- this follows the lead of e.g. str.startswith():

    >>> "abc".startswith("a")
    True
    >>> "abc".startswith(("a", "b"))
    True
    >>> "abc".startswith(["a", "b"])
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
        "abc".startswith(["a", "b"])
        ~~~~~~~~~~~~~~~~^^^^^^^^^^^^
    TypeError: startswith first arg must be str or a tuple of str, not list
    >>> 

    @bmerry
    Copy link
    Contributor

    bmerry commented Apr 8, 2024

    Well, my intuition would be to only allow tuples -- this follows the lead of e.g. str.startswith():

    I can see the appeal in that. It's also consistent with isinstance which takes tuples and not lists.

    If someone does pass a list, they're probably going to get a cryptic error message; so it would be nice if we could detect incorrect usage. So maybe if it's not a tuple AND it's not a collections.abc.Buffer, raise a helpful TypeError?

    @gvanrossum
    Copy link
    Member

    If someone does pass a list, they're probably going to get a cryptic error message; so it would be nice if we could detect incorrect usage. So maybe if it's not a tuple AND it's not a collections.abc.Buffer, raise a helpful TypeError?

    Maybe, though I worry about overthinking this. If they pass something incorrect they already get a cryptic error message (e.g. what if they pass a string?). Do we care? Probably not, as long as it doesn't silently do the wrong thing.

    @bmerry
    Copy link
    Contributor

    bmerry commented Apr 10, 2024

    I've created a patch to accept only tuples in #117723.

    bmerry added a commit to bmerry/cpython that referenced this issue Apr 11, 2024
    @bmerry
    Copy link
    Contributor

    bmerry commented Apr 14, 2024

    There's one detail in a different repo that you may want to get ahead of. There's a declaration for this method in typeshed; ideally we should update this conditionally with a check for sys.version_info.

    I've opened python/typeshed#11755

    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    topic-asyncio type-feature A feature request or enhancement
    Projects
    Status: Done
    Development

    No branches or pull requests

    7 participants