Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add a Generator ABC #68206

Closed
scoder opened this issue Apr 20, 2015 · 36 comments
Closed

add a Generator ABC #68206

scoder opened this issue Apr 20, 2015 · 36 comments
Assignees
Labels
stdlib Python modules in the Lib dir type-feature A feature request or enhancement

Comments

@scoder
Copy link
Contributor

scoder commented Apr 20, 2015

BPO 24018
Nosy @gvanrossum, @rhettinger, @pitrou, @scoder, @vstinner, @ambv, @vadmium, @1st1
Files
  • generator_abc.patch: implementation of a Generator ABC
  • generator_abc.patch: updated patch that improves close() implementation and test coverage
  • generator_abc.patch: updated patch: check GeneratorExit propagation
  • inspect_docs.patch: update inspect docs to state that isinstance() is better than isgenerator()
  • generator_abc.patch: updated implementation: make throw() abstract
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/rhettinger'
    closed_at = <Date 2015-05-09.05:08:15.882>
    created_at = <Date 2015-04-20.19:36:25.857>
    labels = ['type-feature', 'library']
    title = 'add a Generator ABC'
    updated_at = <Date 2015-05-09.18:38:11.022>
    user = 'https://github.com/scoder'

    bugs.python.org fields:

    activity = <Date 2015-05-09.18:38:11.022>
    actor = 'gvanrossum'
    assignee = 'rhettinger'
    closed = True
    closed_date = <Date 2015-05-09.05:08:15.882>
    closer = 'rhettinger'
    components = ['Library (Lib)']
    creation = <Date 2015-04-20.19:36:25.857>
    creator = 'scoder'
    dependencies = []
    files = ['39151', '39156', '39157', '39180', '39213']
    hgrepos = []
    issue_num = 24018
    keywords = ['patch']
    message_count = 36.0
    messages = ['241675', '241677', '241681', '241682', '241683', '241692', '241700', '241701', '241702', '241703', '241704', '241705', '241716', '241717', '241723', '241725', '241736', '241846', '241956', '241993', '242054', '242071', '242072', '242077', '242087', '242094', '242102', '242104', '242445', '242612', '242794', '242796', '242797', '242798', '242826', '242827']
    nosy_count = 10.0
    nosy_names = ['gvanrossum', 'rhettinger', 'pitrou', 'scoder', 'vstinner', 'lukasz.langa', 'python-dev', 'Ludovic.Gasc', 'martin.panter', 'yselivanov']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue24018'
    versions = ['Python 3.5']

    @scoder
    Copy link
    Contributor Author

    scoder commented Apr 20, 2015

    Currently, CPython tends to assume that generators are always implemented by a Python function that uses the "yield" keyword. However, it is absolutely possible to implement generators as a protocol by adding send(), throw() and close() methods to an iterator.

    The problem is that these will usually be rejected by code that needs to distinguish generators from other input, e.g. in asyncio, as this code will commonly do a type check against types.GeneratorType. Instead, it should check for the expected protocol. The best way to achieve this is to extend the existing ABCs with a Generator ABC that external generator implementations can register on.

    Related to bpo-24004 (asyncio). Asyncio could provide a backport copy of this for older Python versions.

    I assume this is considered a new feature that cannot be added to 3.4 anymore?

    @scoder scoder added stdlib Python modules in the Lib dir type-feature A feature request or enhancement labels Apr 20, 2015
    @gvanrossum
    Copy link
    Member

    It's missing tests. :-)

    Otherwise looks quite sensible.

    Also, shouldn't you override __subclasshook__ so you don't inherit it from Iterator?

    @scoder
    Copy link
    Contributor Author

    scoder commented Apr 20, 2015

    Ok, sure. Here's a new patch that adds tests and docs.

    @scoder
    Copy link
    Contributor Author

    scoder commented Apr 20, 2015

    Sorry, here's another doc fix.

    @pitrou
    Copy link
    Member

    pitrou commented Apr 20, 2015

    Is it a problem that the check can't be done in a fast way from C code?

    Other than that, sounds good to me.

    @ambv ambv self-assigned this Apr 20, 2015
    @rhettinger
    Copy link
    Contributor

    For the other ABCs, if you define the required abstract methods, you get working versions of all the mixin methods.

    In the case of the Generator ABC, throw() and close() are useless empty stub methods. In the other ABCs, we leave optional methods out entirely. A user should expect that if isinstance(g, Generator) is true that all of the ABC methods will work.

    Also, the return StopIteration in throw() doesn't seem correct. Shouldn't it raise the exception that was thrown?

        >>> def f():
                yield x
    >>> g = f()
    >>> g.throw(KeyError)
    
        Traceback (most recent call last):
          File "<pyshell#11>", line 1, in <module>
            g.throw(KeyError)
          File "<pyshell#9>", line 1, in f
            def f():
        KeyError

    Ideally, there should be a practical example of where this ABC would be useful with anything other than a real generator.

    @scoder
    Copy link
    Contributor Author

    scoder commented Apr 21, 2015

    Here's a new patch that addresses the review comments. I kept throw() and close() non-abstract and added an example to the tests instead that implements a minimal generator by inheriting these methods from the Generator base class, using it as a mixin. It only needs to implement send().

    I don't think it's unreasonable to assume that there are use cases where a generator that is implemented in a class instead of a yield-function does not require special cleanup actions in its close()/throw(). Or is it *required* that a generator stops working when close() is called?

    There are also iterators that raise StopIteration at some point to signal that they are temporarily exhausted, but then eventually restart returning values from their __next__() method when they have received more to return. Avoids recreating the iterator object after each exhaustion cycle.

    I extended the default implementation of close() to call throw(GeneratorExit), though, because that's how the protocol is defined. Unless users implement throw(), it won't make a difference for them. And if they do, they may even get away with not reimplementing close().

    @scoder
    Copy link
    Contributor Author

    scoder commented Apr 21, 2015

    Is it a problem that the check can't be done in a fast way from C code?

    C code can still quickly special case the generator type in the positive case, and it will usually be interested in an exact type match anyway. Determining that an object is *not* (compatible with) a generator might lead to some degradation, but in most cases, at least in the interpreter core, this would be an error case anyway, so being slower here should rarely hurt.

    @scoder
    Copy link
    Contributor Author

    scoder commented Apr 21, 2015

    Next question: should inspect.isgenerator() be changed? Or should its usages be replaced by isinstance(obj, Generator) ?

    @vadmium
    Copy link
    Member

    vadmium commented Apr 21, 2015

    I agree that there is no big reason why we should force generators to stop working after close(). Your new default implementation of close() is probably the right thing too. I added a few new comments on Reitveld.

    @scoder
    Copy link
    Contributor Author

    scoder commented Apr 21, 2015

    Good catch with the RuntimeError. Patch updated.

    @scoder
    Copy link
    Contributor Author

    scoder commented Apr 21, 2015

    should inspect.isgenerator() be changed?

    Replying to myself, one thing that speaks against this approach is that "inspect" also has the functions "getgeneratorlocals()" and "getgeneratorstate()", which depend on "gen.gi_frame". Cython could emulate that, but normal user code usually can't. It's definitely not part of the Generator protocol but an implementation detail of GeneratorType.

    @gvanrossum
    Copy link
    Member

    Is it an option to leave inspect alone? Its definition and use of generators is very focused on the builtin implementation.

    Although this means that code that wants to do the right thing but also be backwards compatible has to use something like

    def isgen(g):
        if hasattr(collections.abc, 'Generator'):
            return isinstance(c, collections.abc.Generator)
        else:
            return inspect.isgenerator(g)
    

    @scoder
    Copy link
    Contributor Author

    scoder commented Apr 21, 2015

    Yes, code usually doesn't fall from the sky with a new CPython release. If something wants to make use of this ABC, it still has to find ways to also work with older CPythons. What would be a good fallback? A backport on PyPI?

    I wouldn't even mind shipping this class with Cython and dropping it right into "collections.abc" if it's missing. Cython already contains lots of compatibility code, and we need to patch *something* in all current CPythons either way. Then your code snippet would be the right thing to do for user code (with the twist that Py2.x doesn't have "collections.abc"...).

    @gvanrossum
    Copy link
    Member

    Realistically only Cython will care much about this, so I'm okay if Cython just monkeypatches the collections package.

    @pitrou
    Copy link
    Member

    pitrou commented Apr 21, 2015

    In some distant future, Numba might also care too :)
    (right now, we only support compiling basic generators, i.e. no send() and no throw())

    @scoder
    Copy link
    Contributor Author

    scoder commented Apr 21, 2015

    Yes, and there are certainly other compilers and tools. However, it's going to be a short list, so if Py3.5 takes the lead, they can all just agree on the one way to do it and let the first patcher win.

    @scoder
    Copy link
    Contributor Author

    scoder commented Apr 23, 2015

    Adding a patch for the inspect docs that refers to the Right Way To Do It.

    @scoder
    Copy link
    Contributor Author

    scoder commented Apr 24, 2015

    Searching for Python code that seems to implement the Generator protocol doesn't return much:

    https://code.openhub.net/search?s=%22def%20throw%28%22%20%22def%20send%28%22%20%22def%20close%28%22&pp=0&fl=Python&mp=1&ml=1&me=1&md=1&ff=1&filterChecked=true

    But at least it pointed me to this:

    https://hg.python.org/cpython/file/5c0247a6f98a/Lib/multiprocessing/managers.py#l922

    So, wrapping generators (apparently for remote calls in this case) seems to be another use case.

    @scoder
    Copy link
    Contributor Author

    scoder commented Apr 25, 2015

    FYI, here's the patch implementation for Cython:

    https://github.com/cython/cython/blob/cf63ff71c06b16c3a30facdc7859743f4cd495f6/Cython/Utility/Generator.c#L849

    The only difference is that it takes care of changing "__next__" to "next" in Py2.x.

    @LudovicGasc
    Copy link
    Mannequin

    LudovicGasc mannequin commented Apr 26, 2015

    Sorry guys to be basic for you, but if I take my "AsyncIO end-user" hat, I'm not sure to understand the potential end-user source code impacts to use Cython with Python 3.5 and AsyncIO.

    In concrete terms, it's only a low-level change, Cython will monkeypatch CPython if it's missing. I can continue to use asyncio.iscoroutine() function to detect coroutines.
    Or it should be better to change something in AsyncIO libs and/or end-user source code ?

    With the potential async/await inclusion in Python 3.5, it should be good to know if something else is necessary to help for the Cython support.

    @rhettinger
    Copy link
    Contributor

    In the lastest patch, the close() method is now a valid mixin method.

    However, the throw() method should be made abstract because it doesn't provide the required operation (it doesn't even use the "self" argument) or it should be left out entirely (i.e. Numba only supporting next()) if the presence of throw() is not required.

    Going back to Stefan's original use case, the problem being solved is that isinstance(g, types.GeneratorType) rejects regular classes that implement send(), next(), throw(), and close(), presumably because it is going to call those methods and expect that they work.

    If an object tests positive for isinstance(g, collections.abc.Generator), what can we then assume about the object. It would be weird to pass that test, see a throw() method, call it and have it do something other than raise an exception inside the generator.

        g = lazy_readline_from_connection('171.0.0.1')
        if isinstance(g, collections.abc.Generator):
            # We passed the isinstance check, now what does that
            # actually mean?  What is guaranteed to work?
            next(g)
            g.close()    # Expect this to close the connection

    If a working throw() isn't actually required, then the code ought to be checking for isinstance(obj, collections.abc.Iterator) or somesuch; otherwise, was is the point of doing any checks for a generator-like object?

    I don't think this patch should go in until it is capable of doing something meaningful. Otherwise, it looks like at attempt to bluff its way past generator checks that were presumably there for a reason.

    @pitrou
    Copy link
    Member

    pitrou commented Apr 26, 2015

    I think the throw() method should be required.
    If you don't need throw(), send() or close(), then you aren't really asking for a full-blown generator: you are asking for an iterator, so you can just check for collections.Iterator.

    (PS: why is this bug assigned to Lukasz?)

    @scoder
    Copy link
    Contributor Author

    scoder commented Apr 26, 2015

    PEP-342 isn't really conclusive here as it intended to define the protocol based on the de-facto design of a yield-based generator function. Trying to abstract from that poses the question how a class based generator implementation should look like. Specifically, it is not clear in this case what "raise an exception *inside* the generator" even means.

    As Antoine pointed out, there definitely has to be more than an Iterator. However, it's not clear that throw() and close() are always required for an implementation. Simple Generators might get away without special error handling and cleanup.

    Similarly, a Generator may not allow (or expect) values to be passed in and only make use of close() for cleanup (and potentially also throw()).

    So, there seem to be at least three use cases for the Generator protocol here:

    1. an Iterator that can accept input values via send()
    2. an Iterator that needs to safely do certain cleanup operations after use (throw/close)
    3. a complete Generator that accepts input values and cleans up after it

    Given that there used to be only one implementation, I think we can assume that a lot of code depends on the complete protocol. Some recipients will certainly be aware of the exact subset of the protocol that the Generator they have in their hands makes use of, but if they don't, they'll just have to assume it supports everything and requires proper cleanup on errors and on termination.

    Therefore, I think it's important to cover the complete protocol in the Generator ABC. I also think it's helpful to not require users to override throw() in a subclass, as they might not need it. "Throwing an exception inside of them" might simply not have a meaning for them. If they do need it, however, then the current implementation might help them to properly raise the exception, given that the 3-argument raise statement is no longer available in Py3.

    Both reasons (whether you need throw() or not) make me think that it should not be abstract. The same applies to close(), which has a meaningful implementation now that subclasses can make use of in order to avoid having to implement both close() and throw().

    And yes, this is about sneaking past generator type checks because most of them are actually *not* there for a reason. Asyncio is just one example.

    @rhettinger
    Copy link
    Contributor

    Therefore, I think it's important to cover the complete protocol
    in the Generator ABC. I also think it's helpful to not require
    users to override throw() in a subclass, as they might not need it.

    Sorry, but I think you're fighting the fundament nature of what the ABCs are supposed to do and I object to the patch going in as-is.

    Please either

    1. drop the throw() method entirely or
    2. make throw an abstractmethod()

    If it is an abstractmethod, the user is free to implement a throw() method that raises a NotImplementedError, but at least they will do so consciously rather than having it come-up expectedly.

    When I teach engineers how to use the collections ABCs, they rely on the rules:

    • when building a class that inherits from an ABC,
      if you supply the required abstract methods,
      then the mixin method just work.

    • if an object passes the isinstance check,
      then it is fair game to call any of the listed
      methods an expect it to work.

    Until now, those rules were followed by all the ABCs. I really don't want to break the basic contract of what the ABC is supposed to mean.

    @rhettinger rhettinger assigned rhettinger and unassigned ambv Apr 27, 2015
    @gvanrossum
    Copy link
    Member

    I'm with Raymond.

    @scoder
    Copy link
    Contributor Author

    scoder commented Apr 27, 2015

    Please either

    1. drop the throw() method entirely or
    2. make throw an abstractmethod()

    Ok, as I already said, I think it's important to provide the complete protocol as code will usually expect that. Also, close() has a helpful implementation, but it depends on throw(). Thus, I'll go with 2) and make throw() abstract. It's easy enough to call super(), but then it's explicit. I agree that that's better here.

    Patch updated.

    @rhettinger
    Copy link
    Contributor

    The latest patch looks good overall.

    Łukasz, assigning back to you.

    @rhettinger rhettinger assigned ambv and unassigned rhettinger Apr 27, 2015
    @gvanrossum
    Copy link
    Member

    Yeah, looks good -- Łuke, can you commit this?

    @ambv
    Copy link
    Contributor

    ambv commented May 5, 2015

    Yup, will do.

    @scoder
    Copy link
    Contributor Author

    scoder commented May 9, 2015

    This is blocking bpo-24017 (async/await syntax).

    @gvanrossum
    Copy link
    Member

    Ask Yury if he'll commit it for you. It's ready.

    @rhettinger rhettinger assigned rhettinger and unassigned ambv May 9, 2015
    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented May 9, 2015

    New changeset ba5d7041e2f5 by Raymond Hettinger in branch 'default':
    Issue bpo-24018: Add a collections.Generator abstract base class.
    https://hg.python.org/cpython/rev/ba5d7041e2f5

    @scoder
    Copy link
    Contributor Author

    scoder commented May 9, 2015

    Thanks! Minor grouch: it should say "collections.*abc*.Generator" in the NEWS entry.

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented May 9, 2015

    New changeset f7cc54086cd2 by Guido van Rossum in branch 'default':
    Fix news entry for bpo-24018.
    https://hg.python.org/cpython/rev/f7cc54086cd2

    @gvanrossum
    Copy link
    Member

    Fixed.

    On Fri, May 8, 2015 at 10:28 PM, Stefan Behnel <report@bugs.python.org>
    wrote:

    Stefan Behnel added the comment:

    Thanks! Minor grouch: it should say "collections.*abc*.Generator" in the
    NEWS entry.

    ----------


    Python tracker <report@bugs.python.org>
    <http://bugs.python.org/issue24018\>


    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    stdlib Python modules in the Lib dir type-feature A feature request or enhancement
    Projects
    None yet
    Development

    No branches or pull requests

    6 participants