Add pure Python operator module #60898

Closed
zware opened this issue Dec 16, 2012 · 39 comments
Labels
  • extension-modules: C modules in the Modules dir
  • stdlib: Python modules in the Lib dir
  • type-feature: A feature request or enhancement

Comments

@zware
Copy link
Member

zware commented Dec 16, 2012

BPO 16694
Nosy @brettcannon, @rhettinger, @jcea, @pitrou, @ezio-melotti, @merwok, @alex, @bitdancer, @meadori, @zware, @serhiy-storchaka
Files
  • py_operator.v10.diff: Version 10
  • py_operator.v11.diff: Version 11, now with proper git format
  • py_operator.v12.diff: Version 12
  • py_operator.v13.diff: Version 13
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    GitHub fields:

    assignee = None
    closed_at = <Date 2013-04-20.17:22:45.334>
    created_at = <Date 2012-12-16.07:24:41.590>
    labels = ['extension-modules', 'type-feature', 'library']
    title = 'Add pure Python operator module'
    updated_at = <Date 2013-05-11.02:57:58.758>
    user = 'https://github.com/zware'

    bugs.python.org fields:

    activity = <Date 2013-05-11.02:57:58.758>
    actor = 'python-dev'
    assignee = 'none'
    closed = True
    closed_date = <Date 2013-04-20.17:22:45.334>
    closer = 'pitrou'
    components = ['Extension Modules', 'Library (Lib)']
    creation = <Date 2012-12-16.07:24:41.590>
    creator = 'zach.ware'
    dependencies = []
    files = ['28904', '29844', '29869', '29887']
    hgrepos = []
    issue_num = 16694
    keywords = ['patch']
    message_count = 39.0
    messages = ['177579', '177585', '177587', '177795', '177799', '177810', '177865', '177871', '177895', '177899', '177901', '177902', '177907', '177908', '177910', '177926', '178778', '178787', '178813', '178830', '178838', '180948', '186814', '186883', '186916', '187003', '187007', '187023', '187043', '187046', '187048', '187049', '187050', '187056', '187103', '187148', '187440', '187441', '188890']
    nosy_count = 13.0
    nosy_names = ['brett.cannon', 'rhettinger', 'jcea', 'pitrou', 'ezio.melotti', 'eric.araujo', 'Arfrever', 'alex', 'r.david.murray', 'meador.inge', 'python-dev', 'zach.ware', 'serhiy.storchaka']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue16694'
    versions = ['Python 3.4']

    @zware
    Copy link
    Member Author

    zware commented Dec 16, 2012

    (Brett, I've made you nosy due to the relation to bpo-16651.)

    Here is a pure Python implementation of the operator module, or at least a first draft thereof :). I'm attaching the module itself, as well as a patch to integrate it.

    Any and all review is quite welcome. I'm confident in the fact that the module as it stands passes all current tests, but how it gets there is entirely up for debate (namely, the attrgetter, itemgetter, and methodcaller classes, as well as length_hint(), countOf(), and indexOf()).

    Note that there's also a change to hmac.py; _compare_digest() in operator.c doesn't seem to have any relation to the rest of the module (see bpo-15061 discussion) and is private anyway, so operator.py doesn't go near it. hmac.py has to import directly from _operator.

    Thanks,

    Zach Ware

    @zware zware added extension-modules C modules in the Modules dir stdlib Python modules in the Lib dir type-feature A feature request or enhancement labels Dec 16, 2012
    @serhiy-storchaka
    Copy link
    Member

    Here is a functional (and more efficient) equivalent of attrgetter:

    def attrgetter(attr, *attrs):
        """
        Return a callable object that fetches the given attribute(s) from its operand.
        After f=attrgetter('name'), the call f(r) returns r.name.
        After g=attrgetter('name', 'date'), the call g(r) returns (r.name, r.date).
        After h=attrgetter('name.first', 'name.last'), the call h(r) returns
        (r.name.first, r.name.last).
        """
        if not attrs:
            if not isinstance(attr, str):
                raise TypeError('attribute name must be a string')
            names = attr.split('.')
            def func(obj):
                for name in names:
                    obj = getattr(obj, name)
                return obj
            return func
        else:
            getters = tuple(map(attrgetter, (attr,) + attrs))
            def func(obj):
                return tuple(getter(obj) for getter in getters)
            return func
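For reference, the behaviour described in the docstring above can be checked against the standard library's own `operator.attrgetter`. This is only a usage sketch: the `record` object and its fields are made up for illustration and do not come from the patch.

```python
from operator import attrgetter
from types import SimpleNamespace

# Hypothetical record, used only to exercise the getters.
record = SimpleNamespace(
    name=SimpleNamespace(first="Ada", last="Lovelace"),
    date="1843",
)

# One attribute name: the getter returns the attribute itself.
single = attrgetter("date")

# Several (dotted) names: the getter returns a tuple of the resolved values.
dotted = attrgetter("name.first", "name.last")

print(single(record))
print(dotted(record))
```

A functional replacement such as the one proposed above would be expected to return the same values for these calls.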

    @serhiy-storchaka
    Copy link
    Member

    Perhaps Modules/operator.c should be renamed to Modules/_operator.c.

    Also note that error messages in the Python and C implementations sometimes differ.

    @zware
    Copy link
    Member Author

    zware commented Dec 20, 2012

    Sorry to have disappeared on this, other things took priority...

    Thank you for the comments, Serhiy. v2 of the patch renames Modules/operator.c to Modules/_operator.c, and changes that name every place I could find it.

    I also tried to tidy up some of the error message mismatches. I didn't bother with the ones regarding missing arguments, as that would mean checking args and throwing an exception in each and every function.

    I do like the functional attrgetter better than the object version I wrote. The main reason I went with an object version in the first place was because that's what the C implementation used. Is there any reason not to break with the C implementation and use a function instead? The updated patch takes a rather ugly hack to try to use the functional version in an object.

    length_hint() was horrible and has been rewritten. It should be less horrible now :). It should also follow the C implementation quite a bit better.

    @zware
    Copy link
    Member Author

    zware commented Dec 20, 2012

    Considering what a huge headache it was to get my own patch to apply at home on Linux rather than at work on Windows, here's a new version of the patch that straightens out the line ending nightmare present in v2. No other changes made.

    @serhiy-storchaka
    Copy link
    Member

    Sorry, I forgot to push the "Publish All My Drafts" button. Please consider my other comments on the first patch. I have also added new comments about length_hint().

    Your implementation of attrgetter() looks good. One possible disadvantage of a purely functional approach is that attrgetter() would no longer be a class. It is unlikely that someone subclasses attrgetter, but it can be used in an isinstance() check. Your version solves this issue.

    The same approach can be applied to itemgetter().
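A minimal sketch of what applying that approach to itemgetter() might look like: a class (so isinstance() checks keep working) that builds a closure once in `__init__`. This is an illustration under that assumption, not the code from the patch; the `_call` attribute name is hypothetical.

```python
class itemgetter:
    """Sketch: class-based itemgetter that precomputes a closure."""

    def __init__(self, item, *items):
        if not items:
            # Single key: return the item itself.
            def func(obj):
                return obj[item]
        else:
            # Several keys: return a tuple of the looked-up items.
            items = (item,) + items

            def func(obj):
                return tuple(obj[i] for i in items)
        self._call = func  # hypothetical attribute name

    def __call__(self, obj):
        return self._call(obj)


second = itemgetter(1)
ends = itemgetter(0, 2)
```

Because `itemgetter` stays a class here, `isinstance(second, itemgetter)` holds, while each call only pays for the prebuilt closure.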

    @zware
    Copy link
    Member Author

    zware commented Dec 21, 2012

    Here's v4, addressing Serhiy's comments on Rietveld.

    @serhiy-storchaka
    Copy link
    Member

    About length_hint():

    I meant something like this (even an explicit getattr() is not needed):

    try:
        hint = type(obj).__length_hint__
    except AttributeError:
        return default
    try:
        val = hint(obj)
    except TypeError:
        return default
    ...

    This is a little faster because there is only one attribute lookup instead of two. It is also a little safer because there is less chance of a race in which the attribute changes between the two lookups (that is improbable enough that it doesn't really matter).

    type(obj) is used here because the C code uses _PyObject_LookupSpecial(), which ignores instance attributes and looks only at class attributes.
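To make the lookup order concrete, here is one way the fragment above could be completed into a whole function. Everything outside the two try blocks quoted above (the len() fast path, the NotImplemented check, the validation, and the error message wording) is an assumption made for this sketch, and the `Countdown` class is hypothetical.

```python
def length_hint(obj, default=0):
    """Sketch: return len(obj), else a class-level __length_hint__, else default."""
    try:
        return len(obj)
    except TypeError:
        pass
    try:
        # Look the hook up on the type, so instance attributes are ignored.
        hint = type(obj).__length_hint__
    except AttributeError:
        return default
    try:
        val = hint(obj)
    except TypeError:
        return default
    if val is NotImplemented:
        return default
    if not isinstance(val, int):
        raise TypeError("__length_hint__ must return an int")  # assumed wording
    if val < 0:
        raise ValueError("__length_hint__ must be non-negative")  # assumed wording
    return val


class Countdown:
    """Hypothetical object that has a length hint but no __len__."""

    def __length_hint__(self):
        return 5


shadowed = Countdown()
# An instance attribute with the same name must not be picked up.
shadowed.__length_hint__ = lambda: 99
```

With this sketch, `length_hint(shadowed)` still uses the class-level method, which is the behaviour the type(obj) lookup is meant to guarantee.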

    About concat() and iconcat():

    I think only the first argument needs to be checked. If the arguments are not concatenable, the '+'/'+=' operator will raise an exception. I'm not sure, though. Does anyone have any thoughts about this?

    About methodcaller():

    There is a catch here. With this implementation you can't use methodcaller('foo', name='spam') or methodcaller('foo', self='spam') (please add tests for those cases). A trick is needed:

    def __init__(*args, **kwargs):
        self = args[0]
        self._name = args[1]
        self._args = args[2:]
        self._kwargs = kwargs

    (You can add code for better error reporting.)
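The trick can be tried out in a small, self-contained sketch. The `__call__` method, the argument-count check, and the `Target` class are assumptions added for illustration; only the body of `__init__` follows the snippet above.

```python
class methodcaller:
    """Sketch: methodcaller whose __init__ accepts 'self' and 'name' keywords."""

    def __init__(*args, **kwargs):
        # Taking everything positionally via *args frees up the keyword
        # names 'self' and 'name' for the method being called.
        if len(args) < 2:
            raise TypeError("methodcaller needs at least one argument, "
                            "the method name")  # assumed wording
        self = args[0]
        self._name = args[1]
        self._args = args[2:]
        self._kwargs = kwargs

    def __call__(self, obj):
        return getattr(obj, self._name)(*self._args, **self._kwargs)


class Target:
    """Hypothetical class whose method accepts arbitrary keywords."""

    def baz(*args, **kwargs):
        return kwargs["name"], kwargs["self"]


call_baz = methodcaller("baz", name="spam", self="eggs")
result = call_baz(Target())
```

With a conventional `def __init__(self, name, *args, **kwargs)` signature, the same call would fail with a "multiple values for argument" TypeError before the method was ever reached.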

    I have added smaller comments on Rietveld.

    @zware
    Copy link
    Member Author

    zware commented Dec 21, 2012

    Here's another new version. Changes include:

    • Address Serhiy's Rietveld comments
    • Fix length_hint() the way it was meant to be fixed last time.
    • Remove __getitem__ check on 'b' in concat and iconcat. More notes on this below.
    • Fix methodcaller as Serhiy suggested
    • Add test case for methodcaller for 'name' and 'self' keyword arguments
    • Add comments to 'subdivide' the module into the rough sections the docs are divided into. Move length_hint() with other sequence operations to also match the doc order.

    On concat and iconcat: Looking at the glossary, a sequence should actually have both __getitem__ and __len__. The test class in the test case for iconcat only defines __getitem__, though. Should we check only for __getitem__ on the first argument, or check for both __getitem__ and __len__, and add __len__ to the test class? Requiring __len__ may cause breakage for anyone using the Python implementation with a class they defined and used with the C implementation with only __getitem__, so I'm leaning towards only checking for __getitem__. I can't really tell what the C implementation really looks for as I don't speak C, but it almost looks to me like it may be only checking for __getitem__. Latest patch only checks argument 'a' for __getitem__.
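A sketch of the check described here, where only argument 'a' is tested for __getitem__ and everything else is left to the '+' operator. The error message wording and the `OnlyGetitem` class are assumptions for illustration.

```python
def concat(a, b):
    """Sketch: same as a + b, for a and b sequences."""
    # Only the first argument is checked; __len__ is deliberately not required.
    if not hasattr(a, "__getitem__"):
        msg = "'%s' object can't be concatenated" % type(a).__name__  # assumed wording
        raise TypeError(msg)
    # If b is unsuitable, '+' raises on its own.
    return a + b


class OnlyGetitem:
    """Hypothetical sequence-like class that defines no __len__."""

    def __getitem__(self, index):
        raise IndexError(index)

    def __add__(self, other):
        return "joined"
```

Here `concat(OnlyGetitem(), [])` is accepted even though the class has no __len__, while `concat(1, 2)` is rejected up front because integers have no __getitem__.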

    @serhiy-storchaka
    Copy link
    Member

    Good work, Zachary. I have no more nitpicks for you. ;)

    LGTM.

    @serhiy-storchaka
    Copy link
    Member

    One note for the committer: don't forget to run "hg rename Modules/operator.c Modules/_operator.c" before applying the patch.

    @zware
    Copy link
    Member Author

    zware commented Dec 21, 2012

    Nits are no fun; thank you for picking them, Serhiy ;)

    @merwok
    Copy link
    Member

    merwok commented Dec 21, 2012

    FYI Mercurial can use the extended diff format invented by git, which supports renames, changes to file permissions, etc.

    @merwok
    Copy link
    Member

    merwok commented Dec 21, 2012

    The base test class should not inherit from TestCase: it will be picked up by test discovery and then will break, as self.module will be None.

    Typical usage:

    class OperatorTestsMixin:
        module = None
    
    class COperatorTests(OperatorTestsMixin, unittest.TestCase):
        module = _operator
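A fuller, runnable sketch of that layout follows. It assumes CPython, where the `_operator` accelerator can be imported directly; the `PyOperatorTests` name and the single test method are made up for illustration, and plain `operator` only stands in for the Python side here (isolating the pure Python code would require blocking `_operator` at import time).

```python
import operator
import unittest

try:
    import _operator  # C accelerator; assumed present on CPython
except ImportError:
    _operator = None


class OperatorTestsMixin:
    # Not a TestCase, so test discovery never runs it with module = None.
    module = None

    def test_add(self):
        self.assertEqual(self.module.add(3, 4), 7)


class PyOperatorTests(OperatorTestsMixin, unittest.TestCase):
    module = operator


@unittest.skipUnless(_operator, "requires the _operator C module")
class COperatorTests(OperatorTestsMixin, unittest.TestCase):
    module = _operator
```

Saved as a test module, this can be run with `python -m unittest`; only the two concrete classes are collected.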

    @zware
    Copy link
    Member Author

    zware commented Dec 21, 2012

    Did not know that about test discovery, thank you Éric. Fixed in v6.

    A few other test modules may need the same fix; I based my changes to Lib/test/test_operator.py on Lib/test/test_heapq.py which has the same issue. I'll open a new report for it and any others I find.

    Also, this patch was created with hg diff -g; the operator.c rename should be well taken care of by this patch.

    @serhiy-storchaka
    Copy link
    Member

    I don't understand what the difference is between v5 and v6.

    @serhiy-storchaka serhiy-storchaka self-assigned this Dec 29, 2012
    @zware
    Copy link
    Member Author

    zware commented Jan 1, 2013

    Sorry, I misunderstood Éric's suggestions regarding the tests; v6 is useless. v7 forthcoming.

    @zware
    Copy link
    Member Author

    zware commented Jan 1, 2013

    Ok, I believe the attached v7 properly addresses Éric's concerns about test discovery, and has no other changes unrelated to that compared to v5.

    Thank you very much to Ezio for directing me towards the json tests for an example to work from.

    @serhiy-storchaka
    Copy link
    Member

    v8 LGTM (except for some trailing whitespace).

    @zware
    Copy link
    Member Author

    zware commented Jan 2, 2013

    Note to self: learn to run patchcheck.py before posting. Whitespace issues fixed in v9.

    @serhiy-storchaka
    Copy link
    Member

    If no one objects I will commit this next week.

    @zware
    Copy link
    Member Author

    zware commented Jan 29, 2013

    Since the older Windows project files were removed, v10 removes the patches to them.

    Everything else still applies cleanly.

    Also, in the spirit of what Brett said in 16651 about not re-implementing blindly, I did just look up what Jython, IronPython, and PyPy do for the operator module. The first two implement it in their VM language, and PyPy uses a very specialized version that didn't look easy to adapt to CPython, at least at a glance. It was fun for me to write any way about it, though :)

    @pitrou
    Copy link
    Member

    pitrou commented Apr 13, 2013

    Zachary, I suppose Modules/_operator.c is a rename of Modules/operator.c.
    Could you generate your patch using "hg diff --git" so that history isn't lost here?

    See also http://docs.python.org/devguide/committing.html#minimal-configuration

    @zware
    Copy link
    Member Author

    zware commented Apr 14, 2013

    > Zachary, I suppose Modules/_operator.c is a rename of Modules/operator.c.
    > Could you generate your patch using "hg diff --git" so that history isn't lost here?

    Of course; I thought I already had, but apparently I messed that up a bit. v11 is in the proper format. In it, you can actually see what was changed in Modules/operator.c, which is the necessary s/operator/_operator/ changes, and a few extra commas removed from a couple of docstrings (to match the docstrings in the new Python versions).

    > See also http://docs.python.org/devguide/committing.html#minimal-configuration

    Thank you for that link! I had read through this some time ago, but either missed the part about the diff section, or it just didn't sink in or something. That is now added to my hg config file :)

    @pitrou
    Copy link
    Member

    pitrou commented Apr 14, 2013

    Thank you!
    One optional thing, the code churn could be minimized in test_operator.py by writing "operator = self.module" at the beginning of each test method.
    Otherwise, looks good to me.

    @zware
    Copy link
    Member Author

    zware commented Apr 15, 2013

    Here's another new version of the patch, addressing Ezio's review comments and a few things I found after giving operator.py a closer look myself.

    Things changed in operator.py in this version:

    • all __func__ = func assignments are moved to the end, after importing * from _operator. With the assignments after each func, __func__ was still the Python version after importing from _operator. I suspect this means that _operator.c could be changed to not mess with creating each __func__ and just let operator.py do it, but not being a native C speaker, I don't know how to do it. Also, there is an added test case to test whether __func__ is func. It passes with the rest of the patch, but would fail on current operator.c; it seems that operator.c actually creates separate __func__ and func functions (that do the same thing).

    • If importing from _operator succeeds, import __doc__ from _operator as well. The Python implementation has an extra note at the end of __doc__ advertising that it is a Python implementation.

    Also, after submitting this patch, I'm going to try to clean up the files list on this issue a bit. I'll clear the nosy list while I do so to avoid spamming everybody with messages about it. (At least, I assume I can do so, I haven't tried this before :). If I can't clear the nosy list, I won't bother with cleaning up the files, again to avoid spamming)

    @zware
    Copy link
    Member Author

    zware commented Apr 15, 2013

    A change that I mentioned in a Rietveld comment on v10, but not in my last message: __all__ in operator.py no longer includes all of the __func__s, as currently doing "from operator import *" does not import all of the __func__s.

    @serhiy-storchaka
    Copy link
    Member

    I think Antoine is more appropriate for committing this patch. I waited so long with this because I did not dare to take responsibility for it myself (it's almost like adding a new module).

    @serhiy-storchaka serhiy-storchaka removed their assignment Apr 15, 2013
    @rhettinger
    Copy link
    Contributor

    I would like to spend some time with this before it goes forward (especially the attrgetter, itemgetter, methodcaller group).

    Right now, it looks like a nice effort but I don't see how it makes Python any better for adding it. The odds are that this code will add bloat but not benefit any user (it won't get called at all).

    @rhettinger rhettinger self-assigned this Apr 16, 2013
    @bitdancer
    Copy link
    Member

    Raymond: it's not for the benefit of CPython.

    @rhettinger
    Copy link
    Contributor

    [David]

    Raymond: it's not for the benefit of CPython.

    IIRC, all the other implementations of Python already have this code passing tests, so it isn't really for their benefit either.

    @alex
    Copy link
    Member

    alex commented Apr 16, 2013

    If a pure python operator module were a part of the stdlib, we (PyPy) would probably delete most (if not all) of our own operator module.

    @rhettinger
    Copy link
    Contributor

    I reviewed the attrgetter(), methodcaller(), and itemgetter() code in py_operator.v12.diff. The code looks clean and correct.

    @rhettinger rhettinger removed their assignment Apr 16, 2013
    @serhiy-storchaka
    Copy link
    Member

    Now we can remove all __func__s from _operator.c.

    @zware
    Copy link
    Member Author

    zware commented Apr 16, 2013

    Thank you for the review, Raymond.

    Since Serhiy agrees that the _operator __func__s are unnecessary, here's a v13 that removes them. Again, I'm not a native C speaker, so these new changes in _operator.c deserve a bit of extra scrutiny. Everything builds and still passes the test suite, though.

    Also changed in this patch, test_pow and test_inplace remove explicit testing of __func__s. Those tests are useless, as they are merely rerunning already run tests on the same function with a different name, which is confirmed by test_dunder_is_original. I can extend that test with an explicit list of funcs which should have a __func__ if anyone thinks it's worth it.
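A rough sketch of the kind of identity check that a test like test_dunder_is_original performs, written against the standard library's `operator` module. The helper name and the way names are paired are assumptions for illustration; the test in the patch may be structured differently.

```python
import operator


def dunder_aliases(module):
    """Yield (plain, dunder) name pairs for which the module has both names."""
    for name in dir(module):
        if name.startswith("_"):
            continue
        # 'and_' pairs with '__and__', 'add' with '__add__', and so on.
        dunder = "__" + name.strip("_") + "__"
        if hasattr(module, dunder):
            yield name, dunder


# Collect every pair whose two names are bound to different objects.
mismatched = [
    (plain, dunder)
    for plain, dunder in dunder_aliases(operator)
    if getattr(operator, plain) is not getattr(operator, dunder)
]
```

When every dunder name is a plain alias of its counterpart, `mismatched` comes out empty, which is why re-running the functional tests under the dunder names adds nothing.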

    @pitrou
    Copy link
    Member

    pitrou commented Apr 17, 2013

    length_hint() looks ok as well.

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Apr 20, 2013

    New changeset 97834382c6cc by Antoine Pitrou in branch 'default':
    Issue bpo-16694: Add a pure Python implementation of the operator module.
    http://hg.python.org/cpython/rev/97834382c6cc

    @pitrou
    Copy link
    Member

    pitrou commented Apr 20, 2013

    I've now committed the latest patch. Thank you very much, Zachary!

    @pitrou pitrou closed this as completed Apr 20, 2013
    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented May 11, 2013

    New changeset 4b3238923b01 by Raymond Hettinger in branch 'default':
    Issue bpo-16694: Add source code link for operator.py
    http://hg.python.org/cpython/rev/4b3238923b01

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022