Pure Python operator.index doesn't match the C version. #62912

Open
mdickinson opened this issue Aug 12, 2013 · 21 comments
Labels
3.11 (only security fixes), extension-modules (C modules in the Modules dir), type-bug (An unexpected behavior, bug, or error)

Comments

@mdickinson
Member

BPO 18712
Nosy @arigo, @mdickinson, @ericsnowcurrently, @zware, @serhiy-storchaka, @corona10, @iritkatriel
Dependencies
  • bpo-17576: PyNumber_Index() is not int-subclass friendly (or operator.index() docs lie)
Files
  • operator_index.patch
  • x.py
  • operator_index_2.patch
  • y.py
  • operator_index_3.patch
  • operator_index_4.patch

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    GitHub fields:

    assignee = None
    closed_at = None
    created_at = <Date 2013-08-12.14:09:38.011>
    labels = ['extension-modules', 'type-bug', '3.11']
    title = "Pure Python operator.index doesn't match the C version."
    updated_at = <Date 2021-10-02.15:33:51.696>
    user = 'https://github.com/mdickinson'

    bugs.python.org fields:

    activity = <Date 2021-10-02.15:33:51.696>
    actor = 'corona10'
    assignee = 'none'
    closed = False
    closed_date = None
    closer = None
    components = ['Extension Modules']
    creation = <Date 2013-08-12.14:09:38.011>
    creator = 'mark.dickinson'
    dependencies = ['17576']
    files = ['31273', '31334', '31335', '31336', '31344', '31353']
    hgrepos = []
    issue_num = 18712
    keywords = ['patch']
    message_count = 21.0
    messages = ['194966', '195000', '195006', '195064', '195451', '195454', '195457', '195458', '195474', '195476', '195489', '195490', '195510', '195513', '195545', '195546', '195547', '195548', '195549', '195665', '401603']
    nosy_count = 7.0
    nosy_names = ['arigo', 'mark.dickinson', 'eric.snow', 'zach.ware', 'serhiy.storchaka', 'corona10', 'iritkatriel']
    pr_nums = []
    priority = 'normal'
    resolution = None
    stage = 'patch review'
    status = 'open'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue18712'
    versions = ['Python 3.11']

    @mdickinson
    Member Author

    Nitpick: the pure Python version of operator.index (new in Python 3.4, introduced in issue bpo-16694) doesn't match the C version, in that it looks up __index__ on the object rather than the class.

    iwasawa:cpython mdickinson$ ./python.exe
    Python 3.4.0a1+ (default:9e61563edb67+, Aug 12 2013, 14:45:12) 
    [GCC 4.2.1 (Apple Inc. build 5664)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> from test import support
    >>> py_operator = support.import_fresh_module('operator', blocked=['_operator'])
    >>> c_operator = support.import_fresh_module('operator', fresh=['_operator'])
    >>> class A(int): pass
    ... 
    >>> a = A(42); a.__index__ = lambda: 1729
    >>> 
    >>> py_operator.index(a)
    1729
    >>> c_operator.index(a)
    42
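The same split between instance lookup and type lookup can be seen without the test-support imports. The following is a minimal sketch, not part of the original report: the class name `B` is hypothetical, and list indexing is used only as a stand-in for the C-level path that the C `operator.index` takes.

```python
class B:
    def __index__(self):
        return 7


b = B()
# Shadow the method with an instance attribute, as in the session above.
b.__index__ = lambda: 1729

explicit = b.__index__()         # ordinary attribute access: finds the instance attribute
via_type = type(b).__index__(b)  # lookup on the class: ignores the instance attribute
implicit = list(range(10))[b]    # C-level slot call, standing in for the C operator.index

print(explicit, via_type, implicit)
```

Only the explicit `b.__index__()` call is affected by the instance attribute; the implicit call made by the interpreter is not.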

    @mdickinson added the extension-modules (C modules in the Modules dir) and type-bug (An unexpected behavior, bug, or error) labels on Aug 12, 2013
    @serhiy-storchaka
    Member

    We can use type(a).__index__(a). Should we also correct the documentation for operator.index() and operator.length_hint()?

    @mdickinson
    Member Author

    Yes, I think it would make sense to fix the docs as well, at least for Python 3.4. Probably not worth it for the maintenance releases.

    @serhiy-storchaka
    Member

    Here is a patch. Now the code of operator.index() becomes even more complicated. Perhaps you want to suggest other wording for the documentation?

    Some code in the stdlib (pyio.py, bz2.py, connection.py) uses a.__index__() instead of type(a).__index__(a) (replacing AttributeError with TypeError). Is it worth changing?

    @arigo
    Mannequin

    arigo mannequin commented Aug 17, 2013

    Just mentioning it here again, but "type(a).__index__(a)" is still not perfectly correct. Attached is a case where it differs.

    I think you always get the correct answer by evaluating "range(a).stop". It's admittedly obscure... For example:

        class A:
            def __index__(self):
                return -42**100
    
        a = A()
        print(range(a).stop)
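To make the behaviour of the `range(a).stop` trick more concrete, here is a small sketch that wraps it in a helper and tries it on a few cases. The helper name `index_via_range` and the classes `Big`, `Shadowed` and `Bad` are hypothetical and exist only for illustration.

```python
def index_via_range(obj):
    """Let the C-level range() constructor perform the __index__ conversion."""
    return range(obj).stop


class Big:
    def __index__(self):
        return -42**100  # parsed as -(42**100); range() accepts arbitrarily large ints


class Shadowed:
    def __index__(self):
        return 3


class Bad:
    def __index__(self):
        return "not an int"


s = Shadowed()
s.__index__ = lambda: 1729  # instance attribute, ignored by the C-level lookup

print(index_via_range(Big()) == -(42**100))
print(index_via_range(s))

try:
    index_via_range(Bad())
except TypeError as exc:
    print("TypeError:", exc)
```

The trick delegates both the lookup and the result check to the interpreter, which is why it tracks the C behaviour so closely.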

    @serhiy-storchaka
    Member

    The difference doesn't look significant. In all cases the TypeError is raised.

    But there were other differences between the C and Python versions: when __index__() returns a non-integer. The updated patch fixes this.

    @mdickinson
    Member Author

    Just mentioning it here again, but "type(a).__index__(a)" is still not perfectly correct.

    Hmm. "type(a).__dict__['__index__'](a)" ? (With suitable error checks, as in Serhiy's patch.)

    @arigo
    Mannequin

    arigo mannequin commented Aug 17, 2013

    The difference doesn't look significant. In all cases the TypeError is raised.

    Ok, so here is another case. (I won't go to great lengths trying to convince you that there is a problem, because the discussion already occurred several times; google for example for "_PyType_Lookup pure Python")

    @serhiy-storchaka
    Member

    Hmm. "type(a).__dict__['__index__'](a)" ?

    This variant fails on:

        class A(int):
            @staticmethod
            def __index__():
                return 42
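A short sketch of why the direct `__dict__` lookup breaks down. It reuses the class above and adds a `bool` case as a second illustration; the variable names are hypothetical. The raw dictionary entry is an unbound `staticmethod` object, so calling it with `a` cannot succeed, and a type that merely inherits `__index__` has no entry in its own `__dict__` at all.

```python
class A(int):
    @staticmethod
    def __index__():
        return 42


a = A(5)

static_error = None
try:
    # The raw __dict__ entry is a staticmethod object, not a bound callable.
    type(a).__dict__['__index__'](a)
except TypeError as exc:
    static_error = exc

inherited_error = None
try:
    # bool inherits __index__ from int, so its own __dict__ has no such key.
    type(True).__dict__['__index__'](True)
except KeyError as exc:
    inherited_error = exc

print(type(static_error).__name__, type(inherited_error).__name__)
```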

    @mdickinson
    Member Author

    Serhiy: Yep, or even on bool. Thanks.

    Armin: I don't think either of us thinks there isn't a problem here. :-)
    The Google search you suggested didn't turn up a whole lot of useful information for me. Was there a discussion of this on python-dev at some point? (And if not, should there be?)

    For this *particular* issue, it seems we can't exactly reproduce the C behaviour in pure Python. type(a).__index__(a) seems like the best practical approximation to the true behaviour. I'm guessing that it's fairly rare to have useful definitions of __index__ on a metaclass.
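The metaclass case can be illustrated with a small sketch. This is a reconstruction for illustration only and is not the attached file, which is not shown in this thread; the names `Meta` and `A` are hypothetical. Attribute access on the class falls back to the metaclass, while the C-level slot lookup does not.

```python
class Meta(type):
    # Defined on the metaclass, so it makes the *class* A indexable, not its instances.
    def __index__(cls, *args):
        return 1729


class A(metaclass=Meta):
    pass


a = A()

# Attribute access on the class A falls back to Meta, so this call succeeds.
via_type = type(a).__index__(a)

# The C-level conversion only consults type(a) and its bases, so it rejects a.
try:
    range(a)
    c_level_rejects = False
except TypeError:
    c_level_rejects = True

print(via_type, c_level_rejects)
```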

    @arigo
    Mannequin

    arigo mannequin commented Aug 17, 2013

    This may have been the most recent discussion of this idea (as far as I can tell):
    http://mail.python.org/pipermail//python-ideas/2012-August/016036.html

    Basically, it seems to be still unresolved in the trunk Python; sorry, I thought by now it would have been resolved e.g. by the addition of a method on types or a function in the operator module. In the absence of either, you need either to simulate its behavior by doing this:

        for t in type(a).__mro__:
            if '__index__' in t.__dict__:
                return t.__dict__['__index__'](a)

    Or you can piggyback on an unrelated call that simply causes the C-level PyNumber_Index() to be called:

        return range(a).stop

    @arigo
    Mannequin

    arigo mannequin commented Aug 17, 2013

    Sorry, realized that my pure Python algorithm isn't equivalent to _PyType_Lookup() --- it fails the staticmethod example of Serhiy. A closer one would be:

        for t in type(a).__mro__:
            if '__index__' in t.__dict__:
                return t.__dict__['__index__'].__get__(a)()

    But it's still not a perfect match. I think right now the only "perfect" answer is given by workarounds like "range(a).stop"...
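The MRO-walking approach above could be packaged roughly as follows. This is a sketch under assumptions, not the code from any of the attached patches: the helper names `lookup_special` and `py_index` are hypothetical, and, as noted above, the lookup is still only an approximation of `_PyType_Lookup()`.

```python
def lookup_special(obj, name):
    """Approximate a special-method lookup: search the type's MRO, never the instance."""
    cls = type(obj)
    for klass in cls.__mro__:
        namespace = vars(klass)
        if name in namespace:
            descr = namespace[name]
            # Bind through the descriptor protocol when the entry supports it.
            if hasattr(type(descr), '__get__'):
                return descr.__get__(obj, cls)
            return descr
    raise TypeError(f"{cls.__name__!r} object has no {name}")


def py_index(obj):
    result = lookup_special(obj, '__index__')()
    if not isinstance(result, int):
        raise TypeError(f"__index__ returned non-int (type {type(result).__name__})")
    return result


class Shadowed:
    def __index__(self):
        return 3


class Static:
    @staticmethod
    def __index__():
        return 42


s = Shadowed()
s.__index__ = lambda: 1729  # ignored, because only the type's MRO is searched

print(py_index(s), py_index(Static()))
```

Because the search never touches the instance dictionary or the metaclass, both of the earlier problem cases behave like the C version here.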

    @serhiy-storchaka
    Member

    Here is an updated patch which uses Armin's algorithm for looking up special methods and adds a special case for int subclasses in index().

    I have no idea how the documentation should look.

    @ericsnowcurrently
    Member

    Couldn't you make use of inspect.getattr_static()?

    getattr_static(obj.__class__, '__index__').__get__(obj)()

    getattr_static() does some extra work to do the right lookup. I haven't verified that it matches _PyType_Lookup() exactly, but it should be pretty close at least.

    Also, pickle (unfortunately) does lookup on instances rather than classes for the special methods (issue bpo-16251).
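The getattr_static() one-liner above might be fleshed out along these lines. This is a hedged sketch rather than the variant from the attached patch: the function name `index_static` and the class `Shadowed` are hypothetical, and it assumes the entry that is found supports `__get__`, just as the one-liner does.

```python
import inspect

_MISSING = object()


def index_static(obj):
    """Look up __index__ on the type without triggering instance attribute access."""
    descr = inspect.getattr_static(type(obj), '__index__', _MISSING)
    if descr is _MISSING:
        raise TypeError(
            f"{type(obj).__name__!r} object cannot be interpreted as an integer"
        )
    # Bind manually, mirroring the one-liner above (assumes descr has __get__).
    result = descr.__get__(obj)()
    if not isinstance(result, int):
        raise TypeError(f"__index__ returned non-int (type {type(result).__name__})")
    return result


class Shadowed:
    def __index__(self):
        return 3


s = Shadowed()
s.__index__ = lambda: 1729  # not consulted: the lookup starts at type(s)

print(index_static(s), index_static(5))
```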

    @arigo
    Mannequin

    arigo mannequin commented Aug 18, 2013

    For completeness, can you post one line saying why the much simpler solution "range(a).stop" is not accepted?

    @serhiy-storchaka
    Member

    Here is a variant with getattr_static().

    @serhiy-storchaka
    Member

    For completeness, can you post one line saying why the much simpler solution "range(a).stop" is not accepted?

    Because it is an implementation detail. If the range() function were implemented in Python, it would itself need operator.index().

    The main purpose of adding a Python implementation of the operator module was to provide an implementation which can be used in alternative Python implementations (e.g. PyPy). CPython itself doesn't use it. And now I doubt that such a complicated implementation will be helpful.

    Perhaps we need a builtin which exposes _PyType_Lookup() at Python level.

    @arigo
    Mannequin

    arigo mannequin commented Aug 18, 2013

    Ah. If that's the only reason, then that seems a bit like misguided effort... For alternative implementations like PyPy and Jython, the "_operator" module is definitely one of the simplest ones to reimplement in RPython or Java. Every function is straightforwardly translated to just one call to an internal function -- that we need to have already for the rest of the language.

    @arigo
    Mannequin

    arigo mannequin commented Aug 18, 2013

    ...but yes, it's very obvious that exposing _PyType_Lookup() to pure Python is the right thing to do here. This is a central part of the way Python works internally, after all.

    Moreover, sorry about my previous note: if we started today to write PyPy, then it would be enough to have the pure Python version of operator.index(), based on the newly exposed _PyType_Lookup(). With PyPy's JIT, there is no real performance loss. Some of my confusion came from the fact that there *would* be serious performance loss if we had to work with the pure Python looping-over-mro-and-fishing-in-dict version.

    @zware
    Member

    zware commented Aug 19, 2013

    A lot of this discussion has flown a rather unfortunate distance over my head, especially since I've barely had time to follow it. But it looks to me like--given the number of other places that do the same thing as operator.index currently does--there needs to be a simple way to do the right thing somewhere accessible, which probably means a builtin.

    On the other hand, it seems to me like 'a.__index__()' *should* be "the right thing" to do, but I get the feeling that making that so would be an astronomically huge change without much real benefit and lots of opportunities to break everything. I suspect I'm also missing something fundamental in why it's not the right thing to do.

    @iritkatriel
    Member

    Reproduced on 3.11.

    @iritkatriel added the 3.11 (only security fixes) label on Sep 10, 2021
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022