Pure Python operator.index doesn't match the C version. #62912
Nitpick: the pure Python version of operator.index (new in Python 3.4, introduced in issue bpo-16694) doesn't match the C version, in that it looks up __index__ on the object rather than the class.
iwasawa:cpython mdickinson$ ./python.exe
Python 3.4.0a1+ (default:9e61563edb67+, Aug 12 2013, 14:45:12)
[GCC 4.2.1 (Apple Inc. build 5664)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from test import support
>>> py_operator = support.import_fresh_module('operator', blocked=['_operator'])
>>> c_operator = support.import_fresh_module('operator', fresh=['_operator'])
>>> class A(int): pass
...
>>> a = A(42); a.__index__ = lambda: 1729
>>>
>>> py_operator.index(a)
1729
>>> c_operator.index(a)
42 |
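The mismatch can be reproduced without test.support by writing the two lookup strategies out by hand. This is only a minimal sketch: the helper names index_via_instance and index_via_type are hypothetical and are not part of the operator module.

```python
class A(int):
    pass


def index_via_instance(obj):
    # Ordinary attribute access: an instance attribute shadows the class.
    return obj.__index__()


def index_via_type(obj):
    # Look the method up on the class and pass the instance explicitly.
    return type(obj).__index__(obj)


a = A(42)
a.__index__ = lambda: 1729  # instance attribute, ignored by type-based lookup

print(index_via_instance(a))  # 1729
print(index_via_type(a))      # 42
```

The first helper mirrors what the report describes for the pure Python version, the second is closer to what the C version does.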
We can use type(a).__index__(a). Should we also correct the documentation for operator.index() and operator.length_hint()? |
Yes, I think it would make sense to fix the docs as well, at least for Python 3.4. Probably not worth it for the maintenance releases. |
Here is a patch. Now the code of operator.index() becomes even more complicated. Perhaps you want to suggest other wording for the documentation? Some code in the stdlib (pyio.py, bz2.py, connection.py) uses a.__index__() instead of type(a).__index__(a) (replacing AttributeError with TypeError). Is it worth changing? |
Just mentioning it here again, but "type(a).__index__(a)" is still not perfectly correct. Attached is a case where it differs. I think you always get the correct answer by evaluating "range(a).stop". It's admittedly obscure... For example:
class A:
    def __index__(self):
        return -42**100
a = A()
print(range(a).stop) |
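A small sketch of the range(a).stop trick, assuming CPython, where range() converts a non-int argument through the C-level __index__ lookup. The class Shadowed and the helper index_via_range are hypothetical names used only for illustration.

```python
class Shadowed:
    def __index__(self):
        return 5


def index_via_range(obj):
    # range() converts its argument with the interpreter's own
    # __index__ machinery, so the lookup goes through the type.
    return range(obj).stop


obj = Shadowed()
obj.__index__ = lambda: 99  # instance attribute, not consulted by range()

print(index_via_range(obj))  # 5
```

This relies on how range() happens to be implemented, so it is a workaround rather than a documented guarantee.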
The difference doesn't look significant. In all cases a TypeError is raised. But there were other differences between the C and Python versions: when __index__() returns a non-integer. The updated patch fixes this. |
Hmm. "type(a).__dict__['__index__'](a)" ? (With suitable error checks, as in Serhiy's patch.) |
Ok, so here is another case. (I won't go to great lengths trying to convince you that there is a problem, because the discussion already occurred several times; google for example for "_PyType_Lookup pure Python") |
This variant fails on:
class A(int):
    @staticmethod
    def __index__():
        return 42 |
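To make the failure concrete, here is a sketch of the class-dict variant with no error checks. The helpers index_via_class_dict and outcome are hypothetical; the sketch only shows that the raw dict entry is called without descriptor binding and that inherited methods are missed.

```python
def index_via_class_dict(obj):
    # Looks only in the immediate class dict: no MRO walk, no binding.
    return type(obj).__dict__['__index__'](obj)


class A(int):
    @staticmethod
    def __index__():
        return 42


class Plain:
    def __index__(self):
        return 5


def outcome(obj):
    # Report either the result or the name of the exception raised.
    try:
        return index_via_class_dict(obj)
    except (KeyError, TypeError) as exc:
        return type(exc).__name__


print(outcome(Plain()))  # 5
print(outcome(A(7)))     # TypeError
print(outcome(True))     # KeyError
```

The bool case fails because bool inherits __index__ from int and has no entry of its own in its class dict.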
Serhiy: Yep, or even on bool. Thanks. Armin: I don't think either of us thinks there isn't a problem here. :-) For this *particular* issue, it seems we can't exactly reproduce. type(a).__index__(a) seems like the best practical approximation to the true behaviour. I'm guessing that it's fairly rare to have useful definitions of __index__ on a metaclass. |
This may have been the most recent discussion of this idea (as far as I can tell): Basically, it seems to be still unresolved in the trunk Python; sorry, I thought by now it would have been resolved e.g. by the addition of a method on types or a function in the operator module. In the absence of either, you need either to simulate its behavior by doing this:
for t in type(a).__mro__:
    if '__index__' in t.__dict__:
        return t.__dict__['__index__'](a)
Or you can piggyback on an unrelated call that simply causes the C-level PyNumber_Index() to be called:
return range(a).stop |
Sorry, realized that my pure Python algorithm isn't equivalent to _PyType_Lookup() --- it fails the staticmethod example of Serhiy. A closer one would be:
for t in type(a).__mro__:
    if '__index__' in t.__dict__:
        return t.__dict__['__index__'].__get__(a)()
But it's still not a perfect match. I think right now the only "perfect" answer is given by workarounds like "range(a).stop"... |
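The MRO walk above can be wrapped into a function roughly as follows. This is a sketch, not the patch attached to the issue: the name index_via_mro and the TypeError message are assumptions, and, as noted in the thread, the result is still not an exact match for _PyType_Lookup() (for example, it assumes the attribute found is a descriptor).

```python
def index_via_mro(obj):
    # Walk the MRO of the type, never the instance dict.
    for klass in type(obj).__mro__:
        if '__index__' in klass.__dict__:
            attr = klass.__dict__['__index__']
            # Bind through the descriptor protocol, then call.
            return attr.__get__(obj, type(obj))()
    raise TypeError(
        f"{type(obj).__name__!r} object cannot be interpreted as an integer"
    )


class Plain:
    def __index__(self):
        return 5


class Static:
    @staticmethod
    def __index__():
        return 42


p = Plain()
p.__index__ = lambda: 99  # instance attribute, skipped by the MRO walk

print(index_via_mro(p))         # 5
print(index_via_mro(Static()))  # 42
```

Binding with __get__ is what makes the staticmethod case work, since a staticmethod object hands back the underlying function, which is then called without arguments.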
Here is an updated patch which uses Armin's algorithm for looking up special methods and adds a special case for int subclasses in index(). I have no idea how the documentation should look. |
Couldn't you make use of inspect.getattr_static()?
getattr_static(obj.__class__, '__index__').__get__(obj)()
getattr_static() does some extra work to do the right lookup. I haven't verified that it matches _PyType_Lookup() exactly, but it should be pretty close at least. Also, pickle (unfortunately) also does lookup on instances rather than classes for the special methods (issue bpo-16251). |
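A sketch of the getattr_static() idea, using the real inspect.getattr_static() function. The wrapper name index_via_getattr_static and the conversion of AttributeError to TypeError are assumptions made for illustration, and no claim is made that this matches _PyType_Lookup() exactly.

```python
import inspect


def index_via_getattr_static(obj):
    try:
        # Static lookup on the type: no instance dict, no __getattr__ hooks.
        attr = inspect.getattr_static(type(obj), '__index__')
    except AttributeError:
        raise TypeError(
            f"{type(obj).__name__!r} object cannot be interpreted as an integer"
        ) from None
    # Bind the raw descriptor to the instance, then call it.
    return attr.__get__(obj)()


class A(int):
    pass


class Static:
    @staticmethod
    def __index__():
        return 42


a = A(42)
a.__index__ = lambda: 1729  # instance attribute, ignored by the static lookup

print(index_via_getattr_static(a))         # 42
print(index_via_getattr_static(Static()))  # 42
```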
For completeness, can you post one line saying why the much simpler solution "range(a).stop" is not accepted? |
Here is a variant with getattr_static(). |
Because it is an implementation detail. The range() function, if it were implemented in Python, would need operator.index(). The main purpose of adding a Python implementation of the operator module was to provide an implementation which can be used in alternative Python implementations (e.g. PyPy). CPython itself doesn't use it. And now I doubt that such a complicated implementation will be helpful. Perhaps we need a builtin which exposes _PyType_Lookup() at the Python level. |
Ah. If that's the only reason, then that seems a bit like misguided effort... For alternative implementations like PyPy and Jython, the "_operator" module is definitely one of the simplest ones to reimplement in RPython or Java. Every function is straightforwardly translated to just one call to an internal function -- that we need to have already for the rest of the language. |
...but yes, it's very obvious that exposing _PyType_Lookup() to pure Python is the right thing to do here. This is a central part of the way Python works internally, after all. Moreover, sorry about my previous note: if we started today to write PyPy, then it would be enough to have the pure Python version of operator.index(), based on the newly exposed _PyType_Lookup(). With PyPy's JIT, there is no real performance loss. Some of my confusion came from the fact that there *would* be serious performance loss if we had to work with the pure Python looping-over-mro-and-fishing-in-dict version. |
A lot of this discussion has flown a rather unfortunate distance over my head, especially since I've barely had time to follow it. But it looks to me like--given the number of other places that do the same thing as operator.index currently does--there needs to be a simple way to do the right thing somewhere accessible, which probably means a builtin. On the other hand, it seems to me like 'a.__index__()' *should* be "the right thing" to do, but I get the feeling that making that so would be an astronomically huge change without much real benefit and lots of opportunities to break everything. I suspect I'm also missing something fundamental in why it's not the right thing to do. |
Reproduced on 3.11. |