PyNumber_Index() is not int-subclass friendly (or operator.index() docs lie) #61776
operator.index() is just a thin wrapper around PyNumber_Index(). The documentation for operator.index() claims that it is equivalent to calling obj.__index__() but for subclasses of int, this is not true. In fact, PyNumber_Index() first does (e.g. in Python 3.3) a PyLong_Check() and if that succeeds, the original object is returned *without* doing the moral equivalent in C of calling obj.__index__(). An example:

class myint(int):
    def __index__(self):
        return int(self) + 1

>>> x = myint(7)
>>> x.__index__()
8
>>> from operator import index
>>> index(x)
7

The C API documents PyNumber_Index() as: "Returns the o converted to a Python int on success or NULL with a TypeError exception raised on failure."

Because this has been the behavior of PyNumber_Index() since at least 2.7 (I didn't check farther back), this probably cannot be classified as a bug deserving to be fixed in the code for older Pythons. It might be worth fixing for Python 3.4, i.e. by moving the index check before the type check. In the meantime, this is probably a documentation bug.

The C API implies, but should be clearer that if o is an int subtype (int and long in Python 2), it is returned unchanged. The operator.index() documentation should be amended to describe this behavior for int/long subclasses.

A different alternative would be to leave PyNumber_Index() unchanged, but with the doco fix, and to augment operator.index() to do the PyIndex_Check() first, before calling PyNumber_Index(). That's a little more redundant, but would provide the documented behavior without changing the C API. |
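The "consult __index__ first" alternative described above can be modelled in pure Python. This is only a sketch under stated assumptions: strict_index is a hypothetical name (not a real operator function), and any real fix would live in the C implementation.

```python
def strict_index(obj):
    """Hypothetical stand-in for the documented operator.index() behaviour."""
    # Look the method up on the type, as the interpreter does for special methods.
    method = getattr(type(obj), "__index__", None)
    if method is None:
        raise TypeError(
            f"'{type(obj).__name__}' object cannot be interpreted as an integer"
        )
    result = method(obj)
    if not isinstance(result, int):
        raise TypeError(
            f"__index__ returned non-int (type {type(result).__name__})"
        )
    return result


class myint(int):
    def __index__(self):
        return int(self) + 1


print(strict_index(myint(7)))  # 8, the same value as myint(7).__index__()
print(strict_index(7))         # 7
```

Because the method is always called, an int subclass that overrides __index__() is honoured, at the cost of one extra method call for plain ints.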
You also end up with this nice bit of inconsistency:

>>> x = myint(7)
>>> from operator import index
>>> range(10)[6:x]
range(6, 7)
>>> range(10)[6:x.__index__()]
range(6, 8)
>>> range(10)[6:index(x)]
range(6, 7)
>>>

Granted, it's insane to have __index__() return a different value like this, but in my specific use case, it's the type of object returned from operator.index() that's the problem. operator.index() returns the subclass instance while obj.__index__() returns the int. (The use case is the IntEnum of PEP-435.) |
Would it be okay to do a check on __index__ after the PyLong_Check() succeeds? Something like this:

if (PyLong_Check(item) &&
    item->ob_type->tp_as_number->nb_index == PyLong_Type.tp_as_number->nb_index) {
    Py_INCREF(item);
    return item;
}

This is something Nick and I were talking about at the sprints regarding fast paths in the abstract API (for mappings and sequences in our case). |
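In Python terms, the proposed fast-path test amounts to asking whether a type still uses int's own __index__. A rough model, assuming CPython (where an inherited slot wrapper is the very same object as int.__index__); has_default_index and plain are made-up names for illustration:

```python
def has_default_index(item):
    """Hypothetical Python model of the proposed C fast-path test."""
    # True only for ints (or subclasses) that have not overridden __index__.
    return isinstance(item, int) and type(item).__index__ is int.__index__


class plain(int):
    """An int subclass that keeps the inherited __index__."""


class myint(int):
    def __index__(self):
        return int(self) + 1


for value in (7, plain(7), myint(7)):
    print(type(value).__name__, has_default_index(value))
```

Only objects for which this returns True could safely be returned unchanged; anything else would go through the normal __index__() call.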
In my opinion that should use PyLong_CheckExact |
On Mar 30, 2013, at 12:29 AM, Eric Snow wrote:
I think that would work, yes. With this extra check, overriding __index__() |
Alex> In my opinion that should use PyLong_CheckExact +1 |
if (PyLong_CheckExact(item) || (PyLong_Check(item) && |
See the related python-dev discussion started by Mark Shannon here: http://mail.python.org/pipermail/python-dev/2013-March/125022.html and continuing well into April here: http://mail.python.org/pipermail/python-dev/2013-April/125042.html The consensus that emerged from that thread seems to be that calls to operator.index and to int() should always return something of exact type int. The attached patch:
I guess this may be too dangerous a change for Python 3.4. In that case, I propose raising warnings instead of TypeErrors for Python 3.4 and turning those into TypeErrors in Python 3.5.

One other question: should direct calls to __int__ and __index__ also have their return values type-checked? That doesn't seem to happen at the moment for other magic methods (see below), so it would seem consistent to only do the type checking for interpreter-generated implicit calls to __int__ and __index__. Nick: any opinion?

>>> class A:
...     def __len__(self): return "a string"
...     def __bool__(self): return "another string"
...
>>> a = A()
>>> a.__len__()
'a string'
>>> a.__bool__()
'another string'
>>> len(a)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: 'str' object cannot be interpreted as an integer
>>> bool(a)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: __bool__ should return bool, returned str |
New patch that replaces the TypeErrors with warnings and fixes a refleak in the original patch. |
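The "warn now, raise later" policy for __index__() results can be sketched in Python as follows. This is an illustration only, not the patch itself: index_with_warning and Sloppy are hypothetical names, and the message wording is an assumption.

```python
import warnings


def index_with_warning(obj):
    """Hypothetical model of the transitional (warn first, error later) policy."""
    method = getattr(type(obj), "__index__", None)
    if method is None:
        raise TypeError(
            f"'{type(obj).__name__}' object cannot be interpreted as an integer"
        )
    result = method(obj)
    if type(result) is int:
        return result  # exact int: always accepted
    if isinstance(result, int):
        # Strict int subclass: tolerated for now, but flagged.
        warnings.warn(
            f"__index__ returned non-int (type {type(result).__name__}); "
            "this is deprecated and may become an error",
            DeprecationWarning,
            stacklevel=2,
        )
        return result
    raise TypeError(f"__index__ returned non-int (type {type(result).__name__})")


class Sloppy:
    def __index__(self):
        return True  # bool is a strict subclass of int
```

Turning the warning into an error later only requires replacing the warnings.warn() branch with a raise.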
The deprecation warning version looks good to me. Something I'll mention explicitly (regarding the PyCon discussions that Eric mentioned above), is that we unfortunately couldn't do something like this for the various concrete APIs with overly permissive subclass checks. For those APIs, calling them directly was often the *right* thing for simple subtypes implemented in C to use to call up to the parent implementation. This case is different, as it's the *abstract* APIs that currently have the overly permissive checks. |
Shouldn't it be PendingDeprecationWarning? |
Hmm. Possibly. I'm not sure what the policy is any more regarding DeprecationWarning versus PendingDeprecationWarning. Nick? |
A few more nitpicks. Currently the code of _PyLong_FromNbInt() is inlined, and the do_decref flag is used to avoid needless refcount changes on int objects (see also bpo-18797). In the proposed patch the common code is extracted into the _PyLong_FromNbInt() function, and int objects are increfed and decrefed. Doesn't that affect performance? The PyLong_As* functions are used for argument parsing in a lot of builtins, and their performance is important. If the patch slows down the PyLong_As* functions, we should perhaps check PyLong_CheckExact() before calling _PyLong_FromNbInt() and use the do_decref flag. In general the idea and the patch LGTM. |
On PyPy 1.8.0 operator.index(True) returns 1. |
And one more nitpick. For int subclasses which don't override the __int__ method, the patch calls the default int.__int__, which creates a copy of the int object. This is redundant in the PyLong_As* functions because they only extract the C integer value and then drop the Python int object. So we can use the int subclass object itself just as well as an exact int object. |
On 21 Aug 2013 15:47, "Mark Dickinson" <report@bugs.python.org> wrote:
Not sure if this is written down anywhere, but I use |
Where do we stand with this issue? |
I still need to act on some of Serhiy's comments. I do plan to get this in for 3.4. |
Ping. |
Bah. Sorry; I haven't had time to deal with this. Serhiy: are you interested in taking over? |
Here is an updated patch. There is no more overhead in the PyLong_As* functions. PyNumber_Index() is simplified. assertWarns() is now used instead of support.check_warnings(). New tests are added. |
Took me a while to figure out that one of the code paths was being deleted as redundant because the type machinery will always fill in nb_int for int subclasses, but Serhiy's patch looks good to me. |
The tests in the attached patches (for example issue17576_v3.patch) check that both are 8, but the tests which were actually committed are checking that "my_int.__index__() == 8" and "operator.index(my_int) == 7". |
Ah, it just checks current behavior. So we will know when this will be changed. |
OK, something appears to have gotten confused along the way here. Barry's original problem report was that operator.index() was returning a different answer than obj.__index__() for int subclasses. Absolutely nothing to do with the int builtin at all. While the fact that int() *may* return int subclasses isn't especially good, it's also a longstanding behaviour. The problem Barry reports, where a subclassing based proxy type isn't reverting to a normal integer when accessed via operator.index() despite defining __index__() to do exactly that, should be possible to fix just by applying the stricter check specifically in PyNumber_Index. Expanding the scope to cover __int__() and __trunc__() as well would be much, much hairier, as those are much older interfaces, and used in a wider variety of situations. We specifically invented __index__() to stay away from that mess while making it possible to explicitly indicate that a type supports a lossless conversion to int rather than a potentially lossy one. |
See also bpo-33039. |
I'm working on a PR that finally changes the DeprecationWarnings that Serhiy introduced to TypeErrors; I think that should be acceptable, four Python versions and some years later. With that PR:
|
I am not sure that raising an error is the best option. We can just convert an integer subclass to an exact int using _PyLong_Copy(). I am not sure that converting to an exact int in low-level C API functions is the best option. In many cases we use only the content of the resulting object and ignore its type (when converting it to a C integer or float, to a bytes array, or to a new instance of an int subclass). Creating a new exact int is a waste of time. This is why I withdrew my patches and this issue is still open. |
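From Python, the closest analogue to copying the integer content with _PyLong_Copy() is to call int's own slot directly, which bypasses any overrides on the subclass. A minimal sketch, assuming CPython's behaviour that int.__int__ returns an exact int for subclass instances; to_exact_int and Odd are hypothetical names:

```python
def to_exact_int(value):
    """Hypothetical helper: strip the subclass, keep the integer content."""
    if not isinstance(value, int):
        raise TypeError("expected an int or an int subclass")
    # Calling the base-class slot ignores __int__/__index__ overrides.
    return int.__int__(value)


class Odd(int):
    def __int__(self):
        return 99

    def __index__(self):
        return int.__int__(self) + 1


exact = to_exact_int(Odd(7))
print(exact, type(exact).__name__)
```

The overridden methods are never consulted here, which is exactly the trade-off under discussion: the content is preserved, but a subclass cannot customise the conversion.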
Can we at least switch to PyLong_CheckExact? That would fix Barry's original issue and should run slightly faster. |
+1
I am sure. :-) The number of naturally-occurring cases where we're actually passing a subtype of And we definitely shouldn't let performance concerns dictate API; get the API right first, then see what can be done about performance without changing the API. It's clear to me that I'll split my PR up into two pieces, one for turning the deprecated behaviour into TypeErrors, and a second one that just makes the PyLong_CheckExact change. (I likely won't have time before feature freeze, though. OTOH, the PyLong_CheckExact change could be considered a bugfix.) |
This is a behavior change and as such should be preceded by a period of warning. If we go this way, I propose to add a FutureWarning for int subclasses with an overridden __index__. As for turning the deprecated behaviour into TypeErrors, we added a few more deprecation warnings in 3.8. Would it not be better to turn all of them into TypeErrors at the same time? |
I've closed the PR. Reassigning back to Ethan. |
Mark, I think you can reopen the PR and merge it in 3.9 now. As for my proposal to use a FutureWarning first, I think it is not necessary. The behavior change is very subtle and will affect only int subclasses with an overridden __index__. Similar changes (preferring __index__ over __int__) were made in 3.8 without a preceding FutureWarning. And similar minor changes were made in the past. On the other hand, I am not sure that __index__ should be used for int subclasses. We already have the int content, so we can create an exact int with _PyLong_Copy(). |
I started to write a new issue, but then I found this issue (created in 2013!), which is still open. So let me write my comment here instead. The code to convert a number to an integer is quite complex in Python. There are *many* ways to do that and each way has subtle behavior differences (ex: __index__ vs __int__). Python tolerates some behavior which leads to even more confusion. For example, some functions explicitly reject the float type but accept Decimal: PyLong_Long(obj) calls type(obj).__index__() if it's defined, but it accepts subtypes of int, not only exactly the int type (type(x) == int). This feature has been deprecated since Python 3.3 (released in 2012), since this change: commit 31a6554
I propose to now fail with an exception if __int__() or __index__() return type is not exactly int. Note: My notes on Python numbers: https://pythondev.readthedocs.io/numbers.html |
The current status:
But:
What I prefer as solutions of the remaining issues:
|
[Serhiy]
I'm not sure about this. Thinking about the bigger picture, we have a similar deprecation in place for __float__ returning an instance of a float subclass. That one I'd like to keep (and probably make an error for 3.10). A problem I've run into in Real Code (TM) is needing to convert something float-like to a float, using the same mechanisms that (for example) something like One option is to call "float", but that requires explicitly excluding str, bytes and bytearray, which feels ugly and not very future-proof. So the code ends up calling __float__. But because __float__ can return an instance of a float subclass, it then still needs some way to convert the return value to an actual float. And that's surprisingly tricky. So I really *do* want to see the ability of __float__ to return a non-float eventually removed. Similarly for __int__, there's no easy Python-side way to mimic the effect of calling __int__, followed by converting to an exact int. We have to:
Or:
Deprecating allowing __int__ to return a non-int helps here, because it lets me simply call __int__. I care much more about the __float__ case than the __int__ case, because the "right way" to duck-type integers is to use __index__ rather than __int__, and for __index__ we have operator.index as a solution. But it would seem odd to have the rule in place for __float__ but not for __int__ and __index__. The other way to solve my problem would be to provide an operator module function (operator.as_float?) that does a duck-typed conversion of an arbitrary Python object to a float. |
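A duck-typed conversion along the lines of the suggested operator.as_float might look roughly like this. It is a sketch only: as_float and MyFloat are hypothetical names, the fallback to __index__() is an assumption about the desired semantics, and the sketch relies on CPython's float.__float__ returning an exact float for subclass instances.

```python
import operator


def as_float(obj):
    """Hypothetical duck-typed conversion; not an existing operator function."""
    cls = type(obj)
    to_float = getattr(cls, "__float__", None)
    if to_float is not None:
        result = to_float(obj)
        if not isinstance(result, float):
            raise TypeError(
                f"__float__ returned non-float (type {type(result).__name__})"
            )
        # Normalise a float subclass instance to an exact float.
        return float.__float__(result)
    if hasattr(cls, "__index__"):
        # Assumed fallback: integer-like objects convert via __index__().
        return float(operator.index(obj))
    # str, bytes and bytearray end up here: they define neither method.
    raise TypeError(f"cannot interpret '{cls.__name__}' object as a float")


class MyFloat(float):
    def __float__(self):
        return self  # returns a float subclass instance


value = as_float(MyFloat(2.5))
print(value, type(value).__name__)
```

Unlike calling float() directly, this rejects strings and bytes without special-casing them, because only the numeric protocol methods are consulted.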
This does feel like the *right* solution to me. See bpo-40801 and the linked PR. If we can do something like this, I'd be happy to drop the expectation that __float__ return something of exact type float, and similarly for __index__. |
I think operator.index() should be brought in line with PyNumber_Index():
The language reference for __index__() suggests this is the direction to go (https://docs.python.org/3/reference/datamodel.html#object.__index__). |
Note, the __str__ method on strings does not require an exact str.

class S(str):
    def __str__(self):
        return self

print(type(str(S('hello world')))) |
PyNumber_Index() now always returns an instance of int.
If we go in this direction, we should add a DeprecationWarning for __str__() returning something other than a direct str. I am not sure that it is right. It adds a burden on authors of special methods to always convert the result to the corresponding direct type, while this conversion can silently (and more efficiently) be performed in the interpreter core. |
I have seen a str subclass being used for translation. Example:

class Message(str):
    """A Message object is a unicode object that can be translated.

    Translation of Message is done explicitly using the translate() method.
    For all non-translation intents and purposes, a Message is simply unicode,
    and can be treated as such.
    """

https://github.com/openstack/oslo.i18n/blob/master/oslo_i18n/_message.py

There are likely other funny use cases. I don't know if str() is used on Message instances. |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.