PyNumber_Index() is not int-subclass friendly (or operator.index() docs lie) #61776

Open
warsaw opened this issue Mar 29, 2013 · 54 comments
Labels
3.9 only security fixes 3.10 only security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) topic-C-API type-bug An unexpected behavior, bug, or error

Comments

@warsaw
Member

warsaw commented Mar 29, 2013

BPO 17576
Nosy @warsaw, @brettcannon, @rhettinger, @mdickinson, @ncoghlan, @vstinner, @alex, @ethanfurman, @ericsnowcurrently, @serhiy-storchaka, @manueljacob
PRs
  • bpo-17576: Strict __int__ and __index__ return types; operator.index always uses __index__ #13740
Files
  • issue17576.patch
  • issue17576_v2.patch
  • issue17576_v3.patch
  • issue17576_v4.patch

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    GitHub fields:

    assignee = 'https://github.com/ethanfurman'
    closed_at = None
    created_at = <Date 2013-03-29.22:25:11.167>
    labels = ['interpreter-core', 'type-bug', '3.9', '3.10']
    title = 'PyNumber_Index() is not int-subclass friendly (or operator.index() docs lie)'
    updated_at = <Date 2021-08-31.15:42:26.900>
    user = 'https://github.com/warsaw'

    bugs.python.org fields:

    activity = <Date 2021-08-31.15:42:26.900>
    actor = 'vstinner'
    assignee = 'ethan.furman'
    closed = False
    closed_date = None
    closer = None
    components = ['Interpreter Core']
    creation = <Date 2013-03-29.22:25:11.167>
    creator = 'barry'
    dependencies = []
    files = ['31147', '31149', '33077', '33093']
    hgrepos = []
    issue_num = 17576
    keywords = ['patch']
    message_count = 54.0
    messages = ['185522', '185523', '185530', '185531', '185532', '188791', '188810', '194335', '194337', '194347', '195713', '195747', '195759', '195764', '195772', '195824', '200171', '200229', '205733', '205735', '205793', '205802', '205856', '205862', '205870', '205889', '205890', '205896', '205916', '205919', '205921', '206199', '207293', '236493', '236502', '236526', '236528', '236880', '336045', '344266', '344275', '344308', '344384', '344386', '344406', '344681', '349562', '370171', '370177', '370190', '381639', '400674', '400707', '400747']
    nosy_count = 14.0
    nosy_names = ['barry', 'brett.cannon', 'rhettinger', 'mark.dickinson', 'ncoghlan', 'vstinner', 'Arfrever', 'alex', 'docs@python', 'ethan.furman', 'python-dev', 'eric.snow', 'serhiy.storchaka', 'mjacob']
    pr_nums = ['13740']
    priority = 'normal'
    resolution = None
    stage = 'patch review'
    status = 'open'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue17576'
    versions = ['Python 3.9', 'Python 3.10']

    @warsaw
    Member Author

    warsaw commented Mar 29, 2013

    operator.index() is just a thin wrapper around PyNumber_Index(). The documentation for operator.index() claims that it is equivalent to calling obj.__index__() but for subclasses of int, this is not true. In fact, PyNumber_Index() first does (e.g. in Python 3.3) a PyLong_Check() and if that succeeds, the original object is returned *without* doing the moral equivalent in C of calling obj.__index__(). An example:

    >>> class myint(int):
    ...     def __index__(self):
    ...         return int(self) + 1
    ...
    >>> x = myint(7)
    >>> x.__index__()
    8
    >>> from operator import index
    >>> index(x)
    7

    The C API documents PyNumber_Index() as: "Returns the o converted to a Python int on success or NULL with a TypeError exception raised on failure."

    Because this has been the behavior of PyNumber_Index() since at least 2.7 (I didn't check farther back), this probably cannot be classified as a bug deserving to be fixed in the code for older Pythons. It might be worth fixing for Python 3.4, i.e. by moving the index check before the type check. In the meantime, this is probably a documentation bug.

    The C API documentation implies, but should state more clearly, that if o is an int subtype (int and long in Python 2), it is returned unchanged. The operator.index() documentation should be amended to describe this behavior for int/long subclasses.

    A different alternative would be to leave PyNumber_Index() unchanged, but with the doco fix, and to augment operator.index() to do the PyIndex_Check() first, before calling PyNumber_Index(). That's a little more redundant, but would provide the documented behavior without changing the C API.
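To make the difference concrete, here is a minimal pure-Python sketch of the lookup that the operator.index() documentation describes: go through type(obj).__index__ first, with no int fast path. The name index_as_documented is hypothetical and not part of the operator module, and the error messages only approximate CPython's.

```python
def index_as_documented(obj):
    """Hypothetical helper: always go through type(obj).__index__."""
    method = getattr(type(obj), "__index__", None)
    if method is None:
        raise TypeError(
            f"'{type(obj).__name__}' object cannot be interpreted as an integer"
        )
    result = method(obj)
    if not isinstance(result, int):
        raise TypeError(
            f"__index__ returned non-int (type {type(result).__name__})"
        )
    return result


class myint(int):
    def __index__(self):
        return int(self) + 1


x = myint(7)
print(index_as_documented(x))  # follows the override in myint
print(index_as_documented(7))  # plain ints are unaffected
```

With the myint class from the report, this helper agrees with x.__index__() rather than with the stored integer value.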

    @warsaw warsaw added docs Documentation in the Doc dir type-bug An unexpected behavior, bug, or error labels Mar 29, 2013
    @warsaw warsaw changed the title PyNumber_Index() is not int-subclass friendly PyNumber_Index() is not int-subclass friendly (or operator.index() docos lie) Mar 29, 2013
    @warsaw
    Member Author

    warsaw commented Mar 29, 2013

    You also end up with this nice bit of inconsistency:

    >>> x = myint(7)
    >>> from operator import index
    >>> range(10)[6:x]
    range(6, 7)
    >>> range(10)[6:x.__index__()]
    range(6, 8)
    >>> range(10)[6:index(x)]
    range(6, 7)
    >>> 

    Granted, it's insane to have __index__() return a different value like this, but in my specific use case, it's the type of object returned from operator.index() that's the problem. operator.index() returns the subclass instance while obj.__index__() returns the int.

    (The use case is the IntEnum of PEP-435.)

    @ericsnowcurrently
    Member

    Would it be okay to do a check on __index__ after the PyLong_Check() succeeds? Something like this:

        if (PyLong_Check(item) &&
            item->ob_type->tp_as_number->nb_index == PyLong_Type.tp_as_number->nb_index) {
            Py_INCREF(item);
            return item;
        }

    This is something Nick and I were talking about at the sprints regarding fast paths in the abstract API (for mappings and sequences in our case).
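The same idea can be sketched in Python for readers who want to experiment without rebuilding the interpreter. This is only an illustration of the proposed condition, not the C patch: the function name index_guarded_fast_path and the two example classes are hypothetical, and comparing type(item).__index__ against int.__index__ stands in for the nb_index slot comparison.

```python
def index_guarded_fast_path(item):
    """Hypothetical Python rendering of the guarded fast path."""
    cls = type(item)
    # Fast path: an int (or subclass) that still uses int's own __index__.
    if isinstance(item, int) and cls.__index__ is int.__index__:
        return item
    # Slow path: defer to the (possibly overridden) __index__.
    method = getattr(cls, "__index__", None)
    if method is None:
        raise TypeError(
            f"'{cls.__name__}' object cannot be interpreted as an integer"
        )
    return method(item)


class plain(int):
    """int subclass that does not override __index__."""


class shifted(int):
    """int subclass that overrides __index__."""

    def __index__(self):
        return int(self) + 1


p = plain(7)
print(index_guarded_fast_path(p) is p)      # fast path returns the object itself
print(index_guarded_fast_path(shifted(7)))  # override is honoured
```

Subclasses that leave __index__ alone keep the cheap path, while an override fails the identity check and falls through to the method call.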

    @alex
    Member

    alex commented Mar 30, 2013

    In my opinion that should use PyLong_CheckExact

    @warsaw
    Member Author

    warsaw commented Mar 30, 2013

    On Mar 30, 2013, at 12:29 AM, Eric Snow wrote:

    > Would it be okay to do a check on __index__ after the PyLong_Check()
    > succeeds? Something like this:
    >
    >     if (PyLong_Check(item) &&
    >         item->ob_type->tp_as_number->nb_index == PyLong_Type.tp_as_number->nb_index) {
    >         Py_INCREF(item);
    >         return item;
    >     }
    >
    > This is something Nick and I were talking about at the sprints regarding fast
    > paths in the abstract API (for mappings and sequences in our case).

    I think that would work, yes. With this extra check, overriding __index__()
    in the subclass should fail this condition and fall back to the
    PyIndex_Check() clause.

    @vstinner
    Member

    vstinner commented May 9, 2013

    Alex> In my opinion that should use PyLong_CheckExact

    +1

    @serhiy-storchaka
    Member

    if (PyLong_CheckExact(item) || (PyLong_Check(item) &&
    item->ob_type->tp_as_number->nb_index == PyLong_Type.tp_as_number->nb_index))

    @mdickinson
    Member

    See the related python-dev discussion started by Mark Shannon here:

    http://mail.python.org/pipermail/python-dev/2013-March/125022.html

    and continuing well into April here:

    http://mail.python.org/pipermail/python-dev/2013-April/125042.html

    The consensus that emerged from that thread seems to be that calls to operator.index and to int() should always return something of exact type int.

    The attached patch:

    • Raises TypeError for implicit calls to nb_int that fail to return something of exact type int. (Results of direct calls to __int__ are not checked.)

    • Ensures that *all* conversions from a non-int to an int via nb_int make use of the nb_int slot, even for int subclasses. Prior to this patch, some of the PyLong_As... functions would bypass __int__ for int subclasses.

    • Adds a new private _PyLong_FromNbInt function to Objects/longobject.c, so that we have a single place for performing these conversions and making type checks, and refactors existing uses of the nb_int slot to go via this function.

    • Makes corresponding changes for nb_index, which should address the original bug report.

    I guess this may be too dangerous a change for Python 3.4. In that case, I propose raising warnings instead of TypeErrors for Python 3.4 and turning those into TypeErrors in Python 3.5.

    One other question: should direct calls to __int__ and __index__ also have their return values type-checked? That doesn't seem to happen at the moment for other magic methods (see below), so it would seem consistent to only do the type checking for interpreter-generated implicit calls to __int__ and __index__. Nick: any opinion?

    >>> class A:
    ...     def __len__(self): return "a string"
    ...     def __bool__(self): return "another string"
    ... 
    >>> a = A()
    >>> a.__len__()
    'a string'
    >>> a.__bool__()
    'another string'
    >>> len(a)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: 'str' object cannot be interpreted as an integer
    >>> bool(a)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: __bool__ should return bool, returned str

    @mdickinson mdickinson added interpreter-core (Objects, Python, Grammar, and Parser dirs) and removed docs Documentation in the Doc dir labels Aug 4, 2013
    @mdickinson mdickinson assigned mdickinson and unassigned docspython Aug 4, 2013
    @mdickinson
    Member

    New patch that replaces the TypeErrors with warnings and fixes a refleak in the original patch.

    @ncoghlan
    Contributor

    ncoghlan commented Aug 4, 2013

    The deprecation warning version looks good to me.

    Something I'll mention explicitly (regarding the PyCon discussions that Eric mentioned above), is that we unfortunately couldn't do something like this for the various concrete APIs with overly permissive subclass checks. For those APIs, calling them directly was often the *right* thing for simple subtypes implemented in C to use to call up to the parent implementation.

    This case is different, as it's the *abstract* APIs that currently have the overly permissive checks.

    @serhiy-storchaka
    Member

    Shouldn't it be PendingDeprecationWarning?

    @mdickinson
    Member

    > Shouldn't it be PendingDeprecationWarning?

    Hmm. Possibly. I'm not sure what the policy is any more regarding DeprecationWarning versus PendingDeprecationWarning. Nick?

    @serhiy-storchaka
    Member

    A few more nitpicks.

    Currently the code of _PyLong_FromNbInt() is inlined and the do_decref flag is used to avoid needlessly changing the refcounts of int objects (see also bpo-18797). In the proposed patch, the common code is extracted into the _PyLong_FromNbInt() function and int objects are increfed and decrefed. Doesn't that affect performance? The PyLong_As* functions are used for argument parsing in a lot of builtins, and their performance is important.

    If the patch slows down the PyLong_As* functions, perhaps we should check PyLong_CheckExact() before calling _PyLong_FromNbInt() and use the do_decref flag.

    In general the idea and the patch LGTM.

    @serhiy-storchaka
    Member

    On PyPy 1.8.0 operator.index(True) returns 1.

    @serhiy-storchaka
    Member

    One more nitpick. For int subclasses that don't override the __int__ method, the patch calls the default int.__int__, which creates a copy of the int object. This is redundant in the PyLong_As* functions because they only extract the C integer value and drop the Python int object. So we can use the int subclass object itself just as well as an exact int object.

    @ncoghlan
    Contributor

    On 21 Aug 2013 15:47, "Mark Dickinson" <report@bugs.python.org> wrote:

    > Mark Dickinson added the comment:
    >
    > > Shouldn't it be PendingDeprecationWarning?
    >
    > Hmm. Possibly. I'm not sure what the policy is any more regarding
    > DeprecationWarning versus PendingDeprecationWarning. Nick?

    Not sure if this is written down anywhere, but I use
    PendingDeprecationWarning for "definitely still around next release, may
    not have a set date for removal" and DeprecationWarning for "may be removed
    next release".

    @ethanfurman
    Member

    Where do we stand with this issue?

    @mdickinson
    Member

    I still need to act on some of Serhiy's comments. I do plan to get this in for 3.4.

    @serhiy-storchaka
    Member

    Ping.

    @mdickinson
    Member

    > Ping.

    Bah. Sorry; I haven't had time to deal with this. Serhiy: are you interested in taking over?

    @serhiy-storchaka
    Member

    Here is an updated patch. There is no more overhead in the PyLong_As* functions. PyNumber_Index() is simplified. assertWarns() is now used instead of support.check_warnings(). New tests have been added.

    @ncoghlan
    Contributor

    Took me a while to figure out that one of the code paths was being deleted as redundant because the type machinery will always fill in nb_int for int subclasses, but Serhiy's patch looks good to me.

    @manueljacob
    Mannequin

    manueljacob mannequin commented Feb 24, 2015

    The tests in the attached patches (for example issue17576_v3.patch) check that both are 8, but the tests which were actually committed are checking that "my_int.__index__() == 8" and "operator.index(my_int) == 7".

    @serhiy-storchaka
    Member

    Ah, it just checks the current behavior, so we will know when it changes.

    @ncoghlan
    Contributor

    OK, something appears to have gotten confused along the way here. Barry's original problem report was that operator.index(obj) was returning a different answer than obj.__index__() for int subclasses. That has absolutely nothing to do with the int builtin. While the fact that int() *may* return int subclasses isn't especially good, it's also a longstanding behaviour.

    The problem Barry reports, where a subclassing-based proxy type isn't reverting to a normal integer when accessed via operator.index() despite defining __index__() to do exactly that, should be possible to fix just by applying the stricter check specifically in PyNumber_Index.

    Expanding the scope to cover __int__() and __trunc__() as well would be much, much hairier, as those are much older interfaces, and used in a wider variety of situations. We specifically invented __index__() to stay away from that mess while making it possible to explicitly indicate that a type supports a lossless conversion to int rather than a potentially lossy one.

    @serhiy-storchaka
    Member

    See also bpo-33039.

    @mdickinson
    Member

    I'm working on a PR that finally changes the DeprecationWarnings that Serhiy introduced to TypeErrors; I think that should be acceptable, four Python versions and some years later. With that PR:

    • int will always return something of exact type int (or raise)
    • operator.index will always return something of exact type int (or raise)
    • PyNumber_Index will always use __index__ for int subclasses, so this should fix the issue that Barry originally reported (mismatch between obj.__index__() and operator.index(obj)).

    @mdickinson mdickinson assigned mdickinson and unassigned ethanfurman Jun 2, 2019
    @serhiy-storchaka
    Member

    I am not sure that raising an error is the best option. We can just convert an integer subclass to an exact int using _PyLong_Copy().

    I am not sure that converting to an exact int in low-level C API functions is the best option. In many cases we use only the content of the resulting object and ignore its type (when converting it to a C integer or float, to a bytes array, or to a new instance of an int subclass). Creating a new exact int is a waste of time.

    This is why I withdrew my patches and this issue is still open.

    @rhettinger
    Contributor

    Can we at least switch to PyLong_CheckExact? That would fix Barry's original issue and should run slightly faster.

    @mdickinson
    Member

    > Can we at least switch to PyLong_CheckExact?

    +1

    > I am not sure that converting to an exact int in low-level C API functions is the best option.

    I am sure. :-) The number of naturally-occurring cases where we're actually passing a subtype of int that's not exactly int should be tiny. So long as there's a PyLong_CheckExact fast path, I don't think there are really any performance concerns here.

    And we definitely shouldn't let performance concerns dictate API; get the API right first, then see what can be done about performance without changing the API. It's clear to me that operator.index(obj) should give the exact same results as obj.__index__().

    I'll split my PR up into two pieces, one for turning the deprecated behaviour into TypeErrors, and a second one that just makes the PyLong_CheckExact change. (I likely won't have time before feature freeze, though. OTOH, the PyLong_CheckExact change could be considered a bugfix.)

    @serhiy-storchaka
    Member

    > Can we at least switch to PyLong_CheckExact?

    This is a behavior change and as such should be preceded by a period of warning.

    If we go this way I propose to add a FutureWarning for int subclasses with overridden __index__.

    As for turning the deprecated behaviour into TypeErrors, we added a few more deprecation warnings in 3.8. Wouldn't it be better to turn all of them into TypeErrors at the same time?

    @mdickinson
    Member

    I've closed the PR. Reassigning back to Ethan.

    @mdickinson mdickinson assigned ethanfurman and unassigned mdickinson Jun 3, 2019
    @serhiy-storchaka
    Member

    Mark, I think you can reopen the PR and merge it in 3.9 now.

    As for my proposal to use a FutureWarning first, I think it is not necessary. The behavior change is very subtle and will affect only int subclasses with an overridden __index__. Similar changes (preferring __index__ over __int__) were made in 3.8 without a preceding FutureWarning. And similar minor changes were made in the past.

    On the other hand, I am not sure that __index__ should be used for int subclasses. We already have the int content, so we can create an exact int with _PyLong_Copy().

    @vstinner
    Member

    I started to write a new issue, but then I found this issue (created in 2013!), which is still open. So let me write my comment here instead.

    The code to convert a number to an integer is quite complex in Python. There are *many* ways to do that and each way has subtle behavior differences (ex: __index__ vs __int__). Python tolerates some behavior which leads to even more confusion. For example, some functions explicitly reject the float type but accept Decimal:

    PyLong_Long(obj) calls type(obj).__index__() if it's defined, but it accepts subtypes of int, not only exactly the int type (type(x) == int). This feature is deprecated since Python 3.3 (released in 2012), since this change:

    commit 31a6554
    Author: Serhiy Storchaka <storchaka@gmail.com>
    Date: Wed Dec 11 21:07:54 2013 +0200

    Issue bpo-17576: Deprecation warning emitted now when __int__() or __index__()
    return not int instance.  Introduced _PyLong_FromNbInt() and refactored
    PyLong_As*() functions.
    

    I propose to now fail with an exception if __int__() or __index__() return type is not exactly int.

    Note: My notes on Python numbers: https://pythondev.readthedocs.io/numbers.html

    @serhiy-storchaka
    Member

    The current status:

    • Decimal and Fraction are no longer automatically converted to int when passed to functions implemented in C. PyLong_AsLong() etc. no longer call __int__. (see bpo-36048 and bpo-37999)
    • operator.index() and PyNumber_Index() always return an instance of exact type int. (see bpo-40792)
    • int() and PyNumber_Long() always return an instance of exact type int. (see bpo-26984)
    • __index__ is used as a fallback if __int__ is not defined. (see bpo-20092)

    But:

    • __index__ and __int__ are not called for int subclasses in operator.index() and int() (also in the C API PyNumber_Index(), PyNumber_Long(), PyLong_AsLong(), etc).
    • Instances of int subclasses are accepted as the result of __index__ and __int__ (but this is deprecated).
    • The Python implementation of operator.index() differs from the C implementation in many ways. (see bpo-18712)
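
Two of the points above can be checked from Python directly. The following is a small sketch, assuming an interpreter recent enough to have the __index__ fallback in int() (the 3.8 change mentioned elsewhere in this thread); the class names OnlyIndex and myint are illustrative only.

```python
import operator


class OnlyIndex:
    """Not an int subclass; defines __index__ but not __int__."""

    def __index__(self):
        return 5


class myint(int):
    """int subclass with an overridden __index__."""

    def __index__(self):
        return int(self) + 1


# int() falls back to __index__ when __int__ is not defined.
fallback_value = int(OnlyIndex())

# For an int subclass, operator.index() uses the stored integer value and
# does not call the overridden __index__; int() is applied here only so the
# comparison does not depend on the exact return type.
index_value = int(operator.index(myint(7)))

# A direct call still runs the override.
direct_value = myint(7).__index__()

print(fallback_value, index_value, direct_value)
```

The gap between index_value and direct_value is the mismatch from the original report.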

    What I prefer as solutions of the remaining issues:

    • It is good to not call __index__ and __int__ for int subclasses. __index__ and __int__ were designed for converting non-integers to int. There are no good use cases for overriding __index__ and __int__ in int subclasses, and calling them is just a waste of time. We should just document this behavior.

    • Undeprecate accepting __index__ and __int__ returning instances of int subclasses. It makes no difference to users of int() and index(), but it can simplify user implementations of __index__ and __int__.

    • Either sync the pure Python implementation of operator.index() with the C implementation or get rid of the Python implementation of the operator module altogether.

    @mdickinson
    Member

    [Serhiy]

    > • Undeprecate accepting __index__ and __int__ returning instances of int subclasses. There is no difference from the side of using int and index(), but it can simplify user implementations of __index__ and __int__.

    I'm not sure about this. Thinking about the bigger picture, we have a similar deprecation in place for __float__ returning an instance of a float subclass. That one I'd like to keep (and probably make an error for 3.10).

    A problem I've run into in Real Code (TM) is needing to convert something float-like to a float, using the same mechanisms that (for example) something like math.sqrt uses.

    One option is to call "float", but that requires explicitly excluding str, bytes and bytearray, which feels ugly and not very future-proof.

    So the code ends up calling __float__. But because __float__ can return an instance of a float subclass, it then still needs some way to convert the return value to an actual float. And that's surprisingly tricky.

    So I really *do* want to see the ability of __float__ to return a non-float eventually removed.

    Similarly for __int__, there's no easy Python-side way to mimic the effect of calling __int__, followed by converting to an exact int. We have to:

    1. Do an explicit check for non-numbers (str, bytes, bytearray)
    2. Call int

    Or:

    1. Call int
    2. Convert an instance of a possible subclass of int to something of exact type int. I don't know how to do this cleanly in general in Python, and end up resorting to evil tricks like adding 0.

    Deprecating allowing __int__ to return a non-int helps here, because it lets me simply call __int__.
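
As a rough illustration of the workaround described above, here is a sketch that combines both steps: reject the non-number types, call int, then normalise the result. The helper name to_exact_int is hypothetical. It relies on int.__add__ returning an object of exact type int, which is CPython behaviour rather than something this thread guarantees, and calling it on the base type sidesteps any __add__ override in a subclass.

```python
def to_exact_int(obj):
    """Hypothetical helper: int() without string parsing, exact int out."""
    # Step 1: explicitly exclude the non-number types that int() would parse.
    if isinstance(obj, (str, bytes, bytearray)):
        raise TypeError(f"expected a number, got {type(obj).__name__}")
    value = int(obj)
    # Step 2: the "add 0" trick, done through the base type so that an
    # overridden __add__ in an int subclass is never consulted.
    return int.__add__(value, 0)


class Weird(int):
    """int subclass whose arithmetic cannot be trusted."""

    def __add__(self, other):
        return "not a number"


print(type(to_exact_int(True)).__name__)
print(to_exact_int(Weird(4)))
```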

    I care much more about the __float__ case than the __int__ case, because the "right way" to duck-type integers is to use __index__ rather than __int__, and for __index__ we have operator.index as a solution.

    But it would seem odd to have the rule in place for __float__ but not for __int__ and __index__.

    The other way to solve my problem would be to provide an operator module function (operator.as_float?) that does a duck-typed conversion of an arbitrary Python object to a float.
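
A minimal sketch of what such a function might look like in pure Python follows. Nothing in this thread shows that operator.as_float exists, so the as_float below is hypothetical. It assumes the conversion tries __float__ first and then __index__, and it uses float.__float__ to turn a float subclass into an exact float (CPython behaviour).

```python
import operator
from fractions import Fraction


def as_float(obj):
    """Hypothetical duck-typed conversion to an exact float."""
    cls = type(obj)
    float_method = getattr(cls, "__float__", None)
    if float_method is not None:
        result = float_method(obj)
        if not isinstance(result, float):
            raise TypeError(
                f"__float__ returned non-float (type {type(result).__name__})"
            )
        # Normalise a float subclass instance to an exact float.
        return float.__float__(result)
    if hasattr(cls, "__index__"):
        # Integer-like objects without __float__ go through __index__.
        return float(operator.index(obj))
    # str, bytes and bytearray end up here: they define neither method.
    raise TypeError(f"must be real number, not {cls.__name__}")


class MyFloat(float):
    """float subclass used to check the normalisation step."""


print(as_float(Fraction(1, 2)))
print(type(as_float(MyFloat(1.5))).__name__)
```

Because str, bytes and bytearray define neither __float__ nor __index__, they are rejected without an explicit type check.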

    @mdickinson
    Member

    > The other way to solve my problem would be to provide an operator module function (operator.as_float?) that does a duck-typed conversion of an arbitrary Python object to a float.

    This does feel like the *right* solution to me. See bpo-40801 and the linked PR. If we can do something like this, I'd be happy to drop the expectation that __float__ return something of exact type float, and similarly for __index__.

    @terryjreedy terryjreedy added 3.9 only security fixes 3.10 only security fixes labels Jul 6, 2020
    @brettcannon brettcannon changed the title PyNumber_Index() is not int-subclass friendly (or operator.index() docos lie) PyNumber_Index() is not int-subclass friendly (or operator.index() docs lie) Nov 23, 2020
    @brettcannon
    Member

    I think operator.index() should be brought in line with PyNumber_Index():

    • If the argument is a subclass of int then return it.
    • Otherwise call type(obj).__index__(obj)
        • If not an int, raise TypeError
        • If not a direct int, raise a DeprecationWarning

    The language reference for __index__() suggests this is the direction to go (https://docs.python.org/3/reference/datamodel.html#object.__index__).

    @rhettinger
    Contributor

    > So I really *do* want to see the ability of __float__
    > to return a non-float eventually removed.

    Note, the __str__ method on strings does not require an exact str.

        class S(str):
            def __str__(self):
                return self

        print(type(str(S('hello world'))))

    @serhiy-storchaka
    Member

    PyNumber_Index() now always returns an instance of int.

    • If the argument is a direct int then return it.
    • If it is a subclass of int then return a direct int copy.
    • Otherwise call type(obj).__index__(obj)
        • If a direct int, return it
        • If a subclass of int, raise a DeprecationWarning and return a direct int copy
        • If not an int, raise TypeError
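
The steps listed above can be sketched in pure Python. This is an emulation for illustration only, not the C implementation: the name emulated_number_index is hypothetical, the messages are approximations, and int.__index__ stands in for the C-level copy to an exact int (CPython behaviour).

```python
import warnings


def emulated_number_index(obj):
    """Hypothetical Python emulation of the steps listed above."""
    if type(obj) is int:
        return obj
    if isinstance(obj, int):
        # Subclass of int: return an exact int copy without calling any
        # override defined on the subclass.
        return int.__index__(obj)
    method = getattr(type(obj), "__index__", None)
    if method is None:
        raise TypeError(
            f"'{type(obj).__name__}' object cannot be interpreted as an integer"
        )
    result = method(obj)
    if type(result) is int:
        return result
    if isinstance(result, int):
        warnings.warn(
            f"__index__ returned non-int (type {type(result).__name__})",
            DeprecationWarning,
            stacklevel=2,
        )
        return int.__index__(result)
    raise TypeError(
        f"__index__ returned non-int (type {type(result).__name__})"
    )


class ReturnsBool:
    """__index__ returns a strict subclass of int."""

    def __index__(self):
        return True


# Record the warning instead of letting it propagate.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    value = emulated_number_index(ReturnsBool())

print(type(value).__name__, value, len(caught))
```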

    If we go in this direction we should add a DeprecationWarning for __str__() returning something other than a direct str. I am not sure that it is right. It puts a burden on authors of special methods to always convert the result to the corresponding direct type, while this conversion can be performed silently (and more efficiently) in the interpreter core.

    @vstinner
    Member

    > If we go in this direction we should add a DeprecationWarning for __str__() returning not direct str.

    I saw str subclass being used for translation. Example:

    class Message(str):
        """A Message object is a unicode object that can be translated.
        Translation of Message is done explicitly using the translate() method.
        For all non-translation intents and purposes, a Message is simply unicode,
        and can be treated as such.
        """

    https://github.com/openstack/oslo.i18n/blob/master/oslo_i18n/_message.py

    There are likely other funny use cases.

    I don't know if str() is used on Message instances.
