Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[C API] Add Py_Is(x, y) and Py_IsNone(x) functions #87919

Closed
vstinner opened this issue Apr 6, 2021 · 19 comments
Closed

[C API] Add Py_Is(x, y) and Py_IsNone(x) functions #87919

vstinner opened this issue Apr 6, 2021 · 19 comments
Labels
3.10 only security fixes topic-C-API

Comments

@vstinner
Copy link
Member

vstinner commented Apr 6, 2021

BPO 43753
Nosy @tim-one, @rhettinger, @mdickinson, @vstinner, @cfbolz, @markshannon, @pablogsal, @shihai1991, @erlend-aasland
PRs
  • bpo-43753: Add Py_Is() and Py_IsNone() functions #25227
  • bpo-43753: _operator.is_() uses Py_Is() #28641
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2021-05-18.20:44:33.857>
    created_at = <Date 2021-04-06.18:38:58.346>
    labels = ['expert-C-API', '3.10']
    title = '[C API] Add Py_Is(x, y) and Py_IsNone(x) functions'
    updated_at = <Date 2021-09-29.23:28:18.199>
    user = 'https://github.com/vstinner'

    bugs.python.org fields:

    activity = <Date 2021-09-29.23:28:18.199>
    actor = 'vstinner'
    assignee = 'none'
    closed = True
    closed_date = <Date 2021-05-18.20:44:33.857>
    closer = 'vstinner'
    components = ['C API']
    creation = <Date 2021-04-06.18:38:58.346>
    creator = 'vstinner'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 43753
    keywords = ['patch']
    message_count = 19.0
    messages = ['390358', '390360', '390364', '390379', '390408', '390411', '390412', '390435', '390456', '390469', '390526', '390613', '390614', '390616', '390628', '390750', '390751', '393907', '402922']
    nosy_count = 9.0
    nosy_names = ['tim.peters', 'rhettinger', 'mark.dickinson', 'vstinner', 'Carl.Friedrich.Bolz', 'Mark.Shannon', 'pablogsal', 'shihai1991', 'erlendaasland']
    pr_nums = ['25227', '28641']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = None
    url = 'https://bugs.python.org/issue43753'
    versions = ['Python 3.10']

    @vstinner
    Copy link
    Member Author

    vstinner commented Apr 6, 2021

    I propose to add at least Py_Is(x, y) function to the Python C API which would be simply implemented as:

        static inline int Py_Is(PyObject *x, PyObject *y) { return (x == y); }

    Right now, there is no benefit for CPython. The idea is to prepare the CPython code base for future optimization, like tagged pointers. It may help PyPy. It can also help to prepare C extensions to migrate to HPy, since HPy requires to use HPy_Is(ctx, x, y): "x == y" (where x and y types is HPy) fails with a compiler error with HPy.

    --

    CPython uses reference counting and singletons like None and False. The "x is y" operator in Python is implemented with the IS_OP opcode implemented in Python/ceval.c as:

            case TARGET(IS_OP): {
                PyObject *right = POP();
                PyObject *left = TOP();
                int res = (left == right)^oparg;
                PyObject *b = res ? Py_True : Py_False;
                Py_INCREF(b);
                SET_TOP(b);
                Py_DECREF(left);
                Py_DECREF(right);
                PREDICT(POP_JUMP_IF_FALSE);
                PREDICT(POP_JUMP_IF_TRUE);
                DISPATCH();
            }

    In short, "x is y" in Python simply compares directly PyObject* pointer values in C: "x == y" where x and y are PyObject* pointers.

    PyPy doesn't use reference counting and so implements "x is y" differently: id(x) == id(y) if I understood correctly. In PyPy, id(obj) is more complex than simply getting the object memory address. For example, id(1) is always 17 in PyPy (on Linux/x86-64 at least).

    At the C API level, using "x == y" to check if y object "is" x object doesn't work well with PyPy object models. Moreover, "x == Py_None" doesn't work if None is stored as a tagged pointer in CPython.

    --

    Maybe we can also add functions to check if an object is a singleton:

    • Py_IsNone(x): x == Py_None
    • Py_IsTrue(x): x == Py_True
    • Py_IsFalse(x): x == Py_False

    See also bpo-39511 "[subinterpreters] Per-interpreter singletons (None, True, False, etc.)" which may need to replace Py_None singleton with Py_GetNone(). IMO it makes more sense to call Py_IS_NONE(x) function to express that code is run, rather than writing "x == Py_None" as if Py_None is and will always be a variable directly accesssible.

    Py_IS_TRUE() name sounds like PyObject_IsTrue(). Can it be an issue? Can someone come with a better name?

    @vstinner vstinner added 3.10 only security fixes topic-C-API labels Apr 6, 2021
    @vstinner
    Copy link
    Member Author

    vstinner commented Apr 6, 2021

    Python has more singletons. In C:

    • Py_NotImplemented: NotImplemented
    • Py_Ellipsis: Ellipsis

    But I don't think that "x == NotImplemented" is common enough to justify a new function.

    @vstinner vstinner changed the title [C API] Add Py_IS(x, y) macro [C API] Add Py_IS(x, y) and Py_IsNone(x) functions Apr 6, 2021
    @vstinner vstinner changed the title [C API] Add Py_IS(x, y) macro [C API] Add Py_IS(x, y) and Py_IsNone(x) functions Apr 6, 2021
    @vstinner vstinner changed the title [C API] Add Py_IS(x, y) and Py_IsNone(x) functions [C API] Add Py_Is(x, y) and Py_IsNone(x) functions Apr 6, 2021
    @vstinner vstinner changed the title [C API] Add Py_IS(x, y) and Py_IsNone(x) functions [C API] Add Py_Is(x, y) and Py_IsNone(x) functions Apr 6, 2021
    @vstinner
    Copy link
    Member Author

    vstinner commented Apr 6, 2021

    Py_IS_TYPE(obj, type) was added to Python 3.9 by bpo-39573:
    https://docs.python.org/dev/c-api/structures.html#c.Py_IS_TYPE

    commit d905df7
    Author: Dong-hee Na <donghee.na92@gmail.com>
    Date: Fri Feb 14 02:37:17 2020 +0900

    bpo-39573: Add Py_IS_TYPE() function (GH-18488)
    
    Co-Author: Neil Schemenauer <nas-github@arctrix.com>
    

    It's currently implemented as:

    static inline int _Py_IS_TYPE(const PyObject *ob, const PyTypeObject *type) {
        return Py_TYPE(ob) == type;
    }
    #define Py_IS_TYPE(ob, type) _Py_IS_TYPE(_PyObject_CAST_CONST(ob), type)

    @rhettinger
    Copy link
    Contributor

    Right now, there is no benefit for CPython.

    Please don't this until we have a clear demonstrable benefit. As it stands now, this is all cost and no benefit.

    Adding unnecessary abstraction layers just makes it more difficult for people to learn to be core devs. Also, it will likely result in a lot of pointless code churn where each change carries a risk of a new bug being introduced.

    The notion of pointer comparison is so fundamental to C code that it is counter productive to try to abstract it away. It isn't much different than saying that every instance of "a + b" should be abstracted to "add(a, b)" and every "a[b]" should be abstracted to "index_lookup(a, b)" on the hope that maybe it might someday be helpful for PyPy or HPy even though they have never requested such a change.

    From my own point of view, these need abstractions just make it harder to tell what code is actually doing. Further, it will just lead to wasting everyone's time in code reviews where the reviewer insists on the applying the new inline function and there is confusion about whether two pointers are generic pointers or python object pointers, each with their own comparison technique.

    Also, there is too much faith in functions marked as "inline" always being inlined. Compilers get to make their own choices and under some circumstances will not inline, especially for cross module calls. This risks taking code that is currently obviously fast and occasionally, invisibility slowing it down massively — from a step that is 1 cycle at most and is sometimes zero cost to a step that involves an actual function call, possibly needing to save and restore registers.

    Lastly, the API changes aren't just for you or the standard library. In effect, you're telling the entire ecosystem of C extensions that they are doing it wrong. Almost certainly, some will follow this path and some won't, further fracturing the ecosystem.

    @markshannon
    Copy link
    Member

    For any sane design of tagged pointers, x == y (in C) will work fine.

    is is not well defined except for a small set of values, so the docs for Py_Is would have to so vague as to be worthless, IMO.

    @vstinner
    Copy link
    Member Author

    vstinner commented Apr 7, 2021

    For any sane design of tagged pointers, x == y (in C) will work fine.

    I wrote a proof-of-concept for using tagged pointners in CPython:
    https://github.com/vstinner/cpython/pull/6

    "The PR is large because of the first changes which add Py_IS_NONE(), Py_IS_TRUE() and Py_IS_FALSE() macros."

    For backward compatibility, I added a function to get a concrete Python object from a tagged pointer: the True as a tagged pointer becomes (PyObject *)&_Py_TrueStruct concrete object. Py_IS_TRUE() has to check for both values: tagged pointer or &_Py_TrueStruct (don't look at my exact implementation, it's wrong ;-)).

    In a perfect world, there would be no backward compatibility and you would never have to create Python objects from a tagged pointer.

    The other problem is that the stable ABI exposes "Py_True" as "&_Py_TrueStruct" and so C extensions built with the stable ABI uses "&_Py_TrueStruct", not the tagged pointer.

    See also bpo-39511 which discuss solutions for these problems.

    @vstinner
    Copy link
    Member Author

    vstinner commented Apr 7, 2021

    Also, there is too much faith in functions marked as "inline" always being inlined.

    I proposed to declare it as a "static inline" function, but I'm fine with a macro as well. I mostly care about the API: call "Py_Is()", not really about the exact implementation. In practice, for now, Py_Is(x, y) will continue to be compiled as "x == y".

    @cfbolz
    Copy link
    Mannequin

    cfbolz mannequin commented Apr 7, 2021

    Just chiming in to say that for PyPy this API would be extremely useful, because PyPy's "is" is not implementable with a pointer comparison on the C level (due to unboxing we need to compare integers, floats, etc by value). Right now, C extension code that compares pointers is subtly broken and cannot be fixed by us.

    @rhettinger
    Copy link
    Contributor

    Just chiming in to say that for PyPy this API would be extremely useful

    Thanks for that input. Given that there would be some value add, I withdraw my objection.

    I proposed to declare it as a "static inline" function,
    but I'm fine with a macro as well.

    Let's use a macro then because inlining in guaranteed and we don't have to be on the lookout for failures to inline.

    @vstinner
    Copy link
    Member Author

    vstinner commented Apr 7, 2021

    PR 25227: I reimplemented Py_Is() as a macro and I added unit tests.

    Other added functions simply call Py_Is(), example:

    #define Py_IsNone(x) Py_Is(x, Py_None)

    @vstinner
    Copy link
    Member Author

    vstinner commented Apr 8, 2021

    Mark:

    is is not well defined except for a small set of values, so the docs for Py_Is would have to so vague as to be worthless, IMO.

    I don't propose to change the "is" operator semantic: Py_Is(x, y) is C would behave *exaclty* as "x is y" in Python.

    Check my implementation: the IS_OP bytecode uses directly Py_Is().

    @erlend-aasland
    Copy link
    Contributor

    I'd also prefer a Py_IsNotNone() because it's more explicit than !Py_IsNone(); the exclamation mark can be easily missed when reading/writing code, IMO.

    Just my 2 cents.

    @vstinner
    Copy link
    Member Author

    vstinner commented Apr 9, 2021

    I'd also prefer a Py_IsNotNone() because it's more explicit than !Py_IsNone()

    I would prefer keep the C API small. I don't think that we need to duplicate all functions testing for something. We provide PyTuple_Check(obj) but we don't provide PyTuple_NotCheck(obj) for example.

    IMO !Py_IsNone(obj) makes perfectly sense in Python.

    Also, "x == Py_None" is more common than "x != Py_None".

    @erlend-aasland
    Copy link
    Contributor

    I would prefer keep the C API small.

    Yes, I see the value of that as well.

    I tried applying this API on an extension, and I found the code to be slightly less readable for the "is not" cases.

    @vstinner
    Copy link
    Member Author

    vstinner commented Apr 9, 2021

    I tried applying this API on an extension, and I found the code to be slightly less readable for the "is not" cases.

    FYI you can try upgrade_pythoncapi.py on your project using the following PR to update code to use Py_IsNone/Py_IsTrue/Py_IsFalse:
    python/pythoncapi-compat#8

    For example, use "upgrade_pythoncapi.py -o Py_Is directory/" or "upgrade_pythoncapi.py -o Py_Is file.c".

    @vstinner
    Copy link
    Member Author

    New changeset 09bbebe by Victor Stinner in branch 'master':
    bpo-43753: Add Py_Is() and Py_IsNone() functions (GH-25227)
    09bbebe

    @vstinner
    Copy link
    Member Author

    Carl:

    Just chiming in to say that for PyPy this API would be extremely useful, because PyPy's "is" is not implementable with a pointer comparison on the C level (due to unboxing we need to compare integers, floats, etc by value). Right now, C extension code that compares pointers is subtly broken and cannot be fixed by us.

    Can you come with a sentence that I can put in the documentation to explain that Py_Is() is written for interoperability with other Python implementations?

    @vstinner
    Copy link
    Member Author

    Well, nobody came up with a better definition, so let's go with:

    "Test if the x object is the y object, the same as x is y in Python."

    Thanks for the feedback and reviews! Py_Is(x, y), Py_IsNone(x), Py_IsTrue(x) and Py_IsFalse(x) are now part of Python 3.10 C API.

    @vstinner
    Copy link
    Member Author

    New changeset 8d3e7ef by Victor Stinner in branch 'main':
    bpo-43753: _operator.is_() uses Py_Is() (GH-28641)
    8d3e7ef

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.10 only security fixes topic-C-API
    Projects
    None yet
    Development

    No branches or pull requests

    4 participants