Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[C API] Convert PyTuple_GET_ITEM() macro to a static inline function #85250

Closed
vstinner opened this issue Jun 22, 2020 · 13 comments
Closed

[C API] Convert PyTuple_GET_ITEM() macro to a static inline function #85250

vstinner opened this issue Jun 22, 2020 · 13 comments
Labels
3.10 only security fixes topic-C-API

Comments

@vstinner
Copy link
Member

BPO 41078
Nosy @scoder, @vstinner
PRs
  • bpo-41078: Rename pycore_tupleobject.h to pycore_tuple.h #21056
  • bpo-41078: Add pycore_list.h internal header file #21057
  • bpo-41078: Fix bltinmodule.c with Py_TRACE_REFS #21058
  • [WIP] bpo-41078: Convert PyTuple_GET_ITEM() macro to static inline function #21059
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2021-09-21.21:59:56.853>
    created_at = <Date 2020-06-22.14:50:13.657>
    labels = ['expert-C-API', '3.10']
    title = '[C API] Convert PyTuple_GET_ITEM() macro to a static inline function'
    updated_at = <Date 2021-09-21.21:59:56.853>
    user = 'https://github.com/vstinner'

    bugs.python.org fields:

    activity = <Date 2021-09-21.21:59:56.853>
    actor = 'vstinner'
    assignee = 'none'
    closed = True
    closed_date = <Date 2021-09-21.21:59:56.853>
    closer = 'vstinner'
    components = ['C API']
    creation = <Date 2020-06-22.14:50:13.657>
    creator = 'vstinner'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 41078
    keywords = ['patch']
    message_count = 13.0
    messages = ['372091', '372093', '372095', '372097', '372100', '372121', '372123', '372126', '372128', '372129', '372132', '372149', '402379']
    nosy_count = 2.0
    nosy_names = ['scoder', 'vstinner']
    pr_nums = ['21056', '21057', '21058', '21059']
    priority = 'normal'
    resolution = 'rejected'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = None
    url = 'https://bugs.python.org/issue41078'
    versions = ['Python 3.10']

    @vstinner
    Copy link
    Member Author

    PyTuple_GET_ITEM() can be abused to access directly the PyTupleObject.ob_item member:

       PyObject **items = &PyTuple_GET_ITEM(0);

    Giving a direct access to an array or PyObject* (PyObject**) is causing issues with other Python implementations, like PyPy, which don't use PyObject internally.

    I propose to convert the PyTuple_GET_ITEM() and PyList_GET_ITEM() macros to static inline functions to disallow that.

    @vstinner vstinner added 3.10 only security fixes topic-C-API labels Jun 22, 2020
    @vstinner
    Copy link
    Member Author

    New changeset 384621c by Victor Stinner in branch 'master':
    bpo-41078: Rename pycore_tupleobject.h to pycore_tuple.h (GH-21056)
    384621c

    @vstinner
    Copy link
    Member Author

    New changeset c45dbe9 by Victor Stinner in branch 'master':
    bpo-41078: Add pycore_list.h internal header file (GH-21057)
    c45dbe9

    @vstinner
    Copy link
    Member Author

    New changeset c96d00e by Victor Stinner in branch 'master':
    bpo-41078: Fix bltinmodule.c with Py_TRACE_REFS (GH-21058)
    c96d00e

    @vstinner
    Copy link
    Member Author

    The PR 21059 breaks Cython. I reported the issue to Cython upstream:
    cython/cython#3701

    numpy is also affected: code generated by Cython (numpy/random/_mt19937.c) contains the issue.

    @vstinner
    Copy link
    Member Author

    PySequence_Fast_ITEMS() must also be deprecated since it also gives a direct access to PyTupleObject.ob_item and PyListObject.ob_item members (PyObject**). PySequence_Fast_GET_ITEM() should be used instead.

    @vstinner
    Copy link
    Member Author

    PySequence_Fast_ITEMS() is used in numpy at 7 lines:

    numpy/core/src/common/ufunc_override.c:118: *out_objs = PySequence_Fast_ITEMS(seq);

    ---

    PyUFuncOverride_GetOutObjects(PyObject *kwds, PyObject **out_kwd_obj, PyObject ***out_objs)

    => PyObject** is part of the numpy C API (it's even a pointer to a PyObject** array)!

    numpy/core/src/multiarray/arrayfunction_override.c:80: PyObject **items = PySequence_Fast_ITEMS(relevant_args);
    ---

    static int
    get_implementing_args_and_methods(PyObject *relevant_args,
                                      PyObject **implementing_args,
                                      PyObject **methods)
    {
        int num_implementing_args = 0;
        Py_ssize_t i;
        int j;
    
        PyObject **items = PySequence_Fast_ITEMS(relevant_args);
        Py_ssize_t length = PySequence_Fast_GET_SIZE(relevant_args);
    
        for (i = 0; i < length; i++) {
            int new_class = 1;
            PyObject *argument = items[i];

    => here PySequence_Fast_GET_ITEM() can be used instead.

    numpy/core/src/multiarray/arrayfunction_override.c:167: PyObject **items = PySequence_Fast_ITEMS(types);
    numpy/core/src/multiarray/common.c:434: objects = PySequence_Fast_ITEMS(seq);
    numpy/core/src/multiarray/iterators.c:1368: ret = multiiter_new_impl(n, PySequence_Fast_ITEMS(fast_seq));
    numpy/core/src/multiarray/methods.c:1015: in_objs = PySequence_Fast_ITEMS(fast);
    numpy/core/src/umath/override.c:38: arg_objs = PySequence_Fast_ITEMS(args);

    @scoder
    Copy link
    Contributor

    scoder commented Jun 22, 2020

    Giving a direct access to an array or PyObject* (PyObject**) is causing issues with other Python implementations, like PyPy, which don't use PyObject internally.

    I'm wondering – if the intention is to help other implementations, then why do we need to break extensions in *CPython* for that? In CPython, this usage seems perfectly valid with the current data structure. If that ever changes, we can still make this change as well, but do you really see the internal layout of tuples change at some point? (Lists are a different case, I'd say, but tuples?)

    @vstinner
    Copy link
    Member Author

    The long rationale is explained in PEP-620:
    https://www.python.org/dev/peps/pep-0620/

    PyTupleObject and PyListObject structures must become opaque. Also, later, these structures may change to be more efficient.

    @scoder
    Copy link
    Contributor

    scoder commented Jun 22, 2020

    Also, later, these structures may change to be more efficient.

    Tuples? Really?

    Ok, quoting PEP-620:

    Members of … PyTupleObject structures have not changed since the "Initial revision" commit (1990)

    I honestly think the reason for that might simply be that there's not so much to improve for tuples.

    nothing prevents a C extension to get or set directly
    PyTupleObject.ob_item[0] (the first item of a tuple).

    I certainly understand that that is a problem, especially if "PyObject" may change in the future. And this is essentially what the current "PyTuple_GET_ITEM()" macro does in a binary module.

    Should we also turn "_PyTuple_ITEMS()" into a public inline function then to make up for the loss of the "&PyTuple_GET_ITEM(t, 0)" pattern? It would make it explicit what is intended. I think that should be our main goal in the CPython side.

    @vstinner
    Copy link
    Member Author

    I suggest you to discuss the PEP-620 on python-dev, rather than on the bug tracker. I just posted it there today.

    This issue tracks the implemetation of one part of the PEP-620.

    @vstinner
    Copy link
    Member Author

    See "(PEP-620) C API for efficient loop iterating on a sequence of PyObject** or other C types" thread on python-dev:
    https://mail.python.org/archives/list/python-dev@python.org/thread/632CV42376SWVYAZTHG4ROOV2HRHOVZ7/

    @vstinner
    Copy link
    Member Author

    The PEP-620 is still a draft. I prefer to work on other C API changes before coming back to this change which may break many C extensions. I failed finding time to design an API to expose a PyObject** array "view" with a function to release/destroy the view.

    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.10 only security fixes topic-C-API
    Projects
    None yet
    Development

    No branches or pull requests

    2 participants