classification
Title: Add PyModule_Add()
Type: enhancement Stage: patch review
Components: C API Versions: Python 3.10
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: brandtbucher, serhiy.storchaka, vstinner
Priority: normal Keywords: patch

Created on 2020-11-11 20:35 by serhiy.storchaka, last changed 2020-11-21 12:23 by serhiy.storchaka.

Pull Requests
URL Status Linked Edit
PR 23240 open serhiy.storchaka, 2020-11-11 20:38
PR 23443 open serhiy.storchaka, 2020-11-21 12:23
Messages (8)
msg380794 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2020-11-11 20:35
There is a design flaw in PyModule_AddObject(). It steals reference of its value only if the result is success. To avoid leaks it should be used in the following form:

    PyObject *tmp = <new reference>;
    if (PyModule_AddObject(name, name, tmp) < 0) {
        Py_XDECREF(tmp);
        goto error;
    }

It is inconvenient and many code forgot to use a temporary variable and call Py_XDECREF().

It was not intention, but it is too late to change this behavior now, because some code calls Py_XDECREF() if PyModule_AddObject() fails. Fixing PyModule_AddObject() now will break hard such code.

There was an idea to make the change gradual, controlled by a special macro (see issue26871). But it still has significant risk.

I propose to add new function PyModule_Add() which always steals reference to its argument. It is more convenient and allows to get rid of temporary variable:

    if (PyModule_Add(name, name, <new reference>) < 0) {
        goto error;
    }

I choose name PyModule_Add because it is short, and allow to write the call in one line with moderately long expression <new reference> (like PyFloat_FromDouble(...) or PyLong_FromUnsignedLong(...)).
msg380795 - (view) Author: Brandt Bucher (brandtbucher) * (Python committer) Date: 2020-11-11 22:08
See also:

https://bugs.python.org/issue38823

https://github.com/python/cpython/pull/17298
msg380818 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-11-12 12:57
Oh, I just rejected PR 17298. Copy of my message:
---
I added PyModule_AddObjectRef() which uses strong references, rather than only stealing a reference on success.

I also enhanced the documentation to show concrete examples:
https://docs.python.org/dev/c-api/module.html#c.PyModule_AddObjectRef

I modified a few extension to use PyModule_AddObjectRef(). Sometimes, PyModule_AddObject() is more appropriate. Sometimes, PyModule_AddObjectRef() is more appropriate. Both functions are relevant, and I don't see a clear winner.

I agree than fixing existing code is painful, but I hope that new code using mostly PyModule_AddObjectRef() would be simpler to review. I'm not sure that it's simpler to write new code using PyModule_AddObjectRef(), since you might need more Py_DECREF() calls.

My intent is to have more "regular" code about reference counting. See also: https://bugs.python.org/issue42294

Since you wrote that this API is a band aid on a broken API, I consider that you are fine with rejecting it and move on to the new PyModule_AddObjectRef().

Anyway, thanks for you attempt to make the C API less broken :-)
---

I added PyModule_AddObjectRef() in bpo-163574.
msg380819 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-11-12 12:58
I'm using more and more often such macro:

#define MOD_ADD(name, expr) \
    do { \
        PyObject *obj = (expr); \
        if (obj == NULL) { \
            return -1; \
        } \
        if (PyModule_AddObjectRef(mod, name, obj) < 0) { \
            Py_DECREF(obj); \
            return -1; \
        } \
        Py_DECREF(obj); \
    } while (0)

Would PyModule_Add() replace such macro?
msg380820 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-11-12 12:59
If PyModule_Add() is added, I would suggest to rename PyModule_AddObjectRef() to PyModule_AddRef() for consistency.

IMHO PyModule_AddObjectRef() remains useful even if PyModule_Add() is added.
msg380821 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-11-12 13:00
In practice, PyModule_AddRef(mod, obj) behaves as PyModule_Add(mod, Py_NewRef(obj)) if I understood correctly.
msg380829 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2020-11-12 15:26
PyModule_Add() allows to make such macro much simpler:

#define MOD_ADD(name, expr) \
    do { \
        if (PyModule_Add(mod, name, expr) < 0) { \
            return -1; \
        } \
    } while (0)

PyModule_AddObjectRef() is just Py_XINCREF() followed by PyModule_Add(). But since most values added to the module are new references, Py_XINCREF() is usually not needed. The PyModule_Add* API is a convenient API. It is not necessary, you can use PyModule_GetDict() + PyDict_SetItemString(), but with this API it is easier. And PyModule_Add() is a correct PyModule_AddObject() (which was broken a long time ago and cannot be fixed now) and is more convenient than PyModule_AddObjectRef().

PyModule_AddIntConstant() and PyModule_AddStringConstant() can be easily expressed in terms of PyModule_Add():

    PyModule_Add(m, name, PyLong_FromLong(value))
    PyModule_Add(m, name, PyUnicode_FromString(value))

And it is easy to combine it with other functions: PyLong_FromLongLong(), PyLong_FromUnsignedLong(), PyLong_FromVoidPtr(), PyFloat_FromDouble(), PyCapsule_New(), PyType_FromSpec(), etc.
msg380833 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-11-12 16:35
> PyModule_AddObjectRef() is just Py_XINCREF() followed by PyModule_Add(). But since most values added to the module are new references, Py_XINCREF() is usually not needed.

There is no general rule. I saw two main cases.


(A) Create an object only to add it into the module. PyModule_Add() and PyModule_AddObject() are good for that case.

Example in the array module:

    PyObject *typecodes = PyUnicode_DecodeASCII(buffer, p - buffer, NULL);
    if (PyModule_AddObject(m, "typecodes", typecodes) < 0) {
        Py_XDECREF(typecodes);
        return -1;
    }

This code can be rewritten with PyModule_Add():

    PyObject *typecodes = PyUnicode_DecodeASCII(buffer, p - buffer, NULL);
    if (PyModule_Add(m, "typecodes", typecodes) < 0) {
        return -1;
    }

Py_XDECREF(typecodes) is no longer needed using PyModule_Add().


(B) Add an existing object into the module, but the objet is already stored elsewhere. PyModule_AddObjectRef() is good for that case. It became common to have this case when an object is also stored in the module state. 

Example in _ast:

    state->AST_type = PyType_FromSpec(&AST_type_spec);
    if (!state->AST_type) {
        return 0;
    }
    (...)
    if (PyModule_AddObjectRef(m, "AST", state->AST_type) < 0) {
        return -1;
    }

state->AST_type and module attribute both hold a strong reference to the type.
History
Date User Action Args
2020-11-21 12:23:35serhiy.storchakasetpull_requests: + pull_request22335
2020-11-12 16:35:43vstinnersetmessages: + msg380833
2020-11-12 15:26:20serhiy.storchakasetmessages: + msg380829
2020-11-12 13:00:47vstinnersetmessages: + msg380821
2020-11-12 12:59:36vstinnersetmessages: + msg380820
2020-11-12 12:58:37vstinnersetmessages: + msg380819
2020-11-12 12:57:32vstinnersetmessages: + msg380818
2020-11-11 22:08:54brandtbuchersetnosy: + brandtbucher
messages: + msg380795
2020-11-11 20:38:44serhiy.storchakasetkeywords: + patch
stage: patch review
pull_requests: + pull_request22137
2020-11-11 20:35:14serhiy.storchakacreate