Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[C API] Detect refcount bugs on True/False in C extensions #89224

Closed
vstinner opened this issue Aug 31, 2021 · 14 comments
Closed

[C API] Detect refcount bugs on True/False in C extensions #89224

vstinner opened this issue Aug 31, 2021 · 14 comments
Labels
3.11 only security fixes extension-modules C modules in the Modules dir

Comments

@vstinner
Copy link
Member

BPO 45061
Nosy @vstinner, @ctismer, @miss-islington
PRs
  • bpo-45061: Detect Py_DECREF(Py_True) bug #28089
  • bpo-45061: Enhance faulthandler traceback wit no Python frame #28090
  • [3.10] bpo-45061: Enhance faulthandler traceback wit no Python frame (GH-28090) #28096
  • bpo-45061: Detect refcount bug on empty tuple singleton #28503
  • bpo-45061: Detect refcount bug on empty string singleton #28504
  • bpo-45061: Revert unicode_is_singleton() change #28516
  • Files
  • os_uname_refcount_bug.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2021-08-31.16:05:53.139>
    created_at = <Date 2021-08-31.13:17:39.832>
    labels = ['extension-modules', '3.11']
    title = '[C API] Detect refcount bugs on True/False in C extensions'
    updated_at = <Date 2021-09-22.10:16:57.297>
    user = 'https://github.com/vstinner'

    bugs.python.org fields:

    activity = <Date 2021-09-22.10:16:57.297>
    actor = 'vstinner'
    assignee = 'none'
    closed = True
    closed_date = <Date 2021-08-31.16:05:53.139>
    closer = 'vstinner'
    components = ['Extension Modules']
    creation = <Date 2021-08-31.13:17:39.832>
    creator = 'vstinner'
    dependencies = []
    files = ['50247']
    hgrepos = []
    issue_num = 45061
    keywords = ['patch']
    message_count = 14.0
    messages = ['400728', '400729', '400730', '400746', '400748', '400753', '400828', '400829', '400839', '400904', '402365', '402370', '402371', '402417']
    nosy_count = 3.0
    nosy_names = ['vstinner', 'Christian.Tismer', 'miss-islington']
    pr_nums = ['28089', '28090', '28096', '28503', '28504', '28516']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = None
    url = 'https://bugs.python.org/issue45061'
    versions = ['Python 3.11']

    @vstinner
    Copy link
    Member Author

    Writing C extensions using directly the C API is error prone. It's easy to add or forget a Py_INCREF or Py_DECREF. Adding Py_DECREF(Py_True) or Py_DECREF(Py_False) by mistake causes a surprising crash at Python exit:
    ---
    Debug memory block at address p=0x8a6e80: API ''
    0 bytes originally requested
    The 7 pad bytes at p-7 are not all FORBIDDENBYTE (0xfd):
    at p-7: 0x00 *** OUCH
    at p-6: 0x00 *** OUCH
    at p-5: 0x00 *** OUCH
    at p-4: 0x00 *** OUCH
    at p-3: 0x00 *** OUCH
    at p-2: 0x00 *** OUCH
    at p-1: 0x00 *** OUCH
    Because memory is corrupted at the start, the count of bytes requested
    may be bogus, and checking the trailing pad bytes may segfault.
    The 8 pad bytes at tail=0x8a6e80 are not all FORBIDDENBYTE (0xfd):
    at tail+0: 0x00 *** OUCH
    at tail+1: 0x00 *** OUCH
    at tail+2: 0x00 *** OUCH
    at tail+3: 0x00 *** OUCH
    at tail+4: 0x00 *** OUCH
    at tail+5: 0x00 *** OUCH
    at tail+6: 0x00 *** OUCH
    at tail+7: 0x00 *** OUCH

    Enable tracemalloc to get the memory block allocation traceback

    Fatal Python error: _PyMem_DebugRawFree: bad ID: Allocated using API '', verified using API 'o'
    Python runtime state: finalizing (tstate=0x0000000001f43c50)

    Current thread 0x00007f3f562fa740 (most recent call first):
    Garbage-collecting
    <no Python frame>
    Abandon (core dumped)
    ---

    In my case, the bug occurs at Python exit, in code_dealloc(): "Py_XDECREF(co->co_consts);" destroys a tuple which contains True. It calls object_dealloc() which calls PyBool_Type.tp_dealloc(Py_True), but this object is allocated statically, and so its memory must not deallocated by the Python dynamic memory allocator. In debug mode, PyObject_Free() triggers a fatal error.

    Concrete example of such bug in PySide with Python 3.10 which is now stricter on reference counting (thanks to the work made in bpo-1635741 and for Python subinterpreters):
    https://bugreports.qt.io/browse/PYSIDE-1436

    I propose to add a specific deallocator functions on bool to detect such bug, to ease debugging. There is already a similar deallocator for the None singleton.

    @vstinner vstinner added 3.11 only security fixes extension-modules C modules in the Modules dir labels Aug 31, 2021
    @vstinner
    Copy link
    Member Author

    To reproduce the bug, apply attached os_uname_refcount_bug.patch and call os.uname():
    ---

    $ ./python -c 'import os; os.uname()'
    (...)
    Fatal Python error: _PyMem_DebugRawFree: bad ID: Allocated using API '', verified using API 'o'
    (...)

    @vstinner
    Copy link
    Member Author

    There was such bug in CPython stdlib, in the _testcapi module, see commit 310e2d2 of bpo-36854:

    commit 310e2d2
    Author: Victor Stinner <vstinner@python.org>
    Date: Fri Nov 22 10:58:00 2019 +0100

    bpo-36854: Fix refleak in subinterpreter (GH-17331)
    
    finalize_interp_clear() now explicitly clears the codec registry and
    then trigger a GC collection to clear all references.
    
    diff --git a/Modules/_testcapimodule.c b/Modules/_testcapimodule.c
    index baa6907b7e..0908f3457f 100644
    --- a/Modules/_testcapimodule.c
    +++ b/Modules/_testcapimodule.c
    @@ -6721,11 +6721,14 @@ PyInit__testcapi(void)
         PyModule_AddObject(m, "instancemethod", (PyObject *)&PyInstanceMethod_Type);
     
         PyModule_AddIntConstant(m, "the_number_three", 3);
    +    PyObject *v;
     #ifdef WITH_PYMALLOC
    -    PyModule_AddObject(m, "WITH_PYMALLOC", Py_True);
    +    v = Py_True;
     #else
    -    PyModule_AddObject(m, "WITH_PYMALLOC", Py_False);
    +    v = Py_False;
     #endif
    +    Py_INCREF(v);
    +    PyModule_AddObject(m, "WITH_PYMALLOC", v);
     
         TestError = PyErr_NewException("_testcapi.error", NULL, NULL);
         Py_INCREF(TestError);

    @vstinner
    Copy link
    Member Author

    New changeset 888d4cc by Victor Stinner in branch 'main':
    bpo-45061: Enhance faulthandler traceback wit no Python frame (GH-28090)
    888d4cc

    @miss-islington
    Copy link
    Contributor

    New changeset c4c57e5 by Miss Islington (bot) in branch '3.10':
    bpo-45061: Enhance faulthandler traceback wit no Python frame (GH-28090)
    c4c57e5

    @vstinner
    Copy link
    Member Author

    New changeset 4300352 by Victor Stinner in branch 'main':
    bpo-45061: Detect Py_DECREF(Py_True) bug (GH-28089)
    4300352

    @ctismer
    Copy link
    Contributor

    ctismer commented Sep 1, 2021

    What about an even more flexible solution?
    A debug option could memorize always the last object deallocated
    and give full info (the object's repr) before the crash would happen.

    @ctismer
    Copy link
    Contributor

    ctismer commented Sep 1, 2021

    Ah, that would be really much to store: every object with a refcount
    going to zero would have to be memorized :/

    @vstinner
    Copy link
    Member Author

    vstinner commented Sep 1, 2021

    Christian Tismer: "What about an even more flexible solution? A debug option could memorize always the last object deallocated and give full info (the object's repr) before the crash would happen."

    Bugs in object_dealloc() are rare. Apart None, True and False, do you know other objects which must not be deleted?

    Calling repr(obj) can crash during late Python finalization for many reasons.

    It is slow and can consume significant memory. Not all object types detect loops (a container storing indirectly itself) and so it can fail with a MemoryError: repr() is not "safe".

    Using gdb, it is easy to dump the object currently being deleted: go the object_dealloc() frame and use "print self" command.

    Example using attached os_uname_refcount_bug.patch:
    ------------

    $ gdb ./python
    (...)
    (gdb) run
    >>> import os; os.uname()
    >>> exit()

    Debug memory block at address p=0x886e60: API ''
    0 bytes originally requested
    The 7 pad bytes at p-7 are not all FORBIDDENBYTE (0xfd):
    at p-7: 0x00 *** OUCH
    at p-6: 0x00 *** OUCH
    at p-5: 0x00 *** OUCH
    at p-4: 0x00 *** OUCH
    at p-3: 0x00 *** OUCH
    at p-2: 0x00 *** OUCH
    at p-1: 0x00 *** OUCH
    Because memory is corrupted at the start, the count of bytes requested
    may be bogus, and checking the trailing pad bytes may segfault.
    The 8 pad bytes at tail=0x886e60 are not all FORBIDDENBYTE (0xfd):
    at tail+0: 0x00 *** OUCH
    at tail+1: 0x00 *** OUCH
    at tail+2: 0x00 *** OUCH
    at tail+3: 0x00 *** OUCH
    at tail+4: 0x00 *** OUCH
    at tail+5: 0x00 *** OUCH
    at tail+6: 0x00 *** OUCH
    at tail+7: 0x00 *** OUCH

    Enable tracemalloc to get the memory block allocation traceback

    Fatal Python error: _PyMem_DebugRawFree: bad ID: Allocated using API '', verified using API 'o'
    Python runtime state: finalizing (tstate=0x00000000008e1a80)

    Current thread 0x00007ffff7c24740 (most recent call first):
    Garbage-collecting
    <no Python frame>

    Program received signal SIGABRT, Aborted.
    0x00007ffff7c662a2 in raise () from /lib64/libc.so.6

    (...)

    (gdb) frame 9
    #9 0x0000000000499dd7 in object_dealloc (self=True)
    at Objects/typeobject.c:4497
    4497 Py_TYPE(self)->tp_free(self);

    (gdb) print self
    $2 = True
    ------------

    @ctismer
    Copy link
    Contributor

    ctismer commented Sep 2, 2021

    Apart None, True and False, do you know other objects which must not be deleted?

    Yes, what I can think of is the immutable empty tuple singleton
    which is such a candidate to be forgotten.

    @vstinner
    Copy link
    Member Author

    New changeset 79a3148 by Victor Stinner in branch 'main':
    bpo-45061: Detect refcount bug on empty tuple singleton (GH-28503)
    79a3148

    @vstinner
    Copy link
    Member Author

    New changeset 86f2837 by Victor Stinner in branch 'main':
    bpo-45061: Detect refcount bug on empty string singleton (GH-28504)
    86f2837

    @vstinner
    Copy link
    Member Author

    I don't want to add a deallocator to bytes and int types to detect when their singleton is destroyed, since it would slow down C extensions which have no refcount bug. Maybe it could be added only if Py_DEBUG macro is defined. But I don't think that Py_DECREF() bugs on these singletons are common enough to justify these. I don't want to make Python debug build slower.

    IMO detecting refcount bugs on the most common singletons is enough: None, True, False, (), "". I implemented these checks because it was simple and had no major on performance, even on the debug build.

    I now consider that the issue is fully fixed ;-)

    @vstinner
    Copy link
    Member Author

    New changeset 8620be9 by Victor Stinner in branch 'main':
    bpo-45061: Revert unicode_is_singleton() change (GH-28516)
    8620be9

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.11 only security fixes extension-modules C modules in the Modules dir
    Projects
    None yet
    Development

    No branches or pull requests

    3 participants