This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: [C API] Detect refcount bugs on True/False in C extensions
Type: Stage: resolved
Components: Extension Modules Versions: Python 3.11
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: Christian.Tismer, miss-islington, vstinner
Priority: normal Keywords: patch

Created on 2021-08-31 13:17 by vstinner, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
os_uname_refcount_bug.patch vstinner, 2021-08-31 13:19
Pull Requests
URL Status Linked Edit
PR 28089 merged vstinner, 2021-08-31 13:24
PR 28090 merged vstinner, 2021-08-31 13:38
PR 28096 merged miss-islington, 2021-08-31 15:34
PR 28503 merged vstinner, 2021-09-21 20:38
PR 28504 merged vstinner, 2021-09-21 21:19
PR 28516 merged vstinner, 2021-09-22 09:23
Messages (14)
msg400728 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-08-31 13:17
Writing C extensions using directly the C API is error prone. It's easy to add or forget a Py_INCREF or Py_DECREF. Adding Py_DECREF(Py_True) or Py_DECREF(Py_False) by mistake causes a surprising crash at Python exit:
---
Debug memory block at address p=0x8a6e80: API ''
    0 bytes originally requested
    The 7 pad bytes at p-7 are not all FORBIDDENBYTE (0xfd):
        at p-7: 0x00 *** OUCH
        at p-6: 0x00 *** OUCH
        at p-5: 0x00 *** OUCH
        at p-4: 0x00 *** OUCH
        at p-3: 0x00 *** OUCH
        at p-2: 0x00 *** OUCH
        at p-1: 0x00 *** OUCH
    Because memory is corrupted at the start, the count of bytes requested
       may be bogus, and checking the trailing pad bytes may segfault.
    The 8 pad bytes at tail=0x8a6e80 are not all FORBIDDENBYTE (0xfd):
        at tail+0: 0x00 *** OUCH
        at tail+1: 0x00 *** OUCH
        at tail+2: 0x00 *** OUCH
        at tail+3: 0x00 *** OUCH
        at tail+4: 0x00 *** OUCH
        at tail+5: 0x00 *** OUCH
        at tail+6: 0x00 *** OUCH
        at tail+7: 0x00 *** OUCH

Enable tracemalloc to get the memory block allocation traceback

Fatal Python error: _PyMem_DebugRawFree: bad ID: Allocated using API '', verified using API 'o'
Python runtime state: finalizing (tstate=0x0000000001f43c50)

Current thread 0x00007f3f562fa740 (most recent call first):
  Garbage-collecting
<no Python frame>
Abandon (core dumped)
---

In my case, the bug occurs at Python exit, in code_dealloc(): "Py_XDECREF(co->co_consts);" destroys a tuple which contains True. It calls object_dealloc() which calls PyBool_Type.tp_dealloc(Py_True), but this object is allocated statically, and so its memory must not deallocated by the Python dynamic memory allocator. In debug mode, PyObject_Free() triggers a fatal error.

Concrete example of such bug in PySide with Python 3.10 which is now stricter on reference counting (thanks to the work made in bpo-1635741 and for Python subinterpreters):
https://bugreports.qt.io/browse/PYSIDE-1436

I propose to add a specific deallocator functions on bool to detect such bug, to ease debugging. There is already a similar deallocator for the None singleton.
msg400729 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-08-31 13:19
To reproduce the bug, apply attached os_uname_refcount_bug.patch and call os.uname():
---
$ ./python -c 'import os; os.uname()'
(...)
Fatal Python error: _PyMem_DebugRawFree: bad ID: Allocated using API '', verified using API 'o'
(...)
---
msg400730 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-08-31 13:34
There was such bug in CPython stdlib, in the _testcapi module, see commit 310e2d25170a88ef03f6fd31efcc899fe062da2c of bpo-36854:

commit 310e2d25170a88ef03f6fd31efcc899fe062da2c
Author: Victor Stinner <vstinner@python.org>
Date:   Fri Nov 22 10:58:00 2019 +0100

    bpo-36854: Fix refleak in subinterpreter (GH-17331)
    
    finalize_interp_clear() now explicitly clears the codec registry and
    then trigger a GC collection to clear all references.

diff --git a/Modules/_testcapimodule.c b/Modules/_testcapimodule.c
index baa6907b7e..0908f3457f 100644
--- a/Modules/_testcapimodule.c
+++ b/Modules/_testcapimodule.c
@@ -6721,11 +6721,14 @@ PyInit__testcapi(void)
     PyModule_AddObject(m, "instancemethod", (PyObject *)&PyInstanceMethod_Type);
 
     PyModule_AddIntConstant(m, "the_number_three", 3);
+    PyObject *v;
 #ifdef WITH_PYMALLOC
-    PyModule_AddObject(m, "WITH_PYMALLOC", Py_True);
+    v = Py_True;
 #else
-    PyModule_AddObject(m, "WITH_PYMALLOC", Py_False);
+    v = Py_False;
 #endif
+    Py_INCREF(v);
+    PyModule_AddObject(m, "WITH_PYMALLOC", v);
 
     TestError = PyErr_NewException("_testcapi.error", NULL, NULL);
     Py_INCREF(TestError);
msg400746 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-08-31 15:34
New changeset 888d4cc06b887e77f281ba4d640e281cb4c61b7b by Victor Stinner in branch 'main':
bpo-45061: Enhance faulthandler traceback wit no Python frame (GH-28090)
https://github.com/python/cpython/commit/888d4cc06b887e77f281ba4d640e281cb4c61b7b
msg400748 - (view) Author: miss-islington (miss-islington) Date: 2021-08-31 15:53
New changeset c4c57e5c0eb79795d4fd1d9d8292455567c60070 by Miss Islington (bot) in branch '3.10':
bpo-45061: Enhance faulthandler traceback wit no Python frame (GH-28090)
https://github.com/python/cpython/commit/c4c57e5c0eb79795d4fd1d9d8292455567c60070
msg400753 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-08-31 16:05
New changeset 4300352000beed22fb525ec45fd331918d206528 by Victor Stinner in branch 'main':
bpo-45061: Detect Py_DECREF(Py_True) bug (GH-28089)
https://github.com/python/cpython/commit/4300352000beed22fb525ec45fd331918d206528
msg400828 - (view) Author: Christian Tismer (Christian.Tismer) * (Python committer) Date: 2021-09-01 12:29
What about an even more flexible solution?
A debug option could memorize always the last object deallocated
and give full info (the object's repr) before the crash would happen.
msg400829 - (view) Author: Christian Tismer (Christian.Tismer) * (Python committer) Date: 2021-09-01 12:32
Ah, that would be really much to store: every object with a refcount
going to zero would have to be memorized :/
msg400839 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-09-01 14:27
Christian Tismer: "What about an even more flexible solution? A debug option could memorize always the last object deallocated and give full info (the object's repr) before the crash would happen."

Bugs in object_dealloc() are rare. Apart None, True and False, do you know other objects which must not be deleted?

Calling repr(obj) can crash during late Python finalization for many reasons.

It is slow and can consume significant memory. Not all object types detect loops (a container storing indirectly itself) and so it can fail with a MemoryError: repr() is not "safe".

Using gdb, it is easy to dump the object currently being deleted: go the object_dealloc() frame and use "print self" command.

Example using attached os_uname_refcount_bug.patch:
------------
$ gdb ./python
(...)
(gdb) run
>>> import os; os.uname()
>>> exit()

Debug memory block at address p=0x886e60: API ''
    0 bytes originally requested
    The 7 pad bytes at p-7 are not all FORBIDDENBYTE (0xfd):
        at p-7: 0x00 *** OUCH
        at p-6: 0x00 *** OUCH
        at p-5: 0x00 *** OUCH
        at p-4: 0x00 *** OUCH
        at p-3: 0x00 *** OUCH
        at p-2: 0x00 *** OUCH
        at p-1: 0x00 *** OUCH
    Because memory is corrupted at the start, the count of bytes requested
       may be bogus, and checking the trailing pad bytes may segfault.
    The 8 pad bytes at tail=0x886e60 are not all FORBIDDENBYTE (0xfd):
        at tail+0: 0x00 *** OUCH
        at tail+1: 0x00 *** OUCH
        at tail+2: 0x00 *** OUCH
        at tail+3: 0x00 *** OUCH
        at tail+4: 0x00 *** OUCH
        at tail+5: 0x00 *** OUCH
        at tail+6: 0x00 *** OUCH
        at tail+7: 0x00 *** OUCH

Enable tracemalloc to get the memory block allocation traceback

Fatal Python error: _PyMem_DebugRawFree: bad ID: Allocated using API '', verified using API 'o'
Python runtime state: finalizing (tstate=0x00000000008e1a80)

Current thread 0x00007ffff7c24740 (most recent call first):
  Garbage-collecting
  <no Python frame>

Program received signal SIGABRT, Aborted.
0x00007ffff7c662a2 in raise () from /lib64/libc.so.6

(...)

(gdb) frame 9
#9  0x0000000000499dd7 in object_dealloc (self=True)
    at Objects/typeobject.c:4497
4497	    Py_TYPE(self)->tp_free(self);

(gdb) print self
$2 = True
------------
msg400904 - (view) Author: Christian Tismer (Christian.Tismer) * (Python committer) Date: 2021-09-02 08:52
> Apart None, True and False, do you know other objects which must not be deleted?

Yes, what I can think of is the immutable empty tuple singleton
which is such a candidate to be forgotten.
msg402365 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-09-21 21:04
New changeset 79a31480992c3fa5890fc7a6c5d9af6d337d5844 by Victor Stinner in branch 'main':
bpo-45061: Detect refcount bug on empty tuple singleton (GH-28503)
https://github.com/python/cpython/commit/79a31480992c3fa5890fc7a6c5d9af6d337d5844
msg402370 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-09-21 21:43
New changeset 86f28372b17c8c56539e9543bea9f125ab11b8aa by Victor Stinner in branch 'main':
bpo-45061: Detect refcount bug on empty string singleton (GH-28504)
https://github.com/python/cpython/commit/86f28372b17c8c56539e9543bea9f125ab11b8aa
msg402371 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-09-21 21:45
I don't want to add a deallocator to bytes and int types to detect when their singleton is destroyed, since it would slow down C extensions which have no refcount bug. Maybe it could be added only if Py_DEBUG macro is defined. But I don't think that Py_DECREF() bugs on these singletons are common enough to justify these. I don't want to make Python debug build slower.

IMO detecting refcount bugs on the most common singletons is enough: None, True, False, (), "". I implemented these checks because it was simple and had no major on performance, even on the debug build.

I now consider that the issue is fully fixed ;-)
msg402417 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-09-22 10:16
New changeset 8620be99da930230b18ec05f4d7446ee403531af by Victor Stinner in branch 'main':
bpo-45061: Revert unicode_is_singleton() change (GH-28516)
https://github.com/python/cpython/commit/8620be99da930230b18ec05f4d7446ee403531af
History
Date User Action Args
2022-04-11 14:59:49adminsetgithub: 89224
2021-09-22 10:16:57vstinnersetmessages: + msg402417
2021-09-22 09:23:49vstinnersetpull_requests: + pull_request26908
2021-09-21 21:45:48vstinnersetmessages: + msg402371
2021-09-21 21:43:16vstinnersetmessages: + msg402370
2021-09-21 21:19:40vstinnersetpull_requests: + pull_request26899
2021-09-21 21:04:41vstinnersetmessages: + msg402365
2021-09-21 20:38:24vstinnersetpull_requests: + pull_request26898
2021-09-02 08:52:39Christian.Tismersetmessages: + msg400904
2021-09-01 14:27:00vstinnersetmessages: + msg400839
2021-09-01 12:32:21Christian.Tismersetmessages: + msg400829
2021-09-01 12:29:19Christian.Tismersetnosy: + Christian.Tismer
messages: + msg400828
2021-08-31 16:05:53vstinnersetstatus: open -> closed
resolution: fixed
stage: patch review -> resolved
2021-08-31 16:05:22vstinnersetmessages: + msg400753
2021-08-31 15:53:28miss-islingtonsetmessages: + msg400748
2021-08-31 15:34:56miss-islingtonsetnosy: + miss-islington
pull_requests: + pull_request26539
2021-08-31 15:34:39vstinnersetmessages: + msg400746
2021-08-31 13:38:14vstinnersetpull_requests: + pull_request26533
2021-08-31 13:34:42vstinnersetmessages: + msg400730
2021-08-31 13:24:06vstinnersetstage: patch review
pull_requests: + pull_request26532
2021-08-31 13:19:06vstinnersetfiles: + os_uname_refcount_bug.patch
keywords: + patch
messages: + msg400729
2021-08-31 13:17:39vstinnercreate