This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: Returning None from a callback with restype py_object decrements None's refcount too much
Type: crash Stage: patch review
Components: ctypes Versions: Python 3.10, Python 3.9, Python 3.8
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: anthonypjshaw Nosy List: anthonypjshaw, dgelessus, eryksun, freakboy3742
Priority: normal Keywords: easy (C), patch

Created on 2019-05-10 22:39 by dgelessus, last changed 2022-04-11 14:59 by admin.

Files
File name Uploaded Description Edit
python3.8_2019-05-11-000251.crash dgelessus, 2019-05-10 22:38 OS X crash report (Python 3.8)
test_gc_ctypes.py anthonypjshaw, 2019-05-11 06:10
Pull Requests
URL Status Linked Edit
PR 13364 open dgelessus, 2019-05-16 17:07
Messages (10)
msg342141 - (view) Author: (dgelessus) * Date: 2019-05-10 22:38
This occurs when writing a ctypes callback in Python whose restype is ctypes.py_object. If the callback returns None, the refcount of None is decremented once too often. This happens every time the callback is called, and if done often enough, Python attempts to deallocate None and crashes.

To reproduce:

    $ bin/python3
    Python 3.8.0a4+ (heads/master:09532feeec, May 10 2019, 23:53:49) 
    [Clang 8.0.0 (clang-800.0.42.1)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import sys
    >>> import ctypes
    >>> FUNTYPE = ctypes.CFUNCTYPE(ctypes.py_object)
    >>> @FUNTYPE
    ... def fun():
    ...     return None
    ... 
    >>> print(fun())
    None
    >>> sys.getrefcount(None)
    4329
    >>> print(fun())
    None
    >>> sys.getrefcount(None)
    4327
    >>> print(fun())
    None
    >>> sys.getrefcount(None)
    4326
    >>> while True:
    ...     fun()
    ... 
    Fatal Python error: deallocating None
    
    Current thread 0x00007fff7bf80000 (most recent call first):
      File "<stdin>", line 2 in <module>
    Abort trap: 6
    # exits with code 134 (SIGABRT)

I've attached the crash report generated by OS X. (It's a plain text file, despite the extension.)

Interestingly, this only happens with None. Other returned objects are refcounted correctly:

    >>> thing = object()
    >>> @FUNTYPE
    ... def otherfun():
    ...     return thing
    ... 
    >>> print(otherfun())
    <object object at 0x10d920220>
    >>> sys.getrefcount(thing)
    2
    >>> print(otherfun())
    <object object at 0x10d920220>
    >>> sys.getrefcount(thing)
    2
    >>> print(otherfun())
    <object object at 0x10d920220>
    >>> sys.getrefcount(thing)
    2

I could reproduce this on the following configurations:

* Python 3.8.0a4 (self-compiled from master branch: 09532feeece39d5ba68a0d47115ce1967bfbd58e) on OS X 10.11 (x86_64)
* Python 3.7.3 (python.org installer) on OS X 10.11 (x86_64)
* Python 3.6.8 (from package manager) on Kubuntu 18.10 (x86_64)
* Python 3.5.3 (from package manager) on Raspbian Stretch (armv6)
msg342163 - (view) Author: anthony shaw (anthonypjshaw) * (Python triager) Date: 2019-05-11 05:58
Thanks, I'll check this out
msg342164 - (view) Author: anthony shaw (anthonypjshaw) * (Python triager) Date: 2019-05-11 06:03
The documentation says:

>>
Note Make sure you keep references to CFUNCTYPE() objects as long as they are used from C code. ctypes doesn’t, and if you don’t, they may be garbage collected, crashing your program when a callback is made.
Also, note that if the callback function is called in a thread created outside of Python’s control (e.g. by the foreign code that calls the callback), ctypes creates a new dummy Python thread on every invocation. This behavior is correct for most purposes, but it means that values stored with threading.local will not survive across different callbacks, even when those calls are made from the same C thread.

But that doesn't describe the situation you've shared. I'll continue to look into the ctypes module
msg342165 - (view) Author: anthony shaw (anthonypjshaw) * (Python triager) Date: 2019-05-11 06:10
From lldb 
(lldb) run ~/cpython/test_gc_ctypes.py
Process 20059 launched: '/Users/anthonyshaw/CLionProjects/cpython/python.exe' (x86_64)
Fatal Python error: deallocating None

Current thread 0x00000001005c85c0 (most recent call first):
  File "/Users/anthonyshaw/cpython/test_gc_ctypes.py", line 7 in <module>
Objects/typeobject.c:3187: _Py_NegativeRefcount: Assertion failed: object has negative ref count
Enable tracemalloc to get the memory block allocation traceback

object  : <refcnt -1 at 0x10038a2c0>
type    : NoneType
refcount: -1
address : 0x10038a2c0
Process 20059 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGABRT
    frame #0: 0x00007fff5984f2c6 libsystem_kernel.dylib`__pthread_kill + 10
libsystem_kernel.dylib`__pthread_kill:
->  0x7fff5984f2c6 <+10>: jae    0x7fff5984f2d0            ; <+20>
    0x7fff5984f2c8 <+12>: movq   %rax, %rdi
    0x7fff5984f2cb <+15>: jmp    0x7fff59849457            ; cerror_nocancel
    0x7fff5984f2d0 <+20>: retq   
Target 0: (python.exe) stopped.
msg342166 - (view) Author: anthony shaw (anthonypjshaw) * (Python triager) Date: 2019-05-11 06:27
Full trace for reference:

(lldb) bt all
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGABRT
  * frame #0: 0x00007fff5984f2c6 libsystem_kernel.dylib`__pthread_kill + 10
    frame #1: 0x00007fff59904bf1 libsystem_pthread.dylib`pthread_kill + 284
    frame #2: 0x00007fff597b96a6 libsystem_c.dylib`abort + 127
    frame #3: 0x0000000100236c14 python.exe`fatal_error(prefix=0x0000000000000000, msg="_PyObject_AssertFailed", status=-1) at pylifecycle.c:2128:9
    frame #4: 0x0000000100236a5e python.exe`Py_FatalError(msg="_PyObject_AssertFailed") at pylifecycle.c:2138:5
    frame #5: 0x00000001000b49f4 python.exe`_PyObject_AssertFailed(obj=0x000000010038a2c0, expr=0x0000000000000000, msg="object has negative ref count", file="Objects/typeobject.c", line=3187, function="_Py_NegativeRefcount") at object.c:2163:5
    frame #6: 0x00000001000b4a44 python.exe`_Py_NegativeRefcount(filename="Objects/typeobject.c", lineno=3187, op=0x000000010038a2c0) at object.c:208:5
    frame #7: 0x00000001000d2bad python.exe`_Py_DECREF(filename="Objects/typeobject.c", lineno=3187, op=0x000000010038a2c0) at object.h:468:13
    frame #8: 0x00000001000d5889 python.exe`_PyType_Lookup(type=0x00000001003e4ad0, name=0x000000010064bba0) at typeobject.c:3187:9
    frame #9: 0x00000001000b731f python.exe`_PyObject_GenericGetAttrWithDict(obj=0x0000000100645dd0, name=0x000000010064bba0, dict=0x0000000000000000, suppress=0) at object.c:1221:13
    frame #10: 0x00000001000b7243 python.exe`PyObject_GenericGetAttr(obj=0x0000000100645dd0, name=0x000000010064bba0) at object.c:1309:12
    frame #11: 0x00000001000b69a8 python.exe`PyObject_GetAttr(v=0x0000000100645dd0, name=0x000000010064bba0) at object.c:916:16
    frame #12: 0x0000000100301599 python.exe`textiowrapper_closed_get(self=0x00000001006aba50, context=0x0000000000000000) at textio.c:3015:12
    frame #13: 0x000000010004eedf python.exe`getset_get(descr=0x000000010122f210, obj=0x00000001006aba50, type=0x00000001003e7280) at descrobject.c:158:16
    frame #14: 0x00000001000b738c python.exe`_PyObject_GenericGetAttrWithDict(obj=0x00000001006aba50, name=0x00000001016f00a8, dict=0x0000000000000000, suppress=0) at object.c:1228:19
    frame #15: 0x00000001000b7243 python.exe`PyObject_GenericGetAttr(obj=0x00000001006aba50, name=0x00000001016f00a8) at object.c:1309:12
    frame #16: 0x00000001000b69a8 python.exe`PyObject_GetAttr(v=0x00000001006aba50, name=0x00000001016f00a8) at object.c:916:16
    frame #17: 0x00000001000b68e4 python.exe`PyObject_GetAttrString(v=0x00000001006aba50, name="closed") at object.c:821:11
    frame #18: 0x00000001002399cc python.exe`file_is_closed(fobj=0x00000001006aba50) at pylifecycle.c:1074:21
    frame #19: 0x0000000100235f64 python.exe`flush_std_files at pylifecycle.c:1094:45
    frame #20: 0x0000000100236bfd python.exe`fatal_error(prefix=0x0000000000000000, msg="deallocating None", status=-1) at pylifecycle.c:2116:9
    frame #21: 0x0000000100236a5e python.exe`Py_FatalError(msg="deallocating None") at pylifecycle.c:2138:5
    frame #22: 0x00000001000b8318 python.exe`none_dealloc(ignore=0x000000010038a2c0) at object.c:1551:5
    frame #23: 0x00000001000ba6ba python.exe`_Py_Dealloc(op=0x000000010038a2c0) at object.c:2178:5
    frame #24: 0x00000001001dec2b python.exe`_Py_DECREF(filename="Python/ceval.c", lineno=1200, op=0x000000010038a2c0) at object.h:473:9
    frame #25: 0x00000001001cd444 python.exe`_PyEval_EvalFrameDefault(f=0x000000010130b860, throwflag=0) at ceval.c:1200:13
    frame #26: 0x00000001001cc277 python.exe`PyEval_EvalFrameEx(f=0x000000010130b860, throwflag=0) at ceval.c:625:12
    frame #27: 0x00000001001e3347 python.exe`_PyEval_EvalCodeWithName(_co=0x000000010146b880, globals=0x0000000101246e50, locals=0x0000000101246e50, args=0x0000000000000000, argcount=0, kwnames=0x0000000000000000, kwargs=0x0000000000000000, kwcount=0, kwstep=2, defs=0x0000000000000000, defcount=0, kwdefs=0x0000000000000000, closure=0x0000000000000000, name=0x0000000000000000, qualname=0x0000000000000000) at ceval.c:4036:14
    frame #28: 0x00000001001cc1f0 python.exe`PyEval_EvalCodeEx(_co=0x000000010146b880, globals=0x0000000101246e50, locals=0x0000000101246e50, args=0x0000000000000000, argcount=0, kws=0x0000000000000000, kwcount=0, defs=0x0000000000000000, defcount=0, kwdefs=0x0000000000000000, closure=0x0000000000000000) at ceval.c:4065:12
    frame #29: 0x00000001001cc04e python.exe`PyEval_EvalCode(co=0x000000010146b880, globals=0x0000000101246e50, locals=0x0000000101246e50) at ceval.c:602:12
    frame #30: 0x0000000100249f55 python.exe`run_eval_code_obj(co=0x000000010146b880, globals=0x0000000101246e50, locals=0x0000000101246e50) at pythonrun.c:1047:9
    frame #31: 0x00000001002483f7 python.exe`run_mod(mod=0x0000000102017438, filename=0x0000000101495d30, globals=0x0000000101246e50, locals=0x0000000101246e50, flags=0x00007ffeefbff690, arena=0x00000001012e2520) at pythonrun.c:1063:9
    frame #32: 0x00000001002478b5 python.exe`PyRun_FileExFlags(fp=0x00007fff8c715030, filename_str="/Users/anthonyshaw/cpython/test_gc_ctypes.py", start=257, globals=0x0000000101246e50, locals=0x0000000101246e50, closeit=1, flags=0x00007ffeefbff690) at pythonrun.c:994:11
    frame #33: 0x0000000100246c91 python.exe`PyRun_SimpleFileExFlags(fp=0x00007fff8c715030, filename="/Users/anthonyshaw/cpython/test_gc_ctypes.py", closeit=1, flags=0x00007ffeefbff690) at pythonrun.c:429:13
    frame #34: 0x00000001002467bc python.exe`PyRun_AnyFileExFlags(fp=0x00007fff8c715030, filename="/Users/anthonyshaw/cpython/test_gc_ctypes.py", closeit=1, flags=0x00007ffeefbff690) at pythonrun.c:85:16
    frame #35: 0x0000000100277095 python.exe`pymain_run_file(config=0x0000000101800ca8, cf=0x00007ffeefbff690) at main.c:346:15
    frame #36: 0x000000010027641d python.exe`pymain_run_python(exitcode=0x00007ffeefbff75c) at main.c:511:21
    frame #37: 0x00000001002760cc python.exe`_Py_RunMain at main.c:583:24
    frame #38: 0x0000000100276612 python.exe`pymain_main(args=0x00007ffeefbff7c8) at main.c:612:12
    frame #39: 0x0000000100276655 python.exe`_Py_UnixMain(argc=2, argv=0x00007ffeefbff830) at main.c:636:12
    frame #40: 0x0000000100000d82 python.exe`main(argc=2, argv=0x00007ffeefbff830) at python.c:16:12
    frame #41: 0x00007fff597143d5 libdyld.dylib`start + 1
msg342174 - (view) Author: anthony shaw (anthonypjshaw) * (Python triager) Date: 2019-05-11 07:41
In the stack it's calling none_dealloc() which should never happen.

Assume this is being triggered by ctypes PyCFuncPtr_call.

The stacktrace I'm getting comes after the double decref so it's not showing the root cause. Someone who knows ctypes better might be able to help
msg342186 - (view) Author: Eryk Sun (eryksun) * (Python triager) Date: 2019-05-11 13:31
This is due to an oversight in _CallPythonObject in Modules/_ctypes/callbacks.c. The setfunc protocol is to return None if there's no object to keep alive. This isn't applicable to py_object (i.e. O_set in Modules/_ctypes/cfield.c). So the problem is the unconditional decref of None in the following snippet from _CallPythonObject:

    if (keep == NULL) /* Could not convert callback result. */
        PyErr_WriteUnraisable(callable);
    else if (keep == Py_None) /* Nothing to keep */
        Py_DECREF(keep);
    else if (setfunc != _ctypes_get_fielddesc("O")->setfunc) {
        if (-1 == PyErr_WarnEx(PyExc_RuntimeWarning,
                               "memory leak in callback function.",
                               1))
            PyErr_WriteUnraisable(callable);
    }

I'd rewrite it as follows:

    if (keep == NULL) { /* Could not convert callback result. */
        PyErr_WriteUnraisable(callable);
    } else if (setfunc != _ctypes_get_fielddesc("O")->setfunc) {
        if (keep == Py_None) { /* Nothing to keep */
            Py_DECREF(keep);
        } else if (PyErr_WarnEx(PyExc_RuntimeWarning, "memory leak "
                   "in callback function.", 1) == -1) {
            PyErr_WriteUnraisable(callable);
        }
    }
msg342221 - (view) Author: anthony shaw (anthonypjshaw) * (Python triager) Date: 2019-05-11 21:02
Thanks Eryk that saved a lot of debugging. 

dgelessus - if you want to write a patch for CPython am happy to take you through this and get it over the line. 

Else: am Happy to write a test against the gc counter and a patch for this
msg342304 - (view) Author: (dgelessus) * Date: 2019-05-13 10:17
Thank you for looking into this! I can confirm that Eryk Sun's change fixes the issue for me locally.

I'm up for making the patch for this. Regarding tests, I see there are already some refcount-related ctypes tests in Lib/ctypes/test/test_refcounts.py - should I add another test case there that reproduces this situation and ensures that None's refcount is unaffected? (I imagine testing against None's refcount will be a bit fragile, so it might be best to call the previously-buggy function in a loop and check afterwards that None's refcount hasn't changed significantly.)
msg342387 - (view) Author: anthony shaw (anthonypjshaw) * (Python triager) Date: 2019-05-13 21:23
If you can write a test similar to the AnotherLeak.test_callback test case, and commit that first.

It will show in the CI/CD log as failed and verify the issue (incase it comes up in PR review)

Then add another commit with the patch itself and we should see the CI/CD pass.

Please follow the PR guide if this is your first PR to CPython https://devguide.python.org/pullrequest/
History
Date User Action Args
2022-04-11 14:59:15adminsetgithub: 81061
2021-03-13 06:00:14eryksunsetversions: + Python 3.9, Python 3.10, - Python 3.5, Python 3.6, Python 3.7
2019-05-16 17:07:58dgelessussetkeywords: + patch
stage: patch review
pull_requests: + pull_request13273
2019-05-13 21:23:43anthonypjshawsetmessages: + msg342387
2019-05-13 10:17:21dgelessussetmessages: + msg342304
2019-05-11 21:02:28anthonypjshawsetmessages: + msg342221
2019-05-11 15:32:22freakboy3742setnosy: + freakboy3742
2019-05-11 13:31:50eryksunsetkeywords: + easy (C)
nosy: + eryksun
messages: + msg342186

2019-05-11 07:41:34anthonypjshawsetmessages: + msg342174
2019-05-11 06:27:01anthonypjshawsetmessages: + msg342166
2019-05-11 06:10:16anthonypjshawsetfiles: + test_gc_ctypes.py

messages: + msg342165
2019-05-11 06:03:48anthonypjshawsetmessages: + msg342164
2019-05-11 05:58:39anthonypjshawsetassignee: anthonypjshaw

messages: + msg342163
nosy: + anthonypjshaw
2019-05-10 22:39:01dgelessuscreate