Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Returning None from a callback with restype py_object decrements None's refcount too much #81061

Closed
dgelessus mannequin opened this issue May 10, 2019 · 11 comments
Closed
Assignees
Labels
3.8 only security fixes 3.9 only security fixes 3.10 only security fixes easy topic-ctypes type-crash A hard crash of the interpreter, possibly with a core dump

Comments

@dgelessus
Copy link
Mannequin

dgelessus mannequin commented May 10, 2019

BPO 36880
Nosy @freakboy3742, @eryksun, @tonybaloney, @dgelessus
PRs
  • GH-81061: Fix refcount issue when returning None from a ctypes.py_object callback #13364
  • Files
  • python3.8_2019-05-11-000251.crash: OS X crash report (Python 3.8)
  • test_gc_ctypes.py
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/tonybaloney'
    closed_at = None
    created_at = <Date 2019-05-10.22:39:01.064>
    labels = ['easy', '3.8', '3.9', '3.10', 'ctypes', 'type-crash']
    title = "Returning None from a callback with restype py_object decrements None's refcount too much"
    updated_at = <Date 2021-03-13.06:00:14.026>
    user = 'https://github.com/dgelessus'

    bugs.python.org fields:

    activity = <Date 2021-03-13.06:00:14.026>
    actor = 'eryksun'
    assignee = 'anthonypjshaw'
    closed = False
    closed_date = None
    closer = None
    components = ['ctypes']
    creation = <Date 2019-05-10.22:39:01.064>
    creator = 'dgelessus'
    dependencies = []
    files = ['48325', '48326']
    hgrepos = []
    issue_num = 36880
    keywords = ['patch', 'easy (C)']
    message_count = 10.0
    messages = ['342141', '342163', '342164', '342165', '342166', '342174', '342186', '342221', '342304', '342387']
    nosy_count = 4.0
    nosy_names = ['freakboy3742', 'eryksun', 'anthonypjshaw', 'dgelessus']
    pr_nums = ['13364']
    priority = 'normal'
    resolution = None
    stage = 'patch review'
    status = 'open'
    superseder = None
    type = 'crash'
    url = 'https://bugs.python.org/issue36880'
    versions = ['Python 3.8', 'Python 3.9', 'Python 3.10']

    Linked PRs

    @dgelessus
    Copy link
    Mannequin Author

    dgelessus mannequin commented May 10, 2019

    This occurs when writing a ctypes callback in Python whose restype is ctypes.py_object. If the callback returns None, the refcount of None is decremented once too often. This happens every time the callback is called, and if done often enough, Python attempts to deallocate None and crashes.

    To reproduce:

        $ bin/python3
        Python 3.8.0a4+ (heads/master:09532feeec, May 10 2019, 23:53:49) 
        [Clang 8.0.0 (clang-800.0.42.1)] on darwin
        Type "help", "copyright", "credits" or "license" for more information.
        >>> import sys
        >>> import ctypes
        >>> FUNTYPE = ctypes.CFUNCTYPE(ctypes.py_object)
        >>> @FUNTYPE
        ... def fun():
        ...     return None
        ... 
        >>> print(fun())
        None
        >>> sys.getrefcount(None)
        4329
        >>> print(fun())
        None
        >>> sys.getrefcount(None)
        4327
        >>> print(fun())
        None
        >>> sys.getrefcount(None)
        4326
        >>> while True:
        ...     fun()
        ... 
        Fatal Python error: deallocating None
        
        Current thread 0x00007fff7bf80000 (most recent call first):
          File "<stdin>", line 2 in <module>
        Abort trap: 6
        # exits with code 134 (SIGABRT)

    I've attached the crash report generated by OS X. (It's a plain text file, despite the extension.)

    Interestingly, this only happens with None. Other returned objects are refcounted correctly:

        >>> thing = object()
        >>> @FUNTYPE
        ... def otherfun():
        ...     return thing
        ... 
        >>> print(otherfun())
        <object object at 0x10d920220>
        >>> sys.getrefcount(thing)
        2
        >>> print(otherfun())
        <object object at 0x10d920220>
        >>> sys.getrefcount(thing)
        2
        >>> print(otherfun())
        <object object at 0x10d920220>
        >>> sys.getrefcount(thing)
        2

    I could reproduce this on the following configurations:

    • Python 3.8.0a4 (self-compiled from master branch: 09532fe) on OS X 10.11 (x86_64)
    • Python 3.7.3 (python.org installer) on OS X 10.11 (x86_64)
    • Python 3.6.8 (from package manager) on Kubuntu 18.10 (x86_64)
    • Python 3.5.3 (from package manager) on Raspbian Stretch (armv6)

    @dgelessus dgelessus mannequin added 3.7 (EOL) end of life 3.8 only security fixes topic-ctypes type-crash A hard crash of the interpreter, possibly with a core dump labels May 10, 2019
    @tonybaloney
    Copy link
    Mannequin

    tonybaloney mannequin commented May 11, 2019

    Thanks, I'll check this out

    @tonybaloney tonybaloney mannequin self-assigned this May 11, 2019
    @tonybaloney
    Copy link
    Mannequin

    tonybaloney mannequin commented May 11, 2019

    The documentation says:

    >
    Note Make sure you keep references to CFUNCTYPE() objects as long as they are used from C code. ctypes doesn’t, and if you don’t, they may be garbage collected, crashing your program when a callback is made.
    Also, note that if the callback function is called in a thread created outside of Python’s control (e.g. by the foreign code that calls the callback), ctypes creates a new dummy Python thread on every invocation. This behavior is correct for most purposes, but it means that values stored with threading.local will not survive across different callbacks, even when those calls are made from the same C thread.

    But that doesn't describe the situation you've shared. I'll continue to look into the ctypes module

    @tonybaloney
    Copy link
    Mannequin

    tonybaloney mannequin commented May 11, 2019

    From lldb
    (lldb) run ~/cpython/test_gc_ctypes.py
    Process 20059 launched: '/Users/anthonyshaw/CLionProjects/cpython/python.exe' (x86_64)
    Fatal Python error: deallocating None

    Current thread 0x00000001005c85c0 (most recent call first):
    File "/Users/anthonyshaw/cpython/test_gc_ctypes.py", line 7 in <module>
    Objects/typeobject.c:3187: _Py_NegativeRefcount: Assertion failed: object has negative ref count
    Enable tracemalloc to get the memory block allocation traceback

    object : <refcnt -1 at 0x10038a2c0>
    type : NoneType
    refcount: -1
    address : 0x10038a2c0
    Process 20059 stopped

    • thread Support "bpo-" in Misc/NEWS #1, queue = 'com.apple.main-thread', stop reason = signal SIGABRT
      frame #0: 0x00007fff5984f2c6 libsystem_kernel.dylib__pthread_kill + 10 libsystem_kernel.dylib__pthread_kill:
      -> 0x7fff5984f2c6 <+10>: jae 0x7fff5984f2d0 ; <+20>
      0x7fff5984f2c8 <+12>: movq %rax, %rdi
      0x7fff5984f2cb <+15>: jmp 0x7fff59849457 ; cerror_nocancel
      0x7fff5984f2d0 <+20>: retq
      Target 0: (python.exe) stopped.

    @tonybaloney
    Copy link
    Mannequin

    tonybaloney mannequin commented May 11, 2019

    Full trace for reference:

    (lldb) bt all

    • thread Support "bpo-" in Misc/NEWS #1, queue = 'com.apple.main-thread', stop reason = signal SIGABRT
      • frame #0: 0x00007fff5984f2c6 libsystem_kernel.dylib__pthread_kill + 10 frame #1: 0x00007fff59904bf1 libsystem_pthread.dylibpthread_kill + 284
        frame Rename README to README.rst and enhance formatting #2: 0x00007fff597b96a6 libsystem_c.dylibabort + 127 frame #3: 0x0000000100236c14 python.exefatal_error(prefix=0x0000000000000000, msg="_PyObject_AssertFailed", status=-1) at pylifecycle.c:2128:9
        frame Update Python Software Foundation Copyright Year. #4: 0x0000000100236a5e python.exePy_FatalError(msg="_PyObject_AssertFailed") at pylifecycle.c:2138:5 frame #5: 0x00000001000b49f4 python.exe_PyObject_AssertFailed(obj=0x000000010038a2c0, expr=0x0000000000000000, msg="object has negative ref count", file="Objects/typeobject.c", line=3187, function="_Py_NegativeRefcount") at object.c:2163:5
        frame Add Pycharm's .idea directory to gitignore #6: 0x00000001000b4a44 python.exe_Py_NegativeRefcount(filename="[Objects/typeobject.c](https://github.com/python/cpython/blob/main/Objects/typeobject.c)", lineno=3187, op=0x000000010038a2c0) at object.c:208:5 frame #7: 0x00000001000d2bad python.exe_Py_DECREF(filename="Objects/typeobject.c", lineno=3187, op=0x000000010038a2c0) at object.h:468:13
        frame Change some mercurial/ hg.python.org references. #8: 0x00000001000d5889 python.exe_PyType_Lookup(type=0x00000001003e4ad0, name=0x000000010064bba0) at typeobject.c:3187:9 frame #9: 0x00000001000b731f python.exe_PyObject_GenericGetAttrWithDict(obj=0x0000000100645dd0, name=0x000000010064bba0, dict=0x0000000000000000, suppress=0) at object.c:1221:13
        frame bpo-29474: Improve documentation for weakref.WeakValueDictionary #10: 0x00000001000b7243 python.exePyObject_GenericGetAttr(obj=0x0000000100645dd0, name=0x000000010064bba0) at object.c:1309:12 frame #11: 0x00000001000b69a8 python.exePyObject_GetAttr(v=0x0000000100645dd0, name=0x000000010064bba0) at object.c:916:16
        frame bpo-29524: Add Objects/call.c file #12: 0x0000000100301599 python.exetextiowrapper_closed_get(self=0x00000001006aba50, context=0x0000000000000000) at textio.c:3015:12 frame #13: 0x000000010004eedf python.exegetset_get(descr=0x000000010122f210, obj=0x00000001006aba50, type=0x00000001003e7280) at descrobject.c:158:16
        frame Disable Travis docs job until a fix is found #14: 0x00000001000b738c python.exe_PyObject_GenericGetAttrWithDict(obj=0x00000001006aba50, name=0x00000001016f00a8, dict=0x0000000000000000, suppress=0) at object.c:1228:19 frame #15: 0x00000001000b7243 python.exePyObject_GenericGetAttr(obj=0x00000001006aba50, name=0x00000001016f00a8) at object.c:1309:12
        frame Make Travis docs build more lenient #16: 0x00000001000b69a8 python.exePyObject_GetAttr(v=0x00000001006aba50, name=0x00000001016f00a8) at object.c:916:16 frame #17: 0x00000001000b68e4 python.exePyObject_GetAttrString(v=0x00000001006aba50, name="closed") at object.c:821:11
        frame Rename Doc/README.txt to Doc/README.rst #18: 0x00000001002399cc python.exefile_is_closed(fobj=0x00000001006aba50) at pylifecycle.c:1074:21 frame #19: 0x0000000100235f64 python.exeflush_std_files at pylifecycle.c:1094:45
        frame Update link destination of the Mersenne Twister homepage #20: 0x0000000100236bfd python.exefatal_error(prefix=0x0000000000000000, msg="deallocating None", status=-1) at pylifecycle.c:2116:9 frame #21: 0x0000000100236a5e python.exePy_FatalError(msg="deallocating None") at pylifecycle.c:2138:5
        frame [backport to 3.6] bpo-29474: Improve documentation for weakref.WeakValueDictionary #22: 0x00000001000b8318 python.exenone_dealloc(ignore=0x000000010038a2c0) at object.c:1551:5 frame #23: 0x00000001000ba6ba python.exe_Py_Dealloc(op=0x000000010038a2c0) at object.c:2178:5
        frame bpo-28837: Fix lib2to3 handling of map/zip/filter #24: 0x00000001001dec2b python.exe_Py_DECREF(filename="[Python/ceval.c](https://github.com/python/cpython/blob/main/Python/ceval.c)", lineno=1200, op=0x000000010038a2c0) at object.h:473:9 frame #25: 0x00000001001cd444 python.exe_PyEval_EvalFrameDefault(f=0x000000010130b860, throwflag=0) at ceval.c:1200:13
        frame [backport to 3.5] bpo-29529: Add .travis.yml to 3.5 branch #26: 0x00000001001cc277 python.exePyEval_EvalFrameEx(f=0x000000010130b860, throwflag=0) at ceval.c:625:12 frame #27: 0x00000001001e3347 python.exe_PyEval_EvalCodeWithName(_co=0x000000010146b880, globals=0x0000000101246e50, locals=0x0000000101246e50, args=0x0000000000000000, argcount=0, kwnames=0x0000000000000000, kwargs=0x0000000000000000, kwcount=0, kwstep=2, defs=0x0000000000000000, defcount=0, kwdefs=0x0000000000000000, closure=0x0000000000000000, name=0x0000000000000000, qualname=0x0000000000000000) at ceval.c:4036:14
        frame bpo-28556: Various updates to typing #28: 0x00000001001cc1f0 python.exePyEval_EvalCodeEx(_co=0x000000010146b880, globals=0x0000000101246e50, locals=0x0000000101246e50, args=0x0000000000000000, argcount=0, kws=0x0000000000000000, kwcount=0, defs=0x0000000000000000, defcount=0, kwdefs=0x0000000000000000, closure=0x0000000000000000) at ceval.c:4065:12 frame #29: 0x00000001001cc04e python.exePyEval_EvalCode(co=0x000000010146b880, globals=0x0000000101246e50, locals=0x0000000101246e50) at ceval.c:602:12
        frame Allow up to a 0.01% drop in coverage #30: 0x0000000100249f55 python.exerun_eval_code_obj(co=0x000000010146b880, globals=0x0000000101246e50, locals=0x0000000101246e50) at pythonrun.c:1047:9 frame #31: 0x00000001002483f7 python.exerun_mod(mod=0x0000000102017438, filename=0x0000000101495d30, globals=0x0000000101246e50, locals=0x0000000101246e50, flags=0x00007ffeefbff690, arena=0x00000001012e2520) at pythonrun.c:1063:9
        frame bpo-29576: Improve some deprecations in the importlib #32: 0x00000001002478b5 python.exePyRun_FileExFlags(fp=0x00007fff8c715030, filename_str="/Users/anthonyshaw/cpython/test_gc_ctypes.py", start=257, globals=0x0000000101246e50, locals=0x0000000101246e50, closeit=1, flags=0x00007ffeefbff690) at pythonrun.c:994:11 frame #33: 0x0000000100246c91 python.exePyRun_SimpleFileExFlags(fp=0x00007fff8c715030, filename="/Users/anthonyshaw/cpython/test_gc_ctypes.py", closeit=1, flags=0x00007ffeefbff690) at pythonrun.c:429:13
        frame bpo-29026: Clarify documentation of time.time #34: 0x00000001002467bc python.exePyRun_AnyFileExFlags(fp=0x00007fff8c715030, filename="/Users/anthonyshaw/cpython/test_gc_ctypes.py", closeit=1, flags=0x00007ffeefbff690) at pythonrun.c:85:16 frame #35: 0x0000000100277095 python.exepymain_run_file(config=0x0000000101800ca8, cf=0x00007ffeefbff690) at main.c:346:15
        frame [backport to 3.5] bpo-28929: Link the documentation to its source file on GitHub #36: 0x000000010027641d python.exepymain_run_python(exitcode=0x00007ffeefbff75c) at main.c:511:21 frame #37: 0x00000001002760cc python.exe_Py_RunMain at main.c:583:24
        frame [backport to 2.7] bpo-28929: Link the documentation to its source file on GitHub #38: 0x0000000100276612 python.exepymain_main(args=0x00007ffeefbff7c8) at main.c:612:12 frame #39: 0x0000000100276655 python.exe_Py_UnixMain(argc=2, argv=0x00007ffeefbff830) at main.c:636:12
        frame [backport to 3.5] bpo-29438: Fixed use-after-free in key sharing dict #40: 0x0000000100000d82 python.exemain(argc=2, argv=0x00007ffeefbff830) at python.c:16:12 frame #41: 0x00007fff597143d5 libdyld.dylibstart + 1

    @tonybaloney
    Copy link
    Mannequin

    tonybaloney mannequin commented May 11, 2019

    In the stack it's calling none_dealloc() which should never happen.

    Assume this is being triggered by ctypes PyCFuncPtr_call.

    The stacktrace I'm getting comes after the double decref so it's not showing the root cause. Someone who knows ctypes better might be able to help

    @eryksun
    Copy link
    Contributor

    eryksun commented May 11, 2019

    This is due to an oversight in _CallPythonObject in Modules/_ctypes/callbacks.c. The setfunc protocol is to return None if there's no object to keep alive. This isn't applicable to py_object (i.e. O_set in Modules/_ctypes/cfield.c). So the problem is the unconditional decref of None in the following snippet from _CallPythonObject:

        if (keep == NULL) /* Could not convert callback result. */
            PyErr_WriteUnraisable(callable);
        else if (keep == Py_None) /* Nothing to keep */
            Py_DECREF(keep);
        else if (setfunc != _ctypes_get_fielddesc("O")->setfunc) {
            if (-1 == PyErr_WarnEx(PyExc_RuntimeWarning,
                                   "memory leak in callback function.",
                                   1))
                PyErr_WriteUnraisable(callable);
        }

    I'd rewrite it as follows:

        if (keep == NULL) { /* Could not convert callback result. */
            PyErr_WriteUnraisable(callable);
        } else if (setfunc != _ctypes_get_fielddesc("O")->setfunc) {
            if (keep == Py_None) { /* Nothing to keep */
                Py_DECREF(keep);
            } else if (PyErr_WarnEx(PyExc_RuntimeWarning, "memory leak "
                       "in callback function.", 1) == -1) {
                PyErr_WriteUnraisable(callable);
            }
        }

    @eryksun eryksun added the easy label May 11, 2019
    @tonybaloney
    Copy link
    Mannequin

    tonybaloney mannequin commented May 11, 2019

    Thanks Eryk that saved a lot of debugging.

    dgelessus - if you want to write a patch for CPython am happy to take you through this and get it over the line.

    Else: am Happy to write a test against the gc counter and a patch for this

    @dgelessus
    Copy link
    Mannequin Author

    dgelessus mannequin commented May 13, 2019

    Thank you for looking into this! I can confirm that Eryk Sun's change fixes the issue for me locally.

    I'm up for making the patch for this. Regarding tests, I see there are already some refcount-related ctypes tests in Lib/ctypes/test/test_refcounts.py - should I add another test case there that reproduces this situation and ensures that None's refcount is unaffected? (I imagine testing against None's refcount will be a bit fragile, so it might be best to call the previously-buggy function in a loop and check afterwards that None's refcount hasn't changed significantly.)

    @tonybaloney
    Copy link
    Mannequin

    tonybaloney mannequin commented May 13, 2019

    If you can write a test similar to the AnotherLeak.test_callback test case, and commit that first.

    It will show in the CI/CD log as failed and verify the issue (incase it comes up in PR review)

    Then add another commit with the patch itself and we should see the CI/CD pass.

    Please follow the PR guide if this is your first PR to CPython https://devguide.python.org/pullrequest/

    @eryksun eryksun added 3.9 only security fixes 3.10 only security fixes and removed 3.7 (EOL) end of life labels Mar 13, 2021
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    kumaraditya303 pushed a commit that referenced this issue Jan 9, 2023
    miss-islington pushed a commit to miss-islington/cpython that referenced this issue Jan 9, 2023
    …es.py_object` callback (pythonGH-13364)
    
    (cherry picked from commit 837ba05)
    
    Co-authored-by: dgelessus <dgelessus@users.noreply.github.com>
    miss-islington pushed a commit to miss-islington/cpython that referenced this issue Jan 9, 2023
    …es.py_object` callback (pythonGH-13364)
    
    (cherry picked from commit 837ba05)
    
    Co-authored-by: dgelessus <dgelessus@users.noreply.github.com>
    miss-islington added a commit that referenced this issue Jan 9, 2023
    …object` callback (GH-13364)
    
    (cherry picked from commit 837ba05)
    
    Co-authored-by: dgelessus <dgelessus@users.noreply.github.com>
    miss-islington added a commit that referenced this issue Jan 9, 2023
    …object` callback (GH-13364)
    
    (cherry picked from commit 837ba05)
    
    Co-authored-by: dgelessus <dgelessus@users.noreply.github.com>
    @hauntsaninja
    Copy link
    Contributor

    Thanks, looks like this was completed

    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.8 only security fixes 3.9 only security fixes 3.10 only security fixes easy topic-ctypes type-crash A hard crash of the interpreter, possibly with a core dump
    Projects
    None yet
    Development

    No branches or pull requests

    2 participants