
classification
Title: ctypes pointee goes out of scope, then pointer in struct dangles and crashes
Type: crash
Stage: resolved
Components: ctypes
Versions: Python 3.8, Python 3.7, Python 3.6
process
Status: closed
Resolution: third party
Dependencies:
Superseder:
Assigned To:
Nosy List: NankerPhelge, eryksun
Priority: normal
Keywords:

Created on 2020-09-29 03:40 by NankerPhelge, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Messages (6)
msg377652 - (view) Author: Ian M. Hoffman (NankerPhelge) Date: 2020-09-29 03:40
A description of the problem, complete example code for reproducing it, and a work-around are available on SO at the link:

https://stackoverflow.com/questions/64083376/python-memory-corruption-after-successful-return-from-a-ctypes-foreign-function

In summary: (1) create an array within a Python function, (2) create a ctypes.Structure with a pointer to that array, (3) return that struct from the Python function, (4) pass the struct out and back to a foreign function, (5) Python can successfully dereference the return from the foreign function, then (6) Python crashes.

As far as I can tell, when the array in the function goes out of scope at the end of the function, the pointer to it in the struct becomes dangling ... but the dangling doesn't catch up with Python until the very end when the Python struct finally goes out of scope in Python and the GC can't find its pointee.

I've reproduced this on Windows and Linux with gcc- and MSVC-compiled Python 3.6 and 3.8.

Perhaps it is not good practice on my part to have let the array go out of scope, but perhaps a warning from Python (or at least some internal awareness that the memory is no longer addressed) is in order so that Python doesn't crash upon failing to free it.

This may be related to #39217; I can't tell.
msg377657 - (view) Author: Eryk Sun (eryksun) * (Python triager) Date: 2020-09-29 05:46
I think this is a numpy issue. Its data_as() method doesn't support the ctypes _objects protocol to keep the numpy array referenced by subsequently created ctypes objects. For example:

    import ctypes
    import numpy as np

    dtype = ctypes.c_double
    ptype = ctypes.POINTER(dtype)

    class array_struct(ctypes.Structure):
        _fields_ = (('dim1', ctypes.c_int),
                    ('dim2', ctypes.c_int),
                    ('ptr', ptype))

    n = 12
    m = 50
    a = np.ones((n, m), dtype=dtype)
    p = a.ctypes.data_as(ptype)

data_as() is implemented as a cast() of the ctypes._data pointer. This is a c_void_p instance for the base address of the numpy array, but it doesn't reference the numpy array object itself. The pointer returned by cast() references this _data pointer in its _objects:

    >>> p._objects
    {139993690976448: c_void_p(42270224)}

data_as() also sets a reference to the numpy array object as a non-ctypes _arr attribute:

    >>> p._arr is a
    True

This _arr attribute keeps the array referenced for the particular instance, but ctypes isn't aware of it. When the object returned by data_as() is set in a struct, ctypes only carries forward the c_void_p reference from _objects that it's aware of:

    >>> a_wrap1 = array_struct(n, m, p)
    >>> a_wrap1._objects
    {'2': {139993690976448: c_void_p(42270224)}}

It would be sufficient to keep the numpy array alive if this c_void_p instance referenced the array, but it doesn't. Alternatively, data_as() could update the _objects dict to reference the array. For example:

    >>> p._objects['1'] = a
    >>> a_wrap2 = array_struct(n, m, p)
    >>> a_wrap2._objects['2']['1'] is a
    True
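
The difference between the two cases above can be sketched without numpy. This is an illustration under stated assumptions, not numpy's implementation: an `array.array` stands in for the numpy array, `bare_pointer` and `make_struct` are hypothetical helpers, and the `'1'` key simply mirrors the example above. It also relies on CPython's reference counting and on the private `_objects` attribute.

```python
import array
import ctypes
import gc
import weakref

ptype = ctypes.POINTER(ctypes.c_double)

class array_struct(ctypes.Structure):
    _fields_ = (('dim1', ctypes.c_int),
                ('dim2', ctypes.c_int),
                ('ptr', ptype))

def bare_pointer(owner):
    # Hypothetical helper: cast the buffer address like data_as() does.
    # The c_void_p holds only an integer address, not a reference to owner.
    addr, _ = owner.buffer_info()
    return ctypes.cast(ctypes.c_void_p(addr), ptype)

def make_struct(register_owner):
    # Stand-in for the numpy array created inside a function (assumption).
    owner = array.array('d', [1.0] * 6)
    p = bare_pointer(owner)
    if register_owner:
        # Register the owner where ctypes will carry it forward.
        p._objects['1'] = owner
    return array_struct(2, 3, p), weakref.ref(owner)

s_bad, ref_bad = make_struct(False)
s_ok, ref_ok = make_struct(True)
gc.collect()

# Without registration the owner is gone, so s_bad.ptr dangles.
# It must not be dereferenced.
owner_freed = ref_bad() is None
# With registration the struct's _objects keeps the owner alive.
owner_kept = ref_ok() is not None
first_value = s_ok.ptr[0]
```

Only `s_ok.ptr` is safe to read here; the weak references are used solely to observe whether the owner survived the function return.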
msg377674 - (view) Author: Ian M. Hoffman (NankerPhelge) Date: 2020-09-29 16:45
You are correct.

After further review, I found an older ctypes issue #12836 which was then enshrined in a workaround in the numpy.ndarray.ctypes interface to vanilla ctypes.

https://numpy.org/doc/stable/reference/generated/numpy.ndarray.ctypes.html

NumPy's ctypes interface has both a `data` attribute, for which "a reference will not be kept to the array", and a `data_as` method, which has the desired behavior: "The returned pointer will keep a reference to the array."

So, we've all got our workarounds. What remains is whether/how to implement a check in Python for the dangling pointer. I have no advice on that, except that it is desirable to avoid the crash, no matter who is to blame.
msg377691 - (view) Author: Eryk Sun (eryksun) * (Python triager) Date: 2020-09-30 03:49
> `data_as` method which has the desired behavior: "The returned 
> pointer will keep a reference to the array."

I don't think it's the desired behavior at all. data_as() sets an _arr attribute of which ctypes isn't aware. It should cast the address to the given type and manually set the array reference in the _objects dict, which ctypes will automatically carry forward in all instances of aggregate types (structs and arrays) that reference the numpy array. For example:

    >>> p = a.ctypes.data_as(ptype)
    >>> p._objects['1'] = a

Adding p to an array carries its _objects dict forward:

    >>> ptarr = (ptype * 1)(p)
    >>> ptarr._objects['0']['1'] is a
    True
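
The same carry-forward should hold through more than one level of nesting. The following is a sketch under assumptions rather than a statement about numpy: an `array.array` stands in for the numpy array, the struct layout is the one from the earlier message, and the `'1'` key is the one used above. It depends on the private `_objects` attribute and on CPython's reference counting.

```python
import array
import ctypes
import gc
import weakref

ptype = ctypes.POINTER(ctypes.c_double)

class array_struct(ctypes.Structure):
    _fields_ = (('dim1', ctypes.c_int),
                ('dim2', ctypes.c_int),
                ('ptr', ptype))

# Stand-in for the numpy array (assumption: any object that owns a buffer).
owner = array.array('d', [1.0] * 6)
owner_ref = weakref.ref(owner)

# Bare pointer to the owner's buffer, then register the owner by hand.
addr, _ = owner.buffer_info()
p = ctypes.cast(ctypes.c_void_p(addr), ptype)
p._objects['1'] = owner

# Nest the pointer: first a struct field, then an array of structs.
s = array_struct(2, 3, p)
structs = (array_struct * 1)(s)
nested_ok = structs._objects['0']['2']['1'] is owner

# Drop every direct name; only the outer array's _objects chain remains.
del owner, p, s
gc.collect()
owner_alive = owner_ref() is not None
first_value = structs[0].ptr[0]
```

The outer array reaches the owner through `'0'` (array index), `'2'` (field index of `ptr`), and `'1'` (the manually added key).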

If the returned pointer is cast() again, then bpo-12836 is an issue. For example:

    >>> p2 = ctypes.cast(p, ctypes.c_void_p)
    >>> p._objects is p2._objects
    True
    >>> for k in p._objects:
    ...     if p._objects[k] is p:
    ...         print('circular reference')
    ... 
    circular reference

That needs to be fixed. But relying on _arr instead of correctly integrating with ctypes isn't a good idea, IMO. Work around the actual bug instead of introducing behavior that risks crashing just for the sake of resolving an uncommon circular reference problem.
msg377712 - (view) Author: Ian M. Hoffman (NankerPhelge) Date: 2020-09-30 16:39
I agree with you. When I wrote "desired behavior" I intended it to mean "my selfishly desired outcome of not loading my struct with a dangling pointer." This issue seems to have descended into workarounds that treat the symptoms; I'm all for treating the cause.
msg389053 - (view) Author: Eryk Sun (eryksun) * (Python triager) Date: 2021-03-19 04:58
The ctypes issue is bpo-12836, which has a suggested solution. This issue is a third-party problem introduced by a workaround, which needs to be addressed at the source, such as with helper functions and subclasses that close the loop.
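
One possible shape for such a helper is sketched below. It swaps in a different technique from the one discussed in this thread: instead of editing `_objects` by hand, it builds a ctypes array over the owner's memory with `from_buffer()`, which records a reference to the exporting object itself. `wrap_buffer` and `make_struct` are hypothetical names, and an `array.array` stands in for the numpy array; whether this suits a given numpy array (it needs a writable buffer) is an assumption to check.

```python
import array
import ctypes
import gc
import weakref

dtype = ctypes.c_double
ptype = ctypes.POINTER(dtype)

class array_struct(ctypes.Structure):
    _fields_ = (('dim1', ctypes.c_int),
                ('dim2', ctypes.c_int),
                ('ptr', ptype))

def wrap_buffer(owner, dim1, dim2):
    """Hypothetical helper: build an array_struct that keeps owner alive."""
    # from_buffer() shares the owner's memory and references it via _objects.
    view = (dtype * (dim1 * dim2)).from_buffer(owner)
    # cast() shares the view's _objects, so the struct inherits the reference.
    return array_struct(dim1, dim2, ctypes.cast(view, ptype))

def make_struct():
    # The owner is local to this function, as in the original report.
    owner = array.array('d', range(6))
    return wrap_buffer(owner, 2, 3), weakref.ref(owner)

s, owner_ref = make_struct()
gc.collect()

# The owner outlives the function because the struct references it.
still_alive = owner_ref() is not None
last_value = s.ptr[5]
```

Note that the intermediate `cast()` still creates the self-reference described in bpo-12836, so this sketch avoids the dangling pointer but not that cycle.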
History
Date                 User          Action  Args
2022-04-11 14:59:36  admin         set     github: 86049
2021-03-19 04:58:54  eryksun       set     status: open -> closed; resolution: third party; messages: + msg389053; stage: resolved
2020-09-30 16:39:00  NankerPhelge  set     messages: + msg377712
2020-09-30 03:49:55  eryksun       set     messages: + msg377691
2020-09-29 16:45:37  NankerPhelge  set     messages: + msg377674
2020-09-29 05:46:01  eryksun       set     nosy: + eryksun; messages: + msg377657
2020-09-29 03:40:06  NankerPhelge  create