Our application server, running on top of Twisted, crashes 1 to 3 times
per day. It uses a ctypes binding for libnetfilter_conntrack (to dump the
Linux conntrack table), which runs in a dedicated thread. Environment:
- Python 2.5.2
- Twisted 8.1.0-3
- Linux 2.6.26-1-amd64 SMP x86_64
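A minimal, hypothetical sketch of that threading pattern follows. A worker thread stands in for the conntrack binding and copies C memory into Python objects with ctypes.string_at() (the function that appears in the Helgrind trace below), while the main thread keeps allocating Python objects. The name run_reproducer and its parameters are illustrative and not taken from the application. It is written for a current Python 3 interpreter, and it is the kind of script one could run under Helgrind, not a guaranteed crash reproducer.

```python
import ctypes
import threading

def run_reproducer(worker_calls=10000, main_allocs=10000):
    """Hypothetical stand-in for the application's threading layout."""
    results = []

    def ctypes_worker():
        # Stand-in for the dedicated conntrack thread: repeatedly copy
        # a C buffer into a new Python bytes object via ctypes.
        buf = ctypes.create_string_buffer(b"conntrack entry")
        addr = ctypes.addressof(buf)
        for _ in range(worker_calls):
            results.append(ctypes.string_at(addr, len(buf.value)))

    worker = threading.Thread(target=ctypes_worker)
    worker.start()
    # Stand-in for the Twisted main thread: keep allocating Python objects
    # while the worker is calling into ctypes.
    junk = [str(i) * 3 for i in range(main_allocs)]
    worker.join()
    return results, len(junk)

copies, allocated = run_reproducer(1000, 1000)
print(len(copies), allocated)  # 1000 1000
```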
The crash does not occur in the "ctypes" thread but in the main thread
(another CPython thread). The backtrace is incoherent, which points to a
multithreading problem. So I used Helgrind (a Valgrind tool) to look for
data races, and here is one:
==30545== Possible data race during write of size 4 at 0x4EC1E60
==30545== at 0x808F616: PyString_FromStringAndSize (stringobject.c:78)
==30545== by 0x4D3CBD9: string_at (_ctypes.c:4568)
==30545== by 0x4D4654E: ffi_call_SYSV (sysv.S:60)
==30545== by 0x4D46396: ffi_call (ffi.c:221)
==30545== by 0x4D3E9F1: _call_function_pointer (callproc.c:668)
==30545== by 0x4D3F147: _CallProc (callproc.c:991)
==30545== by 0x4D3B0DA: CFuncPtr_call (_ctypes.c:3373)
==30545== by 0x8060E0A: PyObject_Call (abstract.c:1861)
==30545== by 0x80CB391: do_call (ceval.c:3784)
==30545== by 0x80CAD69: call_function (ceval.c:3596)
==30545== by 0x80C7B6F: PyEval_EvalFrameEx (ceval.c:2272)
==30545== by 0x80C9329: PyEval_EvalCodeEx (ceval.c:2836)
==30545== Old state: shared-readonly by threads #1, #4
==30545== New state: shared-modified by threads #1, #4
==30545== Reason: this thread, #1, holds no consistent locks
==30545== Location 0x4EC1E60 has never been protected by any lock
In _CallProc(), the test ((flags & FUNCFLAG_PYTHONAPI) == 0) is true,
which means that the GIL is released around the foreign call. That is a
bug here because, as the trace shows, string_at() calls
PyString_FromStringAndSize(), which requires the GIL!
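The flag test can be observed from Python. This sketch (written for a current Python 3 interpreter) builds one prototype with CFUNCTYPE and one with PYFUNCTYPE and applies the same bit test as _CallProc(). It reads the private class attribute `_flags_` and the constant `_ctypes.FUNCFLAG_PYTHONAPI`; both are CPython implementation details rather than documented API, and the helper name releases_gil is hypothetical.

```python
import ctypes
import _ctypes

# Same signature built with both prototype factories.
c_proto = ctypes.CFUNCTYPE(ctypes.py_object, ctypes.c_void_p, ctypes.c_int)
py_proto = ctypes.PYFUNCTYPE(ctypes.py_object, ctypes.c_void_p, ctypes.c_int)

def releases_gil(proto):
    """Hypothetical helper applying the same test as _CallProc():
    the GIL is dropped around the call when FUNCFLAG_PYTHONAPI is unset."""
    return (proto._flags_ & _ctypes.FUNCFLAG_PYTHONAPI) == 0

print(releases_gil(c_proto))   # True: unsafe for functions using the C API
print(releases_gil(py_proto))  # False: GIL stays held during the call
```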
Finally, the bug is in the ctypes module, not in _ctypes: ctypes just
declares string_at() with the wrong calling convention. With PYFUNCTYPE()
instead of CFUNCTYPE(), the Helgrind warning goes away ;-)
Note about Helgrind: this tool really rocks!!!