This is a consequence of several factors. It starts with Array_init, the __init__ implementation of ctypes.Array. It doesn't hard-code a call to the base sq_ass_item slot function, Array_ass_item; if it did, it wouldn't be nearly as slow. Instead it calls the abstract function PySequence_SetItem for each initializer, which accommodates an array subclass that overrides __setitem__.
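A quick, hedged illustration of that behavior (LoggedArray is just a name for this example): a subclass that overrides __setitem__ does get its override called for every initializer, because Array_init routes each assignment through PySequence_SetItem:

import ctypes

class LoggedArray(ctypes.c_uint32 * 3):
    def __setitem__(self, index, value):
        print('set', index, value)
        super().__setitem__(index, value)

a = LoggedArray(10, 20, 30)  # expected to print "set 0 10", "set 1 20", "set 2 30"
print(list(a))               # [10, 20, 30]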
What I'd like to do here is check whether the sq_ass_item slot is set to Array_ass_item and, if so, call it directly instead of PySequence_SetItem. But it turns out that the slot isn't Array_ass_item even when the subclass doesn't override __setitem__, and more than anything this is the real culprit for the relative slowness of Array_init.
If a built-in type such as ctypes.Array defines both mp_ass_subscript and sq_ass_item, then the __setitem__ wrapper_descriptor wraps the more generic mp_ass_subscript slot function. For a subclass, update_one_slot in Objects/typeobject.c then plays it safe when updating the sq_ass_item slot: it sees that the inherited __setitem__ descriptor doesn't call wrap_sq_setitem, so it sets the subclass's slot to the generic function slot_sq_ass_item.
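From Python you can at least see the setup that sends update_one_slot down this path: the concrete array type doesn't define __setitem__ itself, it just inherits the wrapper descriptor from ctypes.Array, and that descriptor wraps mp_ass_subscript rather than sq_ass_item. A hedged sketch (the slot functions themselves aren't visible from Python):

import ctypes

Arr = ctypes.c_uint32 * 8
print('__setitem__' in vars(Arr))                   # expected: False
print(Arr.__setitem__ is ctypes.Array.__setitem__)  # expected: True
print(type(ctypes.Array.__setitem__).__name__)      # expected: wrapper_descriptor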
The generic slot_sq_ass_item goes the long way around: it looks up and binds the __setitem__ method and converts the Py_ssize_t index to a Python integer, just to call the wrapper that calls the mp_ass_subscript slot. To add insult to injury, the ctypes implementation of that slot, Array_ass_subscript, has to convert the index back to a Py_ssize_t integer via PyNumber_AsSsize_t.
I don't know whether this can be resolved while preserving the generic design of the initializer. As is, calling PySequence_SetItem in a tight loop is ridiculously slow. I experimented with calling Array_ass_item directly, and with that change initialization is as fast as assigning to a slice of the whole array. With a list it's a bit slower, because *t has to be copied to a tuple, but it takes about the same amount of time as slice assignment when t is already a tuple, such as tuple(range(1000000)).
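The unpatched half of that comparison is easy to reproduce. A hedged sketch (exact numbers depend on the build and machine): per-item initialization, which goes through PySequence_SetItem, versus a single slice assignment, which is handled by one Array_ass_subscript call:

import ctypes
import timeit

t = tuple(range(1000000))
Arr = ctypes.c_uint32 * len(t)

def init_per_item():
    Arr(*t)        # Array_init assigns each element via PySequence_SetItem

def assign_slice():
    a = Arr()
    a[:] = t       # one slice assignment fills the whole array

print('per-item init:', timeit.timeit(init_per_item, number=5))
print('slice assign: ', timeit.timeit(assign_slice, number=5))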
I doubt any amount of tweaking will make ctypes as fast as an array.array. ctypes has a generic design to accommodate simple C data, pointers, and aggregates (arrays, structs, and unions), and that comes with some cost to performance. However, where performance is critical you can and should use the buffer protocol with arrays from the array module or numpy. It's trivial to create a ctypes array from an object that supports the buffer protocol. For example:
import array, ctypes

v = array.array('I', t)
a = (ctypes.c_uint32 * len(v)).from_buffer(v)
There's no need to use the array.array's buffer_info() or ctypes.cast(). The from_buffer() method creates a ctypes array that shares the buffer of the source object, so it's relatively fast, and it returns a sized array instead of a lengthless pointer (though it is possible to cast to an array pointer and immediately dereference the array).
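For completeness, a hedged sketch of both routes; the cast-and-dereference alternative works, but nothing ties the result's lifetime to the source object, which is one more reason to prefer from_buffer():

import array
import ctypes

v = array.array('I', range(8))

# from_buffer: a sized ctypes array that shares v's memory
a = (ctypes.c_uint32 * len(v)).from_buffer(v)
a[0] = 123
print(v[0])  # 123 -- the buffer is shared

# the cast alternative: cast the raw address to an array pointer and
# immediately dereference it
addr, n = v.buffer_info()
b = ctypes.cast(addr, ctypes.POINTER(ctypes.c_uint32 * n))[0]
print(b[0])  # 123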