This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

Classification
Title: ctypes is too slow to convert a Python list to a C array
Type: performance
Stage:
Components: ctypes
Versions: Python 3.10, Python 3.9, Python 3.8

Process
Status: open
Resolution:
Dependencies:
Superseder:
Assigned To:
Nosy List: Tom Cornebize, eryksun
Priority: normal
Keywords:

Created on 2016-09-01 09:31 by Tom Cornebize, last changed 2022-04-11 14:58 by admin.

Files
ctypes_slow.py (uploaded by Tom Cornebize, 2016-09-01 09:31)
Messages (3)
msg274111 - Author: Tom Cornebize (Tom Cornebize) Date: 2016-09-01 09:31
It is much faster to construct a Python array (array.array) from the list and then cast this array with ctypes.cast() than to use the "standard" constructor. See the attached file for a performance comparison.

This question was previously asked on Stack Overflow: http://stackoverflow.com/questions/39225263/why-is-ctypes-so-slow-to-convert-a-python-list-to-a-c-array/
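Below is a minimal sketch of the two approaches being compared. It assumes the attached ctypes_slow.py (not reproduced here) measures something along these lines. The helper names, the c_uint32 element type and the list size are assumptions for illustration. No timings are claimed here; run it to measure on your own machine.

```python
import array
import ctypes
import timeit

N = 1_000_000  # assumed list size
t = list(range(N))


def via_constructor(t):
    # The "standard" constructor: build the array type, pass every element as *args.
    return (ctypes.c_uint32 * len(t))(*t)


def via_array_cast(t):
    # Build an array.array first, then cast its buffer address to a pointer.
    v = array.array('I', t)
    # Assumes C unsigned int is 4 bytes, as on common platforms.
    assert v.itemsize == ctypes.sizeof(ctypes.c_uint32)
    addr, _length = v.buffer_info()
    ptr = ctypes.cast(addr, ctypes.POINTER(ctypes.c_uint32))
    # The pointer does not own the memory, so v must be kept alive alongside it.
    return v, ptr


if __name__ == "__main__":
    for fn in (via_constructor, via_array_cast):
        seconds = timeit.timeit(lambda: fn(t), number=3)
        print(f"{fn.__name__}: {seconds:.3f} s for 3 runs")
```

The cast version returns a lengthless pointer rather than a sized array, which is one of the points discussed in the reply below.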
msg274138 - Author: Eryk Sun (eryksun) (Python triager) Date: 2016-09-01 14:53
This is a consequence of several factors. It starts with the __init__ method of ctypes.Array, Array_init. This function doesn't hard-code a call to the base sq_ass_item slot function, Array_ass_item; if it did, it wouldn't be nearly as slow. Instead, it calls the abstract function PySequence_SetItem. Doing it this way accommodates an array subclass that overrides __setitem__.
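Here is a short sketch of the behavior that going through PySequence_SetItem preserves. A subclass that overrides __setitem__ has its override called for every positional argument passed to the constructor. The class name Doubling is hypothetical. Subclassing ctypes.Array and defining _type_ and _length_ is the documented alternative to multiplying a ctypes type by an integer.

```python
import ctypes


class Doubling(ctypes.Array):
    # Hypothetical subclass: an array of 3 C ints whose __setitem__ doubles each value.
    _type_ = ctypes.c_int
    _length_ = 3

    def __setitem__(self, index, value):
        super().__setitem__(index, value * 2)


a = Doubling(1, 2, 3)  # Array_init routes each argument through __setitem__
print(list(a))         # [2, 4, 6]
```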

What I'd like to do here is check whether the sq_ass_item slot is set to Array_ass_item and, if so, call it directly instead of PySequence_SetItem. But it turns out that the slot is not set to Array_ass_item even if the subclass doesn't override __setitem__, and this, more than anything else, is the real culprit for the relative slowness of Array_init.

If a built-in type such as ctypes.Array defines both mp_ass_subscript and sq_ass_item, then its __setitem__ wrapper_descriptor wraps the more generic mp_ass_subscript slot function. Every concrete array type, such as ctypes.c_uint32 * n, is a subclass of ctypes.Array. For such a subclass, update_one_slot in Objects/typeobject.c plays it safe when updating the sq_ass_item slot. It sees that the inherited __setitem__ descriptor doesn't call wrap_sq_setitem, so it sets the subclass's slot to the generic function slot_sq_ass_item.

This generic slot function takes the long way around. It looks up and binds the __setitem__ method and converts the Py_ssize_t index to a Python integer, only to call the wrapper that calls the mp_ass_subscript slot. To add insult to injury, the implementation of this slot for a ctypes Array, Array_ass_subscript, has to convert the index back to a Py_ssize_t via PyNumber_AsSsize_t.
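The sketch below checks the part of this explanation that is visible from Python. The __setitem__ that ctypes.Array exposes is a slot wrapper (wrapper_descriptor) and it accepts a slice. A wrapper around the index-only sq_ass_item slot would reject a slice, so this is consistent with it wrapping mp_ass_subscript. A concrete array type inherits that same descriptor. The internal slot functions themselves (update_one_slot, slot_sq_ass_item) cannot be observed this way.

```python
import ctypes

IntArray3 = ctypes.c_int * 3
a = IntArray3(1, 2, 3)

# The __setitem__ defined on ctypes.Array is a slot wrapper.
setitem = ctypes.Array.__dict__['__setitem__']
print(type(setitem).__name__)  # wrapper_descriptor

# A concrete array type without an override inherits that same descriptor.
print(IntArray3.__setitem__ is setitem)  # True

# It accepts a slice as well as an int index, i.e. it wraps mp_ass_subscript.
setitem(a, slice(0, 2), [7, 8])
setitem(a, 2, 9)
print(list(a))  # [7, 8, 9]
```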

I don't know whether this can be resolved while preserving the generic design of the initializer. As it is, calling PySequence_SetItem in a tight loop is ridiculously slow. I experimented with calling Array_ass_item directly; with this change, the constructor is as fast as assigning to a slice of the whole array. With a list it's actually a bit slower, because *t has to be copied to a tuple. But it takes about the same time as slice assignment when t is already a tuple, such as tuple(range(1000000)).
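The comparison above suggests slice assignment as a workaround that stays entirely within ctypes. You create a zero-initialized array and assign the whole sequence to a[:], which Array_ass_subscript handles in one call. Here is a minimal sketch for timing it against the constructor with both a list and a tuple. It assumes c_uint32 elements; the function names are hypothetical and no timings are claimed.

```python
import ctypes
import timeit

N = 1_000_000
t_list = list(range(N))
t_tuple = tuple(t_list)


def via_constructor(t):
    # Array_init: one PySequence_SetItem call per element (the slow path).
    return (ctypes.c_uint32 * len(t))(*t)


def via_slice(t):
    # Zero-initialized array, then a single slice assignment over the whole array.
    a = (ctypes.c_uint32 * len(t))()
    a[:] = t
    return a


if __name__ == "__main__":
    for label, t in (("list", t_list), ("tuple", t_tuple)):
        for fn in (via_constructor, via_slice):
            seconds = timeit.timeit(lambda: fn(t), number=3)
            print(f"{fn.__name__}({label}): {seconds:.3f} s for 3 runs")
```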

I doubt any amount of tweaking will make ctypes as fast as an array.array. ctypes has a generic design to accommodate simple C data, pointers, and aggregate arrays, structs, and unions. This comes with some cost to performance. However, you can and should make use of the buffer protocol to use arrays from the array module or numpy where performance is critical. It's trivial to create a ctypes array from an object that supports the buffer protocol. For example: 

    import array
    import ctypes
    v = array.array('I', t)  # t: the source list
    a = (ctypes.c_uint32 * len(v)).from_buffer(v)  # shares v's memory, no copy

There's no need to use array.array's buffer_info() or ctypes.cast(). The from_buffer() method creates an array that shares the buffer of the source object, so it's relatively fast. It also returns a sized array instead of a lengthless pointer (though it is possible to cast to an array pointer and immediately dereference the array).
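A small sketch of both points follows. It assumes 'I' items are 4 bytes, which the assert checks. from_buffer() returns a sized array that shares the array.array's memory. Casting the buffer address to a pointer-to-array type and dereferencing it with .contents also yields a sized array. from_buffer() requires a writable buffer; ctypes also provides from_buffer_copy() for read-only sources.

```python
import array
import ctypes

t = [1, 2, 3, 4]
v = array.array('I', t)
assert v.itemsize == ctypes.sizeof(ctypes.c_uint32)  # assumes 4-byte unsigned int
ArrayType = ctypes.c_uint32 * len(v)

# from_buffer(): a sized ctypes array sharing v's memory (no copy).
a = ArrayType.from_buffer(v)
v[0] = 100            # a write through the array.array...
print(a[0], len(a))   # ...is visible through the ctypes array: 100 4

# The cast alternative: a pointer to an array type, dereferenced via .contents.
# Unlike from_buffer(), this keeps no reference to v, so v must outlive b.
addr, _length = v.buffer_info()
b = ctypes.cast(addr, ctypes.POINTER(ArrayType)).contents
print(list(b))        # [100, 2, 3, 4]
```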
msg274148 - Author: Tom Cornebize (Tom Cornebize) Date: 2016-09-01 15:57
Thank you for these explanations.

I understand that the generic design comes at the cost of performance.

However, I think the documentation should at least state that the constructor (ctypes.c_uint32 * len(t))(*t) is slow, and that it can be done much faster in some specific cases (e.g. an array of integers).

It would be even better to have some specific method(s) in ctypes for this, instead of having to rely on an array.array just to build a ctypes array from a list. I am not familiar with the CPython code, so I do not know whether this would be easy to implement.
History
Date | User | Action | Args
2022-04-11 14:58:35 | admin | set | github: 72113
2021-03-05 19:07:48 | eryksun | set | versions: + Python 3.8, Python 3.9, Python 3.10, - Python 2.7, Python 3.5, Python 3.6
2016-09-01 15:57:58 | Tom Cornebize | set | messages: + msg274148
2016-09-01 14:54:15 | eryksun | set | versions: + Python 2.7, Python 3.5, Python 3.6
2016-09-01 14:53:44 | eryksun | set | nosy: + eryksun; messages: + msg274138
2016-09-01 09:31:15 | Tom Cornebize | create |