Message369881
> In CPython itself: See count_set_bits in Modules/mathmodule.c
Python/hamt.c contains an optimized function:
static inline uint32_t
hamt_bitcount(uint32_t i)
{
    /* We could use native popcount instruction but that would
       require to either add configure flags to enable SSE4.2
       support or to detect it dynamically. Otherwise, we have
       a risk of CPython not working properly on older hardware.

       In practice, there's no observable difference in
       performance between using a popcount instruction or the
       following fallback code.

       The algorithm is copied from:
       https://graphics.stanford.edu/~seander/bithacks.html
    */
    i = i - ((i >> 1) & 0x55555555);
    i = (i & 0x33333333) + ((i >> 2) & 0x33333333);
    return (((i + (i >> 4)) & 0xF0F0F0F) * 0x1010101) >> 24;
}
Python/pymath.c provides an "unsigned int _Py_bit_length(unsigned long d)" function, used by math.factorial, _PyLong_NumBits(), int.__format__(), long / long, _PyLong_Frexp(), PyLong_AsDouble(), and others.
Maybe we could add a _Py_bit_count().
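To make the suggestion concrete, here is a rough sketch of what such a helper might look like. The names sketch_bit_count() and sketch_bit_count_portable() and the uint32_t signature are assumptions for illustration, not an existing CPython API. The portable path repeats the steps of hamt_bitcount(); the builtin path assumes GCC or clang, which (as far as I know) only emit a hardware popcount instruction when the target flags allow it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Portable SWAR popcount: same three steps as hamt_bitcount(). */
static inline int
sketch_bit_count_portable(uint32_t x)
{
    x = x - ((x >> 1) & 0x55555555);                 /* per-2-bit counts */
    x = (x & 0x33333333) + ((x >> 2) & 0x33333333);  /* per-4-bit counts */
    /* per-byte counts, then sum the four bytes into the top byte */
    return (int)((((x + (x >> 4)) & 0x0F0F0F0F) * 0x01010101) >> 24);
}

/* Hypothetical _Py_bit_count()-style helper (name and signature assumed). */
static inline int
sketch_bit_count(uint32_t x)
{
#if defined(__GNUC__) || defined(__clang__)
    return __builtin_popcount(x);
#else
    return sketch_bit_count_portable(x);
#endif
}

/* Cross-check both paths against a naive bit-by-bit loop. */
static void
sketch_bit_count_self_check(void)
{
    static const uint32_t samples[] = {
        0u, 1u, 0xFFu, 0x80000000u, 0x55555555u, 0x12345678u, 0xFFFFFFFFu
    };
    for (size_t k = 0; k < sizeof(samples) / sizeof(samples[0]); k++) {
        int naive = 0;
        for (uint32_t v = samples[k]; v != 0; v >>= 1) {
            naive += (int)(v & 1u);
        }
        assert(sketch_bit_count_portable(samples[k]) == naive);
        assert(sketch_bit_count(samples[k]) == naive);
    }
}
```

Calling sketch_bit_count_self_check() once is a cheap way to confirm that the builtin and the fallback agree on a given platform.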
See also bpo-29782: "Use __builtin_clzl for bits_in_digit if available" which proposes to micro-optimize _Py_bit_length().
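For illustration, a bit-length helper built on __builtin_clzl() could look roughly like the sketch below. The sketch_* names are hypothetical, and the plain loop only stands in for the portable code; it is not the implementation in Python/pymath.c. The builtin is assumed to be available on GCC and clang only.

```c
#include <assert.h>
#include <limits.h>

/* Portable stand-in: number of bits needed to represent d (0 for d == 0). */
static inline unsigned int
sketch_bit_length_portable(unsigned long d)
{
    unsigned int n = 0;
    while (d != 0) {
        n++;
        d >>= 1;
    }
    return n;
}

/* Hypothetical variant that uses the count-leading-zeros builtin when present. */
static inline unsigned int
sketch_bit_length(unsigned long d)
{
#if defined(__GNUC__) || defined(__clang__)
    /* __builtin_clzl(0) is undefined, so zero is handled separately. */
    if (d == 0) {
        return 0;
    }
    return (unsigned int)(sizeof(unsigned long) * CHAR_BIT)
           - (unsigned int)__builtin_clzl(d);
#else
    return sketch_bit_length_portable(d);
#endif
}

/* Both paths should agree on boundary values. */
static void
sketch_bit_length_self_check(void)
{
    static const unsigned long samples[] = {0UL, 1UL, 2UL, 255UL, 256UL, ULONG_MAX};
    for (unsigned int k = 0; k < sizeof(samples) / sizeof(samples[0]); k++) {
        assert(sketch_bit_length(samples[k]) == sketch_bit_length_portable(samples[k]));
    }
}
```

The explicit zero check matters: the builtin's result is undefined for 0, while the bit length of 0 is expected to be 0.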
--
In the meantime, I also added the *internal* header pycore_byteswap.h, which provides static inline functions that *do* use builtin functions like __builtin_bswap32().
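As a minimal sketch of that pattern (not a copy of pycore_byteswap.h; the sketch_* names are made up for this example), a 32-bit byte swap can pick the builtin on GCC/clang and fall back to shifts and masks elsewhere:

```c
#include <assert.h>
#include <stdint.h>

/* Portable fallback: move each byte to its mirrored position. */
static inline uint32_t
sketch_bswap32_portable(uint32_t w)
{
    return ((w & 0x000000FFu) << 24)
         | ((w & 0x0000FF00u) << 8)
         | ((w & 0x00FF0000u) >> 8)
         | ((w & 0xFF000000u) >> 24);
}

/* Hypothetical wrapper: use the compiler builtin when it is available. */
static inline uint32_t
sketch_bswap32(uint32_t w)
{
#if defined(__GNUC__) || defined(__clang__)
    return __builtin_bswap32(w);
#else
    return sketch_bswap32_portable(w);
#endif
}

/* The builtin and the fallback should agree, and swapping twice is a no-op. */
static void
sketch_bswap32_self_check(void)
{
    uint32_t w = 0x11223344u;
    assert(sketch_bswap32(w) == sketch_bswap32_portable(w));
    assert(sketch_bswap32(sketch_bswap32(w)) == w);
}
```

Unlike popcount, a byte swap does not depend on an optional CPU extension, which is presumably why using the builtin directly is less risky here.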
Date | User | Action | Args
2020-05-25 14:53:08 | vstinner | set | recipients: + vstinner, tim.peters, rhettinger, mark.dickinson, casevh, serhiy.storchaka, Jim Fasarakis-Hilliard, niklasf, gbtami
2020-05-25 14:53:08 | vstinner | set | messageid: <1590418388.48.0.880766696723.issue29882@roundup.psfhosted.org>
2020-05-25 14:53:08 | vstinner | link | issue29882 messages
2020-05-25 14:53:07 | vstinner | create |