Message 369881 - Python tracker

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author	vstinner
Recipients	Jim Fasarakis-Hilliard, casevh, gbtami, mark.dickinson, niklasf, rhettinger, serhiy.storchaka, tim.peters, vstinner
Date	2020-05-25.14:53:07
SpamBayes Score	-1.0
Marked as misclassified	Yes
Message-id	<1590418388.48.0.880766696723.issue29882@roundup.psfhosted.org>
In-reply-to

Content
> In CPython itself: See count_set_bits in Modules/mathmodule.c Python/hamt.c contains an optimized function: static inline uint32_t hamt_bitcount(uint32_t i) { /* We could use native popcount instruction but that would require to either add configure flags to enable SSE4.2 support or to detect it dynamically. Otherwise, we have a risk of CPython not working properly on older hardware. In practice, there's no observable difference in performance between using a popcount instruction or the following fallback code. The algorithm is copied from: https://graphics.stanford.edu/~seander/bithacks.html / i = i - ((i >> 1) & 0x55555555); i = (i & 0x33333333) + ((i >> 2) & 0x33333333); return (((i + (i >> 4)) & 0xF0F0F0F) 0x1010101) >> 24; } Python/pymath.c provides a "unsigned int _Py_bit_length(unsigned long d)" function used by math.factorial, _PyLong_NumBits(), int.__format__(), long / long, _PyLong_Frexp() and PyLong_AsDouble(), etc. Maybe we could add a _Py_bit_count(). See also bpo-29782: "Use __builtin_clzl for bits_in_digit if available" which proposes to micro-optimize _Py_bit_length(). -- In the meanwhile, I also added pycore_byteswap.h internal header which provides static inline function which do use builtin functions like __builtin_bswap32().

> In CPython itself: See count_set_bits in Modules/mathmodule.c

Python/hamt.c contains an optimized function:

static inline uint32_t
hamt_bitcount(uint32_t i)
{
    /* We could use native popcount instruction but that would
       require to either add configure flags to enable SSE4.2
       support or to detect it dynamically.  Otherwise, we have
       a risk of CPython not working properly on older hardware.

       In practice, there's no observable difference in
       performance between using a popcount instruction or the
       following fallback code.

       The algorithm is copied from:
       https://graphics.stanford.edu/~seander/bithacks.html
    */
    i = i - ((i >> 1) & 0x55555555);
    i = (i & 0x33333333) + ((i >> 2) & 0x33333333);
    return (((i + (i >> 4)) & 0xF0F0F0F) * 0x1010101) >> 24;
}


Python/pymath.c provides a "unsigned int _Py_bit_length(unsigned long d)" function used by math.factorial, _PyLong_NumBits(), int.__format__(), long / long, _PyLong_Frexp() and PyLong_AsDouble(), etc.

Maybe we could add a _Py_bit_count().

See also bpo-29782: "Use __builtin_clzl for bits_in_digit if available" which proposes to micro-optimize _Py_bit_length().

--

In the meanwhile, I also added pycore_byteswap.h *internal* header which provides static inline function which *do* use builtin functions like __builtin_bswap32().

History
Date	User	Action	Args
2020-05-25 14:53:08	vstinner	set	recipients: + vstinner, tim.peters, rhettinger, mark.dickinson, casevh, serhiy.storchaka, Jim Fasarakis-Hilliard, niklasf, gbtami
2020-05-25 14:53:08	vstinner	set	messageid: <1590418388.48.0.880766696723.issue29882@roundup.psfhosted.org>
2020-05-25 14:53:08	vstinner	link	issue29882 messages
2020-05-25 14:53:07	vstinner	create