Message304696
The setup.py file for Python states:
    if (not cross_compiling and
            os.uname().machine == "x86_64" and
            sys.maxsize > 2**32):
        # Every x86_64 machine has at least SSE2. Check for sys.maxsize
        # in case that kernel is 64-bit but userspace is 32-bit.
        blake2_macros.append(('BLAKE2_USE_SSE', '1'))
While the assertion about every x86_64 machine having SSE2 is true, it doesn't mean that SSE2 is worthwhile to use. I've tested the pure SSE2 implementation (i.e. without SSSE3 and later extensions) against the reference implementation on three different machines, getting the following results:
Athlon64 X2 (SSE2 is the best supported variant), 540 MiB of data:
SSE2: [5.189988004000043, 5.070812243997352]
ref: [2.0161159170020255, 2.0475422790041193]
Core i3, same data file:
SSE2: [1.924425926999902, 1.92461746999993, 1.9298037500000191]
ref: [1.7940209749999667, 1.7900855569999976, 1.7835538760000418]
Xeon E5630 server, 230 MiB data file:
SSE2: [0.7671358410007088, 0.7797677099879365, 0.7648976119962754]
ref: [0.5784736709902063, 0.5717909929953748, 0.5717219939979259]
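The lists above have the shape of per-run timings. A minimal sketch of how comparable numbers could be collected follows, assuming hashlib.blake2b and timeit.repeat; the actual script used for these measurements is not shown in this message, and the helper name and the zero-filled payload are hypothetical. Since SSE2 versus reference is a build-time choice, the same script would have to be run once against each build of Python.

```python
import hashlib
import timeit


def time_blake2b(data, repeat=3):
    """Return `repeat` wall-clock timings (in seconds) for hashing `data` once."""
    return timeit.repeat(
        lambda: hashlib.blake2b(data).digest(),
        number=1,        # one full hash per timing
        repeat=repeat,   # number of independent timings to collect
    )


if __name__ == "__main__":
    # Hypothetical stand-in for the data file: 16 MiB of zero bytes.
    payload = bytes(16 * 1024 * 1024)
    print(time_blake2b(payload))
```

This only measures whichever BLAKE2 variant the running interpreter was compiled with; it cannot switch variants at runtime.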
So in all the tested cases, the pure SSE2 implementation is *slower* than the reference implementation. SSSE3 and the other variants are faster, and AFAIU they are enabled automatically based on CFLAGS, so the SSE2 choice doesn't matter on most systems.
However, for old CPUs that do not support SSSE3, the choice of SSE2 makes the algorithm prohibitively slow: on the Athlon64 X2 it takes about 2.5 times as long as the reference implementation!
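To make the comparison explicit, here is a small sketch that computes the slowdown ratio from the timings quoted above. Comparing the best (minimum) run of each variant is an assumption about how to summarise the lists, and the slowdown helper is a hypothetical name. It gives roughly 2.5x on the Athlon64 X2 and well under 1.5x on the other two machines.

```python
# Timings copied from the measurements in this message.
results = {
    "Athlon64 X2": {
        "SSE2": [5.189988004000043, 5.070812243997352],
        "ref": [2.0161159170020255, 2.0475422790041193],
    },
    "Core i3": {
        "SSE2": [1.924425926999902, 1.92461746999993, 1.9298037500000191],
        "ref": [1.7940209749999667, 1.7900855569999976, 1.7835538760000418],
    },
    "Xeon E5630": {
        "SSE2": [0.7671358410007088, 0.7797677099879365, 0.7648976119962754],
        "ref": [0.5784736709902063, 0.5717909929953748, 0.5717219939979259],
    },
}


def slowdown(timings):
    """Best SSE2 time divided by best reference time (> 1 means SSE2 is slower)."""
    return min(timings["SSE2"]) / min(timings["ref"])


for cpu, timings in results.items():
    print(f"{cpu}: SSE2 takes {slowdown(timings):.2f}x the reference time")
```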
Date | User | Action | Args
2017-10-21 07:57:11 | mgorny | set | recipients: + mgorny
2017-10-21 07:57:11 | mgorny | set | messageid: <1508572631.92.0.213398074469.issue31834@psf.upfronthosting.co.za>
2017-10-21 07:57:11 | mgorny | link | issue31834 messages
2017-10-21 07:57:10 | mgorny | create |