Author oconnor663
Recipients Zooko.Wilcox-O'Hearn, christian.heimes, corona10, gregory.p.smith, jstasiak, kmaork, larry, lemburg, mgorny, oconnor663, xtreak
Date 2022-03-24.15:35:01
> Hardware accelerated SHAs are likely faster than blake3 single core.

Surprisingly, they're not. Here's a quick measurement on my recent ThinkPad laptop (64 KiB of input, single-threaded, TurboBoost left on), which supports both AVX-512 and the SHA extensions:

OpenSSL SHA-256: 1816 MB/s
OpenSSL SHA-1:   2103 MB/s
BLAKE3 SSE2:     2109 MB/s
BLAKE3 SSE4.1:   2474 MB/s
BLAKE3 AVX2:     4898 MB/s
BLAKE3 AVX-512:  8754 MB/s
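
For anyone who wants a rough comparison on their own machine, here is a minimal sketch that times the two SHA functions through Python's `hashlib` on a 64 KiB buffer. It is not the harness used for the numbers above: the helper name `throughput_mb_per_s`, the `min_seconds` parameter, and the choice of 1 MB = 10^6 bytes are all assumptions, and going through the interpreter adds overhead, so the absolute figures will differ.

```python
import hashlib
import time


def throughput_mb_per_s(name, data, min_seconds=0.2):
    """Hash `data` repeatedly for at least `min_seconds` and return MB/s.

    Hypothetical helper. Assumes 1 MB = 10**6 bytes.
    """
    hashes = 0
    start = time.perf_counter()
    while True:
        hashlib.new(name, data).digest()
        hashes += 1
        elapsed = time.perf_counter() - start
        if elapsed >= min_seconds:
            break
    return hashes * len(data) / elapsed / 1e6


if __name__ == "__main__":
    data = bytes(64 * 1024)  # 64 KiB of input, single-threaded
    for name in ("sha256", "sha1"):
        print(f"{name}: {throughput_mb_per_s(name, data):.0f} MB/s")
```

Whether `hashlib` ends up using the SHA extensions depends on how the interpreter and its OpenSSL were built, so treat the output as a sanity check rather than a benchmark.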

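The main reason SHA-1 and SHA-256 don't do better is that they're fundamentally serial algorithms: compressing each block requires the output of the previous block. Hardware acceleration can speed up a single instance of their compression functions, but there's just no way for it to run more than one instance per message at a time. BLAKE3 is a tree hash whose chunks can be compressed independently, which is why its throughput keeps climbing as the SIMD registers get wider. AES-CTR can likewise easily parallelize its blocks, and hardware accelerated AES does beat BLAKE3.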
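
To make the data dependency concrete, here is a toy sketch. It is neither real SHA-256 internals nor BLAKE3's actual construction: it uses `hashlib.sha256` as a stand-in compression function, and the names `serial_chain`, `tree_hash`, and the `reverse` flag are hypothetical. In the chain, every step needs the previous state. In the tree, the leaves don't depend on each other, so they can be computed in any order (or, in a real implementation, side by side in SIMD lanes).

```python
import hashlib

CHUNK = 1024  # toy chunk size in bytes


def split(data):
    """Split data into CHUNK-sized pieces (at least one piece)."""
    return [data[i:i + CHUNK] for i in range(0, len(data), CHUNK)] or [b""]


def serial_chain(data):
    """Chained construction: step i cannot start before step i-1 finishes."""
    state = bytes(32)
    for block in split(data):
        state = hashlib.sha256(state + block).digest()
    return state


def tree_hash(data, reverse=False):
    """Toy binary tree hash: leaves are independent of each other."""
    parts = split(data)
    order = range(len(parts))
    if reverse:
        order = reversed(order)  # any order works for the leaves
    leaves = [None] * len(parts)
    for i in order:
        leaves[i] = hashlib.sha256(b"\x00" + parts[i]).digest()

    level = leaves
    while len(level) > 1:
        nxt = []
        for i in range(0, len(level), 2):
            pair = level[i:i + 2]
            if len(pair) == 2:
                nxt.append(hashlib.sha256(b"\x01" + pair[0] + pair[1]).digest())
            else:
                nxt.append(pair[0])  # odd node is promoted unchanged
        level = nxt
    return level[0]


if __name__ == "__main__":
    data = bytes(range(256)) * 256  # 64 KiB
    # Leaf order doesn't matter for the tree; the chain has no such freedom.
    print(tree_hash(data) == tree_hash(data, reverse=True))
```

The leaf loop is the part a wide SIMD implementation can run several instances of at once; the loop in `serial_chain` has no equivalent.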

> And certainly more efficient in terms of watt-secs/byte.

I don't have any experience measuring power myself, so take this with a grain of salt: I think the difference in throughput shown above is large enough that, even accounting for the famously high power draw of AVX-512, BLAKE3 comes out ahead in terms of energy/byte. Probably not on ARM though.
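
The arithmetic behind that guess can be sketched as follows. Energy per byte is power divided by throughput, so BLAKE3 comes out ahead as long as its power draw while hashing is less than the throughput ratio times the SHA power draw. The throughput figures are the ones measured above; no power figures are assumed, and the function name `break_even_power_ratio` is hypothetical.

```python
# Throughputs from the measurement above, in MB/s.
THROUGHPUT_MB_S = {
    "OpenSSL SHA-256": 1816,
    "OpenSSL SHA-1": 2103,
    "BLAKE3 AVX-512": 8754,
}


def break_even_power_ratio(fast, slow):
    """Return how many times more power `fast` may draw than `slow`
    before its energy per byte (power / throughput) becomes worse."""
    return THROUGHPUT_MB_S[fast] / THROUGHPUT_MB_S[slow]


for sha in ("OpenSSL SHA-256", "OpenSSL SHA-1"):
    ratio = break_even_power_ratio("BLAKE3 AVX-512", sha)
    print(f"BLAKE3 AVX-512 vs {sha}: break-even at {ratio:.2f}x the power")
# BLAKE3 AVX-512 vs OpenSSL SHA-256: break-even at 4.82x the power
# BLAKE3 AVX-512 vs OpenSSL SHA-1: break-even at 4.16x the power
```

So the claim holds on this machine if AVX-512 hashing draws less than roughly 4x the power of the SHA-extension code path, which is something that would need to be measured.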