classification
Title: add BLAKE3 to hashlib
Type: enhancement
Stage: needs patch
Components: Library (Lib)
Versions: Python 3.9
process
Status: open
Resolution:
Dependencies:
Superseder:
Assigned To: christian.heimes
Nosy List: Zooko.Wilcox-O'Hearn, christian.heimes, corona10, larry, oconnor663, xtreak
Priority: normal
Keywords: patch

Created on 2020-01-11 04:27 by larry, last changed 2020-01-17 22:55 by oconnor663.

Messages (7)
msg359777 - Author: Larry Hastings (larry) * (Python committer) Date: 2020-01-11 04:27
From 3/4 of the team that brought you BLAKE2, now comes... BLAKE3!

https://github.com/BLAKE3-team/BLAKE3

BLAKE3 is a brand new hashing function.  It's fast, it's parallelizable, and unlike BLAKE2 there's only one variant.

I've experimented with it a little.  On my laptop (2018 Intel i7 64-bit), the portable implementation is kind of middle-of-the-pack, but with AVX2 enabled it's second only to the "Haswell" build of KangarooTwelve.  On a 32-bit ARMv7 machine the results are more impressive--the portable implementation is neck-and-neck with MD4, and with NEON enabled it's definitely the fastest hash function I tested.  These tests are all single-threaded and eliminate I/O overhead.

The above GitHub repo has a reference implementation in C which includes Intel and ARM SIMD drivers.  Unsurprisingly, the interface looks roughly the same as the BLAKE2 interface(s), so if you took the existing BLAKE2 module and s/blake2b/blake3/ you'd be nearly done.  Not quite as close as blake2b and blake2s though ;-)
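
For reference, here is a minimal sketch of the existing BLAKE2 interface in hashlib that the paragraph above has in mind. Only `hashlib.blake2b` is real here; a `hashlib.blake3` constructor does not exist, and the assumption that it would follow the same new/update/hexdigest shape is just that, an assumption.

```python
import hashlib

# Existing API: incremental hashing with BLAKE2b.
h = hashlib.blake2b()
h.update(b"hello ")
h.update(b"world")
digest = h.hexdigest()

# The one-shot form yields the same digest as the incremental form.
assert digest == hashlib.blake2b(b"hello world").hexdigest()

# A hypothetical blake3 object would presumably expose the same
# attributes (name, digest_size) and methods (update, digest, hexdigest).
print(h.name, h.digest_size, len(digest))
```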
msg359794 - Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2020-01-11 13:37
I've been playing with the new algorithm, too. Pretty impressive!

Let's give the reference implementation a while to stabilize. The code has comments like: "This is only for benchmarking. The guy who wrote this file hasn't touched C since college. Please don't use this code in production."
msg359796 - Author: Larry Hastings (larry) * (Python committer) Date: 2020-01-11 14:06
For what it's worth, I spent some time producing clean benchmarks.  All these were run on the same laptop, and all pre-load the same file (406668786 bytes) and run one update() on the whole thing to minimize overhead.  K12 and BLAKE3 are using a hand-written C driver, and compiled with both gcc and clang; all the rest of the algorithms are from hashlib.new, python3 configured with --enable-optimizations and compiled with gcc.  K12 and BLAKE3 support several SIMD extensions; this laptop only has AVX2 (no AVX512).  All these numbers are the best of 3.  All tests were run in a single thread.

-----------------+----------+----------+----+-----------------------
   hash algorithm|elapsed s |bytes/sec |size|hash
-----------------+----------+----------+----+-----------------------
      K12-Haswell 0.176949   2298224495  64  24693954fa0dfb059f99...
K12-Haswell-clang 0.181968   2234841926  64  24693954fa0dfb059f99...
BLAKE3-AVX2-clang 0.250482   1623547723  64  30149a073eab69f76583...
      BLAKE3-AVX2 0.256845   1583326242  64  30149a073eab69f76583...
              md4 0.37684668 1079135924  32  d8a66422a4f0ae430317...
             sha1 0.46739069  870083193  40  a7488d7045591450ded9...
        K12-clang 0.498058    816509323  64  24693954fa0dfb059f99...
           BLAKE3 0.561470    724292378  64  30149a073eab69f76583...
              K12 0.569490    714093306  64  24693954fa0dfb059f99...
     BLAKE3-clang 0.573743    708800001  64  30149a073eab69f76583...
          blake2b 0.58276098  697831191 128  809ca44337af39792f8f...
              md5 0.59936016  678504863  32  306d7de4d1622384b976...
           sha384 0.64208886  633352818  96  b107ce5d086e9757efa7...
       sha512_224 0.66094102  615287556  56  90931762b9e553bd07f3...
       sha512_256 0.66465768  611846969  64  27b03aacdfbde1c2628e...
           sha512 0.6776549   600111921 128  f0af29e2019a6094365b...
          blake2s 0.86828375  468359318  64  02bee0661cd88aa2be15...
           sha256 0.97720436  416155312  64  48b5243cfcd90d84cd3f...
           sha224 1.0255457   396538907  56  10fb56b87724d59761c6...
        shake_128 1.0895037   373260576  32  2ec12727ac9d59c2e842...
         md5-sha1 1.1171806   364013470  72  306d7de4d1622384b976...
         sha3_224 1.2059123   337229156  56  93eaf083ca3a9b348e14...
        shake_256 1.3039152   311882857  64  b92538fd701791db8c1b...
         sha3_256 1.3417314   303092540  64  69354bf585f21c567f1e...
        ripemd160 1.4846368   273918025  40  30f2fe48fec404990264...
         sha3_384 1.7710776   229616579  96  61af0469534633003d3b...
              sm3 1.8384831   221198006  64  1075d29c75b06cb0af3e...
         sha3_512 2.4839673   163717444 128  c7c250e79844d8dc856e...
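
The hashlib part of the methodology described above (pre-load the data, one update() on the whole buffer, best of 3, single thread) can be sketched roughly as follows. This is not the script that produced the table: the function names `bench` and `run`, the choice to time only update(), and the 16 MiB zero-filled stand-in buffer are all assumptions made for illustration.

```python
import hashlib
import time

def bench(name, data, repeats=3):
    """Return (best elapsed seconds, bytes per second) for one algorithm."""
    best = float("inf")
    for _ in range(repeats):
        h = hashlib.new(name)
        start = time.perf_counter()
        h.update(data)  # a single update() over the pre-loaded buffer
        elapsed = time.perf_counter() - start
        best = min(best, elapsed)
    rate = len(data) / best if best > 0 else float("inf")
    return best, rate

def run(data, names=("md5", "sha1", "sha256", "blake2b", "sha3_256")):
    """Benchmark each named algorithm and sort fastest first."""
    results = [(name, *bench(name, data)) for name in names]
    results.sort(key=lambda row: row[1])
    return results

if __name__ == "__main__":
    # Stand-in input; the original runs used a 406668786-byte file.
    data = bytes(16 * 1024 * 1024)
    for name, elapsed, rate in run(data):
        print(f"{name:>17} {elapsed:10.6f} {rate:12.0f}")
```

Dividing the input length by the elapsed time gives bytes per second, which is consistent with the table: 406668786 / 0.176949 is about 2298224495.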

If I can't have BLAKE3, I'm definitely switching to BLAKE2 ;-)
msg359936 - Author: Jack O'Connor (oconnor663) * Date: 2020-01-13 22:51
I'm in the middle of adding some Rust bindings to the C implementation in github.com/BLAKE3-team/BLAKE3, so that `cargo test` and `cargo bench` can cover both. Once that's done, I'll follow up with benchmark numbers from my laptop (Kaby Lake i5-8250U, also AVX2 with no AVX-512). For benchmark numbers with AVX-512 support, see the Performance section of the BLAKE3 paper (https://github.com/BLAKE3-team/BLAKE3-specs/blob/master/blake3.pdf). Larry, what processor did you run your benchmarks on?

Also, is there anything currently in CPython that does dispatch based on runtime CPU feature detection? Is this something that BLAKE3 should do for itself, or is there existing machinery that we'd want to integrate with?
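
To make the question concrete, runtime dispatch usually amounts to picking the most capable back end whose required CPU features are all present, falling back to portable code otherwise. The sketch below is only an illustration of that pattern in Python: `PREFERENCE` and `pick_backend` are hypothetical names, the flag strings follow the spelling Linux uses in /proc/cpuinfo, and the mapping of flags to back ends is an assumption rather than what the BLAKE3 C code or CPython actually does.

```python
from typing import FrozenSet, Iterable, List, Tuple

# Hypothetical back ends, most preferred first, each with the CPU
# feature flags it is assumed to require.
PREFERENCE: List[Tuple[FrozenSet[str], str]] = [
    (frozenset({"avx512f", "avx512vl"}), "avx512"),
    (frozenset({"avx2"}), "avx2"),
    (frozenset({"sse4_1"}), "sse41"),
    (frozenset({"neon"}), "neon"),
]

def pick_backend(cpu_flags: Iterable[str]) -> str:
    """Return the first back end whose required flags are all available."""
    available = set(cpu_flags)
    for required, backend in PREFERENCE:
        if required <= available:
            return backend
    return "portable"

# Example: a machine with AVX2 but no AVX-512 selects the AVX2 back end.
print(pick_backend(["sse4_1", "avx2"]))
```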
msg359941 - Author: Larry Hastings (larry) * (Python committer) Date: 2020-01-13 23:52
According to my order details it is a "8th Generation Intel Core i7-8650U".
msg360152 - Author: Jack O'Connor (oconnor663) * Date: 2020-01-16 23:16
Ok, I've added Rust bindings to the BLAKE3 C implementation, so that I can benchmark it in a vaguely consistent way. My laptop is an i5-8250U, which should be very similar to yours. (Both are "Kaby Lake Refresh".) My end results do look similar to yours with TurboBoost on, but pretty different with TurboBoost off:

with TurboBoost on
------------------
K12 GCC        | 2159 MB/s
BLAKE3 Rust    | 1787 MB/s
BLAKE3 C Clang | 1588 MB/s
BLAKE3 C GCC   | 1453 MB/s

with TurboBoost off
-------------------
BLAKE3 Rust    | 1288 MB/s
K12 GCC        | 1060 MB/s
BLAKE3 C Clang | 1094 MB/s
BLAKE3 C GCC   |  943 MB/s

The difference seems to be that with TurboBoost on, the BLAKE3 benchmarks have my CPU sitting around 2.4 GHz, while for the K12 benchmarks it's more like 2.9 GHz. With TurboBoost off, both benchmarks run at 1.6 GHz, and BLAKE3 does better. I'm not sure what causes that frequency difference. Perhaps some high-power instruction that the BLAKE3 implementation is emitting?
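
One rough way to factor out the clock-speed difference is to divide throughput by frequency and compare bytes per cycle. The sketch below does that with the figures quoted above. It assumes MB means 10^6 bytes and treats the approximate clock speeds (2.4, 2.9 and 1.6 GHz) as exact, so the results are only indicative; `bytes_per_cycle` is a helper name introduced here.

```python
def bytes_per_cycle(mb_per_s, ghz):
    """Throughput normalized by clock speed (assumes MB = 10**6 bytes)."""
    return (mb_per_s * 1e6) / (ghz * 1e9)

# (MB/s, approximate GHz) taken from the two tables and the paragraph above.
turbo_on = {
    "K12 GCC": (2159, 2.9),
    "BLAKE3 Rust": (1787, 2.4),
    "BLAKE3 C Clang": (1588, 2.4),
    "BLAKE3 C GCC": (1453, 2.4),
}
turbo_off = {
    "BLAKE3 Rust": (1288, 1.6),
    "K12 GCC": (1060, 1.6),
    "BLAKE3 C Clang": (1094, 1.6),
    "BLAKE3 C GCC": (943, 1.6),
}

for label, table in (("TurboBoost on", turbo_on), ("TurboBoost off", turbo_off)):
    print(label)
    for name, (rate, ghz) in table.items():
        print(f"  {name:<15} {bytes_per_cycle(rate, ghz):.3f} bytes/cycle")
```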

To reproduce these numbers you can clone these two repos (the latter is where I happen to have a K12 benchmark):

https://github.com/BLAKE3-team/BLAKE3
https://github.com/oconnor663/blake2_simd

Then in both cases checkout the "bench_406668786" branch, where I've put some benchmarks with the same input length you used.

For Rust BLAKE3, at the root of the BLAKE3 repo, run: cargo +nightly bench 406668786

For C BLAKE3, the command is the same, but run it in the "./c/blake3_c_rust_bindings" directory. The build defaults to GCC, and you can "export CC=clang" to switch it.

For my K12 benchmark, at the root of the blake2_simd repo, run: cargo +nightly bench --features=kangarootwelve 406668786
msg360215 - Author: Jack O'Connor (oconnor663) * Date: 2020-01-17 22:55
I plan to bring the C code up to speed with the Rust code this week. As part of that, I'll probably remove comments like the one above :) Otherwise, is there anything else we can do on our end to help with this?
History
Date User Action Args
2020-01-17 22:55:56 oconnor663 set messages: + msg360215
2020-01-16 23:16:29 oconnor663 set messages: + msg360152
2020-01-13 23:52:19 larry set messages: + msg359941
2020-01-13 22:51:57 oconnor663 set nosy: + oconnor663; messages: + msg359936
2020-01-11 14:06:33 larry set messages: + msg359796
2020-01-11 13:37:00 christian.heimes set assignee: christian.heimes; messages: + msg359794
2020-01-11 06:19:09 xtreak set nosy: + xtreak
2020-01-11 06:17:15 corona10 set nosy: + corona10
2020-01-11 04:27:40 larry create