christian.heimes
Recipients Arfrever, Giovanni.Bajo, PaulMcMillan, Vlado.Boza, alex, arigo, benjamin.peterson, camara, christian.heimes, dmalcolm, koniiiik, lemburg, serhiy.storchaka, vstinner
2012-11-06
I modified crypto_auth() a bit:

Py_uhash_t crypto_auth(const unsigned char *in, unsigned long long inlen)
  u64 k0 = _Py_HashSecret.prefix;
  u64 k1 = _Py_HashSecret.suffix;
  return (Py_uhash_t)b;

and replaced the loop in _Py_HashBytes() with a call to crypto_auth(). For large strings SipHash is as fast as our current algorithm on my 64-bit box. That was to be expected, as SipHash works on blocks of 8 bytes while the default algorithm can't be optimized with SIMD instructions.
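
For readers who want to see the 8-byte block processing concretely, here is a minimal standalone sketch of SipHash-2-4. It is not the patch itself: the names siphash24_sketch, sipround, rotl64 and load_le64 are made up for illustration, and the keys are passed in as plain k0/k1 parameters, whereas the modified crypto_auth() above takes them from _Py_HashSecret.prefix and _Py_HashSecret.suffix and truncates the result to Py_uhash_t.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Rotate a 64-bit word left by b bits (0 < b < 64). */
static uint64_t rotl64(uint64_t x, int b)
{
    return (x << b) | (x >> (64 - b));
}

/* Load 8 bytes as a little-endian 64-bit word, independent of host endianness. */
static uint64_t load_le64(const unsigned char *p)
{
    uint64_t v = 0;
    for (int i = 7; i >= 0; i--)
        v = (v << 8) | p[i];
    return v;
}

/* One SipRound over the four state words v[0..3]. */
static void sipround(uint64_t v[4])
{
    v[0] += v[1]; v[1] = rotl64(v[1], 13); v[1] ^= v[0]; v[0] = rotl64(v[0], 32);
    v[2] += v[3]; v[3] = rotl64(v[3], 16); v[3] ^= v[2];
    v[0] += v[3]; v[3] = rotl64(v[3], 21); v[3] ^= v[0];
    v[2] += v[1]; v[1] = rotl64(v[1], 17); v[1] ^= v[2]; v[2] = rotl64(v[2], 32);
}

/* Hypothetical helper: SipHash-2-4 of in[0..inlen) under the 128-bit key (k0, k1). */
uint64_t siphash24_sketch(const unsigned char *in, size_t inlen,
                          uint64_t k0, uint64_t k1)
{
    uint64_t v[4];
    uint64_t b = (uint64_t)inlen << 56;      /* last block carries the length */
    size_t full = inlen & ~(size_t)7;        /* bytes covered by whole blocks */
    size_t i;

    assert(in != NULL || inlen == 0);

    v[0] = k0 ^ 0x736f6d6570736575ULL;
    v[1] = k1 ^ 0x646f72616e646f6dULL;
    v[2] = k0 ^ 0x6c7967656e657261ULL;
    v[3] = k1 ^ 0x7465646279746573ULL;

    /* Main loop: one 8-byte block per iteration, two rounds per block. */
    for (i = 0; i < full; i += 8) {
        uint64_t m = load_le64(in + i);
        v[3] ^= m;
        sipround(v);
        sipround(v);
        v[0] ^= m;
    }

    /* Fold the 0..7 trailing bytes into the final block. */
    for (i = 0; i < (inlen & 7); i++)
        b |= (uint64_t)in[full + i] << (8 * i);

    v[3] ^= b;
    sipround(v);
    sipround(v);
    v[0] ^= b;

    /* Finalization: four more rounds. */
    v[2] ^= 0xff;
    sipround(v);
    sipround(v);
    sipround(v);
    sipround(v);

    return v[0] ^ v[1] ^ v[2] ^ v[3];
}
```

The per-block work is a fixed number of 64-bit additions, rotations and XORs, which is why throughput on a 64-bit machine stays competitive for long inputs.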

Current hashing algorithm:
$ ./python -m timeit -s "x = b'a' * int(1E7)" "hash(x)"
1000000 loops, best of 3: 0.39 usec per loop

SipHash:
$ ./python -m timeit -s "x = b'a' * int(1E7)" "hash(x)"
1000000 loops, best of 3: 0.381 usec per loop
