Title: Moving to SipHash-1-3
Type: performance
Components: Interpreter Core
Versions: Python 3.7
Status: open
Assigned To: christian.heimes
Nosy List: christian.heimes, haypo, inada.naoki, lemburg, rhettinger, serhiy.storchaka
Priority: normal
Keywords: patch

Created on 2017-02-01 07:02 by inada.naoki, last changed 2017-03-06 11:23 by inada.naoki.

Files
siphash13.patch (uploaded by inada.naoki, 2017-02-01 09:21)
Messages (25)
msg286590 - (view) Author: INADA Naoki (inada.naoki) * (Python committer) Date: 2017-02-01 07:02
Rust and Ruby moved from SipHash-2-4 to SipHash-1-3.

https://github.com/rust-lang/rust/issues/29754
https://bugs.ruby-lang.org/issues/13017

SipHash-1-3 seems faster than 2-4 and secure enough.
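For context, SipHash-c-d runs c SipRounds per 8-byte input block and d finalization rounds, so 1-3 does roughly half the per-block mixing of 2-4. A minimal pure-Python sketch of the round-parameterized function (illustration only; CPython's real implementation is the C code in Python/pyhash.c):

```python
def siphash(c, d, key, data):
    """Sketch of SipHash-c-d; `key` is 16 bytes, returns a 64-bit int."""
    mask = (1 << 64) - 1
    k0 = int.from_bytes(key[:8], "little")
    k1 = int.from_bytes(key[8:], "little")
    # "somepseudorandomlygeneratedbytes" initialization constants
    v0 = k0 ^ 0x736F6D6570736575
    v1 = k1 ^ 0x646F72616E646F6D
    v2 = k0 ^ 0x6C7967656E657261
    v3 = k1 ^ 0x7465646279746573

    def rotl(x, b):
        return ((x << b) | (x >> (64 - b))) & mask

    def sipround():
        nonlocal v0, v1, v2, v3
        v0 = (v0 + v1) & mask; v1 = rotl(v1, 13); v1 ^= v0; v0 = rotl(v0, 32)
        v2 = (v2 + v3) & mask; v3 = rotl(v3, 16); v3 ^= v2
        v0 = (v0 + v3) & mask; v3 = rotl(v3, 21); v3 ^= v0
        v2 = (v2 + v1) & mask; v1 = rotl(v1, 17); v1 ^= v2; v2 = rotl(v2, 32)

    full = len(data) - len(data) % 8
    for i in range(0, full, 8):          # c compression rounds per block
        m = int.from_bytes(data[i:i + 8], "little")
        v3 ^= m
        for _ in range(c):
            sipround()
        v0 ^= m
    # final block: remaining bytes plus the input length in the top byte
    b = ((len(data) & 0xFF) << 56) | int.from_bytes(data[full:], "little")
    v3 ^= b
    for _ in range(c):
        sipround()
    v0 ^= b
    v2 ^= 0xFF                           # d finalization rounds
    for _ in range(d):
        sipround()
    return (v0 ^ v1 ^ v2 ^ v3) & mask
```

`siphash(2, 4, key, data)` reproduces the SipHash-2-4 test vector from the SipHash paper; `siphash(1, 3, ...)` differs only in the round counts, which is where the speedup comes from.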
msg286594 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2017-02-01 07:31
+1 The discussions on the Rust and Ruby lists seem persuasive.
msg286598 - (view) Author: INADA Naoki (inada.naoki) * (Python committer) Date: 2017-02-01 08:18
I'm running pyperformance.

BTW, hashlib doesn't provide siphash24.  So can we simply replace siphash24 with siphash13, like Ruby did?
https://github.com/ruby/ruby/commit/04c94f95d1a1c6a12f5412228a2bcdc00f5de3b2
msg286600 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2017-02-01 09:14
PEP 456 defines an API to add more hashing algorithms and make the selection of hash algorithm a compile time option. We can easily add SipHash-1-3 and make it the default algorithm. Vendors then can select between FNV2, SipHash-1-3 and SipHash-2-4.

On another note, should we add the SipHash-2-4 and 1-3 PRFs to the hashlib module?
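PEP 456 also exposes the compile-time selection at runtime through sys.hash_info, so a build's algorithm can be checked without reading configure output:

```python
import sys

# PEP 456: report the compile-time hash configuration of this interpreter.
info = sys.hash_info
print(info.algorithm)  # e.g. "siphash24" or "fnv"; a new variant would add a name here
print(info.hash_bits)  # internal width of the hash, e.g. 64
print(info.seed_bits)  # size of the randomization key (_Py_HashSecret), e.g. 128
print(info.cutoff)     # Py_HASH_CUTOFF small-string cutoff; 0 when disabled
```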
msg286601 - (view) Author: INADA Naoki (inada.naoki) * (Python committer) Date: 2017-02-01 09:21
$ ../python.default -m perf compare_to default.json patched4.json -G
Slower (2):
- scimark_sor: 479 ms +- 8 ms -> 485 ms +- 9 ms: 1.01x slower (+1%)
- genshi_xml: 196 ms +- 2 ms -> 196 ms +- 2 ms: 1.00x slower (+0%)

Faster (19):
- json_loads: 62.7 us +- 0.5 us -> 61.5 us +- 1.1 us: 1.02x faster (-2%)
- django_template: 402 ms +- 3 ms -> 395 ms +- 4 ms: 1.02x faster (-2%)
- logging_format: 36.6 us +- 0.5 us -> 36.1 us +- 0.4 us: 1.01x faster (-1%)
- xml_etree_generate: 277 ms +- 5 ms -> 274 ms +- 5 ms: 1.01x faster (-1%)
- sympy_str: 461 ms +- 5 ms -> 456 ms +- 3 ms: 1.01x faster (-1%)
- call_method_unknown: 16.6 ms +- 1.2 ms -> 16.4 ms +- 0.2 ms: 1.01x faster (-1%)
- raytrace: 1.31 sec +- 0.02 sec -> 1.30 sec +- 0.01 sec: 1.01x faster (-1%)
- python_startup_no_site: 9.81 ms +- 0.02 ms -> 9.74 ms +- 0.02 ms: 1.01x faster (-1%)
- python_startup: 16.2 ms +- 0.0 ms -> 16.1 ms +- 0.0 ms: 1.01x faster (-1%)
- logging_simple: 31.2 us +- 0.3 us -> 31.0 us +- 0.4 us: 1.01x faster (-1%)
- 2to3: 742 ms +- 4 ms -> 739 ms +- 4 ms: 1.00x faster (-0%)
- hexiom: 22.7 ms +- 0.2 ms -> 22.6 ms +- 0.2 ms: 1.00x faster (-0%)
- call_simple: 14.2 ms +- 0.5 ms -> 14.2 ms +- 0.2 ms: 1.00x faster (-0%)
- sympy_integrate: 46.2 ms +- 0.3 ms -> 46.1 ms +- 0.4 ms: 1.00x faster (-0%)
- regex_compile: 434 ms +- 4 ms -> 433 ms +- 4 ms: 1.00x faster (-0%)
- deltablue: 17.7 ms +- 0.2 ms -> 17.7 ms +- 0.2 ms: 1.00x faster (-0%)
- chaos: 299 ms +- 3 ms -> 299 ms +- 3 ms: 1.00x faster (-0%)
- call_method_slots: 14.6 ms +- 0.2 ms -> 14.6 ms +- 0.1 ms: 1.00x faster (-0%)
- regex_dna: 283 ms +- 3 ms -> 283 ms +- 2 ms: 1.00x faster (-0%)

Benchmark hidden because not significant (43)


I'm creating siphash13 from the reference implementation to double-check the output.
msg286602 - (view) Author: INADA Naoki (inada.naoki) * (Python committer) Date: 2017-02-01 09:24
> PEP 456 defines an API to add more hashing algorithms and make the selection of hash algorithm a compile time option. We can easily add SipHash-1-3 and make it the default algorithm. Vendors then can select between FNV2, SipHash-1-3 and SipHash-2-4.

OK, I'll add siphash13, instead of replacing siphash24.

> On another note, should we add the SipHash-2-4 and 1-3 PRFs to the hashlib module?

The SipHash implementation processes input in 64-bit units.
I don't know how to implement partial updates like: m.update(b'foo'); m.update(b'bar')
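The buffering needed for partial updates can be sketched separately from the hash itself: a hashlib-style object has to hold back bytes until a full 8-byte block is available and keep the tail for the padded final block. A hypothetical sketch (the BlockBuffer name and compress callback are illustrative, not an existing API):

```python
class BlockBuffer:
    """Accumulate bytes and hand out complete 8-byte blocks.

    `compress` stands in for the per-block mixing step of the hash;
    the leftover tail (< 8 bytes) would go into the final padded block.
    """

    def __init__(self, compress):
        self._compress = compress
        self._buf = b""
        self.total_len = 0  # SipHash mixes the length into the last block

    def update(self, data):
        self.total_len += len(data)
        self._buf += data
        while len(self._buf) >= 8:
            self._compress(self._buf[:8])
            self._buf = self._buf[8:]

    def tail(self):
        return self._buf
```

With this, repeated update() calls produce exactly the same block sequence as hashing the concatenated input in one shot, which is what a streaming digest needs.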
msg286605 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2017-02-01 09:35
I added SipHash13 as additional hash algorithm in https://github.com/tiran/cpython/tree/siphash13 . Still need to verify the finalizer.

For hashlib I'd need to move to a different implementation of SipHash. The implementation in pyhash.c is optimized for speed and has a fixed key.
msg286606 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2017-02-01 09:45
On 01.02.2017 10:14, Christian Heimes wrote:
> 
> PEP 456 defines an API to add more hashing algorithms and make the selection of hash algorithm a compile time option. We can easily add SipHash-1-3 and make it the default algorithm. Vendors then can select between FNV2, SipHash-1-3 and SipHash-2-4.

+1 on adding the 1-3 and making it the default; the faster
the better. Hash speed for strings needs to be excellent in Python
due to the many dict lookups we use in the interpreter.

Reading up a bit on the Rust thread and looking at this benchmark
which is mentioned in the thread:

https://imgur.com/5dKecOW

it seems as if it would make sense to not use a fixed
hash algorithm for all strings lengths, but instead a
hybrid one to increase performance for short strings
(which are used a lot in Python).

Is there a good hash algorithm that provides better
performance for short strings than SipHash?

> On another note should we add SipHash-2-4 and 1-3 PRF to the hashlib mode?

+1 as well.
msg286607 - (view) Author: INADA Naoki (inada.naoki) * (Python committer) Date: 2017-02-01 09:50
> it seems as if it would make sense to not use a fixed
> hash algorithm for all strings lengths, but instead a
> hybrid one to increase performance for short strings
> (which are used a lot in Python).
>
> Is there a good hash algorithm that provides better
> performance for short strings than SipHash?

There is an undocumented option "Py_HASH_CUTOFF" to use DJBX33A for short strings.

See https://github.com/python/cpython/blob/master/Include/pyhash.h#L98
msg286608 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-02-01 10:07
The performance of the hash algorithm shouldn't affect general benchmarks, since the hash value is cached inside the string object. Almost all dict lookups in critical parts use interned strings. But in corner cases the difference can be measurable. We should not look at macrobenchmark results, but find the worst cases.

There is also an alignment issue with the implementation of SipHash-2-4 (and I suppose with SipHash-1-3). See issue28055.
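The caching Serhiy mentions is easy to observe: str objects store their hash after the first hash() call, so only the first call runs the algorithm at all. A quick check:

```python
import time

s = "x" * (1 << 24)  # a 16 MB string

t0 = time.perf_counter()
h1 = hash(s)          # computes the hash over all 16 MB
t1 = time.perf_counter()
h2 = hash(s)          # returns the cached value, essentially free
t2 = time.perf_counter()

assert h1 == h2
print(f"first: {(t1 - t0) * 1e3:.2f} ms, second: {(t2 - t1) * 1e6:.2f} us")
```

This is also why macrobenchmarks barely move: hot dict lookups use interned keys whose hashes were computed long ago.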
msg286609 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2017-02-01 10:13
On 01.02.2017 10:50, INADA Naoki wrote:
> 
>> it seems as if it would make sense to not use a fixed
>> hash algorithm for all strings lengths, but instead a
>> hybrid one to increase performance for short strings
>> (which are used a lot in Python).
>>
>> Is there a good hash algorithm that provides better
>> performance for short strings than SipHash?
> 
> There is an undocumented option "Py_HASH_CUTOFF" to use DJBX33A for short strings.
> 
> See https://github.com/python/cpython/blob/master/Include/pyhash.h#L98

Thanks. I found a reference in the PEP 456:

"""
...
However a fast function like DJBX33A is not as secure as SipHash24. A
cutoff at about 5 to 7 bytes should provide a decent safety margin and
speed up at the same time. The PEP's reference implementation provides
such a cutoff with Py_HASH_CUTOFF . The optimization is disabled by
default for several reasons. For one the security implications are
unclear yet and should be thoroughly studied before the optimization is
enabled by default. Secondly the performance benefits vary. On 64 bit
Linux system with Intel Core i7 multiple runs of Python's benchmark
suite [pybench] show an average speedups between 3% and 5% for
benchmarks such as django_v2, mako and etree with a cutoff of 7.
Benchmarks with X86 binaries and Windows X86_64 builds on the same
machine are a bit slower with small string optimization.

The state of small string optimization will be assessed during the beta
phase of Python 3.4. The feature will either be enabled with appropriate
values or the code will be removed before beta 2 is released.
"""

The mentioned speedup sounds like enabling this by default
would make a lot of sense (keeping the compile time option of
setting Py_HASH_CUTOFF to 0).

Aside: I haven't been following this since the days I proposed the
collision counting method as solution to the vulnerability, which was
rejected at the time. It's interesting how much effort went into
trying to fix the string hashing in dicts over the years. Yet the
much easier to use integer key hash collisions have still not
been addressed.
msg286610 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2017-02-01 10:14
On 01.02.2017 11:07, Serhiy Storchaka wrote:
> 
> The performance of the hash algorithm shouldn't affect general benchmarks, since the hash value is cached inside the string object. Almost all dict lookups in critical parts use interned strings. But in corner cases the difference can be measurable. We should not look at macrobenchmark results, but find the worst cases.

Good point.
msg286617 - (view) Author: INADA Naoki (inada.naoki) * (Python committer) Date: 2017-02-01 12:03
The current Py_HASH_CUTOFF implementation seems weak.
```
        switch(len) {
            /* ((hash << 5) + hash) + *p == hash * 33 + *p */
            case 7: hash = ((hash << 5) + hash) + *p++; /* fallthrough */
...
            case 1: hash = ((hash << 5) + hash) + *p++; break;
            default:
                assert(0);
        }
        hash ^= len;
        hash ^= (Py_uhash_t) _Py_HashSecret.djbx33a.suffix;
        x = (Py_hash_t)hash;
```

The hash secret is only mixed in at the very end, so any pair that collides without the secret also collides for every secret.

```
>>> secret = 0  # the collisions below hold for any secret value
>>> def djbx33a(bs):
...     h = 5381
...     for i in bs:
...         h = h*33 + i
...     h ^= len(bs)
...     h ^= secret
...     return h & 0xffff_ffff_ffff_ffff
...
>>> for i in [0, 3, 7]: # range(8)
...   for j in [0, 4, 7]: # range(8)
...     for k in [0, 5, 7]: # range(8)
...       print(i,j,k, djbx33a([7-i, 7-j+33*i, 7-k+33*j, 33*k]))
...
0 0 0 6381700318
0 0 5 6381700318
0 0 7 6381700318
0 4 0 6381700318
...
7 4 7 6381700318
7 7 0 6381700318
7 7 5 6381700318
7 7 7 6381700318
>>>
```

So 8^(Py_HASH_CUTOFF-1) colliding byte sequences can be built this way.

Maybe, we should remove Py_HASH_CUTOFF completely?
msg286620 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2017-02-01 12:20
Py_HASH_CUTOFF was an experimental feature for low-performance platforms. On modern hardware the performance increase is marginal to non-existent. I planned to deprecate the feature in 3.6 but forgot. We can deprecate it now and remove it in either 3.7 or 3.8.
msg286621 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2017-02-01 12:28
On 01.02.2017 13:03, INADA Naoki wrote:
> Maybe, we should remove Py_HASH_CUTOFF completely?

I think we ought to look for a better hash algorithm
for short strings, e.g. a CRC based one.

Some interesting investigations on this:
http://www.orthogonal.com.au/computers/hashstrings/
http://softwareengineering.stackexchange.com/questions/49550/which-hashing-algorithm-is-best-for-uniqueness-and-speed

PS: It may also be wise not to use the hash randomization
with these, since the secret would leak. Again, collision
counting comes to mind, since for short strings it is much
more important to have a good bucket distribution than
crypto security to protect hash secrets :-)
msg286622 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2017-02-01 12:31
We can and should still use hash randomization for short strings, but with a different key than the one used for longer strings.
msg286623 - (view) Author: INADA Naoki (inada.naoki) * (Python committer) Date: 2017-02-01 12:41
Py_HASH_CUTOFF already uses a different secret from SipHash.
The problem is that the secret doesn't affect the collisions at all.

An attacker can produce a large number of collisions without knowing the secret.

BTW, we already have FNV.  Let's survey FNV-1 short-string collisions.
msg286624 - (view) Author: INADA Naoki (inada.naoki) * (Python committer) Date: 2017-02-01 12:59
https://emboss.github.io/blog/2012/12/14/breaking-murmur-hash-flooding-dos-reloaded/

Murmur may be safe for inputs shorter than 16 bytes, too.
msg286628 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2017-02-01 13:07
Unless somebody is able to show a real-world example with a considerable performance boost (>5%), I'm strictly against any kind of special casing for short strings. I don't want to trade security for performance. You might be able to persuade me to go for a more complex system if you can prove both the security and the performance boost.
msg286727 - (view) Author: INADA Naoki (inada.naoki) * (Python committer) Date: 2017-02-02 00:56
OK, let's forget Py_HASH_CUTOFF for now and focus on SipHash-13.
msg287686 - (view) Author: INADA Naoki (inada.naoki) * (Python committer) Date: 2017-02-13 13:04
Christian Heimes: Could you create pull request?

https://github.com/tiran/cpython/tree/siphash13 is 404 now.
msg287688 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2017-02-13 13:08
The old repo has a different name, https://github.com/tiran/cpython-mirror

I'm busy with work at the moment. I'll get to the patch later.
msg288756 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2017-03-01 14:52
This issue is an optimization, but I still don't see any significant speedup :-) Would it be possible to write at least one microbenchmark showing a speedup? Maybe hash(<very long byte string>)?
msg288758 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-03-01 15:34
And microbenchmarks for different sizes are needed, because the result likely depends on whether the data fits in the different cache levels.
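A throwaway size sweep along those lines might look as follows (pyperf, as used elsewhere in this thread, gives more reliable numbers; because of hash caching, each measurement has to rebuild the object, so an allocation and copy are included in the timing):

```python
import timeit

# Hash caching means a fresh object is needed per call; bytes(bytearray)
# forces a copy. Sizes chosen to straddle typical L1/L2 cache sizes.
for size in (8, 64, 1024, 4096, 65536):
    buf = bytearray(b"x" * size)
    t = timeit.timeit("hash(bytes(buf))", globals={"buf": buf}, number=10_000)
    print(f"{size:6d} bytes: {t * 1e6 / 10_000:8.3f} us/hash (incl. copy)")
```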
msg289094 - (view) Author: INADA Naoki (inada.naoki) * (Python committer) Date: 2017-03-06 11:23
microbench result: https://gist.github.com/methane/33c7b01c45ce23b67246f5ddaff9c9e7

Not significant, or only a very small difference, for small data.
For 4KB:
Median +- std dev: [python.default] 3.26 us +- 0.03 us -> [python.siphash13] 2.03 us +- 0.02 us: 1.60x faster (-38%)