classification
Title: Slightly faster set lookup
Type: performance Stage: patch review
Components: Interpreter Core Versions: Python 3.5
process
Status: closed Resolution: rejected
Dependencies: Superseder:
Assigned To: rhettinger Nosy List: josh.r, pitrou, rhettinger, serhiy.storchaka
Priority: normal Keywords: patch

Created on 2015-01-20 10:39 by serhiy.storchaka, last changed 2015-01-25 19:55 by serhiy.storchaka. This issue is now closed.

Files
File name Uploaded Description Edit
set_faster_lookup.patch serhiy.storchaka, 2015-01-20 10:39 review
Messages (5)
msg234367 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-01-20 10:39
Currently set_lookkey() first tests entry->key == NULL, then entry->hash == hash and entry->key != dummy, and only after that entry->key == key. Proposed patch optimizes the order of comparisons. entry->key == key is tested first as for dicts. And no need to test entry->key != dummy after entry->hash == hash if entry->hash of dummy key is set to -1.

Microbenchmark which demonstrates the best case (a lot of lookups of keys existing in the set).

$ ./python -m timeit -s "a = list(range(10**6)); s1 = set(a); s2 = set(a)" -- "s1 <= s2"

Unpatched: 10 loops, best of 3: 39.4 msec per loop
Patched: 10 loops, best of 3: 35.3 msec per loop
msg234553 - (view) Author: Josh Rosenberg (josh.r) * (Python triager) Date: 2015-01-23 11:57
Does it slow down other cases? Seems to me like dictionaries would have more "existing key lookups" than sets to justify the optimization, since they're used for attribute lookup and the like, and because you usually want the value associated with existing keys, where a set would usually be used in a "might exist, might not" scenario of membership testing.
msg234555 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-01-23 14:00
Good point Josh. Yes, it may slow down other cases, and there is a difference between sets and dicts.

1. If the key is not contained in the set, then if the first tested entry table[hash & mask] is empty, then the lookup is slowed down by one pointer comparison (2 comparisons instead of 1). If the entry is used then the lookup is not slow downed or even speed up (if need to test more than one additional entries).

2. If the same case is in the set, then if the first tested entry is the set then the lookup will use one comparison instead of 4. In other cases (the key is placed in other entry) the lookup will use at least one comparison less.

3. If the set contains key equal but not identical to tested key, then the lookup will use at least one comparison less.

Only first case can cause slowdown. If we test a lot of keys not existing in the set with small ratio between used and allocated entries numbers. May be the worst case is creating a copy of the set. For other operations I did not find significant difference.

$ ./python -m timeit -s "s = set(range(10**4))" -- "frozenset(s)"
Unpatched: 1000 loops, best of 3: 658 usec per loop
Patched: 1000 loops, best of 3: 668 usec per loop

But issue23290 prevents this regression.
msg234679 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2015-01-25 18:33
I had intentionally moved the entry->key == key test to be after the hash test.  Unlike dicts where exact key matches are norm, sets are used for deduping (cased where keys are equal but not dups).

By moving the test inside the entry->key == hash test, we reduce the number of comparisons per loop for collisions and increase the predictability of the entry->key == key branch (once the hash matches, the chance of an equality match is very high.

The one part I'm looking at changing is moving the dummy test after the identity key test.  However, the dummy test inside the hash test is perfectly predictable (it basicly never matches) and may be cost free.
msg234683 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-01-25 19:55
What about just removing the dummy test inside the hash test? The benefit is smaller but statistically sygnificant and always positive.

$ ./python -m timeit -s "a = list(range(10**6)); s1 = set(a); s2 = set(a)" -- "s1 <= s2"

Unpatched: 10 loops, best of 3: 39.3 msec per loop
Patched: 10 loops, best of 3: 38.7 msec per loop
History
Date User Action Args
2015-01-25 19:55:50serhiy.storchakasetmessages: + msg234683
2015-01-25 18:33:17rhettingersetstatus: open -> closed
resolution: rejected
messages: + msg234679
2015-01-23 14:00:59serhiy.storchakasetmessages: + msg234555
2015-01-23 11:57:16josh.rsetnosy: + josh.r
messages: + msg234553
2015-01-23 01:42:24rhettingersetassignee: rhettinger
2015-01-20 10:39:16serhiy.storchakacreate