
classification
Title: tuples should remember their hash value
Type: performance
Stage: patch review
Components: Interpreter Core
Versions: Python 3.4

process
Status: closed
Resolution: rejected
Dependencies: (none)
Superseder: (none)
Assigned To: (none)
Nosy List: Anthony Sottile, benjamin.peterson, christian.heimes, dtorp, georg.brandl, mark.dickinson, meador.inge, python-dev, rhettinger, serhiy.storchaka, tim.peters
Priority: low
Keywords: easy, patch

Created on 2010-08-25 19:33 by dtorp, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
tuplehash.patch (uploaded by christian.heimes, 2013-01-07 09:10)
Messages (14)
msg114929 - Author: David Albert Torpey (dtorp) Date: 2010-08-25 19:33
Dictionary keys are commonly numbers, strings, or tuples. Python has optimized numbers and strings to remember their hash values on successive calls. Tuples should do this too since their recursive hash function can take a long time to compute. Tuples are Python's official record type and the one obvious way of making non-scalar dictionary keys.

The code to do this in stringobject.c is short and sweet, so this major speed boost should be an easy thing to do.

static long
string_hash(PyStringObject *a)
{
    register Py_ssize_t len;
    register unsigned char *p;
    register long x;

    if (a->ob_shash != -1)               <==
        return a->ob_shash;              <==
    len = Py_SIZE(a);
    p = (unsigned char *) a->ob_sval;
    x = *p << 7;
    while (--len >= 0)
        x = (1000003*x) ^ *p++;
    x ^= Py_SIZE(a);
    if (x == -1)                         <==
        x = -2;                          <==
    a->ob_shash = x;
    return x;
}

The code in tupleobject.c would just need to add the four lines marked above. Here's what it looks like now.

static long
tuplehash(PyTupleObject *v)
{
    register long x, y;
    register Py_ssize_t len = Py_SIZE(v);
    register PyObject **p;
    long mult = 1000003L;
    x = 0x345678L;
    p = v->ob_item;
    while (--len >= 0) {
        y = PyObject_Hash(*p++);
        if (y == -1)
            return -1;
        x = (x ^ y) * mult;
        /* the cast might truncate len; that doesn't change hash stability */
        mult += (long)(82520L + len + len);
    }
    x += 97531L;
    if (x == -1)
        x = -2;
    return x;
}
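
For a rough feel of the difference, here is a small Python-level timing sketch (illustrative only; the exact numbers will vary by build and machine):

import timeit

s = "x" * 10000          # str caches its hash after the first call
t = tuple(range(10000))  # tuple recomputes its hash on every call

hash(s)  # warm-up: populates the str's cached hash

# Subsequent hash(s) calls are O(1); hash(t) stays O(len(t)) per call.
print(timeit.timeit("hash(s)", globals=globals(), number=100000))
print(timeit.timeit("hash(t)", globals=globals(), number=100000))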

Thank you guys for all of your work.

*David
msg114930 - Author: Raymond Hettinger (rhettinger) (Python committer) Date: 2010-08-25 19:36
This seems reasonable.  Will look at it in the next few days.
msg114932 - Author: Tim Peters (tim.peters) (Python committer) Date: 2010-08-25 20:00
- Tuple objects don't currently reserve space to store their hash code, so it's likely this would increase the size of every tuple.

- It's unclear to me which natural use patterns would actually enjoy a major speed boost.  Note that dicts remember the hash codes of keys already, regardless of whether the key type remembers them too.  A tuple is typically constructed right before being used in a dict lookup, so for a one-shot use no time would be saved.  If the new tuple is used in multiple dict lookups, sure - but is that common?
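
A minimal sketch of the two usage patterns (toy names, just for illustration):

d = {}
x, y = 3, 7

# One-shot: the tuple is built, hashed once for the store, then discarded.
# Remembering the hash on the tuple object would save nothing here.
d[(x, y)] = 1

# Reuse of the *same* tuple object: the only pattern that would benefit.
key = (x, y)
if key in d:        # hash computed here...
    d[key] += 1     # ...and computed again here without memoization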
msg114936 - Author: Benjamin Peterson (benjamin.peterson) (Python committer) Date: 2010-08-25 21:37
FWIW, I'm -1 on this without a demonstrable improvement on some real-world cases.
msg114937 - Author: Raymond Hettinger (rhettinger) (Python committer) Date: 2010-08-25 22:04
Hello Tim! If you have a chance, please also take a look at issue9685 which I was planning to work on in the next couple of weeks.

For memoizing tuple hashes, I'm inclined to think the one extra field is worth it.  That would help all the cases where people are concerned about double accesses to dicts in a look-before-you-leap pattern or for a pattern of fetch-item-update-value-store-new-item.

It looks like the code for collections.OrderedDict() would benefit because it does multiple lookups and stores on the same key: http://svn.python.org/view/python/branches/release27-maint/Lib/collections.py?revision=84148&view=markup

It would also help the multiple lookups and stores in caching code such as that at http://code.activestate.com/recipes/498245-lru-and-lfu-cache-decorators
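
A stripped-down sketch of that caching pattern (not the recipe's actual code) shows the same tuple being hashed twice on every cache miss:

def memoize(func):
    cache = {}
    def wrapper(*args):
        key = args                             # one tuple object
        try:
            return cache[key]                  # hashed for the lookup
        except KeyError:
            result = cache[key] = func(*args)  # hashed again for the store
            return result
    return wrapper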

I suppose we could prepare a patch, instrument it, and try it with Twisted, SQLAlchemy, and Django to find out how many tuple hash calculations would be saved by memoizing.
msg114939 - Author: Benjamin Peterson (benjamin.peterson) (Python committer) Date: 2010-08-25 22:14
2010/8/25 Raymond Hettinger <report@bugs.python.org>:
> I suppose we could prepare a patch, instrument it, and try it with Twisted, SQLAlchemy, and Django to find out how many tuple hash calculations would be saved by memoizing.

You should also check memory usage.
msg179202 - Author: Serhiy Storchaka (serhiy.storchaka) (Python committer) Date: 2013-01-06 18:32
Sorry, Raymond. It was a bad day for Roundup.
msg179235 - Author: Serhiy Storchaka (serhiy.storchaka) (Python committer) Date: 2013-01-06 22:38
My apologies once again, Raymond. I mistakenly thought that I had unassigned the issue from you (it was a Roundup bug that day).

As for the issue, I totally agree with Tim.
msg179248 - Author: Mark Dickinson (mark.dickinson) (Python committer) Date: 2013-01-07 08:40
Given the responses so far, I suggest closing this as rejected.
msg179250 - Author: Christian Heimes (christian.heimes) (Python committer) Date: 2013-01-07 09:10
I'm not too worried about the slightly increased memory usage. For example, one of our largest application instances consumes about 8 GB of memory right now. It has just about 22k tuples in gc.get_objects(). An additional Py_hash_t in the tuple struct would increase memory usage by less than 200 kB.
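(Roughly: 22,000 tuples × 8 bytes per Py_hash_t on a 64-bit build is about 176 kB, before any allocator rounding.)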

I've attached a simple patch.
msg179254 - Author: Georg Brandl (georg.brandl) (Python committer) Date: 2013-01-07 09:59
Still, actual benefits in some kind of benchmark will be needed to show that this is not a premature optimization.
msg179277 - Author: Christian Heimes (christian.heimes) (Python committer) Date: 2013-01-07 19:27
The benchmark doesn't show any serious improvement. Some test cases are even slower.

Report on Linux freimann 3.2.0-35-generic #55-Ubuntu SMP Wed Dec 5 17:42:16 UTC 2012 x86_64 x86_64
Total CPU cores: 6

### call_simple ###
Min: 0.201824 -> 0.208248: 1.03x slower
Avg: 0.210608 -> 0.217300: 1.03x slower
Significant (t=-2.23)
Stddev: 0.00818 -> 0.00829: 1.0134x larger
Timeline: http://tinyurl.com/axqoqp4

### go ###
Min: 0.534641 -> 0.550004: 1.03x slower
Avg: 0.537874 -> 0.552782: 1.03x slower
Significant (t=-11.89)
Stddev: 0.00184 -> 0.00211: 1.1495x larger
Timeline: http://tinyurl.com/b5k3ua4

### pathlib ###
Min: 0.121589 -> 0.117025: 1.04x faster
Avg: 0.126679 -> 0.122279: 1.04x faster
Significant (t=3.64)
Stddev: 0.00429 -> 0.00427: 1.0048x smaller
Timeline: http://tinyurl.com/acbb69o

### spectral_norm ###
Min: 0.280749 -> 0.305213: 1.09x slower
Avg: 0.281194 -> 0.305390: 1.09x slower
Significant (t=-120.69)
Stddev: 0.00044 -> 0.00011: 4.1101x smaller
Timeline: http://tinyurl.com/awyeejp

The following not significant results are hidden, use -v to show them:
call_method, call_method_slots, call_method_unknown, chaos, fannkuch, fastpickle, fastunpickle, float, formatted_logging, iterative_count, json_dump_v2, json_load, meteor_contest, nbody, normal_startup, nqueens, pidigits, raytrace, regex_compile, regex_effbot, regex_v8, richards, silent_logging, simple_logging, startup_nosite, telco, threaded_count, unpack_sequence.
msg179281 - Author: Roundup Robot (python-dev) (Python triager) Date: 2013-01-07 20:24
New changeset 17038de56fd4 by Christian Heimes in branch 'default':
Add a comment about *not* caching the hash value. Issue #9685 suggested memoizing the hash value, but the feature request was rejected because no speed-ups were found.
http://hg.python.org/cpython/rev/17038de56fd4
msg364258 - Author: Anthony Sottile (Anthony Sottile) Date: 2020-03-15 20:46
Hit this today, unfortunately -- I'm working with some pretty complex (and nested) `typing.NamedTuple` objects, and the lack of caching here results in quite the slowdown (in my macro benchmarks it's the difference between a render pass taking 180 ms and 25 ms)

the class in question is being used as a cache key:

https://github.com/asottile/babi/blob/1be4e80eddc1bff0eb8047cc89337fdf006ad148/babi/highlight.py#L217

with the hash cached:

```
μs      event
123744  startup
27833   kEND5
2859    ^X
```

without the hash cached:

```
μs      event
122575  startup
180415  kEND5
3725    ^X
```

(my hash cache of course being slower than it could be in C)

```
@@ -214,10 +214,21 @@ class Region(NamedTuple):
     scope: Scope
 
 
+_state_hash_cache = {}
+
+
 class State(NamedTuple):
     entries: Tuple['Entry', ...]
     while_stack: Tuple[Tuple['WhileRule', int], ...]
 
+    def __hash__(self):
+        k = id(self)
+        try:
+            return _state_hash_cache[k]
+        except KeyError:
+            ret = _state_hash_cache[k] = super().__hash__()
+            return ret
+
     @classmethod
     def root(cls, entry: 'Entry') -> 'State':
         return cls((entry,), ())
```
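
(One caveat in that sketch: keying the cache on id() is only safe while the State objects stay alive, since CPython can reuse an id after its object is collected; a reused id would return a stale hash for a different object.)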

___


I get that this is a pretty extreme case and it's unlikely to change the resolution of this issue, but I figured I'd point it out in case anyone else is hitting similar issues.
History
Date                 User               Action  Args
2022-04-11 14:57:05  admin              set     github: 53894
2020-03-15 20:46:20  Anthony Sottile    set     nosy: + Anthony Sottile; messages: + msg364258
2013-01-07 20:24:29  python-dev         set     nosy: + python-dev; messages: + msg179281
2013-01-07 19:35:41  benjamin.peterson  set     status: open -> closed; resolution: rejected
2013-01-07 19:27:45  christian.heimes   set     messages: + msg179277
2013-01-07 09:59:45  georg.brandl       set     messages: + msg179254
2013-01-07 09:10:28  christian.heimes   set     files: + tuplehash.patch; nosy: + christian.heimes; messages: + msg179250; keywords: + patch; stage: needs patch -> patch review
2013-01-07 08:40:53  mark.dickinson     set     nosy: + mark.dickinson; messages: + msg179248
2013-01-06 22:38:11  serhiy.storchaka   set     messages: + msg179235
2013-01-06 21:29:17  rhettinger         set     assignee: rhettinger ->
2013-01-06 18:32:22  serhiy.storchaka   set     assignee: rhettinger; messages: + msg179202; nosy: + serhiy.storchaka
2013-01-06 17:53:44  georg.brandl       set     nosy: + georg.brandl
2013-01-05 21:47:47  meador.inge        set     nosy: + meador.inge
2013-01-04 15:37:26  serhiy.storchaka   set     type: resource usage -> performance; components: + Interpreter Core; versions: + Python 3.4, - Python 3.2
2010-10-31 16:28:58  rhettinger         set     assignee: rhettinger -> (no value)
2010-08-25 22:14:50  benjamin.peterson  set     messages: + msg114939
2010-08-25 22:04:14  rhettinger         set     messages: + msg114937
2010-08-25 21:37:33  benjamin.peterson  set     nosy: + benjamin.peterson; messages: + msg114936
2010-08-25 20:00:02  tim.peters         set     nosy: + tim.peters; messages: + msg114932
2010-08-25 19:36:11  rhettinger         set     priority: normal -> low; assignee: rhettinger; versions: + Python 3.2, - Python 2.6; keywords: + easy; nosy: + rhettinger; messages: + msg114930; stage: needs patch
2010-08-25 19:33:15  dtorp              create