Issue9685
Created on 2010-08-25 19:33 by dtorp, last changed 2010-10-31 16:28 by rhettinger.
| Messages (6) | |||
|---|---|---|---|
| msg114929 - (view) | Author: David Albert Torpey (dtorp) | Date: 2010-08-25 19:33 | |
Dictionary keys are commonly numbers, strings, or tuples. Python has optimized numbers and strings to remember their hash values on successive calls. Tuples should do this too since their recursive hash function can take a long time to compute. Tuples are Python's official record type and the one obvious way of making non-scalar dictionary keys.
The code to do this in stringobject.c is short and sweet, so this major speed boost should be an easy thing to.
static long
string_hash(PyStringObject *a)
{
register Py_ssize_t len;
register unsigned char *p;
register long x;
if (a->ob_shash != -1) <==
return a->ob_shash; <==
len = Py_SIZE(a);
p = (unsigned char *) a->ob_sval;
x = *p << 7;
while (--len >= 0)
x = (1000003*x) ^ *p++;
x ^= Py_SIZE(a);
if (x == -1) <==
x = -2; <==
a->ob_shash = x;
return x;
}
The code in tupleobject.c would just need to add the four lines marked above. Here's what is looks like now.
static long
tuplehash(PyTupleObject *v)
{
register long x, y;
register Py_ssize_t len = Py_SIZE(v);
register PyObject **p;
long mult = 1000003L;
x = 0x345678L;
p = v->ob_item;
while (--len >= 0) {
y = PyObject_Hash(*p++);
if (y == -1)
return -1;
x = (x ^ y) * mult;
/* the cast might truncate len; that doesn't change hash stability */
mult += (long)(82520L + len + len);
}
x += 97531L;
if (x == -1)
x = -2;
return x;
}
Thank you guys for all of your work.
*David
|
|||
| msg114930 - (view) | Author: Raymond Hettinger (rhettinger) * ![]() |
Date: 2010-08-25 19:36 | |
This seems reasonable. Will look at it in the next few days. |
|||
| msg114932 - (view) | Author: Tim Peters (tim_one) * ![]() |
Date: 2010-08-25 20:00 | |
- Tuple objects don't currently reserve space to store their hash code, so it's likely this would increase the size of every tuple. - It's unclear to me which natural use patterns would actually enjoy a major speed boost. Note that dicts remember the hash codes of keys already, regardless of whether the key type remembers them too. A tuple is typically constructed right before being used in a dict lookup, so for a one-shot use no time would be saved. If the new tuple is used in multiple dict lookups, sure - but is that common? |
|||
| msg114936 - (view) | Author: Benjamin Peterson (benjamin.peterson) * ![]() |
Date: 2010-08-25 21:37 | |
FWIW, I'm -1 on this without a demonstrable improvement on some real-world cases. |
|||
| msg114937 - (view) | Author: Raymond Hettinger (rhettinger) * ![]() |
Date: 2010-08-25 22:04 | |
Hello Tim! If you have a chance, please also take a look at issue9685 which I was planning to work on in the next couple of weeks. For memoizing tuple hashes, I'm inclined to think the one extra field is worth it. That would help all the cases where people are concerned about double accesses to dicts in a look-before-you-leap pattern or for a pattern of fetch-item-update-value-store-new-item. It looks like the code for collections.OrderedDict() would benefit because it does multiple lookups and stores on the same key: http://svn.python.org/view/python/branches/release27-maint/Lib/collections.py?revision=84148&view=markup It would also help the multiple lookups and stores in caching code such as that at http://code.activestate.com/recipes/498245-lru-and-lfu-cache-decorators I suppose we could prepare a patch, instrument it, and try it with Twisted, SQLalchemy, and Django to find-out how many tuple hash calculations would be saved by memoizing. |
|||
| msg114939 - (view) | Author: Benjamin Peterson (benjamin.peterson) * ![]() |
Date: 2010-08-25 22:14 | |
2010/8/25 Raymond Hettinger <report@bugs.python.org>: > I suppose we could prepare a patch, instrument it, and try it with Twisted, SQLalchemy, and Django to find-out how many tuple hash calculations would be saved by memoizing. You should also check memory usage. |
|||
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2010-10-31 16:28:58 | rhettinger | set | assignee: rhettinger -> |
| 2010-08-25 22:14:50 | benjamin.peterson | set | messages: + msg114939 |
| 2010-08-25 22:04:14 | rhettinger | set | messages: + msg114937 |
| 2010-08-25 21:37:33 | benjamin.peterson | set | nosy:
+ benjamin.peterson messages: + msg114936 |
| 2010-08-25 20:00:02 | tim_one | set | nosy:
+ tim_one messages: + msg114932 |
| 2010-08-25 19:36:11 | rhettinger | set | nosy: + rhettinger versions: + Python 3.2, - Python 2.6 messages: + msg114930 priority: normal -> low assignee: rhettinger keywords: + easy stage: needs patch |
| 2010-08-25 19:33:15 | dtorp | create | |
