Issue 5169: Default hash not equal to id on AMD Sempron

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/49419

classification

Title:	Default hash not equal to id on AMD Sempron
Type:	behavior	Stage:
Components:	Interpreter Core	Versions:	Python 2.5

process

Status:	closed	Resolution:	not a bug
Dependencies:		Superseder:	Reduce hash collisions for objects with no __hash__ method View: 5186
Assigned To:	tim.peters	Nosy List:	chemacortes, jcea, mark.dickinson, pitrou, rhettinger, terry.reedy, tim.peters
Priority:	normal	Keywords:

Created on 2009-02-06 14:16 by chemacortes, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Messages (12)
msg81269 - (view)	Author: Chema Cortés (chemacortes)	Date: 2009-02-06 14:16
Sometimes, the default hash for user-defined object is not equal to the id of the object: In [1]: class A: ...: pass In [2]: a=A() In [3]: id(a),hash(a) Out[3]: (3082955212L, -1212012084) The test box has an AMD Sempron, a 64bit CPU archictecture emulating a 32bit one. This following relation can be deduced: hash(a)=id(a)-2**32
msg81270 - (view)	Author: Antoine Pitrou (pitrou) *	Date: 2009-02-06 15:50
I wouldn't qualify this as a bug. hash() doesn't need to be equal to the id() even in the default case. Actually, it may be better for hash() to be equal to id()/4 or id()/8, depending on the standard alignment of the memory allocator.
msg81277 - (view)	Author: Mark Dickinson (mark.dickinson) *	Date: 2009-02-06 16:24
It looks like this is a platform with sizeof(long) == 4 and sizeof(void *) == 8. Is that right? As Antoine says, I can't see any problem here. Why do you think that hash(a) should be equal to id(a) in this case? Antoine, in what way would id()/4 be better than id()?
msg81282 - (view)	Author: Antoine Pitrou (pitrou) *	Date: 2009-02-06 16:53
Because with hash() == id() == address of the PyObject, the hash is always a multiple of 4 or 8 (I think it's 8), so (hash() % dict_or_set_table_size) is non-uniformly distributed in most cases.
msg81283 - (view)	Author: Mark Dickinson (mark.dickinson) *	Date: 2009-02-06 17:06
Hah. Good point. I'd forgotten about the taking-the-low-order-bits thing. Should probably do some timings (and possibly also number-of- collisions measurements) to find out whether using id() >> 3 actually makes any significant difference (either way) for dicts of objects.
msg81292 - (view)	Author: Mark Dickinson (mark.dickinson) *	Date: 2009-02-06 19:21
Some preliminary timings indicate that it may well be worth replacing 'return (long)p' with 'return (long)p >> 3' in _Py_HashPointer (in Objects/object.c): I'm getting a 10% speedup in dict-building and dict-lookup for dicts of plain objects. I'll open a separate issue for this and post details of the timings. In the meantime, I think this issue can be closed as invalid: there's no reason that id(a) and hash(a) have to be equal. (chemacortes, if you disagree then please do comment; we'll still see the comments even though the issue is closed).
msg81307 - (view)	Author: Jesús Cea Avión (jcea) *	Date: 2009-02-06 22:09
The issue is trivially reproductible in any 32 bits platform, simply allocating objects until you go up the 2GB mark. Since __hash__() wants to take advantage of every bit in a 32 bit platform, and we don't have unsigned integers in python, I vote for "invalid" too. There is no promise of "id(obj)==hash(obj)": you can overload "__hash__()" anytime, and that is already done for strings, integers, tuples, etc. The speed advantage is interesting, though.
msg81333 - (view)	Author: Chema Cortés (chemacortes)	Date: 2009-02-07 04:41
I also agree to close this bug as invalid. Indeed, there is not any reason to make equal id(a) and hash(a), but the description of "hashable" object from the documentation (but this is a different issue). 'hash' and 'id' returns the same-wordsize integer (32bit): 'id' as unsigned long (Py_uintptr_t), and 'hash' as signed long casted from 'id'. Thanks for your time, Chema
msg81480 - (view)	Author: Jesús Cea Avión (jcea) *	Date: 2009-02-09 19:55
Marc, please post the bugid for the "hash>>3" issue. It is interesting enough to pursue it.
msg81481 - (view)	Author: Mark Dickinson (mark.dickinson) *	Date: 2009-02-09 20:03
See issue 5186 for using id()/8 for the hash.
msg81482 - (view)	Author: Raymond Hettinger (rhettinger) *	Date: 2009-02-09 20:06
Tim, any thoughts?
msg81964 - (view)	Author: Terry J. Reedy (terry.reedy) *	Date: 2009-02-13 19:13
Chema, thank you for bringing this semi-conscious assumption to light. Intentionally breaking it significantly speeds up set and dict creation.

History
Date	User	Action	Args
2022-04-11 14:56:45	admin	set	github: 49419
2009-02-13 19:13:14	terry.reedy	set	status: open -> closed resolution: not a bug superseder: Reduce hash collisions for objects with no __hash__ method messages: + msg81964 nosy: + terry.reedy
2009-02-09 20:06:40	rhettinger	set	assignee: tim.peters messages: + msg81482 nosy: + tim.peters
2009-02-09 20:03:33	mark.dickinson	set	messages: + msg81481
2009-02-09 19:55:36	jcea	set	status: closed -> open resolution: not a bug -> (no value) messages: + msg81480
2009-02-07 04:41:31	chemacortes	set	messages: + msg81333
2009-02-06 22:09:20	jcea	set	messages: + msg81307
2009-02-06 19:21:06	mark.dickinson	set	status: open -> closed resolution: not a bug messages: + msg81292
2009-02-06 17:06:54	mark.dickinson	set	messages: + msg81283
2009-02-06 16:53:35	pitrou	set	messages: + msg81282
2009-02-06 16:24:18	mark.dickinson	set	nosy: + mark.dickinson messages: + msg81277
2009-02-06 15:50:06	pitrou	set	nosy: + rhettinger, pitrou messages: + msg81270
2009-02-06 14:16:26	chemacortes	create