This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author Kevin.Norris
Recipients Kevin.Norris, docs@python
Date 2016-10-07.04:37:22
SpamBayes Score -1.0
Marked as misclassified Yes
Message-id <1475815043.27.0.00299962010674.issue28383@psf.upfronthosting.co.za>
In-reply-to
Content
The documentation for __hash__ contains this text:

"The only required property is that objects which compare equal have the same hash value; it is advised to somehow mix together (e.g. using exclusive or) the hash values for the components of the object that also play a part in comparison of objects."

The recommendation of "using exclusive or" is likely to result in programmers naively doing this:

def __hash__(self):
    return hash(self.thing1) ^ hash(self.thing2) ^ hash(self.thing3)

In the event that (say) self.thing1 and self.thing2 have almost or exactly the same hash (with "almost" referring to bitwise differences rather than integral distance), this wipes out most or all of the entropy from both values and greatly increases the likelihood of hash collisions.  Indeed, Python's own tuple type does not do this (while it does use XOR, it also does some other math to ensure the bits are as mixed up as is practical).[1]

Because the correct algorithm is both nontrivial to implement and already exists in the tuple type's __hash__, I propose that the documentation be updated to recommend something like the following:

def __hash__(self):
    return hash((self.thing1, self.thing2, self.thing3))

One possible wording:

"The only required property is that objects which compare equal have the same hash value; it is advised to mix together the hash values of the components of the object that also play a part in comparison of objects by packing them into a tuple and hashing the tuple: [code example]"

[1]: https://hg.python.org/cpython/file/fca5c4a63251/Objects/tupleobject.c#l348
History
Date User Action Args
2016-10-07 04:37:23Kevin.Norrissetrecipients: + Kevin.Norris, docs@python
2016-10-07 04:37:23Kevin.Norrissetmessageid: <1475815043.27.0.00299962010674.issue28383@psf.upfronthosting.co.za>
2016-10-07 04:37:23Kevin.Norrislinkissue28383 messages
2016-10-07 04:37:22Kevin.Norriscreate