classification
Title: __eq__ / __hash__ check doesn't take inheritance into account
Type: behavior Stage:
Components: Interpreter Core Versions: Python 2.6
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: ncoghlan Nosy List: amaury.forgeotdarc, barry, benjamin.peterson, eikeon, glyph, gvanrossum, jek, ncoghlan, rhettinger, schmir
Priority: critical Keywords: patch

Created on 2008-03-04 19:49 by jek, last changed 2008-08-31 13:22 by ncoghlan. This issue is now closed.

Files
File name Uploaded Description Edit
inherit_hash.patch amaury.forgeotdarc, 2008-03-05 00:00 restore previous behavior of __hash__ inheritance
inherit_hash2.patch amaury.forgeotdarc, 2008-03-05 00:48
inherit_hash3.patch amaury.forgeotdarc, 2008-04-10 23:44
issue2235_fix_hash_inheritance.diff ncoghlan, 2008-06-29 12:25 Patch to allow classes to explicitly block inheritance of tp_hash
issue2235_fix_hash_inheritance_with_function_ptr.diff ncoghlan, 2008-07-06 07:52 Block inheritance with PyObject_HashNotImplemented
Messages (37)
msg63258 - (view) Author: jason kirtland (jek) Date: 2008-03-04 19:49
In 2.6a, seems like the __hash__ implementation and __eq__ must be
defined together, in the same class.  See also #1549.  Ensuring that a
__hash__ implementation isn't being pulled from a builtin type is
probably a sufficient check...?

>>> class Base(object):
...     def __init__(self, name):
...         self.name = name
...     def __eq__(self, other):
...         return self.name == other.name
...     def __hash__(self):
...         return hash(self.name)
... 
>>> class Extended(Base):
...     def __eq__(self, other):
...         print "trace: __eq__ called"
...         return Base.__eq__(self, other)
... 
>>> hash(Base('b1'))
603887253
>>> hash(Extended('e1'))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'Extended'
msg63263 - (view) Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2008-03-04 21:56
This issue is similar to issue1549. Note that only new-style classes are
concerned.

I think the change was intentional. Guido, do you confirm?

However there should be some documentation about it, a NEWS entry or an
item in the "porting to 2.6" section.

The initial modification appeared in the py3k branch (r51454):
"""
Change the way __hash__ is inherited; when __eq__ or __cmp__ is
overridden but __hash__ is not, set __hash__ explicitly to None (and
tp_hash to NULL).
"""
msg63264 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2008-03-04 22:15
Wouldn't you expect this sort of thing to break code?  Does it meet the
criteria for backporting to 2.6?
msg63265 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2008-03-04 22:17
The py3k feature was intentional.

I'm not sure who did the backport (could've been me, long ago).  I think
the backport should be replaced with a warning that is only issued when
-3 is given.
msg63269 - (view) Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2008-03-05 00:00
The attached patch reverts r59576 and the part of r59106 about the
tp_hash slot.

It also adds the py3k warning::
   type defines __eq__ but not __hash__, and will not be hashable in 3.x
printed when calling hash() on such an object.
msg63270 - (view) Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2008-03-05 00:05
I noticed that my patch uses Py_TYPE(self)->tp_hash, whereas normal
processing of slot_tp_hash() uses lookup_method(self,"__hash__",hash_str).

I am almost sure that both expressions return the same value.
Is this correct? Py_TYPE(self)->tp_hash seems much faster.
msg63271 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2008-03-05 00:12
>  I noticed that my patch uses Py_TYPE(self)->tp_hash, whereas normal
>  processing of slot_tp_hash() uses lookup_method(self,"__hash__",hash_str).
>
>  I am almost sure that both expressions return the same value.
>  Is this correct? Py_TYPE(self)->tp_hash seems much faster.

HERE BE DRAGONS

There are cases where Py_TYPE(self)->tp_hash is the function currently
executing (some wrapper(s) in typeobject.c). OTOH, lookup_method(...)
finds the Python code in the class __dict__s along the MRO.
msg63272 - (view) Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2008-03-05 00:48
Ah, I remember that it took me some time to understand the boundaries
between a slot and the corresponding special method.

Here is another version of the patch, which does not test tp_hash while
we are currently running the tp_hash function...
I also added the warning for old-style classes.
msg65297 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2008-04-10 16:46
Hi Amaury, thanks for your work on this. I applied your patch and looked
at the code but there seem to be some issues left still.

(1) I don't think you need to add a warning to classic classes that
define __eq__ without __hash__ -- these aren't hashable and never were.
 The problem is purely with new classes AFAICT.

(2) I can't seen to trigger the warning for new classes (and yes, I
specified -3 on the command line :-):

>>> class C(object):
...   def __eq__(self, other): return False
...
>>> hash(C())
-134976284

(3) test_Hashable in test_collections.py fails.  This is because the
Hashable class believes that list, set, and dict are hashable. 
Something odd is going on:

>>> list.__hash__([])
-135100116

but

>>> hash([])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'list'

Note that in 2.5, both raise TypeError.
msg65298 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2008-04-10 16:47
Let's try to have this resolved before the next release.

Also let's try not to accidentally merge the 2.6 changes related to this
issue into 3.0. :-)
msg65299 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2008-04-10 16:51
Note: some more tests also fail:

test_descr, test_quopri, test_set
msg65329 - (view) Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2008-04-10 23:44
Here is a new patch, all tests pass.
- no warning for classic classes
- added tests for -3 warnings

I had to add two TPSLOT entries to the slodefs list; can you please
check that this is correct (I wanted to reset tp_hash to NULL when
__eq__ or __cmp__ are defined).
msg66399 - (view) Author: Barry A. Warsaw (barry) * (Python committer) Date: 2008-05-08 03:43
Guido, can you comment on Amaury's latest patch?  I'm going to bump this
back to critical so as not to hold up 2.6 alpha, but it should be marked
as a release blocker for the first beta.
msg68792 - (view) Author: Glyph Lefkowitz (glyph) Date: 2008-06-26 16:50
As barry said, this should have been a release blocker for the first
beta... ;-)
msg68795 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2008-06-26 17:06
I'm going to have to ask someone else to look at this code.  I am too
busy with too many things to be able to take on a detailed code review.
msg68828 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2008-06-27 13:23
Amaury's patch doesn't currently remove the new-in-2.6 code that turns
"tp_hash = NULL" at the C level into "__hash__ = None" at the Python
level. I suspect that will prove to be a problem (and may be the cause
of Django's woes, if I remember Glyph's python-dev posts correctly). I
also don't believe the new TPSLOT entries should be necessary - we
should be able to just revert all of this tp_hash related code to match
the 2.5 implementation, and then see if we can find a way to emit the
Py3k warning when __cmp__ or __eq__ are overridden without overriding
__hash__ without breaking any existing code (if we can't, we may have to
figure out a way to get 2to3 to check for it).

P.S. you'd think I would have learnt by now not to try to make sense of
typeobject.c when I'm tired ;)
msg68839 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2008-06-27 18:05
Thanks for giving this some time. I think that backwards compatibility
should be a higher priority than being able to issue -3 warnings -- if
the warnings can't be generated, too bad, we'll just have to document
it. (I don't think that checking things at class scope is an easy task
in 2to3, though I may be underestimating it.)

So, concluding, insofar as the proposal is to revert 2.6 to the 2.5
semantics where it comes to __eq__ and __hash__, I'm okay with that if
there's no other way to maintain backwards compatibility, even though my
preference would be to still flag this when -3 is specified.  (There may
be similar backwards compatibility issues caused by stricted arg
checking in __new__ and __init__ -- the same remarks apply there.)
msg68866 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2008-06-28 00:56
The _new__/__init__ stuff showed up on my diff as well (obviously) -
that seemed to be doing a pretty good job of maintaining backwards
compatibility.

I'll dig into this further today and tomorrow - I'll probably start by
reverting the relevant code back to exactly what is in 2.5, and then see
how far I can get with adding the warning code back in while still
keeping jek's test case working correctly.
msg68937 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2008-06-29 08:34
Well, I think I figured out why the __hash__ changes were backported
from Py3k: without them, the existence of object.__hash__ makes the
Hashable ABC completely useless (every newstyle class meets it).

I see two options here. Option 1 is to revert the changes so __hash__
behave entirely as it did in 2.5 and also delete collections.Hashable
from 2.6 (which may break other backported Py3k items). Doesn't seem
like a particularly good idea.

Option 2 is what I have implemented locally (and will be uploading
soon): changing typeobject.c to allow Py_None to be stored in type
method slots, and have it convert correctly to None at the Python level.
Classes that don't want to inherit the default object.tp_hash
implementation can then write "__hash__ = None" or "(hashfunc)Py_None"
to block the inheritance.
msg68948 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2008-06-29 12:25
Attached patch allows classes to explicitly block inheritance of
object.__hash__ by setting the tp_hash slot to Py_None or the __hash__
attribute to None.

This is similar to the approach used in Py3k, but allows __hash__ to be
inherited by default, with classes explicitly disabling it as
appropriate. I'd also suggest forward porting this approach to Py3k as
well, and then possibly look at replacing a copied tp_hash slot with
Py_None when the inherited tp_hash is object.tp_hash, but tp_cmp or
tp_richcompare are different.
msg68949 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2008-06-29 13:30
Unassigning for the moment - I'll take a look at adding the Py3k warning
back in at some point, but the main thing I would like now is a sanity
check from Guido or Barry (or anyone else happy to dig into the guts of
typeobject.c) that the basic idea behind my fix is sound.

It would be nice to avoid the explicit check for Py_None in
PyObject_Compare by getting update_one_slot() to replace the Py_None it
encounters in __hash__ with the default slot_tp_hash implementation, but
my attempts at doing that have all led to segfaults.

(Also, glyph, I'd love to know if my patch helps to clear up Django's
issues with 2.6b1)
msg69096 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2008-07-02 14:42
Suggestion from GvR (one I like): instead of re-using Py_None, add a new
C function that is stored in the tp_hash slot as a sentinel instead of
the Py_None value used in the posted version of the patch. This will
avoid breaking code that just checks for NULL before calling the tp_hash
slot directly, while still allowing typeobject.c to detect that the
object isn't actually hashable despite the presence of a non-NULL value
in the tp_hash slot.

(I'll keep the None at the Python level though, since that matches the
Py3k behaviour and plays nicely with collections.Hashable)
msg69101 - (view) Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2008-07-02 15:15
I don't like the changes in listobject.c and dictobject.c: they seem to
imply that all unhashable types must be modified when upgrading to 2.6.
msg69129 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2008-07-02 21:43
Unhashable types that implement an "I'm not hashable" __hash__ method
will indeed require modification in 2.6 in order to avoid incorrectly
passing an "isinstance(obj, collections.Hashable)" check (note that
several of the mutable standard library types such as collections.deque
are incorrectly detected as hashable in the current SVN trunk).
msg69137 - (view) Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2008-07-02 22:32
Isn't that a serious compatibility issue? 2.5 code should work on 2.6
with minimal effort.


For deque objects, I don't get your point:

Python 2.6b1+ (trunk, Jul  1 2008, 22:35:48) [MSC v.1500 32 bit Intel)]
on win32
>>> from collections import deque
>>> hash(deque())
TypeError: deque objects are unhashable
msg69190 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2008-07-03 10:50
As far as deque goes, the following behaviour on the trunk is the
problem I am trying to fix:

Python 2.6b1+ (trunk:64655, Jul  2 2008, 22:48:24)
[GCC 4.1.3 20070929 (prerelease) (Ubuntu 4.1.2-16ubuntu2)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from collections import deque, Hashable
>>> isinstance(deque(), Hashable)
True

All of the container types that my patch touches were already unhashable
in 2.5 - my patch just ensures that they all correctly return false for
isinstance(obj, collections.Hashable) even after we make object.__hash__
inherited by default again. This is also the reason why simply reverting
the trunk hash implementation to the 2.5 behaviour (which is what I
first tried) is inadequate.

Since collections.Hashable is new in 2.6, I can live with it returning
the wrong answer for types which define __hash__ to always raise an
error (that's a known limitation of the ABC system, even in Py3k).
However, we should at least make sure it returns the correct answer for
all of the types in the standard library.
msg69324 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2008-07-06 07:52
Attached updated patch which implements Guido's suggestion from above
(inheritance of __hash__ is blocked at the C level by setting the slot
to PyObject_HashNotImplemented and at the Python level by setting
__hash__ = None).

All tests which don't depend on a -u resource flag still pass (currently
running a -uall set of tests as well).
msg69325 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2008-07-06 08:09
Two failures in the -uall run (test_codecmaps_hk, test_linuxaudiodev).
However, I see those test failures even after reverting my changes, so
they aren't related to this.
msg69691 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2008-07-15 15:49
Fixed for 2.6 in r64962 and the underlying PyObject_HashNotImplemented
mechanic forward ported to 3.0 in r64967.

Still need to update the C API documentation and the library reference,
as well as implement the Py3k warning in 2.6, but I won't get to those
for this beta - dropping priority to deferred blocker.
msg70748 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2008-08-05 16:06
How's this going?
msg70777 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2008-08-06 09:44
The documentation and Py3k warning will be ready for the third beta next
week (the remaining part is a lot easier than the initial fix).
msg71021 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2008-08-11 15:48
Py3k warnings for this have been checked in on the trunk as r65642.

There's an unfortunate problem with having to define a fixed stack level
for the warnings - for classes with a metaclass implemented in Python,
the warning message will point to the __new__ method of the metaclass
instead of to the first line of the offending class definition. I don't
see a way around this problem, but it definitely caused me grief in
tracking down some of the misbehaving classes in the standard library
that use ABCMeta as their metaclass in 2.6 (I actually cheated and
recompiled with the stack level on the warnings set to 2 instead of 1).

The update also eliminates the spurious and not-so-spurious Py3k
warnings that this update triggered in the test suite under the -3 flag.
Most were just test suite classes that happen to become unhashable in
Py3k, but a few were classes in the standard lib itself that defined
__eq__ or __cmp__ methods that were incompatible with the default
__hash__ implementation (collections.Mapping, collections.Set,
unittest.TestSuite, xml.dom.minidom.NamedNodeMap, numbers.Number,
UserList.UserList). Instances of all of the affected classes are now
explicitly unhashable in 2.x as well as 3.x (note that any code which
relied on any of them being hashable was already broken due to the fact
that these classes didn't provide the correct hash and equality invariants).

The numbers.Number change deserves special mention - the actual warning
occurs further down in the number stack (when numbers.Complex defines a
new __eq__ method), but it seemed appropriate to block the inheritance
of object.__hash__ in Number since an id() based hash isn't appropriate
for any numeric type. This particular aspect of the change should
probably be forward ported to Py3k.

In implementing the warnings, I'm also pretty happy with the current
Py3k approach that overriding __cmp__ or __eq__ means that __hash__
isn't inherited *at all*, rather than restricting the block solely to
object.__hash__. The current behaviour is simple to explain, and more
importantly, I think it is more correct - any time you change the
definition of equality for the class, you really do need to think
carefully about what it means for mutability and hashability.

P.S. I should never have said this part of the change was going to be
easy, because that was just begging old Murphy to come slap me upside
the head...
msg71022 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2008-08-11 15:53
I still intend to do the necessary documentation updates for this, but
they're going to be delayed a bit since getting the warnings right took
much longer than I expected.
msg71223 - (view) Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2008-08-16 16:23
Let's lower the priority a little then.
msg71325 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2008-08-18 13:21
numbers.Number change forward ported to Py3k in r65808

Docs updated for 2.6 and 3.0 in r65810 and r65811 respectively.

Which means I can finally close this one :)
msg72114 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2008-08-28 21:48
Reopening - still need to fix the Python level docs for hash() and
__hash__().
msg72207 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2008-08-31 13:22
__hash__() documentation fixed for 2.6 in r66085 and for 3.0 in r66086.
History
Date User Action Args
2008-08-31 13:22:05ncoghlansetstatus: open -> closed
resolution: fixed
messages: + msg72207
2008-08-28 21:48:38ncoghlansetstatus: closed -> open
resolution: fixed -> (no value)
messages: + msg72114
2008-08-18 13:21:15ncoghlansetstatus: open -> closed
resolution: fixed
messages: + msg71325
2008-08-16 16:23:58benjamin.petersonsetpriority: release blocker -> critical
nosy: + benjamin.peterson
messages: + msg71223
2008-08-11 15:53:44ncoghlansetmessages: + msg71022
2008-08-11 15:48:31ncoghlansetmessages: + msg71021
2008-08-06 09:44:26ncoghlansetmessages: + msg70777
2008-08-05 16:06:41gvanrossumsetmessages: + msg70748
2008-07-18 03:45:40barrysetpriority: deferred blocker -> release blocker
2008-07-15 15:49:22ncoghlansetpriority: release blocker -> deferred blocker
messages: + msg69691
2008-07-06 08:09:53ncoghlansetmessages: + msg69325
2008-07-06 07:52:36ncoghlansetfiles: + issue2235_fix_hash_inheritance_with_function_ptr.diff
messages: + msg69324
2008-07-03 10:50:33ncoghlansetassignee: ncoghlan
messages: + msg69190
2008-07-02 22:32:05amaury.forgeotdarcsetmessages: + msg69137
2008-07-02 21:43:23ncoghlansetmessages: + msg69129
2008-07-02 15:15:24amaury.forgeotdarcsetmessages: + msg69101
2008-07-02 14:42:18ncoghlansetmessages: + msg69096
2008-06-29 13:30:52ncoghlansetassignee: ncoghlan -> (no value)
messages: + msg68949
2008-06-29 12:25:39ncoghlansetfiles: + issue2235_fix_hash_inheritance.diff
messages: + msg68948
2008-06-29 08:34:33ncoghlansetmessages: + msg68937
2008-06-28 01:02:45ncoghlansetassignee: ncoghlan
2008-06-28 00:56:16ncoghlansetmessages: + msg68866
2008-06-27 18:05:09gvanrossumsetmessages: + msg68839
2008-06-27 13:23:19ncoghlansetnosy: + ncoghlan
messages: + msg68828
2008-06-26 17:06:12gvanrossumsetassignee: gvanrossum -> (no value)
messages: + msg68795
2008-06-26 16:58:44schmirsetnosy: + schmir
2008-06-26 16:50:23glyphsetpriority: critical -> release blocker
nosy: + glyph
messages: + msg68792
2008-05-08 03:43:12barrysetpriority: release blocker -> critical
nosy: + barry
messages: + msg66399
2008-04-15 05:56:19nnorwitzsetassignee: amaury.forgeotdarc -> gvanrossum
2008-04-10 23:44:06amaury.forgeotdarcsetfiles: + inherit_hash3.patch
messages: + msg65329
2008-04-10 16:51:50gvanrossumsetmessages: + msg65299
2008-04-10 16:47:39gvanrossumsetpriority: high -> release blocker
messages: + msg65298
2008-04-10 16:46:57gvanrossumsetassignee: amaury.forgeotdarc
messages: + msg65297
2008-03-16 21:00:11eikeonsetnosy: + eikeon
2008-03-05 00:48:31amaury.forgeotdarcsetfiles: + inherit_hash2.patch
messages: + msg63272
2008-03-05 00:12:29gvanrossumsetmessages: + msg63271
2008-03-05 00:05:33amaury.forgeotdarcsetmessages: + msg63270
2008-03-05 00:00:51amaury.forgeotdarcsetfiles: + inherit_hash.patch
keywords: + patch
messages: + msg63269
2008-03-04 22:20:13rhettingersetpriority: high
2008-03-04 22:17:35gvanrossumsetmessages: + msg63265
2008-03-04 22:15:27rhettingersetnosy: + rhettinger
messages: + msg63264
2008-03-04 21:56:09amaury.forgeotdarcsetnosy: + amaury.forgeotdarc, gvanrossum
type: crash -> behavior
messages: + msg63263
2008-03-04 19:49:23jekcreate