Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

__eq__ / __hash__ check doesn't take inheritance into account #46488

Closed
jek mannequin opened this issue Mar 4, 2008 · 37 comments
Closed

__eq__ / __hash__ check doesn't take inheritance into account #46488

jek mannequin opened this issue Mar 4, 2008 · 37 comments
Assignees
Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs) type-bug An unexpected behavior, bug, or error

Comments

@jek
Copy link
Mannequin

jek mannequin commented Mar 4, 2008

BPO 2235
Nosy @gvanrossum, @warsaw, @rhettinger, @amauryfa, @ncoghlan, @benjaminp, @glyph
Files
  • inherit_hash.patch: restore previous behavior of hash inheritance
  • inherit_hash2.patch
  • inherit_hash3.patch
  • issue2235_fix_hash_inheritance.diff: Patch to allow classes to explicitly block inheritance of tp_hash
  • issue2235_fix_hash_inheritance_with_function_ptr.diff: Block inheritance with PyObject_HashNotImplemented
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/ncoghlan'
    closed_at = <Date 2008-08-31.13:22:05.534>
    created_at = <Date 2008-03-04.19:49:23.717>
    labels = ['interpreter-core', 'type-bug']
    title = "__eq__ / __hash__ check doesn't take inheritance into account"
    updated_at = <Date 2008-08-31.13:22:05.533>
    user = 'https://bugs.python.org/jek'

    bugs.python.org fields:

    activity = <Date 2008-08-31.13:22:05.533>
    actor = 'ncoghlan'
    assignee = 'ncoghlan'
    closed = True
    closed_date = <Date 2008-08-31.13:22:05.534>
    closer = 'ncoghlan'
    components = ['Interpreter Core']
    creation = <Date 2008-03-04.19:49:23.717>
    creator = 'jek'
    dependencies = []
    files = ['9607', '9608', '10002', '10773', '10827']
    hgrepos = []
    issue_num = 2235
    keywords = ['patch']
    message_count = 37.0
    messages = ['63258', '63263', '63264', '63265', '63269', '63270', '63271', '63272', '65297', '65298', '65299', '65329', '66399', '68792', '68795', '68828', '68839', '68866', '68937', '68948', '68949', '69096', '69101', '69129', '69137', '69190', '69324', '69325', '69691', '70748', '70777', '71021', '71022', '71223', '71325', '72114', '72207']
    nosy_count = 10.0
    nosy_names = ['gvanrossum', 'barry', 'rhettinger', 'amaury.forgeotdarc', 'ncoghlan', 'schmir', 'jek', 'benjamin.peterson', 'eikeon', 'glyph']
    pr_nums = []
    priority = 'critical'
    resolution = 'fixed'
    stage = None
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue2235'
    versions = ['Python 2.6']

    @jek
    Copy link
    Mannequin Author

    jek mannequin commented Mar 4, 2008

    In 2.6a, seems like the __hash__ implementation and __eq__ must be
    defined together, in the same class. See also bpo-1549. Ensuring that a
    __hash__ implementation isn't being pulled from a builtin type is
    probably a sufficient check...?

    >>> class Base(object):
    ...     def __init__(self, name):
    ...         self.name = name
    ...     def __eq__(self, other):
    ...         return self.name == other.name
    ...     def __hash__(self):
    ...         return hash(self.name)
    ... 
    >>> class Extended(Base):
    ...     def __eq__(self, other):
    ...         print "trace: __eq__ called"
    ...         return Base.__eq__(self, other)
    ... 
    >>> hash(Base('b1'))
    603887253
    >>> hash(Extended('e1'))
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: unhashable type: 'Extended'

    @jek jek mannequin added type-crash A hard crash of the interpreter, possibly with a core dump interpreter-core (Objects, Python, Grammar, and Parser dirs) labels Mar 4, 2008
    @amauryfa
    Copy link
    Member

    amauryfa commented Mar 4, 2008

    This issue is similar to bpo-1549. Note that only new-style classes are
    concerned.

    I think the change was intentional. Guido, do you confirm?

    However there should be some documentation about it, a NEWS entry or an
    item in the "porting to 2.6" section.

    The initial modification appeared in the py3k branch (r51454):
    """
    Change the way __hash__ is inherited; when __eq__ or __cmp__ is
    overridden but __hash__ is not, set __hash__ explicitly to None (and
    tp_hash to NULL).
    """

    @amauryfa amauryfa added type-bug An unexpected behavior, bug, or error and removed type-crash A hard crash of the interpreter, possibly with a core dump labels Mar 4, 2008
    @rhettinger
    Copy link
    Contributor

    Wouldn't you expect this sort of thing to break code? Does it meet the
    criteria for backporting to 2.6?

    @gvanrossum
    Copy link
    Member

    The py3k feature was intentional.

    I'm not sure who did the backport (could've been me, long ago). I think
    the backport should be replaced with a warning that is only issued when
    -3 is given.

    @amauryfa
    Copy link
    Member

    amauryfa commented Mar 5, 2008

    The attached patch reverts r59576 and the part of r59106 about the
    tp_hash slot.

    It also adds the py3k warning::
    type defines __eq__ but not __hash__, and will not be hashable in 3.x
    printed when calling hash() on such an object.

    @amauryfa
    Copy link
    Member

    amauryfa commented Mar 5, 2008

    I noticed that my patch uses Py_TYPE(self)->tp_hash, whereas normal
    processing of slot_tp_hash() uses lookup_method(self,"__hash__",hash_str).

    I am almost sure that both expressions return the same value.
    Is this correct? Py_TYPE(self)->tp_hash seems much faster.

    @gvanrossum
    Copy link
    Member

    I noticed that my patch uses Py_TYPE(self)->tp_hash, whereas normal
    processing of slot_tp_hash() uses lookup_method(self,"__hash__",hash_str).

    I am almost sure that both expressions return the same value.
    Is this correct? Py_TYPE(self)->tp_hash seems much faster.

    HERE BE DRAGONS

    There are cases where Py_TYPE(self)->tp_hash is the function currently
    executing (some wrapper(s) in typeobject.c). OTOH, lookup_method(...)
    finds the Python code in the class __dict__s along the MRO.

    @amauryfa
    Copy link
    Member

    amauryfa commented Mar 5, 2008

    Ah, I remember that it took me some time to understand the boundaries
    between a slot and the corresponding special method.

    Here is another version of the patch, which does not test tp_hash while
    we are currently running the tp_hash function...
    I also added the warning for old-style classes.

    @gvanrossum
    Copy link
    Member

    Hi Amaury, thanks for your work on this. I applied your patch and looked
    at the code but there seem to be some issues left still.

    (1) I don't think you need to add a warning to classic classes that
    define __eq__ without __hash__ -- these aren't hashable and never were.
    The problem is purely with new classes AFAICT.

    (2) I can't seen to trigger the warning for new classes (and yes, I
    specified -3 on the command line :-):

    >>> class C(object):
    ...   def __eq__(self, other): return False
    ...
    >>> hash(C())
    -134976284

    (3) test_Hashable in test_collections.py fails. This is because the
    Hashable class believes that list, set, and dict are hashable.
    Something odd is going on:

    >>> list.__hash__([])
    -135100116

    but

    >>> hash([])
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: unhashable type: 'list'

    Note that in 2.5, both raise TypeError.

    @gvanrossum
    Copy link
    Member

    Let's try to have this resolved before the next release.

    Also let's try not to accidentally merge the 2.6 changes related to this
    issue into 3.0. :-)

    @gvanrossum
    Copy link
    Member

    Note: some more tests also fail:

    test_descr, test_quopri, test_set

    @amauryfa
    Copy link
    Member

    Here is a new patch, all tests pass.

    • no warning for classic classes
    • added tests for -3 warnings

    I had to add two TPSLOT entries to the slodefs list; can you please
    check that this is correct (I wanted to reset tp_hash to NULL when
    __eq__ or __cmp__ are defined).

    @nnorwitz nnorwitz mannequin assigned gvanrossum and unassigned amauryfa Apr 15, 2008
    @warsaw
    Copy link
    Member

    warsaw commented May 8, 2008

    Guido, can you comment on Amaury's latest patch? I'm going to bump this
    back to critical so as not to hold up 2.6 alpha, but it should be marked
    as a release blocker for the first beta.

    @glyph
    Copy link
    Mannequin

    glyph mannequin commented Jun 26, 2008

    As barry said, this should have been a release blocker for the first
    beta... ;-)

    @glyph glyph mannequin added the release-blocker label Jun 26, 2008
    @gvanrossum
    Copy link
    Member

    I'm going to have to ask someone else to look at this code. I am too
    busy with too many things to be able to take on a detailed code review.

    @gvanrossum gvanrossum removed their assignment Jun 26, 2008
    @ncoghlan
    Copy link
    Contributor

    Amaury's patch doesn't currently remove the new-in-2.6 code that turns
    "tp_hash = NULL" at the C level into "__hash__ = None" at the Python
    level. I suspect that will prove to be a problem (and may be the cause
    of Django's woes, if I remember Glyph's python-dev posts correctly). I
    also don't believe the new TPSLOT entries should be necessary - we
    should be able to just revert all of this tp_hash related code to match
    the 2.5 implementation, and then see if we can find a way to emit the
    Py3k warning when __cmp__ or __eq__ are overridden without overriding
    __hash__ without breaking any existing code (if we can't, we may have to
    figure out a way to get 2to3 to check for it).

    P.S. you'd think I would have learnt by now not to try to make sense of
    typeobject.c when I'm tired ;)

    @gvanrossum
    Copy link
    Member

    Thanks for giving this some time. I think that backwards compatibility
    should be a higher priority than being able to issue -3 warnings -- if
    the warnings can't be generated, too bad, we'll just have to document
    it. (I don't think that checking things at class scope is an easy task
    in 2to3, though I may be underestimating it.)

    So, concluding, insofar as the proposal is to revert 2.6 to the 2.5
    semantics where it comes to __eq__ and __hash__, I'm okay with that if
    there's no other way to maintain backwards compatibility, even though my
    preference would be to still flag this when -3 is specified. (There may
    be similar backwards compatibility issues caused by stricted arg
    checking in __new__ and __init__ -- the same remarks apply there.)

    @ncoghlan
    Copy link
    Contributor

    The _new__/init stuff showed up on my diff as well (obviously) -
    that seemed to be doing a pretty good job of maintaining backwards
    compatibility.

    I'll dig into this further today and tomorrow - I'll probably start by
    reverting the relevant code back to exactly what is in 2.5, and then see
    how far I can get with adding the warning code back in while still
    keeping jek's test case working correctly.

    @ncoghlan ncoghlan self-assigned this Jun 28, 2008
    @ncoghlan
    Copy link
    Contributor

    Attached patch allows classes to explicitly block inheritance of
    object.__hash__ by setting the tp_hash slot to Py_None or the __hash__
    attribute to None.

    This is similar to the approach used in Py3k, but allows __hash__ to be
    inherited by default, with classes explicitly disabling it as
    appropriate. I'd also suggest forward porting this approach to Py3k as
    well, and then possibly look at replacing a copied tp_hash slot with
    Py_None when the inherited tp_hash is object.tp_hash, but tp_cmp or
    tp_richcompare are different.

    @ncoghlan
    Copy link
    Contributor

    Unassigning for the moment - I'll take a look at adding the Py3k warning
    back in at some point, but the main thing I would like now is a sanity
    check from Guido or Barry (or anyone else happy to dig into the guts of
    typeobject.c) that the basic idea behind my fix is sound.

    It would be nice to avoid the explicit check for Py_None in
    PyObject_Compare by getting update_one_slot() to replace the Py_None it
    encounters in __hash__ with the default slot_tp_hash implementation, but
    my attempts at doing that have all led to segfaults.

    (Also, glyph, I'd love to know if my patch helps to clear up Django's
    issues with 2.6b1)

    @ncoghlan ncoghlan removed their assignment Jun 29, 2008
    @ncoghlan
    Copy link
    Contributor

    ncoghlan commented Jul 2, 2008

    Suggestion from GvR (one I like): instead of re-using Py_None, add a new
    C function that is stored in the tp_hash slot as a sentinel instead of
    the Py_None value used in the posted version of the patch. This will
    avoid breaking code that just checks for NULL before calling the tp_hash
    slot directly, while still allowing typeobject.c to detect that the
    object isn't actually hashable despite the presence of a non-NULL value
    in the tp_hash slot.

    (I'll keep the None at the Python level though, since that matches the
    Py3k behaviour and plays nicely with collections.Hashable)

    @amauryfa
    Copy link
    Member

    amauryfa commented Jul 2, 2008

    I don't like the changes in listobject.c and dictobject.c: they seem to
    imply that all unhashable types must be modified when upgrading to 2.6.

    @ncoghlan
    Copy link
    Contributor

    ncoghlan commented Jul 2, 2008

    Unhashable types that implement an "I'm not hashable" __hash__ method
    will indeed require modification in 2.6 in order to avoid incorrectly
    passing an "isinstance(obj, collections.Hashable)" check (note that
    several of the mutable standard library types such as collections.deque
    are incorrectly detected as hashable in the current SVN trunk).

    @amauryfa
    Copy link
    Member

    amauryfa commented Jul 2, 2008

    Isn't that a serious compatibility issue? 2.5 code should work on 2.6
    with minimal effort.

    For deque objects, I don't get your point:

    Python 2.6b1+ (trunk, Jul  1 2008, 22:35:48) [MSC v.1500 32 bit Intel)]
    on win32
    >>> from collections import deque
    >>> hash(deque())
    TypeError: deque objects are unhashable

    @ncoghlan
    Copy link
    Contributor

    ncoghlan commented Jul 3, 2008

    As far as deque goes, the following behaviour on the trunk is the
    problem I am trying to fix:

    Python 2.6b1+ (trunk:64655, Jul  2 2008, 22:48:24)
    [GCC 4.1.3 20070929 (prerelease) (Ubuntu 4.1.2-16ubuntu2)] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> from collections import deque, Hashable
    >>> isinstance(deque(), Hashable)
    True

    All of the container types that my patch touches were already unhashable
    in 2.5 - my patch just ensures that they all correctly return false for
    isinstance(obj, collections.Hashable) even after we make object.__hash__
    inherited by default again. This is also the reason why simply reverting
    the trunk hash implementation to the 2.5 behaviour (which is what I
    first tried) is inadequate.

    Since collections.Hashable is new in 2.6, I can live with it returning
    the wrong answer for types which define __hash__ to always raise an
    error (that's a known limitation of the ABC system, even in Py3k).
    However, we should at least make sure it returns the correct answer for
    all of the types in the standard library.

    @ncoghlan ncoghlan self-assigned this Jul 3, 2008
    @ncoghlan
    Copy link
    Contributor

    ncoghlan commented Jul 6, 2008

    Attached updated patch which implements Guido's suggestion from above
    (inheritance of __hash__ is blocked at the C level by setting the slot
    to PyObject_HashNotImplemented and at the Python level by setting
    __hash__ = None).

    All tests which don't depend on a -u resource flag still pass (currently
    running a -uall set of tests as well).

    @ncoghlan
    Copy link
    Contributor

    ncoghlan commented Jul 6, 2008

    Two failures in the -uall run (test_codecmaps_hk, test_linuxaudiodev).
    However, I see those test failures even after reverting my changes, so
    they aren't related to this.

    @ncoghlan
    Copy link
    Contributor

    Fixed for 2.6 in r64962 and the underlying PyObject_HashNotImplemented
    mechanic forward ported to 3.0 in r64967.

    Still need to update the C API documentation and the library reference,
    as well as implement the Py3k warning in 2.6, but I won't get to those
    for this beta - dropping priority to deferred blocker.

    @gvanrossum
    Copy link
    Member

    How's this going?

    @ncoghlan
    Copy link
    Contributor

    ncoghlan commented Aug 6, 2008

    The documentation and Py3k warning will be ready for the third beta next
    week (the remaining part is a lot easier than the initial fix).

    @ncoghlan
    Copy link
    Contributor

    Py3k warnings for this have been checked in on the trunk as r65642.

    There's an unfortunate problem with having to define a fixed stack level
    for the warnings - for classes with a metaclass implemented in Python,
    the warning message will point to the __new__ method of the metaclass
    instead of to the first line of the offending class definition. I don't
    see a way around this problem, but it definitely caused me grief in
    tracking down some of the misbehaving classes in the standard library
    that use ABCMeta as their metaclass in 2.6 (I actually cheated and
    recompiled with the stack level on the warnings set to 2 instead of 1).

    The update also eliminates the spurious and not-so-spurious Py3k
    warnings that this update triggered in the test suite under the -3 flag.
    Most were just test suite classes that happen to become unhashable in
    Py3k, but a few were classes in the standard lib itself that defined
    __eq__ or __cmp__ methods that were incompatible with the default
    __hash__ implementation (collections.Mapping, collections.Set,
    unittest.TestSuite, xml.dom.minidom.NamedNodeMap, numbers.Number,
    UserList.UserList). Instances of all of the affected classes are now
    explicitly unhashable in 2.x as well as 3.x (note that any code which
    relied on any of them being hashable was already broken due to the fact
    that these classes didn't provide the correct hash and equality invariants).

    The numbers.Number change deserves special mention - the actual warning
    occurs further down in the number stack (when numbers.Complex defines a
    new __eq__ method), but it seemed appropriate to block the inheritance
    of object.__hash__ in Number since an id() based hash isn't appropriate
    for any numeric type. This particular aspect of the change should
    probably be forward ported to Py3k.

    In implementing the warnings, I'm also pretty happy with the current
    Py3k approach that overriding __cmp__ or __eq__ means that __hash__
    isn't inherited *at all*, rather than restricting the block solely to
    object.__hash__. The current behaviour is simple to explain, and more
    importantly, I think it is more correct - any time you change the
    definition of equality for the class, you really do need to think
    carefully about what it means for mutability and hashability.

    P.S. I should never have said this part of the change was going to be
    easy, because that was just begging old Murphy to come slap me upside
    the head...

    @ncoghlan
    Copy link
    Contributor

    I still intend to do the necessary documentation updates for this, but
    they're going to be delayed a bit since getting the warnings right took
    much longer than I expected.

    @benjaminp
    Copy link
    Contributor

    Let's lower the priority a little then.

    @ncoghlan
    Copy link
    Contributor

    numbers.Number change forward ported to Py3k in r65808

    Docs updated for 2.6 and 3.0 in r65810 and r65811 respectively.

    Which means I can finally close this one :)

    @ncoghlan
    Copy link
    Contributor

    Reopening - still need to fix the Python level docs for hash() and
    __hash__().

    @ncoghlan ncoghlan reopened this Aug 28, 2008
    @ncoghlan
    Copy link
    Contributor

    __hash__() documentation fixed for 2.6 in r66085 and for 3.0 in r66086.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    arizvisa added a commit to arizvisa/syringe that referenced this issue Feb 7, 2023
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    interpreter-core (Objects, Python, Grammar, and Parser dirs) type-bug An unexpected behavior, bug, or error
    Projects
    None yet
    Development

    No branches or pull requests

    6 participants