Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

dataclasses: make it easier to use user-supplied special methods #76694

Closed
ericvsmith opened this issue Jan 7, 2018 · 25 comments
Closed

dataclasses: make it easier to use user-supplied special methods #76694

ericvsmith opened this issue Jan 7, 2018 · 25 comments
Assignees
Labels
3.7 (EOL) end of life type-bug An unexpected behavior, bug, or error

Comments

@ericvsmith
Copy link
Member

BPO 32513
Nosy @gvanrossum, @ncoghlan, @larryhastings, @ericvsmith, @ilevkivskyi
PRs
  • Misc questions, suggestions, and error corrections for dataclasses #4726
  • bpo-32513: Make it easier to override dunders in dataclasses. #5366
  • bpo-32153: Add unit test for create_autospec with partial function returned in getattr #10398
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/ericvsmith'
    closed_at = <Date 2018-01-28.00:09:26.547>
    created_at = <Date 2018-01-07.17:06:05.599>
    labels = ['type-bug', '3.7']
    title = 'dataclasses: make it easier to use user-supplied special methods'
    updated_at = <Date 2018-11-09.02:30:23.612>
    user = 'https://github.com/ericvsmith'

    bugs.python.org fields:

    activity = <Date 2018-11-09.02:30:23.612>
    actor = 'xtreak'
    assignee = 'eric.smith'
    closed = True
    closed_date = <Date 2018-01-28.00:09:26.547>
    closer = 'eric.smith'
    components = []
    creation = <Date 2018-01-07.17:06:05.599>
    creator = 'eric.smith'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 32513
    keywords = ['patch']
    message_count = 25.0
    messages = ['309628', '309976', '309979', '310392', '310394', '310399', '310402', '310404', '310417', '310462', '310463', '310464', '310556', '310569', '310653', '310668', '310760', '310775', '310776', '310830', '310836', '310870', '310871', '310901', '311149']
    nosy_count = 5.0
    nosy_names = ['gvanrossum', 'ncoghlan', 'larry', 'eric.smith', 'levkivskyi']
    pr_nums = ['4726', '5366', '10398']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue32513'
    versions = ['Python 3.7']

    @ericvsmith
    Copy link
    Member Author

    Modify dataclasses to make it easier to specify special methods.

    For example: currently, if you want to override __repr__, you need to specify @DataClass(repr=False), and also provide your own __repr__. Also, it's current an error to not specify repr=False and to also provide your own __repr__. @DataClass should be able to determine that if you provide __repr__, you don't want it to provide it's own __repr__.

    The methods that @DataClass will provide for you are:
    __init__
    __repr__
    __eq__
    __ne__
    __lt__
    __le__
    __gt__
    __ge__
    _hash__
    and if using frozen=True, also:
    __setattr__
    __delattr__

    Per the discussion that started at https://mail.python.org/pipermail/python-dev/2017-December/151487.html, the change will be to override these methods only if:

    1. the method would have been generated based on @DataClass params, and
    2. the method is not present in the class's __dict__.

    The first item is to allow the user to continue to suppress method generation, and the second item is so that base-class methods are not considered in the decision.

    I'm still thinking through how __eq__ and __ne__ should be handled (as well as the ordering comparisons, too). I'll raise an issue on python-dev and point it here.

    @ericvsmith ericvsmith added the 3.7 (EOL) end of life label Jan 7, 2018
    @ericvsmith ericvsmith self-assigned this Jan 7, 2018
    @ericvsmith ericvsmith added the type-bug An unexpected behavior, bug, or error label Jan 7, 2018
    @ericvsmith
    Copy link
    Member Author

    Another case to consider is that if a class defines __eq__, Python will automatically set hash=False before the decorator gets involved. NOte to self: be sure to add a test case for this.

    @ericvsmith
    Copy link
    Member Author

    See bpo-32546 for the hash=None problem.

    @ericvsmith
    Copy link
    Member Author

    Here is my proposal for making it easier for the user to supply dunder methods that dataclasses would otherwise automatically add to the class.

    For all of these cases, when I talk about init=, repr=, eq=, order=, hash=, or frozen=, I'm referring to the parameters to the dataclass decorator.

    When checking if a dunder method already exists, I mean check for an entry in the class's __dict__. I never check to see if a member is defined in a base class.

    Let's get the easy ones out of the way first.

    __init__
    If __init__ exists or init=False, don't generate __init__.

    __repr__
    If __repr__ exists or repr=False, don't generate __repr__.

    __setattr__
    __delattr__
    If frozen=True and either of these methods exist, raise a TypeError. These methods are needed for the "frozen-ness" of the class to be implemented, and if they're already set then that behavior can't be enforced.

    __eq__
    If __eq__ exists or eq=False, don't generate __eq__.

    And now the harder ones:

    __ne__
    I propose to never generate a __ne__ method. Python will call __eq__ and negate the result for you. If you feel you really need a non-matching __ne__, then you can write it yourself without interference from dataclasses.

    __lt__
    __le__
    __gt__
    __ge__
    I propose to treat each of these individually, but for each of them using the value of the order= parameter. So:
    If __lt__ exists or order=False, don't generate __lt__.
    If __le__ exists or order=False, don't generate __le__.
    If __gt__ exists or order=False, don't generate __gt__.
    If __ge__ exists or order=False, don't generate __ge__.
    If for some crazy reason you want to define some of these but not others, then set order=False and write your desired methods.

    __hash__
    Whether dataclasses might generate __hash__ depends on the values of hash=, eq=, and frozen=. Note that the default value of hash= is None. See below for an explanation.

    If hash=False, never generate __hash__. If hash=True, generate __hash__ unless it already exists.

    If hash=None (the default), then use this table to decide whether and how to generate __hash__:
    eq=? frozen=? __hash__
    False False do not generate __hash__
    False True do not generate __hash__
    True False set __hash__ to None unless it already exists
    True True generate __hash__ unless it already exists
    and is None

    Note that it's almost always a bad idea to specify hash=True or hash=False. The action based on the above table (where hash=None, the default), is usually the correct behavior.

    One special case to recognize is if the class defines a __eq__. In this case, Python will assign __hash__=None before the dataclass decorator is called. The decorator cannot distinguish between these two cases (except possibly by using the order of __dict__ keys, but that seems overly fragile):

    @dataclass
    class A:
        def __eq__(self, other): pass
    
    @dataclass
    class B:
        def __eq__(self, other): pass
        __hash__ = None

    This is the source of the last line in the above table: for a dataclass where eq=True, frozen=True, and hash=None, if __hash__ is None it will still be overwritten. The assumption is that this is what the user wants, but it's a tricky corner case. It also occurs if setting hash=True and defining __eq__. Again, it's not expected to come up in normal usage.

    @larryhastings
    Copy link
    Contributor

    Do all the parameters to the decorator take a default value of "None", so that you can differentiate between explicit True, explicit False, and did-not-specify?

    Is this the complete list of these dunder-method-generation-control parameters? init repr eq order hash

    @ericvsmith
    Copy link
    Member Author

    Only hash has the tri-state True/False/None behavior, defaulting to None. It's this way because None is the "do what's rational, based on eq and frozen" behavior. None of the other parameters work this way.

    There's a long issue in the attrs repo that describes how they came to this conclusion: python-attrs/attrs#136. I think it makes sense.

    None of the dataclass parameters have a sentinel that would let me detect if the user provided a value or not. In the case of hash, I can't detect if they explicitly passed hash=None or just didn't provide a value. I've given this some thought, and couldn't come up with a use case for knowing this. For example, if init had a sentinel value of MISSING, what would the code ever do except start with:
    if init is sentinel:
    init = True
    ?

    In addition to your list of dunder-control-parameters, there's frozen. It determines if instances are immutable (to the extent that they can be made immutable).

    @ericvsmith
    Copy link
    Member Author

    Oops, that should have been:

    if init is MISSING:
        init = True

    @larryhastings
    Copy link
    Contributor

    So if I understand correctly: the default value of the "repr" parameter is True.

    Decision matrix: does dataclass insert a __repr__ into the class?

      +-- row: argument passed in to decorator's repr parameter
      |
      |
      v    | yes     | no      |  <--- columns: does the class have a __repr__?
    -------+---------+---------+
     True  | ???     | yes     |
    -------+---------+---------+
     False | no      | no      |
    -------+---------+---------+

    If the user specifies "repr=True", and also specifies their own __repr__ in the class, does dataclasses stomp on their __repr__ with its own? Does it throw an exception?

    I still prefer the tri-state value here. In this version, the default value of the "repr" parameter would be None, and the decision matrix for inserting a __repr__ looks like this:

      +-- row: argument passed in to decorator's repr parameter
      |
      |
      v    | yes     | no      |  <--- columns: does the class have a __repr__?
    -------+---------+---------+
     True  | raise   | yes     |
    -------+---------+---------+
     None  | no      | yes     |
    -------+---------+---------+
     False | no      | no      |
    -------+---------+---------+

    But we've talked about using the tri-state default for all of these parameters before, and clearly you weren't swayed then, so I guess I've said my peace and I'll stop suggesting it.

    @ericvsmith
    Copy link
    Member Author

    The discussion on python-dev was that your ??? box would be "no": if the user supplied __repr__, they obviously meant for dataclass() to not provide one.

    I can't see why the user would say repr=True ("I want dataclass() to add __repr__"), but then provide a __repr__ and get an exception. That looks like the only functionality added by your repr=True row over the proposal. Where your proposal uses repr=None for the "no", "yes" row, mine uses repr=True.

    It's not like there's action at a distance here: the user is writing the class. Especially since base classes are ignored.

    I'm ignoring make_dataclasses(), where the user is dynamically creating a class and maybe a __repr__ snuck in. But I don't care so much about that case.

    I do think your ascii tables are a good way of explaining this. Thanks! (Now I need a 3D version for eq, frozen, hash!)

    @ncoghlan
    Copy link
    Contributor

    For the ordering operators, my only question would be whether or not I can rely on them to act like functools.total_ordering: if I supply __eq__ and one of the ordering operators (e.g. __lt__), will dataclasses make sure the other three ordering operators are consistent with those base methods? Or will it bypass them and act directly on the underlying fields?

    My suggestion would be to say that if any of __lt__, __le__, __gt__ or __ge__ are defined, then data classes will implicitly generate the other methods based on functools.total_ordering semantics, and will only reference the underlying fields directly if *none* of them are defined. Otherwise I can see folks defining a single method like "__lt__", and being surprised when they end up with inconsistent comparison behaviour.

    @ericvsmith
    Copy link
    Member Author

    Rather than re-implementing (and maintaining) functools.total_ordering semantics, I'd rather advise them to specify order=False and just use functools.total_ordering. It's an odd use case for dataclasses, anyway.

    @ncoghlan
    Copy link
    Contributor

    I'd be fine with that recommendation (since @dataclass(order=False) and @total_ordering will compose without any problems), but in that case I'd suggest having "order=True" + any of the ordering methods result in an exception (as you've proposed for frozen=True and the methods that it needs to override).

    @gvanrossum
    Copy link
    Member

    Are we clear on the proposal now? It would be good to have this implemented in beta 1. Eric's long message looks good to me (though I have to admit the logic around __hash__ confuses me, but I've spent two long days at PyCascades, and I trust Eric's judgment).

    @ericvsmith
    Copy link
    Member Author

    I'm working on a patch before beta 1. The only issue is whether to raise in the case that order=True and the user specifies one of the 4 ordering dunders. I think raising is okay, especially since order=False is the default: you'd have to go out of your way to cause the exception. I'll mention functools.total_ordering in the exception message.

    I do think that the discussion in my long message about __hash__ is correct, although I'm not done thinking it through. I think it's already pretty well tested, but if not I'll make sure there are enough test cases for objects that I think should or shouldn't be hashable that I'll have all of the kinks worked about before beta 1.

    @gvanrossum
    Copy link
    Member

    Raising for order=True if one of the ordering dunders exists sounds fine.

    I am confused by the corner case for hash. Your table:

    """
    eq=? frozen=? __hash__
    False False do not generate __hash__
    False True do not generate __hash__
    True False set __hash__ to None unless it already exists
    True True generate __hash__ unless it already exists
    and is None
    """

    Then you write at the end of that message:

    """
    One special case to recognize is if the class defines a __eq__. In this case, Python will assign __hash__=None before the dataclass decorator is called. The decorator cannot distinguish between these two cases (except possibly by using the order of __dict__ keys, but that seems overly fragile):

    @dataclass
    class A:
        def __eq__(self, other): pass
    
    @dataclass
    class B:
        def __eq__(self, other): pass
        __hash__ = None

    This is the source of the last line in the above table: for a dataclass where eq=True, frozen=True, and hash=None, if __hash__ is None it will still be overwritten. The assumption is that this is what the user wants, but it's a tricky corner case. It also occurs if setting hash=True and defining __eq__. Again, it's not expected to come up in normal usage.
    """

    I think I understand what you are saying there -- the two cases are treated the same, and a __hash__ is created (assuming the decorator is really "@DataClass(eq=True, frozen=True)"), overwriting the "__hash__ = None" for class B.

    However the table's last line says "generate __hash__ unless it already exists and is None". Perhaps that was a typo and you meant to write "and is *not* None"?

    @ericvsmith
    Copy link
    Member Author

    I'll look at the hash question later today. It's sufficiently complex that I really want to sit down and work through all of the cases.

    @ericvsmith
    Copy link
    Member Author

    Here's where I am on what methods are added under which circumstances. Hopefully these tables don't get mangled.

    The more I think about what to do with hash and raising exceptions, the more I think that the right answer is to not raise any exceptions, and just follow the rule of "don't overwrite an attribute that the user specified". The uses of hash=True or hash=None are so specialized that users can be expected to understand the implications. The interaction among hash=, eq=, and frozen= are sufficiently complicated (see the last table below) that we don't need to add any more rules.

    There are only 2 places below where exceptions are raised:

    1. frozen=True and the user supplied __setattr__ or __delattr__. These methods are crucial for frozen-ness, so we must be able to add them.

    2. order=True and the user supplied at least one of __lt__, __le__, __ge__, or __gt__. This is at Nick's request (see msg310464 in this issue). I'm not so sure about this one: if the user wants to explicitly ask that these methods be added (by specifying order=True, which is not the default), and they also specify some of the ordering function, then why not let them, and have dataclass add the missing ones as we normally would? But I'm not heavily invested in the outcome, as I don't see any reason to write such code, anyway. The only reason I'd suggest leaving it out is to avoid having to explain it in the documentation.

    I have this all coded up and ready to create a PR, modulo a few more tests. I'll get it committed before Beta 1. Hopefully these tables answers the questions about the new proposal.

    # Conditions for adding methods:
    # added = method is added
    # = no action: method is not added
    # raise = TypeError is raised
    # None = attribute is set to None

    # __init__

    # +--- init= parameter
    # |
    # v | yes | no | <--- class has __init__?
    # -------+---------+---------+
    # False | | |
    # -------+---------+---------+
    # True | | added | <- the default
    # -------+---------+---------+

    # __repr__

    # +--- repr= parameter
    # |
    # v | yes | no | <--- class has __repr__?
    # -------+---------+---------+
    # False | | |
    # -------+---------+---------+
    # True | | added | <- the default
    # -------+---------+---------+

    # __setattr__
    # __delattr__

    # +--- frozen= parameter
    # |
    # v | yes | no | <--- class has __setattr__ or __delattr__?
    # -------+---------+---------+
    # False | | | <- the default
    # -------+---------+---------+
    # True | raise * | added |
    # -------+---------+---------+

    # * Raise because not adding these methods
    # would break the "frozen-ness" of the class.

    # __eq__

    # +--- eq= parameter
    # |
    # v | yes | no | <--- class has __eq__?
    # -------+---------+---------+
    # False | | |
    # -------+---------+---------+
    # True | | added | <- the default
    # -------+---------+---------+

    # __lt__
    # __le__
    # __gt__
    # __ge__

    # +--- order= parameter
    # |
    # v | yes | no | <--- class has any comparison method?
    # -------+---------+---------+
    # False | | | <- the default
    # -------+---------+---------+
    # True | raise * | added |
    # -------+---------+---------+

    # * Raise because to allow this case would interfere with using
    # functools.total_ordering.

    # __hash__

    # +------------------- hash= parameter
    # | +----------- eq= parameter
    # | | +--- frozen= parameter
    # | | |
    # v | v | v | yes | no | <--- class has __hash__?
    # -------+-------+-------+-------+-------+
    # False | False | False | | |
    # -------+-------+-------+-------+-------+
    # False | False | True | | |
    # -------+-------+-------+-------+-------+
    # False | True | False | | |
    # -------+-------+-------+-------+-------+
    # False | True | True | | |
    # -------+-------+-------+-------+-------+
    # True | False | False | | added | Has no __eq__, but hashable
    # -------+-------+-------+-------+-------+
    # True | False | True | | added | Has no __eq__, but hashable
    # -------+-------+-------+-------+-------+
    # True | True | False | | added | Not frozen, but hashable
    # -------+-------+-------+-------+-------+
    # True | True | True | | added | Frozen, so hashable
    # -------+-------+-------+-------+-------+
    # None | False | False | | | No __eq__, use the base class __hash__
    # -------+-------+-------+-------+-------+
    # None | False | True | | | No __eq__, use the base class __hash__
    # -------+-------+-------+-------+-------+
    # None | True | False | | None | <-- the default, not hashable
    # -------+-------+-------+-------+-------+
    # None | True | True | | added | Frozen, so hashable
    # -------+-------+-------+-------+-------+

    # For boxes that are blank, __hash__ is untouched and therefore
    # inherited from the base class. If the base is object, then
    # id-based hashing is used.
    # Note that a class may have already __hash__=None if it specified an
    # __eq__ method in the class body (not one that was created by
    # dataclass).

    @ericvsmith
    Copy link
    Member Author

    I'm working on a better formulation of the __hash__ issue, so don't invest much time in this until I post my next message. I'm going to re-organize the various tables to make them easier to read.

    @gvanrossum
    Copy link
    Member

    Can you also clarify that this *never* looks at whether the method is
    implemented in a superclass?

    And clarify what happens with the __hash__ = None that's automatically
    created when you define __eq__.

    Please also repeat the default value of the flag in each case.

    @ericvsmith
    Copy link
    Member Author

    I apologize for the length of this, but I want to be as precise as
    possible. I've no doubt made some mistakes, so corrections and
    discussion are welcomed.

    I'm adding the commented text at the end of this message to
    dataclasses.py. I'm repeating it here for discussion. These tables
    are slightly different from previous versions on this issue, and I've
    added line numbers to the __hash__ table to make it easier to discuss.
    So any further comments and discussion should reference the tables in
    this message.

    I think the __init__, __repr__, __set/delattr__, __eq__, and ordering
    tables are not controversial.

    ** hash

    For __hash__, the 99.9% use case is that the default value of
    hash=None is sufficient. This is rows 1-4 of that table. It's
    unfortunate that I have to spend so much of the following text
    describing cases that I think will rarely be used, but I think it's
    important that in these special cases we don't do anything surprising.

    **** hash=None

    First, let's discuss hash=None, which is the default. This is lines
    1-4 of the __hash__ table.

    Here's an example of line 1:

    @dataclass(hash=None, eq=False, frozen=False)
    class A:
        i: int

    The user doesn't want an __eq__, and the class is not frozen. Whether
    or not the user supplied a __hash__, no hash will be added by
    @DataClass. In the absense of __hash__, the base class (in this case,
    object) __hash__. object.__hash__ and object.__eq__ are based on
    object identity.

    Here's an example of line 2:

    @dataclass(hash=None, eq=False, frozen=True)
    class A:
        i: int

    This is a frozen class, where the user doesn't want an __eq__. The
    same logic is used as for line 1: no __hash__ is added.

    Here's an example of line 3, "no" column (no __hash__). Note that
    this line shows the default values for hash=, eq=, frozen=:

    @dataclass(hash=None, eq=True, frozen=False)
    class A:
        i: int

    In this case, if the user doesn't provide __hash__ (the "no" column),
    we'll set __hash__=None. That's because it's a non-frozen class with
    an __eq__, it should not be hashable.

    Here's an example of the line 3, "yes" column (hash defined):

    @dataclass(hash=None, eq=True, frozen=False)
    class A:
        i: int
        def __hash__(self): pass

    Since a __hash__ exists, it will not be overwritten. Note that this
    is also true if __hash__ were set to None. We're just checking that
    __hash__ exists, not it's value.

    Here's an example of line 4, "no" column:

    @dataclass(hash=None, eq=True, frozen=True)
    class A:
        i: int

    In this case the action is "add", and a __hash__ method will be
    created.

    And finally, consider line 4, "yes" column. This case is hash=None,
    eq=True, frozen=True. Here we also apply the auto-hash test, just
    like we do in the hash=True case (line 12, "yes" column). We want to
    make the frozen instances hashable, so we add our __hash__ method,
    unless the user has directly implemented __hash__.

    **** hash=False

    Next, consider hash=False (rows 5-8). In this case, @DataClass will
    never add a __hash__ method, and if one exists it is not modified.

    **** hash=True

    Lastly, consider hash=True. This is rows 9-12 of the __hash__ table.
    The user is saying they always want a __hash__ method on the class.
    If __hash__ does not already exist (the "no" column), then @DataClass
    will add a __hash__ method. If __hash__ does already exist in the
    class's __dict__, @DataClass will overwrite it if and only if the
    current value of __hash__ is None, and if the class has an __eq__ in
    the class definition. Let's call this condition the "auto-hash test".
    The assumption is that the only reason a class has a __hash__ of None
    and an __eq__ method is that the __hash__ was automatically added by
    Python's class creation machinery. And if the hash was auto-added,
    then by the user specifying hash=True, they're saying they want the
    __hash__ = None overridden by a generated __hash__ method.

    I'll give examples from line 9 of the __hash__ table, but this
    behavior is the same for rows 9-12. Consider:

    @dataclass(hash=True, eq=False, frozen=False)
    class A:
        i: int
        __hash__ = None

    This is line 9, "yes" column, action="add*", which says to add a
    __hash__ method if the class passes the auto-hash test. In this case
    the class fails the test because it doesn't have an __eq__ in the
    class definition. __hash__ will not be overwritten.

    Now consider:

    @dataclass(hash=True, eq=False, frozen=False)
    class A:
        i: int
        def __eq__(self, other): ...

    This again is line 9, "yes" column, action="add*". In this case, the
    user is saying to add a __hash__, but it already exists because Python
    automatically added __hash__=None when it created the class. So, the
    class passes the auto-hash test. So even though there is a __hash__,
    we'll overwrite it with a generated method.

    Now consider:

    @dataclass(hash=True, eq=False, frozen=False)
    class A:
        i: int
        def __eq__(self, other): ...
        def __hash__(self): ...

    Again, this is line 9, "yes" column, action="add*". The existing
    __hash__ is not None, so we don't overwrite the user's __hash__
    method.

    Note that a class can pass the auto-hash test but not have an
    auto-generated __hash__=None. There's no way for @DataClass to
    actually know that __hash__=False was auto-generated, it just assumes
    that that's the case. For example, this class passes the auth-hash
    test and __hash__ will be overwritten:

    @dataclass(hash=True, eq=False, frozen=False)
    class A:
        i: int
        def __eq__(self, other): ...
        __hash__=None

    A special case to consider is lines 11 and 12 from the table. Here's an
    example of line 11, but line 12 is the same for the purposes of this
    discussion:

    @dataclass(hash=True, eq=True, frozen=False)
    class A:
        i: int
        __hash__=None

    The class will have a generated __eq__, because eq=True. However, the
    class still fails the auto-hash test, because the class's __dict__ did
    not have an __eq__ that was added by the class definition. Instead,
    it was added by @DataClass. So this class fails the auto-hash test
    and the __hash__ value will not be overwritten.

    Tables follow:

    # Conditions for adding methods. The boxes indicate what action the
    # dataclass decorator takes. For all of these tables, when I talk
    # about init=, repr=, eq=, order=, hash=, or frozen=, I'm referring
    # to the arguments to the @DataClass decorator. When checking if a
    # dunder method already exists, I mean check for an entry in the
    # class's __dict__. I never check to see if an attribute is defined
    # in a base class.

    # Key:
    # +=========+=========================================+
    # + Value | Meaning |
    # +=========+=========================================+
    # | <blank> | No action: no method is added. |
    # +---------+-----------------------------------------+
    # | add | Generated method is added. |
    # +---------+-----------------------------------------+
    # | add* | Generated method is added only if the |
    # | | existing attribute is None and if the |
    # | | user supplied a __eq__ method in the |
    # | | class definition. |
    # +---------+-----------------------------------------+
    # | raise | TypeError is raised. |
    # +---------+-----------------------------------------+
    # | None | Attribute is set to None. |
    # +=========+=========================================+

    # __init__

    # +--- init= parameter
    # |
    # v | | |
    # | no | yes | <--- class has __init__ in __dict__?
    # +=======+=======+=======+
    # | False | | |
    # +-------+-------+-------+
    # | True | add | | <- the default
    # +=======+=======+=======+

    # __repr__

    # +--- repr= parameter
    # |
    # v | | |
    # | no | yes | <--- class has __repr__ in __dict__?
    # +=======+=======+=======+
    # | False | | |
    # +-------+-------+-------+
    # | True | add | | <- the default
    # +=======+=======+=======+

    # __setattr__
    # __delattr__

    # +--- frozen= parameter
    # |
    # v | | |
    # | no | yes | <--- class has __setattr__ or __delattr__ in __dict__?
    # +=======+=======+=======+
    # | False | | | <- the default
    # +-------+-------+-------+
    # | True | add | raise |
    # +=======+=======+=======+
    # Raise because not adding these methods would break the "frozen-ness"
    # of the class.

    # __eq__

    # +--- eq= parameter
    # |
    # v | | |
    # | no | yes | <--- class has __eq__ in __dict__?
    # +=======+=======+=======+
    # | False | | |
    # +-------+-------+-------+
    # | True | add | | <- the default
    # +=======+=======+=======+

    # __lt__
    # __le__
    # __gt__
    # __ge__

    # +--- order= parameter
    # |
    # v | | |
    # | no | yes | <--- class has any comparison method in __dict__?
    # +=======+=======+=======+
    # | False | | | <- the default
    # +-------+-------+-------+
    # | True | add | raise |
    # +=======+=======+=======+
    # Raise because to allow this case would interfere with using
    # functools.total_ordering.

    # __hash__

    # +------------------- hash= parameter
    # | +----------- eq= parameter
    # | | +--- frozen= parameter
    # | | |
    # v v v | | |
    # | no | yes | <--- class has __hash__ in __dict__?
    # +=========+=======+=======+========+========+
    # | 1 None | False | False | | | No __eq__, use the base class __hash__
    # +---------+-------+-------+--------+--------+
    # | 2 None | False | True | | | No __eq__, use the base class __hash__
    # +---------+-------+-------+--------+--------+
    # | 3 None | True | False | None | | <-- the default, not hashable
    # +---------+-------+-------+--------+--------+
    # | 4 None | True | True | add | add* | Frozen, so hashable
    # +---------+-------+-------+--------+--------+
    # | 5 False | False | False | | |
    # +---------+-------+-------+--------+--------+
    # | 6 False | False | True | | |
    # +---------+-------+-------+--------+--------+
    # | 7 False | True | False | | |
    # +---------+-------+-------+--------+--------+
    # | 8 False | True | True | | |
    # +---------+-------+-------+--------+--------+
    # | 9 True | False | False | add | add* | Has no __eq__, but hashable
    # +---------+-------+-------+--------+--------+
    # |10 True | False | True | add | add* | Has no __eq__, but hashable
    # +---------+-------+-------+--------+--------+
    # |11 True | True | False | add | add* | Not frozen, but hashable
    # +---------+-------+-------+--------+--------+
    # |12 True | True | True | add | add* | Frozen, so hashable
    # +=========+=======+=======+========+========+
    # For boxes that are blank, __hash__ is untouched and therefore
    # inherited from the base class. If the base is object, then
    # id-based hashing is used.
    # Note that a class may have already __hash__=None if it specified an
    # __eq__ method in the class body (not one that was created by
    # @DataClass).

    @ncoghlan
    Copy link
    Contributor

    Eric's current proposal sounds sensible to me, but I'll note that if we deem it necessary, the code that implicitly sets __hash__ = None to override object.hash when eq is defined could also set some other marker to make it more explicit that that happened.

    I think Eric's proposed heuristic will be sufficient, though - for subclasses that inherit both __hash__ and __eq__ from a base class the dataclass decorator won't pay any attention to either of them, and if a base class has set __hash__ to something, then the type creation machinery won't inject "__hash__ = None" into the class being defined.

    @gvanrossum
    Copy link
    Member

    If Nick says it's okay it is okay. I think it's okay too but my brain
    doesn't want to start up today -- I suggest not waiting for me.

    @ericvsmith
    Copy link
    Member Author

    I'm completely burned out on it, too. I'll get the code checked in today, then update the PEP later. I don't really want to add all of this reasoning to the PEP, but I guess I should.

    I think the gist of it is correct, in any event. We can fix some corner cases during the betas, if need be.

    Thanks Guido and Nick for looking at it.

    @ericvsmith
    Copy link
    Member Author

    New changeset ea8fc52 by Eric V. Smith in branch 'master':
    bpo-32513: Make it easier to override dunders in dataclasses. (GH-5366)
    ea8fc52

    @gvanrossum
    Copy link
    Member

    Nit: I think the presentation of the 12-row table in msg310830 can be improved, e.g. by putting the legend for add* closer and/or giving add* a clearer name, and by splitting it into three 4-row tables. Currently the legend comes before all the other tables, and those aren't very useful (they're all the same, except for 'order', and the exception there is better explained in text).

    But that's a matter for whoever writes the docs. As a spec it is unambiguous. On python-dev I've just posted that I support this version.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.7 (EOL) end of life type-bug An unexpected behavior, bug, or error
    Projects
    None yet
    Development

    No branches or pull requests

    4 participants