New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
dataclasses: make it easier to use user-supplied special methods #76694
Comments
Modify dataclasses to make it easier to specify special methods. For example: currently, if you want to override __repr__, you need to specify @DataClass(repr=False), and also provide your own __repr__. Also, it's current an error to not specify repr=False and to also provide your own __repr__. @DataClass should be able to determine that if you provide __repr__, you don't want it to provide it's own __repr__. The methods that @DataClass will provide for you are: Per the discussion that started at https://mail.python.org/pipermail/python-dev/2017-December/151487.html, the change will be to override these methods only if:
The first item is to allow the user to continue to suppress method generation, and the second item is so that base-class methods are not considered in the decision. I'm still thinking through how __eq__ and __ne__ should be handled (as well as the ordering comparisons, too). I'll raise an issue on python-dev and point it here. |
Another case to consider is that if a class defines __eq__, Python will automatically set hash=False before the decorator gets involved. NOte to self: be sure to add a test case for this. |
See bpo-32546 for the hash=None problem. |
Here is my proposal for making it easier for the user to supply dunder methods that dataclasses would otherwise automatically add to the class. For all of these cases, when I talk about init=, repr=, eq=, order=, hash=, or frozen=, I'm referring to the parameters to the dataclass decorator. When checking if a dunder method already exists, I mean check for an entry in the class's __dict__. I never check to see if a member is defined in a base class. Let's get the easy ones out of the way first. __init__ __repr__ __setattr__ __eq__ And now the harder ones: __ne__ __lt__ __hash__ If hash=False, never generate __hash__. If hash=True, generate __hash__ unless it already exists. If hash=None (the default), then use this table to decide whether and how to generate __hash__: Note that it's almost always a bad idea to specify hash=True or hash=False. The action based on the above table (where hash=None, the default), is usually the correct behavior. One special case to recognize is if the class defines a __eq__. In this case, Python will assign __hash__=None before the dataclass decorator is called. The decorator cannot distinguish between these two cases (except possibly by using the order of __dict__ keys, but that seems overly fragile): @dataclass
class A:
def __eq__(self, other): pass
@dataclass
class B:
def __eq__(self, other): pass
__hash__ = None This is the source of the last line in the above table: for a dataclass where eq=True, frozen=True, and hash=None, if __hash__ is None it will still be overwritten. The assumption is that this is what the user wants, but it's a tricky corner case. It also occurs if setting hash=True and defining __eq__. Again, it's not expected to come up in normal usage. |
Do all the parameters to the decorator take a default value of "None", so that you can differentiate between explicit True, explicit False, and did-not-specify? Is this the complete list of these dunder-method-generation-control parameters? init repr eq order hash |
Only hash has the tri-state True/False/None behavior, defaulting to None. It's this way because None is the "do what's rational, based on eq and frozen" behavior. None of the other parameters work this way. There's a long issue in the attrs repo that describes how they came to this conclusion: python-attrs/attrs#136. I think it makes sense. None of the dataclass parameters have a sentinel that would let me detect if the user provided a value or not. In the case of hash, I can't detect if they explicitly passed hash=None or just didn't provide a value. I've given this some thought, and couldn't come up with a use case for knowing this. For example, if init had a sentinel value of MISSING, what would the code ever do except start with: In addition to your list of dunder-control-parameters, there's frozen. It determines if instances are immutable (to the extent that they can be made immutable). |
Oops, that should have been: if init is MISSING:
init = True |
So if I understand correctly: the default value of the "repr" parameter is True. Decision matrix: does dataclass insert a __repr__ into the class? +-- row: argument passed in to decorator's repr parameter
|
|
v | yes | no | <--- columns: does the class have a __repr__?
-------+---------+---------+
True | ??? | yes |
-------+---------+---------+
False | no | no |
-------+---------+---------+ If the user specifies "repr=True", and also specifies their own __repr__ in the class, does dataclasses stomp on their __repr__ with its own? Does it throw an exception? I still prefer the tri-state value here. In this version, the default value of the "repr" parameter would be None, and the decision matrix for inserting a __repr__ looks like this: +-- row: argument passed in to decorator's repr parameter
|
|
v | yes | no | <--- columns: does the class have a __repr__?
-------+---------+---------+
True | raise | yes |
-------+---------+---------+
None | no | yes |
-------+---------+---------+
False | no | no |
-------+---------+---------+ But we've talked about using the tri-state default for all of these parameters before, and clearly you weren't swayed then, so I guess I've said my peace and I'll stop suggesting it. |
The discussion on python-dev was that your ??? box would be "no": if the user supplied __repr__, they obviously meant for dataclass() to not provide one. I can't see why the user would say repr=True ("I want dataclass() to add __repr__"), but then provide a __repr__ and get an exception. That looks like the only functionality added by your repr=True row over the proposal. Where your proposal uses repr=None for the "no", "yes" row, mine uses repr=True. It's not like there's action at a distance here: the user is writing the class. Especially since base classes are ignored. I'm ignoring make_dataclasses(), where the user is dynamically creating a class and maybe a __repr__ snuck in. But I don't care so much about that case. I do think your ascii tables are a good way of explaining this. Thanks! (Now I need a 3D version for eq, frozen, hash!) |
For the ordering operators, my only question would be whether or not I can rely on them to act like functools.total_ordering: if I supply __eq__ and one of the ordering operators (e.g. __lt__), will dataclasses make sure the other three ordering operators are consistent with those base methods? Or will it bypass them and act directly on the underlying fields? My suggestion would be to say that if any of __lt__, __le__, __gt__ or __ge__ are defined, then data classes will implicitly generate the other methods based on functools.total_ordering semantics, and will only reference the underlying fields directly if *none* of them are defined. Otherwise I can see folks defining a single method like "__lt__", and being surprised when they end up with inconsistent comparison behaviour. |
Rather than re-implementing (and maintaining) functools.total_ordering semantics, I'd rather advise them to specify order=False and just use functools.total_ordering. It's an odd use case for dataclasses, anyway. |
I'd be fine with that recommendation (since |
Are we clear on the proposal now? It would be good to have this implemented in beta 1. Eric's long message looks good to me (though I have to admit the logic around __hash__ confuses me, but I've spent two long days at PyCascades, and I trust Eric's judgment). |
I'm working on a patch before beta 1. The only issue is whether to raise in the case that order=True and the user specifies one of the 4 ordering dunders. I think raising is okay, especially since order=False is the default: you'd have to go out of your way to cause the exception. I'll mention functools.total_ordering in the exception message. I do think that the discussion in my long message about __hash__ is correct, although I'm not done thinking it through. I think it's already pretty well tested, but if not I'll make sure there are enough test cases for objects that I think should or shouldn't be hashable that I'll have all of the kinks worked about before beta 1. |
Raising for order=True if one of the ordering dunders exists sounds fine. I am confused by the corner case for hash. Your table: """ Then you write at the end of that message: """ @dataclass
class A:
def __eq__(self, other): pass
@dataclass
class B:
def __eq__(self, other): pass
__hash__ = None This is the source of the last line in the above table: for a dataclass where eq=True, frozen=True, and hash=None, if __hash__ is None it will still be overwritten. The assumption is that this is what the user wants, but it's a tricky corner case. It also occurs if setting hash=True and defining __eq__. Again, it's not expected to come up in normal usage. I think I understand what you are saying there -- the two cases are treated the same, and a __hash__ is created (assuming the decorator is really "@DataClass(eq=True, frozen=True)"), overwriting the "__hash__ = None" for class B. However the table's last line says "generate __hash__ unless it already exists and is None". Perhaps that was a typo and you meant to write "and is *not* None"? |
I'll look at the hash question later today. It's sufficiently complex that I really want to sit down and work through all of the cases. |
Here's where I am on what methods are added under which circumstances. Hopefully these tables don't get mangled. The more I think about what to do with hash and raising exceptions, the more I think that the right answer is to not raise any exceptions, and just follow the rule of "don't overwrite an attribute that the user specified". The uses of hash=True or hash=None are so specialized that users can be expected to understand the implications. The interaction among hash=, eq=, and frozen= are sufficiently complicated (see the last table below) that we don't need to add any more rules. There are only 2 places below where exceptions are raised:
I have this all coded up and ready to create a PR, modulo a few more tests. I'll get it committed before Beta 1. Hopefully these tables answers the questions about the new proposal. # Conditions for adding methods: # __init__ # +--- init= parameter # __repr__ # +--- repr= parameter # __setattr__ # +--- frozen= parameter # * Raise because not adding these methods # __eq__ # +--- eq= parameter # __lt__ # +--- order= parameter # * Raise because to allow this case would interfere with using # __hash__ # +------------------- hash= parameter # For boxes that are blank, __hash__ is untouched and therefore |
I'm working on a better formulation of the __hash__ issue, so don't invest much time in this until I post my next message. I'm going to re-organize the various tables to make them easier to read. |
Can you also clarify that this *never* looks at whether the method is And clarify what happens with the __hash__ = None that's automatically Please also repeat the default value of the flag in each case. |
I apologize for the length of this, but I want to be as precise as I'm adding the commented text at the end of this message to I think the __init__, __repr__, __set/delattr__, __eq__, and ordering ** hash For __hash__, the 99.9% use case is that the default value of **** hash=None First, let's discuss hash=None, which is the default. This is lines Here's an example of line 1: @dataclass(hash=None, eq=False, frozen=False)
class A:
i: int The user doesn't want an __eq__, and the class is not frozen. Whether Here's an example of line 2: @dataclass(hash=None, eq=False, frozen=True)
class A:
i: int This is a frozen class, where the user doesn't want an __eq__. The Here's an example of line 3, "no" column (no __hash__). Note that @dataclass(hash=None, eq=True, frozen=False)
class A:
i: int In this case, if the user doesn't provide __hash__ (the "no" column), Here's an example of the line 3, "yes" column (hash defined): @dataclass(hash=None, eq=True, frozen=False)
class A:
i: int
def __hash__(self): pass Since a __hash__ exists, it will not be overwritten. Note that this Here's an example of line 4, "no" column: @dataclass(hash=None, eq=True, frozen=True)
class A:
i: int In this case the action is "add", and a __hash__ method will be And finally, consider line 4, "yes" column. This case is hash=None, **** hash=False Next, consider hash=False (rows 5-8). In this case, @DataClass will **** hash=True Lastly, consider hash=True. This is rows 9-12 of the __hash__ table. I'll give examples from line 9 of the __hash__ table, but this @dataclass(hash=True, eq=False, frozen=False)
class A:
i: int
__hash__ = None This is line 9, "yes" column, action="add*", which says to add a Now consider: @dataclass(hash=True, eq=False, frozen=False)
class A:
i: int
def __eq__(self, other): ... This again is line 9, "yes" column, action="add*". In this case, the Now consider: @dataclass(hash=True, eq=False, frozen=False)
class A:
i: int
def __eq__(self, other): ...
def __hash__(self): ... Again, this is line 9, "yes" column, action="add*". The existing Note that a class can pass the auto-hash test but not have an @dataclass(hash=True, eq=False, frozen=False)
class A:
i: int
def __eq__(self, other): ...
__hash__=None A special case to consider is lines 11 and 12 from the table. Here's an @dataclass(hash=True, eq=True, frozen=False)
class A:
i: int
__hash__=None The class will have a generated __eq__, because eq=True. However, the Tables follow: # Conditions for adding methods. The boxes indicate what action the # Key: # __init__ # +--- init= parameter # __repr__ # +--- repr= parameter # __setattr__ # +--- frozen= parameter # __eq__ # +--- eq= parameter # __lt__ # +--- order= parameter # __hash__ # +------------------- hash= parameter |
Eric's current proposal sounds sensible to me, but I'll note that if we deem it necessary, the code that implicitly sets I think Eric's proposed heuristic will be sufficient, though - for subclasses that inherit both __hash__ and __eq__ from a base class the dataclass decorator won't pay any attention to either of them, and if a base class has set __hash__ to something, then the type creation machinery won't inject "__hash__ = None" into the class being defined. |
If Nick says it's okay it is okay. I think it's okay too but my brain |
I'm completely burned out on it, too. I'll get the code checked in today, then update the PEP later. I don't really want to add all of this reasoning to the PEP, but I guess I should. I think the gist of it is correct, in any event. We can fix some corner cases during the betas, if need be. Thanks Guido and Nick for looking at it. |
Nit: I think the presentation of the 12-row table in msg310830 can be improved, e.g. by putting the legend for add* closer and/or giving add* a clearer name, and by splitting it into three 4-row tables. Currently the legend comes before all the other tables, and those aren't very useful (they're all the same, except for 'order', and the exception there is better explained in text). But that's a matter for whoever writes the docs. As a spec it is unambiguous. On python-dev I've just posted that I support this version. |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: