
classification
Title: dataclass defaults and property don't work together
Type: behavior
Stage:
Components: Library (Lib)
Versions: Python 3.7, Python 3.8, Python 3.9, Python 3.10, Python 3.11

process
Status: open
Resolution:
Dependencies:
Superseder:
Assigned To: eric.smith
Nosy List: Michael Robellard, Thomas701, UnHumbleBen, eric.smith, iivanyuk, juanpa.arrivillaga
Priority: normal
Keywords:

Created on 2020-01-07 17:35 by Michael Robellard, last changed 2022-04-11 14:59 by admin.

Messages (19)
msg359528 - (view) Author: Michael Robellard (Michael Robellard) Date: 2020-01-07 17:35
I ran into a strange issue while trying to use a dataclass together with a property.

I have reduced it to a minimal example that reproduces it:

import dataclasses

@dataclasses.dataclass
class FileObject:
    _uploaded_by: str = dataclasses.field(default=None, init=False)
    uploaded_by: str = None

    def save(self):
        print(self.uploaded_by)

    @property
    def uploaded_by(self):
        return self._uploaded_by

    @uploaded_by.setter
    def uploaded_by(self, uploaded_by):
        print('Setter Called with Value ', uploaded_by)
        self._uploaded_by = uploaded_by

p = FileObject()
p.save()
This outputs:

Setter Called with Value  <property object at 0x7faeb00150b0>
<property object at 0x7faeb00150b0>
I would expect to get None instead.

Here is the StackOverflow Question where I started this:
https://stackoverflow.com/questions/59623952/weird-issue-when-using-dataclass-and-property-together
msg359564 - (view) Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2020-01-08 00:32
Your code basically becomes similar to this:

sentinel = object()

class FileObject:
    _uploaded_by: str = None
    uploaded_by = None

    def __init__(self, uploaded_by=sentinel):
        if uploaded_by is sentinel:
            self.uploaded_by = FileObject.uploaded_by
        else:
            self.uploaded_by = uploaded_by

    def save(self):
        print(self.uploaded_by)

    @property
    def uploaded_by(self):
        return self._uploaded_by

    @uploaded_by.setter
    def uploaded_by(self, uploaded_by):
        print('Setter Called with Value ', uploaded_by)
        self._uploaded_by = uploaded_by

Which has the same problem. I'll have to give it some thought.
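
To make the mechanics visible, here is a small self-contained check (a sketch using only the stdlib): the @property statement rebinds the class attribute, so the dataclass machinery records the property object itself as the field's default, and that object ends up in the generated __init__ signature.

import dataclasses
import inspect

@dataclasses.dataclass
class FileObject:
    uploaded_by: str = None     # default is read from the class attribute ...

    @property
    def uploaded_by(self):      # ... which this rebinds to the property object
        return self._uploaded_by

    @uploaded_by.setter
    def uploaded_by(self, uploaded_by):
        self._uploaded_by = uploaded_by

print(inspect.signature(FileObject.__init__))
# (self, uploaded_by: str = <property object at 0x...>) -> None

p = FileObject()
print(p.uploaded_by)
# <property object at 0x...> -- the setter was called with the property object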
msg359757 - (view) Author: Juan Arrivillaga (juanpa.arrivillaga) Date: 2020-01-10 21:12
So, after glancing at the source code:
https://github.com/python/cpython/blob/ce54519aa09772f4173b8c17410ed77e403f3ebf/Lib/dataclasses.py#L869

During this processing of fields, couldn't you just special case property/descriptor objects?
msg359764 - (view) Author: Juan Arrivillaga (juanpa.arrivillaga) Date: 2020-01-10 22:24
Actually, couldn't the following be a workaround? Just set the property on the class after the class definition:


import dataclasses
import typing
@dataclasses.dataclass
class FileObject:
    uploaded_by: typing.Optional[str] = None

    def _uploaded_by_getter(self):
        return self._uploaded_by

    def _uploaded_by_setter(self, uploaded_by):
        print('Setter Called with Value ', uploaded_by)
        self._uploaded_by = uploaded_by

FileObject.uploaded_by = property(
    FileObject._uploaded_by_getter,
    FileObject._uploaded_by_setter
)
p = FileObject()
print(p)
print(p.uploaded_by)
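
For reference, this workaround should produce output along these lines, since the property is only installed after @dataclass has already captured None as the field default, so the setter is called with None at construction time:

Setter Called with Value  None
FileObject(uploaded_by=None)
None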
msg359774 - (view) Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2020-01-11 02:23
> During this processing of fields, couldn't you just special case property/descriptor objects?

What if you want the field to be a descriptor?

I think the best way of handling this would be to use some sentinel value for the default and, if that sentinel is found, look up the value on the instance, not the class.

But I'm a little worried this might break something else.
msg366546 - (view) Author: Juan Arrivillaga (juanpa.arrivillaga) Date: 2020-04-15 20:00
But when would you want to have a descriptor as an instance attribute? Descriptors must be in the class dictionary to work:

https://docs.python.org/3/reference/datamodel.html#implementing-descriptors

I suppose you could want some container class of descriptor objects, but that seems like an extremely narrow use case compared to the normal and common use case of descriptors acting like descriptors. I think special-casing descriptors makes sense because they act in a special way.
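
A tiny self-contained illustration of that point (the names are made up for the example): the descriptor protocol is only triggered for attributes found on the class, not for descriptor objects stored in an instance's __dict__.

class Ten:
    def __get__(self, obj, objtype=None):
        return 10

class A:
    x = Ten()       # descriptor in the class dictionary: protocol applies

a = A()
a.y = Ten()         # descriptor stored on the instance: just a normal object
print(a.x)          # 10
print(a.y)          # <__main__.Ten object at 0x...> -- __get__ is never called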
msg371820 - (view) Author: Ivan Ivanyuk (iivanyuk) Date: 2020-06-18 16:08
Was there some solution in progress here? We would like to use dataclasses, and it seems this problem currently limits their usefulness to us.

We recently came upon the same behaviour (https://mail.python.org/pipermail/python-list/2020-June/897502.html) and I was wondering if it is possible to make this work without changing the property decorator's behaviour. Is there any way to preserve the default value on the class with @property before dataclass starts processing it?

An example from that mail thread that works around this:

from dataclasses import dataclass, field

def set_property():
    Container.x = property(Container.get_x, Container.set_x)
    return 30

@dataclass
class Container:
    x: int = field(default_factory=set_property)

    def get_x(self) -> int:
        return self._x

    def set_x(self, z: int):
        if z > 1:
            self._x = z
        else:
            raise ValueError

set_property can also be made a class method and referenced like this:
x: int = field(default_factory=lambda: Container.set_property())

Is it possible that this kind of behaviour could be made one of the standard flows for the field() function, so that the dataclasses module generates a function like this and sets it on the class during processing?
Or maybe it's better to extend the @property decorator so that the property object carries a default value which the dataclass can use later?
msg395191 - (view) Author: Benjamin Lee (UnHumbleBen) Date: 2021-06-06 00:25
Would this issue not be trivially resolved if there were a way to specify an alias in the dataclasses field? I.e.:

_uploaded_by: str = dataclasses.field(alias="uploaded_by", default=None, init=False)

Ultimately, the main goal is to make it so that the generated __init__ constructor does

self._uploaded_by = uploaded_by

but with the current implementation there is no aliasing, so the generated __init__ is always:

self._uploaded_by = _uploaded_by
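
To spell out the intended end state, here is the hand-written equivalent of what such an alias-aware __init__ would be expected to generate (the alias parameter itself is hypothetical; this class is only a sketch):

class FileObject:
    def __init__(self, uploaded_by: str = None):
        # parameter is named after the alias; the assignment targets the real field
        self._uploaded_by = uploaded_by

    @property
    def uploaded_by(self):
        return self._uploaded_by

    @uploaded_by.setter
    def uploaded_by(self, uploaded_by):
        self._uploaded_by = uploaded_by

p = FileObject("someone")
print(p.uploaded_by)    # someone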
msg395192 - (view) Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2021-06-06 01:08
> _uploaded_by: str = dataclasses.field(alias="uploaded_by", default=None, init=False)

That's an interesting idea. I'll play around with it. I'm not sure "alias" feels quite right, as it only applies to __init__ (if I'm understanding it correctly).
msg395195 - (view) Author: Michael Robellard (Michael Robellard) Date: 2021-06-06 01:33
The sample I uploaded doesn't do any processing, but the original use case had some logic inside the property getter/setter; would the alias idea allow for that? The purpose of the property is to compute the value if it has not already been computed, but not to recompute it if it has, because recomputing it is expensive.
msg395196 - (view) Author: Benjamin Lee (UnHumbleBen) Date: 2021-06-06 01:45
> I'm not sure "alias" feels quite right, as it only applies to __init__ (if I'm understanding it correctly).

Maybe `init_alias` might be a better name. In any case, this would support private variables in dataclasses.
msg404617 - (view) Author: Thomas (Thomas701) Date: 2021-10-21 16:49
Hello everyone,

A quick look at StackOverflow and Google, plus this Python issue and this blog post and its comments (https://florimond.dev/en/posts/2018/10/reconciling-dataclasses-and-properties-in-python/), shows that this is still a wall that dataclass users keep hitting.

The gist here seems to be that there are two ways to solve this:
- have descriptors be treated differently when found as the default value in the __init__. I like this solution. The argument against it is that users might want the descriptor object itself as an instance attribute, and this solution would prevent them from doing that. I'd argue that, if the user's intention was to have the descriptor object as a default value, the current dataclass implementation only allows it in a weird way: as shown above, it actually sets and gets the descriptor using the descriptor as its own getter/setter (which makes sense once you consider how dataclasses are implemented, specifically "when" the decorator modifies the class, but it is nonetheless jarring at first glance).

- add an "alias/name/public_name/..." keyword to the field constructor so that we could write _bar: int = field(default=4, alias="bar"). The idea here keeps the usage of this alias to the __init__ method but I'd go further. The alias should be used everywhere we need to show the public API of the dataclass (repr, str, to_dict, ...). Basically, if a field has an alias, we only ever show / give access to the alias and essentially treat the original attribute name as a private name (i.e.: if the dataclass maintainer changes the attribute name, none of the user code should break).

I like both solutions for the given problem but I still have a preference for the first, as it covers more cases that are not shown by the example code: what if the descriptor doesn't delegate to a private field on the class? It is a bit less common, but one could want to have a field in the init that delegates to a resource that is not a field on the dataclass. The first solution allows that, the second doesn't.
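
As an illustration of that "delegates to a resource that is not a field" case, a descriptor can store its values entirely outside the instance (the Registered/_registry names below are invented for this sketch):

_registry = {}

class Registered:
    def __set_name__(self, owner, name):
        self.name = name

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        return _registry[(id(obj), self.name)]

    def __set__(self, obj, value):
        _registry[(id(obj), self.name)] = value

class Job:
    owner = Registered()

j = Job()
j.owner = "alice"
print(j.owner)      # alice
print(vars(j))      # {} -- nothing is stored on the instance itself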

So I'd like to propose a variation of the first solution that, hopefully, also solves the counter argument to that solution:

@dataclass
class FileObject:
    _uploaded_by: str = field(init=False)

    @property
    def uploaded_by(self):
        return self._uploaded_by

    @uploaded_by.setter
    def uploaded_by(self, uploaded_by):
        print('Setter Called with Value ', uploaded_by)
        self._uploaded_by = uploaded_by

    uploaded_by: str = field(default=None, descriptor=uploaded_by)


Basically, add an argument to the field constructor that lets developers tell the dataclass machinery that this field requires special handling: in the __init__, it should use the default value as it would for normal fields, but at the class level it should install the descriptor instead of the default value.

What do you think?
msg404624 - (view) Author: Thomas (Thomas701) Date: 2021-10-21 17:09
Thinking a little more about this, maybe a different solution would be to have default values installed only at the class level (as they already are today) and not overwritten on the instance in the __init__. default_factory results should keep being set in the __init__, as is the case today.

With this approach:

@dataclass
class Foo:
    bar: int = field(default=4)
    # assigns 4 to Foo.bar but not to foo.bar (bonus: __init__ will be faster)

    bar: int = field(default=some_descriptor)
    # assigns some_descriptor to Foo.bar, so Foo().bar does a __get__ on the descriptor

    bar: int = field(default_factory=SomeDescriptor)
    # assigns a new SomeDescriptor instance to every instance of Foo

    bar: int = field(default_factory=lambda: some_descriptor)
    # assigns the same descriptor object to every instance of Foo

I don't think this change would break a lot of existing code, as the attribute overwrite that happens at the instance level in the __init__ is essentially an implementation detail. It also seems this would solve the current problem and allow a cleaner way to assign a descriptor object as a default value. Am I not seeing some obvious problem here?
msg404625 - (view) Author: Thomas (Thomas701) Date: 2021-10-21 17:27
Scratch that last one, it leads to problems when mixing descriptors with actual default values:

@dataclass
class Foo:
    bar: int = field(default=some_descriptor)
    # technically this is a descriptor field without a default value; or at the very least, the dataclass constructor can't know, because it doesn't know what field, if any, this delegates to. This means it will show up as optional in the __init__ signature, but it might not be.

    bar: int = field(default=some_descriptor, default_factory=lambda: 4)
    # this could be a fix for the above problem. The dataclass constructor would install the descriptor at the class level and assign 4 to the instance attribute in the __init__. It still doesn't tell the dataclass constructor whether a field is optional when its default value is a descriptor and no default_factory is passed. And it feels a lot more like a hack than anything else.


So ignore my previous message. I'm still 100% behind the "descriptor" arg in the field constructor, though :)

PS: Sorry for the noise, I just stumbled onto this problem for the nth time and I can't get my brain to shut off.
msg404648 - (view) Author: Michael Robellard (Michael Robellard) Date: 2021-10-21 20:25
I can confirm that Juan Arrivillaga's (juanpa.arrivillaga) workaround does work.

Given that it works, wouldn't it be relatively trivial to do what Thomas701 suggests and add a descriptor parameter to field()? Then apply the descriptor after all the other work is done so that it doesn't get clobbered, which basically reproduces the workaround.

import dataclasses
@dataclasses.dataclass
class FileObject:
    _uploaded_by: str = dataclasses.field(default=None, init=False)

    def _uploaded_by_getter(self):
        return self._uploaded_by

    def _uploaded_by_setter(self, uploaded_by):
        print('Setter Called with Value ', uploaded_by)
        self._uploaded_by = uploaded_by

    uploaded_by: str = dataclasses.field(default=None, descriptor=property(
        _uploaded_by_getter,
        _uploaded_by_setter))

p = FileObject()
print(p)
print(p.uploaded_by)

This would allow any descriptor to be applied to a dataclass field. If we allowed descriptor to accept an iterable as well, you could have multiple descriptors, just like normal.
msg404684 - (view) Author: Thomas (Thomas701) Date: 2021-10-21 22:12
Agreed on everything but that last part, which I'm not sure I understand:
> If we allow descriptor to accept an iterable as well you could have multiple descriptors just like normal.
Could you give an example of what you mean with a regular class?

I've had a bit more time to think about this, and I think one possible solution would be to mix the idea of a "descriptor" argument to the field constructor with the idea of not applying regular defaults at __init__ time.


Basically, at dataclass construction time (when the @dataclass decorator inspects and enhances the class), apply regular defaults at the class level, unless the field has a descriptor argument, in which case install the descriptor at the class level instead. At __init__ time, apply only default_factories, unless the field has a descriptor argument, in which case do apply the regular default value.

If the implementation changed in these two ways, we'd have code like this work exactly as expected:

from dataclasses import dataclass, field


@dataclass
class Foo:
    _bar: int = field(init=False)
    
    @property
    def bar(self):
        return self._bar

    @bar.setter
    def bar(self, value):
        self._bar = value
    
    # field is required,
    # uses descriptor bar for get/set
    bar: int = field(descriptor=bar)

    # field is optional,
    # default of 5 is set at __init__ time
    # using the descriptor bar for get/set,
    bar: int = field(descriptor=bar, default=5)

    # field is optional,
    # default value is the descriptor instance,
    # it is set using regular attribute setter
    bar: int = field(default=bar)

Not only does this allow descriptors to be used with dataclasses, it also fixes the use case of having a descriptor instance as a default value, because the descriptor would no longer be used to get/set itself.

Although I should say, at this point, I'm clearly seeing this with blinders on to solve this particular problem... It's probable this solution breaks something somewhere that I'm not seeing. Fresh eyes appreciated :)
msg404685 - (view) Author: Michael Robellard (Michael Robellard) Date: 2021-10-21 22:18
An example of multiple descriptors would be to have:

@cached_property
@property
def expensive_calc(self):
    #Do something expensive
msg404694 - (view) Author: Thomas (Thomas701) Date: 2021-10-21 22:31
Just to rephrase, because the explanation in my last message can be ambiguous:

At dataclass construction time (when the @dataclass decorator inspects and enhances the class):

for field in fields:
    if descriptor := getattr(field, 'descriptor', None):
        setattr(cls, field.name, descriptor)
    elif default := getattr(field, 'default', None):
        setattr(cls, field.name, default)


Then at __init__ time:

for field in fields:
    if (
        (descriptor := getattr(field, 'descriptor', None))
        and (default := getattr(field, 'default', None))
    ):
        setattr(self, field.name, default)
    elif default_factory := getattr(field, 'default_factory', None):
        setattr(self, field.name, default_factory())

Now, this is just pseudo-code to illustrate the point; I know the dataclass implementation generates the __init__ on the fly by building its code as a string and then exec'ing it. This logic would have to be applied to that generated code.
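
For anyone who hasn't read that part of dataclasses.py, the general technique looks roughly like this (a simplified sketch, not the actual implementation):

# Build the method's source as a string, exec it in a controlled namespace,
# then pull the resulting function object back out and attach it to the class.
src = (
    "def __init__(self, bar=_dflt_bar):\n"
    "    self.bar = bar\n"
)
namespace = {"_dflt_bar": 4}
exec(src, namespace)

class Foo:
    pass

Foo.__init__ = namespace["__init__"]
print(Foo().bar)    # 4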

I keep thinking I'm not seeing some obvious problem here, so if something jumps out let me know.
msg404700 - (view) Author: Thomas (Thomas701) Date: 2021-10-21 22:51
> An example of multiple descriptors would be to have:
> @cached_property
> @property
> def expensive_calc(self):
>     #Do something expensive

That's decorator chaining. The example you gave is not working code (try to return something from expensive_calc and print(obj.expensive_calc()), you'll get a TypeError). Correct me if I'm wrong, but I don't think you can chain descriptors the way you want unless the descriptors themselves have knowledge that they're acting on descriptors. E.g., given:

class Foo:
    @descriptorA
    @descriptorB
    def bar(self):
        return 5

You would need descriptorA to be implemented such that its __get__ method returns the .__get__() of whatever it wraps (in this case descriptorB).
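
A minimal runnable sketch of that delegation (the Delegating class and the print calls are just for this illustration):

class Delegating:
    """Forwards attribute access to the descriptor it wraps."""
    def __init__(self, wrapped):
        self.wrapped = wrapped

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        print('Delegating.__get__')
        return self.wrapped.__get__(obj, objtype)

    def __set__(self, obj, value):
        print('Delegating.__set__')
        self.wrapped.__set__(obj, value)

class Foo:
    def _get_bar(self):
        return self._bar

    def _set_bar(self, value):
        self._bar = value

    bar = Delegating(property(_get_bar, _set_bar))

f = Foo()
f.bar = 5         # Delegating.__set__ -> property setter stores f._bar
print(f.bar)      # Delegating.__get__ -> property getter -> 5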

Either way, at the class level (I mean the Foo class, the one we'd like to make a dataclass), none of this matters, because the class only sees the outer descriptor (descriptorA). Assuming the proposed solution is accepted, you would be able to do this:

@dataclass
class Foo:
    @descriptorA
    @descriptorB
    def bar(self):
        return some_value
    
    @bar.setter
    def bar(self, value):
        ...  # store value
    
    bar: int = field(descriptor=bar)

and, assuming descriptorA is compatible with descriptorB on both .__get__ and .__set__, as stated above, it would work the way you intend it to.
History
Date                 User                Action  Args
2022-04-11 14:59:25  admin               set     github: 83428
2021-10-21 22:51:04  Thomas701           set     messages: + msg404700
2021-10-21 22:31:27  Thomas701           set     messages: + msg404694
2021-10-21 22:18:26  Michael Robellard   set     messages: + msg404685
2021-10-21 22:12:24  Thomas701           set     messages: + msg404684
2021-10-21 20:25:23  Michael Robellard   set     messages: + msg404648; versions: + Python 3.10, Python 3.11
2021-10-21 17:27:36  Thomas701           set     messages: + msg404625
2021-10-21 17:09:20  Thomas701           set     messages: + msg404624
2021-10-21 16:49:32  Thomas701           set     nosy: + Thomas701; messages: + msg404617
2021-06-06 01:45:14  UnHumbleBen         set     messages: + msg395196
2021-06-06 01:33:02  Michael Robellard   set     messages: + msg395195
2021-06-06 01:08:57  eric.smith          set     messages: + msg395192
2021-06-06 00:25:00  UnHumbleBen         set     nosy: + UnHumbleBen; messages: + msg395191
2020-06-18 16:08:07  iivanyuk            set     nosy: + iivanyuk; messages: + msg371820
2020-04-15 20:00:22  juanpa.arrivillaga  set     messages: + msg366546
2020-01-11 02:23:39  eric.smith          set     messages: + msg359774
2020-01-10 22:24:53  juanpa.arrivillaga  set     messages: + msg359764
2020-01-10 21:12:14  juanpa.arrivillaga  set     nosy: + juanpa.arrivillaga; messages: + msg359757
2020-01-08 00:32:39  eric.smith          set     messages: + msg359564
2020-01-07 19:40:37  eric.smith          set     assignee: eric.smith
2020-01-07 18:19:06  xtreak              set     nosy: + eric.smith
2020-01-07 17:35:37  Michael Robellard   create