This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: @dataclass defaults
Type: behavior Stage: resolved
Components: Interpreter Core Versions: Python 3.7
process
Status: closed Resolution: not a bug
Dependencies: Superseder:
Assigned To: Nosy List: anthony, eric.smith, gvanrossum, rhettinger, veky
Priority: normal Keywords:

Created on 2019-11-09 20:29 by anthony, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Messages (11)
msg356306 - (view) Author: Anthony (anthony) Date: 2019-11-09 20:29
Given one of the motivations of @dataclass is to reduce boilerplate code then, in the context of @dataclass,

x: list = [] should be equal to x: list = field(default_factory=lambda: [])

The example in PEP 557 is not reasonable. It should be:

class D:
    def __init__(self, x=[]):
        self.x = x

That x = None works (without specifying a default factory, and is different from plain "x")  makes the whole "factory" argument even more bizarre. Why would a None Type work, but a List Type not?

I think either the behavior of this should be different or the docs should at address this more clearly.
msg356545 - (view) Author: Vedran Čačić (veky) * Date: 2019-11-13 18:11
Have you read https://github.com/ericvsmith/dataclasses/issues/3?
msg356552 - (view) Author: Anthony (anthony) Date: 2019-11-13 21:04
Thanks for adding it, I read it now.

And sorry to back track a moment - I love the idea of @dataclass and I can
only imagine how must work it was to implement as I am only a beginner.

I'm looking at this primarily from the narrow view point of a user - not so
much understanding the backdrop of how it can actually work.
But I still think this is worth considering as I think this is common
pattern based on this logic:

When using the default construction method it seems reasonable to pass a
default, such as a List.
Because
def __init__(x=[]):
  self.x = x
Is the cleanest way I can think of to write that.
In this new context, if x: list = [] is the cleanest way to write it.
Therefore it should do that, however it is actually implemented.

Let's imagine for a moment that @dataclass becomes the default way of
constructing classes in python.
Do we really want such a common case to require such verbose language?

Looking at Guido's first comment on this, I think that detection mechanism
is what I would expect to happen. To illustrate verbosely:
https://gist.github.com/swirlingsand/d59d01ef79c5ee93f430ed324199bc65
I clearly misunderstood the way classes get passed by default.
I continue to believe that defining it in the argument should be equal to
defining it after. (if there are no other context items present).

From a beginners perspective here, It would appear that in a sense the
default init is actually "silently failing" in that the expected (because
of the two instances not being equal to each other as shown in L10 above)
isolation is not happening.
In a sense then, @dataclass is turning that silent "failure" into a
ValueError which is actually better.

What about something like this:
https://gist.github.com/swirlingsand/2494bc482902fada2248698f7b8af61e
If the common case is for it to be None, then this doesn't fire so there's
no cost.
If it's a common default then it handles it as expected.
If it's not found it raises a ValueError like before, so there's no loss or
harm.
A handful of defaults here may cover 80% of cases and the 20% can continue
to be handled by ValueError
In a project with 40 classes with 10 defaults per class that's 400 lines of
code that look like:
[]
instead of
field(default_factory=lambda: [])
(or {} etc.)

Issue #3 has many comments around copying, but that's not my concern, I'm
just talking about the defaults where the attribute is not provided at all
(ie is None).

I did the above example in regular python since I don't know enough about
how @dataclass is implemented, but it seems reasonable that if it can work
in normal python there should be a way to say augment operation() on line
21 with field()

On Wed, Nov 13, 2019 at 10:11 AM Vedran Čačić <report@bugs.python.org>
wrote:

>
> Vedran Čačić <vedgar@gmail.com> added the comment:
>
> Have you read https://github.com/ericvsmith/dataclasses/issues/3?
>
> ----------
> nosy: +veky
>
> _______________________________________
> Python tracker <report@bugs.python.org>
> <https://bugs.python.org/issue38758>
> _______________________________________
>
msg356553 - (view) Author: Anthony (anthony) Date: 2019-11-13 21:13
To clarify,
A major assumption I'm making is that the default is empty, or the
"copying" is handled as some separately coupled concept.
A motivation here is wanting to do operations like .append() that depend on
the object existing.

On Wed, Nov 13, 2019 at 1:04 PM Anthony Sarkis <anthonysarkis@gmail.com>
wrote:

> Thanks for adding it, I read it now.
>
> And sorry to back track a moment - I love the idea of @dataclass and I can
> only imagine how must work it was to implement as I am only a beginner.
>
> I'm looking at this primarily from the narrow view point of a user - not
> so much understanding the backdrop of how it can actually work.
> But I still think this is worth considering as I think this is common
> pattern based on this logic:
>
> When using the default construction method it seems reasonable to pass a
> default, such as a List.
> Because
> def __init__(x=[]):
>   self.x = x
> Is the cleanest way I can think of to write that.
> In this new context, if x: list = [] is the cleanest way to write it.
> Therefore it should do that, however it is actually implemented.
>
> Let's imagine for a moment that @dataclass becomes the default way of
> constructing classes in python.
> Do we really want such a common case to require such verbose language?
>
> Looking at Guido's first comment on this, I think that detection mechanism
> is what I would expect to happen. To illustrate verbosely:
> https://gist.github.com/swirlingsand/d59d01ef79c5ee93f430ed324199bc65
> I clearly misunderstood the way classes get passed by default.
> I continue to believe that defining it in the argument should be equal to
> defining it after. (if there are no other context items present).
>
> From a beginners perspective here, It would appear that in a sense the
> default init is actually "silently failing" in that the expected (because
> of the two instances not being equal to each other as shown in L10 above)
> isolation is not happening.
> In a sense then, @dataclass is turning that silent "failure" into a
> ValueError which is actually better.
>
> What about something like this:
> https://gist.github.com/swirlingsand/2494bc482902fada2248698f7b8af61e
> If the common case is for it to be None, then this doesn't fire so there's
> no cost.
> If it's a common default then it handles it as expected.
> If it's not found it raises a ValueError like before, so there's no loss
> or harm.
> A handful of defaults here may cover 80% of cases and the 20% can continue
> to be handled by ValueError
> In a project with 40 classes with 10 defaults per class that's 400 lines
> of code that look like:
> []
> instead of
> field(default_factory=lambda: [])
> (or {} etc.)
>
> Issue #3 has many comments around copying, but that's not my concern, I'm
> just talking about the defaults where the attribute is not provided at all
> (ie is None).
>
> I did the above example in regular python since I don't know enough about
> how @dataclass is implemented, but it seems reasonable that if it can work
> in normal python there should be a way to say augment operation() on line
> 21 with field()
>
>
>
>
>
>
> On Wed, Nov 13, 2019 at 10:11 AM Vedran Čačić <report@bugs.python.org>
> wrote:
>
>>
>> Vedran Čačić <vedgar@gmail.com> added the comment:
>>
>> Have you read https://github.com/ericvsmith/dataclasses/issues/3?
>>
>> ----------
>> nosy: +veky
>>
>> _______________________________________
>> Python tracker <report@bugs.python.org>
>> <https://bugs.python.org/issue38758>
>> _______________________________________
>>
>
msg356554 - (view) Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2019-11-13 21:14
The problem is that what you wrote isn't what most people want. Here's your example without dataclasses. I've added an "append_to_x" method, which does the obvious thing:


>>> class C:
...   def __init__(self, x=[]):
...      self.x = x
...  
...   def append_to_x(self, val):
...     self.x.append(val)
... 

Now create two objects, and inspect their "x" properties:
>>> a = C()
>>> b = C()
>>> a.x
[]
>>> b.x
[]

So far so good. Now append something to "a.x":
>>> a.append_to_x(10)
>>> a.x
[10]

And notice that "b.x" changes, too:
>>> b.x
[10]

So the naive behavior isn't what you want. dataclasses is trying to prevent you from doing this.

You should look at "mutable defaults", perhaps starting here (from a random Google search): https://blog.florimond.dev/python-mutable-defaults-are-the-source-of-all-evil
msg356555 - (view) Author: Anthony (anthony) Date: 2019-11-13 21:21
Hey Eric, I think our emails crossed in the wind, please see my comment that includes (as a sub component) a similar idea to what's proposed in the article you linked.
msg356562 - (view) Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2019-11-13 22:19
I'm not sure what you're proposing.
msg356572 - (view) Author: Vedran Čačić (veky) * Date: 2019-11-14 04:24
It seems to me that what you're missing is that "class declarations" are still perfectly normal executable statements (in most other superficially similar programming languages, they are not). So, when you say

class A:
    b = []

it is actually executed, a new empty list is made, and named A.b. If you then construct a = A(), then a.b is that same object. It must be, you never made any other list in the process. So if you really want a.b to be a _different_ empty list, you have to make it at some point. The most obvious way to do it is just to copy the A.b --- that's why people usually talk about copying.

Your approach is different: it seems to me that you say, if A.b is a list, then make a new empty list, if it is a set, then make a new empty set, and if it is a dict, then make a new empty dict. Not to mention that it feels very weird when having e.g.

class A:
    b = [4]

(-:, it doesn't take into account any other types. Which leads to another your problem, the one of perspective. Lists, sets and dicts are not that "common case" as you seem to think. Yes, they are in the beginners' code -- and that's why current dataclass implementation flags them as errors, since it's quite possible that people who write such things don't actually understand how Python is executed. But specialcasing them to actually do something useful would be the wrong thing to do, since it would incentivize people who should know better, back into using those simple types. I think it is shown in the discussion mentioned.
msg356577 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2019-11-14 05:12
I concur with Vedran and recommend this proposal be rejected.

During training and code review, we spend a lot of time trying to get people to not write code like this.  By making dataclasses behave differently than the rest of language, we would would lose the clear, bright line we have now.  IOW, this would be bug bait.
msg356578 - (view) Author: Anthony (anthony) Date: 2019-11-14 06:19
Vedran thank you for the detailed comments.

I want to separate the idea side and implementation:
The idea is having defaults expressed in a way of least surprise and in the least amount of code.

Specifically that 
[1]
makes more sense then 
field(default_factory=lambda: [1])

or more generally that
default
makes more sense then 
field(default_factory=lambda: default)

I agree there's a lot lacking in my example starting point implementation.
Let's not let what's lacking in that implementation distract from the spirit of the idea.

The scope of that example is meat to illustrate creating a default that is of least surprising, specifically that a new instance of a class is isolated.
But I think the point of whether or not the default is isolated feels like a distraction. 

If a motivation for using type annotations is to restrict to reduce errors. To me, "prototyping" defaults by having them declared as soon as possible in the class is a further restriction. I believe that's a reasonable coding practice, and actually a good thing. 

To me this is the current situation, in order to setup a class using @dataclass a user must either:

a) Have a relatively in depth understanding of python. I think proofs of that include both PEP having to have a section justifying that. And that there was such discussion in the issue. 
A point of higher level languages is to nicely abstract lower level APIs, if I have to learn the lower level API in order to use the language it reduces the benefit, and that's really what the points made sound like.

b) Calling two "special" / additional functions and adding a (relatively) large amount of code. I can't think of anything else in python where the default method requires knowing two separate additional functions! Python is all about short, semantically meaningful expressions yet this feels like the total opposite!

If setting up defaults is a common point of error or "bad code" then why not setup the standard library in such a way that it does it right?
ie that a user can specify the default they want and the library sets up the default in the right way.
If that's as a separate object, or a mutable object, either way. The point being that setting up a default should be straightforward.

This is my first attempt at a contribution, even if all it is is highlighting a problem and the "solution" I'm suggesting may be the wrong one. But so far all of comments feel like they are defending a cascade of "not ideal" situations. Let's try actually looking at the root from the user's perspective and how can we improve this!
msg356579 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2019-11-14 06:29
This should not have been an issue but a discussion on python-list or perhaps python-ideas.
History
Date User Action Args
2022-04-11 14:59:23adminsetgithub: 82939
2019-11-14 06:29:44gvanrossumsetstatus: open -> closed
resolution: not a bug
messages: + msg356579

stage: resolved
2019-11-14 06:19:11anthonysetnosy: + gvanrossum
messages: + msg356578
2019-11-14 05:12:08rhettingersetnosy: + rhettinger
messages: + msg356577
2019-11-14 04:24:51vekysetmessages: + msg356572
2019-11-13 22:19:07eric.smithsetmessages: + msg356562
2019-11-13 21:21:13anthonysetmessages: + msg356555
2019-11-13 21:14:32eric.smithsetmessages: + msg356554
2019-11-13 21:13:31anthonysetmessages: + msg356553
2019-11-13 21:04:40anthonysetmessages: + msg356552
2019-11-13 18:11:11vekysetnosy: + veky
messages: + msg356545
2019-11-10 00:40:21eric.smithsetnosy: + eric.smith
2019-11-09 20:29:38anthonycreate