This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: dataclasses should define an empty __post_init__
Type: behavior Stage: resolved
Components: Library (Lib) Versions: Python 3.11
process
Status: closed Resolution: rejected
Dependencies: Superseder:
Assigned To: eric.smith Nosy List: NeilGirdhar, eric.smith, rhettinger, sobolevn, veky
Priority: normal Keywords: patch

Created on 2022-02-15 11:13 by NeilGirdhar, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Pull Requests
URL Status Linked Edit
PR 31430 closed sobolevn, 2022-02-19 07:12
PR 31523 merged eric.smith, 2022-02-23 04:33
Messages (15)
msg413283 - (view) Author: Neil Girdhar (NeilGirdhar) * Date: 2022-02-15 11:13
When defining a dataclass, it's possible to define a post-init (__post_init__) method to, for example, verify contracts.

Sometimes, when you inherit from another dataclass, that dataclass has its own post-init method.  If you want that method to also do its checks, you need to explicitly call it with super.  However, if that method doesn't exist calling it with super will crash.

Since you don't know whether your superclasses implement post-init or not, you're forced to check if the superclass has one or not, and call it if it does.

Essentially, post-init implements an "augmenting pattern" like __init__, ___enter__, __exit__, __array_finalize__, etc.  All such methods define an empty method at the top level so that child classes can safely call super.

Please consider adding such an empty method to dataclasses so that children who implement __post_init__ can safely call super.
msg413536 - (view) Author: Nikita Sobolev (sobolevn) * (Python triager) Date: 2022-02-19 07:09
I like this idea.

`attrs` right now behave the same way (no default `__attrs_post_init__`:

```
>>> import attrs
>>> @attrs.define
... class Some:
...   x: int
... 
>>> @attrs.define
... class Other(Some):
...    def __attrs_post_init__(self):
...       super().__attrs_post_init__()
...       self.x += 1
... 
>>> Other(1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<attrs generated init __main__.Other>", line 3, in __init__
  File "<stdin>", line 4, in __attrs_post_init__
AttributeError: 'super' object has no attribute '__attrs_post_init__'
```

I am attaching a simple patch.

Alternative solution is to not merge this patch and recommend this instead:

```
def __post_init__(self):
  try:
    super().__post_init__()
  except AttributeError:
    pass
```
msg413538 - (view) Author: Vedran Čačić (veky) * Date: 2022-02-19 07:51
That "except AttributeError" approach is a powerful bug magnet, since it can very easily mask real attribute errors stemming from misspelled attribute names in the __post_init__ call itself. What you should _really_ do is

    def __post_init__(self):
        with contextlib.suppress(AttributeError):
            post_init = super().__post_init__
        post_init()

But of course, nobody will ever write that.

Raymond in his "super considered super" video (https://youtu.be/xKgELVmrqfs?t=2068) says the right thing to do is to make your own root which knows exactly what classes it manages, and drops the supercalls from them (after possibly verifying that all kwargs have actually been used and so on).

But in case of dataclasses, usually any class can serve as such a root, and the main reason people use dataclasses is to avoid boilerplate code. So I think it would be a nice compromise.
msg413540 - (view) Author: Neil Girdhar (NeilGirdhar) * Date: 2022-02-19 08:07
On Sat, Feb 19, 2022 at 2:51 AM Vedran Čačić <report@bugs.python.org> wrote:

>
> Vedran Čačić <vedgar@gmail.com> added the comment:
>
> That "except AttributeError" approach is a powerful bug magnet, since it
> can very easily mask real attribute errors stemming from misspelled
> attribute names in the __post_init__ call itself. What you should _really_
> do is
>
>     def __post_init__(self):
>         with contextlib.suppress(AttributeError):
>             post_init = super().__post_init__
>         post_init()
>
> But of course, nobody will ever write that.
>
> Great point!

Raymond in his "super considered super" video (
> https://youtu.be/xKgELVmrqfs?t=2068) says the right thing to do is to
> make your own root which knows exactly what classes it manages, and drops
> the supercalls from them (after possibly verifying that all kwargs have
> actually been used and so on).
>
> But in case of dataclasses, usually any class can serve as such a root,
> and the main reason people use dataclasses is to avoid boilerplate code. So
> I think it would be a nice compromise.
>

I'm not sure I understand what you're saying here.  Anyone can inherit from
a class C with the special root and some other class D, and then your
introduced root won't catch D's super calls.

>
> ----------
> nosy: +veky
>
> _______________________________________
> Python tracker <report@bugs.python.org>
> <https://bugs.python.org/issue46757>
> _______________________________________
>
msg413548 - (view) Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2022-02-19 12:46
I'm not crazy about adding a method to every dataclass just for the 0.1% of the times it's needed.

I think using hasattr or catching the exception is a better way to go.
msg413553 - (view) Author: Neil Girdhar (NeilGirdhar) * Date: 2022-02-19 17:13
> I'm not crazy about adding a method to every dataclass just for the 0.1% of the times it's needed.

I'm not sure I totally understand the cost here.  You can have a single shared global function that you set on each dataclass.  So the only cost would be an entry in each dataclass type's dict.  Even if a user creates a thousand dataclasses, that should only be tens of killobytes in pointers.

> I think using hasattr or catching the exception is a better way to go.

If you choose this, then I think this should be documented under the __post_init__ saying that any time you define __post_init__, you should either be a final class, or else call super.  If you call super, you musteither use hasattr(super().__post_init__) or catch the exception.

I have to admit, I find this quite ugly from a user perspective.
msg413555 - (view) Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2022-02-19 17:44
The fact that it's never been needed in the years that dataclasses and attrs have existed tell me it's kind of a niche requirement.

This does not seem like the ugliest code I've ever seen:

        if hasattr(super(), "__post_init__"):
            super().__post_init__()

or the probably more efficient, but more lines of code:

        try:
            post_init = super().__post_init__
        except AttributeError:
            pass
        else:
            post_init()

As always with calling super functions, the whole hierarchy needs to cooperate. Say you had a dataclass:

@dataclass
class Base:
    def __post_init__(self, some_arg):
        pass

How would an arbitrary derived class know how to call this? It can't. There has to be knowledge of the base class's requirements already. Surely knowing "__post_init__ must be called with some_arg" isn't too different from "I know __post_init__ doesn't exist". I don't think adding ways to make the "always call super" pattern easier is a good idea.

I'm still unconvinced, but I'll hold off on making a decision to see if there's more support. Maybe taking it to python-ideas would be worthwhile.
msg413588 - (view) Author: Neil Girdhar (NeilGirdhar) * Date: 2022-02-20 14:48
> How would an arbitrary derived class know how to call this? It can't. There has to be knowledge of the base class's requirements already. Surely knowing "__post_init__ must be called with some_arg" isn't too different from "I know __post_init__ doesn't exist".

This is exactly the same problem you have with all other "augmenting methods" that have arbitrary parameters (e.g., __init__).  When calling super on a non-final class you could simply forward keyword arguments.


@dataclass
class X:
    def __post_init__(self, **kwargs):
        super().__post_init__(**kwargs)
        ...

@dataclass
class Y(X):
    def __post_init__(self, **kwargs):
        super().__post_init__(**kwargs)
        ...

> I'm still unconvinced, but I'll hold off on making a decision to see if there's more support. Maybe taking it to python-ideas would be worthwhile.

Sounds good, done:  https://groups.google.com/g/python-ideas/c/-gctNaSqgew
msg413626 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2022-02-21 02:00
-1

* I concur with Eric that this is mostly not needed.  Probably 99.95% of dataclass use case don't need this.  When it is needed, it is trivial to implement and probably should be explicit rather that implicit.

* Vedran is also correct in noting that it would sometimes mask errors.

* There is a runtime cost for all dataclasses that do not use __post_init__.  Calling an empty method takes time.  We shouldn't make all users pay for a feature that almost no one needs.

* With respect to use of super(), it might help a little bit, but only because __post_init__() takes no arguments.  For the other methods in cooperative multiple inheritance, a user would need to either incorporate a straight forward hasattr() check or add a terminal root class as described in super-considered-super.  I see no reason to make __post_init__ a special case that would differ from other methods (for example, object() doesn't provide an empty __call__ or __getitem__).

* Adding a default __post_init__ will create a new problem.  Currently, it is possible to easily introspect and determine whether a __post_init__ has been provided, but if there is an empty default, it becomes a much more tricky test.  We had this problem with functools.total_ordering.  When that tool was first created, we could easily use hasattr() to test for a user defined rich comparison method.  But after default methods were added to object(), the test because much more delicate:  ``getattr(cls, op, None) is not getattr(object, op, None)``.  Likewise the Hashable ABC cannot just use hasattr() because __hash__() is always present and has to be set to None to turn it off.  A default __post_init__ is worse than both cases.  You can't test for it, so you just have to call it, not knowing in advance whether it would do anything.

* Anyone supporting multiple versions of Python will still need to write the hasattr() check or provide a terminal/root class.  They won't be able to rely on the default being present.

I recommend leaving dataclasses as is.
msg413690 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2022-02-22 03:50
Note that adding an empty __post_init__ method would be a breaking change.  The following prints out 'B' then 'C'.  But if class A adds an empty __post_init__, then 'B' never gets printed.  The arrangement relies on class A being a passthrough to class B.

    from dataclasses import dataclass

    @dataclass
    class A:
        pass

    @dataclass
    class B:
        def __post_init__(self):
            print('B')

    @dataclass
    class C(A, B):
        def __post_init__(self):
            super().__post_init__()
            print('C')

    c = C()
msg413691 - (view) Author: Neil Girdhar (NeilGirdhar) * Date: 2022-02-22 04:32
@Raymond yeah I've been thinking about this some more, and there's no way to have a "top level" method with the dataclass decorator.

I think I will make a case to have a class version of dataclasses that works with inheritance.  Class versions of dataclasses are used in some places like here: https://github.com/google/flax/blob/main/flax/struct.py#L184
They just call dataclass on the class in __init_subclass__.

If we had something like this in the standard library, then you could put your empty __post_init__ in that class.  You could also make __init__ call super so that mixins would be initialized (right now the collider pattern you showed also breaks if B is not a dataclass, and has a non-trivial __init__).
msg413710 - (view) Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2022-02-22 15:45
I'm going to close this issue. As Raymond says, it's a breaking change, and the workaround is easy enough.
msg413770 - (view) Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2022-02-23 04:36
I'm adding a test that mimic's Raymond's example of the proposed addition being a breaking change. This way, if we ever decide to actually add this feature, we'll break this test. If we do decide to continue and make the change anyway, at least we'll do so with the knowledge that it's a breaking change.
msg413771 - (view) Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2022-02-23 05:14
New changeset 288af845a32fd2a92e3b49738faf8f2de6a7bf7c by Eric V. Smith in branch 'main':
bpo-46757: Add a test to verify dataclass's __post_init__ isn't being automatically added. (GH-31523)
https://github.com/python/cpython/commit/288af845a32fd2a92e3b49738faf8f2de6a7bf7c
msg413775 - (view) Author: Neil Girdhar (NeilGirdhar) * Date: 2022-02-23 06:24
@eric

Good thinking.  Would it make sense to add to the documentation as well that the __post_init__ methods aren't collected, and you should call super's __post_init__ if there is one using something like

        if hasattr(super(), "__post_init__"):
            super().__post_init__()

Noting this will make it easier to point to the docs if someone wonders when they see code like this.
History
Date User Action Args
2022-04-11 14:59:56adminsetgithub: 90913
2022-02-23 06:24:18NeilGirdharsetmessages: + msg413775
2022-02-23 05:14:43eric.smithsetmessages: + msg413771
2022-02-23 04:36:11eric.smithsetmessages: + msg413770
2022-02-23 04:33:21eric.smithsetpull_requests: + pull_request29650
2022-02-22 15:45:31eric.smithsetstatus: open -> closed
resolution: rejected
messages: + msg413710

stage: patch review -> resolved
2022-02-22 04:32:24NeilGirdharsetmessages: + msg413691
2022-02-22 03:50:24rhettingersetmessages: + msg413690
2022-02-21 02:00:28rhettingersetnosy: + rhettinger
messages: + msg413626
2022-02-20 15:45:50eric.smithsetassignee: eric.smith
2022-02-20 14:48:08NeilGirdharsetmessages: + msg413588
2022-02-19 17:44:03eric.smithsetmessages: + msg413555
2022-02-19 17:13:14NeilGirdharsetmessages: + msg413553
2022-02-19 12:46:37eric.smithsetmessages: + msg413548
2022-02-19 08:07:03NeilGirdharsetmessages: + msg413540
2022-02-19 07:51:27vekysetnosy: + veky
messages: + msg413538
2022-02-19 07:42:14sobolevnsettype: behavior
versions: + Python 3.11
2022-02-19 07:12:06sobolevnsetkeywords: + patch
stage: patch review
pull_requests: + pull_request29565
2022-02-19 07:09:51sobolevnsetnosy: + sobolevn
messages: + msg413536
2022-02-15 11:39:43xtreaksetnosy: + eric.smith
2022-02-15 11:13:27NeilGirdharcreate