classification
Title: dataclasses.asdict breaks with defaultdict fields
Type: Stage: patch review
Components: Library (Lib) Versions: Python 3.7
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: eric.smith Nosy List: eric.smith, levkivskyi, p-ganssle, remi.lapeyre, wrmsr
Priority: normal Keywords: patch, patch, patch

Created on 2018-12-19 22:06 by wrmsr, last changed 2019-09-24 19:48 by p-ganssle.

Pull Requests
URL Status Linked Edit
PR 11361 open remi.lapeyre, 2018-12-30 00:55
PR 11361 open remi.lapeyre, 2018-12-30 00:55
PR 11361 open remi.lapeyre, 2018-12-30 00:55
PR 16356 open p-ganssle, 2019-09-24 14:17
Messages (4)
msg332166 - (view) Author: Will T (wrmsr) Date: 2018-12-19 22:06
_asdict_inner attempts to manually recursively deepcopy dicts by calling type(obj) with a generator of transformed keyvalue tuples @ https://github.com/python/cpython/blob/b2f642ccd2f65d2f3bf77bbaa103dd2bc2733734/Lib/dataclasses.py#L1080 . defaultdicts are dicts so this runs but unlike other dicts their first arg has to be a callable or None:

    import collections
    import dataclasses as dc

    @dc.dataclass()
    class C:
        d: dict

    c = C(collections.defaultdict(lambda: 3, {}))
    d = dc.asdict(c)

    assert isinstance(d['d'], collections.defaultdict)
    assert d['d']['a'] == 3

=>

    Traceback (most recent call last):
      File "boom.py", line 9, in <module>
        d = dc.asdict(c)
      File "/Users/spinlock/.pyenv/versions/3.7.1/lib/python3.7/dataclasses.py", line 1019, in asdict
        return _asdict_inner(obj, dict_factory)
      File "/Users/spinlock/.pyenv/versions/3.7.1/lib/python3.7/dataclasses.py", line 1026, in _asdict_inner
        value = _asdict_inner(getattr(obj, f.name), dict_factory)
      File "/Users/spinlock/.pyenv/versions/3.7.1/lib/python3.7/dataclasses.py", line 1058, in _asdict_inner
        for k, v in obj.items())
    TypeError: first argument must be callable or None

I understand that it isn't this bit of code's job to support every dict (and list etc.) subclass under the sun but given defaultdict is stdlib it's imo worth supporting explicitly.
msg332741 - (view) Author: Rémi Lapeyre (remi.lapeyre) * Date: 2018-12-30 01:25
Hi @wrmsr, this happens because the constructor for `collections.defaultdict` differs from the one of `dict`.

I think the argument that collections.defaultdict is in the stdlib and should be supported is right, the changes in PR #11361 should do what you want.
msg353092 - (view) Author: Paul Ganssle (p-ganssle) * (Python committer) Date: 2019-09-24 14:25
Considering that `namedtuple` is special-cased, I think it's reasonable to special-case `defaultdict` as well, though it may be worth considering more general solutions that will also work for things other than the standard library. One would be to solve this the same way that other "subclasses may have a different constructor" problems are solved (e.g. `float`, `int`, formerly `datetime`) and ignore the subclass (or selectively ignore it if it's a problem), for example changing _asdict_inner to something like this:

if isinstance(obj, dict):
    new_keys = tuple((_asdict_inner(k, dict_factory),
                      _asdict_inner(v, dict_factory))
                      for k, v in obj.items())

    try:
        return type(obj)(new_keys)
    except Exception:
        return dict(new_keys)

Another more general alternative would be to add a type registry for `asdict`, either as an additional parameter or with a new transformer class of some sort. I created a quick proof of concept for this in GH-16356 to see one way it could look.

In any case I think it's quite unfortunate that we can't easily just support anything that has a __deepcopy__ defined. There may be some crazy solution that involves passing a class with a custom __getitem__ to the `memo` argument of copy.deepcopy, but if it's even possible (haven't thought about it enough) I'm not sure it's *advisable*.
msg353116 - (view) Author: Paul Ganssle (p-ganssle) * (Python committer) Date: 2019-09-24 19:48
I checked and it appears that `attrs` handles this by creating *all* dicts using the default dict_factory (similar to my original suggestion of just using `dict` instead of the specific type), if I'm reading this right: https://github.com/python-attrs/attrs/blob/master/src/attr/_funcs.py#L102

Using `attr.asdict` seems to bear this out, as `defaultdict` attributes are converted to `dict` when the dict factory is not specified.

I think changing the default behavior like that would be a backwards-incompatible change at this point (and one that it's really hard to warn about, unfortunately), but we could still use the "fall back to dict_factory" behavior by trying to construct a `type(obj)(...)` and in the case of an exception return `dict_factory(...)`.
History
Date User Action Args
2019-09-24 19:48:27p-gansslesetkeywords: patch, patch, patch

messages: + msg353116
2019-09-24 14:25:11p-gansslesetkeywords: patch, patch, patch
nosy: + p-ganssle
messages: + msg353092

2019-09-24 14:17:37p-gansslesetpull_requests: + pull_request15935
2018-12-30 01:25:41remi.lapeyresetnosy: + remi.lapeyre
messages: + msg332741
2018-12-30 00:55:46remi.lapeyresetkeywords: + patch
stage: patch review
pull_requests: + pull_request10684
2018-12-30 00:55:38remi.lapeyresetkeywords: + patch
stage: (no value)
pull_requests: + pull_request10683
2018-12-30 00:55:30remi.lapeyresetkeywords: + patch
stage: (no value)
pull_requests: + pull_request10682
2018-12-21 17:56:37levkivskyisetnosy: + levkivskyi
2018-12-19 22:32:30eric.smithsetassignee: eric.smith

nosy: + eric.smith
2018-12-19 22:06:44wrmsrcreate