Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

dataclasses.asdict() mishandles dataclass instance attributes that are instances of subclassed typing.NamedTuple #78544

Closed
alexdelorenzo mannequin opened this issue Aug 8, 2018 · 18 comments
Assignees
Labels
3.7 (EOL) end of life 3.8 only security fixes stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error

Comments

@alexdelorenzo
Copy link
Mannequin

alexdelorenzo mannequin commented Aug 8, 2018

BPO 34363
Nosy @rhettinger, @ericvsmith, @NeilGirdhar, @ilevkivskyi, @alexdelorenzo
PRs
  • bpo-34363: Handle namedtuples fields correctly in asdict() and astuple(). #8728
  • bpo-34363: dataclasses.asdict() and .astuple() now handle fields which are namedtuples. #9151
  • [3.7] bpo-34363: dataclasses.asdict() and .astuple() now handle fields which are namedtuples. (GH-9151) #9304
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/ericvsmith'
    closed_at = <Date 2018-09-14.18:08:11.258>
    created_at = <Date 2018-08-08.19:41:25.876>
    labels = ['3.7', '3.8', 'type-bug', 'library']
    title = 'dataclasses.asdict() mishandles dataclass instance attributes that are instances of subclassed typing.NamedTuple'
    updated_at = <Date 2018-09-16.16:27:38.430>
    user = 'https://github.com/alexdelorenzo'

    bugs.python.org fields:

    activity = <Date 2018-09-16.16:27:38.430>
    actor = 'levkivskyi'
    assignee = 'eric.smith'
    closed = True
    closed_date = <Date 2018-09-14.18:08:11.258>
    closer = 'eric.smith'
    components = ['Library (Lib)']
    creation = <Date 2018-08-08.19:41:25.876>
    creator = 'alexdelorenzo'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 34363
    keywords = ['patch']
    message_count = 18.0
    messages = ['323298', '323300', '323305', '323312', '323320', '323352', '323353', '323364', '323377', '323380', '323383', '323384', '323391', '324971', '325046', '325353', '325372', '325494']
    nosy_count = 5.0
    nosy_names = ['rhettinger', 'eric.smith', 'NeilGirdhar', 'levkivskyi', 'alexdelorenzo']
    pr_nums = ['8728', '9151', '9304']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue34363'
    versions = ['Python 3.7', 'Python 3.8']

    @alexdelorenzo
    Copy link
    Mannequin Author

    alexdelorenzo mannequin commented Aug 8, 2018

    Example:

    from typing import NamedTuple
    from dataclasses import dataclass, asdict
    
    class NamedTupleAttribute(NamedTuple):
        example: bool = True
    
    @dataclass
    class Data:
        attr1: bool
        attr2: NamedTupleAttribute
    
    data = Data(True, NamedTupleAttribute(example=True))
    namedtuple_attr = asdict(data)['attr2']
    print(type(namedtuple_attr.example))
    >>> generator

    One would expect that the printed type would be of type bool.

    @alexdelorenzo alexdelorenzo mannequin added interpreter-core (Objects, Python, Grammar, and Parser dirs) 3.7 (EOL) end of life type-bug An unexpected behavior, bug, or error labels Aug 8, 2018
    @ilevkivskyi
    Copy link
    Member

    OK, so the crux of the bug is this difference:

    >>> a = (1, 2)
    >>> tuple(x for x in a)
    (1, 2)
    >>> NamedTupleAttribute(x for x in a)
    NamedTupleAttribute(example=<generator object <genexpr> at 0x10e2e52a0>)

    A potential solution would be to either use type(obj) in (list, tuple) instead of isinstance(obj, (list, tuple)) (and thus cause using copy.deepcopy for everything else), but this might break some use cases (IMO quite unlikely).

    Any other thoughts?

    @ericvsmith ericvsmith added stdlib Python modules in the Lib dir 3.8 only security fixes and removed interpreter-core (Objects, Python, Grammar, and Parser dirs) labels Aug 8, 2018
    @ericvsmith ericvsmith self-assigned this Aug 8, 2018
    @ericvsmith
    Copy link
    Member

    Oops, didn't see that you commented on this, Ivan.

    I'll do some analysis tomorrow.

    @ericvsmith
    Copy link
    Member

    Thanks for the pointer, Ivan.

    I haven't really thought it through yet, but this fixes the problem at hand:

    diff --git a/dataclasses.py b/dataclasses.py
    index ba34f6b..54916ee 100644
    --- a/dataclasses.py
    +++ b/dataclasses.py
    @@ -1019,7 +1019,7 @@ def _asdict_inner(obj, dict_factory):
                 result.append((f.name, value))
             return dict_factory(result)
         elif isinstance(obj, (list, tuple)):
    -        return type(obj)(_asdict_inner(v, dict_factory) for v in obj)
    +        return type(obj)(*[_asdict_inner(v, dict_factory) for v in obj])
         elif isinstance(obj, dict):
             return type(obj)((_asdict_inner(k, dict_factory), _asdict_inner(v, dict_factory))
                               for k, v in obj.items())

    There are plenty more tests needed for this, plus I need to think it through some more. astuple() has the same issue. I'll also have to think about the dict subclass case, too.

    @ilevkivskyi
    Copy link
    Member

    The problem with this solution is that it may brea (relatively common) use case of tuple valued fields, e.g.:

    >>> tuple(*[x for x in a])
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: tuple expected at most 1 arguments, got 2

    Essentially, named tuples are all subclasses of tuple but they override new in an incompatible way.

    @ericvsmith
    Copy link
    Member

    Maybe do both, then. Also, it's probably better for performance reasons (I know: premature optimization and all that ...):

    @@ -1018,8 +1018,10 @@ def _asdict_inner(obj, dict_factory):
                 value = _asdict_inner(getattr(obj, f.name), dict_factory)
                 result.append((f.name, value))
             return dict_factory(result)
    -    elif isinstance(obj, (list, tuple)):
    +    elif type(obj) in (list, tuple):
             return type(obj)(_asdict_inner(v, dict_factory) for v in obj)
    +    elif isinstance(obj, (list, tuple)):
    +        return type(obj)(*[_asdict_inner(v, dict_factory) for v in obj])
         elif isinstance(obj, dict):
             return type(obj)((_asdict_inner(k, dict_factory), _asdict_inner(v, dict_factory))
                               for k, v in obj.items())

    I guess we could argue it's a bug in nametuples, but I don't see that getting us very far.

    @NeilGirdhar
    Copy link
    Mannequin

    NeilGirdhar mannequin commented Aug 10, 2018

    Sorry if I'm intruding here, but I really dislike that we are doing isinstance versus list, tuple, and dict. And I dislike even more type(x) in (list, tuple). I think the ideal thing to do would have been to check versus abstract base classes like isinstance(x, Sequence) and isinstance(x, Mapping). The advantage to this is flexibility with user-defined types.

    The problem seems to be that the abstract base classes don't promise anything about the constructor. It seems like some Sequence types accept an Iterable (e.g. tuple), but others like NamedTuple want positional arguments. I think this promise should be encoded in some way.

    Maybe a third solution is to have NamedTuple special-cased out:

    isinstance(x, Sequence) and not isinstance(x, NamedTuple)

    If there are other Sequence types that do this (are there?) then an abstract base class could be created.

    This solution has the benefit of playing the most nicely with user-defined classes.

    @ilevkivskyi
    Copy link
    Member

    Eric, I like your solution. It is probably not perfect, but at least it solves the existing problem without introducing obvious problems.

    Neil, your way will not work since named tuples don't have NamedTuple in their MROs:

    CustomNT.mro == (CustomNT, tuple, object)

    @NeilGirdhar
    Copy link
    Mannequin

    NeilGirdhar mannequin commented Aug 10, 2018

    Why can't we add an ABC into a NamedTuple instance's MRO? Because while I like Eric's solution, it seems to be backwards: tuple and list are not the special cases—NamedTuple is.

    All sequences accept an iterable in their constructor, and NamedTuple doesn't. So it should be NamedTuple that is marked as being "weird". Right now, NamedTuple instances claim to be tuples, but don't accept iterables to initialize themselves. That seems wrong.

    This problem that we have with dataclass can easily pop up somewhere else, and it will require the same restriction to list and tuple types to fix it (breaking user-defined types).

    Is it imposible to add to the MRO of NamedTuple instances?

    @ericvsmith
    Copy link
    Member

    Hmm, for some reason I'm not getting mail from this issue, so I made my PR before I saw the comments since my last comment.

    Raymond has resisted adding a base class to namedtuple. I believe the preferred way to identify a namedtuple is to look for a _fields member. We could do that instead of use the (*[]) trick for all classes derived from tuple or list.

    Maybe it's worthwhile bringing up the differences in how tuple and namedtuple handle creation with iterables on python-dev.

    I'm still not certain of the right approach, but PR 8728 adds some tests and fixes the problem identified in this issue. I probably won't commit it until I can discuss with some other people.

    @rhettinger
    Copy link
    Contributor

    Raymond has resisted adding a base class to namedtuple. I believe the
    preferred way to identify a namedtuple is to look for a _fields member.

    FWIW, that was also Guido's opinion as well. A named tuple is generic concept defined in the glossary as "Any tuple-like class whose indexable elements are also accessible using a named attribute". This includes user-defined classes, classes created by collections.namedtuple, and instances of structseq. The code generated by oollections.namedtuple was based on patterns already in widespread use at the time.

    @NeilGirdhar
    Copy link
    Mannequin

    NeilGirdhar mannequin commented Aug 10, 2018

    The code generated by oollections.namedtuple was based on patterns already in widespread use at the time.

    That's fair enough. However, it seems like there is one important difference: unlike any other Sequence, namedtuples cannot be initialized with an iterable.

    For that reason, I like Eric's option of checking for the _fields member rather than special-casing list and tuple since it seems like namedtuple is the special case.

    @ericvsmith
    Copy link
    Member

    For the record, I don't disagree with namedtuples not having a base class.

    Maybe it's best to let copy.deepcopy() deal with namedtuples, instead of trying to detect them here. So just special case exact tuples and lists, and pass everything else to copy.deepcopy().

    >>> C = namedtuple('C', 'a b c')
    >>> c = C(1, 2, 3)
    >>> b=copy.deepcopy(c)
    >>> b
    C(a=1, b=2, c=3)
    >>> hasattr(b, '_fields')
    True
    >>> hasattr(c, '_fields')
    True
    >>> b is c
    False
    >>>

    Although by doing that, you lose dict_factory or tuple_factory on nested data structures, and namedtuples that contain dataclasses would be handled differently, I think. I'll do some investigating.

    @ericvsmith
    Copy link
    Member

    I like #53397 better because it changes only the namedtuple case, not other classes that are derived from list or tuple.

    But, I might copy some of the tests from 9151 before I'm done with this.

    @ericvsmith
    Copy link
    Member

    The question here is: what should asdict() return if the dataclass contains a namedtuple? What the code is trying to do (but currently failing!) is to return another namedtuple, but with the values returned by recursively calling in to asdict() (or rather, its helper function) for each field in the namedtuple.

    I think this is the correct behavior. Specifically, I do not want to call the namedtuple's _asdict() method. There are two problems with _asdict():

    1. It doesn't recurse in to the fields, like asdict() normally does with a dict, list, or tuple.

    2. It returns a dict! This is a problem because if a dataclass field contains a dict which has a key which is a namedtuple, then asdict() would fail.

    Here's an example of #2 above, if asdict() on a namedtuple field returns a dict:

    @dataclass
    class C:
        f: 'Any'
    
    T = namedtuple('T', 'a')
    
    c = C({T('an a'): 0})
    print('c:', c)
    print(asdict(c))   # <- error here

    prints:

    c: C(f={T(a='an a'): 0})
    Traceback (most recent call last):
    ...
      File "/home/eric/python/lib/dataclasses.py", line 1019, in asdict
        return _asdict_inner(obj, dict_factory)
      File "/home/eric/python/lib/dataclasses.py", line 1026, in _asdict_inner
        value = _asdict_inner(getattr(obj, f.name), dict_factory)
      File "/home/eric/python/lib/dataclasses.py", line 1059, in _asdict_inner
        for k, v in obj.items())
    TypeError: unhashable type: 'collections.OrderedDict'

    So, although it's unfortunate, I think the only reasonable thing to do in this case is to have asdict() on a namedtuple return another namedtuple. Here's how that looks using the above code:

    c: C(f={T(a='an a'): 0})
    {'f': {T(a='an a'): 0}}

    Admittedly, this can't be used with json.dumps() (you get "TypeError: keys must be str, int, float, bool or None, not T"), but I think it's the best we can do. It's consistent with any other class derived from tuple or list: asdict() will convert it to a copy of the class, recursing into each item that's in the tuple or list.

    @ericvsmith
    Copy link
    Member

    New changeset 9b9d97d by Eric V. Smith in branch 'master':
    bpo-34363: dataclasses.asdict() and .astuple() now handle fields which are namedtuples. (GH-9151)
    9b9d97d

    @ericvsmith
    Copy link
    Member

    New changeset 78aa3d8 by Eric V. Smith (Miss Islington (bot)) in branch '3.7':
    bpo-34363: dataclasses.asdict() and .astuple() now handle fields which are namedtuples. (GH-9151) (GH-9304)
    78aa3d8

    @ilevkivskyi
    Copy link
    Member

    [..] but I think it's the best we can do. It's consistent with any other class derived from tuple or list: [...]

    I agree with this argument. Sorry for delay with my response and thank you Eric for taking care about this issue!

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.7 (EOL) end of life 3.8 only security fixes stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error
    Projects
    None yet
    Development

    No branches or pull requests

    3 participants