classification
Title: dataclasses.asdict() mishandles dataclass instance attributes that are instances of subclassed typing.NamedTuple
Type: behavior Stage: patch review
Components: Library (Lib) Versions: Python 3.8, Python 3.7
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: eric.smith Nosy List: NeilGirdhar, alexdelorenzo, eric.smith, levkivskyi, rhettinger
Priority: normal Keywords: patch

Created on 2018-08-08 19:41 by alexdelorenzo, last changed 2018-08-10 20:53 by eric.smith.

Pull Requests
URL Status Linked Edit
PR 8728 open eric.smith, 2018-08-10 18:22
Messages (13)
msg323298 - (view) Author: Alex DeLorenzo (alexdelorenzo) Date: 2018-08-08 19:41
Example:

from typing import NamedTuple
from dataclasses import dataclass, asdict

class NamedTupleAttribute(NamedTuple):
    example: bool = True

@dataclass
class Data:
    attr1: bool
    attr2: NamedTupleAttribute

data = Data(True, NamedTupleAttribute(example=True))
namedtuple_attr = asdict(data)['attr2']
print(type(namedtuple_attr.example))
>>> generator

One would expect that the printed type would be of type bool.
msg323300 - (view) Author: Ivan Levkivskyi (levkivskyi) * (Python committer) Date: 2018-08-08 20:11
OK, so the crux of the bug is this difference:

>>> a = (1, 2)
>>> tuple(x for x in a)
(1, 2)
>>> NamedTupleAttribute(x for x in a)
NamedTupleAttribute(example=<generator object <genexpr> at 0x10e2e52a0>)

A potential solution would be to either use `type(obj) in (list, tuple)` instead of `isinstance(obj, (list, tuple))` (and thus cause using copy.deepcopy for everything else), but this might break some use cases (IMO quite unlikely).

Any other thoughts?
msg323305 - (view) Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2018-08-08 21:53
Oops, didn't see that you commented on this, Ivan.

I'll do some analysis tomorrow.
msg323312 - (view) Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2018-08-09 02:54
Thanks for the pointer, Ivan.

I haven't really thought it through yet, but this fixes the problem at hand:

diff --git a/dataclasses.py b/dataclasses.py
index ba34f6b..54916ee 100644
--- a/dataclasses.py
+++ b/dataclasses.py
@@ -1019,7 +1019,7 @@ def _asdict_inner(obj, dict_factory):
             result.append((f.name, value))
         return dict_factory(result)
     elif isinstance(obj, (list, tuple)):
-        return type(obj)(_asdict_inner(v, dict_factory) for v in obj)
+        return type(obj)(*[_asdict_inner(v, dict_factory) for v in obj])
     elif isinstance(obj, dict):
         return type(obj)((_asdict_inner(k, dict_factory), _asdict_inner(v, dict_factory))
                           for k, v in obj.items())

There are plenty more tests needed for this, plus I need to think it through some more. astuple() has the same issue. I'll also have to think about the dict subclass case, too.
msg323320 - (view) Author: Ivan Levkivskyi (levkivskyi) * (Python committer) Date: 2018-08-09 12:26
The problem with this solution is that it may brea (relatively common) use case of tuple valued fields, e.g.:

>>> tuple(*[x for x in a])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: tuple expected at most 1 arguments, got 2

Essentially, named tuples are all subclasses of `tuple` but they override __new__ in an incompatible way.
msg323352 - (view) Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2018-08-10 00:49
Maybe do both, then. Also, it's probably better for performance reasons (I know: premature optimization and all that ...):

@@ -1018,8 +1018,10 @@ def _asdict_inner(obj, dict_factory):
             value = _asdict_inner(getattr(obj, f.name), dict_factory)
             result.append((f.name, value))
         return dict_factory(result)
-    elif isinstance(obj, (list, tuple)):
+    elif type(obj) in (list, tuple):
         return type(obj)(_asdict_inner(v, dict_factory) for v in obj)
+    elif isinstance(obj, (list, tuple)):
+        return type(obj)(*[_asdict_inner(v, dict_factory) for v in obj])
     elif isinstance(obj, dict):
         return type(obj)((_asdict_inner(k, dict_factory), _asdict_inner(v, dict_factory))
                           for k, v in obj.items())

I guess we could argue it's a bug in nametuples, but I don't see that getting us very far.
msg323353 - (view) Author: Neil Girdhar (NeilGirdhar) * Date: 2018-08-10 04:26
Sorry if I'm intruding here, but I really dislike that we are doing isinstance versus list, tuple, and dict.  And I dislike even more type(x) in (list, tuple).  I think the ideal thing to do would have been to check versus abstract base classes like isinstance(x, Sequence) and isinstance(x, Mapping).  The advantage to this is flexibility with user-defined types.

The problem seems to be that the abstract base classes don't promise anything about the constructor.  It seems like some Sequence types accept an Iterable (e.g. tuple), but others like NamedTuple want positional arguments.  I think this promise should be encoded in some way.

Maybe a third solution is to have NamedTuple special-cased out:

isinstance(x, Sequence) and not isinstance(x, NamedTuple)

If there are other Sequence types that do this (are there?) then an abstract base class could be created.

This solution has the benefit of playing the most nicely with user-defined classes.
msg323364 - (view) Author: Ivan Levkivskyi (levkivskyi) * (Python committer) Date: 2018-08-10 10:57
Eric, I like your solution. It is probably not perfect, but at least it solves the existing problem without introducing obvious problems.

Neil, your way will not work since named tuples don't have NamedTuple in their MROs:

CustomNT.mro == (CustomNT, tuple, object)
msg323377 - (view) Author: Neil Girdhar (NeilGirdhar) * Date: 2018-08-10 18:07
Why can't we add an ABC into a NamedTuple instance's MRO?  Because while I like Eric's solution, it seems to be backwards:  tuple and list are not the special cases—NamedTuple is.

All sequences accept an iterable in their constructor, and NamedTuple doesn't.  So it should be NamedTuple that is marked as being "weird".  Right now, NamedTuple instances claim to be tuples, but don't accept iterables to initialize themselves.  That seems wrong.

This problem that we have with dataclass can easily pop up somewhere else, and it will require the same restriction to list and tuple types to fix it (breaking user-defined types).

Is it imposible to add to the MRO of NamedTuple instances?
msg323380 - (view) Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2018-08-10 18:30
Hmm, for some reason I'm not getting mail from this issue, so I made my PR before I saw the comments since my last comment.

Raymond has resisted adding a base class to namedtuple. I believe the preferred way to identify a namedtuple is to look for a _fields member. We could do that instead of use the (*[]) trick for all classes derived from tuple or list.

Maybe it's worthwhile bringing up the differences in how tuple and namedtuple handle creation with iterables on python-dev.

I'm still not certain of the right approach, but PR 8728 adds some tests and fixes the problem identified in this issue. I probably won't commit it until I can discuss with some other people.
msg323383 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2018-08-10 18:40
> Raymond has resisted adding a base class to namedtuple. I believe the
> preferred way to identify a namedtuple is to look for a _fields member.

FWIW, that was also Guido's opinion as well.  A named tuple is generic concept defined in the glossary as "Any tuple-like class whose indexable elements are also accessible using a named attribute".  This includes user-defined classes, classes created by collections.namedtuple, and instances of structseq.  The code generated by oollections.namedtuple was based on patterns already in widespread use at the time.
msg323384 - (view) Author: Neil Girdhar (NeilGirdhar) * Date: 2018-08-10 18:52
> The code generated by oollections.namedtuple was based on patterns already in widespread use at the time.

That's fair enough.  However, it seems like there is one important difference: unlike any other Sequence, namedtuples cannot be initialized with an iterable.

For that reason, I like Eric's option of checking for the _fields member rather than special-casing list and tuple since it seems like namedtuple is the special case.
msg323391 - (view) Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2018-08-10 20:53
For the record, I don't disagree with namedtuples not having a base class.

Maybe it's best to let copy.deepcopy() deal with namedtuples, instead of trying to detect them here. So just special case exact tuples and lists, and pass everything else to copy.deepcopy().

>>> C = namedtuple('C', 'a b c')
>>> c = C(1, 2, 3)
>>> b=copy.deepcopy(c)
>>> b
C(a=1, b=2, c=3)
>>> hasattr(b, '_fields')
True
>>> hasattr(c, '_fields')
True
>>> b is c
False
>>>

Although by doing that, you lose dict_factory or tuple_factory on nested data structures, and namedtuples that contain dataclasses would be handled differently, I think. I'll do some investigating.
History
Date User Action Args
2018-08-10 20:53:14eric.smithsetmessages: + msg323391
2018-08-10 18:52:51NeilGirdharsetmessages: + msg323384
2018-08-10 18:40:52rhettingersetnosy: + rhettinger
messages: + msg323383
2018-08-10 18:30:29eric.smithsetmessages: + msg323380
2018-08-10 18:22:49eric.smithsetkeywords: + patch
stage: patch review
pull_requests: + pull_request8212
2018-08-10 18:07:45NeilGirdharsetmessages: + msg323377
2018-08-10 10:57:10levkivskyisetmessages: + msg323364
2018-08-10 04:27:00NeilGirdharsetnosy: + NeilGirdhar
messages: + msg323353
2018-08-10 00:49:30eric.smithsetmessages: + msg323352
2018-08-09 12:26:28levkivskyisetmessages: + msg323320
2018-08-09 02:54:31eric.smithsetmessages: + msg323312
2018-08-08 21:53:06eric.smithsetmessages: + msg323305
2018-08-08 21:52:38eric.smithsetassignee: eric.smith
components: + Library (Lib), - Interpreter Core
versions: + Python 3.8
2018-08-08 20:11:56levkivskyisetmessages: + msg323300
2018-08-08 20:01:59levkivskyisetnosy: + levkivskyi
2018-08-08 19:43:07eric.smithsetnosy: + eric.smith
2018-08-08 19:41:25alexdelorenzocreate