classification
Title: can't construct dataclass as ABC (or runtime check as data protocol)
Type: behavior
Stage:
Components:
Versions: Python 3.8, Python 3.6
process
Status: open
Resolution:
Dependencies:
Superseder:
Assigned To:
Nosy List: cybertreiber, eric.smith, gvanrossum
Priority: normal
Keywords:

Created on 2019-12-26 00:05 by cybertreiber, last changed 2020-01-19 16:30 by gvanrossum.

Files
File name Uploaded Description
dc_repro.py cybertreiber, 2019-12-26 00:05 reproducing cases A to D
dc2_repro.py cybertreiber, 2020-01-02 09:00
Messages (14)
msg358876 - Author: Alexander Hirner (cybertreiber) Date: 2019-12-26 00:05
At runtime, we want to check whether objects adhere to a data protocol. This is not possible due to problematic interactions between ABC and @dataclass.

The attached file tests all relevant yet impossible cases. Those are:

1) A(object): Can't check due to "Protocols with non-method members don't support issubclass()" (as outlined in PEP 554)
2) B(ABC): "Can't instantiate abstract class B with abstract methods x, y"
3) C(Protocol): same as A or same as B if @property is @abstractmethod
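
A minimal sketch of case A, for readers without the attached file. The names `Sized2D`, `Frame` and `Label` are hypothetical and not taken from dc_repro.py; the sketch assumes Python 3.8+, where `typing.Protocol` is available. It shows that `isinstance()` works against a data protocol while `issubclass()` is rejected:

```python
from dataclasses import dataclass
from typing import Protocol, runtime_checkable


@runtime_checkable
class Sized2D(Protocol):  # hypothetical data protocol (non-method members only)
    width: int
    height: int


@dataclass
class Frame:  # structurally matches Sized2D
    width: int
    height: int


@dataclass
class Label:  # lacks width and height
    text: str


# Instance checks only look for the presence of the attributes.
matches = isinstance(Frame(4, 3), Sized2D)
mismatches = isinstance(Label("a"), Sized2D)

# Class checks are refused for protocols with non-method members.
try:
    issubclass(Frame, Sized2D)
    issubclass_error = None
except TypeError as exc:
    issubclass_error = exc

print(matches, mismatches)
print(issubclass_error)
```

The instance check succeeds because the attributes exist on the object; the class check cannot know whether instances will carry them, which is presumably why it raises.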

The problem can be solved in two parts. First, allow implementing @abstractproperty in a dataclass (B). This doesn't involve typing and enables the expected use case of dataclass+ABC. I analysed this problem as follows:
Abstract properties evaluate to a default of property, not to dataclasses.MISSING. Hence, `dataclasses._init_fn` throws TypeError because of deriving from class vars without defaults.
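
A minimal sketch of case B that makes the observation about the default visible. The class name `Impl` is hypothetical, and the exact wording of the TypeError differs between Python versions, so only its type is relied on here:

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, fields


class B(ABC):
    @property
    @abstractmethod
    def x(self) -> int: ...


@dataclass
class Impl(B):  # hypothetical subclass; declares x only as an annotation
    x: int


# The dataclass machinery looks up Impl.x, finds the inherited property,
# and records that property object as the field default.
default = fields(Impl)[0].default
print(isinstance(default, property))

# The bare annotation does not override the abstract property,
# so x is still listed as abstract on the subclass.
print(Impl.__abstractmethods__)

try:
    Impl(1)
    instantiation_error = None
except TypeError as exc:
    instantiation_error = exc
print(instantiation_error)
```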

Second, elide the exception for @runtime_checkable Protocols with non-method members if and only if the protocol is in its MRO. I didn't think that through fully, but instantiation could e.g. fail for missing implementations, as expected from ABC behaviour (see case D in the attached file). I'm not sure about the runtime overhead of this suggestion.
msg358881 - Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2019-12-26 08:06
Is dataclasses doing something here that a regular, hand-written class wouldn't do?
msg358884 - Author: Alexander Hirner (cybertreiber) Date: 2019-12-26 10:35
Here, nothing but less boilerplate.

Wouldn't it pay off to not rewrite dataclass features like frozen, replace, runtime reflection and the ability to use dataclass-aware serialization libraries (e.g. pydantic)? I think @dataclass+@abstractproperty behaviour is yet to be defined.

The second part about subclass check is not specific to dataclasses.
msg358893 - Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2019-12-27 00:58
1. PEP 554 is about multiple interpreters. Which PEP did you mean?

2. The double negative in “Wouldn't it pay off to not rewrite dataclass features” is confusing. What did you mean?
msg358904 - Author: Alexander Hirner (cybertreiber) Date: 2019-12-27 13:31
Pardon my sloppiness.

1. That should have been PEP 544. The last point referred to the notion of data protocols [0].

2. I think solving this issue for dataclasses would ensure better composition with modern libraries and other idioms like Protocol and ABC.


[0] https://www.python.org/dev/peps/pep-0544/#runtime-checkable-decorator-and-narrowing-types-by-isinstance
msg358906 - Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2019-12-27 16:49
Thanks. Can you be specific about “modern libraries”? Please clarify your use cases.
msg358946 - Author: Alexander Hirner (cybertreiber) Date: 2019-12-28 15:18
We construct a computational graph and need to validate (or search for possible new) edges at runtime. The data is specific to computer vision. One example is clipping geometry within 2D boundaries. A minimal implementation of this data would be:

@dataclass
class FramedGeometry:
    width: PositiveInt
    height: PositiveInt
    geometry: Geometry

However, different properties are manifest in different encodings. Height and width can be metadata from an image database or inherent to a decoded image (as np.ndarray.shape). The transform will then `dataclasses.replace(geometry=...)` its attribute(s). If width and height are not implemented, another transition is needed to produce them by looking only into the image header, without fully decoding potentially many large images.

The read-only interface ensures that transitions are generic wrt some forms of inputs. The replace interface preserves runtime types.
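
A rough sketch of such a transition, under stated assumptions: `PositiveInt` is replaced by a plain `int` alias instead of pydantic's type, `Geometry` is taken to be a list of (x, y) points, and `StoredFrame`, its `uri` field and `clip` are hypothetical names. It illustrates how `dataclasses.replace` keeps the runtime type of the input:

```python
from dataclasses import dataclass, replace
from typing import List, Tuple

PositiveInt = int                 # stand-in for pydantic's PositiveInt
Geometry = List[Tuple[int, int]]  # assumption: a list of (x, y) points


@dataclass(frozen=True)
class FramedGeometry:
    width: PositiveInt
    height: PositiveInt
    geometry: Geometry


@dataclass(frozen=True)
class StoredFrame(FramedGeometry):  # hypothetical concrete subclass
    uri: str = ""


def clip(frame: FramedGeometry) -> FramedGeometry:
    """Clamp every point into the [0, width) x [0, height) boundary."""
    clipped = [
        (min(max(x, 0), frame.width - 1), min(max(y, 0), frame.height - 1))
        for x, y in frame.geometry
    ]
    # replace() builds a new object of type(frame), so subclass fields survive.
    return replace(frame, geometry=clipped)


frame = StoredFrame(width=4, height=3, geometry=[(-1, 1), (9, 9)], uri="img.png")
result = clip(frame)
print(type(result).__name__, result.geometry, result.uri)
```

The transition only reads `width`, `height` and `geometry`, so it stays generic over any input that provides them.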

Inputs and outputs are annotated with @dataclass or tuples of them. Those dataclasses are a mixin of base dataclasses that declare concrete properties like a URI of an image and ABCs that declare accessors like get_width(self) -> PositiveInt. We use @pydantic.dataclass to parse, validate and deserialize concrete classes to a great extent [0].

To avoid implementing accessors on top of dataclasses, we'd want abstract properties to be compatible with dataclasses, and issubclass to work with data protocols (given the necessary constraints).

PS:
Polymorphism for computer-vision data is an interesting problem. Other approaches exist, each with their own struggle to model "traits" the right way [1]. E.g., scaling would be valid for `FramedGeometry` since no image data is included but invalid when images are referenced but cannot be resized, like:

class EncodedSizedAnnotatedFrame:
    width: PositiveInt
    height: PositiveInt
    image_bin: bytes
    geometry: Geometry

Thanks!

[0] https://pydantic-docs.helpmanual.io/usage/dataclasses/
[1] https://github.com/pytorch/vision/issues/1406
msg358981 - Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2019-12-29 05:21
Have you tried dropping ABCMeta? Mypy checks @abstractmethod regardless.
msg359163 - Author: Alexander Hirner (cybertreiber) Date: 2020-01-01 17:10
Without ABCMeta, it still fails at instantiation. The error appears to come from the generated dataclass code:

  File "<string>", line 2, in __init__
AttributeError: can't set attribute


Repro:
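```
from abc import abstractmethod
from dataclasses import dataclass

class QuasiABC:
    # Read-only property; @abstractmethod has no runtime effect without ABCMeta.
    @property
    @abstractmethod
    def x(self) -> int: ...

@dataclass(frozen=True)
class E(QuasiABC):
    x: int

E(10)  # raises AttributeError inside the generated __init__
```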

Interestingly, frozen=False is giving the same error.
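
A hand-written sketch of why the frozen flag likely makes no difference. It assumes the generated `__init__` assigns with `self.x = x` in the non-frozen case and via `object.__setattr__` in the frozen case; the classes `Base`, `Plain` and `FrozenStyle` are hypothetical. Both paths hit the inherited read-only property:

```python
class Base:
    @property
    def x(self) -> int: ...  # read-only: no setter defined


class Plain(Base):
    def __init__(self, x: int) -> None:
        self.x = x  # ordinary attribute assignment


class FrozenStyle(Base):
    def __init__(self, x: int) -> None:
        object.__setattr__(self, "x", x)  # assignment that bypasses a custom __setattr__


errors = {}
for cls in (Plain, FrozenStyle):
    try:
        cls(10)
    except AttributeError as exc:
        errors[cls.__name__] = exc

# Both constructors fail: a property without a setter is a data descriptor
# that rejects assignment, regardless of how the assignment is spelled.
for name, exc in errors.items():
    print(name, "->", exc)
```

This also bears on the earlier question of whether a regular class behaves differently: with the same assignments written by hand, the failure is the same.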
msg359165 - Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2020-01-01 18:34
Try adding a setter to the base class method.
msg359186 - Author: Alexander Hirner (cybertreiber) Date: 2020-01-02 09:00
This results in E(x=None).

I'd need to look into that to understand why.
msg359204 - Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2020-01-02 17:22
No doubt because your getter returns None.
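
A sketch of what is probably going on, since dc2_repro.py is not shown here. The setter bodies and the classes `StoringBase` and `F` are assumptions. In the first variant the setter discards the value and the getter body is `...`, so reads return None; in the second, the getter and setter share a private attribute and the value round-trips:

```python
from abc import abstractmethod
from dataclasses import dataclass


class QuasiABC:
    @property
    @abstractmethod
    def x(self) -> int: ...  # body is "...", so the getter returns None

    @x.setter
    def x(self, value: int) -> None:
        pass  # assumption: the setter accepts the value but stores nothing


@dataclass
class E(QuasiABC):
    x: int


e = E(10)
print(e.x)  # the getter runs and returns None


class StoringBase:  # hypothetical variant whose accessors share storage
    @property
    def x(self) -> int:
        return self._x

    @x.setter
    def x(self, value: int) -> None:
        self._x = value


@dataclass
class F(StoringBase):
    x: int


f = F(10)
print(f.x)  # the stored value comes back

# Caveat: the inherited property object still becomes the field default,
# so F() without arguments is accepted and stores the property itself.
quirk = F().x
print(isinstance(quirk, property))
```

The second variant is not frozen on purpose: a setter that assigns `self._x` would be blocked by a frozen dataclass.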
msg360255 - Author: Alexander Hirner (cybertreiber) Date: 2020-01-19 07:54
In that case, what should the getter return? It doesn't know about the implementation of x.
Maybe I'm not getting the idea behind adding getters/setters.
msg360261 - Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2020-01-19 16:30
In the example, it should be `int`, right?

Anyway, the bug tracker is not a good place to get questions answered. Since this is mostly about type checking, I recommend that you try this Gitter instance: https://gitter.im/python/typing
History
Date User Action Args
2020-01-19 16:30:34 gvanrossum set messages: + msg360261
2020-01-19 07:54:31 cybertreiber set messages: + msg360255
2020-01-02 17:22:00 gvanrossum set messages: + msg359204
2020-01-02 09:00:56 cybertreiber set files: + dc2_repro.py; messages: + msg359186
2020-01-01 18:34:50 gvanrossum set messages: + msg359165
2020-01-01 17:10:09 cybertreiber set messages: + msg359163
2019-12-29 05:21:08 gvanrossum set messages: + msg358981
2019-12-28 15:18:15 cybertreiber set messages: + msg358946
2019-12-27 16:49:56 gvanrossum set messages: + msg358906
2019-12-27 13:31:14 cybertreiber set messages: + msg358904
2019-12-27 00:58:53 gvanrossum set nosy: + gvanrossum; messages: + msg358893
2019-12-26 10:35:37 cybertreiber set messages: + msg358884
2019-12-26 08:06:55 eric.smith set messages: + msg358881
2019-12-26 00:05:40 cybertreiber create