Issue 47174: Define behavior of descriptor-typed fields on dataclasses

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/91330

classification

Title:	Define behavior of descriptor-typed fields on dataclasses
Type:	enhancement	Stage:
Components:	Library (Lib)	Versions:	Python 3.11

process

Status:	open	Resolution:
Dependencies:		Superseder:
Assigned To:		Nosy List:	JelleZijlstra, debonte, eric.smith, zzzeek
Priority:	normal	Keywords:

Created on 2022-03-30 19:00 by debonte, last changed 2022-04-11 14:59 by admin.

Messages (1)
msg416392 - (view)	Author: Erik De Bonte (debonte) *	Date: 2022-03-30 19:00

Recent discussions about PEP 681 (dataclass_transform) have focused on support for descriptor-typed fields. See the email thread here: https://mail.python.org/archives/list/typing-sig@python.org/thread/BW6CB6URC4BCN54QSG2STINU2M7V4TQQ/

Initially we were thinking that dataclass_transform needed a new parameter to switch between two modes. In one mode, it would use the default behavior of dataclass. In the other mode, it would be smarter about how descriptor-typed fields are handled. For example, __init__ would pass the value for a descriptor-typed field to the descriptor's __set__ method. However, Carl Meyer found that dataclass already has the desired behavior at runtime! We missed this because mypy and Pyright do not correctly mirror this runtime behavior.

Although this is the current behavior of dataclass, I haven't found it documented anywhere, and it is not covered by unit tests. Since dataclass_transform wants to rely on this behavior and the behavior seems desirable for dataclass as well, I'm proposing that we document it and add dataclass unit tests to ensure that it does not change in the future.

Specifically, we would like to document (and add unit tests for) the following behavior given a field whose default value is a descriptor:

1. The value passed to __init__ for that field is passed to the descriptor’s __set__ method, rather than overwriting the descriptor object.

2. Getting/setting the value of that field goes through __get__/__set__, rather than getting/overwriting the descriptor object.

Here's an example:

from dataclasses import dataclass
from typing import Any, Generic, TypeVar

T = TypeVar("T")

class Descriptor(Generic[T]):
    def __get__(self, __obj: object | None, __owner: Any) -> T:
        return getattr(__obj, "_x")

    def __set__(self, __obj: object | None, __value: T) -> None:
        setattr(__obj, "_x", __value)

@dataclass
class InventoryItem:
    quantity_on_hand: Descriptor[int] = Descriptor[int]()

i = InventoryItem(13)     # calls __set__ with 13
print(i.quantity_on_hand) # 13 -- obtained via call to __get__
i.quantity_on_hand = 29   # calls __set__ with 29
print(i.quantity_on_hand) # 29 -- obtained via call to __get__
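
One way to check the two behaviors above is to have the descriptor record every call it receives. The following is only a sketch, not the proposed unit tests: RecordingDescriptor, its calls list, and Item are hypothetical names introduced for illustration, and the descriptor raises AttributeError on class-level access so that the field has no default.

```python
from dataclasses import dataclass
from typing import Any


class RecordingDescriptor:
    """Hypothetical descriptor that logs every __get__/__set__ call."""

    def __init__(self) -> None:
        self.calls: list = []

    def __get__(self, obj: Any, owner: Any = None) -> Any:
        if obj is None:
            # Class-level access: report "no value" so the field has no default.
            raise AttributeError("no class-level value")
        value = obj.__dict__["_x"]
        self.calls.append(("get", value))
        return value

    def __set__(self, obj: Any, value: Any) -> None:
        self.calls.append(("set", value))
        obj.__dict__["_x"] = value


@dataclass
class Item:
    quantity_on_hand: RecordingDescriptor = RecordingDescriptor()


# The descriptor object is still stored on the class after @dataclass runs.
descriptor = Item.__dict__["quantity_on_hand"]

item = Item(13)                  # __init__ assigns the field, which calls __set__
print(item.quantity_on_hand)     # 13
item.quantity_on_hand = 29
print(descriptor.calls)          # [('set', 13), ('get', 13), ('set', 29)]
print(vars(item))                # {'_x': 29}
```

The instance dictionary only ever holds the descriptor's private "_x" slot, which suggests the generated __init__ never overwrites the descriptor object itself.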

I took a first stab at unit tests here: https://github.com/debonte/cpython/commit/c583e7c91c78c4aef65a1ac69241fc06ad95d436

We are aware of two other descriptor-related behaviors that may also be worth documenting:

First, if a field is annotated with a descriptor type but is *not* assigned a descriptor object as its default value, it acts like a non-descriptor field. Here's an example:

@dataclass
class InventoryItem:
    quantity_on_hand: Descriptor[int] # No default value

i = InventoryItem(13)      # Sets quantity_on_hand to 13 -- No call to Descriptor.__set__
print(i.quantity_on_hand)  # 13 -- No call to Descriptor.__get__
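
The annotation-only case can be checked by making the descriptor methods fail loudly if they are ever called. This is a sketch under that assumption; StrictDescriptor is a hypothetical stand-in for the Descriptor class above.

```python
from dataclasses import dataclass
from typing import Any


class StrictDescriptor:
    """Hypothetical descriptor whose methods must never run in this example."""

    def __get__(self, obj: Any, owner: Any = None) -> Any:
        raise AssertionError("__get__ should not be called")

    def __set__(self, obj: Any, value: Any) -> None:
        raise AssertionError("__set__ should not be called")


@dataclass
class InventoryItem:
    quantity_on_hand: StrictDescriptor  # annotation only, no descriptor object assigned


i = InventoryItem(13)
print(vars(i))                                      # {'quantity_on_hand': 13}
print("quantity_on_hand" in vars(InventoryItem))    # False
```

Because no descriptor object is ever placed on the class, the value lands directly in the instance dictionary like any ordinary field.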

And second, when a field with a descriptor object as its default value is initialized (when the code for the dataclass is initially executed), __get__ is called with a None instance and the return value is used as the field's default value. See the example below. Note that if __get__ doesn't handle this None instance case (for example, in the initial definition of Descriptor above), a call to InventoryItem() fails with "TypeError: InventoryItem.__init__() missing 1 required positional argument: 'quantity_on_hand'".

I'm less sure about documenting this second behavior, since I'm not sure what causes it to work, and therefore I'm not sure how intentional it is.

class Descriptor(Generic[T]):
    def __init__(self, *, default: T):
        self._default = default

    def __get__(self, __obj: object | None, __owner: Any) -> T:
        if __obj is None:
            return self._default

        return getattr(__obj, "_x")

    def __set__(self, __obj: object | None, __value: T) -> None:
        if __obj is not None:
            setattr(__obj, "_x", __value)

# When this code is executed, __get__ is called with __obj=None and the
# returned value is used as the default value of quantity_on_hand.
@dataclass
class InventoryItem:
    quantity_on_hand: Descriptor[int] = Descriptor[int](default=100)

i = InventoryItem()       # calls __set__ with 100
print(i.quantity_on_hand) # 100 -- obtained via call to __get__
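
A possible explanation for this second behavior, offered as an assumption rather than a confirmed design decision: dataclass appears to read each field's default through ordinary attribute access on the class, which invokes the descriptor's __get__ with a None instance. A returned value becomes the default, while an AttributeError is treated as "no default". The sketch below inspects the resulting Field objects with dataclasses.fields; WithClassDefault, WithoutClassDefault, Stocked, and Unstocked are hypothetical names.

```python
from dataclasses import MISSING, dataclass, fields
from typing import Any


class WithClassDefault:
    """Hypothetical descriptor that returns 100 on class-level access."""

    def __get__(self, obj: Any, owner: Any = None) -> int:
        if obj is None:
            return 100
        return obj.__dict__["_x"]

    def __set__(self, obj: Any, value: int) -> None:
        obj.__dict__["_x"] = value


class WithoutClassDefault(WithClassDefault):
    """Hypothetical descriptor that raises AttributeError on class-level access."""

    def __get__(self, obj: Any, owner: Any = None) -> int:
        if obj is None:
            raise AttributeError("no class-level value")
        return obj.__dict__["_x"]


@dataclass
class Stocked:
    quantity_on_hand: WithClassDefault = WithClassDefault()


@dataclass
class Unstocked:
    quantity_on_hand: WithoutClassDefault = WithoutClassDefault()


print(fields(Stocked)[0].default)               # 100
print(fields(Unstocked)[0].default is MISSING)  # True
print(Stocked().quantity_on_hand)               # 100

try:
    Unstocked()
except TypeError as exc:
    # The field has no default, so __init__ requires the argument.
    print(type(exc).__name__)                   # TypeError
```

If this reading is right, the TypeError described above for the first Descriptor definition follows from getattr(None, "_x") raising AttributeError during class-level access.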

History
Date	User	Action	Args
2022-04-11 14:59:57	admin	set	github: 91330
2022-04-01 20:03:22	zzzeek	set	nosy: + zzzeek
2022-03-30 19:00:12	debonte	create