Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

dataclass(slots=True) does not account for slots in base classes #90540

Closed
ariebovenberg mannequin opened this issue Jan 14, 2022 · 19 comments
Closed

dataclass(slots=True) does not account for slots in base classes #90540

ariebovenberg mannequin opened this issue Jan 14, 2022 · 19 comments
Assignees
Labels
3.10 only security fixes 3.11 only security fixes stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error

Comments

@ariebovenberg
Copy link
Mannequin

ariebovenberg mannequin commented Jan 14, 2022

BPO 46382
Nosy @ericvsmith, @hynek, @serhiy-storchaka, @TeamSpen210, @sobolevn, @AlexWaygood, @ariebovenberg
PRs
  • bpo-46382 dataclass(slots=True) now takes inherited slots into account #31980
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/ericvsmith'
    closed_at = <Date 2022-03-19.21:01:51.297>
    created_at = <Date 2022-01-14.19:31:43.335>
    labels = ['type-bug', 'library', '3.10', '3.11']
    title = 'dataclass(slots=True) does not account for slots in base classes'
    updated_at = <Date 2022-03-19.21:01:51.297>
    user = 'https://github.com/ariebovenberg'

    bugs.python.org fields:

    activity = <Date 2022-03-19.21:01:51.297>
    actor = 'eric.smith'
    assignee = 'eric.smith'
    closed = True
    closed_date = <Date 2022-03-19.21:01:51.297>
    closer = 'eric.smith'
    components = ['Library (Lib)']
    creation = <Date 2022-01-14.19:31:43.335>
    creator = 'ariebovenberg'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 46382
    keywords = ['patch']
    message_count = 19.0
    messages = ['410591', '410601', '410628', '410630', '410641', '410642', '410820', '410841', '411010', '415393', '415399', '415400', '415401', '415419', '415453', '415471', '415474', '415570', '415571']
    nosy_count = 7.0
    nosy_names = ['eric.smith', 'hynek', 'serhiy.storchaka', 'Spencer Brown', 'sobolevn', 'AlexWaygood', 'ariebovenberg']
    pr_nums = ['31980']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue46382'
    versions = ['Python 3.10', 'Python 3.11']

    @ariebovenberg
    Copy link
    Mannequin Author

    ariebovenberg mannequin commented Jan 14, 2022

    @DataClass(slots=True) adds slots to dataclasses. It adds a slot per field.
    However, it doesn't account for slots already present in base classes:

    >>> class Base:
    ...     __slots__ = ('a', )
    ...
    >>> @dataclass(slots=True)
    ... class Foo(Base):
    ...     a: int
    ...     b: float
    ...
    >>> Foo.__slots__
    ('a', 'b')  # should be: ('b', )

    The __slots__ documentation says:

    If a class defines a slot also defined in a base class, the instance variable 
    defined by the base class slot is inaccessible (except by retrieving its descriptor 
    directly from the base class). This renders the meaning of the program undefined. 
    In the future, a check may be added to prevent this.
    

    Solution: don't add slots which are already defined in any base classes:

    >>> @dataclass
    ... class Bla(Base):
    ...     __slots__ = ('b', )
    ...     a: int
    ...     b: float
    ...
    >>> Bla(4, 5.65)
    Bla(a=4, b=5.65)

    If you agree, I'd like to submit a PR to fix this. I already have a prototype working.

    @ariebovenberg ariebovenberg mannequin added 3.10 only security fixes 3.11 only security fixes stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error labels Jan 14, 2022
    @ericvsmith
    Copy link
    Member

    I'll have to do some more research. But your analysis looks correct to me, so far.

    @ariebovenberg
    Copy link
    Mannequin Author

    ariebovenberg mannequin commented Jan 15, 2022

    There are already 2 complexities I can think of:

    1. This behavior may break some people's code, if they use slots to iterate over
      the fields of a dataclass. Solution: explicitly mention in the docs that
      not every field may get a slot on the new class. Advise them to use
      fields() to iterate over the fields.
    2. It's technically allowed for slots to be an iterator (which will then be
      exhausted at class creation). Finding the slots of such a class
      may require more elaborate introspection.

    @sobolevn
    Copy link
    Member

    Arie, can you please explain what is the technical difference between these two cases:

    class A:
        __slots__ = ('a', )
        # fields
    
    class B(A):
        __slots__ = ('a', 'b')
        # fields

    And:

    class C:
        __slots__ = ('a', )
        # fields
    
    class D(C):
        __slots__ = ('b', )
        # fields

    ?

    @TeamSpen210
    Copy link
    Mannequin

    TeamSpen210 mannequin commented Jan 15, 2022

    Both will function, but class B will add its slots after A's, causing there to be an extra unused slot in the object that you can only access by directly using the A.a descriptor. So all slotted inheriting dataclasses cause the object to use more memory than necessary.

    @ariebovenberg
    Copy link
    Mannequin Author

    ariebovenberg mannequin commented Jan 15, 2022

    Spencer is correct.

    The documentation even adds: "This renders the meaning of the program undefined."

    It's clear it doesn't break anything users would often encounter (we would have heard about it), but it's still undefined behavior.

    @ericvsmith
    Copy link
    Member

    It would also be interesting to see what attrs does in this case.

    @hynek
    Copy link
    Member

    hynek commented Jan 18, 2022

    >>> @attrs.define
    ... class C(Base):
    ...   a: int
    ...   b: int
    ...
    >>> C.__slots__
    ('b', '__weakref__')

    We've got a test specifically for this use case: https://github.com/python-attrs/attrs/blob/5f36ba9b89d4d196f80147d4f2961fb2f97ae2e5/tests/test_slots.py#L309-L334

    @ariebovenberg
    Copy link
    Mannequin Author

    ariebovenberg mannequin commented Jan 20, 2022

    @hynek interesting!

    The discussion in python-attrs/attrs#420 on the weakref slot is very interesting as well.

    Considering __weakref__ is something we don't want to make impossible in dataclasses, @eric.smith what would be your preferred solution?

    @ariebovenberg
    Copy link
    Mannequin Author

    ariebovenberg mannequin commented Mar 17, 2022

    @eric.smith did you give this some thought? Would we want to imitate the attrs behavior regarding __weafref__?

    It'd be nice if I can submit a PR to be included in 3.11

    @serhiy-storchaka
    Copy link
    Member

    __slotnames__ is used for storing all slot names.

    New object.__getstate__() proposed in bpo-26579 may have some relation to this discussion.

    @ericvsmith
    Copy link
    Member

    Serhiy: Could you point to some documentation on __slotnames__? I see a few references in the code to it, but it's not set on simple test class.

    >>> class A:
    ...     __slots__=('a',)
    ...
    >>> A.__slotnames__
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    AttributeError: type object 'A' has no attribute '__slotnames__'. Did you mean: '__slots__'?
    >>>

    @serhiy-storchaka
    Copy link
    Member

    It is not set automatically. It is set in object.__reduce__ by calling
    some helper function from the copyreg module.

    @serhiy-storchaka
    Copy link
    Member

    Sorry, the previous message was written via a phone because of blackout, so the formatting is a tine bit messy.

    __slotnames__ is set in copyreg._slotnames(). If you want to keep a list of all slots you should save it in __slotnames__, either using copyreg._slotnames() directly or by duplicating its code.

    But as for the original issue, I do not think that the current behavior is incorrect. At least it is consistent with the behavior of regular classes.

    @ericvsmith
    Copy link
    Member

    I don't have a problem saying that for a class to be used as a base class for a dataclass, its __slots__ must not be an iterator that's been exhausted. That doesn't seem like a very onerous requirement.

    I'm also not concerned about people using __slots__ to iterate over the fields, but I agree that a documentation note couldn't hurt.

    @ariebovenberg: go ahead and submit your PR. Thanks!

    @ariebovenberg
    Copy link
    Mannequin Author

    ariebovenberg mannequin commented Mar 18, 2022

    @eric.smith awesome!

    What would you like to do about the __weakref__ slot?

    1. take no action
    2. create a separate ticket to discuss further
    3. implement as done in attrs, now that we're changing __slots__ dataclass behavior anyway

    I wouldn't recommend (1), because sooner or later this feature request will pop up. It seems reasonable to support it.

    @ericvsmith
    Copy link
    Member

    I thought there was an existing issue that covered this, but now I can't find it.

    I'd prefer #2, create a separate issue.

    @ericvsmith
    Copy link
    Member

    New changeset 82e9b0b by Arie Bovenberg in branch 'main':
    bpo-46382 dataclass(slots=True) now takes inherited slots into account (GH-31980)
    82e9b0b

    @ericvsmith
    Copy link
    Member

    Thanks for all of your work, @ariebovenberg!

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.10 only security fixes 3.11 only security fixes stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error
    Projects
    None yet
    Development

    No branches or pull requests

    4 participants