Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

from __future__ import annotations breaks dataclasses ClassVar and InitVar handling #77634

Closed
Ricyteach mannequin opened this issue May 9, 2018 · 21 comments
Closed

from __future__ import annotations breaks dataclasses ClassVar and InitVar handling #77634

Ricyteach mannequin opened this issue May 9, 2018 · 21 comments
Assignees
Labels
3.7 (EOL) end of life 3.8 only security fixes stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error

Comments

@Ricyteach
Copy link
Mannequin

Ricyteach mannequin commented May 9, 2018

BPO 33453
Nosy @gpshead, @ericvsmith, @ned-deily, @methane, @ambv, @ilevkivskyi, @miss-islington, @Ricyteach, @FFY00
PRs
  • bpo-33453: Handle string type annotations in dataclasses. #6768
  • bpo-33453: Handle string type annotations in dataclasses. #6768
  • [3.7] bpo-33453: Handle string type annotations in dataclasses. (GH-6768) #6889
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/ericvsmith'
    closed_at = <Date 2018-05-16.07:15:34.374>
    created_at = <Date 2018-05-09.21:38:42.693>
    labels = ['3.7', '3.8', 'type-bug', 'library']
    title = 'from __future__ import annotations breaks dataclasses ClassVar and InitVar handling'
    updated_at = <Date 2020-05-04.11:56:05.334>
    user = 'https://github.com/Ricyteach'

    bugs.python.org fields:

    activity = <Date 2020-05-04.11:56:05.334>
    actor = 'FFY00'
    assignee = 'eric.smith'
    closed = True
    closed_date = <Date 2018-05-16.07:15:34.374>
    closer = 'eric.smith'
    components = ['Library (Lib)']
    creation = <Date 2018-05-09.21:38:42.693>
    creator = 'Ricyteach'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 33453
    keywords = ['patch', 'patch']
    message_count = 21.0
    messages = ['316333', '316335', '316336', '316338', '316345', '316346', '316350', '316351', '316362', '316391', '316392', '316394', '316396', '316411', '316412', '316417', '316419', '316481', '316523', '316755', '316762']
    nosy_count = 9.0
    nosy_names = ['gregory.p.smith', 'eric.smith', 'ned.deily', 'methane', 'lukasz.langa', 'levkivskyi', 'miss-islington', 'Ricyteach', 'FFY00']
    pr_nums = ['6768', '6768', '6889']
    priority = None
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue33453'
    versions = ['Python 3.7', 'Python 3.8']

    @Ricyteach
    Copy link
    Mannequin Author

    Ricyteach mannequin commented May 9, 2018

    This is broken in 3.7 (both beta 3 and 4):

    from __future__ import annotations
    from dataclasses import dataclass
    from typing import ClassVar, Any
    
    @dataclass
    class C():
        class_var: ClassVar[Any] = object()
        data: str

    Traceback:

    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "C:\Users\ricky\AppData\Local\Programs\Python\Python37\lib\dataclasses.py", line 850, in dataclass
        return wrap(_cls)
      File "C:\Users\ricky\AppData\Local\Programs\Python\Python37\lib\dataclasses.py", line 842, in wrap
        return _process_class(cls, init, repr, eq, order, unsafe_hash, frozen)
      File "C:\Users\ricky\AppData\Local\Programs\Python\Python37\lib\dataclasses.py", line 763, in _process_class
        else 'self',
      File "C:\Users\ricky\AppData\Local\Programs\Python\Python37\lib\dataclasses.py", line 442, in _init_fn
        raise TypeError(f'non-default argument {f.name!r} '
    TypeError: non-default argument 'data' follows default argument

    @Ricyteach Ricyteach mannequin added 3.7 (EOL) end of life stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error labels May 9, 2018
    @Ricyteach
    Copy link
    Mannequin Author

    Ricyteach mannequin commented May 9, 2018

    To be clear: it works just fine with the annotations import.

    @Ricyteach
    Copy link
    Mannequin Author

    Ricyteach mannequin commented May 9, 2018

    Sorry, mean to say it works just fine *without* the import.

    @ericvsmith
    Copy link
    Member

    This is a known issue, but it wasn't being tracked here. So, thanks for opening the issue.

    ericvsmith/dataclasses#92 (comment)

    Not to put Łukasz on the spot (he's sitting behind me even as we speak), but I think we missed the window for 3.7.0 for this. I'll discuss it with him.

    @ericvsmith ericvsmith changed the title from __future__ import annotations breaks dataclasses ClassVar handling from __future__ import annotations breaks dataclasses ClassVar and InitVar handling May 9, 2018
    @ambv
    Copy link
    Contributor

    ambv commented May 10, 2018

    Well, this is a rather big deal and I'd like to see it solved for 3.7.0. Ned, this is after the last beta but as far as I understand, we're still fine committing fixes that maintain ABI until RC1, right?

    Note that this isn't specific to the annotations future import. If a user actually writes:

    field: "ClassVar[SomeTypeReferencedLater]" = get_some_type_object()

    the effect is the same.

    There are two ways to solve it, the right way and the fast way.

    The right way is to call get_type_hints() on the class which forces evaluation of all type annotations (including in base classes, if any). That way you can keep using your existing hack to check if a thing is a ClassVar. However, it's slow because it:

    • forces evaluation of all type annotations at import time, which is even slower than without the annotations future import because now you're tokenizing, creating the AST and the code objects, too; and

    • you force an import of typing for all users, regardless whether they use any annotations or not.

    This is why attrs went with the fast way which covers most (but not all) bases in this case. If the annotation is a string, just check if it starts with "typing.ClassVar", "t.ClassVar", or just "ClassVar". That's a fast check and doesn't ever require importing typing. On the flip side, the 0.001% of users [1]_ who import ClassVar differently, will not have their class variables recognized.

    So, Eric, unless you really want to do the right thing here and make dataclasses forever slower to start up than attrs, I would be happy to provide you with a patch for this during sprints.

    [1] .. Figure made up on the spot.

    @ned-deily
    Copy link
    Member

    Ned, this is after the last beta but as far as I understand, we're still fine committing fixes that maintain ABI until RC1, right?

    Yes

    @ericvsmith
    Copy link
    Member

    See also [2]_ for a brief discussion of forward references, which makes get_type_hints() undesirable in this case.

    This is why attrs went with the fast way which covers most (but not all) bases in this case. If the annotation is a string, just check if it starts with "typing.ClassVar", "t.ClassVar", or just "ClassVar". That's a fast check and doesn't ever require importing typing. On the flip side, the 0.001% of users [1]_ who import ClassVar differently, will not have their class variables recognized.

    I'm okay with the fast check for the string "ClassVar". My only concern is how the user would figure out what's going on, if for example they "import typing as T". The generated __init__ wouldn't have the params they expect, but with default values thrown in I'm not so sure how quickly they'd notice. Hopefully they'd figure out soon enough there's a problem, but I'm not sure they'd know the cause if it.

    Maybe we could do some additional checking if typing has been imported? If we see "T.ClassVar" ("[identifier].ClassVar" appears at the beginning of the string) then if typing has been imported, further check that "T is typing"? Although this isn't going to help with "from typing import ClassVar as CV" (but only 0.00004% of users would do that [3]_) and I don't want to use regex's for this. str.partition() is likely good enough, if we decide to go this route.

    Is there any scenario where the following code would be useful, or expected to work, if "import typing as T" hadn't been executed before @DataClass runs? After all, if we do decide to call get_type_hints() it wouldn't work without it.

    field: "T.ClassVar[SomeTypeReferencedLater]" = get_some_type_object()

    But again, unless [2]_ is addressed, get_type_hints() will fail unless both T and SomeTypeReferencedLater are known when @DataClass runs, if we used get_type_hints().

    So, I guess this is my roundabout way of saying we should do the best we can with string inspection, and not use get_type_hints(). Maybe we can discuss it with Guido at the sprints.

    For all of this, I'm assuming we'll do something similar with InitVar. Although we obviously know it's been imported, it doesn't solve all of the other issues with get_type_hints.

    [2] .. python/typing#508

    [3] .. Also a made up number.

    @ericvsmith
    Copy link
    Member

    I see that python/typing#508 is also referenced in python-attrs/attrs#361, where it contributed to attrs using string inspection.

    @Ricyteach
    Copy link
    Mannequin Author

    Ricyteach mannequin commented May 10, 2018

    Is there any scenario where the following code would be useful...

    Perhaps if someone, in search of a speedup, were sort of rolling their own lighter-weight equivalent to the typing module (in order to avoid importing the full typing module), but duck typed enough to fool the average type checker? Is that possible?

    @ericvsmith
    Copy link
    Member

    The more I think about this, the more I think Łukasz is correct that just checking for strings starting with "ClassVar", "typing.ClassVar", or "t.ClassVar" is the correct thing to do. This is the change he made in python-attrs/attrs#367.

    In some private email I had discussed extracting the module name, if any, from the string annotation, and looking it up in sys.modules. This doesn't buy you much because you have to know how the caller imported typing. If they used "import typing as t", then you can't look up "t" in sys.modules. You could do some horrible frame trick to find out what the caller knew as "t", but that still won't work in plenty of cases. I don't want to use a complex solution that still doesn't work in all cases, and would indeed work in fewer places than just examining the string. The only name we could reliably detect is "typing.ClassVar", all others wouldn't be in sys.modules correctly.

    So, that leaves us with just looking at the string and guessing what the caller meant. I think the three strings that Łukasz suggested are good enough.

    This will need to be well documented. The documentation should note that things like this will break:

    from __future__ import annotations
    from typing import ClassVar as CV
    @dataclass
    class C:
        x: CV

    x will not be a class variable here, since @DataClass will see the annotation as "CV" and not know that it means typing.ClassVar.

    InitVar has a similar problem. What strings to use there? "InitVar" and "dataclasses.InitVar", of course. But maybe "dc.InitVar"? It's hard to say with no in-the-field usage examples to search for. I'll start with just the first two strings.

    I really don't want to use regexes here. But a refinement would be to not just use .startswith, and instead use a word boundary. After all, "ClassVarOfMine" should probably not be considered a ClassVar. I'll leave that issue for another day, though.

    I'll have a patch ready before the sprints are over.

    @ericvsmith ericvsmith self-assigned this May 11, 2018
    @methane
    Copy link
    Member

    methane commented May 11, 2018

    "t.ClassVar" looks ugly...
    How about dropping ClassVar support of dataclass module for 3.7?

    @ericvsmith
    Copy link
    Member

    We can't break the API at this point in the release cycle. But I am open to what string prefixes we should allow.

    @methane
    Copy link
    Member

    methane commented May 11, 2018

    We can't break the API at this point in the release cycle.

    We hadn't release RC, and we hadn't documented dataclass module yet.
    How about making "dataclass" module as porvisional state?

    I think we've learned lesson that we shouldn't use typing in modules other than typing...

    But I am open to what string prefixes we should allow.

    If it is really needed, I think we should only allow "typing.ClassVar".
    All other prefixes seems ambiguous.

    @ambv
    Copy link
    Contributor

    ambv commented May 11, 2018

    I think we've learned lesson that we shouldn't use typing in modules other than typing...

    This is a blanket statement that as hurtful as it is factually incorrect. Let me address the latter.

    1. Dataclasses are entirely dependent on annotations as described by the typing module. While it would be entirely possible for users to pass an arbitrary expression as an annotation to denote it's a field of the dataclass, then it's confusing the reader of that code... what's the point? The fact that dataclasses isn't itself importing typing is an implementation detail of dataclasses.

    2. The problem isn't even specific to typing but stringified type annotations in general. If the magic sentinel annotation came from the dataclasses module, it would be just as tricky to get it right as it is now when this sentinel lives in typing.

    3. singledispatch now also allows registering functions that are type-annotated. In fact, that is a nicer API than the old one. That usage also imports typing. I don't see how this is a problem.

    If it is really needed, I think we should only allow "typing.ClassVar". All other prefixes seems ambiguous.

    I came up with "typing", "t", and straight "ClassVar" by grepping through typed codebases I work with. I agree that "t" is rather arbitrary so I'm totally fine with leaving that one out.

    @Ricyteach
    Copy link
    Mannequin Author

    Ricyteach mannequin commented May 11, 2018

    Lending my voice to the suggestion of limiting the class attribute check to typing.ClassVar and ClassVar. It can always be expanded later if it is needed.

    @methane
    Copy link
    Member

    methane commented May 12, 2018

    > I think we've learned lesson that we shouldn't use typing in modules
    other than typing...

    This is a blanket statement that as hurtful as it is factually
    incorrect. Let me address the latter.

    1. Dataclasses are entirely dependent on annotations as described by the
      typing module
      . While it would be entirely possible for users to pass an
      arbitrary expression as an annotation to denote it's a field of the
      dataclass, then it's confusing the reader of that code... what's the
      point? The fact that dataclasses isn't itself importing typing is an
      implementation detail of dataclasses.

    When I said "use typing module", I meant "using type information from
    annotation value" preciously.
    Using only name of the annotation is not harmful.

    1. The problem isn't even specific to typing but stringified type
      annotations in general. If the magic sentinel annotation came from the
      dataclasses module, it would be just as tricky to get it right as it is now
      when this sentinel lives in typing.

    Yes. I think we shouldn't parse annotation value until we can expect
    issues like this and estimate performance overhead of
    massive calling of get_type_hints() on application startup.

    1. singledispatch now also allows registering functions that are
      type-annotated. In fact, that is a nicer API than the old one. That usage
      also imports typing. I don't see how this is a problem.

    How about performance?
    Isn't performance overhead of it is much larger than expected by people who
    don't know internal of get_type_hints()?

    I agree that annotation could very useful.
    But I fear that utilizing annotation value may make ugly hack like
    "ClassVar" prefix
    or unexpected slow import time. So I want to be very conservative about
    utilizing annotation in stdlib.

    Once it is happen, it's very hard to remove it from stdlib.
    I think we should learn more about runtime usage of type hints in third
    party library,
    before using it in stdlib.

    @ericvsmith
    Copy link
    Member

    At this stage in the release cycle, if you really feel strongly about this, you should take it up with Guido directly.

    @ericvsmith ericvsmith self-assigned this May 12, 2018
    @ericvsmith
    Copy link
    Member

    There have been comments on the PR, but I'd like to focus the higher level issue back here. Specifically, see my comment #6768 (comment)

    To summarize: I still think string inspections are the best we can do.

    I'm going to try to organize a meeting with Guido, Ivan, and Łukasz at the sprints on Monday.

    @ericvsmith
    Copy link
    Member

    Followup from our meeting at the sprints: we're going to go with inspecting the type annotation string and use heuristics to determine if the type is a ClassVar or InitVar. I'll follow up with more specifics on the approach.

    This will obviously need to make it in to the documentation.

    @ericvsmith
    Copy link
    Member

    New changeset 2a7bacb by Eric V. Smith in branch 'master':
    bpo-33453: Handle string type annotations in dataclasses. (GH-6768)
    2a7bacb

    @miss-islington
    Copy link
    Contributor

    New changeset c73268a by Miss Islington (bot) in branch '3.7':
    bpo-33453: Handle string type annotations in dataclasses. (GH-6768)
    c73268a

    @ericvsmith ericvsmith added the 3.8 only security fixes label May 16, 2018
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.7 (EOL) end of life 3.8 only security fixes stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error
    Projects
    None yet
    Development

    No branches or pull requests

    5 participants