classification
Title: from __future__ import annotations breaks dataclasses ClassVar and InitVar handling
Type: behavior Stage: resolved
Components: Library (Lib) Versions: Python 3.8, Python 3.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: eric.smith Nosy List: Ricyteach, eric.smith, gregory.p.smith, inada.naoki, levkivskyi, lukasz.langa, miss-islington, ned.deily
Priority: Keywords: patch, patch

Created on 2018-05-09 21:38 by Ricyteach, last changed 2018-05-16 07:15 by eric.smith. This issue is now closed.

Pull Requests
URL Status Linked Edit
PR 6768 merged eric.smith, 2018-05-12 02:19
PR 6768 merged eric.smith, 2018-05-12 02:19
PR 6889 merged miss-islington, 2018-05-16 02:45
Messages (21)
msg316333 - (view) Author: Rick Teachey (Ricyteach) * Date: 2018-05-09 21:38
This is broken in 3.7 (both beta 3 and 4):

from __future__ import annotations
from dataclasses import dataclass
from typing import ClassVar, Any

@dataclass
class C():
    class_var: ClassVar[Any] = object()
    data: str

Traceback:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\ricky\AppData\Local\Programs\Python\Python37\lib\dataclasses.py", line 850, in dataclass
    return wrap(_cls)
  File "C:\Users\ricky\AppData\Local\Programs\Python\Python37\lib\dataclasses.py", line 842, in wrap
    return _process_class(cls, init, repr, eq, order, unsafe_hash, frozen)
  File "C:\Users\ricky\AppData\Local\Programs\Python\Python37\lib\dataclasses.py", line 763, in _process_class
    else 'self',
  File "C:\Users\ricky\AppData\Local\Programs\Python\Python37\lib\dataclasses.py", line 442, in _init_fn
    raise TypeError(f'non-default argument {f.name!r} '
TypeError: non-default argument 'data' follows default argument
msg316335 - (view) Author: Rick Teachey (Ricyteach) * Date: 2018-05-09 21:39
To be clear: it works just fine with the annotations import.
msg316336 - (view) Author: Rick Teachey (Ricyteach) * Date: 2018-05-09 21:40
Sorry, mean to say it works just fine *without* the import.
msg316338 - (view) Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2018-05-09 21:50
This is a known issue, but it wasn't being tracked here. So, thanks for opening the issue.

https://github.com/ericvsmith/dataclasses/issues/92#issuecomment-382473127

Not to put Łukasz on the spot (he's sitting behind me even as we speak), but I think we missed the window for 3.7.0 for this. I'll discuss it with him.
msg316345 - (view) Author: Łukasz Langa (lukasz.langa) * (Python committer) Date: 2018-05-10 03:08
Well, this is a rather big deal and I'd like to see it solved for 3.7.0. Ned, this is after the last beta but as far as I understand, we're still  fine committing fixes that maintain ABI until RC1, right?


Note that this isn't specific to the `annotations` future import. If a user actually writes:

  field: "ClassVar[SomeTypeReferencedLater]" = get_some_type_object()

the effect is the same.


There are two ways to solve it, the right way and the fast way.

The right way is to call `get_type_hints()` on the class which forces evaluation of all type annotations (including in base classes, if any). That way you can keep using your existing hack to check if a thing is a ClassVar. However, it's slow because it:

- forces evaluation of all type annotations at import time, which is even slower than without the `annotations` future import because now you're tokenizing, creating the AST and the code objects, too; and

- you force an import of typing for all users, regardless whether they use any annotations or not.


This is why attrs went with the fast way which covers most (but not all) bases in this case. If the annotation is a string, just check if it starts with "typing.ClassVar", "t.ClassVar", or just "ClassVar". That's a fast check and doesn't ever require importing typing.  On the flip side, the 0.001% of users [1]_ who import ClassVar differently, will not have their class variables recognized.

So, Eric, unless you really want to do the right thing here and make dataclasses forever slower to start up than attrs, I would be happy to provide you with a patch for this during sprints.



[1] .. Figure made up on the spot.
msg316346 - (view) Author: Ned Deily (ned.deily) * (Python committer) Date: 2018-05-10 03:44
> Ned, this is after the last beta but as far as I understand, we're still  fine committing fixes that maintain ABI until RC1, right?

Yes
msg316350 - (view) Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2018-05-10 07:59
See also [2]_ for a brief discussion of forward references, which makes get_type_hints() undesirable in this case.

> This is why attrs went with the fast way which covers most (but not all) bases in this case. If the annotation is a string, just check if it starts with "typing.ClassVar", "t.ClassVar", or just "ClassVar". That's a fast check and doesn't ever require importing typing.  On the flip side, the 0.001% of users [1]_ who import ClassVar differently, will not have their class variables recognized.

I'm okay with the fast check for the string "ClassVar". My only concern is how the user would figure out what's going on, if for example they "import typing as T". The generated __init__ wouldn't have the params they expect, but with default values thrown in I'm not so sure how quickly they'd notice. Hopefully they'd figure out soon enough there's a problem, but I'm not sure they'd know the cause if it.

Maybe we could do some additional checking if typing has been imported? If we see "T.ClassVar" ("[identifier].ClassVar" appears at the beginning of the string) then if typing has been imported, further check that "T is typing"? Although this isn't going to help with "from typing import ClassVar as CV" (but only 0.00004% of users would do that [3]_) and I don't want to use regex's for this. str.partition() is likely good enough, if we decide to go this route.

Is there any scenario where the following code would be useful, or expected to work, if "import typing as T" hadn't been executed before @dataclass runs? After all, if we do decide to call get_type_hints() it wouldn't work without it.

  field: "T.ClassVar[SomeTypeReferencedLater]" = get_some_type_object()

But again, unless [2]_ is addressed, get_type_hints() will fail unless both T and SomeTypeReferencedLater are known when @dataclass runs, if we used get_type_hints().

So, I guess this is my roundabout way of saying we should do the best we can with string inspection, and not use get_type_hints(). Maybe we can discuss it with Guido at the sprints.

For all of this, I'm assuming we'll do something similar with InitVar. Although we obviously know it's been imported, it doesn't solve all of the other issues with get_type_hints.

[2] .. https://github.com/python/typing/issues/508

[3] .. Also a made up number.
msg316351 - (view) Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2018-05-10 08:10
I see that https://github.com/python/typing/issues/508 is also referenced in https://github.com/python-attrs/attrs/issues/361, where it contributed to attrs using string inspection.
msg316362 - (view) Author: Rick Teachey (Ricyteach) * Date: 2018-05-10 13:48
> Is there any scenario where the following code would be useful...

Perhaps if someone, in search of a speedup, were sort of rolling their own lighter-weight equivalent to the typing module (in order to avoid importing the full typing module), but duck typed enough to fool the average type checker? Is that possible?
msg316391 - (view) Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2018-05-11 11:04
The more I think about this, the more I think Łukasz is correct that just checking for strings starting with "ClassVar", "typing.ClassVar", or "t.ClassVar" is the correct thing to do. This is the change he made in https://github.com/python-attrs/attrs/pull/367.

In some private email I had discussed extracting the module name, if any, from the string annotation, and looking it up in sys.modules. This doesn't buy you much because you have to know how the caller imported typing. If they used "import typing as t", then you can't look up "t" in sys.modules. You could do some horrible frame trick to find out what the caller knew as "t", but that still won't work in plenty of cases. I don't want to use a complex solution that still doesn't work in all cases, and would indeed work in fewer places than just examining the string. The only name we could reliably detect is "typing.ClassVar", all others wouldn't be in sys.modules correctly.

So, that leaves us with just looking at the string and guessing what the caller meant. I think the three strings that Łukasz suggested are good enough.

This will need to be well documented. The documentation should note that things like this will break:

from __future__ import annotations
from typing import ClassVar as CV
@dataclass
class C:
    x: CV

x will not be a class variable here, since @dataclass will see the annotation as "CV" and not know that it means typing.ClassVar.

InitVar has a similar problem. What strings to use there? "InitVar" and "dataclasses.InitVar", of course. But maybe "dc.InitVar"? It's hard to say with no in-the-field usage examples to search for. I'll start with just the first two strings.

I really don't want to use regexes here. But a refinement would be to not just use .startswith, and instead use a word boundary. After all, "ClassVarOfMine" should probably not be considered a ClassVar. I'll leave that issue for another day, though.

I'll have a patch ready before the sprints are over.
msg316392 - (view) Author: Inada Naoki (inada.naoki) * (Python committer) Date: 2018-05-11 11:35
"t.ClassVar" looks ugly...
How about dropping ClassVar support of dataclass module for 3.7?
msg316394 - (view) Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2018-05-11 11:44
We can't break the API at this point in the release cycle. But I am open to what string prefixes we should allow.
msg316396 - (view) Author: Inada Naoki (inada.naoki) * (Python committer) Date: 2018-05-11 11:55
> We can't break the API at this point in the release cycle.

We hadn't release RC, and we hadn't documented dataclass module yet.
How about making "dataclass" module as porvisional state?

I think we've learned lesson that we shouldn't use typing in modules other than typing...


> But I am open to what string prefixes we should allow.

If it is really needed, I think we should only allow "typing.ClassVar".
All other prefixes seems ambiguous.
msg316411 - (view) Author: Łukasz Langa (lukasz.langa) * (Python committer) Date: 2018-05-11 18:47
> I think we've learned lesson that we shouldn't use typing in modules other than typing...

This is a blanket statement that as hurtful as it is factually incorrect.  Let me address the latter.

1. Dataclasses are entirely dependent on annotations *as described by the typing module*.   While it would be entirely possible for users to pass an arbitrary expression as an annotation to denote it's a field of the dataclass, then it's confusing the reader of that code... what's the point?  The fact that dataclasses isn't itself importing `typing` is an implementation detail of dataclasses.

2. The problem isn't even specific to `typing` but stringified type annotations in general.  If the magic sentinel annotation came from the dataclasses module, it would be just as tricky to get it right as it is now when this sentinel lives in `typing`.

3. singledispatch now also allows registering functions that are type-annotated.  In fact, that is a nicer API than the old one.  That usage also imports `typing`.  I don't see how this is a problem.


> If it is really needed, I think we should only allow "typing.ClassVar".  All other prefixes seems ambiguous.

I came up with "typing", "t", and straight "ClassVar" by grepping through typed codebases I work with.  I agree that "t" is rather arbitrary so I'm totally fine with leaving that one out.
msg316412 - (view) Author: Rick Teachey (Ricyteach) * Date: 2018-05-11 19:02
Lending my voice to the suggestion of limiting the class attribute check to `typing.ClassVar` and `ClassVar`. It can always be expanded later if it is needed.
msg316417 - (view) Author: Inada Naoki (inada.naoki) * (Python committer) Date: 2018-05-12 00:16
> > I think we've learned lesson that we shouldn't use typing in modules
other than typing...

> This is a blanket statement that as hurtful as it is factually
incorrect.  Let me address the latter.

> 1. Dataclasses are entirely dependent on annotations *as described by the
typing module*.   While it would be entirely possible for users to pass an
arbitrary expression as an annotation to denote it's a field of the
dataclass, then it's confusing the reader of that code... what's the
point?  The fact that dataclasses isn't itself importing `typing` is an
implementation detail of dataclasses.

When I said "use typing module", I meant "using type information from
annotation value" preciously.
Using only name of the annotation is not harmful.

> 2. The problem isn't even specific to `typing` but stringified type
annotations in general.  If the magic sentinel annotation came from the
dataclasses module, it would be just as tricky to get it right as it is now
when this sentinel lives in `typing`.

Yes.  I think we shouldn't parse annotation value until we can expect
issues like this and estimate performance overhead of
massive calling of get_type_hints() on application startup.

> 3. singledispatch now also allows registering functions that are
type-annotated.  In fact, that is a nicer API than the old one.  That usage
also imports `typing`.  I don't see how this is a problem.

How about performance?
Isn't performance overhead of it is much larger than expected by people who
don't know internal of get_type_hints()?

I agree that annotation could very useful.
But I fear that utilizing annotation value may make ugly hack like
"ClassVar" prefix
or unexpected slow import time.  So I want to be very conservative about
utilizing annotation in stdlib.

Once it is happen, it's very hard to remove it from stdlib.
I think we should learn more about runtime usage of type hints in third
party library,
before using it in stdlib.
msg316419 - (view) Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2018-05-12 02:21
At this stage in the release cycle, if you really feel strongly about this, you should take it up with Guido directly.
msg316481 - (view) Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2018-05-13 21:40
There have been comments on the PR, but I'd like to focus the higher level issue back here. Specifically, see my comment https://github.com/python/cpython/pull/6768#discussion_r187813919

To summarize: I still think string inspections are the best we can do.

I'm going to try to organize a meeting with Guido, Ivan, and Łukasz at the sprints on Monday.
msg316523 - (view) Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2018-05-14 15:46
Followup from our meeting at the sprints: we're going to go with inspecting the type annotation string and use heuristics to determine if the type is a ClassVar or InitVar. I'll follow up with more specifics on the approach.

This will obviously need to make it in to the documentation.
msg316755 - (view) Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2018-05-16 02:44
New changeset 2a7bacbd913cf2bf568b3c0f85a758946d3cf4e9 by Eric V. Smith in branch 'master':
bpo-33453: Handle string type annotations in dataclasses. (GH-6768)
https://github.com/python/cpython/commit/2a7bacbd913cf2bf568b3c0f85a758946d3cf4e9
msg316762 - (view) Author: miss-islington (miss-islington) Date: 2018-05-16 04:22
New changeset c73268aad7645a146b3e0e088c198a1fb74d57ff by Miss Islington (bot) in branch '3.7':
bpo-33453: Handle string type annotations in dataclasses. (GH-6768)
https://github.com/python/cpython/commit/c73268aad7645a146b3e0e088c198a1fb74d57ff
History
Date User Action Args
2018-05-16 07:15:34eric.smithsetstatus: open -> closed
versions: + Python 3.8
priority: release blocker ->
keywords: patch, patch
resolution: fixed
stage: patch review -> resolved
2018-05-16 04:22:15miss-islingtonsetnosy: + miss-islington
messages: + msg316762
2018-05-16 02:45:40miss-islingtonsetpull_requests: + pull_request6562
2018-05-16 02:44:30eric.smithsetmessages: + msg316755
2018-05-14 19:14:04gregory.p.smithsetkeywords: patch, patch
nosy: + gregory.p.smith
2018-05-14 15:46:35eric.smithsetkeywords: patch, patch

messages: + msg316523
2018-05-13 21:40:08eric.smithsetkeywords: patch, patch

messages: + msg316481
2018-05-13 01:45:13levkivskyisetkeywords: patch, patch
nosy: + levkivskyi
2018-05-12 02:21:34eric.smithsetkeywords: patch, patch
assignee: eric.smith
messages: + msg316419
2018-05-12 02:19:22eric.smithsetkeywords: + patch
stage: patch review
pull_requests: + pull_request6455
2018-05-12 02:19:20eric.smithsetkeywords: + patch
stage: (no value)
pull_requests: + pull_request6456
2018-05-12 00:16:24inada.naokisetmessages: + msg316417
2018-05-11 19:02:42Ricyteachsetmessages: + msg316412
2018-05-11 18:47:34lukasz.langasetassignee: eric.smith -> (no value)
messages: + msg316411
2018-05-11 11:55:49inada.naokisetmessages: + msg316396
2018-05-11 11:44:15eric.smithsetmessages: + msg316394
2018-05-11 11:35:32inada.naokisetnosy: + inada.naoki
messages: + msg316392
2018-05-11 11:04:53eric.smithsetassignee: eric.smith
messages: + msg316391
2018-05-10 13:48:45Ricyteachsetmessages: + msg316362
2018-05-10 08:10:37eric.smithsetmessages: + msg316351
2018-05-10 07:59:16eric.smithsetpriority: normal -> release blocker

messages: + msg316350
2018-05-10 03:44:03ned.deilysetmessages: + msg316346
2018-05-10 03:08:41lukasz.langasetmessages: + msg316345
2018-05-10 03:08:11lukasz.langasetmessages: - msg316344
2018-05-10 03:07:33lukasz.langasetnosy: + ned.deily
messages: + msg316344
2018-05-09 21:50:52eric.smithsetnosy: + lukasz.langa

messages: + msg316338
title: from __future__ import annotations breaks dataclasses ClassVar handling -> from __future__ import annotations breaks dataclasses ClassVar and InitVar handling
2018-05-09 21:40:36Ricyteachsetmessages: + msg316336
2018-05-09 21:39:41Ricyteachsetmessages: + msg316335
2018-05-09 21:38:42Ricyteachcreate