This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: PEP 646: Decide on substitution behavior
Type: behavior Stage: patch review
Components: Versions: Python 3.11
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: JelleZijlstra Nosy List: AlexWaygood, JelleZijlstra, gvanrossum, kj, matthew.rahtz, mrahtz, serhiy.storchaka
Priority: release blocker Keywords: patch

Created on 2022-03-13 20:46 by JelleZijlstra, last changed 2022-04-11 14:59 by admin.

Pull Requests
URL Status Linked Edit
PR 32341 open matthew.rahtz, 2022-04-05 19:56
Messages (17)
msg415100 - (view) Author: Jelle Zijlstra (JelleZijlstra) * (Python committer) Date: 2022-03-13 20:46
We've had some disagreement about the behavior of TypeVarTuple substitution related to PEP 646, and the discussion has now spilled around multiple PRs. I'd like to use this issue to come to an agreement so we don't have to chase through so many different places.

Links:
- GH-31021 (merged) implements PEP 646 in typing.py. Matthew initially implemented full TypeVar substitution support, but took it out after I suggested an alternative solution in https://github.com/python/cpython/pull/31021#pullrequestreview-890627941.
- GH-31800 (merged without review) implements TypeVarTuple substitution in Python.
- GH-31828 implements TypeVarTuple substitution in C.
- GH-31844 and GH-31846 add additional test cases.
- GH-31804 (closed) implements my proposed solution.

I'd like to ask that until we come to an agreement we hold off on making any more changes, so we don't have to go back and forth and we ensure that the eventual solution covers all edge cases.

The disagreement is about what to do with TypeVarTuple substitution: the behavior when a generic type is subscripted, like `tuple[*Ts][int, str]`.

There are two possible extreme approaches:
- Implement full substitution support, just as we have it for existing TypeVars. This is complicated because TypeVarTuple makes it much harder to match up the types correctly. However, it is consistent with the behavior for other TypeVar-like objects. My example would turn into GenericAlias(tuple, (int, str)).
- Give up on substitution and just return a new GenericAlias object: GenericAlias(GenericAlias(tuple, Unpack[Ts]), (int, str). This avoids implementing any complex runtime behavior, but it inconsistent with existing behavior and less pretty when you print out the type. I prefer this approach because there's less risk that future enhancements to typing will break it. I also want to explore extending this approach to ParamSpec substitution.
msg415107 - (view) Author: Matthew Rahtz (matthew.rahtz) * Date: 2022-03-13 21:59
Thanks for starting this, Jelle - I was a bit unsure about how to proceed here.

Given that https://github.com/python/cpython/pull/31800 is already merged, I'd also propose something halfway between the two extremes: return a sensible substitution when the logic to compute that isn't too onerous, and a new GenericAlias object when it is. The upsides are that we'd probably be able to return reasonable substitutions for the vast majority of cases, and that we wouldn't have to remove what's already been merged. The downsides would be lack of consistency, and the potential for changing rules about what does and doesn't return a full substitution as time goes on and new features are added.
msg415108 - (view) Author: Matthew Rahtz (matthew.rahtz) * Date: 2022-03-13 22:01
(Having said that, to be clear: my preferred solution currently would still be the solution where we just return a new GenericAlias for anything involving a TypeVarTuple. The crux is what Serhiy is happy with.)
msg415109 - (view) Author: Jelle Zijlstra (JelleZijlstra) * (Python committer) Date: 2022-03-13 22:10
Thanks Matthew! Merged PRs can still be reverted, and we have some time before the feature freeze. I'd like to hear what Guido and Ken think too.

If we go with the GenericAlias substitution, we need to make sure that such aliases still work as base class. That would need some C work to make types.GenericAlias.__mro_entries__ recurse if the alias's origin is itself a GenericAlias. There's a few other subtleties to think about; I can work on that but don't have a ton of time today.
msg415557 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2022-03-19 15:32
I am for consistent behavior. If return GenericAlias(GenericAlias(tuple, Unpack[Ts]), (int, str)) for tuple[*Ts][int, str], we should also return GenericAlias(GenericAlias(list, T), int) for list[T][int], etc. And it will cause multiple problems:

* A repr can be less readable.
* It will break equality comparison and hashing. Good bye caching.
* What about __origin__, __parameters__, __args__? How will they be calculated?
* It can break code which uses annotations for something. For example it can break dataclasses.

It may be that will need to use it as a fallback for cases like tuple[T, *Ts][*Ts2] (currently it is error). But I am not sure that such cases should be supported.
msg415623 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2022-03-20 16:44
I think I'm with Serhiy, I don't understand the hesitance to transform tuple[*Ts][int, str] into tuple[int, str].

What would be an example of a substitution that's too complex to do?
msg415637 - (view) Author: Jelle Zijlstra (JelleZijlstra) * (Python committer) Date: 2022-03-20 21:54
It's simple if you only look at simple examples.

Here are some examples current main (with Serhiy's patch for the Python version of typing) gets wrong:

>>> from typing import *
>>> Ts = TypeVarTuple("Ts")
>>> T1 = TypeVar("T1")
>>> T2 = TypeVar("T2")
>>> Tuple[T1, Unpack[Ts], T2][int, Unpack[tuple[int]]]  # expect error
typing.Tuple[int, *tuple[int]]
>>> Tuple[T1, Unpack[Ts], str, T2][int, Unpack[Ts]]  # expect error (T2 missing)
typing.Tuple[int, str, *Ts]  # it put *Ts in the wrong place
>>> Tuple[T1, Unpack[Ts], str, T2][int, Unpack[Ts], Unpack[Ts]]  # expect error (*Ts can't substitute T2)
typing.Tuple[int, *Ts, str, *Ts]
>>> class G(Generic[T1, Unpack[Ts], T2]): pass
... 
>>> G[int]  # expect error
__main__.G[int]

We can probably fix that, but I'm not looking forward to implementing the fixed logic in both Python and C. Also, I'm worried that it won't work with future extensions to the type system (e.g., the rumored Map operator) that may go into 3.12 or later versions.
msg415694 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2022-03-21 18:17
The first case will be practically fixed by GH 32030 after chenging the grammar to allow unpacking in index tuple: A[*B].

Two other cases will be fixed by GH 32031. It does not require any C code.

In the last case no error is raised because some error checks are skipped if any of Generic arguments is a TypeVarTuple. We just need to add such checks. This is Python-only code too.

Note that the alternative proposition is even more lenient to errors.
msg415710 - (view) Author: Matthew Rahtz (matthew.rahtz) * Date: 2022-03-21 21:33
[Guido]

> What would be an example of a substitution that's too complex to do?

We also need to remember the dreaded arbitrary-length tuple. For example, I think it should be the case that:

```python
T = TypeVar('T')
Ts = TypeVarTuple('Ts')
class C(Generic[*Ts]): pass
Alias = C[T, *Ts]
Alias2 = Alias[*tuple[int, ...]]
# Alias2 should be C[int, *tuple[int, ...]]
```

Ok, this is a bit of a silly example, but if we're committing to evaluating substitutions correctly, we should probably make even this kind of example behave correctly so that users who accidentally do something silly can debug what's gone wrong.

[Serhiy]

> A repr can be less readable.

Definitely true.

> It will break equality comparison and hashing. Good bye caching.

Huh, I didn't know about this one. Fair enough, this is totally a downside.

> What about __origin__, __parameters__, __args__? How will they be calculated?

This could admittedly be thorny. We'd have to think it through carefully. Admittedly also a downside.

> It can break code which uses annotations for something. For example it can break dataclasses.

Oh, also interesting - I didn't know about this one either. Could you give an example?

> The first case will be practically fixed by GH 32030 after chenging the grammar to allow unpacking in index tuple: A[*B].

We actually deliberately chose not to unpack concrete tuple types - see the description of https://github.com/python/cpython/pull/30398, under the heading 'Starred tuple types'. (If you see another way around it, though, let me know.)

> Two other cases will be fixed by GH 32031. It does not require any C code.

I'm also not sure about this one; disallowing unpacked TypeVarTuples in argument lists to generic aliases completely (if I've understood right?) seems like too restrictive a solution. I can imagine there might be completely legitimate cases where the ability to do this would be important. For example:

```python
DType = TypeVar('DType')
Shape = TypeVarTuple('Shape')
class Tensor(Generic[DType, *Shape]): ...
Uint8Tensor = Tensor[uint8, *Shape]
Unit8BatchTensor = Uint8Tensor[Batch, *Shape]
```

> Note that the alternative proposition is even more lenient to errors.

True, but at least it's predictably lenient to errors - I think the repr makes it very clear that "Woah, you're doing something advanced here. You're on your own!" I think it better fits the principle of least astonishment to have something that consistently lets through all errors of a certain class than something that sometimes catches errors and sometimes doesn't.
msg415712 - (view) Author: Matthew Rahtz (matthew.rahtz) * Date: 2022-03-21 21:35
P.s. To be clear, (I think?) these are all substitutions that are computable. We *could* implement the logic to make all these evaluate correctly if we wanted to. It's just a matter of how much complexity we want to allow in typing.py (or in the runtime in general, if we say farmed some of this logic out to a separate module).
msg415734 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2022-03-22 01:01
I'd like to look at this as a case of simplifying something to its simplest canonical form, but no simpler. This is what the existing fixed-typevar expansion does: e.g. tuple[str, T, T][int] becomes tuple[str, int, int].

I propose that we try to agree on a set of rules for what can be simplified further and what cannot, when we have B = C[...]; A = B[...], (IOW A = C[...][...]), for various shapes of the subscripts to C and B. Note that what's relevant for the second subscript is C[...].__parameters__, so I'll call that "left" below.

1. Some edge case seems to be that if *tuple[...] is involved on either side we will never simplify. Or perhaps a better rule is that *tuple[...] is never simplified away (but fixed items before and after it may be).

2. Another edge case is that if neither side has any starred items we will always simplify (since this is the existing behavior in 3.10). This may raise an error if the number of subscripts on the right does not match the number of parameters on the left.

3. If there's a single *Ts on the left but not on the right, we should be able to simplify, which again may raise an error if there are not enough values on the right, but if there are more than enough, the excess will be consumed by *Ts (in fact that's the only way *Ts is fed).

4. If there's a *Ts on the right but not on the left, we should _not_ simplify, since whatever we have on the left serves as a constraint for *Ts. (E.g. tuple[int, int][*Ts] constrains *Ts to being (int, int).)

5. If there's exactly one *Ts on the left and one on the right, we _might__ be able to simplify if the prefix and suffix of the __parameters__ match the prefix and suffix of the subscript on the right. E.g. C[int, T, *Ts, float][str, *Ts] can be simplified to C[int, str, *Ts, float]. OTOH C[int, T, *Ts, float][*Ts] cannot be simplified -- but we cannot flag it as an error either. Note that __parameters__ in this example is (T, Ts); we have to assume that typevartuples in __parameters__ are always used as *Ts (since the PEP recognizes no valid unstarred uses of Ts).

TBH case 5 is the most complex and I may have overlooked something. I'm more sure of cases 1-4.
msg415752 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2022-03-22 08:49
> Alias = C[T, *Ts]
> Alias2 = Alias[*tuple[int, ...]]
> # Alias2 should be C[int, *tuple[int, ...]]

tuple[int, ...] includes also an empty tuple, and in this case there is no value for T.

> Oh, also interesting - I didn't know about this one either. Could you give an example?

If __origin__, __parameters__, __args__ are a mess, it will definitely break a code which use them.


> We actually deliberately chose not to unpack concrete tuple types - see the description of https://github.com/python/cpython/pull/30398, under the heading 'Starred tuple types'. (If you see another way around it, though, let me know.)

You assumed that *tuple[str, bool] in def foo(*args: *tuple[str, bool]) should give foo.__annotations__['args'] = tuple[str, bool], but it should rather give (str, bool). No confusion with tuple[str, bool].

And one of PEP 646 options is to implement star-syntax only in subscription, not in var-parameter type annotations.

> I'm also not sure about this one; disallowing unpacked TypeVarTuples in argument lists to generic aliases completely (if I've understood right?)

No, it will only be disallowed in substitution of a VarType. Tuple[T][*Ts] -- error. Tuple[*Ts][*Ts2] -- ok.

I propose to implement simple and strict rules, and later add support of new cases where it makes sense.
msg415753 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2022-03-22 09:06
> 1. Some edge case seems to be that if *tuple[...] is involved on either side we will never simplify. Or perhaps a better rule is that *tuple[...] is never simplified away (but fixed items before and after it may be).

I do not understand this. Do you forbid simplifying of tuple[*Ts, float][str, *tuple[int, ...]] to tuple[str, *tuple[int, ...], float]?

I think that the rule should be that *tuple[X, ...] cannot split between different variables. Or that it cannot substitute a TypeVar. A more strong variant of rule 4.

> 5. ... but we cannot flag it as an error either.

I think that it will better to flag it as an error now. Later, after all code be merged and all edge cases be handled we can return here and reconsider this.

There are workarounds for this.

* You should not use Generic[*Ts] if you require at least one item, but Generic[*Ts, T].
* Instead of `def foo(*args: *Ts)` use `def foo(*args: *tuple[*Ts, T])`.

These tricks are common in functional programming.

The rest of the rules match my implementations more or less.
msg416707 - (view) Author: Matthew Rahtz (matthew.rahtz) * Date: 2022-04-04 23:28
Apologies for the slow reply - coming back to this now that the docs and pickling issues are mostly sorted.

[Serhiy]

> > Alias = C[T, *Ts]
> > Alias2 = Alias[*tuple[int, ...]]
> > # Alias2 should be C[int, *tuple[int, ...]]
>
> tuple[int, ...] includes also an empty tuple, and in this case there is no value for T.

This was my initial intuition too, but Pradeep pointed out to me in https://github.com/python/cpython/pull/31021#discussion_r815853784 that for tuple[int, ...], Python has chosen the opposite mindset: instead of assuming the worst-case scenario, we assume the best-case scenario. Thus, the following type-checks correctly with mypy (https://mypy-play.net/?mypy=latest&python=3.10&gist=b9ca66fb7d172f939951a741388836a6):

def return_first(tup: tuple[int, ...]) -> int:
    return tup[0]
tup: tuple[()] = ()
return_first(tup)

> > We actually deliberately chose not to unpack concrete tuple types - see the description of https://github.com/python/cpython/pull/30398, under the heading 'Starred tuple types'. (If you see another way around it, though, let me know.)
> 
> You assumed that *tuple[str, bool] in def foo(*args: *tuple[str, bool]) should give foo.__annotations__['args'] = tuple[str, bool], but it should rather give (str, bool). No confusion with tuple[str, bool].

Fair point, we could *technically* distinguish between tuple[str, bool] and (str, bool). But if I was a naive user and I saw `foo.__annotations__['args'] == (str, bool)`, I don't think it'd be immediately obvious to me that the type of `args` was `*tuple[str, bool]`.

Also though, there's a second reason mentioned in https://github.com/python/cpython/pull/30398 why `(str, bool)` wouldn't be the best choice. We decided that the runtime behaviour of `*args: *something` should be that we essentially do `(*something,)[0]`. If we made `tuple[int, str]` unpack to `(int, str)`, then we'd end up with `__annotations__['args'] == (int,)`.

> And one of PEP 646 options is to implement star-syntax only in subscription, not in var-parameter type annotations.

As in, we would allow `Generic[*Ts]`, but not `*args: *Ts`? That'd be a *major* change to the PEP - not an option I'm willing to consider at this stage in the process.

> > I'm also not sure about this one; disallowing unpacked TypeVarTuples in argument lists to generic aliases completely (if I've understood right?)
>
> No, it will only be disallowed in substitution of a VarType. Tuple[T][*Ts] -- error. Tuple[*Ts][*Ts2] -- ok.

Ah, gotcha. My mistake.

[Guido]

I ran out of time this evening :) Will reply properly soon.
msg416795 - (view) Author: Matthew Rahtz (matthew.rahtz) * Date: 2022-04-05 18:21
[Guido]

> 1. Some edge case seems to be that if *tuple[...] is involved on either side we will never simplify.

Alright, let me think this through with some examples to get my head round it.

It would prohibit the following difficult case:

class C(Generic[*Ts]): ...
Alias = C[T, *Ts]
Alias[*tuple[int, ...]]  # Does not simplify; stays C[T, *Ts][*tuple[int, ...]]

That seems pretty reasonable. It would also prohibit these other relatively simple cases, but I guess that's fine:

Alias = C[*Ts]
Alias[*tuple[int, ...]]  # Does not simplify; stays C[*Ts][*tuple[int, ...]]

Alias = C[T, *tuple[int, ...]]
Alias[str]  # Does not simplify; stays C[T, *tuple[int, ...]][str]


> Or perhaps a better rule is that *tuple[...] is never simplified away (but fixed items before and after it may be).

Is this to say that we effectively prohibit binding *tuple[...] to anything? If we can simplify without binding *tuple[...] to anything, then we do simplify, but otherwise, we don't simplify? So under this rule, the following WOULD work?

Alias = C[T, *tuple[int, ...]]
Alias[str]  # Simplifies to C[str, *tuple[int, ...]], because we didn't have to bind *tuple[int, ...] to do it


> 2. Another edge case is that if neither side has any starred items we will always simplify (since this is the existing behavior in 3.10). This may raise an error if the number of subscripts on the right does not match the number of parameters on the left.

Alright, so this is business as usual.


> 3. If there's a single *Ts on the left but not on the right, we should be able to simplify, which again may raise an error if there are not enough values on the right, but if there are more than enough, the excess will be consumed by *Ts (in fact that's the only way *Ts is fed).

So then:

class C(Generic[*Ts]): ...
Alias = C[T, *Ts]
Alias[()]              # Raises error
Alias[int]             # Simplifies to C[int, *Ts]
Alias[int, str]        # Simplifies to C[int, str]
Alias[int, str, bool]  # Simplifies to C[int, str, bool]

Yup, seems straightforward.


> 4. If there's a *Ts on the right but not on the left, we should _not_ simplify, since whatever we have on the left serves as a constraint for *Ts.

Ok, so this is about the following situations:

class C(Generic[*Ts]): ...
Alias = C[T1, T2]
Alias[*Ts]  # Does not simplify; stays C[T1, T2][*Ts]

Yikes - in fact, this is actually super hairy; I hadn't thought about this edge case at all in the PEP.

Agreed that it seems reasonable not to simplify here.


> E.g. tuple[int, int][*Ts] constrains *Ts to being (int, int).

Was that a typo? Surely tuple[int, int][*Ts] isn't valid - since tuple[int, int] doesn't have any free parameters?


> 5. If there's exactly one *Ts on the left and one on the right, we _might__ be able to simplify if the prefix and suffix of the __parameters__ match the prefix and suffix of the subscript on the right. E.g. C[int, T, *Ts, float][str, *Ts] can be simplified to C[int, str, *Ts, float]. OTOH C[int, T, *Ts, float][*Ts] cannot be simplified -- but we cannot flag it as an error either. Note that __parameters__ in this example is (T, Ts); we have to assume that typevartuples in __parameters__ are always used as *Ts (since the PEP recognizes no valid unstarred uses of Ts).

Ok, this also makes sense.


---

Still, though, doesn't the point that Serhiy brought up about __origin__, __parameters__ and __args__ still apply? In cases where we *don't* simplify, there'd still be the issue of what we'd set these things to be.

This evening I'll also revisit the PRs adding tests for substitution to try and make them a comprehensive reference as to what's currently possible.
msg416813 - (view) Author: Matthew Rahtz (matthew.rahtz) * Date: 2022-04-05 20:03
Ok, https://github.com/python/cpython/pull/32341/files is a reference of how the current implementation behaves. Fwiw, it *is* mostly correct - with a few minor tweaks it might be alright for at least the 3.11 release.

In particular, instead of dealing with the thorny issue of what to do about splitting unpacked arbitrary-length tuples over multiple type variables - e.g. C[T, *Ts][*tuple[int, ...]] - instead either deciding to try and evaluate it properly and living with the complexity, or leaving it unsimplified and living with the __args__, __parameters__ and __origin__ problem - for now, we could just raise an exception for any substitutions which involve an unpacked arbitrary-length tuple, since I'd guess it's going to be an extremely rare use-case.
msg416913 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2022-04-07 01:50
We need to move on this, because the outcome of this discussion is a release blocker for 3.11b1 -- the next release!
History
Date User Action Args
2022-04-11 14:59:57adminsetgithub: 91162
2022-04-07 01:50:13gvanrossumsetpriority: normal -> release blocker
type: behavior
messages: + msg416913
2022-04-05 20:03:02matthew.rahtzsetmessages: + msg416813
2022-04-05 19:56:58matthew.rahtzsetkeywords: + patch
stage: patch review
pull_requests: + pull_request30396
2022-04-05 18:21:12matthew.rahtzsetmessages: + msg416795
2022-04-04 23:28:09matthew.rahtzsetmessages: + msg416707
2022-03-22 09:06:58serhiy.storchakasetmessages: + msg415753
2022-03-22 08:49:01serhiy.storchakasetmessages: + msg415752
2022-03-22 01:01:21gvanrossumsetmessages: + msg415734
2022-03-21 21:35:34matthew.rahtzsetmessages: + msg415712
2022-03-21 21:33:50matthew.rahtzsetmessages: + msg415710
2022-03-21 18:17:54serhiy.storchakasetmessages: + msg415694
2022-03-20 21:54:06JelleZijlstrasetmessages: + msg415637
2022-03-20 16:44:40gvanrossumsetmessages: + msg415623
2022-03-19 15:32:54serhiy.storchakasetmessages: + msg415557
2022-03-13 22:57:37AlexWaygoodsetnosy: + AlexWaygood
2022-03-13 22:10:02JelleZijlstrasetmessages: + msg415109
2022-03-13 22:01:18matthew.rahtzsetmessages: + msg415108
2022-03-13 21:59:28matthew.rahtzsetnosy: + matthew.rahtz
messages: + msg415107
2022-03-13 20:46:48JelleZijlstrasetnosy: + mrahtz
2022-03-13 20:46:16JelleZijlstracreate