Title: Docs of `typing.get_args`: Mention that due to caching of typing generics the order of arguments for Unions can be different from the one of the returned tuple
Type: enhancement Stage: resolved
Components: Documentation Versions: Python 3.10, Python 3.9
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: docs@python Nosy List: Dominik V., docs@python, gvanrossum, kj, levkivskyi, miss-islington
Priority: normal Keywords: patch

Created on 2020-11-10 19:35 by Dominik V., last changed 2020-11-16 01:54 by gvanrossum. This issue is now closed.

Pull Requests
URL Status Linked
PR 23254 merged Dominik V., 2020-11-12 20:38
PR 23307 merged miss-islington, 2020-11-16 01:31
Messages (9)
msg380699 - Author: Dominik Vilsmeier (Dominik V.) * Date: 2020-11-10 19:35
Due to caching of `__getitem__` for generic types, the order of arguments as returned by `get_args` might be different for Union:

>>> from typing import List, Union, get_args
>>> get_args(get_args(List[Union[int, str]])[0])
(<class 'int'>, <class 'str'>)
>>> get_args(get_args(List[Union[str, int]])[0])
(<class 'int'>, <class 'str'>)

This is because `List[Union[int, str]] is List[Union[str, int]]`.

I understand that caching is useful to reduce the memory footprint of type hints, so I suggest updating the documentation of `get_args`. At the moment it reads:

> For a typing object of the form X[Y, Z, ...] these functions return X and (Y, Z, ...).

This seems to imply that the returned objects are identical to the ones in the form `X[Y, Z, ...]`. However that's not the case:

>>> U1 = Union[int, str]
>>> U2 = Union[str, int]
>>> get_args(List[U1])[0] is U1
True
>>> get_args(List[U2])[0] is U2
False

I'm not so much concerned about the identity, but the fact that a subsequent call to `get_args` on the Union returns its arguments in a different order seems relevant.

So I propose adding the following sentence to the `get_args` docs:

> [...], it gets normalized to the original class.
> If `X` is a `Union`, the order of `(Y, Z, ...)` can be different from the one of the original arguments `[Y, Z, ...]`.

Or alternatively:

> [...], it gets normalized to the original class.
> If `X` is a `Union`, the order of `(Y, Z, ...)` is arbitrary.

The second version is shorter but it's not completely accurate (since the order is actually not arbitrary).
msg380762 - Author: Ken Jin (kj) * (Python triager) Date: 2020-11-11 14:01
You're right, currently this happens for 2 reasons:

1. _SpecialGenericAlias (used by List) caches its __getitem__. (As you already pointed out :) )

2. The __hash__ of _UnionGenericAlias (Union) is `hash(frozenset(self.__args__))`, and its __eq__ likewise compares the args as sets. Unions with the same unique args in a different order therefore hash and compare equal, which produces the same cache hit.
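The two points above can be reproduced without `typing` at all. The sketch below is a toy model, not the actual `typing` source: `ToyUnion`, `ToyList` and `toy_list` are hypothetical names. `ToyUnion` hashes and compares its members as a set, like `_UnionGenericAlias`, and `toy_list` memoizes with `functools.lru_cache`, standing in for the cached `__getitem__` of `_SpecialGenericAlias`.

```python
import functools


class ToyUnion:
    """Hypothetical stand-in for a Union: ordered args, order-insensitive equality."""

    def __init__(self, *args):
        self.args = args  # keeps the order the caller wrote

    def __eq__(self, other):
        if not isinstance(other, ToyUnion):
            return NotImplemented
        return set(self.args) == set(other.args)

    def __hash__(self):
        # Mirrors the hash(frozenset(self.__args__)) idea described above.
        return hash(frozenset(self.args))


class ToyList:
    """Hypothetical stand-in for List[...]: it just remembers its argument."""

    def __init__(self, arg):
        self.arg = arg


@functools.lru_cache(maxsize=None)
def toy_list(arg):
    # The cache key is `arg`; any *equal* key returns the object built first.
    return ToyList(arg)


first = toy_list(ToyUnion(int, str))
second = toy_list(ToyUnion(str, int))

print(second is first)  # True: the second call was a cache hit
print(second.arg.args)  # (<class 'int'>, <class 'str'>): order from the first call
```

Swapping the two calls flips which order "wins": whichever spelling reaches the cache first is the one every later, equal lookup gets back.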

I find it mildly sad, however, that:

>>> get_args(Union[int, str])
(<class 'int'>, <class 'str'>)

>>> get_args(Union[str, int])
(<class 'str'>, <class 'int'>)

This is slightly inconsistent with the behavior when the Union is nested in List. I don't think there's an easy way to fix this without breaking the cache (and it also makes sense that Unions' args aren't order-dependent). So I'm all for updating the docs with your addition (slightly edited):

> If `X` is a `Union`, the order of `(Y, Z, ...)` may be different from the order of the original arguments `[Y, Z, ...]`.
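A short script to compare the top-level and nested cases described above. It assumes a CPython version with the caching behavior discussed in this thread (3.9 at the time). The typing cache is an implementation detail, so the nested results are printed rather than annotated; the top-level results come straight from the arguments as written.

```python
from typing import List, Union, get_args

# Top level: each Union keeps the argument order it was written with.
print(get_args(Union[int, str]))  # (<class 'int'>, <class 'str'>)
print(get_args(Union[str, int]))  # (<class 'str'>, <class 'int'>)

# Nested: List[...] is cached, and Union[str, int] == Union[int, str],
# so the second lookup can return the List object created by the first one.
outer_a = List[Union[int, str]]
outer_b = List[Union[str, int]]
print(get_args(get_args(outer_a)[0]))
print(get_args(get_args(outer_b)[0]))
```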
msg380784 - Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2020-11-11 18:10
Agreed, it's mildly sad, and I wish the cache could preserve the order in List[Union[int, str]]. But for that to work we'd have to either change how the cache works, which feels complex, or change things so that Union[int, str] != Union[str, int], which seems wrong as well (they have been equal for many releases, so this would break code).

Fixing the cache would require adding a new comparison method to all generic type objects, and that just doesn't seem worth the effort (but I'd be open to this solution in the future).

So for now, let's document that get_args() may swap Union arguments.
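The "new comparison method" mentioned above can be illustrated with a hypothetical order-sensitive key. `strict_key` is not part of `typing`; it is only a sketch of what such a comparison might look like, built on the public `get_origin` and `get_args` functions.

```python
from typing import Callable, Union, get_args, get_origin


def strict_key(tp):
    """Hypothetical order-sensitive key for a type hint (not a typing API).

    Unlike ==, which treats Union[int, str] and Union[str, int] as equal,
    this key keeps the argument order, recursing into nested arguments.
    """
    if isinstance(tp, list):
        # Callable[[int], str] reports its parameter list as a plain list,
        # so turn it into a tuple to keep the key hashable.
        return ("list", tuple(strict_key(arg) for arg in tp))
    args = get_args(tp)
    if not args:
        return tp  # a plain class or an unparameterized form
    return (get_origin(tp), tuple(strict_key(arg) for arg in args))


u1 = Union[int, str]
u2 = Union[str, int]
print(u1 == u2)                          # True: equal under the existing __eq__
print(strict_key(u1) == strict_key(u2))  # False: the key distinguishes the order
print(hash(strict_key(Callable[[int], str])))  # nested lists are hashable too
```

For such a key to help, it would have to be computed from the caller's parameters before the cache lookup (for example, by keying the cache on `strict_key(params)` rather than on `params`). Calling it on an already-built `List[Union[str, int]]` would only see the order that the cache handed back.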
msg380831 - Author: Ken Jin (kj) * (Python triager) Date: 2020-11-12 15:48
Dominik, would you like to submit a PR for this :) ?
msg380850 - Author: Dominik Vilsmeier (Dominik V.) * Date: 2020-11-12 21:25
Thinking more about it, I came to realize that it's not the Union that sits at the root of this behavior, but rather the caching performed by generic types in general. So if we consider

L1 = List[Union[int, str]]
L2 = List[Union[str, int]]

then `get_args(L1)[0] is get_args(L2)[0]` and so `get_args` has no influence on the order of arguments of the Union objects (they are already the same for L1 and L2).

So I think it would be more accurate to add the following sentence instead:

> If `X` is a generic type, the returned objects `(Y, Z, ...)` might not be identical to the ones used in the form `X[Y, Z, ...]` due to type caching.

Everything else follows from there (including flattening of nested Unions).
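The observation above, that `get_args` does not reorder anything and simply returns the Union already stored in the cached `List`, can be checked directly. The names `L1` and `L2` follow the message; the identity results depend on CPython's internal typing cache, so they are printed as observed rather than promised. The last line shows the flattening of nested Unions, which `Union` itself performs (deduplicating and keeping first occurrences) when it is created.

```python
from typing import List, Union, get_args

L1 = List[Union[int, str]]
L2 = List[Union[str, int]]

# On a cache hit, L2 is the very object that was built for L1 ...
print(L1 is L2)
# ... so both calls hand back the same inner Union object.
print(get_args(L1)[0] is get_args(L2)[0])

# Nested Unions are flattened and deduplicated, keeping first occurrences.
print(get_args(Union[int, Union[str, int]]))  # (<class 'int'>, <class 'str'>)
```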
msg380851 - Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2020-11-12 21:28
msg381051 - Author: miss-islington (miss-islington) Date: 2020-11-16 01:31
New changeset c3b9592244a9112d8af9610ff1c4e1e4cd4bfaca by Dominik1123 in branch 'master':
bpo-42317: Improve docs of typing.get_args concerning Union (GH-23254)
msg381053 - Author: miss-islington (miss-islington) Date: 2020-11-16 01:52
New changeset 2369759a47c5292bacf2eef17b4e2388b7d36675 by Miss Islington (bot) in branch '3.9':
bpo-42317: Improve docs of typing.get_args concerning Union (GH-23254)
msg381054 - Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2020-11-16 01:54
Date                 User            Action  Args
2020-11-16 01:54:24  gvanrossum      set     status: open -> closed
                                             resolution: fixed
                                             messages: + msg381054
                                             stage: patch review -> resolved
2020-11-16 01:52:32  miss-islington  set     messages: + msg381053
2020-11-16 01:31:14  miss-islington  set     pull_requests: + pull_request22199
2020-11-16 01:31:03  miss-islington  set     nosy: + miss-islington
                                             messages: + msg381051
2020-11-12 21:28:21  gvanrossum      set     messages: + msg380851
2020-11-12 21:25:08  Dominik V.      set     messages: + msg380850
2020-11-12 20:38:13  Dominik V.      set     keywords: + patch
                                             stage: needs patch -> patch review
                                             pull_requests: + pull_request22150
2020-11-12 15:48:05  kj              set     messages: + msg380831
2020-11-11 18:10:05  gvanrossum      set     messages: + msg380784
                                             stage: needs patch
2020-11-11 14:01:02  kj              set     nosy: + kj, gvanrossum, levkivskyi
                                             messages: + msg380762
                                             versions: + Python 3.10
2020-11-10 19:35:37  Dominik V.      create