This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: Pattern Matching - star subpattern with a subject derived from collections.abc.Sequence
Type: behavior Stage: resolved
Components: Interpreter Core Versions: Python 3.11, Python 3.10
process
Status: closed Resolution: not a bug
Dependencies: Superseder:
Assigned To: brandtbucher Nosy List: Mark.Shannon, brandtbucher, gvanrossum, pablogsal, quentel, steven.daprano
Priority: high Keywords:

Created on 2021-07-26 14:23 by quentel, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Messages (12)
msg398226 - (view) Author: Pierre Quentel (quentel) * Date: 2021-07-26 14:23
This code

    match range(42):
        case [x, *w, y]:
            z = 0

sets w to a list with 40 items : the length of the subject, minus the number of non-star subpatterns.

But this code (adapted from test_patma_186) enters an infinite loop

    import collections.abc

    class Seq(collections.abc.Sequence):
        def __getitem__(self, i):
            print('get item', i)
            return i
        def __len__(self):
            return 42

    match Seq():
        case [x, *w, y]:
            z = 0

__getitem__ gets called forever, instead of stopping when the expected number of items in w is reached.
msg398244 - (view) Author: Brandt Bucher (brandtbucher) * (Python committer) Date: 2021-07-26 17:24
(Raising the priority to "high" because any decision on this should ideally be made before the 3.10 RCs.)

Hm, interesting. This is because we use UNPACK_EX for these patterns, so the destructuring is basically the same as if you had written:

[x, *w, y] = Seq()

...which also hangs.

We actually have three implementations for destructuring sequences:
- Using UNPACK_EX, when there is a starred name.
- Using BINARY_SUBSCR, when there is a starred wildcard.
- Using UNPACK_SEQUENCE, when there is no star.

When using your Seq class:
- The first fails by falling into an infinite loop.
- The second works as expected.
- The third fails with a ValueError at runtime (for a length mismatch).

*If* we decide that this is a big enough problem to fix, then I think the easiest way of doing it is to use the BINARY_SUBSCR implementation for all three paths (we'll just also need to use BUILD_LIST / LIST_APPEND to actually collect the starred items). It will simplify the pattern compiler a bit, but I imagine it will also come with a performance penalty as well.

In my opinion, I don't think we should rewrite everything to support Seq (though my mind isn't made up entirely). Sequences are supposed to be sized and iterable, but Seq doesn't really support the iteration protocol correctly (it expects the iterating code to stop once the length is exhausted, rather than raising StopIteration).

I'm curious to hear opinions on whether we want to actually fix this, though. It seems that it will always be possible to write classes like Seq the hack our pattern-matching implementation with dubious sequences and mappings, so it really comes down to: is supporting classes like Seq worth potentially slowing down all other sequence patterns?
msg398245 - (view) Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2021-07-26 19:01
I don't think there is anything technically wrong here. The Seq class is relying on the legacy version of the iterator protocol, which specifies that you need to raise IndexError there to stop the iteration.

For instance, this version works:

import collections.abc

class Seq(collections.abc.Sequence):
    def __getitem__(self, i):
        if i >= len(self):
            raise IndexError
        return i
    def __len__(self):
        return 42

match Seq():
    case [x, *w, y]:
        z = 0
        print(z)


Also, one could argue that Seq has exactly the same problem all over the language. For instance

>> list(Seq())

will hang

>> [x,*y] = Seq()

will hang

and so on and so forth. So, if I am a user and I am trying to **predict** how this would behave based on these two experiments my conclusion would be:

"Anything that tries to iterate on this thing will hang"

I think that whatever we do, we should make it *predictable* and consistency across the language, and IMHO only fixing it in pattern matching doesn't make it predictable. Being said that, I don't think these use cases justifies a major overhaul of how this behaves everywhere.
msg398246 - (view) Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2021-07-26 19:01
In general I would propose to close this as "not a bug"
msg398248 - (view) Author: Brandt Bucher (brandtbucher) * (Python committer) Date: 2021-07-26 19:17
I agree, Pablo.

The only thing that makes me pause is that, depending on the pattern, *sometimes* we use iteration and *sometimes* we use indexing. That might be sort of surprising to classes like Seq if you change patterns or major Python versions and this thing starts hanging. Hopefully, though, the reason will be relatively easy to debug and fix.

Closing as "not a bug".
msg398266 - (view) Author: Steven D'Aprano (steven.daprano) * (Python committer) Date: 2021-07-26 22:51
How is this not a regression? And a very serious one by the sound of it. All the examples that are said to go into an infinite loop work fine in 3.9.

(Obviously I can't check the match statement example in 3.9.)

If the Sequence Protocol really has been dropped from Python's execution model, shouldn't there be a PEP for such a major change in behaviour? Legacy or not, it is still a part of the language.

At the very least, the examples shouldn't swallow the IndexError and go into an infinite loop.

Brandt said:

> the destructuring is basically the same as if you had written:
> 
> [x, *w, y] = Seq()
> 
> ...which also hangs.


It works fine in 3.9, with or without inheriting from the Sequence ABC.

# Python 3.9
>>> class A:
...     def __len__(self):
...             return 5
...     def __getitem__(self, i):
...             print(i)
...             if i < 5:
...                     return i
...             raise IndexError
... 
>>> [x, *y, z] = A()
0
1
2
3
4
5
>>> x, y, z
(0, [1, 2, 3], 4)


Likewise for list(A()), etc.

The __len__ method isn't needed for iteration to work correctly. I just included it to match Pierre's example.

Inheriting from collections.abc.Sequence does not change the behaviour.

Still in 3.9:

>>> list(A())
0
1
2
3
4
5
[0, 1, 2, 3, 4]


I think that closing this is as Not A Bug was premature. If this isn't a bug, then I have no idea what counts as a bug any longer :-(
msg398268 - (view) Author: Brandt Bucher (brandtbucher) * (Python committer) Date: 2021-07-26 23:02
> All the examples that are said to go into an infinite loop work fine in 3.9. [...] At the very least, the examples shouldn't swallow the IndexError and go into an infinite loop.

Note that the original example posted did not include a "raise IndexError" branch in its __getitem__. That was class I was referring to, since the lack of an IndexError causes the iteration to continue forever.

As far as I know, nothing has changed about the iteration protocol in 3.10.
msg398271 - (view) Author: Steven D'Aprano (steven.daprano) * (Python committer) Date: 2021-07-26 23:35
Ah, I missed that and totally misinterrpreted other comments. 
Confirmation bias at work: I saw the code I expected to see, not the 
code that was actually there :-(

All good, I agree that it is a bug in the class, not in the language. 
Sorry for the noise.
msg398280 - (view) Author: Pierre Quentel (quentel) * Date: 2021-07-27 06:59
Thanks for the explanations, but I feel unconfortable with the fact that variable-length sequence patterns are implemented the same as unpacking. (sorry if this has been discussed before, I can't find references to the discussions that lead to the current version of the PEP).

There are obvious similarities between

    [x, *y, z] = A

and

    match A:
        case [x, *y, z]:
            print('ok')

but a big difference, put forward in PEP 634 : the classes supported for pattern matching are limited (unpacking a generator expression is possible, but they are not supported as subjects for sequence patterns), and the PEP explicitely refers to them having a len().

It seems to me that this implies that the algorithms should be different:

- for unpacking, the iterable must be completely consumed before binding the names on the left-hand side. If it is infinite, unpacking fails

- for variable-length sequence pattern matching (this is how I understand the last paragraph about them in PEP 634):
    . get the subject length
    . iterate one by one before the star pattern
    . iterate (len(subject) - number of non-star patterns) times for the star pattern
    . iterate one by one after the star pattern

In the second case, even if the subject never raises StopIteration, the match succeeds.

Does this make sense ?
msg398301 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2021-07-27 14:21
Given that the class used to demonstrate this behavior has a bug (__len__ is inconsistent with __getitem__), I don't think it matters much -- if doing it your way is (as I suspect) slower than the current implementation for other (well-behaved) sequences, we should not change the implementation.
msg398309 - (view) Author: Pierre Quentel (quentel) * Date: 2021-07-27 16:34
Oh, I did not invent this class, it is in the test script for pattern matching : https://github.com/python/cpython/blob/6948964ecf94e858448dd28eea634317226d2913/Lib/test/test_patma.py#L1932

With this class, [x, *_, y] matches, but not [x, *w, y] : this is what made me create this issue. Maybe it would be a good idea to change this class in test_patma.py ?

OTOH, if the current implementation remains the same, why does the PEP insist on subjects having a len() ? Could sequence patterns match a wider range of subjects that can be unpacked ?
msg398461 - (view) Author: Pierre Quentel (quentel) * Date: 2021-07-29 07:03
I found why len() is required, it's to avoid trying to match the subject (thus consuming a part of it) if its length is less than the number of non-star patterns, as explained in the PEP.

My mistake, sorry.
History
Date User Action Args
2022-04-11 14:59:47adminsetgithub: 88904
2021-07-29 07:03:04quentelsetmessages: + msg398461
2021-07-27 16:34:24quentelsetmessages: + msg398309
2021-07-27 14:21:05gvanrossumsetmessages: + msg398301
2021-07-27 06:59:16quentelsetmessages: + msg398280
2021-07-26 23:35:27steven.dapranosetmessages: + msg398271
2021-07-26 23:02:59brandtbuchersetmessages: + msg398268
2021-07-26 22:51:02steven.dapranosetnosy: + steven.daprano
messages: + msg398266
2021-07-26 19:17:38brandtbuchersetstatus: open -> closed
type: crash -> behavior
messages: + msg398248

resolution: not a bug
stage: resolved
2021-07-26 19:01:47pablogsalsetmessages: + msg398246
2021-07-26 19:01:02pablogsalsetmessages: + msg398245
2021-07-26 17:24:01brandtbuchersetpriority: normal -> high

nosy: + gvanrossum, Mark.Shannon, pablogsal
messages: + msg398244

assignee: brandtbucher
2021-07-26 16:56:59brandtbuchersetnosy: + brandtbucher
2021-07-26 14:23:39quentelcreate