classification
Title: Make match patterns explicit in the AST
Type: enhancement Stage: resolved
Components: Versions: Python 3.10
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: brandtbucher Nosy List: BTaskaya, Guido.van.Rossum, Mark.Shannon, brandtbucher, freundTech, ncoghlan, pablogsal
Priority: normal Keywords: patch

Created on 2021-04-20 11:59 by ncoghlan, last changed 2021-04-30 00:19 by brandtbucher. This issue is now closed.

Pull Requests
URL Status Linked Edit
PR 25521 closed ncoghlan, 2021-04-22 09:52
PR 25585 merged ncoghlan, 2021-04-25 08:43
PR 25735 merged brandtbucher, 2021-04-29 23:02
Messages (26)
msg391430 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2021-04-20 11:59
In the SC submission ticket for PEP 642 (https://github.com/python/steering-council/issues/43 ), Guido indicated he was in favour of the more explicit pattern matching AST changes suggested in that PEP.

This ticket covers adapting the pattern matching implementation to explicitly separate pattern nodes from expr nodes in the AST, so the code generator doesn't need to be context dependent.
msg391433 - (view) Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2021-04-20 12:14
Given that the AST is a public interface, I would advise to implement this before beta freeze of 3.10 to avoid making this a breaking change in the future for linters and the like
msg391437 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2021-04-20 13:53
Agreed. I had wanted the AST to be part of the PEPs specifically *because* it's a public API, but didn't check until today whether or not the original PEP 634 implementation had been merged as-is, or with a cleaned up AST definition.

https://github.com/ncoghlan/cpython/pull/8/files is the initial implementation with the AST and Grammar file updated.

I'll only create the CPython PR once the tests are passing, but I'll try to give Brandt at least a week to review it (I'll also ping him directly via email).
msg391438 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2021-04-20 14:01
It's specifically the definition of "match_case" in the AST that is affected:

match_case = (pattern pattern, expr? guard, stmt* body)

    pattern = MatchAlways
         | MatchValue(expr value)
         | MatchConstant(constant value)
         | MatchSequence(pattern* patterns)
         | MatchMapping(expr* keys, pattern* patterns)
         | MatchClass(expr cls, pattern* patterns, identifier* extra_attrs, pattern* extra_patterns)

         | MatchRestOfSequence(identifier? target)
         -- A NULL entry in the MatchMapping key list handles capturing extra mapping keys

         | MatchAs(pattern? pattern, identifier target)
         | MatchOr(pattern* patterns)
          attributes (int lineno, int col_offset, int? end_lineno, int? end_col_offset)

Relative to the PEP 642 AST, the notion of a "matchop" is gone - MatchValue always compares by equality, and there's a MatchConstant node for the identity comparisons against None, True, and False.

The grammar, code generator, AST validator, and unparser are then updated to produce or consume the new AST nodes.
msg391448 - (view) Author: Brandt Bucher (brandtbucher) * (Python committer) Date: 2021-04-20 16:12
Hm, for some reason I was thinking that this was safe to do after the feature freeze. Let's get to it then!

Batuhan, perhaps we should change the linked issue for your AST validator PR to this one. That way we can close the old catch-all PEP 634 implementation issue and keep the discussion focused here.

Nick, you might find the discussion on this mypy PR (https://github.com/python/mypy/pull/10191) helpful. In particular, Adrian raises some points about ways we could make type inference in the AST a bit neater. For example: not all patterns can be used for mapping keys. Perhaps there is a way to organize parts of the AST hierarchy that makes this a bit more concrete (though we should be careful to not limit reasonable syntax extensions in the future by doing so).

A few notes, just from skimming the outline here:

> MatchValue(expr value)

MatchValue technically contains an expression node, although the actual expressions it can contain are quite limited (dotted names and negative/complex numbers, if I understand correctly). Can we gain anything by splitting this up into nodes for each of these (MatchValue(identifier* parts), etc...) instead?

> MatchClass(expr cls, pattern* patterns, identifier* extra_attrs, pattern* extra_patterns)

If I remember correctly (this part was implemented a while ago), collecting the different positional and keyword arguments in the parser is sort of simple since we can pass around sequences of keyword nodes easily. I *think* that the new proposed design would require hacks like creating dummy MatchClass nodes with *only* the keyword parts and later combining them with the positional arguments and a class name, since it keeps the keyword names separate from their arguments. Maybe some of the parser folks on this thread can chime in on that.

Also, continuing the theme above, I think we can be more specific with "expr cls" here (maybe "identifier* cls"). 

> MatchRestOfSequence(identifier? target)

In the interest of brevity, maybe "MatchStar" or something?

> A NULL entry in the MatchMapping key list handles capturing extra mapping keys

I think we could just give MatchMapping a "identifier? rest" member, right? That way it becomes a bit easier to use, and we don't need to worry if it's last or not. It also keeps us from having to handle empty entries in the keys.

Oh, also, the ast unparser will need to be updated as well.
msg391457 - (view) Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2021-04-20 17:45
Also, I would like some more extensive argument or why not having context dependent nodes justifies the extra maintainance cost (I'm not against that of course, and I am sympathetic, but I think this needs some more extensive coverage or why we are doing this). Notice that we already have context dependent nodes and attributes, as well as implicit meaning of some values like None.
msg391471 - (view) Author: Batuhan Taskaya (BTaskaya) * (Python committer) Date: 2021-04-20 19:36
> Batuhan, perhaps we should change the linked issue for your AST validator PR to this one. That way we can close the old catch-all PEP 634 implementation issue and keep the discussion focused here.

I think it might be better as a separate issue that has a dependency on this one. I've created issue 43897 for this.
msg391485 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2021-04-20 23:48
https://www.python.org/dev/peps/pep-0642/#representing-patterns-explicitly-in-the-abstract-syntax-tree covers the rationale for why it is potentially problematic to reuse expression nodes for match patterns: it doesn't semantically align with the fact that patterns are *not* expressions, so the AST ends up being closer to a concrete syntax tree than we would like.

For the specifics of the AST nodes:

* restricting sub expressions directly in the AST itself is tricky without side effects on the AST for non-pattern-matching code, so I'm intending to add any such restrictions that we want to enforce to the validator (other statements and expressions like yield, await, return, break, and continue already have separately enforced restrictions like that)
* cls needs to be an expression node to allow for attribute lookups
* the linked draft PR already updates the grammar to emit the new AST. MatchClass is able to collect the keyword args and their corresponding patterns in roughly the same way that MatchMapping collects its key:pattern pairs
msg391599 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2021-04-22 13:30
The draft PR moved because I messed up the previous branch name: https://github.com/ncoghlan/cpython/pull/9/files

It's to the point where everything except symtable.c compiles.

I put an initial skeleton of a validator in, but only enough to check that the parallel sequence lengths in the class and mapping pattern nodes are consistent - there's still plenty to be done in Batuhan's PR (e.g. because the AST constant folding now just delegates to the expression folding functions for MatchValue values and MatchMapping keys, this iteration only enforces the restrictions on permitted subexpressions in the surface syntax).


In addition to Brandt's MatchStar suggestion, I've also tweaked the MatchClassparameter names to better match PEP 634 (the old names were based on PEP 642's attribute pattern concept, so "extra_*" made sense, but "kwd_*" makes more sense for PEP 634):

    pattern = MatchAlways
         | MatchValue(expr value)
         | MatchConstant(constant value)
         | MatchSequence(pattern* patterns)
         | MatchMapping(expr* keys, pattern* patterns)
         | MatchClass(expr cls, pattern* patterns, identifier* kwd_attrs, pattern* kwd_patterns)

         | MatchStar(identifier? target)
         -- A NULL entry in the MatchMapping key list handles capturing extra mapping keys

         | MatchAs(pattern? pattern, identifier target)
         | MatchOr(pattern* patterns)
          attributes (int lineno, int col_offset, int? end_lineno, int? end_col_offset)

I think the idea of making the MatchMapping node signature "MatchMapping(expr* keys, pattern* patterns, identifier? rest)" has merit, but I'd like to get this initial iteration reviewed first to minimise the diff in the compiler_pattern_mapping implementation (right now it is fairly clear that the code generation for mapping patterns hasn't changed, but that will become significantly less obvious if/when the "**rest" handling changes away from working the same way _PyAST_Dict works)
msg391610 - (view) Author: Batuhan Taskaya (BTaskaya) * (Python committer) Date: 2021-04-22 16:14
>          | MatchStar(identifier? target)

+1 on MatchStar(). Much more similiar to the existing node names, and also less semantically-named. 

>           attributes (int lineno, int col_offset, int? end_lineno, int? end_col_offset)

I'm not really sure whether we should continue following this old tradition of int? end_* attributes for new sum types like pattern, considering that it was done for supporting the creation of old nodes 3.8< without any change on 3.8>=. pattern subclasses will only be created on 3.10>= so we should be safe by requiring end_* attributes.
msg391622 - (view) Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2021-04-22 18:46
> I'm not really sure whether we should continue following this old tradition of int? end_* attributes for new sum types like pattern, considering that it was done for supporting the creation of old nodes 3.8< without any change on 3.8>=. pattern subclasses will only be created on 3.10>= so we should be safe by requiring end_* attributes.

Yeah, it will be a requirement very soon actually.
msg391668 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2021-04-23 09:06
I've removed the "?" from the end attributes for pattern nodes.

The draft PR branch now compiles, but I've clearly made a mistake somewhere as it segfaults when trying to compile test_patma.py.
msg391672 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2021-04-23 09:11
My proposed MatchConstant node name was bothering me, as most constants are actually matched by equality in MatchValue, the same way variables are.

So I'm switching my proposed name for that node to be MatchSingleton, since that better reflects the key difference with MatchValue (i.e. comparing by identity rather than by value, and specifically comparing with the builtin singletons).
msg391674 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2021-04-23 09:56
Segfaults are fixed (I had a couple of cases where I wasn't checking for NULL, and another where I was looking up MatchMapping attributes on a MatchClass node).

test_patma passes except for test cases 240, 241, and 242, which look like genuine regressions - the logic to reject illegal MatchValue nodes is missing from the code generator side, and these particular values make it through the parser OK. I'll need to add back some of the code I took out so these still get rejected.
msg391687 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2021-04-23 12:18
To get the "0+0" matching case rejected again I had to flesh out a bit more of the AST validator. This initial piece of the validator replaces the previously pattern-matching-specific AST optimisation code that refused to fold BinOp expressions that weren't complex numbers, allowing the compiler to reject them.

I also added two new tests (240b and 241b) to ensure that 0+0 & f"" were also rejected as mapping keys. 241b passes with the current PR, as f"" gets rejected by both the check in the AST validator *and* by the "constant or attribute lookup" restriction in the compiler, so the latter check covers for the leaky validator.

240b is disabled for now, as it won't pass until the AST validator checks all the subexpressions being used as keys in a mapping pattern (because it gets constant folded to "0", the check in the compiler accepts it).

That leaves fixing the unparser tests as the key thing to do before I create the main PR.

Before that, though, I'll look at potentially simplifying the MatchMapping code by changing the signature as Brandt suggested.
msg391694 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2021-04-23 13:16
https://github.com/ncoghlan/cpython/pull/9/commits/54940a3817df3046da3f9c51d4d426a73b2ec786 implements Brandt's suggestion of disallowing NULL keys in MatchMapping and instead passing in an explicit "rest" identifier to capture the extra keys.

This did require moving to a multi-branch pattern definition in the grammar, similar to class patterns, in order to enforce the separating comma when both ordinary key:pattern pairs and a double-star target are present. I think it's a nice ergonomic improvement on the AST node itself though, as it makes it trivial to check if a mapping patterns captures extra keys or not (vs the relatively non-obvious check to see if the last key is NULL/None).

That just leaves updating the unparser to handle the new node types.

Capturing the latest node definitions from the draft PR:

    pattern = MatchAlways
         | MatchValue(expr value)
         | MatchSingleton(constant value)
         | MatchSequence(pattern* patterns)
         | MatchMapping(expr* keys, pattern* patterns, identifier? rest)
         | MatchClass(expr cls, pattern* patterns, identifier* kwd_attrs, pattern* kwd_patterns)

         | MatchStar(identifier? target)
         -- The optional "rest" MatchMapping parameter handles capturing extra mapping keys

         | MatchAs(pattern? pattern, identifier target)
         | MatchOr(pattern* patterns)
          attributes (int lineno, int col_offset, int end_lineno, int end_col_offset)
msg391695 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2021-04-23 13:19
Setting clear timing expectations, given beta is so close: I'm not expecting to have time to finish the unparser work over the weekend, but I do expect to have time to finish it on Monday evening my time (UTC+10).
msg391712 - (view) Author: Brandt Bucher (brandtbucher) * (Python committer) Date: 2021-04-23 17:15
I can probably find time to take a pass at the unparser, if you want.

> cls needs to be an expression node to allow for attribute lookups

That's why I suggested "identifier*"! Each identifier in the list would be one part of the dotted name, so we wouldn't need to worry about having arbitrary expressions:

Foo() -> MatchClass(["Foo"], [], [], [])
foo.bar.Baz() -> MatchClass(["foo", "bar", "Baz"], [], [], [])

Not a big deal, though.
msg391842 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2021-04-25 06:57
Interesting idea - I had missed that you were suggested "identifier*" for the cls node. It's simple enough to check for Name-or-Attribute in the AST validator though, and sticking with a standard expr_ty node means getting a lot of standard handling in the rest of the compiler and the unparser.

My commitments today finished earlier than I expected, so I'm about to tackle the unparser now.
msg391844 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2021-04-25 07:10
Working on the unparser picked up an inconsistency I had previously missed: aside from named expressions, "target" attributes in the AST are generally full expression subnodes, rather than identifier attributes. Identifier attributes on nodes are typically called "name", including in the originally implemented MatchAs node.

Accordingly, I've switched both MatchStar and MatchAs over to having "name" attributes instead of "target" attributes.
msg391845 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2021-04-25 08:56
PR against the main repo is now up: https://github.com/python/cpython/pull/25585

A couple of things in the unparser tripped me up (ast_unparse.c only handles annotation expressions, the way ast._Unparser.visit() is defined means NodeVisitor.generic_visit can't be used), so the PR also includes some clarifying comments for future editors.
msg391943 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2021-04-26 15:27
PR is green now, and I don't have any further changes I want to make, so Brandt, please feel free to merge if you're happy with the proposed AST nodes, and the way I've gone about implementing them.

If you'd like some adjustments, it probably makes the most sense to just amend the PR directly - otherwise the 24-hour turnaround on comments due to the time zone difference will potentially give us trouble with the beta deadline.
msg391950 - (view) Author: Brandt Bucher (brandtbucher) * (Python committer) Date: 2021-04-26 17:01
Sounds good. I'll probably be able to make time to review it today, or tomorrow at the latest.
msg392279 - (view) Author: Brandt Bucher (brandtbucher) * (Python committer) Date: 2021-04-29 05:59
New changeset 1e7b858575d0ad782939f86aae4a2fa1c29e9f14 by Nick Coghlan in branch 'master':
bpo-43892: Make match patterns explicit in the AST (GH-25585)
https://github.com/python/cpython/commit/1e7b858575d0ad782939f86aae4a2fa1c29e9f14
msg392300 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2021-04-29 11:27
After the next docs build, the documentation for the explicit AST nodes should be available https://docs.python.org/dev/library/ast.html#pattern-matching
msg392368 - (view) Author: Brandt Bucher (brandtbucher) * (Python committer) Date: 2021-04-30 00:19
New changeset dbe60ee09dc5a624cfb78dff61ecf050a5b3f105 by Brandt Bucher in branch 'master':
bpo-43892: Validate the first term of complex literal value patterns (GH-25735)
https://github.com/python/cpython/commit/dbe60ee09dc5a624cfb78dff61ecf050a5b3f105
History
Date User Action Args
2021-04-30 00:19:37brandtbuchersetmessages: + msg392368
2021-04-29 23:02:22brandtbuchersetpull_requests: + pull_request24426
2021-04-29 11:27:27ncoghlansetstatus: open -> closed
resolution: fixed
messages: + msg392300

stage: commit review -> resolved
2021-04-29 05:59:03brandtbuchersetmessages: + msg392279
2021-04-26 17:01:27brandtbuchersetmessages: + msg391950
2021-04-26 15:28:41ncoghlansetassignee: ncoghlan -> brandtbucher
2021-04-26 15:28:02ncoghlansetstage: patch review -> commit review
2021-04-26 15:27:53ncoghlansetmessages: + msg391943
2021-04-25 08:56:33ncoghlansetmessages: + msg391845
2021-04-25 08:43:54ncoghlansetstage: needs patch -> patch review
pull_requests: + pull_request24305
2021-04-25 07:10:49ncoghlansetmessages: + msg391844
2021-04-25 06:57:42ncoghlansetmessages: + msg391842
2021-04-23 17:15:46brandtbuchersetmessages: + msg391712
2021-04-23 13:19:05ncoghlansetmessages: + msg391695
2021-04-23 13:16:29ncoghlansetmessages: + msg391694
2021-04-23 12:18:32ncoghlansetmessages: + msg391687
2021-04-23 09:56:12ncoghlansetmessages: + msg391674
2021-04-23 09:11:48ncoghlansetmessages: + msg391672
2021-04-23 09:06:18ncoghlansetmessages: + msg391668
2021-04-22 18:46:30pablogsalsetmessages: + msg391622
2021-04-22 16:14:09BTaskayasetmessages: + msg391610
2021-04-22 13:30:36ncoghlansetmessages: + msg391599
2021-04-22 10:01:39ncoghlansetstage: patch review -> needs patch
2021-04-22 09:52:04ncoghlansetkeywords: + patch
stage: needs patch -> patch review
pull_requests: + pull_request24241
2021-04-20 23:48:56ncoghlansetmessages: + msg391485
2021-04-20 19:36:10BTaskayasetmessages: + msg391471
2021-04-20 19:35:46BTaskayalinkissue43897 dependencies
2021-04-20 18:08:44freundTechsetnosy: + freundTech
2021-04-20 17:45:52pablogsalsetmessages: + msg391457
2021-04-20 16:12:23brandtbuchersetnosy: + BTaskaya
messages: + msg391448
2021-04-20 14:21:53Guido.van.Rossumsetnosy: + Mark.Shannon
2021-04-20 14:16:27Guido.van.Rossumsetnosy: + Guido.van.Rossum
2021-04-20 14:01:31ncoghlansetmessages: + msg391438
2021-04-20 13:53:34ncoghlansetmessages: + msg391437
2021-04-20 12:14:20pablogsalsetnosy: + pablogsal
messages: + msg391433
2021-04-20 11:59:54ncoghlancreate