Make match patterns explicit in the AST #88058

ncoghlan · 2021-04-20T11:59:55Z

BPO	43892
Nosy	@ncoghlan, @markshannon, @gvanrossum, @pablogsal, @brandtbucher, @isidentical, @freundTech
PRs	WIP bpo-43892: Make match patterns explicit in AST #25521 bpo-43892: Make match patterns explicit in AST #25585 bpo-43892: Validate the first term of complex literal value patterns #25735

^{Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.}

Show more details

GitHub fields:

assignee = 'https://github.com/brandtbucher'
closed_at = <Date 2021-04-29.11:27:27.671>
created_at = <Date 2021-04-20.11:59:54.593>
labels = ['type-feature', '3.10']
title = 'Make match patterns explicit in the AST'
updated_at = <Date 2021-04-30.00:19:37.153>
user = 'https://github.com/ncoghlan'

bugs.python.org fields:

activity = <Date 2021-04-30.00:19:37.153>
actor = 'brandtbucher'
assignee = 'brandtbucher'
closed = True
closed_date = <Date 2021-04-29.11:27:27.671>
closer = 'ncoghlan'
components = []
creation = <Date 2021-04-20.11:59:54.593>
creator = 'ncoghlan'
dependencies = []
files = []
hgrepos = []
issue_num = 43892
keywords = ['patch']
message_count = 26.0
messages = ['391430', '391433', '391437', '391438', '391448', '391457', '391471', '391485', '391599', '391610', '391622', '391668', '391672', '391674', '391687', '391694', '391695', '391712', '391842', '391844', '391845', '391943', '391950', '392279', '392300', '392368']
nosy_count = 7.0
nosy_names = ['ncoghlan', 'Mark.Shannon', 'Guido.van.Rossum', 'pablogsal', 'brandtbucher', 'BTaskaya', 'freundTech']
pr_nums = ['25521', '25585', '25735']
priority = 'normal'
resolution = 'fixed'
stage = 'resolved'
status = 'closed'
superseder = None
type = 'enhancement'
url = 'https://bugs.python.org/issue43892'
versions = ['Python 3.10']

ncoghlan · 2021-04-20T11:59:54Z

In the SC submission ticket for PEP-642 (python/steering-council#43 ), Guido indicated he was in favour of the more explicit pattern matching AST changes suggested in that PEP.

This ticket covers adapting the pattern matching implementation to explicitly separate pattern nodes from expr nodes in the AST, so the code generator doesn't need to be context dependent.

pablogsal · 2021-04-20T12:14:20Z

Given that the AST is a public interface, I would advise to implement this before beta freeze of 3.10 to avoid making this a breaking change in the future for linters and the like

ncoghlan · 2021-04-20T13:53:35Z

Agreed. I had wanted the AST to be part of the PEPs specifically *because* it's a public API, but didn't check until today whether or not the original PEP-634 implementation had been merged as-is, or with a cleaned up AST definition.

https://github.com/ncoghlan/cpython/pull/8/files is the initial implementation with the AST and Grammar file updated.

I'll only create the CPython PR once the tests are passing, but I'll try to give Brandt at least a week to review it (I'll also ping him directly via email).

ncoghlan · 2021-04-20T14:01:31Z

It's specifically the definition of "match_case" in the AST that is affected:

match_case = (pattern pattern, expr? guard, stmt* body)

    pattern = MatchAlways
         | MatchValue(expr value)
         | MatchConstant(constant value)
         | MatchSequence(pattern* patterns)
         | MatchMapping(expr* keys, pattern* patterns)
         | MatchClass(expr cls, pattern* patterns, identifier* extra_attrs, pattern* extra_patterns)

     | MatchRestOfSequence(identifier? target)
     -- A NULL entry in the MatchMapping key list handles capturing extra mapping keys

     | MatchAs(pattern? pattern, identifier target)
     | MatchOr(pattern* patterns)
      attributes (int lineno, int col_offset, int? end_lineno, int? end_col_offset)

Relative to the PEP-642 AST, the notion of a "matchop" is gone - MatchValue always compares by equality, and there's a MatchConstant node for the identity comparisons against None, True, and False.

The grammar, code generator, AST validator, and unparser are then updated to produce or consume the new AST nodes.

brandtbucher · 2021-04-20T16:12:23Z

Hm, for some reason I was thinking that this was safe to do after the feature freeze. Let's get to it then!

Batuhan, perhaps we should change the linked issue for your AST validator PR to this one. That way we can close the old catch-all PEP-634 implementation issue and keep the discussion focused here.

Nick, you might find the discussion on this mypy PR (python/mypy#10191) helpful. In particular, Adrian raises some points about ways we could make type inference in the AST a bit neater. For example: not all patterns can be used for mapping keys. Perhaps there is a way to organize parts of the AST hierarchy that makes this a bit more concrete (though we should be careful to not limit reasonable syntax extensions in the future by doing so).

A few notes, just from skimming the outline here:

MatchValue(expr value)

MatchValue technically contains an expression node, although the actual expressions it can contain are quite limited (dotted names and negative/complex numbers, if I understand correctly). Can we gain anything by splitting this up into nodes for each of these (MatchValue(identifier* parts), etc...) instead?

MatchClass(expr cls, pattern* patterns, identifier* extra_attrs, pattern* extra_patterns)

If I remember correctly (this part was implemented a while ago), collecting the different positional and keyword arguments in the parser is sort of simple since we can pass around sequences of keyword nodes easily. I *think* that the new proposed design would require hacks like creating dummy MatchClass nodes with *only* the keyword parts and later combining them with the positional arguments and a class name, since it keeps the keyword names separate from their arguments. Maybe some of the parser folks on this thread can chime in on that.

Also, continuing the theme above, I think we can be more specific with "expr cls" here (maybe "identifier* cls").

MatchRestOfSequence(identifier? target)

In the interest of brevity, maybe "MatchStar" or something?

A NULL entry in the MatchMapping key list handles capturing extra mapping keys

I think we could just give MatchMapping a "identifier? rest" member, right? That way it becomes a bit easier to use, and we don't need to worry if it's last or not. It also keeps us from having to handle empty entries in the keys.

Oh, also, the ast unparser will need to be updated as well.

pablogsal · 2021-04-20T17:45:52Z

Also, I would like some more extensive argument or why not having context dependent nodes justifies the extra maintainance cost (I'm not against that of course, and I am sympathetic, but I think this needs some more extensive coverage or why we are doing this). Notice that we already have context dependent nodes and attributes, as well as implicit meaning of some values like None.

isidentical · 2021-04-20T19:36:11Z

Batuhan, perhaps we should change the linked issue for your AST validator PR to this one. That way we can close the old catch-all PEP-634 implementation issue and keep the discussion focused here.

I think it might be better as a separate issue that has a dependency on this one. I've created bpo-43897 for this.

ncoghlan · 2021-04-20T23:48:56Z

https://www.python.org/dev/peps/pep-0642/#representing-patterns-explicitly-in-the-abstract-syntax-tree covers the rationale for why it is potentially problematic to reuse expression nodes for match patterns: it doesn't semantically align with the fact that patterns are *not* expressions, so the AST ends up being closer to a concrete syntax tree than we would like.

For the specifics of the AST nodes:

restricting sub expressions directly in the AST itself is tricky without side effects on the AST for non-pattern-matching code, so I'm intending to add any such restrictions that we want to enforce to the validator (other statements and expressions like yield, await, return, break, and continue already have separately enforced restrictions like that)
cls needs to be an expression node to allow for attribute lookups
the linked draft PR already updates the grammar to emit the new AST. MatchClass is able to collect the keyword args and their corresponding patterns in roughly the same way that MatchMapping collects its key:pattern pairs

ncoghlan · 2021-04-22T13:30:36Z

The draft PR moved because I messed up the previous branch name: https://github.com/ncoghlan/cpython/pull/9/files

It's to the point where everything except symtable.c compiles.

I put an initial skeleton of a validator in, but only enough to check that the parallel sequence lengths in the class and mapping pattern nodes are consistent - there's still plenty to be done in Batuhan's PR (e.g. because the AST constant folding now just delegates to the expression folding functions for MatchValue values and MatchMapping keys, this iteration only enforces the restrictions on permitted subexpressions in the surface syntax).

In addition to Brandt's MatchStar suggestion, I've also tweaked the MatchClassparameter names to better match PEP-634 (the old names were based on PEP-642's attribute pattern concept, so "extra_" made sense, but "kwd_" makes more sense for PEP-634):

    pattern = MatchAlways
         | MatchValue(expr value)
         | MatchConstant(constant value)
         | MatchSequence(pattern* patterns)
         | MatchMapping(expr* keys, pattern* patterns)
         | MatchClass(expr cls, pattern* patterns, identifier* kwd_attrs, pattern* kwd_patterns)

     | MatchStar(identifier? target)
     -- A NULL entry in the MatchMapping key list handles capturing extra mapping keys

     | MatchAs(pattern? pattern, identifier target)
     | MatchOr(pattern* patterns)
      attributes (int lineno, int col_offset, int? end_lineno, int? end_col_offset)

I think the idea of making the MatchMapping node signature "MatchMapping(expr* keys, pattern* patterns, identifier? rest)" has merit, but I'd like to get this initial iteration reviewed first to minimise the diff in the compiler_pattern_mapping implementation (right now it is fairly clear that the code generation for mapping patterns hasn't changed, but that will become significantly less obvious if/when the "**rest" handling changes away from working the same way _PyAST_Dict works)

isidentical · 2021-04-22T16:14:09Z

     | MatchStar(identifier? target)

+1 on MatchStar(). Much more similiar to the existing node names, and also less semantically-named.

      attributes (int lineno, int col_offset, int? end_lineno, int? end_col_offset)

I'm not really sure whether we should continue following this old tradition of int? end_* attributes for new sum types like pattern, considering that it was done for supporting the creation of old nodes 3.8< without any change on 3.8>=. pattern subclasses will only be created on 3.10>= so we should be safe by requiring end_* attributes.

pablogsal · 2021-04-22T18:46:30Z

I'm not really sure whether we should continue following this old tradition of int? end_* attributes for new sum types like pattern, considering that it was done for supporting the creation of old nodes 3.8< without any change on 3.8>=. pattern subclasses will only be created on 3.10>= so we should be safe by requiring end_* attributes.

Yeah, it will be a requirement very soon actually.

ncoghlan · 2021-04-23T09:06:18Z

I've removed the "?" from the end attributes for pattern nodes.

The draft PR branch now compiles, but I've clearly made a mistake somewhere as it segfaults when trying to compile test_patma.py.

ncoghlan · 2021-04-23T09:11:48Z

My proposed MatchConstant node name was bothering me, as most constants are actually matched by equality in MatchValue, the same way variables are.

So I'm switching my proposed name for that node to be MatchSingleton, since that better reflects the key difference with MatchValue (i.e. comparing by identity rather than by value, and specifically comparing with the builtin singletons).

ncoghlan · 2021-04-23T09:56:13Z

Segfaults are fixed (I had a couple of cases where I wasn't checking for NULL, and another where I was looking up MatchMapping attributes on a MatchClass node).

test_patma passes except for test cases 240, 241, and 242, which look like genuine regressions - the logic to reject illegal MatchValue nodes is missing from the code generator side, and these particular values make it through the parser OK. I'll need to add back some of the code I took out so these still get rejected.

ncoghlan · 2021-04-23T12:18:32Z

To get the "0+0" matching case rejected again I had to flesh out a bit more of the AST validator. This initial piece of the validator replaces the previously pattern-matching-specific AST optimisation code that refused to fold BinOp expressions that weren't complex numbers, allowing the compiler to reject them.

I also added two new tests (240b and 241b) to ensure that 0+0 & f"" were also rejected as mapping keys. 241b passes with the current PR, as f"" gets rejected by both the check in the AST validator *and* by the "constant or attribute lookup" restriction in the compiler, so the latter check covers for the leaky validator.

240b is disabled for now, as it won't pass until the AST validator checks all the subexpressions being used as keys in a mapping pattern (because it gets constant folded to "0", the check in the compiler accepts it).

That leaves fixing the unparser tests as the key thing to do before I create the main PR.

Before that, though, I'll look at potentially simplifying the MatchMapping code by changing the signature as Brandt suggested.

ncoghlan · 2021-04-23T13:16:29Z

ncoghlan@54940a3 implements Brandt's suggestion of disallowing NULL keys in MatchMapping and instead passing in an explicit "rest" identifier to capture the extra keys.

This did require moving to a multi-branch pattern definition in the grammar, similar to class patterns, in order to enforce the separating comma when both ordinary key:pattern pairs and a double-star target are present. I think it's a nice ergonomic improvement on the AST node itself though, as it makes it trivial to check if a mapping patterns captures extra keys or not (vs the relatively non-obvious check to see if the last key is NULL/None).

That just leaves updating the unparser to handle the new node types.

Capturing the latest node definitions from the draft PR:

    pattern = MatchAlways
         | MatchValue(expr value)
         | MatchSingleton(constant value)
         | MatchSequence(pattern* patterns)
         | MatchMapping(expr* keys, pattern* patterns, identifier? rest)
         | MatchClass(expr cls, pattern* patterns, identifier* kwd_attrs, pattern* kwd_patterns)

     | MatchStar(identifier? target)
     -- The optional "rest" MatchMapping parameter handles capturing extra mapping keys

     | MatchAs(pattern? pattern, identifier target)
     | MatchOr(pattern* patterns)
      attributes (int lineno, int col_offset, int end_lineno, int end_col_offset)

ncoghlan · 2021-04-23T13:19:05Z

Setting clear timing expectations, given beta is so close: I'm not expecting to have time to finish the unparser work over the weekend, but I do expect to have time to finish it on Monday evening my time (UTC+10).

brandtbucher · 2021-04-23T17:15:46Z

I can probably find time to take a pass at the unparser, if you want.

cls needs to be an expression node to allow for attribute lookups

That's why I suggested "identifier*"! Each identifier in the list would be one part of the dotted name, so we wouldn't need to worry about having arbitrary expressions:

Foo() -> MatchClass(["Foo"], [], [], [])
foo.bar.Baz() -> MatchClass(["foo", "bar", "Baz"], [], [], [])

Not a big deal, though.

ncoghlan · 2021-04-25T06:57:42Z

Interesting idea - I had missed that you were suggested "identifier*" for the cls node. It's simple enough to check for Name-or-Attribute in the AST validator though, and sticking with a standard expr_ty node means getting a lot of standard handling in the rest of the compiler and the unparser.

My commitments today finished earlier than I expected, so I'm about to tackle the unparser now.

ncoghlan · 2021-04-25T07:10:49Z

Working on the unparser picked up an inconsistency I had previously missed: aside from named expressions, "target" attributes in the AST are generally full expression subnodes, rather than identifier attributes. Identifier attributes on nodes are typically called "name", including in the originally implemented MatchAs node.

Accordingly, I've switched both MatchStar and MatchAs over to having "name" attributes instead of "target" attributes.

ncoghlan · 2021-04-25T08:56:33Z

PR against the main repo is now up: #25585

A couple of things in the unparser tripped me up (ast_unparse.c only handles annotation expressions, the way ast._Unparser.visit() is defined means NodeVisitor.generic_visit can't be used), so the PR also includes some clarifying comments for future editors.

ncoghlan · 2021-04-26T15:27:54Z

PR is green now, and I don't have any further changes I want to make, so Brandt, please feel free to merge if you're happy with the proposed AST nodes, and the way I've gone about implementing them.

If you'd like some adjustments, it probably makes the most sense to just amend the PR directly - otherwise the 24-hour turnaround on comments due to the time zone difference will potentially give us trouble with the beta deadline.

brandtbucher · 2021-04-26T17:01:27Z

Sounds good. I'll probably be able to make time to review it today, or tomorrow at the latest.

brandtbucher · 2021-04-29T05:59:04Z

New changeset 1e7b858 by Nick Coghlan in branch 'master':
bpo-43892: Make match patterns explicit in the AST (GH-25585)
1e7b858

ncoghlan · 2021-04-29T11:27:28Z

After the next docs build, the documentation for the explicit AST nodes should be available https://docs.python.org/dev/library/ast.html#pattern-matching

brandtbucher · 2021-04-30T00:19:37Z

New changeset dbe60ee by Brandt Bucher in branch 'master':
bpo-43892: Validate the first term of complex literal value patterns (GH-25735)
dbe60ee

ncoghlan self-assigned this Apr 20, 2021

ncoghlan added 3.10 only security fixes type-feature A feature request or enhancement labels Apr 20, 2021

ncoghlan self-assigned this Apr 20, 2021

ncoghlan added 3.10 only security fixes type-feature A feature request or enhancement labels Apr 20, 2021

ncoghlan assigned brandtbucher and unassigned ncoghlan Apr 26, 2021

ncoghlan closed this as completed Apr 29, 2021

ezio-melotti transferred this issue from another repository Apr 10, 2022

jacobtylerwalls mentioned this issue Jun 27, 2023

AST nodes for PEP 695 type param syntax do not require end_lineno nor end_col_offset #106145

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Make match patterns explicit in the AST #88058

Make match patterns explicit in the AST #88058

ncoghlan commented Apr 20, 2021

ncoghlan commented Apr 20, 2021

pablogsal commented Apr 20, 2021

ncoghlan commented Apr 20, 2021

ncoghlan commented Apr 20, 2021

brandtbucher commented Apr 20, 2021

pablogsal commented Apr 20, 2021

isidentical commented Apr 20, 2021

ncoghlan commented Apr 20, 2021

ncoghlan commented Apr 22, 2021

isidentical commented Apr 22, 2021

pablogsal commented Apr 22, 2021

ncoghlan commented Apr 23, 2021

ncoghlan commented Apr 23, 2021

ncoghlan commented Apr 23, 2021

ncoghlan commented Apr 23, 2021

ncoghlan commented Apr 23, 2021

ncoghlan commented Apr 23, 2021

brandtbucher commented Apr 23, 2021

ncoghlan commented Apr 25, 2021

ncoghlan commented Apr 25, 2021

ncoghlan commented Apr 25, 2021

ncoghlan commented Apr 26, 2021

brandtbucher commented Apr 26, 2021

brandtbucher commented Apr 29, 2021

ncoghlan commented Apr 29, 2021

brandtbucher commented Apr 30, 2021

Make match patterns explicit in the AST #88058

Make match patterns explicit in the AST #88058

Comments

ncoghlan commented Apr 20, 2021

ncoghlan commented Apr 20, 2021

pablogsal commented Apr 20, 2021

ncoghlan commented Apr 20, 2021

ncoghlan commented Apr 20, 2021

brandtbucher commented Apr 20, 2021

pablogsal commented Apr 20, 2021

isidentical commented Apr 20, 2021

ncoghlan commented Apr 20, 2021

ncoghlan commented Apr 22, 2021

isidentical commented Apr 22, 2021

pablogsal commented Apr 22, 2021

ncoghlan commented Apr 23, 2021

ncoghlan commented Apr 23, 2021

ncoghlan commented Apr 23, 2021

ncoghlan commented Apr 23, 2021

ncoghlan commented Apr 23, 2021

ncoghlan commented Apr 23, 2021

brandtbucher commented Apr 23, 2021

ncoghlan commented Apr 25, 2021

ncoghlan commented Apr 25, 2021

ncoghlan commented Apr 25, 2021

ncoghlan commented Apr 26, 2021

brandtbucher commented Apr 26, 2021

brandtbucher commented Apr 29, 2021

ncoghlan commented Apr 29, 2021

brandtbucher commented Apr 30, 2021