Issue43892
This issue tracker has been migrated to GitHub,
and is currently read-only.
For more information,
see the GitHub FAQs in the Python's Developer Guide.
Created on 2021-04-20 11:59 by ncoghlan, last changed 2022-04-11 14:59 by admin. This issue is now closed.
Pull Requests | |||
---|---|---|---|
URL | Status | Linked | Edit |
PR 25521 | closed | ncoghlan, 2021-04-22 09:52 | |
PR 25585 | merged | ncoghlan, 2021-04-25 08:43 | |
PR 25735 | merged | brandtbucher, 2021-04-29 23:02 |
Messages (26) | |||
---|---|---|---|
msg391430 - (view) | Author: Alyssa Coghlan (ncoghlan) * | Date: 2021-04-20 11:59 | |
In the SC submission ticket for PEP 642 (https://github.com/python/steering-council/issues/43 ), Guido indicated he was in favour of the more explicit pattern matching AST changes suggested in that PEP. This ticket covers adapting the pattern matching implementation to explicitly separate pattern nodes from expr nodes in the AST, so the code generator doesn't need to be context dependent. |
|||
msg391433 - (view) | Author: Pablo Galindo Salgado (pablogsal) * | Date: 2021-04-20 12:14 | |
Given that the AST is a public interface, I would advise to implement this before beta freeze of 3.10 to avoid making this a breaking change in the future for linters and the like |
|||
msg391437 - (view) | Author: Alyssa Coghlan (ncoghlan) * | Date: 2021-04-20 13:53 | |
Agreed. I had wanted the AST to be part of the PEPs specifically *because* it's a public API, but didn't check until today whether or not the original PEP 634 implementation had been merged as-is, or with a cleaned up AST definition. https://github.com/ncoghlan/cpython/pull/8/files is the initial implementation with the AST and Grammar file updated. I'll only create the CPython PR once the tests are passing, but I'll try to give Brandt at least a week to review it (I'll also ping him directly via email). |
|||
msg391438 - (view) | Author: Alyssa Coghlan (ncoghlan) * | Date: 2021-04-20 14:01 | |
It's specifically the definition of "match_case" in the AST that is affected: match_case = (pattern pattern, expr? guard, stmt* body) pattern = MatchAlways | MatchValue(expr value) | MatchConstant(constant value) | MatchSequence(pattern* patterns) | MatchMapping(expr* keys, pattern* patterns) | MatchClass(expr cls, pattern* patterns, identifier* extra_attrs, pattern* extra_patterns) | MatchRestOfSequence(identifier? target) -- A NULL entry in the MatchMapping key list handles capturing extra mapping keys | MatchAs(pattern? pattern, identifier target) | MatchOr(pattern* patterns) attributes (int lineno, int col_offset, int? end_lineno, int? end_col_offset) Relative to the PEP 642 AST, the notion of a "matchop" is gone - MatchValue always compares by equality, and there's a MatchConstant node for the identity comparisons against None, True, and False. The grammar, code generator, AST validator, and unparser are then updated to produce or consume the new AST nodes. |
|||
msg391448 - (view) | Author: Brandt Bucher (brandtbucher) * | Date: 2021-04-20 16:12 | |
Hm, for some reason I was thinking that this was safe to do after the feature freeze. Let's get to it then! Batuhan, perhaps we should change the linked issue for your AST validator PR to this one. That way we can close the old catch-all PEP 634 implementation issue and keep the discussion focused here. Nick, you might find the discussion on this mypy PR (https://github.com/python/mypy/pull/10191) helpful. In particular, Adrian raises some points about ways we could make type inference in the AST a bit neater. For example: not all patterns can be used for mapping keys. Perhaps there is a way to organize parts of the AST hierarchy that makes this a bit more concrete (though we should be careful to not limit reasonable syntax extensions in the future by doing so). A few notes, just from skimming the outline here: > MatchValue(expr value) MatchValue technically contains an expression node, although the actual expressions it can contain are quite limited (dotted names and negative/complex numbers, if I understand correctly). Can we gain anything by splitting this up into nodes for each of these (MatchValue(identifier* parts), etc...) instead? > MatchClass(expr cls, pattern* patterns, identifier* extra_attrs, pattern* extra_patterns) If I remember correctly (this part was implemented a while ago), collecting the different positional and keyword arguments in the parser is sort of simple since we can pass around sequences of keyword nodes easily. I *think* that the new proposed design would require hacks like creating dummy MatchClass nodes with *only* the keyword parts and later combining them with the positional arguments and a class name, since it keeps the keyword names separate from their arguments. Maybe some of the parser folks on this thread can chime in on that. Also, continuing the theme above, I think we can be more specific with "expr cls" here (maybe "identifier* cls"). > MatchRestOfSequence(identifier? target) In the interest of brevity, maybe "MatchStar" or something? > A NULL entry in the MatchMapping key list handles capturing extra mapping keys I think we could just give MatchMapping a "identifier? rest" member, right? That way it becomes a bit easier to use, and we don't need to worry if it's last or not. It also keeps us from having to handle empty entries in the keys. Oh, also, the ast unparser will need to be updated as well. |
|||
msg391457 - (view) | Author: Pablo Galindo Salgado (pablogsal) * | Date: 2021-04-20 17:45 | |
Also, I would like some more extensive argument or why not having context dependent nodes justifies the extra maintainance cost (I'm not against that of course, and I am sympathetic, but I think this needs some more extensive coverage or why we are doing this). Notice that we already have context dependent nodes and attributes, as well as implicit meaning of some values like None. |
|||
msg391471 - (view) | Author: Batuhan Taskaya (BTaskaya) * | Date: 2021-04-20 19:36 | |
> Batuhan, perhaps we should change the linked issue for your AST validator PR to this one. That way we can close the old catch-all PEP 634 implementation issue and keep the discussion focused here. I think it might be better as a separate issue that has a dependency on this one. I've created issue 43897 for this. |
|||
msg391485 - (view) | Author: Alyssa Coghlan (ncoghlan) * | Date: 2021-04-20 23:48 | |
https://www.python.org/dev/peps/pep-0642/#representing-patterns-explicitly-in-the-abstract-syntax-tree covers the rationale for why it is potentially problematic to reuse expression nodes for match patterns: it doesn't semantically align with the fact that patterns are *not* expressions, so the AST ends up being closer to a concrete syntax tree than we would like. For the specifics of the AST nodes: * restricting sub expressions directly in the AST itself is tricky without side effects on the AST for non-pattern-matching code, so I'm intending to add any such restrictions that we want to enforce to the validator (other statements and expressions like yield, await, return, break, and continue already have separately enforced restrictions like that) * cls needs to be an expression node to allow for attribute lookups * the linked draft PR already updates the grammar to emit the new AST. MatchClass is able to collect the keyword args and their corresponding patterns in roughly the same way that MatchMapping collects its key:pattern pairs |
|||
msg391599 - (view) | Author: Alyssa Coghlan (ncoghlan) * | Date: 2021-04-22 13:30 | |
The draft PR moved because I messed up the previous branch name: https://github.com/ncoghlan/cpython/pull/9/files It's to the point where everything except symtable.c compiles. I put an initial skeleton of a validator in, but only enough to check that the parallel sequence lengths in the class and mapping pattern nodes are consistent - there's still plenty to be done in Batuhan's PR (e.g. because the AST constant folding now just delegates to the expression folding functions for MatchValue values and MatchMapping keys, this iteration only enforces the restrictions on permitted subexpressions in the surface syntax). In addition to Brandt's MatchStar suggestion, I've also tweaked the MatchClassparameter names to better match PEP 634 (the old names were based on PEP 642's attribute pattern concept, so "extra_*" made sense, but "kwd_*" makes more sense for PEP 634): pattern = MatchAlways | MatchValue(expr value) | MatchConstant(constant value) | MatchSequence(pattern* patterns) | MatchMapping(expr* keys, pattern* patterns) | MatchClass(expr cls, pattern* patterns, identifier* kwd_attrs, pattern* kwd_patterns) | MatchStar(identifier? target) -- A NULL entry in the MatchMapping key list handles capturing extra mapping keys | MatchAs(pattern? pattern, identifier target) | MatchOr(pattern* patterns) attributes (int lineno, int col_offset, int? end_lineno, int? end_col_offset) I think the idea of making the MatchMapping node signature "MatchMapping(expr* keys, pattern* patterns, identifier? rest)" has merit, but I'd like to get this initial iteration reviewed first to minimise the diff in the compiler_pattern_mapping implementation (right now it is fairly clear that the code generation for mapping patterns hasn't changed, but that will become significantly less obvious if/when the "**rest" handling changes away from working the same way _PyAST_Dict works) |
|||
msg391610 - (view) | Author: Batuhan Taskaya (BTaskaya) * | Date: 2021-04-22 16:14 | |
> | MatchStar(identifier? target) +1 on MatchStar(). Much more similiar to the existing node names, and also less semantically-named. > attributes (int lineno, int col_offset, int? end_lineno, int? end_col_offset) I'm not really sure whether we should continue following this old tradition of int? end_* attributes for new sum types like pattern, considering that it was done for supporting the creation of old nodes 3.8< without any change on 3.8>=. pattern subclasses will only be created on 3.10>= so we should be safe by requiring end_* attributes. |
|||
msg391622 - (view) | Author: Pablo Galindo Salgado (pablogsal) * | Date: 2021-04-22 18:46 | |
> I'm not really sure whether we should continue following this old tradition of int? end_* attributes for new sum types like pattern, considering that it was done for supporting the creation of old nodes 3.8< without any change on 3.8>=. pattern subclasses will only be created on 3.10>= so we should be safe by requiring end_* attributes. Yeah, it will be a requirement very soon actually. |
|||
msg391668 - (view) | Author: Alyssa Coghlan (ncoghlan) * | Date: 2021-04-23 09:06 | |
I've removed the "?" from the end attributes for pattern nodes. The draft PR branch now compiles, but I've clearly made a mistake somewhere as it segfaults when trying to compile test_patma.py. |
|||
msg391672 - (view) | Author: Alyssa Coghlan (ncoghlan) * | Date: 2021-04-23 09:11 | |
My proposed MatchConstant node name was bothering me, as most constants are actually matched by equality in MatchValue, the same way variables are. So I'm switching my proposed name for that node to be MatchSingleton, since that better reflects the key difference with MatchValue (i.e. comparing by identity rather than by value, and specifically comparing with the builtin singletons). |
|||
msg391674 - (view) | Author: Alyssa Coghlan (ncoghlan) * | Date: 2021-04-23 09:56 | |
Segfaults are fixed (I had a couple of cases where I wasn't checking for NULL, and another where I was looking up MatchMapping attributes on a MatchClass node). test_patma passes except for test cases 240, 241, and 242, which look like genuine regressions - the logic to reject illegal MatchValue nodes is missing from the code generator side, and these particular values make it through the parser OK. I'll need to add back some of the code I took out so these still get rejected. |
|||
msg391687 - (view) | Author: Alyssa Coghlan (ncoghlan) * | Date: 2021-04-23 12:18 | |
To get the "0+0" matching case rejected again I had to flesh out a bit more of the AST validator. This initial piece of the validator replaces the previously pattern-matching-specific AST optimisation code that refused to fold BinOp expressions that weren't complex numbers, allowing the compiler to reject them. I also added two new tests (240b and 241b) to ensure that 0+0 & f"" were also rejected as mapping keys. 241b passes with the current PR, as f"" gets rejected by both the check in the AST validator *and* by the "constant or attribute lookup" restriction in the compiler, so the latter check covers for the leaky validator. 240b is disabled for now, as it won't pass until the AST validator checks all the subexpressions being used as keys in a mapping pattern (because it gets constant folded to "0", the check in the compiler accepts it). That leaves fixing the unparser tests as the key thing to do before I create the main PR. Before that, though, I'll look at potentially simplifying the MatchMapping code by changing the signature as Brandt suggested. |
|||
msg391694 - (view) | Author: Alyssa Coghlan (ncoghlan) * | Date: 2021-04-23 13:16 | |
https://github.com/ncoghlan/cpython/pull/9/commits/54940a3817df3046da3f9c51d4d426a73b2ec786 implements Brandt's suggestion of disallowing NULL keys in MatchMapping and instead passing in an explicit "rest" identifier to capture the extra keys. This did require moving to a multi-branch pattern definition in the grammar, similar to class patterns, in order to enforce the separating comma when both ordinary key:pattern pairs and a double-star target are present. I think it's a nice ergonomic improvement on the AST node itself though, as it makes it trivial to check if a mapping patterns captures extra keys or not (vs the relatively non-obvious check to see if the last key is NULL/None). That just leaves updating the unparser to handle the new node types. Capturing the latest node definitions from the draft PR: pattern = MatchAlways | MatchValue(expr value) | MatchSingleton(constant value) | MatchSequence(pattern* patterns) | MatchMapping(expr* keys, pattern* patterns, identifier? rest) | MatchClass(expr cls, pattern* patterns, identifier* kwd_attrs, pattern* kwd_patterns) | MatchStar(identifier? target) -- The optional "rest" MatchMapping parameter handles capturing extra mapping keys | MatchAs(pattern? pattern, identifier target) | MatchOr(pattern* patterns) attributes (int lineno, int col_offset, int end_lineno, int end_col_offset) |
|||
msg391695 - (view) | Author: Alyssa Coghlan (ncoghlan) * | Date: 2021-04-23 13:19 | |
Setting clear timing expectations, given beta is so close: I'm not expecting to have time to finish the unparser work over the weekend, but I do expect to have time to finish it on Monday evening my time (UTC+10). |
|||
msg391712 - (view) | Author: Brandt Bucher (brandtbucher) * | Date: 2021-04-23 17:15 | |
I can probably find time to take a pass at the unparser, if you want. > cls needs to be an expression node to allow for attribute lookups That's why I suggested "identifier*"! Each identifier in the list would be one part of the dotted name, so we wouldn't need to worry about having arbitrary expressions: Foo() -> MatchClass(["Foo"], [], [], []) foo.bar.Baz() -> MatchClass(["foo", "bar", "Baz"], [], [], []) Not a big deal, though. |
|||
msg391842 - (view) | Author: Alyssa Coghlan (ncoghlan) * | Date: 2021-04-25 06:57 | |
Interesting idea - I had missed that you were suggested "identifier*" for the cls node. It's simple enough to check for Name-or-Attribute in the AST validator though, and sticking with a standard expr_ty node means getting a lot of standard handling in the rest of the compiler and the unparser. My commitments today finished earlier than I expected, so I'm about to tackle the unparser now. |
|||
msg391844 - (view) | Author: Alyssa Coghlan (ncoghlan) * | Date: 2021-04-25 07:10 | |
Working on the unparser picked up an inconsistency I had previously missed: aside from named expressions, "target" attributes in the AST are generally full expression subnodes, rather than identifier attributes. Identifier attributes on nodes are typically called "name", including in the originally implemented MatchAs node. Accordingly, I've switched both MatchStar and MatchAs over to having "name" attributes instead of "target" attributes. |
|||
msg391845 - (view) | Author: Alyssa Coghlan (ncoghlan) * | Date: 2021-04-25 08:56 | |
PR against the main repo is now up: https://github.com/python/cpython/pull/25585 A couple of things in the unparser tripped me up (ast_unparse.c only handles annotation expressions, the way ast._Unparser.visit() is defined means NodeVisitor.generic_visit can't be used), so the PR also includes some clarifying comments for future editors. |
|||
msg391943 - (view) | Author: Alyssa Coghlan (ncoghlan) * | Date: 2021-04-26 15:27 | |
PR is green now, and I don't have any further changes I want to make, so Brandt, please feel free to merge if you're happy with the proposed AST nodes, and the way I've gone about implementing them. If you'd like some adjustments, it probably makes the most sense to just amend the PR directly - otherwise the 24-hour turnaround on comments due to the time zone difference will potentially give us trouble with the beta deadline. |
|||
msg391950 - (view) | Author: Brandt Bucher (brandtbucher) * | Date: 2021-04-26 17:01 | |
Sounds good. I'll probably be able to make time to review it today, or tomorrow at the latest. |
|||
msg392279 - (view) | Author: Brandt Bucher (brandtbucher) * | Date: 2021-04-29 05:59 | |
New changeset 1e7b858575d0ad782939f86aae4a2fa1c29e9f14 by Nick Coghlan in branch 'master': bpo-43892: Make match patterns explicit in the AST (GH-25585) https://github.com/python/cpython/commit/1e7b858575d0ad782939f86aae4a2fa1c29e9f14 |
|||
msg392300 - (view) | Author: Alyssa Coghlan (ncoghlan) * | Date: 2021-04-29 11:27 | |
After the next docs build, the documentation for the explicit AST nodes should be available https://docs.python.org/dev/library/ast.html#pattern-matching |
|||
msg392368 - (view) | Author: Brandt Bucher (brandtbucher) * | Date: 2021-04-30 00:19 | |
New changeset dbe60ee09dc5a624cfb78dff61ecf050a5b3f105 by Brandt Bucher in branch 'master': bpo-43892: Validate the first term of complex literal value patterns (GH-25735) https://github.com/python/cpython/commit/dbe60ee09dc5a624cfb78dff61ecf050a5b3f105 |
History | |||
---|---|---|---|
Date | User | Action | Args |
2022-04-11 14:59:44 | admin | set | github: 88058 |
2021-04-30 00:19:37 | brandtbucher | set | messages: + msg392368 |
2021-04-29 23:02:22 | brandtbucher | set | pull_requests: + pull_request24426 |
2021-04-29 11:27:27 | ncoghlan | set | status: open -> closed resolution: fixed messages: + msg392300 stage: commit review -> resolved |
2021-04-29 05:59:03 | brandtbucher | set | messages: + msg392279 |
2021-04-26 17:01:27 | brandtbucher | set | messages: + msg391950 |
2021-04-26 15:28:41 | ncoghlan | set | assignee: ncoghlan -> brandtbucher |
2021-04-26 15:28:02 | ncoghlan | set | stage: patch review -> commit review |
2021-04-26 15:27:53 | ncoghlan | set | messages: + msg391943 |
2021-04-25 08:56:33 | ncoghlan | set | messages: + msg391845 |
2021-04-25 08:43:54 | ncoghlan | set | stage: needs patch -> patch review pull_requests: + pull_request24305 |
2021-04-25 07:10:49 | ncoghlan | set | messages: + msg391844 |
2021-04-25 06:57:42 | ncoghlan | set | messages: + msg391842 |
2021-04-23 17:15:46 | brandtbucher | set | messages: + msg391712 |
2021-04-23 13:19:05 | ncoghlan | set | messages: + msg391695 |
2021-04-23 13:16:29 | ncoghlan | set | messages: + msg391694 |
2021-04-23 12:18:32 | ncoghlan | set | messages: + msg391687 |
2021-04-23 09:56:12 | ncoghlan | set | messages: + msg391674 |
2021-04-23 09:11:48 | ncoghlan | set | messages: + msg391672 |
2021-04-23 09:06:18 | ncoghlan | set | messages: + msg391668 |
2021-04-22 18:46:30 | pablogsal | set | messages: + msg391622 |
2021-04-22 16:14:09 | BTaskaya | set | messages: + msg391610 |
2021-04-22 13:30:36 | ncoghlan | set | messages: + msg391599 |
2021-04-22 10:01:39 | ncoghlan | set | stage: patch review -> needs patch |
2021-04-22 09:52:04 | ncoghlan | set | keywords:
+ patch stage: needs patch -> patch review pull_requests: + pull_request24241 |
2021-04-20 23:48:56 | ncoghlan | set | messages: + msg391485 |
2021-04-20 19:36:10 | BTaskaya | set | messages: + msg391471 |
2021-04-20 19:35:46 | BTaskaya | link | issue43897 dependencies |
2021-04-20 18:08:44 | freundTech | set | nosy:
+ freundTech |
2021-04-20 17:45:52 | pablogsal | set | messages: + msg391457 |
2021-04-20 16:12:23 | brandtbucher | set | nosy:
+ BTaskaya messages: + msg391448 |
2021-04-20 14:21:53 | Guido.van.Rossum | set | nosy:
+ Mark.Shannon |
2021-04-20 14:16:27 | Guido.van.Rossum | set | nosy:
+ Guido.van.Rossum |
2021-04-20 14:01:31 | ncoghlan | set | messages: + msg391438 |
2021-04-20 13:53:34 | ncoghlan | set | messages: + msg391437 |
2021-04-20 12:14:20 | pablogsal | set | nosy:
+ pablogsal messages: + msg391433 |
2021-04-20 11:59:54 | ncoghlan | create |