classification
Title: Make lib2to3 grammar better match Python, support the := walrus
Type: behavior Stage: commit review
Components: 2to3 (2.x to 3.x conversion tool) Versions: Python 3.10, Python 3.9, Python 3.8
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: gregory.p.smith Nosy List: BTaskaya, Peter Ludemann, benjamin.peterson, fireattack, georg.brandl, gregory.p.smith, lisroach, lukasz.langa, miss-islington, pablogsal, thatch
Priority: normal Keywords: patch

Created on 2019-04-06 01:41 by thatch, last changed 2020-12-14 18:13 by gregory.p.smith. This issue is now closed.

Pull Requests
URL Status Linked Edit
PR 12702 merged python-dev, 2019-04-06 01:42
PR 12703 closed thatch, 2019-04-06 01:55
PR 19315 merged miss-islington, 2020-04-02 22:36
PR 19317 merged thatch, 2020-04-02 23:43
PR 23759 merged gregory.p.smith, 2020-12-14 08:20
PR 23768 merged miss-islington, 2020-12-14 17:10
PR 23769 merged miss-islington, 2020-12-14 17:10
Messages (21)
msg339522 - (view) Author: Tim Hatch (thatch) * Date: 2019-04-06 01:41
The grammar in lib2to3 is out of date and can't parse `:=` or `f(**not x)`, both of which I hit when running it on real code.  I've done a cursory `diff -uw Grammar/Grammar Lib/lib2to3/grammar.txt`, and would like to fix lib2to3 so we can merge into both fissix and blib2to3, to avoid further divergence of the forks.

I'm unsure if I need a separate bug per pull request, but need at least one to get started.
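
For illustration, a minimal sketch of how the two constructs from the report can be checked. It assumes Python 3.8 or newer for `ast.parse`, and it treats lib2to3 as optional because the module is not available in every Python version. The helper names `parses_with_ast` and `parses_with_lib2to3` are hypothetical and not part of any library.

```python
import ast

# The two constructs named in the report, as small complete snippets.
SNIPPETS = {
    "walrus": "if (n := len(data)) > 10:\n    pass\n",
    "double-star not": "f(**not x)\n",
}


def parses_with_ast(source):
    """Return True if the interpreter's own parser accepts the source."""
    try:
        ast.parse(source)
    except SyntaxError:
        return False
    return True


def parses_with_lib2to3(source):
    """Return True/False, or None when lib2to3 cannot be imported."""
    try:
        from lib2to3 import pygram, pytree
        from lib2to3.pgen2 import driver
        from lib2to3.pgen2.parse import ParseError
    except ImportError:
        return None
    d = driver.Driver(
        pygram.python_grammar_no_print_statement, convert=pytree.convert
    )
    try:
        d.parse_string(source)
    except ParseError:
        return False
    return True


for name, src in SNIPPETS.items():
    print(name, parses_with_ast(src), parses_with_lib2to3(src))
```

The lib2to3 result depends on which version of the grammar ships with the interpreter, so no expected output is shown.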
msg339669 - (view) Author: Tim Hatch (thatch) * Date: 2019-04-08 19:49
jreese reminded me of pep570, which will make more grammar changes.  I'm open to the idea of replacing the grammar with the live one, plus porting the 2isms forward like print, eval, except with comma.

My sincere hope is that everyone that depends on this structure will have tests (mine and lib2to3 do); the only big user I'm aware of is probably libfuturize.  Definitely worth a changelog entry if this is the way forward.
msg339796 - (view) Author: Tim Hatch (thatch) * Date: 2019-04-09 18:06
Here's approximately what it would look like to do the big change now: https://github.com/python/cpython/compare/master...thatch:lib2to3-update-grammar (one test failing, and some helpers may need more test coverage)
msg340791 - (view) Author: Lisa Roach (lisroach) * (Python committer) Date: 2019-04-24 16:47
I agree we should get lib2to3 up to date.

Looks like for *args and **kwargs there is issue33348 (this has a PR) and issue32496 (no PR) and related closed issue24791 and issue24176. 

Adding `:=` seems straightforward to me. As for the big change, maybe @benjamin.peterson would be interested in commenting?
msg340802 - (view) Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2019-04-24 19:37
For the changes of PEP570, please wait until I merge the implementation to do the grammar changes in lib2to3 for that.
msg340848 - (view) Author: Tim Hatch (thatch) * Date: 2019-04-25 16:38
My strong preference would be getting the lib2to3 grammar to be the python grammar + additions, to make future changes easier to merge.  The strongest argument against doing that is the backwards-incompatibility of patterns (some won't compile, while others will compile but do something unexpected).

It's good to hear (or at least infer) that parsing modern code is also a goal of lib2to3.
msg355108 - (view) Author: Peter Ludemann (Peter Ludemann) Date: 2019-10-21 22:34
Re: breakage due to changes in structure (https://bugs.python.org/issue36541#msg339669) ... this has already happened in the past (e.g., type annotations and async). 

It's probably a good idea to add some documentation that structure changes can be expected with each release of Python.
msg355252 - (view) Author: Peter Ludemann (Peter Ludemann) Date: 2019-10-23 19:05
Also the Grammar.txt diffs look about the same size as I've seen with other upgrades to lib2to3 when the Python grammar changed.
msg365716 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2020-04-03 19:14
New changeset 96c5f5a3a3fabf43e8114d0dbc30bed409da1ba6 by Tim Hatch in branch '3.7':
[3.7] bpo-36541: lib2to3: Support named assignment expressions (GH-12702) (GH-19317)
https://github.com/python/cpython/commit/96c5f5a3a3fabf43e8114d0dbc30bed409da1ba6
msg365717 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2020-04-03 19:18
master/3.9 changeset:
https://github.com/python/cpython/commit/3c3aa4516c70753de06bb142b6793d01330fcf0f

3.8 changeset: https://github.com/python/cpython/commit/1098671e4e5ec1513247f05598158eaa3428c5be
msg365718 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2020-04-03 19:18
Support for `:=` is in. Are we still lacking `f(**not x)` support?
msg379062 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2020-10-19 23:41
Parsing support for `f(**mapping)` is indeed still missing.

as lib2to3 is pending deprecation at this point, i'm not going to work on this.  anyone is welcome to pick it up.  modifying the lib2to3 grammar, and any related code, and adding a test is what's needed to parse that syntax.
msg382516 - (view) Author: Peter Ludemann (Peter Ludemann) Date: 2020-12-04 17:56
I made a suggestion for augmenting ast.parse with some of lib2to3's features; but nobody seemed interested. 

RIP lib2to3. Like many pieces of software, it was used for far more than what it was originally intended for.

https://mail.python.org/archives/list/python-ideas@python.org/thread/X2HJ6I6XLIGRZDB27HRHIVQC3RXNZAY4/
msg382594 - (view) Author: Batuhan Taskaya (BTaskaya) * (Python committer) Date: 2020-12-06 12:01
I don't see the point of augmenting ast.parse, since we already have proper CST implementations outside core Python, such as github.com/davidhalter/parso/ or LibCST.

Also, for basic refactorings, it is easy to use tokens for the refactoring and the AST for the analysis. Even ast.unparse() can be partially used (for example: first find the relevant segment of the code through AST analysis, build the corresponding variant, unparse it, then find the region of related tokens in the source code and replace them). There are also quite a few libraries (or wrappers) for using tokenize for different purposes, such as https://github.com/asottile/tokenize-rt or github.com/isidentical/brm.
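
The workflow described above could look roughly like the following sketch. It assumes Python 3.9 or newer (for `ast.unparse` and end positions on nodes) and only renames the callee of simple calls. The function name `rename_calls` is hypothetical, and node positions are used in place of a token search to keep the example short.

```python
import ast


def rename_calls(source, old, new):
    """Rename calls to `old` into calls to `new`, leaving all other text alone."""
    tree = ast.parse(source)

    # Analysis step: use the AST to locate the callee names to replace.
    spans = []
    for node in ast.walk(tree):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id == old
        ):
            func = node.func
            spans.append((func.lineno, func.col_offset, func.end_col_offset))

    # Build the replacement variant as an AST node and unparse it to text.
    replacement = ast.unparse(ast.Name(id=new, ctx=ast.Load())).encode("utf-8")

    # Refactoring step: splice the text into the original source.
    # AST column offsets count UTF-8 bytes, so the splice works on bytes.
    lines = source.splitlines(keepends=True)
    for lineno, start, end in sorted(spans, reverse=True):
        raw = lines[lineno - 1].encode("utf-8")
        lines[lineno - 1] = (raw[:start] + replacement + raw[end:]).decode("utf-8")
    return "".join(lines)


example = "x = old(1)  # keep comment\ny = other(old(2), 3)\n"
print(rename_calls(example, "old", "new"))
```

Because only the located spans are rewritten, comments and spacing elsewhere in the file survive, which a plain `ast.unparse` of the whole tree would not guarantee.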
msg382606 - (view) Author: Peter Ludemann (Peter Ludemann) Date: 2020-12-06 20:50
Every piece of code that uses either lib2to3 or a parser derived from it (including parso and LibCST) will eventually not be able to upgrade the parser because PEG can handle grammars that LL(k) can't. That's why I proposed adding some functionality to ast.parse, to make the whitespace and token information easily available - this seems to be what @BTaskaya says is "easy" (maybe they mean it's easy using LibCST? It seems to be fiddly using ast.parse). The alternative is that all these projects (black, LibCST, yapf, etc.) will have to roll their own solutions, which doesn't seem a very productive use of people's time and makes version upgrades slow.

If people are interested in using ast.parse extensions as a replacement for lib2to3, I suggest discussing at https://mail.python.org/archives/list/python-ideas@python.org/thread/X2HJ6I6XLIGRZDB27HRHIVQC3RXNZAY4/
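
To make the "whitespace and token information" concrete, here is a rough sketch that approximates a lib2to3-style prefix using only the standard `tokenize` module. It is not the interface proposed in the linked thread. The function name `tokens_with_prefix` is hypothetical, and the sketch assumes the source ends with a newline.

```python
import io
import tokenize

# Token types whose text is folded into the prefix of the next token.
_FOLDED = {tokenize.COMMENT, tokenize.NL, tokenize.INDENT, tokenize.DEDENT}


def tokens_with_prefix(source):
    """Return (prefix, text) pairs; prefix holds preceding whitespace and comments."""
    # Absolute offset of the start of each line, for (row, col) conversion.
    line_starts = [0]
    for line in source.splitlines(keepends=True):
        line_starts.append(line_starts[-1] + len(line))

    def offset(position):
        row, col = position
        return line_starts[row - 1] + col

    pairs = []
    previous_end = 0
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in _FOLDED:
            continue
        start = offset(tok.start)
        pairs.append((source[previous_end:start], tok.string))
        previous_end = offset(tok.end)
    return pairs


example = "x = 1  # one\nif x:\n    # note\n    y = x\n"
for prefix, text in tokens_with_prefix(example):
    print(repr(prefix), repr(text))
```

Concatenating every prefix and token text reproduces the original source, which is the property that makes such a representation usable for source-preserving refactoring.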
msg382608 - (view) Author: Batuhan Taskaya (BTaskaya) * (Python committer) Date: 2020-12-06 21:12
> Every piece of code that uses either lib2to3 or a parser derived from it (including parso and LibCST) will eventually not be able to upgrade the parser because PEG can handle grammars that LL(k) can't.

Since these projects are external, depending on the functionality they need, they are free to roll their own parser implementations or use hacks to work around things. Or they can fork Grammar/python.gram to preserve all tokens and generate a Python parser from it.


> If people are interested in using ast.parse extensions as a replacement for lib2to3, I suggest discussing at

I don't quite get what you are proposing here.

>I propose implementing an optional pass over the parse tree that records lib2to3's "prefix" with each leaf node. The interface would be something like:

How would you do that? By augmenting the AST with the information retrieved from tokens? If so, check this out; https://github.com/leo-editor/leo-editor/blob/master/leo/core/leoAst.py and asttokens.

Also, please move the discussion to somewhere else (like discuss.python.org etc.) since this is not the ideal place to discuss and people might be distracted! (feel free to cc me where you move the discussion)
msg382609 - (view) Author: Batuhan Taskaya (BTaskaya) * (Python committer) Date: 2020-12-06 21:14
> Parsing support for `f(**mapping)` is indeed still missing.
>
> as lib2to3 is pending deprecation at this point, i'm not going to work on this.  anyone is welcome to pick it up.  modifying the lib2to3 grammar, and any related code, and adding a test is what's needed to parse that syntax.

I agree, and I don't support adding features from now on. If someone really needs this to be added [and backported], please re-open the issue.
msg382962 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2020-12-14 08:30
While I said I didn't care... and don't really want to... I found a reason to at least not omit PEP 570 positional-only argument parsing support, given that things like yapf still use it rather than forking their own copy.  PR testing.
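
For readers unfamiliar with the syntax in question, a small sketch of what PEP 570 positional-only parameters look like and how the interpreter's own parser reports them. It assumes Python 3.8 or newer, and the function `greet` is a made-up example.

```python
import ast

# The "/" marks every parameter before it as positional-only (PEP 570).
source = (
    "def greet(name, /, greeting='hello'):\n"
    "    return f'{greeting}, {name}'\n"
)

func = ast.parse(source).body[0]
posonly = [a.arg for a in func.args.posonlyargs]
regular = [a.arg for a in func.args.args]
print(posonly, regular)

# Passing a positional-only parameter by keyword is rejected at call time.
namespace = {}
exec(source, namespace)
try:
    namespace["greet"](name="world")
    rejected = False
except TypeError:
    rejected = True
print(rejected)
```

A parser whose grammar lacks the `/` marker fails on the `def` line itself, which is why tools built on lib2to3 needed the grammar update.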
msg382994 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2020-12-14 17:10
New changeset 42c9f0fd0a5e67d4ae0022bfd7370cb9725a5b01 by Gregory P. Smith in branch 'master':
bpo-36541: Add lib2to3 grammar PEP-570 pos-only arg parsing (GH-23759)
https://github.com/python/cpython/commit/42c9f0fd0a5e67d4ae0022bfd7370cb9725a5b01
msg382996 - (view) Author: miss-islington (miss-islington) Date: 2020-12-14 17:30
New changeset 06bfd033e847bedb6e123d131dcf46393a4555df by Miss Islington (bot) in branch '3.8':
bpo-36541: Add lib2to3 grammar PEP-570 pos-only arg parsing (GH-23759)
https://github.com/python/cpython/commit/06bfd033e847bedb6e123d131dcf46393a4555df
msg382997 - (view) Author: miss-islington (miss-islington) Date: 2020-12-14 17:38
New changeset 20bc40ef44b820733848d5838e803b5fe4350b93 by Miss Islington (bot) in branch '3.9':
bpo-36541: Add lib2to3 grammar PEP-570 pos-only arg parsing (GH-23759)
https://github.com/python/cpython/commit/20bc40ef44b820733848d5838e803b5fe4350b93
History
Date User Action Args
2020-12-14 18:13:48 gregory.p.smith set status: open -> closed
    resolution: out of date -> fixed
    stage: patch review -> commit review
2020-12-14 17:38:25 miss-islington set messages: + msg382997
2020-12-14 17:30:14 miss-islington set messages: + msg382996
2020-12-14 17:10:37 miss-islington set pull_requests: + pull_request22625
2020-12-14 17:10:27 miss-islington set stage: resolved -> patch review
    pull_requests: + pull_request22624
2020-12-14 17:10:21 gregory.p.smith set messages: + msg382994
2020-12-14 08:30:08 gregory.p.smith set status: closed -> open
    assignee: gregory.p.smith
    messages: + msg382962
2020-12-14 08:20:46 gregory.p.smith set pull_requests: + pull_request22615
2020-12-06 21:14:35 BTaskaya set status: open -> closed
    resolution: out of date
    messages: + msg382609
    stage: needs patch -> resolved
2020-12-06 21:12:35 BTaskaya set messages: + msg382608
2020-12-06 20:50:10 Peter Ludemann set messages: + msg382606
2020-12-06 12:01:27 BTaskaya set nosy: + BTaskaya
    messages: + msg382594
2020-12-04 17:56:35 Peter Ludemann set messages: + msg382516
2020-10-19 23:41:24 gregory.p.smith set assignee: gregory.p.smith -> (no value)
    stage: patch review -> needs patch
    messages: + msg379062
    versions: + Python 3.10, - Python 3.7
2020-04-03 19:18:58 gregory.p.smith set messages: + msg365718
2020-04-03 19:18:17 gregory.p.smith set messages: + msg365717
2020-04-03 19:14:18 gregory.p.smith set messages: + msg365716
2020-04-02 23:43:31 thatch set pull_requests: + pull_request18682
2020-04-02 22:36:27 miss-islington set nosy: + miss-islington
    pull_requests: + pull_request18680
2020-04-02 02:09:45 fireattack set nosy: + fireattack
2019-10-24 06:04:11 gregory.p.smith set title: Make lib2to3 grammar more closely match Python -> Make lib2to3 grammar better match Python, support the := walrus
2019-10-24 06:01:55 gregory.p.smith set assignee: gregory.p.smith
    nosy: + gregory.p.smith
2019-10-24 05:49:24 gregory.p.smith set versions: + Python 3.9, - Python 3.6
2019-10-23 19:05:53 Peter Ludemann set messages: + msg355252
2019-10-21 22:34:06 Peter Ludemann set nosy: + Peter Ludemann
    messages: + msg355108
2019-06-15 12:16:09 xtreak link issue37248 superseder
2019-04-25 16:38:46 thatch set messages: + msg340848
2019-04-24 19:37:22 pablogsal set messages: + msg340802
2019-04-24 16:47:01 lisroach set nosy: + lisroach
    messages: + msg340791
2019-04-09 23:17:27 xtreak set nosy: + benjamin.peterson
2019-04-09 18:06:15 thatch set messages: + msg339796
2019-04-08 19:49:45 thatch set messages: + msg339669
2019-04-06 18:46:55 pablogsal set nosy: + pablogsal
2019-04-06 01:55:27 thatch set pull_requests: + pull_request12627
2019-04-06 01:42:20 python-dev set keywords: + patch
    stage: patch review
    pull_requests: + pull_request12626
2019-04-06 01:41:20 thatch create