Message 373198 - Python tracker

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author	gvanrossum
Recipients	BTaskaya, Peter Ludemann, carljm, corona10, eric.snow, gregory.p.smith, gvanrossum, hroncok, vstinner
Date	2020-07-06.23:55:00
SpamBayes Score	-1.0
Marked as misclassified	Yes
Message-id	<1594079700.84.0.0316466894372.issue40360@roundup.psfhosted.org>
In-reply-to

Content
There's no python-dev discussion; if you want more feedback I recommend starting on python-ideas first (on either forum you may expect pushback because this is not about a proposed change to Python or its workflow). The Lib/ast.py module will continue to be the official API for the standard AST. It is a simple wrapper around the builtin parser (at least in CPython -- I don't actually know to what extent other Python implementations support it, but they certainly could). And in 3.9 and later the AST is already being produced using the new parser. We want to deprecate lib2to3 because nobody is interested in maintaining it., Having it in the stdlib, with its strict backwards compatibility requirements, makes it difficult to do a good job at updating it. This is why it's been forked repeatedly -- once forked, the owner of the fork can make changes easily, preserving the API perfectly (if so desired) and maintaining compatibility with older Python versions. My own thoughts are that libraries like LibCST and parso have two sides: an API for the AST, and a way to parse source code into an AST. Usually the parsing API is incredibly simple -- e.g. a function to parse a file and another function to parse a string. And there's no reason for the AST API to change just because the parsing algorithm has changed. Finally, we already have a (rough) Python implementation of the PEG parser too -- in fact it's included in Tools/peg_generator (and used to regenerate the metaparser). This reads the same grammar format (i.e. Grammar/python.gram) and generates Python code instead of C code to do the parsing. It's easy to retarget the tokenizer of the generated Python code. So a decent way forward might be to pick one of the 3rd party libraries (perhaps parso, which is itself a fork of lib2to3 and what LibCST builds on) and update its parser to use a PEG parser generated using the PEG generator from Tools/peg_generator (which people are welcome to fork). This might be a summer-of-code-sized project.

There's no python-dev discussion; if you want more feedback I recommend starting on python-ideas first (on either forum you may expect pushback because this is not about a proposed change to Python or its workflow).

The Lib/ast.py module will continue to be the official API for the standard AST. It is a simple wrapper around the builtin parser (at least in CPython -- I don't actually know to what extent other Python implementations support it, but they certainly *could*). And in 3.9 and later the AST is already being produced using the *new* parser.

We want to deprecate lib2to3 because nobody is interested in maintaining it., Having it in the stdlib, with its strict backwards compatibility requirements, makes it difficult to do a good job at updating it. This is why it's been forked repeatedly -- once forked, the owner of the fork can make changes easily, preserving the API perfectly (if so desired) and maintaining compatibility with older Python versions.

My own thoughts are that libraries like LibCST and parso have two sides: an API for the AST, and a way to parse source code into an AST. Usually the parsing API is incredibly simple -- e.g. a function to parse a file and another function to parse a string. And there's no reason for the AST API to change just because the parsing algorithm has changed.

Finally, we already have a (rough) Python implementation of the PEG parser too -- in fact it's included in Tools/peg_generator (and used to regenerate the metaparser). This reads the same grammar format (i.e. Grammar/python.gram) and generates Python code instead of C code to do the parsing. It's easy to retarget the tokenizer of the generated Python code.

So a decent way forward might be to pick one of the 3rd party libraries (perhaps parso, which is itself a fork of lib2to3 and what LibCST builds on) and update its parser to use a PEG parser generated using the PEG generator from Tools/peg_generator (which people are welcome to fork).

This might be a summer-of-code-sized project.

History
Date	User	Action	Args
2020-07-06 23:55:00	gvanrossum	set	recipients: + gvanrossum, gregory.p.smith, vstinner, carljm, eric.snow, hroncok, corona10, BTaskaya, Peter Ludemann
2020-07-06 23:55:00	gvanrossum	set	messageid: <1594079700.84.0.0316466894372.issue40360@roundup.psfhosted.org>
2020-07-06 23:55:00	gvanrossum	link	issue40360 messages
2020-07-06 23:55:00	gvanrossum	create