Message 315649 - Python tracker



Author	lukasz.langa
Recipients	benjamin.peterson, gregory.p.smith, gvanrossum, lukasz.langa, serhiy.storchaka
Date	2018-04-23.08:01:20
Message-id	<1524470481.51.0.682650639539.issue33337@psf.upfronthosting.co.za>
In-reply-to

Content

> These modifications are applied only before bytecode generation. The AST presented to the user is not modified.

This bit me when implementing PEP 563 but I was then on the compile path, right.  Still, the latest docstring folding would qualify as an example here, too, no?
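To make the distinction in the quoted point concrete, here is a minimal sketch, assuming a CPython version that folds constant expressions on the compile path. The sample source and variable names are illustrative only. The tree returned by `ast.parse` still holds the original expression, while the compiled code object carries the folded value.

```python
import ast

source = "x = 1000 * 1000\n"

# The tree handed to the user still contains the original BinOp node.
tree = ast.parse(source)
value = tree.body[0].value
print(type(value).__name__)

# The compile path is free to fold the expression before emitting bytecode
# (assumption: CPython's constant folding is active, which is the default).
code = compile(source, "<example>", "exec")
print(code.co_consts)
```
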


> Is this a problem? 2.7 is a dead end, its support will be ended in less than 2 years. Even 3.6 will be moved to a security only fixes stage short time after releasing 3.8.

Yes, it is a problem.  We will support Python 2 until 2020 but people will be running Python 2 code for a decade *at least*.  We need to provide those people a way to move their code forward.  Static analysis tools like formatters, linters, type checkers, or 2to3-style translators, are all soon going to run on Python 3.  It would be a shame if those programs were barred from helping users that are still struggling on Python 2.

A closer example is async/await.  It would be a shame if running on Python 3.7 meant you can't write a tool that renames (or even just *detects*) invalid uses of async/await.  I firmly believe that the version of the runtime should be independent of the version of the code it's able to analyze.
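As a rough illustration of the async/await case, the sketch below assumes Python 3.7 or later, where `ast.parse` rejects `async` and `await` used as plain identifiers but `tokenize` still reports them as NAME tokens. The helper `find_async_await_names` is hypothetical and only catches direct assignments; a real tool would need a grammar for the older language version.

```python
import ast
import io
import tokenize


def find_async_await_names(source):
    """Return (line, name) pairs where async/await is directly assigned to.

    Hypothetical helper: a crude token-level heuristic, not a real linter.
    """
    tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))
    hits = []
    for tok, nxt in zip(tokens, tokens[1:]):
        is_keyword_name = (
            tok.type == tokenize.NAME and tok.string in ("async", "await")
        )
        is_assignment = nxt.type == tokenize.OP and nxt.string == "="
        if is_keyword_name and is_assignment:
            hits.append((tok.start[0], tok.string))
    return hits


# Code that was valid before async/await became reserved keywords.
legacy_source = "async = True\nresult = compute(async)\nawait = 3\n"

# The runtime's own parser refuses this source outright...
try:
    ast.parse(legacy_source)
    parsed = True
except SyntaxError:
    parsed = False

# ...while a token-level pass can still locate the offending names.
found = find_async_await_names(legacy_source)
print(parsed, found)
```

The token-level pass works only because the tokenizer is more permissive than the parser; it does not replace a parser that accepts the older grammar.
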


> I'm in favor of updating Lib/lib2to3/pgen2/tokenize.py, but I don't understand why Lib/tokenize.py should parse 2.7.

Hopefully I sufficiently explained that above.


> I'm in favor of reimplementing pgen in Python if this will simplify the code and the building process. Python code is simpler than C code, this code is not performance critical, and in any case we need an external Python when modify grammar of bytecode.

Well, I didn't think about abandoning pgen.  I admit that's mostly because my knee-jerk reaction was that it would be too slow.  But you're right that this is not performance critical because every `pip install` runs `compileall`, so parsing happens at install time rather than on every import.

I guess we could parse in "strict" mode for Python itself but allow for multiple grammars for standard library use (as I explained in the reply to Guido).  And this would most likely give us opportunity to iterate on grammar improvements in the future.

And yet, I'm cautious here.  Even ignoring performance, that sounds like a more ambitious task than what I'm attempting.  Unless I find partners in crime for this, I wouldn't attempt that.  And I would need thumbs up from the BDFL and performance-wary contributors.


> For what purposes the CST is needed besides 2to3?

Anywhere you need the full view of the code, including non-semantic pieces.  Those include:
- whitespace;
- comments;
- parentheses;
- commas;
- string prefixes.

The main use case is linters and refactoring tools.  For example, mypy is using a modified AST to support type comments.  YAPF and Black are based on lib2to3 because as formatters they can't lose comments, string prefixes, and organizational parentheses either.  Jedi is using parso, a lib2to3 fork, for similar reasons.
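A small sketch of what gets lost, using only the standard library. Here `tokenize` is a stand-in for a full CST such as the one lib2to3 builds, and the sample line is made up for illustration. The AST keeps the semantic structure but drops the comment and the parentheses, while the token stream still has both.

```python
import ast
import io
import tokenize

source = "total = (1 + 2)  # organizational parens and a comment\n"

# The AST keeps only the semantic structure: no comment, no parentheses.
dumped = ast.dump(ast.parse(source))

# The token stream still carries the comment and both parentheses.
tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))
comments = [t.string for t in tokens if t.type == tokenize.COMMENT]
parens = [
    t.string
    for t in tokens
    if t.type == tokenize.OP and t.string in ("(", ")")
]

print(dumped)
print(comments, parens)
```

A formatter built on the AST alone could not write this line back out unchanged, which is why the tools above need a concrete tree.
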

History
Date	User	Action	Args
2018-04-23 08:01:21	lukasz.langa	set	recipients: + lukasz.langa, gvanrossum, gregory.p.smith, benjamin.peterson, serhiy.storchaka
2018-04-23 08:01:21	lukasz.langa	set	messageid: <1524470481.51.0.682650639539.issue33337@psf.upfronthosting.co.za>
2018-04-23 08:01:21	lukasz.langa	link	issue33337 messages
2018-04-23 08:01:20	lukasz.langa	create