Author lukasz.langa
Recipients benjamin.peterson, gregory.p.smith, gvanrossum, lukasz.langa, serhiy.storchaka
Date 2018-04-23.01:04:05
Message-id <1524445447.99.0.682650639539.issue33337@psf.upfronthosting.co.za>
Content
Python includes a set of batteries that enable parsing of Python code.  These
include its own AST (provided in the standard library under the `ast` module),
as well as a pure Python tokenizer (provided in the standard library under
`tokenize` and `token`).  Python also provides an undocumented CST under lib2to3,
which contains its own outdated and patched copies of `tokenize` and `token`.

This situation causes the following issues for users of Python:
- the built-in AST does not preserve comments or whitespace;
- the built-in AST increasingly modifies the tree before presenting it to user
  code (constant folding moved to the AST in Python 3.7);
- the built-in tokenize.py can only be used to tokenize Python 3.7+ code;
- the version in lib2to3 is partially customized and partially outdated, so
  some new grammar is not supported; new bits of grammar very often get
  overlooked in lib2to3;
- lib2to3 is not documented.
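
The first point can be illustrated with a minimal sketch that uses only the
standard `ast` and `tokenize` modules.  The sample source string and the
variable names are made up for illustration; the sketch only shows that the
comment is absent from the AST dump while the tokenizer still reports it.

```python
import ast
import io
import tokenize

# Hypothetical one-line sample with a trailing comment.
SOURCE = "x = 1  # the answer\n"

# The AST keeps the assignment but has no node for the comment.
tree = ast.parse(SOURCE)
comment_in_ast = "the answer" in ast.dump(tree)

# The tokenizer, by contrast, emits a COMMENT token for it.
tokens = list(tokenize.generate_tokens(io.StringIO(SOURCE).readline))
comments = [tok.string for tok in tokens if tok.type == tokenize.COMMENT]

print(comment_in_ast)
print(comments)
```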

So if users want to write tools that manipulate Python code, the standard
library doesn't provide them with great options.
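
As a rough sketch of what is possible today without a CST, a tool can edit the
token stream and rebuild the source with `tokenize.untokenize`.  The sample
source and the rename from `x` to `y` are assumptions chosen for illustration.
This works for a same-length rename, but it operates on flat tokens with no
tree structure, which is the gap a documented CST would fill.

```python
import io
import tokenize

# Hypothetical sample: rename the variable `x` to `y`, keeping the comment.
SOURCE = "x = 1  # the answer\n"

tokens = tokenize.generate_tokens(io.StringIO(SOURCE).readline)

renamed = []
for tok in tokens:
    if tok.type == tokenize.NAME and tok.string == "x":
        # Same-length replacement, so the recorded positions stay valid.
        tok = tok._replace(string="y")
    renamed.append(tok)

# untokenize() rebuilds source text from the full token tuples.
new_source = tokenize.untokenize(renamed)
print(new_source)
```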

I suggest the following plan:

1. Bring Lib/lib2to3/pgen2/tokenize.py to the same state as Lib/tokenize.py
   (leaving the bits that allow for parsing of Python 3.6 and older files).

2. Merge the two tokenizers in Python 3.8 so that Lib/tokenize.py now
   officially supports tokenizing Python 2.7 - 3.7 code.

3. Update Lib/lib2to3/pgen2 and move it under Lib/pgen.  Document it as the
   built-in CST provided by Python for use in applications which require code
   modification.  Make it still officially support parsing of Python 2.7 - 3.7
   code.

All three changes are to be made in a backwards-compatible fashion; existing
code should NOT break.  That being said, the parser under Lib/pgen might grow
some new behavior compared to the compatibility mode for lib2to3.  Specifically,
I seek to improve handling of comments and error recovery.
History
Date                 User          Action  Args
2018-04-23 01:04:08  lukasz.langa  set     recipients: + lukasz.langa, gvanrossum, gregory.p.smith, benjamin.peterson, serhiy.storchaka
2018-04-23 01:04:07  lukasz.langa  set     messageid: <1524445447.99.0.682650639539.issue33337@psf.upfronthosting.co.za>
2018-04-23 01:04:07  lukasz.langa  link    issue33337 messages
2018-04-23 01:04:05  lukasz.langa  create