classification
Title: Deprecate lib2to3 (and 2to3) for future removal
Type: enhancement Stage:
Components: 2to3 (2.x to 3.x conversion tool) Versions: Python 3.10
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: BTaskaya, Peter Ludemann, carljm, corona10, davidhalter, eric.snow, gregory.p.smith, gvanrossum, hroncok, miss-islington, xtreak
Priority: normal Keywords: patch

Created on 2020-04-22 04:40 by gregory.p.smith, last changed 2020-10-19 20:42 by gregory.p.smith.

Pull Requests
URL Status Linked Edit
PR 19645 closed gregory.p.smith, 2020-04-22 05:01
PR 19663 merged carljm, 2020-04-22 20:50
PR 19898 merged hroncok, 2020-05-04 10:02
PR 21694 merged xtreak, 2020-07-31 03:26
PR 21696 closed miss-islington, 2020-07-31 10:51
PR 21697 merged xtreak, 2020-07-31 11:19
Messages (37)
msg366973 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2020-04-22 04:40
Based on the PEP 617 acceptance thread on python-dev, lib2to3 is eventually going to run into trouble parsing modern syntax a few releases from now.

It would be better off maintained outside of the standard library.  It gets used by a lot of things and is generally useful, but would make a lot more sense as a PyPI project than as something only quasi-maintained within the stdlib (it only gained the ability to parse a couple of modern syntax features via bugfix contributions to the stdlib in the past month or two, meaning a lot of versions of it out there cannot).

Black has already forked it.

goal:  PendingDeprecationWarning and documentation as such in 3.9.  Move to DeprecationWarning in 3.10 or 3.11 and remove it by ~3.12.  Subject to our existing deprecation process guidelines.
msg367005 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2020-04-22 14:26
I am in favor of this. We could promote LibCST, which is based on Parso, which uses a forked version of pgen2 (the parser in lib2to3). I believe one of these could switch to a fork of pegen as its parser, so it will be able to handle new PEG based syntax in 3.10+.

Removal by 3.12 might be feasible.
msg367031 - (view) Author: Carl Meyer (carljm) * Date: 2020-04-22 17:31
I volunteered in the python-dev thread to write a patch to the docs clarifying future status of lib2to3; happy to include the PendingDeprecationWarning as well.

Re linking to alternatives, we want to make sure we link to alternatives that are committed to updating to support newer Python versions' syntax. This definitely includes LibCST; I can inquire with the parso maintainer about whether it also includes parso. In future it could also include a third-party-maintained copy of lib2to3, if someone picks that up.
msg367051 - (view) Author: Carl Meyer (carljm) * Date: 2020-04-22 21:15
I opened a PR. It deprecates the lib2to3 library to discourage future use of it for Python3, but not the 2to3 tool. This of course means that the lib2to3 module will in practice stick around in the stdlib as long as 2to3 is still bundled with Python.

It seems like the idea in this issue is to deprecate and remove both. I'm not sure what we typically do to deprecate a command-line utility bundled with Python. Given warnings are silent by default, the deprecation warning for lib2to3 won't be visible to users of 2to3. Should I add something to its `--help` output? Or something more aggressive; an unconditionally-printed warning?
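
The visibility concern above can be illustrated with a minimal sketch. It assumes a hypothetical `import_time_notice()` helper standing in for the import-time warning in lib2to3 (the message text is the one quoted later in this issue), and it installs explicit filters rather than relying on the interpreter's startup configuration:

```python
import warnings


def import_time_notice():
    # Hypothetical stand-in for the warning lib2to3 emits when imported.
    warnings.warn(
        "lib2to3 package is deprecated and may not be able to parse Python 3.10+",
        PendingDeprecationWarning,
        stacklevel=2,
    )


def count_shown(action):
    # Record the warnings that get past a single filter with the given action.
    with warnings.catch_warnings(record=True) as caught:
        warnings.resetwarnings()
        warnings.filterwarnings(action, category=PendingDeprecationWarning)
        import_time_notice()
    return len(caught)


hidden = count_shown("ignore")  # same action as the default filter for this category
shown = count_shown("always")   # what an explicit opt-in filter would surface
print(hidden, shown)
```

Because the default filters ignore `PendingDeprecationWarning`, someone running the 2to3 command line tool would only see the message after opting in (for example with `-W always`), which is why a help-text note or an unconditional message on stderr is being considered.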
msg367208 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2020-04-24 18:19
New changeset 503de7149d03bdcc671dcbbb5b64f761bb192b4d by Carl Meyer in branch 'master':
bpo-40360: Deprecate lib2to3 module in light of PEP 617 (GH-19663)
https://github.com/python/cpython/commit/503de7149d03bdcc671dcbbb5b64f761bb192b4d
msg367209 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2020-04-24 18:21
Okay, the pending deprecation is in.  Keeping open as a reminder to turn that into a real DeprecationWarning in 3.10 after the 3.9 branch is cut.

We'll then want to track reminding us to remove it in 3.12.
msg367230 - (view) Author: Carl Meyer (carljm) * Date: 2020-04-24 21:15
@gregory.p.smith

What do you think about the question I raised above about how to make this deprecation visible to users of the 2to3 CLI tool, assuming the plan is to remove both?
msg367235 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2020-04-24 22:55
I think what we're doing with the documentation update is fine.  We can add a warning on stderr to the tool in 3.11.  But I don't expect people will be using the tool _from_ the latest CPython 3.x by then.

2to3 is already included with Python 2.7 and the only real use for it is for people who still have code they maintain on 2.7 so they've got a copy already.  There is no value in running a 2to3 shipped with Python 3 vs the latest 2.7.  Meaningful updates to it were already back ported to 2.7 over time as it was intentionally exempt from feature freeze.

We should have sorted out a PyPI home for lib2to3 by 3.11 time and can also create a PyPI package for the 2to3 tool itself at that point.

I _think_ there is support for running 2to3 on sources at package install time from setup.py?  But I don't expect anything actually maintained and widely used to require that by the time this deprecation lands.  If it does, that becomes a plumbing issue within package tools to know that requiring 2to3 at either build or install time adds an implicit tool dependency on the new pypi package to get it.

Maybe I'm just in a good mood about all of this, but none of this seems worrisome.
msg367716 - (view) Author: Peter Ludemann (Peter Ludemann) Date: 2020-04-29 23:51
The documentation change gives two possible successors:

https://libcst.readthedocs.io/ (https://github.com/Instagram/LibCST)
https://parso.readthedocs.io/

And I've also seen this mentioned: https://github.com/pyga/awpa

Is it possible to settle on one of these as the successor to the lib2to3 parser? It would be nice to avoid a 2nd deprecation in the future ...
msg367726 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2020-04-30 01:39
It's typically not up to the core devs to pick a winning third party library; we tend to recommend libraries that are already essentially category winners, like requests. In a sense pointing to LibCST *and* parso is redundant because LibCST builds on parso. Comparing stars on GitHub:
- LibCST: 423
- parso: 296
- awpa: 10
msg367730 - (view) Author: Carl Meyer (carljm) * Date: 2020-04-30 03:14
Right, although I think it still makes sense to link both LibCST and parso since they provide different levels of abstraction that would be suitable for different types of tools (e.g. I would rather write an auto-formatter on top of parso, because LibCST's careful parsing and assignment of whitespace would mostly just get in the way, but I'd rather write any kind of refactoring tooling on top of LibCST.)

Another tool that escaped my mind when writing the PR that should probably be linked also is Baron/RedBaron (https://github.com/PyCQA/redbaron); 457 stars makes it slightly more popular than LibCST (but it's also been around a lot longer.)
msg367743 - (view) Author: Miro Hrončok (hroncok) * Date: 2020-04-30 07:27
Could you please add a what's new entry for this change?
msg367744 - (view) Author: Miro Hrončok (hroncok) * Date: 2020-04-30 07:35
I don't understand why there is a PendingDeprecationWarning and not a DeprecationWarning.


See https://discuss.python.org/t/pendingdeprecationwarning-is-really-useful/1038/4 and issue36404
msg367766 - (view) Author: Carl Meyer (carljm) * Date: 2020-04-30 17:59
> Could you please add a what's new entry for this change?

The committed change already included an entry in NEWS. Is a "What's New" entry something different?

> I don't understand why there is a PendingDeprecationWarning and not a DeprecationWarning.

Purely because I was following gps' recommendation in the first comment on this issue. Getting rid of PendingDeprecationWarning seems like an orthogonal decision; if it happens, this can trivially be upgraded to DeprecationWarning as part of a removal sweep.
msg367767 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2020-04-30 18:31
A "What's New" entry would go into Doc/whatsnew/3.9.rst and is much more visible to users looking for exciting bits in the new release (the NEWS file is very large, see e.g. https://docs.python.org/3/whatsnew/changelog.html#changelog).

The What's New doc typically has a section collecting all the deprecations, e.g. https://docs.python.org/3/whatsnew/3.8.html#deprecated.
msg367770 - (view) Author: Miro Hrončok (hroncok) * Date: 2020-04-30 18:44
> Getting rid of PendingDeprecationWarning seems like an orthogonal decision; if it happens, this can trivially be upgraded to DeprecationWarning as part of a removal sweep.

My thought was that the decision was already made to do so. Hence adding new PendingDeprecationWarnings goes against that decision.

But maybe I misunderstand and that decision was not made.
msg367771 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2020-04-30 19:08
IIRC PendingDeprecationWarning does not mean that the decision hasn't been made yet. It just means it's less urgent for folks to worry about. I believe we tend to change PendingDeprecationWarning to DeprecationWarning in the last release before something is removed.
msg367884 - (view) Author: Miro Hrončok (hroncok) * Date: 2020-05-01 20:28
Thanks for the explanation.

I plan to send a PR to add this to the What's new in 3.9 page early next week. Anyone, feel free to beat me to it.
msg368077 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2020-05-04 19:02
New changeset 18f1c60a1625d341a905c7e07367c32c08f222df by Miro Hrončok in branch 'master':
bpo-40360: Add a What's New entry for lib2to3 pending deprecation (GH-19898)
https://github.com/python/cpython/commit/18f1c60a1625d341a905c7e07367c32c08f222df
msg368388 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-05-07 23:27
FYI the autopep8 project uses lib2to3.
msg373185 - (view) Author: Peter Ludemann (Peter Ludemann) Date: 2020-07-06 22:11
Looking at the suggested successor tools (redbaron, libCST, parso, awpa) ... all of them appear to use some variant of pgen2. But at some point Python will be using a PEG approach (PEP 617), and therefore the pgen2 approach apparently won't work.

For a number of projects, it's important to have a parse tree that contains all the "whitespace" information (indent, dedent, comment, newline, etc.) As far as I can tell, the new PEG parser won't provide that, and it seems that none of the successor tools will be able to handle future versions of Python syntax.

So, three questions:
1. Am I right that all proposed replacements (redbaron, libCST, parso, awpa) use some variation of the LL(1) and therefore will have trouble in the future?
2. Are there any plans (either part of the core development or as a project) for one of these replacements that is PEG-based? (Or a new project?)
3. Is Lib/ast.py going to continue being supported? (I infer that it will, with the change from LL(1) to PEG being mostly transparent - https://mail.python.org/archives/list/python-dev@python.org/thread/HOZ2RI3FXUEMAT4XAX4UHFN4PKG5J5GR/#4D3B2NM2JMV2UKIT6EV5Q2A6XK2HXDEH )

If Lib/ast.py continues to be supported, I think I can see a way of providing functionality similar to lib2to3 (in terms of an AST-ish thing with "whitespace" from the source, sufficient for tools such as yapf, black, pykythe, pytype, mypy, etc.) as a kind of wrapper to ast.py. 
I suppose I should discuss this idea on python-dev? Is there an ongoing discussion? (I couldn't find any but might have been using the wrong search terms)
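
A rough sketch of what such a wrapper could look like, assuming only the standard `ast` and `tokenize` modules. The helper names `comments_by_line` and `statements_with_comments` are hypothetical and are not part of any proposal in this issue; the sketch only pairs top-level statements with the comments on the lines they span:

```python
import ast
import io
import tokenize


def comments_by_line(source):
    """Map 1-based line numbers to the comment text found on that line."""
    found = {}
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.COMMENT:
            found[tok.start[0]] = tok.string
    return found


def statements_with_comments(source):
    """Pair each top-level statement with the comments on the lines it spans."""
    tree = ast.parse(source)
    comments = comments_by_line(source)
    paired = []
    for node in tree.body:
        # lineno/end_lineno are populated by ast.parse in Python 3.8+.
        lines = range(node.lineno, node.end_lineno + 1)
        paired.append(
            (type(node).__name__, [comments[n] for n in lines if n in comments])
        )
    return paired


source = "x = 1  # first\ny = (\n    x  # inner\n)\n"
result = statements_with_comments(source)
print(result)
```

A real tool would also need indentation, blank lines, and exact column offsets, which is where the byte/str offset conversions mentioned later in this issue come in.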
msg373198 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2020-07-06 23:55
There's no python-dev discussion; if you want more feedback I recommend starting on python-ideas first (on either forum you may expect pushback because this is not about a proposed change to Python or its workflow).

The Lib/ast.py module will continue to be the official API for the standard AST. It is a simple wrapper around the builtin parser (at least in CPython -- I don't actually know to what extent other Python implementations support it, but they certainly *could*). And in 3.9 and later the AST is already being produced using the *new* parser.
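
As a small illustration of why the standard AST alone does not replace lib2to3 for source-rewriting tools, the sketch below parses a snippet with `ast.parse` and regenerates code with `ast.unparse` (available in Python 3.9 and later). The sample source string is made up for the example:

```python
import ast

# Sample input with a comment, blank lines, and irregular spacing.
source = "x = 1  # keep me\n\n\ny   =   x+2\n"

tree = ast.parse(source)
round_tripped = ast.unparse(tree)  # regenerates code from the AST only
print(round_tripped)
```

The regenerated code has the same structure as the input, but the comment and the original spacing are gone, since the AST does not record them.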

We want to deprecate lib2to3 because nobody is interested in maintaining it. Having it in the stdlib, with its strict backwards compatibility requirements, makes it difficult to do a good job at updating it. This is why it's been forked repeatedly -- once forked, the owner of the fork can make changes easily, preserving the API perfectly (if so desired) and maintaining compatibility with older Python versions.

My own thoughts are that libraries like LibCST and parso have two sides: an API for the AST, and a way to parse source code into an AST. Usually the parsing API is incredibly simple -- e.g. a function to parse a file and another function to parse a string. And there's no reason for the AST API to change just because the parsing algorithm has changed.

Finally, we already have a (rough) Python implementation of the PEG parser too -- in fact it's included in Tools/peg_generator (and used to regenerate the metaparser). This reads the same grammar format (i.e. Grammar/python.gram) and generates Python code instead of C code to do the parsing. It's easy to retarget the tokenizer of the generated Python code.
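
For readers unfamiliar with the PEG style discussed here, the following toy sketch shows the two ideas that distinguish it from LL(1) parsing: ordered choice with backtracking, and memoization of results per rule and position. It is a hand-written recognizer for a made-up grammar (`sum <- NUMBER '+' sum / NUMBER`), not output of Tools/peg_generator and not how CPython's parser is implemented:

```python
from functools import lru_cache


def make_parser(text):
    """Build a memoized recognizer for: sum <- NUMBER '+' sum / NUMBER"""

    @lru_cache(maxsize=None)  # memo table: one cached result per position
    def number(pos):
        end = pos
        while end < len(text) and text[end].isdigit():
            end += 1
        return end if end > pos else None  # None signals "no match here"

    @lru_cache(maxsize=None)
    def sum_(pos):
        # First alternative: NUMBER '+' sum
        after_num = number(pos)
        if after_num is not None and text[after_num:after_num + 1] == "+":
            rest = sum_(after_num + 1)
            if rest is not None:
                return rest
        # Ordered choice: the first alternative failed, so back up to `pos`
        # and try the second alternative, a bare NUMBER.
        return number(pos)

    return sum_


def matches(text):
    """Return True if the whole string is a valid sum."""
    return make_parser(text)(0) == len(text)


print(matches("1+22+3"), matches("1+"))
```

The second call backtracks out of the first alternative and then fails because the trailing `+` is left unconsumed.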

So a decent way forward might be to pick one of the 3rd party libraries (perhaps parso, which is itself a fork of lib2to3 and what LibCST builds on) and update its parser to use a PEG parser generated using the PEG generator from Tools/peg_generator (which people are welcome to fork).

This might be a summer-of-code-sized project.
msg373327 - (view) Author: Peter Ludemann (Peter Ludemann) Date: 2020-07-08 17:38
I've written up a proposal for adding "whitespace" handling to the ast module:
https://mail.python.org/archives/list/python-ideas@python.org/thread/X2HJ6I6XLIGRZDB27HRHIVQC3RXNZAY4/

I don't think it's a "summer-of-code-sized project", mainly because I already have various bits of code that handle the fiddly byte/str offset conversions.
msg373332 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2020-07-08 18:59
Can that be done as a 3rd party wrapper? Then you would be able to support older Python versions, and typed_ast (which can parse older Python grammars with a newer Python that's older than 3.8). Plus it would be much easier to get your code released -- no waiting for core devs to review it or waiting for the next CPython (bugfix or feature) release to get a bug fixed or small feature added.
msg373334 - (view) Author: Peter Ludemann (Peter Ludemann) Date: 2020-07-08 19:05
Yes, I'm thinking of doing this as a wrapper, in such a way that it could be incorporated into Lib/ast.py eventually. (Also, any lib2to3-ish capabilities would probably not be suitable for inclusion in the stdlib, at least not initially ... but I have no plans to work on something to replace lib2to3's fixers.)
msg373444 - (view) Author: David Halter (davidhalter) Date: 2020-07-10 07:17
I'm the maintainer of parso. Feel free to add me to the Nosy List if we have these discussions in the future.

Parso is indeed a lib2to3 fork with error recovery, round tripping and incremental parsing as its features. Most pgen2 code has been rewritten since for various reasons, but it's essentially still LL(1). We're currently trying to think how to proceed with Non-LL(1). For very simple cases a few hacks could suffice, but for larger stuff we will probably need to implement some form of PEG parsing.

I'm mostly worried about incremental parsing breaking if we replace the PEG parser, not about writing the PEG parser. But I guess we'll find a way, because I don't want to abandon Jedi (which depends on parso).
msg373538 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2020-07-11 20:57
Thanks for joining in! How do you do incremental parsing with LL1 currently? FWIW I found https://ohmlang.github.io/pubs/sle2017/incremental-packrat-parsing.pdf which may have some useful ideas.
msg374348 - (view) Author: David Halter (davidhalter) Date: 2020-07-26 22:42
Parso's incremental parser is a terrible idea. It also works and is pretty fast, but the design is pretty terrible (it took me a lot of fuzzing to make sure that it works decently well).

The basic problem is that it's reusing nodes in a mutable way. If I were to redo it, I would probably choose a similar approach to Roslyn's red/green trees. It's probably also possible to use these approaches in Python, but they might be quite a bit slower than what I'm using (because recreating nodes can be quite expensive).

I imagine that one of the biggest issues with parsing PEG in parso would be to do it with error recovery AND incremental parsing. That combination can be quite annoying, but it's definitely still possible.

I'm not really sure about the future of parso with PEG. I'm definitely going to have to find a way to parse 3.10+ (so Jedi is going to keep working); however, I feel that it's hard to achieve a fast parser in pure Python. Parso is like 20% faster, but still more than ten times slower than the CPython parser...
msg374349 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2020-07-26 22:56
I guess the design space is wide open.

Does parso have to be pure Python? If not, we could generate C code like we do for CPython's parser. Now, that doesn't work for incremental parsing, but I did an alternative implementation that uses a stack machine, also in C, that's only 2x slower than the PEG parser. Maybe that could be adapted to incremental parsing (because it's a stack machine). Error recovery is still a research project (at least for me -- I'm actually reading papers :-).
msg374566 - (view) Author: Karthikeyan Singaravelan (xtreak) * (Python committer) Date: 2020-07-29 09:13
After this patch, test_lib2to3 generates a PendingDeprecationWarning. Since the warning is intentional, it can be silenced in the test to avoid failures when running tests with -Werror.

./python.exe -Wall -m test test_lib2to3
0:00:00 load avg: 2.31 Run tests sequentially
0:00:00 load avg: 2.31 [1/1] test_lib2to3
/Users/kasingar/stuff/python/cpython/Lib/test/test_lib2to3.py:1: PendingDeprecationWarning: lib2to3 package is deprecated and may not be able to parse Python 3.10+
  from lib2to3.tests import load_tests

== Tests result: SUCCESS ==

1 test OK.

Total duration: 9.0 sec
Tests result: SUCCESS
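
One way to silence an intentional warning in a test module is sketched below. It is an illustration under stated assumptions rather than the exact change that was merged: `load_deprecated_helpers()` is a hypothetical stand-in for the `from lib2to3.tests import load_tests` import, and the outer block simulates running with `-Werror`:

```python
import warnings


def load_deprecated_helpers():
    # Hypothetical stand-in for importing lib2to3.tests, which triggers
    # the import-time PendingDeprecationWarning.
    warnings.warn(
        "lib2to3 package is deprecated and may not be able to parse Python 3.10+",
        PendingDeprecationWarning,
        stacklevel=2,
    )
    return "load_tests"


def quiet_load():
    # Suppress only this category, and only for the duration of the call;
    # the previous filters are restored when the block exits.
    with warnings.catch_warnings():
        warnings.filterwarnings("ignore", category=PendingDeprecationWarning)
        return load_deprecated_helpers()


# Simulate -Werror: any warning that escapes is raised as an exception.
with warnings.catch_warnings():
    warnings.simplefilter("error")
    loaded = quiet_load()  # silenced, so nothing is raised
    try:
        load_deprecated_helpers()  # not silenced, so this raises
        escaped = False
    except PendingDeprecationWarning:
        escaped = True

print(loaded, escaped)
```

Scoping the filter with `warnings.catch_warnings()` keeps the suppression local, so other warnings raised elsewhere in the test run still fail under `-Werror`.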
msg374572 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2020-07-29 13:46
Which patch are you referring to? Is it already merged?
msg374573 - (view) Author: Karthikeyan Singaravelan (xtreak) * (Python committer) Date: 2020-07-29 14:23
I was referring to PR https://github.com/python/cpython/pull/19663 (commit-503de7149d03bdcc671dcbbb5b64f761bb192b4d) that was merged as part of this issue. It started emitting PendingDeprecationWarning but was not silenced in the test.
msg374574 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2020-07-29 14:41
Okay, so if you know what to do please do it. ;-)
msg374611 - (view) Author: David Halter (davidhalter) Date: 2020-07-30 11:11
@gvanrossum

> Does parso have to be pure Python? If not, we could generate C code like we do for CPython's parser. 

I would rather write the parser either in C or Rust. So no, parso does not need to be pure Python.

> Now, that doesn't work for incremental parsing, but I did an alternative implementation that uses a stack machine, also in C, that's only 2x slower than the PEG parser. Maybe that could be adapted to incremental parsing (because it's a stack machine). Error recovery is still a research project (at least for me -- I'm actually reading papers :-).

Makes sense! I was also thinking about GLL parsing. Obviously GLL does not cover all cases where PEG could potentially work, but I doubt that Python ever moves to a place where GLL would not be sufficient.

I'm also doing a bit of research on Rust parsers and trying to find a solution for my parsing needs in the future. (I'd rather have a Rust parser than a C one, because I like the language better and both should still work in Python).

Please let me know if you're making a lot of progress with PEG parsers and error recovery/incremental parsing. I'm definitely interested in copying an approach if it works :).
msg374634 - (view) Author: Karthikeyan Singaravelan (xtreak) * (Python committer) Date: 2020-07-31 10:51
New changeset cadda52d974937069eeebea1cca4229e2bd400df by Karthikeyan Singaravelan in branch 'master':
bpo-40360: Handle PendingDeprecationWarning in test_lib2to3. (GH-21694)
https://github.com/python/cpython/commit/cadda52d974937069eeebea1cca4229e2bd400df
msg374643 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2020-07-31 14:17
New changeset fe928b32daca184e16ccc0ebdc20314cfa776b98 by Karthikeyan Singaravelan in branch '3.9':
[3.9] bpo-40360: Handle PendingDeprecationWarning in test_lib2to3. (GH-21694) (GH-21697)
https://github.com/python/cpython/commit/fe928b32daca184e16ccc0ebdc20314cfa776b98
msg379014 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2020-10-19 20:41
status: lib2to3 PendingDeprecationWarning shipped in 3.9.  Since we don't have a specific release planned for the final deprecation, I'll leave this issue open while we figure that out.  Once we do, we should promote this to a regular DeprecationWarning in whichever release is next at that time.
History
Date User Action Args
2020-10-19 20:42:14 gregory.p.smith set assignee: gregory.p.smith ->
stage: patch review ->
2020-10-19 20:41:53 gregory.p.smith set messages: + msg379014
versions: - Python 3.9
2020-07-31 14:17:26 gvanrossum set messages: + msg374643
2020-07-31 11:19:14 xtreak set pull_requests: + pull_request20841
2020-07-31 10:51:14 xtreak set messages: + msg374634
2020-07-31 10:51:13 miss-islington set nosy: + miss-islington
pull_requests: + pull_request20840
2020-07-31 03:26:42 xtreak set pull_requests: + pull_request20838
2020-07-30 11:11:47 davidhalter set messages: + msg374611
2020-07-29 14:41:40 gvanrossum set messages: + msg374574
2020-07-29 14:23:14 xtreak set messages: + msg374573
2020-07-29 13:46:13 gvanrossum set messages: + msg374572
2020-07-29 09:13:42 xtreak set nosy: + xtreak
messages: + msg374566
2020-07-26 22:56:43 gvanrossum set messages: + msg374349
2020-07-26 22:42:01 davidhalter set messages: + msg374348
2020-07-15 06:47:01 wyz23x2 set versions: + Python 3.10
2020-07-11 20:57:09 gvanrossum set messages: + msg373538
2020-07-10 07:17:22 davidhalter set nosy: + davidhalter
messages: + msg373444
2020-07-08 20:46:58 vstinner set nosy: - vstinner
2020-07-08 19:05:48 Peter Ludemann set messages: + msg373334
2020-07-08 18:59:47 gvanrossum set messages: + msg373332
2020-07-08 17:38:12 Peter Ludemann set messages: + msg373327
2020-07-06 23:55:00 gvanrossum set messages: + msg373198
2020-07-06 22:11:18 Peter Ludemann set messages: + msg373185
2020-05-07 23:27:54 vstinner set nosy: + vstinner
messages: + msg368388
2020-05-04 19:02:08 gregory.p.smith set messages: + msg368077
2020-05-04 10:02:37 hroncok set pull_requests: + pull_request19209
2020-05-01 20:28:08 hroncok set messages: + msg367884
2020-04-30 19:08:37 gvanrossum set messages: + msg367771
2020-04-30 18:44:30 hroncok set messages: + msg367770
2020-04-30 18:31:34 gvanrossum set messages: + msg367767
2020-04-30 17:59:53 carljm set messages: + msg367766
2020-04-30 07:35:50 hroncok set messages: + msg367744
2020-04-30 07:27:56 hroncok set nosy: + hroncok
messages: + msg367743
2020-04-30 03:14:18 carljm set messages: + msg367730
2020-04-30 01:39:50 gvanrossum set messages: + msg367726
2020-04-29 23:51:42 Peter Ludemann set messages: + msg367716
2020-04-27 16:41:12 Peter Ludemann set nosy: + Peter Ludemann
2020-04-25 10:06:02 corona10 set pull_requests: - pull_request19032
2020-04-25 08:11:24 corona10 set nosy: + corona10
pull_requests: + pull_request19032
2020-04-24 22:55:35 gregory.p.smith set messages: + msg367235
2020-04-24 21:15:50 carljm set messages: + msg367230
2020-04-24 18:21:21 gregory.p.smith set messages: + msg367209
2020-04-24 18:19:54 gregory.p.smith set messages: + msg367208
2020-04-22 21:15:02 carljm set messages: + msg367051
2020-04-22 20:50:05 carljm set pull_requests: + pull_request18987
2020-04-22 18:10:08 eric.snow set nosy: + eric.snow
2020-04-22 17:57:20 BTaskaya set nosy: + BTaskaya
2020-04-22 17:31:39 carljm set nosy: + carljm
messages: + msg367031
2020-04-22 14:26:04 gvanrossum set nosy: + gvanrossum
messages: + msg367005
2020-04-22 05:01:15 gregory.p.smith set keywords: + patch
stage: needs patch -> patch review
pull_requests: + pull_request18969
2020-04-22 04:41:28 gregory.p.smith set components: + 2to3 (2.x to 3.x conversion tool)
stage: needs patch
2020-04-22 04:40:54 gregory.p.smith create