classification
Title: Replace the ASDL parser carried with CPython
Type: enhancement
Stage: resolved
Components: Build
Versions: Python 3.5
process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To: eli.bendersky
Nosy List: benjamin.peterson, brett.cannon, eli.bendersky, eric.snow, larry, ncoghlan, python-dev, rhettinger, scoder, serhiy.storchaka, skrah, techtonik
Priority: normal
Keywords: patch

Created on 2013-11-19 14:01 by eli.bendersky, last changed 2014-05-10 18:29 by scoder. This issue is now closed.

Files
File name                            Uploaded
new-asdl-parser.issue19655.1.patch   eli.bendersky, 2014-03-31 13:12
new-asdl-parser.issue19655.2.patch   eli.bendersky, 2014-04-05 21:49
Messages (34)
msg203376 - (view) Author: Eli Bendersky (eli.bendersky) * (Python committer) Date: 2013-11-19 14:01
It was mentioned in one of the recent python-dev threads that making the Python code-base simpler to encourage involvement of contributors is a goal, so I figured this may be relevant.

I've recently written a new parser for the ASDL specification language from scratch and think it may be beneficial to replace the existing parser we use:

* The existing parser uses an external parser-generator (Spark) that we carry around in the Parser/ directory; the new parser is a very simple stand-alone recursive descent, which makes it easier to maintain and doesn't require a familiarity with Spark.
* The new code is significantly smaller. ~400 LOC for the whole stand-alone parser (asdl.py) as opposed to >1200 LOC for the existing parser+Spark.
* The existing asdl.py is old code and was only superficially ported to Python 3.x - this shows, and isn't a good example of using modern Python techniques. My asdl.py uses Python 3.4 with modern idioms like dict comprehensions, generators and enums.
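
For readers unfamiliar with the approach, here is a minimal sketch of a stand-alone recursive-descent parser in the style described above. It is not the asdl.py from the linked repository: the names TokenKind, tokenize and Parser are hypothetical, and the grammar is a simplified subset of ASDL (sum types with constructors and fields only; no product types, attributes or comments).

```python
import re
from enum import Enum


class TokenKind(Enum):
    NAME = "name"
    PUNCT = "punct"


def tokenize(text):
    """Yield (kind, value) pairs lazily; whitespace is skipped."""
    for match in re.finditer(r"[A-Za-z_]\w*|[{}()=|,*?]", text):
        value = match.group()
        kind = TokenKind.PUNCT if value in "{}()=|,*?" else TokenKind.NAME
        yield kind, value


class Parser:
    """Hypothetical parser: one method per grammar rule, one token of lookahead."""

    def __init__(self, text):
        self._tokens = tokenize(text)
        self._advance()

    def _advance(self):
        # (None, None) marks the end of the input.
        self.kind, self.value = next(self._tokens, (None, None))

    def _at(self, punct):
        return self.kind is TokenKind.PUNCT and self.value == punct

    def _expect(self, kind, value=None):
        if self.kind is not kind or (value is not None and self.value != value):
            raise SyntaxError("unexpected token {!r}".format(self.value))
        found = self.value
        self._advance()
        return found

    def parse_module(self):
        # module NAME "{" type* "}"
        self._expect(TokenKind.NAME, "module")
        name = self._expect(TokenKind.NAME)
        self._expect(TokenKind.PUNCT, "{")
        types = []
        while not self._at("}"):
            types.append(self.parse_type())
        self._expect(TokenKind.PUNCT, "}")
        return name, {type_name: ctors for type_name, ctors in types}

    def parse_type(self):
        # NAME "=" constructor ("|" constructor)*
        type_name = self._expect(TokenKind.NAME)
        self._expect(TokenKind.PUNCT, "=")
        ctors = [self.parse_constructor()]
        while self._at("|"):
            self._advance()
            ctors.append(self.parse_constructor())
        return type_name, ctors

    def parse_constructor(self):
        # NAME ["(" field ("," field)* ")"]
        name = self._expect(TokenKind.NAME)
        fields = []
        if self._at("("):
            self._advance()
            fields.append(self.parse_field())
            while self._at(","):
                self._advance()
                fields.append(self.parse_field())
            self._expect(TokenKind.PUNCT, ")")
        return name, fields

    def parse_field(self):
        # NAME ["*" | "?"] NAME
        field_type = self._expect(TokenKind.NAME)
        if self._at("*") or self._at("?"):
            field_type += self.value
            self._advance()
        field_name = self._expect(TokenKind.NAME)
        return field_type, field_name
```

With this sketch, Parser("module M { stmt = Pass | Block(stmt* body) }").parse_module() returns the module name "M" and a dict mapping "stmt" to its two constructors. Each grammar rule maps to one short method, which is what makes this style easy to read without knowing a parser generator.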

For a start, it may be easier to review the parser separately and not as a patch file. I split it to a stand-alone project here: https://github.com/eliben/asdl_parser

The asdl.py there is a drop-in replacement for Parser/asdl.py; asdl_c.py is for Parser/asdl_c.py - with tiny modifications to interface the new parser (mainly getting rid of some Spark-induced quirks). The AST .c and .h files produced are identical. The repo also has some tests for the parser, which we may find useful in adding to the test suite or the Parser directory.
msg203381 - (view) Author: Eli Bendersky (eli.bendersky) * (Python committer) Date: 2013-11-19 14:30
FWIW, asdl_c.py could use some "modernization", but I'll defer this to a later cleanup in order to do things gradually.

The same can be said for the Makefile rules - they can be simpler and more efficient (no need to invoke asdl_c / parse the ASDL twice, for example).

Incidentally, where would be a good place to put the ASDL tests? Can/should we reach into the Parser/ directory when running the Python regression tests (will this even work in an installed Python)? Or should I just plop asdl_test.py alongside asdl.py and mention that it should be run when asdl.py changes?
msg203382 - (view) Author: Larry Hastings (larry) * (Python committer) Date: 2013-11-19 14:37
A week before beta?  How confident are you in this new parser?
msg203384 - (view) Author: Eli Bendersky (eli.bendersky) * (Python committer) Date: 2013-11-19 14:40
Larry, ease your worries: note that I only tagged this on version 3.5!

That said, this parser runs during the build and produces a .h file and .c file - these partake in the build; I verified that the generated code is *identical* to before, so there's not much to worry about. Still, I don't see a reason to land this before the beta branches. I'll do it onto the default branch after this weekend (assuming the reviews come through).
msg203385 - (view) Author: Mark Lawrence (BreamoreBoy) * Date: 2013-11-19 14:40
Are we at beta for 3.5 already? :)
msg203429 - (view) Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2013-11-19 21:43
We could take the opportunity to move the ast scripts to a Tools/ subdir. Then you could use whatever it is test_tools.py uses.
msg204069 - (view) Author: anatoly techtonik (techtonik) Date: 2013-11-23 18:20
+1 for the initiative; points that would be nice to address are below.

1. "Python 3.4 with modern idioms" - code that is too Python-specific raises the barrier. I'd prefer simplicity and portability over modernism. For example, how hard would it be to port the parser to JS with PythonJS?

2. The ASDL specification is mostly offline. One PDF survived, but the IR browser and its source did not, which is a pity, because visual tools are among the most attractive. In any case, it may be worth contacting the university - they might have backups, and the browser could be resurrected in Python (GCI, GSoC).

3. File organization. This is bad:
 Grammar/Grammar
 Parser/
 Python/
This is good:
 Core/README.md
 Core/Grammar
 Core/Parser/
 Core/Processor/ (builds AST)
 Core/Interpreter/
 Core/Tests/
I wonder what the PyPy layout is? It may be worth stealing it for consistency.

4. A specific problem with ASDL parsing - currently, by ASDL syntax, every `expr` is allowed on the left side of an assign node. This is not true in a real application. It makes sense to clarify these boundaries (who checks what) in README.md and to modify the ASDL to reflect the restriction.
msg204225 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2013-11-24 15:39
All code going into Python should be idiomatic unless it's meant to be released externally and there are backwards-compatibility concerns. It's Eli's call as to whether he wants to maintain a PyPI project for this after integration.

Points 2-4 are off-topic for this issue.
msg204240 - (view) Author: Eli Bendersky (eli.bendersky) * (Python committer) Date: 2013-11-24 17:38
Does anyone have comments on the code or can I prepare a patch for default?

Would it make sense to wait with this until the 3.4 branch is created or can I just commit to default? Note that this change is not a new feature and is essentially a no-op as far as the resulting CPython executable is concerned - it's just tweaking the build process to generate the same data in a different way.
msg204266 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2013-11-24 20:51
It's Larry's call in the end but I personally don't care either way, especially since this isn't user-facing code.
msg204267 - (view) Author: Larry Hastings (larry) * (Python committer) Date: 2013-11-24 20:55
Are the generated files *byte for byte* the same as produced by the existing parser generation process?
msg204279 - (view) Author: Eli Bendersky (eli.bendersky) * (Python committer) Date: 2013-11-24 22:53
Larry Hastings added the comment:

>
> Are the generated files *byte for byte* the same as produced by the
> existing parser generation process?
>

Correct. The generator runs during the build (in the Makefile), but only if
the files were out-of-date. It takes Python.asdl and produces Python-ast.h
and Python-ast.c; the latter two are compiled into the CPython executable.
The .h/.c files produced by my alternative generator are exactly identical
to the ones in there now.
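
A byte-for-byte claim like this can be checked mechanically. A minimal sketch, assuming the old and new generator outputs have been saved to separate paths (the helper name generated_files_identical is hypothetical):

```python
import filecmp


def generated_files_identical(pairs):
    """Return True if every (old_path, new_path) pair has identical contents.

    shallow=False forces a comparison of the file contents instead of
    trusting the os.stat() signature (type, size, mtime).
    """
    return all(filecmp.cmp(old, new, shallow=False) for old, new in pairs)
```

Passing one pair for Python-ast.h and one for Python-ast.c covers both generated files.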

I don't feel strongly about this, but I may need a refresher in the release
process rules. From today and until Feb 23rd, 2014 - are we not supposed to
be adding new features/PEPs or are we also not supposed to do any code
refactorings and/or any changes at all unless those are directly related to
fixing a specific bug/issue?
msg204281 - (view) Author: Larry Hastings (larry) * (Python committer) Date: 2013-11-24 22:58
The rule is, no new features.  Bug and security fixes from now on.

It isn't always clear whether or not something is a new "feature".  In the case of such ambiguity, the decision is up to the sole discretion of the release manager.

If you seriously want to check in this thing (*sigh*), I'll take a look at it.  But you'll have to point me at either a patch that applies cleanly to trunk, or a repository somewhere online with the patch already applied to trunk.

Something like a code refactor is probably okay during betas, but during RCs I would really prefer we keep checkins strictly to a minimum.
msg204282 - (view) Author: Eli Bendersky (eli.bendersky) * (Python committer) Date: 2013-11-24 23:03
On Sun, Nov 24, 2013 at 2:58 PM, Larry Hastings <report@bugs.python.org> wrote:

>
> Larry Hastings added the comment:
>
> The rule is, no new features.  Bug and security fixes from now on.
>
> It isn't always clear whether or not something is a new "feature".  In the
> case of such ambiguity, the decision is up to the sole discretion of the
> release manager.
>
> If you seriously want to check in this thing (*sigh*), I'll take a look at
> it.  But you'll have to point me at either a patch that applies cleanly to
> trunk, or a repository somewhere online with the patch already applied to
> trunk.
>

There really is no urgency here. I don't want to add needless work onto
your plate, and am fine with just leaving it be until the 3.4 branch is cut
and landing it on the default branch aimed for 3.5 after that.
msg204368 - (view) Author: Eric Snow (eric.snow) * (Python committer) Date: 2013-11-25 17:17
The concern, as far as I understand it, is that any change might introduce regressions or even new bugs in the next beta.  Something like swapping out the parser generator implementation isn't a bug fix, but it isn't a new language (incl. stdlib) feature either.

I don't have a problem with the change.  Furthermore, it likely doesn't matter since there probably won't be any grammar changes during the beta that would trigger the tool.
msg204377 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2013-11-25 18:11
Let's just go with Eli's latest idea and just save it for 3.5 since it won't make any visible improvement in 3.4.
msg215210 - (view) Author: Eli Bendersky (eli.bendersky) * (Python committer) Date: 2014-03-30 22:16
There were no serious objections bar the pre-release timing. Now that we're safely in 3.5 territory, can I go ahead and create a patch?

Note that for purposes of review, the GitHub project linked in the original message is more convenient, IMHO.
msg215217 - (view) Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2014-03-31 00:58
It certainly seems more compact and simple than the current parser, so +1 from me.
msg215219 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2014-03-31 04:36
+1 for moving this forward as early in the 3.5 cycle as possible.
msg215234 - (view) Author: Eli Bendersky (eli.bendersky) * (Python committer) Date: 2014-03-31 13:12
Attaching patch that implements this. To make it easier, the patch only replaces the ASDL parser - not touching anything else and leaving the output intact.

With this patch applied, when the Makefile is rerun it regenerates the actual AST code in:

  Include/Python-ast.h
  Python/Python-ast.c

However, as the new parser generates exactly the same files, the code above is identical to the originals (hg diff shows nothing). In practice this means that no one building Python should notice this change, unless she's playing with the ASDL input file itself. The Makefile will not even regenerate these files unless the parser or the ASDL file were modified.
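
The regeneration rule is the usual make timestamp comparison. A minimal sketch of that rule in Python, for illustration only (the function name needs_regeneration is hypothetical, and the real logic lives in the Makefile, not in a script like this):

```python
import os


def needs_regeneration(target, sources):
    """Mimic make: rebuild if the target is missing or older than any source."""
    if not os.path.exists(target):
        return True
    target_mtime = os.path.getmtime(target)
    return any(os.path.getmtime(src) > target_mtime for src in sources)
```

For example, needs_regeneration("Python/Python-ast.c", ["Parser/Python.asdl", "Parser/asdl.py", "Parser/asdl_c.py"]) would be true only after one of the listed inputs was modified more recently than the generated file.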

I have some additional ideas - to be delayed for subsequent patches:

1. The existing Spark-based parser didn't have tests, but my parser does. I'll do something similar to test_tools.py, as Benjamin suggested (to test the parser only in source builds).
2. The code of asdl_c.py could be modernized and be made even cleaner.
3. The same is true of the generated C code for the AST.
msg215447 - (view) Author: Eli Bendersky (eli.bendersky) * (Python committer) Date: 2014-04-03 13:41
Since there has mostly been support for this, I'll wait a couple more days and commit it unless someone objects or asks for more time for review.
msg215518 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-04-04 13:10
Now make fails when system Python is older than 3.4.
msg215530 - (view) Author: Eli Bendersky (eli.bendersky) * (Python committer) Date: 2014-04-04 15:30
On Fri, Apr 4, 2014 at 6:10 AM, Serhiy Storchaka <report@bugs.python.org> wrote:

>
> Serhiy Storchaka added the comment:
>
> Now make fails when system Python is older than 3.4.
>
>
This is why the .h & .c files are checked in - someone just building Python
doesn't need them regenerated at all. Only if one wants to *modify the
AST*, he'll need an up-to-date Python. Otherwise we'll have to stick to the
"oldest Python possible" for every script we use internally. I think this
was discussed on the mailing list(s) at some point.
msg215574 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2014-04-04 22:45
IIRC, that discussion was just about Python 2 vs Python 3. Can we get the
AST rebuild requirement dropped back to "python3" being 3.3+ for the time
being so we don't break Fedora?
msg215613 - (view) Author: Eli Bendersky (eli.bendersky) * (Python committer) Date: 2014-04-05 14:05
Nick, it shouldn't be hard to drop to 3.3, but I'm curious why the 3.4 requirement would break Fedora, or anything for that matter. Does Fedora regenerate the C implementation of the AST for some reason on every build? AFAIU, building Python from source with "make" should not do that.
msg215615 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2014-04-05 14:11
It won't break the build, but requiring 3.4 to be installed (rather than
3.3) makes it more annoying for me (and other Fedora users) to work on the
compiler before F21 is released :)
msg215616 - (view) Author: Larry Hastings (larry) * (Python committer) Date: 2014-04-05 14:15
I was told to keep Argument Clinic compatible with 3.3.  I think it's a good idea for the tools to not require absolutely current Python.  Would it be a big deal to support 3.3?
msg215635 - (view) Author: Eli Bendersky (eli.bendersky) * (Python committer) Date: 2014-04-05 21:49
Updated patch attached:

1. Python 3.3+ supported (I suspect 3.2 will work too)
2. Incorporated Serhiy's suggestions (thanks for the review!)
msg218207 - (view) Author: Roundup Robot (python-dev) Date: 2014-05-10 00:58
New changeset b769352e2922 by Eli Bendersky in branch 'default':
Issue #19655: Replace the ASDL parser carried with CPython
http://hg.python.org/cpython/rev/b769352e2922
msg218210 - (view) Author: Roundup Robot (python-dev) Date: 2014-05-10 02:03
New changeset 604e1b1a7777 by Eli Bendersky in branch 'default':
Issue #19655: Add tests for the new asdl parser.
http://hg.python.org/cpython/rev/604e1b1a7777
msg218213 - (view) Author: Stefan Behnel (scoder) * Date: 2014-05-10 08:41
The "avoid rebuilding" part doesn't seem to work for me. Source build currently fails as follows:

"""
/bin/mkdir -p Include
python ./Parser/asdl_c.py -h Include ./Parser/Python.asdl
# Substitution happens here, as the completely-expanded BINDIR
# is not available in configure
sed -e "s,@EXENAME@,.../INSTALL/py3km/bin/python3.5dm," < ./Misc/python-config.in >python-config.py
# Replace makefile compat. variable references with shell script compat. ones;  -> 
sed -e 's,\$(\([A-Za-z0-9_]*\)),\$\{\1\},g' < Misc/python-config.sh >python-config
# On Darwin, always use the python version of the script, the shell
# version doesn't use the compiler customizations that are provided
# in python (_osx_support.py).
if test `uname -s` = Darwin; then \
		cp python-config.py python-config; \
	fi
Traceback (most recent call last):
  File "./Parser/asdl_c.py", line 1312, in <module>
    main(args[0], dump_module)
  File "./Parser/asdl_c.py", line 1251, in main
    if not asdl.check(mod):
  File ".../cpython/Parser/asdl.py", line 183, in check
    v = Check()
  File ".../cpython/Parser/asdl.py", line 140, in __init__
    super().__init__()
TypeError: super() takes at least 1 argument (0 given)
"""

Python installation is 2.7 (originally 2.5 at the system level, but recent build changes broke that), no Py3 available.
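
The TypeError at the end of the traceback comes from the zero-argument super() call, which exists only in Python 3; here the script was run by a Python 2.7 `python`. A minimal sketch of the difference, using stand-in classes (Base and the attributes are illustrative assumptions, not copied from Parser/asdl.py, and this is not a proposed change to it):

```python
class Base(object):
    def __init__(self):
        self.cache = {}


class Check(Base):
    def __init__(self):
        # Python 3 only: super().__init__()
        # Under Python 2.7 that spelling raises
        # "TypeError: super() takes at least 1 argument (0 given)".
        # The explicit two-argument form works on both 2.7 and 3.x.
        super(Check, self).__init__()
        self.errors = 0
```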
msg218215 - (view) Author: Eli Bendersky (eli.bendersky) * (Python committer) Date: 2014-05-10 10:24
Stefan, you need to run `make touch` if you want to avoid rebuilding. See #15964 for more details.

[all bots run `make touch` before building now]
msg218216 - (view) Author: Eli Bendersky (eli.bendersky) * (Python committer) Date: 2014-05-10 10:28
This is also described in the Dev Guide: https://docs.python.org/devguide/setup.html
msg218232 - (view) Author: Stefan Behnel (scoder) * Date: 2014-05-10 18:29
That fixes it. Thanks!
History
Date                 User               Action  Args
2014-05-10 18:29:05  scoder             set     messages: + msg218232
2014-05-10 10:28:57  eli.bendersky      set     messages: + msg218216
2014-05-10 10:24:03  eli.bendersky      set     messages: + msg218215
2014-05-10 08:41:56  scoder             set     nosy: + scoder; messages: + msg218213
2014-05-10 02:03:24  python-dev         set     messages: + msg218210
2014-05-10 00:59:18  eli.bendersky      set     status: open -> closed; resolution: fixed; stage: patch review -> resolved
2014-05-10 00:58:22  python-dev         set     nosy: + python-dev; messages: + msg218207
2014-04-05 21:49:28  eli.bendersky      set     files: + new-asdl-parser.issue19655.2.patch; messages: + msg215635
2014-04-05 14:15:50  larry              set     messages: + msg215616
2014-04-05 14:11:27  ncoghlan           set     messages: + msg215615
2014-04-05 14:05:11  eli.bendersky      set     messages: + msg215613
2014-04-04 22:45:18  ncoghlan           set     messages: + msg215574
2014-04-04 15:30:41  eli.bendersky      set     messages: + msg215530
2014-04-04 13:10:12  serhiy.storchaka   set     messages: + msg215518
2014-04-03 13:42:00  eli.bendersky      set     messages: + msg215447
2014-03-31 13:12:39  eli.bendersky      set     files: + new-asdl-parser.issue19655.1.patch; keywords: + patch; messages: + msg215234
2014-03-31 04:36:19  rhettinger         set     messages: + msg215219
2014-03-31 03:50:39  rhettinger         set     nosy: + rhettinger
2014-03-31 00:59:00  benjamin.peterson  set     messages: + msg215217
2014-03-30 22:16:06  eli.bendersky      set     messages: + msg215210
2014-02-03 15:45:08  BreamoreBoy        set     nosy: - BreamoreBoy
2013-11-25 18:11:24  brett.cannon       set     messages: + msg204377
2013-11-25 17:17:08  eric.snow          set     messages: + msg204368
2013-11-24 23:03:22  eli.bendersky      set     messages: + msg204282
2013-11-24 22:58:50  larry              set     messages: + msg204281
2013-11-24 22:56:52  serhiy.storchaka   set     nosy: + serhiy.storchaka
2013-11-24 22:53:03  eli.bendersky      set     messages: + msg204279
2013-11-24 20:55:40  larry              set     messages: + msg204267
2013-11-24 20:51:16  brett.cannon       set     messages: + msg204266
2013-11-24 17:38:39  eli.bendersky      set     messages: + msg204240
2013-11-24 15:39:47  brett.cannon       set     messages: + msg204225
2013-11-23 18:20:47  techtonik          set     nosy: + techtonik; messages: + msg204069
2013-11-19 21:43:55  benjamin.peterson  set     messages: + msg203429
2013-11-19 20:28:54  eric.snow          set     nosy: + benjamin.peterson
2013-11-19 16:39:08  skrah              set     nosy: + skrah
2013-11-19 14:40:39  BreamoreBoy        set     nosy: + BreamoreBoy; messages: + msg203385
2013-11-19 14:40:32  eli.bendersky      set     messages: + msg203384
2013-11-19 14:37:00  larry              set     nosy: + larry; messages: + msg203382
2013-11-19 14:30:52  eli.bendersky      set     messages: + msg203381
2013-11-19 14:03:08  eli.bendersky      set     nosy: + eric.snow
2013-11-19 14:01:36  eli.bendersky      create