msg203376 - (view) |
Author: Eli Bendersky (eli.bendersky) *  |
Date: 2013-11-19 14:01 |
It was mentioned in one of the recent python-dev threads that making the Python code-base simpler to encourage involvement of contributors is a goal, so I figured this may be relevant.
I've recently written a new parser for the ASDL specification language from scratch and think it may be beneficial to replace the existing parser we use:
* The existing parser uses an external parser-generator (Spark) that we carry around in the Parser/ directory; the new parser is a very simple stand-alone recursive descent, which makes it easier to maintain and doesn't require a familiarity with Spark.
* The new code is significantly smaller. ~400 LOC for the whole stand-alone parser (asdl.py) as opposed to >1200 LOC for the existing parser+Spark.
* The existing asdl.py is old code and was only superficially ported to Python 3.x - this shows, and isn't a good example of using modern Python techniques. My asdl.py uses Python 3.4 with modern idioms like dict comprehensions, generators and enums.
For a start, it may be easier to review the parser separately and not as a patch file. I split it out into a stand-alone project here: https://github.com/eliben/asdl_parser
The asdl.py there is a drop-in replacement for Parser/asdl.py; asdl_c.py is a replacement for Parser/asdl_c.py, with tiny modifications to interface with the new parser (mainly getting rid of some Spark-induced quirks). The AST .c and .h files produced are identical. The repo also has some tests for the parser, which we may find useful to add to the test suite or the Parser directory.
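For readers unfamiliar with the approach, here is a minimal sketch of what a stand-alone recursive-descent parser for an ASDL-like definition can look like. It is not the asdl.py from the linked repository: the names `TokenKind`, `tokenize` and `Parser`, and the simplified grammar in the comments, are assumptions made for illustration only.

```python
import enum
import re


class TokenKind(enum.Enum):
    NAME = 1
    EQUALS = 2
    PIPE = 3
    LPAREN = 4
    RPAREN = 5
    COMMA = 6
    STAR = 7
    QUESTION = 8


# Single-character operators mapped to their token kinds.
OPERATORS = {'=': TokenKind.EQUALS, '|': TokenKind.PIPE,
             '(': TokenKind.LPAREN, ')': TokenKind.RPAREN,
             ',': TokenKind.COMMA, '*': TokenKind.STAR,
             '?': TokenKind.QUESTION}


def tokenize(text):
    """Generate (kind, value) pairs; whitespace is skipped."""
    for match in re.finditer(r'([A-Za-z_]\w*)|(\S)', text):
        value = match.group()
        if match.group(1):
            yield TokenKind.NAME, value
        elif value in OPERATORS:
            yield OPERATORS[value], value
        else:
            raise SyntaxError('unexpected character %r' % value)


class Parser:
    """Hypothetical parser: one method per rule of a simplified grammar."""

    def __init__(self, text):
        self.tokens = list(tokenize(text))
        self.pos = 0

    def peek(self):
        # Kind of the next token, or None at end of input.
        return self.tokens[self.pos][0] if self.pos < len(self.tokens) else None

    def expect(self, kind):
        # Consume the next token, which must be of the given kind.
        if self.peek() is not kind:
            raise SyntaxError('expected %s, got %s' % (kind, self.peek()))
        value = self.tokens[self.pos][1]
        self.pos += 1
        return value

    def parse_definition(self):
        # definition := NAME '=' constructor ('|' constructor)*
        name = self.expect(TokenKind.NAME)
        self.expect(TokenKind.EQUALS)
        constructors = [self.parse_constructor()]
        while self.peek() is TokenKind.PIPE:
            self.expect(TokenKind.PIPE)
            constructors.append(self.parse_constructor())
        return name, constructors

    def parse_constructor(self):
        # constructor := NAME ['(' field (',' field)* ')']
        name = self.expect(TokenKind.NAME)
        fields = []
        if self.peek() is TokenKind.LPAREN:
            self.expect(TokenKind.LPAREN)
            fields.append(self.parse_field())
            while self.peek() is TokenKind.COMMA:
                self.expect(TokenKind.COMMA)
                fields.append(self.parse_field())
            self.expect(TokenKind.RPAREN)
        return name, fields

    def parse_field(self):
        # field := NAME ['*' | '?'] NAME
        type_name = self.expect(TokenKind.NAME)
        quantifier = ''
        if self.peek() in (TokenKind.STAR, TokenKind.QUESTION):
            quantifier = self.expect(self.peek())
        return type_name + quantifier, self.expect(TokenKind.NAME)
```

Calling `Parser('stmt = Assign(expr* targets, expr value) | Pass').parse_definition()` returns a `(name, constructors)` tuple. Each grammar rule maps to one small method, which is the property that makes this style easy to read without knowing a parser generator.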
|
msg203381 - (view) |
Author: Eli Bendersky (eli.bendersky) *  |
Date: 2013-11-19 14:30 |
FWIW, asdl_c.py could use some "modernization", but I'll defer this to a later cleanup in order to do things gradually.
The same can be said for the Makefile rules - they can be simpler and more efficient (no need to invoke asdl_c / parse the ASDL twice, for example).
Incidentally, where would be a good place to put the ASDL tests? Can/should we reach into the Parser/ directory when running the Python regression tests (will this even work in an installed Python)? Or should I just plop asdl_test.py alongside asdl.py and mention that it should be run when asdl.py changes?
|
msg203382 - (view) |
Author: Larry Hastings (larry) *  |
Date: 2013-11-19 14:37 |
A week before beta? How confident are you in this new parser?
|
msg203384 - (view) |
Author: Eli Bendersky (eli.bendersky) *  |
Date: 2013-11-19 14:40 |
Larry, ease your worries: note that I only tagged this on version 3.5!
That said, this parser runs during the build and produces a .h file and .c file - these partake in the build; I verified that the generated code is *identical* to before, so there's not much to worry about. Still, I don't see a reason to land this before the beta branches. I'll do it onto the default branch after this weekend (assuming the reviews come through).
|
msg203385 - (view) |
Author: Mark Lawrence (BreamoreBoy) * |
Date: 2013-11-19 14:40 |
Are we at beta for 3.5 already? :)
|
msg203429 - (view) |
Author: Benjamin Peterson (benjamin.peterson) *  |
Date: 2013-11-19 21:43 |
We could take the opportunity to move the ast scripts to a Tools/ subdir. Then you could use whatever it is test_tools.py uses.
|
msg204069 - (view) |
Author: anatoly techtonik (techtonik) |
Date: 2013-11-23 18:20 |
+1 for the initiative; points that would be nice to address are below.
1. "Python 3.4 with modern idioms" - code that is too Python-specific raises the barrier. I'd prefer simplicity and portability over modernism. For example, how hard is it to port the parser to JS with PythonJS?
2. The ASDL specification is mostly offline. One PDF survived, but the IR browser and its source did not, which is a pity, because visual tools are among the most attractive. In any case, it may be worth contacting the university - they might have backups, and the browser could be resurrected in Python (GCI, GSoC).
3. File organization. This is bad:
Grammar/Grammar
Parser/
Python/
This is good:
Core/README.md
Core/Grammar
Core/Parser/
Core/Processor/ (builds AST)
Core/Interpreter/
Core/Tests/
I wonder what the PyPy layout is. It may be worth stealing it for consistency.
4. Specific problem with ASDL parsing - currently, by the ASDL syntax, any `expr` is allowed on the left side of an assign node. This is not true in practice. It makes sense to clarify these boundaries (who checks what) in README.md and modify the ASDL to reflect the restriction.
|
msg204225 - (view) |
Author: Brett Cannon (brett.cannon) *  |
Date: 2013-11-24 15:39 |
All code going into Python should be idiomatic unless it's meant to be released externally and there are backwards-compatibility concerns. It's Eli's call as to whether he wants to maintain a PyPI project for this after integration.
Points 2-4 are off-topic for this issue.
|
msg204240 - (view) |
Author: Eli Bendersky (eli.bendersky) *  |
Date: 2013-11-24 17:38 |
Does anyone have comments on the code or can I prepare a patch for default?
Would it make sense to wait with this until the 3.4 branch is created or can I just commit to default? Note that this change is not a new feature and is essentially a no-op as far as the resulting CPython executable is concerned - it's just tweaking the build process to generate the same data in a different way.
|
msg204266 - (view) |
Author: Brett Cannon (brett.cannon) *  |
Date: 2013-11-24 20:51 |
It's Larry's call in the end but I personally don't care either way, especially since this isn't user-facing code.
|
msg204267 - (view) |
Author: Larry Hastings (larry) *  |
Date: 2013-11-24 20:55 |
Are the generated files *byte for byte* the same as produced by the existing parser generation process?
|
msg204279 - (view) |
Author: Eli Bendersky (eli.bendersky) *  |
Date: 2013-11-24 22:53 |
Larry Hastings added the comment:
>
> Are the generated files *byte for byte* the same as produced by the
> existing parser generation process?
>
Correct. The generator runs during the build (in the Makefile), but only if
the files were out-of-date. It takes Python.asdl and produces Python-ast.h
and Python-ast.c; the latter two are compiled into the CPython executable.
The .h/.c files produced by my alternative generator are exactly identical
to the ones in there now.
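A check of this kind can be scripted. The sketch below is only an illustration of a byte-for-byte comparison using the standard library's `filecmp`; the function name `generated_files_match` is hypothetical and is not part of the patch or of the CPython build.

```python
import filecmp


def generated_files_match(old_paths, new_paths):
    """Return True if each old file is byte-for-byte equal to its new counterpart."""
    if len(old_paths) != len(new_paths):
        raise ValueError('expected the same number of old and new files')
    return all(
        # shallow=False compares file contents rather than only os.stat() data.
        filecmp.cmp(old, new, shallow=False)
        for old, new in zip(old_paths, new_paths)
    )
```

It would be called with two parallel lists, for example the checked-in Python-ast.h and Python-ast.c on one side and freshly generated copies on the other.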
I don't feel strongly about this, but I may need a refresher in the release
process rules. From today and until Feb 23rd, 2014 - are we not supposed to
be adding new features/PEPs or are we also not supposed to do any code
refactorings and/or any changes at all unless those are directly related to
fixing a specific bug/issue?
|
msg204281 - (view) |
Author: Larry Hastings (larry) *  |
Date: 2013-11-24 22:58 |
The rule is, no new features. Bug and security fixes from now on.
It isn't always clear whether or not something is a new "feature". In the case of such ambiguity, the decision is up to the sole discretion of the release manager.
If you seriously want to check in this thing (*sigh*), I'll take a look at it. But you'll have to point me at either a patch that applies cleanly to trunk, or a repository somewhere online with the patch already applied to trunk.
Something like a code refactor is probably okay during betas, but during RCs I would really prefer we keep checkins strictly to a minimum.
|
msg204282 - (view) |
Author: Eli Bendersky (eli.bendersky) *  |
Date: 2013-11-24 23:03 |
On Sun, Nov 24, 2013 at 2:58 PM, Larry Hastings <report@bugs.python.org> wrote:
>
> Larry Hastings added the comment:
>
> The rule is, no new features. Bug and security fixes from now on.
>
> It isn't always clear whether or not something is a new "feature". In the
> case of such ambiguity, the decision is up to the sole discretion of the
> release manager.
>
> If you seriously want to check in this thing (*sigh*), I'll take a look at
> it. But you'll have to point me at either a patch that applies cleanly to
> trunk, or a repository somewhere online with the patch already applied to
> trunk.
>
There really is no urgency here. I don't want to add needless work onto
your plate, and am fine with just leaving it be until the 3.4 branch is cut
and landing it on the default branch aimed for 3.5 after that.
|
msg204368 - (view) |
Author: Eric Snow (eric.snow) *  |
Date: 2013-11-25 17:17 |
The concern, as far as I understand it, is that any change might introduce regressions or even new bugs in the next beta. Something like swapping out the parser generator implementation isn't a bug fix, but it isn't a new language (incl. stdlib) feature either.
I don't have a problem with the change. Furthermore, it likely doesn't matter since there probably won't be any grammar changes during the beta that would trigger the tool.
|
msg204377 - (view) |
Author: Brett Cannon (brett.cannon) *  |
Date: 2013-11-25 18:11 |
Let's just go with Eli's latest idea and just save it for 3.5 since it won't make any visible improvement in 3.4.
|
msg215210 - (view) |
Author: Eli Bendersky (eli.bendersky) *  |
Date: 2014-03-30 22:16 |
There were no serious objections bar the pre-release timing. Now that we're safely in 3.5 territory, can I go ahead and create a patch?
Note that for purposes of review, the GitHub project linked in the original message is more convenient, IMHO.
|
msg215217 - (view) |
Author: Benjamin Peterson (benjamin.peterson) *  |
Date: 2014-03-31 00:58 |
It certainly seems more compact and simple than the current parser, so +1 from me.
|
msg215219 - (view) |
Author: Raymond Hettinger (rhettinger) *  |
Date: 2014-03-31 04:36 |
+1 for moving this forward as early in the 3.5 cycle as possible.
|
msg215234 - (view) |
Author: Eli Bendersky (eli.bendersky) *  |
Date: 2014-03-31 13:12 |
Attaching patch that implements this. To make it easier, the patch only replaces the ASDL parser - not touching anything else and leaving the output intact.
With this patch applied, when the Makefile is rerun it regenerates the actual AST code in:
Include/Python-ast.h
Python/Python-ast.c
However, as the new parser generates exactly the same files, the code above is identical to the originals (hg diff shows nothing). In practice this means that no one building Python should notice this change, unless she's playing with the ASDL input file itself. The Makefile will not even regenerate these files unless the parser or the ASDL file were modified.
I have some additional ideas - to be delayed for subsequent patches:
1. The existing Spark-based parser didn't have tests, but my parser does. I'll do something similar to test_tools.py, as Benjamin suggested (to test the parser only in source builds).
2. The code of asdl_c.py could be modernized and be made even cleaner.
3. The same is true of the generated C code for the AST.
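The "regenerate only when out of date" behaviour mentioned above comes from make comparing file timestamps. As a rough model of that rule (a simplification written for illustration, not the actual Makefile logic; `needs_regeneration` is a hypothetical name):

```python
import os


def needs_regeneration(outputs, inputs):
    """Return True if any output is missing or older than the newest input."""
    newest_input = max(os.path.getmtime(path) for path in inputs)
    for path in outputs:
        if not os.path.exists(path) or os.path.getmtime(path) < newest_input:
            return True
    return False
```

Here `inputs` would stand for the parser scripts and the ASDL file, and `outputs` for Include/Python-ast.h and Python/Python-ast.c.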
|
msg215447 - (view) |
Author: Eli Bendersky (eli.bendersky) *  |
Date: 2014-04-03 13:41 |
Since there has mostly been support for this, I'll wait a couple more days and commit it unless someone objects or asks for more time for review.
|
msg215518 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2014-04-04 13:10 |
Now make fails when system Python is older than 3.4.
|
msg215530 - (view) |
Author: Eli Bendersky (eli.bendersky) *  |
Date: 2014-04-04 15:30 |
On Fri, Apr 4, 2014 at 6:10 AM, Serhiy Storchaka <report@bugs.python.org> wrote:
>
> Serhiy Storchaka added the comment:
>
> Now make fails when system Python is older than 3.4.
>
>
This is why the .h & .c files are checked in - someone just building Python
doesn't need them regenerated at all. Only if one wants to *modify the
AST*, he'll need an up-to-date Python. Otherwise we'll have to stick to the
"oldest Python possible" for every script we use internally. I think this
was discussed on the mailing list(s) at some point.
|
msg215574 - (view) |
Author: Alyssa Coghlan (ncoghlan) *  |
Date: 2014-04-04 22:45 |
IIRC, that discussion was just about Python 2 vs Python 3. Can we get the
AST rebuild requirement dropped back to "python3" being 3.3+ for the time
being so we don't break Fedora?
|
msg215613 - (view) |
Author: Eli Bendersky (eli.bendersky) *  |
Date: 2014-04-05 14:05 |
Nick, it shouldn't be hard to drop to 3.3, but I'm curious why the 3.4 requirement would break Fedora, or anything for that matter. Does Fedora regenerate the C implementation of the AST for some reason on every build? AFAIU, building Python from source with "make" should not do that.
|
msg215615 - (view) |
Author: Alyssa Coghlan (ncoghlan) *  |
Date: 2014-04-05 14:11 |
It won't break the build, but requiring 3.4 to be installed (rather than
3.3) makes it more annoying for me (and other Fedora users) to work on the
compiler before F21 is released :)
|
msg215616 - (view) |
Author: Larry Hastings (larry) *  |
Date: 2014-04-05 14:15 |
I was told to keep Argument Clinic compatible with 3.3. I think it's a good idea for the tools to not require absolutely current Python. Would it be a big deal to support 3.3?
|
msg215635 - (view) |
Author: Eli Bendersky (eli.bendersky) *  |
Date: 2014-04-05 21:49 |
Updated patch attached:
1. Python 3.3+ supported (I suspect 3.2 will work too)
2. Incorporated Serhiy's suggestions (thanks for the review!)
|
msg218207 - (view) |
Author: Roundup Robot (python-dev)  |
Date: 2014-05-10 00:58 |
New changeset b769352e2922 by Eli Bendersky in branch 'default':
Issue #19655: Replace the ASDL parser carried with CPython
http://hg.python.org/cpython/rev/b769352e2922
|
msg218210 - (view) |
Author: Roundup Robot (python-dev)  |
Date: 2014-05-10 02:03 |
New changeset 604e1b1a7777 by Eli Bendersky in branch 'default':
Issue #19655: Add tests for the new asdl parser.
http://hg.python.org/cpython/rev/604e1b1a7777
|
msg218213 - (view) |
Author: Stefan Behnel (scoder) *  |
Date: 2014-05-10 08:41 |
The "avoid rebuilding" part doesn't seem to work for me. Source build currently fails as follows:
"""
/bin/mkdir -p Include
python ./Parser/asdl_c.py -h Include ./Parser/Python.asdl
# Substitution happens here, as the completely-expanded BINDIR
# is not available in configure
sed -e "s,@EXENAME@,.../INSTALL/py3km/bin/python3.5dm," < ./Misc/python-config.in >python-config.py
# Replace makefile compat. variable references with shell script compat. ones; ->
sed -e 's,\$(\([A-Za-z0-9_]*\)),\$\{\1\},g' < Misc/python-config.sh >python-config
# On Darwin, always use the python version of the script, the shell
# version doesn't use the compiler customizations that are provided
# in python (_osx_support.py).
if test `uname -s` = Darwin; then \
cp python-config.py python-config; \
fi
Traceback (most recent call last):
  File "./Parser/asdl_c.py", line 1312, in <module>
    main(args[0], dump_module)
  File "./Parser/asdl_c.py", line 1251, in main
    if not asdl.check(mod):
  File ".../cpython/Parser/asdl.py", line 183, in check
    v = Check()
  File ".../cpython/Parser/asdl.py", line 140, in __init__
    super().__init__()
TypeError: super() takes at least 1 argument (0 given)
"""
Python installation is 2.7 (originally 2.5 at the system level, but recent build changes broke that), no Py3 available.
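The TypeError above arises because the zero-argument form of `super()` exists only in Python 3, so a Python 2.7 interpreter fails partway into the parser. One way a script could report this more readably is an up-front version check. The following is a hypothetical sketch, not code from the committed asdl.py; `MINIMUM_VERSION`, `is_supported` and `require_supported` are names invented here, and the (3, 3) floor is an assumption taken from the 3.3+ requirement discussed earlier in this thread.

```python
import sys

# Assumption: minimum version taken from the "3.3+" discussion in this thread.
MINIMUM_VERSION = (3, 3)


def is_supported(version_info=None, minimum=MINIMUM_VERSION):
    """Return True if the given (or running) interpreter is at least `minimum`."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) >= minimum


def require_supported(version_info=None):
    """Exit with a readable message instead of a traceback deep in the parser."""
    if not is_supported(version_info):
        sys.exit('Python %d.%d+ is required to regenerate the AST files'
                 % MINIMUM_VERSION)
```

Calling `require_supported()` at the top of a script's entry point would turn the traceback into a one-line message on an interpreter that is too old.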
|
msg218215 - (view) |
Author: Eli Bendersky (eli.bendersky) *  |
Date: 2014-05-10 10:24 |
Stefan, you need to run `make touch` if you want to avoid rebuilding. See #15964 for more details.
[all bots run `make touch` before building now]
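The idea behind touching generated files is to bump their modification times so that make considers them newer than their inputs and skips regeneration. The sketch below only illustrates that idea with `os.utime`; it is not what the `make touch` target actually runs, and `touch_generated` is a hypothetical name.

```python
import os


def touch_generated(paths):
    """Set each existing file's access and modification times to the current time."""
    for path in paths:
        # Passing None as the times argument means "now".
        os.utime(path, None)
```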
|
msg218216 - (view) |
Author: Eli Bendersky (eli.bendersky) *  |
Date: 2014-05-10 10:28 |
This is also described in the Dev Guide: https://docs.python.org/devguide/setup.html
|
msg218232 - (view) |
Author: Stefan Behnel (scoder) *  |
Date: 2014-05-10 18:29 |
That fixes it. Thanks!
|
Date | User | Action | Args |
2022-04-11 14:57:53 | admin | set | github: 63854 |
2014-05-10 18:29:05 | scoder | set | messages: + msg218232 |
2014-05-10 10:28:57 | eli.bendersky | set | messages: + msg218216 |
2014-05-10 10:24:03 | eli.bendersky | set | messages: + msg218215 |
2014-05-10 08:41:56 | scoder | set | nosy: + scoder; messages: + msg218213 |
2014-05-10 02:03:24 | python-dev | set | messages: + msg218210 |
2014-05-10 00:59:18 | eli.bendersky | set | status: open -> closed; resolution: fixed; stage: patch review -> resolved |
2014-05-10 00:58:22 | python-dev | set | nosy: + python-dev; messages: + msg218207 |
2014-04-05 21:49:28 | eli.bendersky | set | files: + new-asdl-parser.issue19655.2.patch; messages: + msg215635 |
2014-04-05 14:15:50 | larry | set | messages: + msg215616 |
2014-04-05 14:11:27 | ncoghlan | set | messages: + msg215615 |
2014-04-05 14:05:11 | eli.bendersky | set | messages: + msg215613 |
2014-04-04 22:45:18 | ncoghlan | set | messages: + msg215574 |
2014-04-04 15:30:41 | eli.bendersky | set | messages: + msg215530 |
2014-04-04 13:10:12 | serhiy.storchaka | set | messages: + msg215518 |
2014-04-03 13:42:00 | eli.bendersky | set | messages: + msg215447 |
2014-03-31 13:12:39 | eli.bendersky | set | files: + new-asdl-parser.issue19655.1.patch; keywords: + patch; messages: + msg215234 |
2014-03-31 04:36:19 | rhettinger | set | messages: + msg215219 |
2014-03-31 03:50:39 | rhettinger | set | nosy: + rhettinger |
2014-03-31 00:59:00 | benjamin.peterson | set | messages: + msg215217 |
2014-03-30 22:16:06 | eli.bendersky | set | messages: + msg215210 |
2014-02-03 15:45:08 | BreamoreBoy | set | nosy: - BreamoreBoy |
2013-11-25 18:11:24 | brett.cannon | set | messages: + msg204377 |
2013-11-25 17:17:08 | eric.snow | set | messages: + msg204368 |
2013-11-24 23:03:22 | eli.bendersky | set | messages: + msg204282 |
2013-11-24 22:58:50 | larry | set | messages: + msg204281 |
2013-11-24 22:56:52 | serhiy.storchaka | set | nosy: + serhiy.storchaka |
2013-11-24 22:53:03 | eli.bendersky | set | messages: + msg204279 |
2013-11-24 20:55:40 | larry | set | messages: + msg204267 |
2013-11-24 20:51:16 | brett.cannon | set | messages: + msg204266 |
2013-11-24 17:38:39 | eli.bendersky | set | messages: + msg204240 |
2013-11-24 15:39:47 | brett.cannon | set | messages: + msg204225 |
2013-11-23 18:20:47 | techtonik | set | nosy: + techtonik; messages: + msg204069 |
2013-11-19 21:43:55 | benjamin.peterson | set | messages: + msg203429 |
2013-11-19 20:28:54 | eric.snow | set | nosy: + benjamin.peterson |
2013-11-19 16:39:08 | skrah | set | nosy: + skrah |
2013-11-19 14:40:39 | BreamoreBoy | set | nosy: + BreamoreBoy; messages: + msg203385 |
2013-11-19 14:40:32 | eli.bendersky | set | messages: + msg203384 |
2013-11-19 14:37:00 | larry | set | nosy: + larry; messages: + msg203382 |
2013-11-19 14:30:52 | eli.bendersky | set | messages: + msg203381 |
2013-11-19 14:03:08 | eli.bendersky | set | nosy: + eric.snow |
2013-11-19 14:01:36 | eli.bendersky | create | |