Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace the ASDL parser carried with CPython #63854

Closed
elibendersky mannequin opened this issue Nov 19, 2013 · 34 comments
Closed

Replace the ASDL parser carried with CPython #63854

elibendersky mannequin opened this issue Nov 19, 2013 · 34 comments
Labels
build The build process and cross-build type-feature A feature request or enhancement

Comments

@elibendersky
Copy link
Mannequin

elibendersky mannequin commented Nov 19, 2013

BPO 19655
Nosy @brettcannon, @rhettinger, @ncoghlan, @scoder, @larryhastings, @benjaminp, @skrah, @ericsnowcurrently, @serhiy-storchaka
Files
  • new-asdl-parser.issue19655.1.patch
  • new-asdl-parser.issue19655.2.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2014-05-10.00:59:18.348>
    created_at = <Date 2013-11-19.14:01:36.034>
    labels = ['type-feature', 'build']
    title = 'Replace the ASDL parser carried with CPython'
    updated_at = <Date 2014-05-10.18:29:05.965>
    user = 'https://bugs.python.org/elibendersky'

    bugs.python.org fields:

    activity = <Date 2014-05-10.18:29:05.965>
    actor = 'scoder'
    assignee = 'eli.bendersky'
    closed = True
    closed_date = <Date 2014-05-10.00:59:18.348>
    closer = 'eli.bendersky'
    components = ['Build']
    creation = <Date 2013-11-19.14:01:36.034>
    creator = 'eli.bendersky'
    dependencies = []
    files = ['34679', '34742']
    hgrepos = []
    issue_num = 19655
    keywords = ['patch']
    message_count = 34.0
    messages = ['203376', '203381', '203382', '203384', '203385', '203429', '204069', '204225', '204240', '204266', '204267', '204279', '204281', '204282', '204368', '204377', '215210', '215217', '215219', '215234', '215447', '215518', '215530', '215574', '215613', '215615', '215616', '215635', '218207', '218210', '218213', '218215', '218216', '218232']
    nosy_count = 12.0
    nosy_names = ['brett.cannon', 'rhettinger', 'ncoghlan', 'scoder', 'larry', 'techtonik', 'benjamin.peterson', 'eli.bendersky', 'skrah', 'python-dev', 'eric.snow', 'serhiy.storchaka']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue19655'
    versions = ['Python 3.5']

    @elibendersky
    Copy link
    Mannequin Author

    elibendersky mannequin commented Nov 19, 2013

    It was mentioned in one of the recent python-dev threads that making the Python code-base simpler to encourage involvement of contributors is a goal, so I figured this may be relevant.

    I've recently written a new parser for the ASDL specification language from scratch and think it may be beneficial to replace the existing parser we use:

    • The existing parser uses an external parser-generator (Spark) that we carry around in the Parser/ directory; the new parser is a very simple stand-alone recursive descent, which makes it easier to maintain and doesn't require a familiarity with Spark.
    • The new code is significantly smaller. ~400 LOC for the whole stand-alone parser (asdl.py) as opposed to >1200 LOC for the existing parser+Spark.
    • The existing asdl.py is old code and was only superficially ported to Python 3.x - this shows, and isn't a good example of using modern Python techniques. My asdl.py uses Python 3.4 with modern idioms like dict comprehensions, generators and enums.

    For a start, it may be easier to review the parser separately and not as a patch file. I split it to a stand-alone project here: https://github.com/eliben/asdl_parser

    The asdl.py there is a drop-in replacement for Parser/asdl.py; asdl_c.py is for Parser/asdl_c.py - with tiny modifications to interface the new parser (mainly getting rid of some Spark-induced quirks). The AST .c and .h files produced are identical. The repo also has some tests for the parser, which we may find useful in adding to the test suite or the Parser directory.

    @elibendersky elibendersky mannequin self-assigned this Nov 19, 2013
    @elibendersky elibendersky mannequin added build The build process and cross-build type-feature A feature request or enhancement labels Nov 19, 2013
    @elibendersky
    Copy link
    Mannequin Author

    elibendersky mannequin commented Nov 19, 2013

    FWIW, asdl_c.py could use some "modernization", but I'll defer this to a later cleanup in order to do things gradually.

    The same can be said for the Makefile rules - they can be simpler and more efficient (no need to invoke asdl_c / parse the ASDL twice, for example).

    Incidentally, where would be a good place to put the ASDL tests? Can/should we reach into the Parser/ directory when running the Python regression tests (will this even work in an installed Python)? Or should I just plop asdl_test.py alongside asdl.py and mention that it should be run when asdl.py changes?

    @larryhastings
    Copy link
    Contributor

    A week before beta? How confident are you in this new parser?

    @elibendersky
    Copy link
    Mannequin Author

    elibendersky mannequin commented Nov 19, 2013

    Larry, ease your worries: note that I only tagged this on version 3.5!

    That said, this parser runs during the build and produces a .h file and .c file - these partake in the build; I verified that the generated code is *identical* to before, so there's not much to worry about. Still, I don't see a reason to land this before the beta branches. I'll do it onto the default branch after this weekend (assuming the reviews come through).

    @BreamoreBoy
    Copy link
    Mannequin

    BreamoreBoy mannequin commented Nov 19, 2013

    Are we at beta for 3.5 already? :)

    @benjaminp
    Copy link
    Contributor

    We could take the opportunity to ast scripts to a Tools/ subdir. Then you could use whatever it is test_tools.py uses.

    @techtonik
    Copy link
    Mannequin

    techtonik mannequin commented Nov 23, 2013

    +1 for initiative, points that are nice to be addressed are below.

    1. "Python 3.4 with modern idioms" - too Python-specific code raises the barrier. I'd prefer simplicity and portability over modernism. Like how hard is it to port the parser into JS with PythonJS, for example?

    2. ASDL specification is mostly offline. One PDF survived, but IR browser and source for did not, which is a pity, because visual tools are one of the most attractive. In any case, it may worth to contact university - they might have backups and resurrect browser in Python (GCI, GSoC).

    3. File organization. This is bad:
      Grammar/Grammar
      Parser/
      Python/
      This is good:
      Core/README.md
      Core/Grammar
      Core/Parser/
      Core/Processor/ (builds AST)
      Core/Interpreter/
      Core/Tests/
      I wonder what is PyPy layout? It may worth to steal it for consistency.

    4. Specific problem with ASDL parsing - currently, by ASDL syntax all expr are allowed on the left side of assign node. This is not true for real app. It makes sense to clarify in README.md these borders (who checks what) and modify ASDL to reflect the restriction.

    @brettcannon
    Copy link
    Member

    All code going into Python should be idiomatic unless it's meant to be released externally and there are backwards-compatibility concerns. It's Eli's call as to whether he wants to maintain a PyPI project for this after integration.

    Points 2-4 are off-topic for this issue.

    @elibendersky
    Copy link
    Mannequin Author

    elibendersky mannequin commented Nov 24, 2013

    Does anyone have comments on the code or can I prepare a patch for default?

    Would it make sense to wait with this until the 3.4 branch is created or can I just commit to default? Note that this change is not a new feature and is essentially a no-op as far as the resulting CPython executable - it's just tweaking the build process to generate the same data in a different way.

    @brettcannon
    Copy link
    Member

    It's Larry's call in the end but I personally don't care either way, especially since this isn't user-facing code.

    @larryhastings
    Copy link
    Contributor

    Are the generated files *byte for byte* the same as produced by the existing parser generation process?

    @elibendersky
    Copy link
    Mannequin Author

    elibendersky mannequin commented Nov 24, 2013

    Larry Hastings added the comment:

    Are the generated files *byte for byte* the same as produced by the
    existing parser generation process?

    Correct. The generator runs during the build (in the Makefile), but only if
    the files were out-of-date. It takes Python.asdl and produces Python-ast.h
    and Python-ast.c; the latter two are compiled into the CPython executable.
    The .h/.c files produced by my alternative generator are exactly identical
    to the ones in there now.

    I don't feel strongly about this, but I may need a refresher in the release
    process rules. From today and until Feb 23rd, 2014 - are we not supposed to
    be adding new features/PEPs or are we also not supposed to do any code
    refactorings and/or any changes at all unless those are directly related to
    fixing a specific bug/issue?

    @larryhastings
    Copy link
    Contributor

    The rule is, no new features. Bug and security fixes from now on.

    It isn't always clear whether or not something is a new "feature". In the case of such ambiguity, the decision is up to the sole discretion of the release manager.

    If you seriously want to check in this thing (sigh), I'll take a look at it. But you'll have to point me at either a patch that applies cleanly to trunk, or a repository somewhere online with the patch already applied to trunk.

    Something like a code refactor is probably okay during betas, but during RCs I would really prefer we keep checkins strictly to a minimum.

    @elibendersky
    Copy link
    Mannequin Author

    elibendersky mannequin commented Nov 24, 2013

    On Sun, Nov 24, 2013 at 2:58 PM, Larry Hastings <report@bugs.python.org>wrote:

    Larry Hastings added the comment:

    The rule is, no new features. Bug and security fixes from now on.

    It isn't always clear whether or not something is a new "feature". In the
    case of such ambiguity, the decision is up to the sole discretion of the
    release manager.

    If you seriously want to check in this thing (sigh), I'll take a look at
    it. But you'll have to point me at either a patch that applies cleanly to
    trunk, or a repository somewhere online with the patch already applied to
    trunk.

    There really is no urgency here. I don't won't to add needless work onto
    your place, and am fine with just leaving it be until the 3.4 branch is cut
    and landing it on the default branch aimed for 3.5 after that.

    @ericsnowcurrently
    Copy link
    Member

    The concern, as far as I understand it, is that any change might introduce regressions or even new bugs in the next beta. Something like swapping out the parser generator implementation isn't a bug fix, but it isn't a new language (incl. stdlib) feature either.

    I don't have a problem with the change. Furthermore, it likely doesn't matter since there probably won't be any grammar changes during the beta that would trigger the tool.

    @brettcannon
    Copy link
    Member

    Let's just go with Eli's latest idea and just save it for 3.5 since it won't make any visible improvement in 3.4.

    @elibendersky
    Copy link
    Mannequin Author

    elibendersky mannequin commented Mar 30, 2014

    There were no serious objections bar the pre-release timing. Now that we're safely in 3.5 territory, can I go ahead and create a patch?

    Note that for purposes of review, the Github project linked in the original message is more convenient, IMHO

    @benjaminp
    Copy link
    Contributor

    It certainly seems more compact and simple that the current parser, so +1 from me.

    @rhettinger
    Copy link
    Contributor

    +1 for moving this forward as early in the 3.5 cycle as possible.

    @elibendersky
    Copy link
    Mannequin Author

    elibendersky mannequin commented Mar 31, 2014

    Attaching patch that implements this. To make it easier, the patch only replaces the ASDL parser - not touching anything else and leaving the output intact.

    With this patch applied, when the Makefile is rerun it regenerates the actual AST code in:

    Include/Python-ast.h
    Python/Python-ast.c

    However, as the new parser generates exactly the same files, the code above is identical to the originals (hg diff shows nothing). In practice this means that no one building Python should notice this change, unless she's playing with the ASDL input file itself. The Makefile will not even regenerate these files unless the parser or the ASDL file were modified.

    I have some additional ideas - to be delayed for subsequent patches:

    1. The existing Spark-based parser didn't have tests, but my parser does. I'll do something similar to test_tools.py, as Benjamin suggested (to test the parser only in source builds).
    2. The code of asdl_c.py could be modernized and be made even cleaner.
    3. The same is true of the generated C code for the AST.

    @elibendersky
    Copy link
    Mannequin Author

    elibendersky mannequin commented Apr 3, 2014

    Since there has mostly been support for this, I'll wait a couple more days and commit it unless someones objects or asks for more time for review.

    @serhiy-storchaka
    Copy link
    Member

    Now make fails when system Python is older than 3.4.

    @elibendersky
    Copy link
    Mannequin Author

    elibendersky mannequin commented Apr 4, 2014

    On Fri, Apr 4, 2014 at 6:10 AM, Serhiy Storchaka <report@bugs.python.org>wrote:

    Serhiy Storchaka added the comment:

    Now make fails when system Python is older than 3.4.

    This is why the .h & .c files are checked in - someone just building Python
    doesn't need them regenerated at all. Only if one wants to *modify the
    AST*, he'll need an up-to-date Python. Otherwise we'll have to stick to the
    "oldest Python possible" for every script we use internally. I think this
    was discussed on the mailing list(s) at some point.

    @ncoghlan
    Copy link
    Contributor

    ncoghlan commented Apr 4, 2014

    IIRC, that discussion was just about Python 2 vs Python 3. Can we get the
    AST rebuild requirement dropped back to "python3" being 3.3+ for the time
    being so we don't break Fedora?

    @elibendersky
    Copy link
    Mannequin Author

    elibendersky mannequin commented Apr 5, 2014

    Nick, it shouldn't be hard to drop to 3.3, but I'm curious why would the 3.4 requirement break Fedora, or anything for that matter? Does Fedora regenerate the C implementation of the AST for some reason on every build? AFAIU, building Python from source with "make" should not do that.

    @ncoghlan
    Copy link
    Contributor

    ncoghlan commented Apr 5, 2014

    It won't break the build, but requiring 3.4 to be installed (rather than
    3.3) makes it more annoying for me (and other Fedora users) to work on the
    compiler before F21 is released :)

    @larryhastings
    Copy link
    Contributor

    I was told to keep Argument Clinic compatible with 3.3. I think it's a good idea for the tools to not require absolutely current Python. Would it be a big deal to support 3.3?

    @elibendersky
    Copy link
    Mannequin Author

    elibendersky mannequin commented Apr 5, 2014

    Updated patch attached:

    1. Python 3.3+ supported (I suspect 3.2 will work too)
    2. Incorporated Serhiy's suggestions (thanks for the review!)

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented May 10, 2014

    New changeset b769352e2922 by Eli Bendersky in branch 'default':
    Issue bpo-19655: Replace the ASDL parser carried with CPython
    http://hg.python.org/cpython/rev/b769352e2922

    @elibendersky elibendersky mannequin closed this as completed May 10, 2014
    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented May 10, 2014

    New changeset 604e1b1a7777 by Eli Bendersky in branch 'default':
    Issue bpo-19655: Add tests for the new asdl parser.
    http://hg.python.org/cpython/rev/604e1b1a7777

    @scoder
    Copy link
    Contributor

    scoder commented May 10, 2014

    The "avoid rebuilding" part doesn't seem to work for me. Source build currently fails as follows:

    """
    /bin/mkdir -p Include
    python ./Parser/asdl_c.py -h Include ./Parser/Python.asdl
    # Substitution happens here, as the completely-expanded BINDIR
    # is not available in configure
    sed -e "s,@EXENAME@,.../INSTALL/py3km/bin/python3.5dm," < ./Misc/python-config.in >python-config.py
    # Replace makefile compat. variable references with shell script compat. ones;  -> 
    sed -e 's,\$(\([A-Za-z0-9_]*\)),\$\{\1\},g' < Misc/python-config.sh >python-config
    # On Darwin, always use the python version of the script, the shell
    # version doesn't use the compiler customizations that are provided
    # in python (_osx_support.py).
    if test `uname -s` = Darwin; then \
    		cp python-config.py python-config; \
    	fi
    Traceback (most recent call last):
      File "./Parser/asdl_c.py", line 1312, in <module>
        main(args[0], dump_module)
      File "./Parser/asdl_c.py", line 1251, in main
        if not asdl.check(mod):
      File ".../cpython/Parser/asdl.py", line 183, in check
        v = Check()
      File ".../cpython/Parser/asdl.py", line 140, in __init__
        super().__init__()
    TypeError: super() takes at least 1 argument (0 given)
    """

    Python installation is 2.7 (originally 2.5 at the system level, but recent build changes broke that), no Py3 available.

    @elibendersky
    Copy link
    Mannequin Author

    elibendersky mannequin commented May 10, 2014

    Stefan, you need to run make touch if you want to avoid rebuilding. See bpo-15964 for more details.

    [all bots run make touch before building now]

    @elibendersky
    Copy link
    Mannequin Author

    elibendersky mannequin commented May 10, 2014

    This is also described in the Dev Guide: https://docs.python.org/devguide/setup.html

    @scoder
    Copy link
    Contributor

    scoder commented May 10, 2014

    That fixes it. Thanks!

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    build The build process and cross-build type-feature A feature request or enhancement
    Projects
    None yet
    Development

    No branches or pull requests

    8 participants