Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve / Clear ASDL generator #84708

Closed
isidentical opened this issue May 6, 2020 · 17 comments
Closed

Improve / Clear ASDL generator #84708

isidentical opened this issue May 6, 2020 · 17 comments
Labels
3.9 only security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs)

Comments

@isidentical
Copy link
Sponsor Member

BPO 40528
Nosy @serhiy-storchaka, @pablogsal, @isidentical
PRs
  • bpo-40528: Improve & clear a bunch of internal asdl thing #19952
  • bpo-40528: Improve AST generation script to do builds simultaneously #19968
  • bpo-40528: Implement a metadata system for ASDL Generator #20193
  • bpo-40528: move asdl identifier collection to the new metadata system #26858
  • bpo-40528: fix is_simple(sum)s behavior for attributes #26918
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2021-06-27.14:59:34.107>
    created_at = <Date 2020-05-06.11:17:00.247>
    labels = ['interpreter-core', '3.9']
    title = 'Improve / Clear ASDL generator'
    updated_at = <Date 2021-06-27.14:59:34.107>
    user = 'https://github.com/isidentical'

    bugs.python.org fields:

    activity = <Date 2021-06-27.14:59:34.107>
    actor = 'BTaskaya'
    assignee = 'none'
    closed = True
    closed_date = <Date 2021-06-27.14:59:34.107>
    closer = 'BTaskaya'
    components = ['Interpreter Core']
    creation = <Date 2020-05-06.11:17:00.247>
    creator = 'BTaskaya'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 40528
    keywords = ['patch']
    message_count = 17.0
    messages = ['368235', '368238', '368239', '368240', '368245', '368247', '368248', '368250', '368251', '368252', '368257', '368365', '369281', '396353', '396477', '396583', '396584']
    nosy_count = 3.0
    nosy_names = ['serhiy.storchaka', 'pablogsal', 'BTaskaya']
    pr_nums = ['19952', '19968', '20193', '26858', '26918']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = None
    url = 'https://bugs.python.org/issue40528'
    versions = ['Python 3.9']

    @isidentical
    Copy link
    Sponsor Member Author

    • Better error messages with punctuation
    • Py_UNREACHABLE() for unreachable states
    • Removal of several 'unused' and 'undocumented (in Zephyr ASDL spec) built-in types.

    @isidentical isidentical added 3.9 only security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) labels May 6, 2020
    @serhiy-storchaka
    Copy link
    Member

    You just read my thoughts. That's what I was going to do tonight. ;)

    @isidentical
    Copy link
    Sponsor Member Author

    You just read my thoughts. That's what I was going to do tonight. ;)

    :) Awesome. Well I'll plan to continue this; @pablogsal and I was discussing the idea of extracting _ast.AST subclasses to python level (which would save a lot of time for new additions). So my primary goal is clearing bunch of old stuff / making it extendable and clean.

    @isidentical
    Copy link
    Sponsor Member Author

    Currently working on implementing some items from this TODO https://github.com/eliben/asdl_parser/blob/master/TODO

    @pablogsal
    Copy link
    Member

    I was also thinking about getting rid of "string" nodes and just use Constant for those. Having the strings raw make the AST inconsistent because nodes that are 'strings' do not have line numbers and other metadata, making tools that need the source to report stuff more complex than it should.

    What are your opinions on this @serhiy.storchaka and @BTaskaya?

    @isidentical
    Copy link
    Sponsor Member Author

    I was also thinking about getting rid of "string" nodes and just use Constant for those. Having the strings raw make the AST inconsistent because nodes that are 'strings' do not have line numbers and other metadata, making tools that need the source to report stuff more complex than it should.

    I'm +1 on that (especially for finding import aliases, as we discussed earlier), if necessary precautions happen to ensure minimum breakage (from what I understand, it is something that will break everyone's code. Maybe we can implement custom Constant.__str__? and other different things to ensure it breaks small).

    @serhiy-storchaka
    Copy link
    Member

    "string" is only used for

    • type_comment in a number of statements
    • kind in Constant
    • tag in type_ignore

    For which of them a line number is meaningful and useful?

    @pablogsal
    Copy link
    Member

    For which of them a line number is meaningful and useful?

    Apologies, I was thinking of 'identifier', which is basically a PyObject representing a string. For instance, when parsing function calls like

    x = ast.parse("f(x=435)")

    the 'arg' attribute of the keyword is just the string 'x', without any metadata about it (like column offset and such).

    @isidentical
    Copy link
    Sponsor Member Author

    Oh, I confuse with it identifier. IMHO replacing identifier with Constant would infer position in some cases.

    @isidentical
    Copy link
    Sponsor Member Author

    A real world example would be tools like unimport, that try to remove a certain part of import by looking start/end column offsets. Before (lib2to3), it was using tokens to manipulate source, and what I can tell is that having position information on multi line from imports would be a life saver for tool authors.

    @pablogsal
    Copy link
    Member

    New changeset 091951a by Batuhan Taskaya in branch 'master':
    bpo-40528: Improve and clear several aspects of the ASDL definition code for the AST (GH-19952)
    091951a

    @isidentical
    Copy link
    Sponsor Member Author

    Next is using asdl_int_seq for all simple sum types, and not for only cmpop. Looks like it is hardcoded to use it, and I plan to refactor this in a way that might save some time in python implementation of classes (also it will solve this bug). For not conflicting, I'll wait GH 19968.

    @pablogsal
    Copy link
    Member

    New changeset 63b8e0c by Batuhan Taskaya in branch 'master':
    bpo-40528: Improve AST generation script to do builds simultaneously (GH-19968)
    63b8e0c

    @isidentical
    Copy link
    Sponsor Member Author

    New changeset 35ad425 by Batuhan Taskaya in branch 'main':
    bpo-40528: Implement a metadata system for ASDL Generator (GH-20193)
    35ad425

    @isidentical
    Copy link
    Sponsor Member Author

    New changeset 6c76df2 by Batuhan Taskaya in branch 'main':
    bpo-40528: move asdl identifier collection to the new metadata system (GH-26858)
    6c76df2

    @isidentical
    Copy link
    Sponsor Member Author

    New changeset 107a2c5 by Batuhan Taskaya in branch 'main':
    bpo-40528: fix is_simple(sum)s behavior for attributes (GH-26918)
    107a2c5

    @isidentical
    Copy link
    Sponsor Member Author

    These were all the cleanups on my checklist, so can close the issue now. I'll probably create other issues if something additional comes up

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.9 only security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs)
    Projects
    None yet
    Development

    No branches or pull requests

    3 participants