Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add endline and endcolumn to every AST node #77597

Closed
ilevkivskyi opened this issue May 3, 2018 · 15 comments
Closed

Add endline and endcolumn to every AST node #77597

ilevkivskyi opened this issue May 3, 2018 · 15 comments
Labels
3.8 only security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) type-feature A feature request or enhancement

Comments

@ilevkivskyi
Copy link
Member

BPO 33416
Nosy @gvanrossum, @brettcannon, @rhettinger, @vstinner, @ambv, @aivarannamaa, @serhiy-storchaka, @ilevkivskyi, @miss-islington
PRs
  • bpo-33416: Add end positions to Python AST #11605
  • bpo-33416: Add end positions to Python AST #11605
  • bpo-33416: Add end positions to Python AST #11605
  • bpo-35224: PEP 572 Implementation #10497
  • bpo-33416: Document changes in PyNode_AddChild and PyParser_AddToken #14214
  • [3.8] bpo-33416: Document changes in PyNode_AddChild and PyParser_AddToken (GH-14214) #14215
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2019-06-19.00:34:46.255>
    created_at = <Date 2018-05-03.10:15:25.514>
    labels = ['interpreter-core', 'type-feature', '3.8']
    title = 'Add endline and endcolumn to every AST node'
    updated_at = <Date 2019-06-19.00:34:46.254>
    user = 'https://github.com/ilevkivskyi'

    bugs.python.org fields:

    activity = <Date 2019-06-19.00:34:46.254>
    actor = 'levkivskyi'
    assignee = 'none'
    closed = True
    closed_date = <Date 2019-06-19.00:34:46.255>
    closer = 'levkivskyi'
    components = ['Interpreter Core']
    creation = <Date 2018-05-03.10:15:25.514>
    creator = 'levkivskyi'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 33416
    keywords = ['patch', 'patch', 'patch']
    message_count = 15.0
    messages = ['316117', '330180', '330207', '330287', '333575', '333583', '333587', '333608', '333617', '334205', '334228', '334300', '344507', '344519', '346015']
    nosy_count = 9.0
    nosy_names = ['gvanrossum', 'brett.cannon', 'rhettinger', 'vstinner', 'lukasz.langa', 'Aivar.Annamaa', 'serhiy.storchaka', 'levkivskyi', 'miss-islington']
    pr_nums = ['11605', '11605', '11605', '10497', '14214', '14215']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue33416'
    versions = ['Python 3.8']

    @ilevkivskyi
    Copy link
    Member Author

    Some Python tools (in particular I am interested in type checkers) will benefit from knowing where a given expression ends to indicate/highlight location of an error in the source code. Other tools and IDEs may have also some other benefits. Currently such tools need to use some hacks and/or custom parser to find the end line and end column of an expression, while it would be more straightforward to just add this information to every AST node by CPythons own parser.

    This will increase memory usage, but expectation is that this effect will be minor.

    What do you think?

    @ilevkivskyi ilevkivskyi added 3.8 only security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) type-feature A feature request or enhancement labels May 3, 2018
    @serhiy-storchaka
    Copy link
    Member

    I like this. This can help to generate more meaningful error messages at compile and run time.

    @serhiy-storchaka
    Copy link
    Member

    This may be a duplicate of bpo-22616.

    See also bpo-16795, bpo-21295, bpo-33211, bpo-34876.

    @ilevkivskyi
    Copy link
    Member Author

    bpo-22616 is a bit different (proposing line/column ranges instead of just endline/endcolumn). I am happy to close this one in favor of bpo-22616 if we agree that we will go with endline/endcolumn.

    I can't guarantee, but likely I will work on this during winter holidays, unless there are some objections, or unless someone will do this before :-)

    @ilevkivskyi
    Copy link
    Member Author

    FYI, I started working on this. I will have PR ready end of next week.

    Serhiy, I don't think we should keep both this and bpo-22616 open. Which one would you prefer to close?

    @rhettinger
    Copy link
    Contributor

    This idea seems reasonable. Most of the AST nodes have a span and it would be nice to know what that is. I'm sure we will find use cases though I doubt that many compiler syntax errors would benefit (since a syntax error means that we don't have a completely matched grammar rule).

    BTW, does this information have to be added by the parser or could there be am AST module function that deduces the end locations from the start location of next sibling node or from the parent node? If so, it may still be possible get the benefit without slowing down or complicating the parser logic.

    Do we know what other languages do (carry the full span info in the AST or deduce the span after the fact)? I don't know what the best practices are in this regard.

    One other thought: We should be careful not to impair existing AST capabilities that support code rewriting and movement (i.e. refactoring tools). The copy_location() and fix_missing_locations() functions would need to be updated as well.

    @serhiy-storchaka
    Copy link
    Member

    It is up to you Ivan.

    The end location can not be deduces from the start location of next sibling node or from the parent node. For example, the AST for the expression "foo.bar.baz" does not contain information for the end location of "foo.bar".

    @ilevkivskyi
    Copy link
    Member Author

    I'm sure we will find use cases though I doubt that many compiler syntax errors would benefit (since a syntax error means that we don't have a completely matched grammar rule).

    This is mostly useful for code analysis tools and IDEs.

    BTW, does this information have to be added by the parser or could there be am AST module function that deduces the end locations from the start location of next sibling node or from the parent node?

    There may be some other ways to do it, but currently I am adding both end_lineno and end_col_offset to AST, and n_end_lineno and n_end_col_offset in CST. The latter are needed to easily take care of situations like a + (b ), and other corner cases. I could imagine it is possible to live with only end position information in AST to save some memory, but then the code will be much harder to maintain. We can discuss details in the PR.

    Do we know what other languages do (carry the full span info in the AST or deduce the span after the fact)?

    I didn't do real research, but quick browsing shows carrying the end position info is quite common. But more importantly as Serhiy noted, deducing might be just not possible in some situations.

    It is up to you Ivan.

    OK, I will close the other one as superseded by this issue.

    @aivarannamaa
    Copy link
    Mannequin

    aivarannamaa mannequin commented Jan 14, 2019

    I strongly support this feature, because my IDE (https://thonny.org) needs to highlight AST nodes in the source code.

    There would be many interested parties if you count the stars of this project: https://github.com/gristlabs/asttokens

    @ilevkivskyi
    Copy link
    Member Author

    New changeset 9932a22 by Ivan Levkivskyi in branch 'master':
    bpo-33416: Add end positions to Python AST (GH-11605)
    9932a22

    @ilevkivskyi
    Copy link
    Member Author

    Buildbots are green, this can be now closed.

    @ilevkivskyi
    Copy link
    Member Author

    Re-opening to track renaming of potentially public PyNode_AddChild() and PyParser_AddToken().

    @ilevkivskyi ilevkivskyi reopened this Jan 24, 2019
    @ilevkivskyi
    Copy link
    Member Author

    So what was the conclusion about PyCode_New()? Situation is quite similar here (except that these functions are used less often). Should we just document the changes in What's New?

    @gvanrossum
    Copy link
    Member

    It seems we're keeping the new PyCode_New() signature, so we can do the same here. Just document it; a versionchanged (if there isn't one yet) and a mention in What's New sound appropriate.

    @miss-islington
    Copy link
    Contributor

    New changeset 45da743 by Miss Islington (bot) in branch '3.8':
    [3.8] bpo-33416: Document changes in PyNode_AddChild and PyParser_AddToken (GH-14214) (GH-14215)
    45da743

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.8 only security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) type-feature A feature request or enhancement
    Projects
    None yet
    Development

    No branches or pull requests

    5 participants