Add endline and endcolumn to every AST node #77597

ilevkivskyi · 2018-05-03T10:15:26Z

BPO	33416
Nosy	@gvanrossum, @brettcannon, @rhettinger, @vstinner, @ambv, @aivarannamaa, @serhiy-storchaka, @ilevkivskyi, @miss-islington
PRs	bpo-33416: Add end positions to Python AST #11605 bpo-33416: Add end positions to Python AST #11605 bpo-33416: Add end positions to Python AST #11605 bpo-35224: PEP 572 Implementation #10497 bpo-33416: Document changes in PyNode_AddChild and PyParser_AddToken #14214 [3.8] bpo-33416: Document changes in PyNode_AddChild and PyParser_AddToken (GH-14214) #14215

^{Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.}

Show more details

GitHub fields:

assignee = None
closed_at = <Date 2019-06-19.00:34:46.255>
created_at = <Date 2018-05-03.10:15:25.514>
labels = ['interpreter-core', 'type-feature', '3.8']
title = 'Add endline and endcolumn to every AST node'
updated_at = <Date 2019-06-19.00:34:46.254>
user = 'https://github.com/ilevkivskyi'

bugs.python.org fields:

activity = <Date 2019-06-19.00:34:46.254>
actor = 'levkivskyi'
assignee = 'none'
closed = True
closed_date = <Date 2019-06-19.00:34:46.255>
closer = 'levkivskyi'
components = ['Interpreter Core']
creation = <Date 2018-05-03.10:15:25.514>
creator = 'levkivskyi'
dependencies = []
files = []
hgrepos = []
issue_num = 33416
keywords = ['patch', 'patch', 'patch']
message_count = 15.0
messages = ['316117', '330180', '330207', '330287', '333575', '333583', '333587', '333608', '333617', '334205', '334228', '334300', '344507', '344519', '346015']
nosy_count = 9.0
nosy_names = ['gvanrossum', 'brett.cannon', 'rhettinger', 'vstinner', 'lukasz.langa', 'Aivar.Annamaa', 'serhiy.storchaka', 'levkivskyi', 'miss-islington']
pr_nums = ['11605', '11605', '11605', '10497', '14214', '14215']
priority = 'normal'
resolution = 'fixed'
stage = 'resolved'
status = 'closed'
superseder = None
type = 'enhancement'
url = 'https://bugs.python.org/issue33416'
versions = ['Python 3.8']

ilevkivskyi · 2018-05-03T10:15:25Z

Some Python tools (in particular I am interested in type checkers) will benefit from knowing where a given expression ends to indicate/highlight location of an error in the source code. Other tools and IDEs may have also some other benefits. Currently such tools need to use some hacks and/or custom parser to find the end line and end column of an expression, while it would be more straightforward to just add this information to every AST node by CPythons own parser.

This will increase memory usage, but expectation is that this effect will be minor.

What do you think?

serhiy-storchaka · 2018-11-21T08:13:35Z

I like this. This can help to generate more meaningful error messages at compile and run time.

serhiy-storchaka · 2018-11-21T15:26:15Z

This may be a duplicate of bpo-22616.

See also bpo-16795, bpo-21295, bpo-33211, bpo-34876.

ilevkivskyi · 2018-11-22T22:38:54Z

bpo-22616 is a bit different (proposing line/column ranges instead of just endline/endcolumn). I am happy to close this one in favor of bpo-22616 if we agree that we will go with endline/endcolumn.

I can't guarantee, but likely I will work on this during winter holidays, unless there are some objections, or unless someone will do this before :-)

ilevkivskyi · 2019-01-13T23:44:06Z

FYI, I started working on this. I will have PR ready end of next week.

Serhiy, I don't think we should keep both this and bpo-22616 open. Which one would you prefer to close?

rhettinger · 2019-01-14T05:33:17Z

This idea seems reasonable. Most of the AST nodes have a span and it would be nice to know what that is. I'm sure we will find use cases though I doubt that many compiler syntax errors would benefit (since a syntax error means that we don't have a completely matched grammar rule).

BTW, does this information have to be added by the parser or could there be am AST module function that deduces the end locations from the start location of next sibling node or from the parent node? If so, it may still be possible get the benefit without slowing down or complicating the parser logic.

Do we know what other languages do (carry the full span info in the AST or deduce the span after the fact)? I don't know what the best practices are in this regard.

One other thought: We should be careful not to impair existing AST capabilities that support code rewriting and movement (i.e. refactoring tools). The copy_location() and fix_missing_locations() functions would need to be updated as well.

serhiy-storchaka · 2019-01-14T07:18:02Z

It is up to you Ivan.

The end location can not be deduces from the start location of next sibling node or from the parent node. For example, the AST for the expression "foo.bar.baz" does not contain information for the end location of "foo.bar".

ilevkivskyi · 2019-01-14T11:18:39Z

I'm sure we will find use cases though I doubt that many compiler syntax errors would benefit (since a syntax error means that we don't have a completely matched grammar rule).

This is mostly useful for code analysis tools and IDEs.

BTW, does this information have to be added by the parser or could there be am AST module function that deduces the end locations from the start location of next sibling node or from the parent node?

There may be some other ways to do it, but currently I am adding both end_lineno and end_col_offset to AST, and n_end_lineno and n_end_col_offset in CST. The latter are needed to easily take care of situations like a + (b ), and other corner cases. I could imagine it is possible to live with only end position information in AST to save some memory, but then the code will be much harder to maintain. We can discuss details in the PR.

Do we know what other languages do (carry the full span info in the AST or deduce the span after the fact)?

I didn't do real research, but quick browsing shows carrying the end position info is quite common. But more importantly as Serhiy noted, deducing might be just not possible in some situations.

It is up to you Ivan.

OK, I will close the other one as superseded by this issue.

aivarannamaa · 2019-01-14T13:06:40Z

I strongly support this feature, because my IDE (https://thonny.org) needs to highlight AST nodes in the source code.

There would be many interested parties if you count the stars of this project: https://github.com/gristlabs/asttokens

ilevkivskyi · 2019-01-22T11:18:27Z

New changeset 9932a22 by Ivan Levkivskyi in branch 'master':
bpo-33416: Add end positions to Python AST (GH-11605)
9932a22

ilevkivskyi · 2019-01-22T18:54:21Z

Buildbots are green, this can be now closed.

ilevkivskyi · 2019-01-24T13:33:56Z

Re-opening to track renaming of potentially public PyNode_AddChild() and PyParser_AddToken().

ilevkivskyi · 2019-06-04T00:33:12Z

So what was the conclusion about PyCode_New()? Situation is quite similar here (except that these functions are used less often). Should we just document the changes in What's New?

gvanrossum · 2019-06-04T02:38:23Z

It seems we're keeping the new PyCode_New() signature, so we can do the same here. Just document it; a versionchanged (if there isn't one yet) and a mention in What's New sound appropriate.

miss-islington · 2019-06-19T00:33:38Z

New changeset 45da743 by Miss Islington (bot) in branch '3.8':
[3.8] bpo-33416: Document changes in PyNode_AddChild and PyParser_AddToken (GH-14214) (GH-14215)
45da743

ilevkivskyi added 3.8 only security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) type-feature A feature request or enhancement labels May 3, 2018

ilevkivskyi closed this as completed Jan 22, 2019

ilevkivskyi reopened this Jan 24, 2019

ilevkivskyi closed this as completed Jun 19, 2019

ezio-melotti transferred this issue from another repository Apr 10, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add endline and endcolumn to every AST node #77597

Add endline and endcolumn to every AST node #77597

ilevkivskyi commented May 3, 2018

ilevkivskyi commented May 3, 2018

serhiy-storchaka commented Nov 21, 2018

serhiy-storchaka commented Nov 21, 2018

ilevkivskyi commented Nov 22, 2018

ilevkivskyi commented Jan 13, 2019

rhettinger commented Jan 14, 2019

serhiy-storchaka commented Jan 14, 2019

ilevkivskyi commented Jan 14, 2019

aivarannamaa mannequin commented Jan 14, 2019

ilevkivskyi commented Jan 22, 2019

ilevkivskyi commented Jan 22, 2019

ilevkivskyi commented Jan 24, 2019

ilevkivskyi commented Jun 4, 2019

gvanrossum commented Jun 4, 2019

miss-islington commented Jun 19, 2019

Add endline and endcolumn to every AST node #77597

Add endline and endcolumn to every AST node #77597

Comments

ilevkivskyi commented May 3, 2018

ilevkivskyi commented May 3, 2018

serhiy-storchaka commented Nov 21, 2018

serhiy-storchaka commented Nov 21, 2018

ilevkivskyi commented Nov 22, 2018

ilevkivskyi commented Jan 13, 2019

rhettinger commented Jan 14, 2019

serhiy-storchaka commented Jan 14, 2019

ilevkivskyi commented Jan 14, 2019

aivarannamaa mannequin commented Jan 14, 2019

ilevkivskyi commented Jan 22, 2019

ilevkivskyi commented Jan 22, 2019

ilevkivskyi commented Jan 24, 2019

ilevkivskyi commented Jun 4, 2019

gvanrossum commented Jun 4, 2019

miss-islington commented Jun 19, 2019