classification
Title: Add endline and endcolumn to every AST node
Type: enhancement Stage: resolved
Components: Interpreter Core Versions: Python 3.8
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: Aivar.Annamaa, brett.cannon, gvanrossum, levkivskyi, lukasz.langa, miss-islington, rhettinger, serhiy.storchaka, vstinner
Priority: normal Keywords: patch, patch, patch

Created on 2018-05-03 10:15 by levkivskyi, last changed 2019-06-19 00:34 by levkivskyi. This issue is now closed.

Pull Requests
URL Status Linked Edit
PR 11605 merged levkivskyi, 2019-01-18 09:46
PR 11605 merged levkivskyi, 2019-01-18 09:46
PR 11605 merged levkivskyi, 2019-01-18 09:46
PR 10497 emilyemorehouse, 2019-01-22 22:56
PR 14214 merged levkivskyi, 2019-06-19 00:04
PR 14215 merged miss-islington, 2019-06-19 00:19
Messages (15)
msg316117 - (view) Author: Ivan Levkivskyi (levkivskyi) * (Python committer) Date: 2018-05-03 10:15
Some Python tools (in particular I am interested in type checkers) will benefit from knowing where a given expression ends to indicate/highlight location of an error in the source code. Other tools and IDEs may have also some other benefits. Currently such tools need to use some hacks and/or custom parser to find the end line and end column of an expression, while it would be more straightforward to just add this information to every AST node by CPythons own parser.

This will increase memory usage, but expectation is that this effect will be minor.

What do you think?
msg330180 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2018-11-21 08:13
I like this. This can help to generate more meaningful error messages at compile and run time.
msg330207 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2018-11-21 15:26
This may be a duplicate of issue22616.

See also issue16795, issue21295, issue33211, issue34876.
msg330287 - (view) Author: Ivan Levkivskyi (levkivskyi) * (Python committer) Date: 2018-11-22 22:38
issue22616 is a bit different (proposing line/column ranges instead of just endline/endcolumn). I am happy to close this one in favor of issue22616 if we agree that we will go with endline/endcolumn.

I can't guarantee, but likely I will work on this during winter holidays, unless there are some objections, or unless someone will do this before :-)
msg333575 - (view) Author: Ivan Levkivskyi (levkivskyi) * (Python committer) Date: 2019-01-13 23:44
FYI, I started working on this. I will have PR ready end of next week.

Serhiy, I don't think we should keep both this and issue22616 open. Which one would you prefer to close?
msg333583 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2019-01-14 05:33
This idea seems reasonable.  Most of the AST nodes have a span and it would be nice to know what that is.  I'm sure we will find use cases though I doubt that many compiler syntax errors would benefit (since a syntax error means that we don't have a completely matched grammar rule).

BTW, does this information have to be added by the parser or could there be am AST module function that deduces the end locations from the start location of next sibling node or from the parent node?  If so, it may still be possible get the benefit without slowing down or complicating the parser logic.

Do we know what other languages do (carry the full span info in the AST or deduce the span after the fact)?  I don't know what the best practices are in this regard.

One other thought:  We should be careful not to impair existing AST capabilities that support code rewriting and movement (i.e. refactoring tools).  The copy_location() and fix_missing_locations() functions would need to be updated as well.
msg333587 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2019-01-14 07:18
It is up to you Ivan.

The end location can not be deduces from the start location of next sibling node or from the parent node. For example, the AST for the expression "foo.bar.baz" does not contain information for the end location of "foo.bar".
msg333608 - (view) Author: Ivan Levkivskyi (levkivskyi) * (Python committer) Date: 2019-01-14 11:18
> I'm sure we will find use cases though I doubt that many compiler syntax errors would benefit (since a syntax error means that we don't have a completely matched grammar rule).

This is mostly useful for code analysis tools and IDEs.

> BTW, does this information have to be added by the parser or could there be am AST module function that deduces the end locations from the start location of next sibling node or from the parent node?

There may be some other ways to do it, but currently I am adding both `end_lineno` and `end_col_offset` to AST, and `n_end_lineno` and `n_end_col_offset` in CST. The latter are needed to easily take care of situations like `a + (b )`, and other corner cases. I could imagine it is possible to live with only end position information in AST to save some memory, but then the code will be much harder to maintain. We can discuss details in the PR.

> Do we know what other languages do (carry the full span info in the AST or deduce the span after the fact)?

I didn't do real research, but quick browsing shows carrying the end position info is quite common. But more importantly as Serhiy noted, deducing might be just not possible in some situations.

> It is up to you Ivan.

OK, I will close the other one as superseded by this issue.
msg333617 - (view) Author: Aivar Annamaa (Aivar.Annamaa) * Date: 2019-01-14 13:06
I strongly support this feature, because my IDE (https://thonny.org) needs to highlight AST nodes in the source code.

There would be many interested parties if you count the stars of this project: https://github.com/gristlabs/asttokens
msg334205 - (view) Author: Ivan Levkivskyi (levkivskyi) * (Python committer) Date: 2019-01-22 11:18
New changeset 9932a22897ef9905161dac7476e6976370e13515 by Ivan Levkivskyi in branch 'master':
bpo-33416: Add end positions to Python AST (GH-11605)
https://github.com/python/cpython/commit/9932a22897ef9905161dac7476e6976370e13515
msg334228 - (view) Author: Ivan Levkivskyi (levkivskyi) * (Python committer) Date: 2019-01-22 18:54
Buildbots are green, this can be now closed.
msg334300 - (view) Author: Ivan Levkivskyi (levkivskyi) * (Python committer) Date: 2019-01-24 13:33
Re-opening to track renaming of potentially public PyNode_AddChild() and PyParser_AddToken().
msg344507 - (view) Author: Ivan Levkivskyi (levkivskyi) * (Python committer) Date: 2019-06-04 00:33
So what was the conclusion about PyCode_New()? Situation is quite similar here (except that these functions are used less often). Should we just document the changes in What's New?
msg344519 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2019-06-04 02:38
It seems we're keeping the new PyCode_New() signature, so we can do the same here. Just document it; a versionchanged (if there isn't one yet) and a mention in What's New sound appropriate.
msg346015 - (view) Author: miss-islington (miss-islington) Date: 2019-06-19 00:33
New changeset 45da7437f54b4a6bdb8b4ba5a0f13f44a24eec39 by Miss Islington (bot) in branch '3.8':
[3.8] bpo-33416:  Document changes in PyNode_AddChild and PyParser_AddToken (GH-14214) (GH-14215)
https://github.com/python/cpython/commit/45da7437f54b4a6bdb8b4ba5a0f13f44a24eec39
History
Date User Action Args
2019-06-19 00:34:46levkivskyisetkeywords: patch, patch, patch
status: open -> closed
resolution: fixed
stage: patch review -> resolved
2019-06-19 00:33:37miss-islingtonsetnosy: + miss-islington
messages: + msg346015
2019-06-19 00:19:57miss-islingtonsetpull_requests: + pull_request14054
2019-06-19 00:04:46levkivskyisetpull_requests: + pull_request14052
2019-06-04 02:38:22gvanrossumsetkeywords: patch, patch, patch

messages: + msg344519
2019-06-04 00:33:12levkivskyisetkeywords: patch, patch, patch

messages: + msg344507
2019-01-24 13:33:56levkivskyisetstatus: closed -> open


keywords: patch, patch, patch
nosy: + vstinner
messages: + msg334300
resolution: fixed -> (no value)
stage: resolved -> patch review
2019-01-22 22:56:19emilyemorehousesetpull_requests: + pull_request11441
2019-01-22 18:54:21levkivskyisetstatus: open -> closed
messages: + msg334228

keywords: patch, patch, patch
resolution: fixed
stage: patch review -> resolved
2019-01-22 11:18:26levkivskyisetmessages: + msg334205
2019-01-18 09:50:20vstinnersetkeywords: patch, patch, patch
nosy: - vstinner
2019-01-18 09:47:02levkivskyisetkeywords: + patch
stage: patch review
pull_requests: + pull_request11331
2019-01-18 09:46:44levkivskyisetkeywords: + patch
stage: (no value)
pull_requests: + pull_request11330
2019-01-18 09:46:27levkivskyisetkeywords: + patch
stage: (no value)
pull_requests: + pull_request11329
2019-01-14 13:06:40Aivar.Annamaasetnosy: + Aivar.Annamaa
messages: + msg333617
2019-01-14 11:24:42levkivskyilinkissue22616 superseder
2019-01-14 11:18:39levkivskyisetmessages: + msg333608
2019-01-14 07:18:01serhiy.storchakasetmessages: + msg333587
2019-01-14 05:33:17rhettingersetnosy: + rhettinger
messages: + msg333583
2019-01-13 23:44:05levkivskyisetmessages: + msg333575
2018-11-22 22:38:53levkivskyisetmessages: + msg330287
2018-11-21 15:26:14serhiy.storchakasetmessages: + msg330207
2018-11-21 08:44:56rhettingersetnosy: + brett.cannon
2018-11-21 08:13:34serhiy.storchakasetmessages: + msg330180
2018-05-03 10:15:25levkivskyicreate