Issue 10769: ast: provide more useful range information

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/54978

classification

Title:	ast: provide more useful range information
Type:	enhancement	Stage:	resolved
Components:	Library (Lib)	Versions:	Python 3.3

process

Status:	closed	Resolution:	fixed
Dependencies:		Superseder:
Assigned To:		Nosy List:	benjamin.peterson, ethan.furman, georg.brandl, kensington, r.david.murray, rhettinger, scummos, terry.reedy
Priority:	normal	Keywords:	patch

Created on 2010-12-24 14:07 by scummos, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name	Uploaded	Description	Edit
fix-attr-ranges.patch	kensington, 2012-04-19 16:57	Patch to change attribute offset behaviour

Messages (24)
msg124596 - (view)	Author: Sven Brauch (scummos) *	Date: 2010-12-24 14:07
Hi, I'm writing a python language support plugin for an IDE. I'm using the AST module to get information about the loaded source code, which works pretty well. However, in some cases, the information provided by the AST is simply not sufficient to do proper highlighting (or whatever); for example, class_instance.method(argument1.arg_attribute).attribute class_instance.method( argument1. arg_attribute) . attribute and class_instance.method( argument1. \ arg_attribute) \ . attribute produce exactly the same syntax tree, making it impossible to determine where e.g. "attribute" starts and ends. Although technically obviously correct, the information that the column offset for "attribute" is "0" is quite pointless. It would really be great if there could be an additional attribute for Attribute Access nodes which tells me where the attribute access the node refers to really takes place, not just "column 0". :) It would be even better if there was not only an offset, but also an "end" attribute providing information on where the node ends; that'd make this module way more useful, at least for me. Thanks and best regards, Sven
msg124601 - (view)	Author: Raymond Hettinger (rhettinger) *	Date: 2010-12-24 19:02
ISTM the whole point of an Abstract Syntax Tree is to express semantics while throwing away the syntax details. The only reason any position information is kept is to support tracebacks and debugging. Perhaps the OP's request should changed to "add function to concrete parse trees to show structure while preserving all the input detail returned by tokenize."
msg124602 - (view)	Author: Benjamin Peterson (benjamin.peterson) *	Date: 2010-12-24 19:11
2010/12/24 Raymond Hettinger <report@bugs.python.org>: > > Raymond Hettinger <rhettinger@users.sourceforge.net> added the comment: > > ISTM the whole point of an Abstract Syntax Tree is to express semantics while throwing away the syntax details. The only reason any position information is kept is to support tracebacks and debugging. I agree for the most part. It's also nearly impossible to preserve all the detail one would want in an abstract syntax tree. > > Perhaps the OP's request should changed to "add function to concrete parse trees to show structure while preserving all the input detail returned by tokenize." If someone can comment on what that would look like...
msg124603 - (view)	Author: Raymond Hettinger (rhettinger) *	Date: 2010-12-24 19:54
If info fields are added, they need to be optional so that someone manipulating the tree (adding, rearranging, or removing nodes) doesn't have an additional burden of supplying this info before compiling into an executable code object.
msg124609 - (view)	Author: Sven Brauch (scummos) *	Date: 2010-12-24 20:59
Hi, well, but you have to agree that there is no point in setting a correct column offset for bar in foo[bar] (it's correctly set to 4 here!) and foo(bar) but not for foo.bar (there's no information provided here) For me, this looks like it was just done the easiest way -- which is okay, of course, but is not a valid reason not to change it now. It's also not that there's fifty things missing to make it fully functional for the purpose of highlighting / analyzing the code, this is about the only one. :) Best regards, Sven
msg124619 - (view)	Author: Terry J. Reedy (terry.reedy) *	Date: 2010-12-24 23:06
A request limited only to fixing the current field for attribute may get more traction than a request for a new field. Can you dig into to code to get any idea why the difference between attributes versus indexes and parameters?
msg124620 - (view)	Author: Sven Brauch (scummos) *	Date: 2010-12-24 23:17
Hi Terry, well, the current behaviour is... logical in some way, as it says "the whole expression which accesses an attribute starts at column 0", i.e. it's easy to understand why it's done like this. It just turns out that this is pretty useless... I'll try to find out what changes would be needed in the code to make the col_offset attribute useful in this case. Best regards, Sven
msg124623 - (view)	Author: Sven Brauch (scummos) *	Date: 2010-12-25 00:34
Hi, I found the reason for this behavior in the code now, it's in Python/ast.c, lines 1745 and 1746 in ast_for_power(): tmp->lineno = e->lineno; tmp->col_offset = e->col_offset; Here, the range information for the individual attributes (which is correctly set before!) is being discarded and replaced by the useless information from the expression ast. I don't see any reason for this, is there one? :) Removing those two lines doesn't seem to break anything and sets ranges correctly. Best regards, Sven
msg124624 - (view)	Author: Benjamin Peterson (benjamin.peterson) *	Date: 2010-12-25 00:51
2010/12/24 Sven Brauch <report@bugs.python.org>: > > Sven Brauch <svenbrauch@googlemail.com> added the comment: > > Hi, > > I found the reason for this behavior in the code now, it's in Python/ast.c, lines 1745 and 1746 in ast_for_power(): > > tmp->lineno = e->lineno; > tmp->col_offset = e->col_offset; > > Here, the range information for the individual attributes (which is correctly set before!) is being discarded and replaced by the useless information from the expression ast. > I don't see any reason for this, is there one? :) > > Removing those two lines doesn't seem to break anything and sets ranges correctly. The ranges are correct. They just aren't what you want. The attribute starts at the beginning of the power, not the ".".
msg124633 - (view)	Author: Sven Brauch (scummos) *	Date: 2010-12-25 10:49
Hi, I agree that the current behavior is not wrong or such. It's just entirely useless. For all the other types of subscripts, such as a[b][c][d] or a(b)(c)(d), the ranges of the actual subscript ASTs are also nulled, but there's an additional Name (or whatever) AST created for b, c, and d which contains their ranges. Why isn't it like this for attributes? I also don't really see a reason why those ranges shouldn't be changed for all possible cases (subscript / call / attribute). It will contain more information while not taking any more memory or such, and it will be more intuitive. You can still get the information on where the expression starts by going up the chain to the next Expression AST node, which seems much more logical to me than storing this -- rarely useful -- information redundantly in maybe dozens of nodes. After all, the ast module is meant for "end users" to base applications on it, isn't it? Otherwise, you should at least be consistent and remove that range information from the tree completely, because there's absolutely no case in which it would ever be useful (at least I cannot think of one). Best regards, Sven
msg124636 - (view)	Author: Benjamin Peterson (benjamin.peterson) *	Date: 2010-12-25 14:29
2010/12/25 Sven Brauch <report@bugs.python.org>: > > Sven Brauch <svenbrauch@googlemail.com> added the comment: > > Hi, > > I agree that the current behavior is not wrong or such. It's just entirely useless. > For all the other types of subscripts, such as a[b][c][d] or a(b)(c)(d), the ranges of the actual subscript ASTs are also nulled, but there's an additional Name (or whatever) AST created for b, c, and d which contains their ranges. Why isn't it like this for attributes? Because those can take any Python expression. Attributes only ever take a name. > > I also don't really see a reason why those ranges shouldn't be changed for all possible cases (subscript / call / attribute). It will contain more information while not taking any more memory or such, and it will be more intuitive. You can still get the information on where the expression starts by going up the chain to the next Expression AST node, which seems much more logical to me than storing this -- rarely useful -- information redundantly in maybe dozens of nodes. After all, the ast module is meant for "end users" to base applications on it, isn't it? Expr is just a container node. It's not supposed to store extra information. > > Otherwise, you should at least be consistent and remove that range information from the tree completely, because there's absolutely no case in which it would ever be useful (at least I cannot think of one). That's not the case here: x = foo.blah
msg124637 - (view)	Author: Sven Brauch (scummos) *	Date: 2010-12-25 14:43
Well, weather it's supposed to or not, it does contain the line number information: <ExprAst lineno="1" col_offset="0"> For your example, the AST for "foo" tells you the offset for foo. If you want to know the offset (well, "offset") for blah, why not look at foo? Currently, the information is just copied from foo to blah. Anyway, I'd like to get away from this abstract discussion which is not likely to yield a result... is there any reason not to do it like I suggested other than the philosophical "it's wrong"? Or don't you agree that the information given this way would be more useful? Best regards, Sven
msg124639 - (view)	Author: Benjamin Peterson (benjamin.peterson) *	Date: 2010-12-25 15:19
2010/12/25 Sven Brauch <report@bugs.python.org>: > > Sven Brauch <svenbrauch@googlemail.com> added the comment: > > Well, weather it's supposed to or not, it does contain the line number information: > <ExprAst lineno="1" col_offset="0"> > For your example, the AST for "foo" tells you the offset for foo. If you want to know the offset (well, "offset") for blah, why not look at foo? Currently, the information is just copied from foo to blah. What if it's like this, though? x = ( foo).blah > > Anyway, I'd like to get away from this abstract discussion which is not likely to yield a result... is there any reason not to do it like I suggested other than the philosophical "it's wrong"? Or don't you agree that the information given this way would be more useful? It wouldn't be any more useful than it is now. I don't think it's reasonably possible to preserve every last obscure formatting in ast.
msg124640 - (view)	Author: Sven Brauch (scummos) *	Date: 2010-12-25 15:46
Then you'd get the point where foo starts instead of the location of the opening brace. Sounds good for me. Also, I already said that, with the change I proposed, I do not see any case where relevant information would be lost. Your argument that it would not be any more useful than now is simply invalid, because I got an application here which would work with, but not without the change, and I didn't yet see one for the other side. Best regards, Sven
msg124642 - (view)	Author: Terry J. Reedy (terry.reedy) *	Date: 2010-12-25 16:47
FWIW, I find the current behavior for attributes to be surprising, to the point where at first glance it almost looks like a bug. Which is to say, I would have expected 'col' to point to the first non-whitespace column after the '.'. If there are multiple attributes, a.b.c.d, then to me, each node should point to its appropriate place, just as with a[b][c].
msg124665 - (view)	Author: Sven Brauch (scummos) *	Date: 2010-12-26 12:46
Hi, yeah Terry, that's exactly what most people whom I talked about this said (me too). Anyway, here's the patch which -- in my opinion -- fixes this behavior: --- python-orig/Python/ast.c 2010-10-19 03:22:07.000000000 +0200 +++ python-ast-fix/Python/ast.c 2010-12-26 13:25:48.000000000 +0100 @@ -1742,8 +1742,6 @@ tmp = ast_for_trailer(c, ch, e); if (!tmp) return NULL; - tmp->lineno = e->lineno; - tmp->col_offset = e->col_offset; e = tmp; } if (TYPE(CHILD(n, NCH(n) - 1)) == factor) { The offsets for "foo.bar.baz" before the patch: [1, 0, <_ast.Attribute>] [1, 0, <_ast.Attribute>, 'baz'] [1, 0, <_ast.Name>, 'bar'] [1, 0, 'foo'] ... and after the patch: [1, 0, <_ast.Attribute>] [1, 7, <_ast.Attribute>, 'baz'] [1, 3, <_ast.Name>, 'bar'] [1, 0, 'foo'] It would really be great if that could be applied. Best regards, Sven
msg124679 - (view)	Author: Benjamin Peterson (benjamin.peterson) *	Date: 2010-12-26 20:59
I suggest you mail python-dev or python-ideas. I find it more consistent as it stands now.
msg124680 - (view)	Author: Sven Brauch (scummos) *	Date: 2010-12-26 21:17
Okay, thank you, I'm going to do that. :) Bye, Sven
msg158735 - (view)	Author: Michael (kensington)	Date: 2012-04-19 16:57
Hi, Attached is the updated patch by Sven Brauch from the original mailing list thread bringing column offset reporting for attributes in line with everything else. The offsets for bar before the patch: foo[bar] = 4 foo(bar) = 4 foo.bar = 0 After: foo[bar] = 4 foo(bar) = 4 foo.bar = 4 With the update, are there still concerns?
msg229148 - (view)	Author: R. David Murray (r.david.murray) *	Date: 2014-10-12 14:27
If there was a python-ideas or python-dev thread that resulted in consensus approval of this change, can someone post a link to it?
msg229149 - (view)	Author: Sven Brauch (scummos) *	Date: 2014-10-12 14:31
Hi, Mailing list thread: https://mail.python.org/pipermail/python-dev/2012-December/123320.html Discussion on the patch: http://bugs.python.org/issue16795 Greetings, Sven
msg229158 - (view)	Author: R. David Murray (r.david.murray) *	Date: 2014-10-12 15:06
OK, so that patch was committed. Does that mean this one can be close? (I'm not familiar with the code in question.)
msg229165 - (view)	Author: Sven Brauch (scummos) *	Date: 2014-10-12 15:16
Yes, this issue can be closed.
msg229170 - (view)	Author: R. David Murray (r.david.murray) *	Date: 2014-10-12 15:46
Thanks.

History
Date	User	Action	Args
2022-04-11 14:57:10	admin	set	github: 54978
2014-10-12 15:46:06	r.david.murray	set	messages: + msg229170 stage: needs patch -> resolved
2014-10-12 15:16:10	scummos	set	status: open -> closed resolution: fixed messages: + msg229165
2014-10-12 15:06:28	r.david.murray	set	messages: + msg229158
2014-10-12 14:31:12	scummos	set	messages: + msg229149
2014-10-12 14:27:42	r.david.murray	set	nosy: + r.david.murray messages: + msg229148
2014-04-05 18:18:06	ethan.furman	set	nosy: + ethan.furman
2012-04-19 16:57:59	kensington	set	files: + fix-attr-ranges.patch nosy: + kensington messages: + msg158735 keywords: + patch
2010-12-26 21:17:02	scummos	set	nosy: georg.brandl, rhettinger, terry.reedy, benjamin.peterson, scummos messages: + msg124680
2010-12-26 20:59:26	benjamin.peterson	set	nosy: georg.brandl, rhettinger, terry.reedy, benjamin.peterson, scummos messages: + msg124679
2010-12-26 12:46:10	scummos	set	nosy: georg.brandl, rhettinger, terry.reedy, benjamin.peterson, scummos messages: + msg124665
2010-12-25 16:47:39	terry.reedy	set	nosy: georg.brandl, rhettinger, terry.reedy, benjamin.peterson, scummos messages: + msg124642
2010-12-25 15:46:47	scummos	set	nosy: georg.brandl, rhettinger, terry.reedy, benjamin.peterson, scummos messages: + msg124640
2010-12-25 15:19:16	benjamin.peterson	set	nosy: georg.brandl, rhettinger, terry.reedy, benjamin.peterson, scummos messages: + msg124639
2010-12-25 14:43:53	scummos	set	nosy: georg.brandl, rhettinger, terry.reedy, benjamin.peterson, scummos messages: + msg124637
2010-12-25 14:29:24	benjamin.peterson	set	nosy: georg.brandl, rhettinger, terry.reedy, benjamin.peterson, scummos messages: + msg124636
2010-12-25 10:49:46	scummos	set	nosy: georg.brandl, rhettinger, terry.reedy, benjamin.peterson, scummos messages: + msg124633
2010-12-25 00:51:29	benjamin.peterson	set	nosy: georg.brandl, rhettinger, terry.reedy, benjamin.peterson, scummos messages: + msg124624
2010-12-25 00:34:25	scummos	set	nosy: georg.brandl, rhettinger, terry.reedy, benjamin.peterson, scummos messages: + msg124623
2010-12-24 23:17:38	scummos	set	nosy: georg.brandl, rhettinger, terry.reedy, benjamin.peterson, scummos messages: + msg124620
2010-12-24 23:06:40	terry.reedy	set	nosy: + terry.reedy messages: + msg124619
2010-12-24 20:59:14	scummos	set	nosy: georg.brandl, rhettinger, benjamin.peterson, scummos messages: + msg124609
2010-12-24 19:54:39	rhettinger	set	nosy: georg.brandl, rhettinger, benjamin.peterson, scummos messages: + msg124603
2010-12-24 19:11:41	benjamin.peterson	set	nosy: georg.brandl, rhettinger, benjamin.peterson, scummos messages: + msg124602
2010-12-24 19:02:51	rhettinger	set	nosy: + rhettinger messages: + msg124601
2010-12-24 15:38:42	r.david.murray	set	nosy: + benjamin.peterson stage: needs patch versions: + Python 3.3
2010-12-24 14:07:45	scummos	create