classification
Title: Python 3.4 gives wrong col_offset for Call nodes returned from ast.parse
Type: behavior
Stage: resolved
Components: Interpreter Core
Versions: Python 3.4
process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To:
Nosy List: Aivar.Annamaa, Mark.Shannon, benjamin.peterson, brett.cannon, flox, georg.brandl, ncoghlan, python-dev, rnovacek, scummos
Priority: normal
Keywords: 3.4regression

Created on 2014-04-18 10:49 by Aivar.Annamaa, last changed 2015-10-06 10:56 by Aivar.Annamaa. This issue is now closed.

Files
File name Uploaded Description
py34_ast_call_bug.py Aivar.Annamaa, 2014-04-18 10:49 Small demonstration of the bug
Messages (21)
msg216777 - (view) Author: Aivar Annamaa (Aivar.Annamaa) * Date: 2014-04-18 10:49
The following program gives the correct result in Python versions older than 3.4, but an incorrect result in 3.4:

----------------------
import ast
tree = ast.parse("sin(0.5)")
first_stmt = tree.body[0]
call = first_stmt.value
print("col_offset of call expression:", call.col_offset)
print("col_offset of func of the call:", call.func.col_offset)
-----------------------

it should print:
col_offset of call expression: 0
col_offset of func of the call: 0

but in 3.4 it prints:
col_offset of call expression: 3
col_offset of func of the call: 0
msg216778 - (view) Author: Aivar Annamaa (Aivar.Annamaa) * Date: 2014-04-18 10:58
... also, lineno is wrong for both Call and call's func, when func and arguments are on different lines:

import ast
tree = ast.parse("(sin\n(0.5))")
first_stmt = tree.body[0]
call = first_stmt.value
print("col_offset of call expression:", call.col_offset)
print("col_offset of func of the call:", call.func.col_offset)
print("lineno of call expression:", call.lineno)
print("lineno of func of the call:", call.func.lineno)

# lineno-s should be 1 for both call and func
msg216821 - (view) Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2014-04-19 00:38
I suspect this was an intentional result of #16795.
msg216846 - (view) Author: Aivar Annamaa (Aivar.Annamaa) * Date: 2014-04-19 06:14
Regarding #16795, the documentation says "The lineno is the line number of source text and the col_offset is the UTF-8 byte offset of the first token that generated the node", not that lineno and col_offset indicate a suitable position to mention in the error messages related to this node.

IMO lineno and col_offset should stay as predictable means for finding the (beginning of) source text of the node. In error reporting code one could inspect the situation and compute locations suitable for this.

Alternatively, these attributes could be left for the purposes mentioned in #16795, and parser developers could introduce new attributes in ast nodes which indicate both the start and end positions of the corresponding source. (Hopefully this would also resolve #18374 and #16806.)
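
For comparison, later Python releases (3.8 and newer, to the best of my knowledge) did add end positions to AST nodes as `end_lineno` and `end_col_offset`, while leaving `lineno` and `col_offset` as the start of the node. A minimal sketch, assuming such an interpreter; the variable names are only illustrative:

```python
import ast

source = "sin(0.5)"
call = ast.parse(source).body[0].value

# Start of the Call node (available in all versions discussed here).
start = (call.lineno, call.col_offset)
# End of the Call node; these two attributes assume Python 3.8+.
end = (call.end_lineno, call.end_col_offset)
print("call spans", start, "to", end)

# ast.get_source_segment (also assumed 3.8+) slices the source text
# using the four position attributes above.
segment = ast.get_source_segment(source, call)
print(segment)
```

With both ends available, a tool can recover the exact source text of a node without guessing from its children.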
msg221360 - (view) Author: Aivar Annamaa (Aivar.Annamaa) * Date: 2014-06-23 14:34
Just found out that ast.Attribute in Python 3.4 has a similar problem.
msg235245 - (view) Author: Mark Shannon (Mark.Shannon) * Date: 2015-02-02 12:21
This is caused by https://hg.python.org/cpython/rev/7c5c678e4164/
which is a supposed fix for http://bugs.python.org/issue16795
which claims to make "some changes to AST to make it more useful for static language analysis", seemingly by breaking all existing static analysis tools.

Could we just revert https://hg.python.org/cpython/rev/7c5c678e4164/ ?
msg235246 - (view) Author: Mark Shannon (Mark.Shannon) * Date: 2015-02-02 12:23
It is now very hard to determine accurate locations for an expression such as (x+y).attr, as the column offset of the leftmost subexpression of the expression is not the same as the column offset of the location.
msg235261 - (view) Author: Mark Shannon (Mark.Shannon) * Date: 2015-02-02 14:41
This also breaks the col_offset for subscripts like x[y] and, of course, any statement with one of these expressions as its leftmost sub-expression.
msg235266 - (view) Author: Roundup Robot (python-dev) Date: 2015-02-02 15:53
New changeset 7d1c32ddc432 by Benjamin Peterson in branch '3.4':
revert lineno and col_offset changes from #16795 (closes #21295)
https://hg.python.org/cpython/rev/7d1c32ddc432

New changeset 8ab6b404248c by Benjamin Peterson in branch 'default':
merge 3.4 (#21295)
https://hg.python.org/cpython/rev/8ab6b404248c
msg237577 - (view) Author: Sven Brauch (scummos) * Date: 2015-03-08 22:28
Why did you not CC me in this discussion? It is not very nice to have this behaviour changed back from what I relied upon in a minor version without notice.

Which regression was effectively caused by this patch, except for the documentation being out of date?
msg237581 - (view) Author: Mark Shannon (Mark.Shannon) * Date: 2015-03-08 22:44
You are on the nosy list. You should have been sent an email.

This bug is the regression.
https://hg.python.org/cpython/rev/7c5c678e4164/ resulted in incorrect column offsets for many compound expressions.
msg237585 - (view) Author: Sven Brauch (scummos) * Date: 2015-03-09 00:39
Hmm, strange, I did not receive any emails.

"Incorrect" by what definition of incorrect? The word does not really help to clarify the issue you see with this change, since the behaviour was changed on purpose. What is the (preferably real-world) application which is broken by this change?
msg237670 - (view) Author: Mark Shannon (Mark.Shannon) * Date: 2015-03-09 16:02
The column offset has always been the offset of the start of the expression. Therefore the expression `x.y` should have the same offset as the sub-expression `x`.
Likewise for calls, `f(args)` should have the same offset as the `f` sub-expression.
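
The rule stated above can be checked with a short script. This is only a sketch: `start_offsets` is a hypothetical helper, and the result described in the note below assumes an interpreter with the reverted (pre-3.4.0 style) behaviour.

```python
import ast

def start_offsets(source):
    """Return (outer, leftmost) col_offsets for a one-line expression.

    `outer` is the offset of the whole expression node; `leftmost` is the
    offset of its first sub-expression (`func` for a Call, `value` for an
    Attribute or Subscript).
    """
    expr = ast.parse(source).body[0].value
    inner = expr.func if isinstance(expr, ast.Call) else expr.value
    return expr.col_offset, inner.col_offset

for src in ("x.y", "f(args)", "x[y]"):
    outer, leftmost = start_offsets(src)
    print(src, "outer:", outer, "leftmost:", leftmost)
```

On an interpreter that follows the rule, both numbers are equal for every line printed.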

Our static analysis tool is a real-world use case:
http://semmle.com/2014/06/semmle-analysis-now-includes-python/

Presumably the submitter of this issue also had a real world use case.
msg237671 - (view) Author: Aivar Annamaa (Aivar.Annamaa) * Date: 2015-03-09 16:09
Yes, I also need col_offset to work as advertised because of a real world use case: Thonny (http://thonny.cs.ut.ee/) is a visual Python debugger which highlights the (sub)expression about to be evaluated.
msg237672 - (view) Author: Sven Brauch (scummos) * Date: 2015-03-09 16:15
But if you need the start of the full expression, can't you just go up in the "parent" chain until the parent is not an expression any more?

Could an additional API be introduced which provides the value I am looking for as well as the one you need?

I was not on the nosy list, by the way; I just put myself there after I commented. And that was after 3.4.3, after I noticed my software was suddenly broken by a patch release of Python.
msg237675 - (view) Author: Mark Shannon (Mark.Shannon) * Date: 2015-03-09 16:44
How do I get the start of `(x+y).bit_length()` in 
`total += (x+y).bit_length()`?
With your change, I can't get it from `x`, `x+y`, or from the whole statement.
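
For illustration, here is a small sketch that prints the start columns of the nested nodes in this statement. It assumes an interpreter where the revert is in place, so the start of `(x+y).bit_length()` can be read directly from the Call node; the variable names are only illustrative.

```python
import ast

source = "total += (x+y).bit_length()"
stmt = ast.parse(source).body[0]   # the AugAssign statement

call = stmt.value                  # (x+y).bit_length()
attr = call.func                   # (x+y).bit_length
binop = attr.value                 # x+y
left = binop.left                  # x

# Each node reports the column where its own source text starts.
for label, node in [("statement", stmt), ("call", call),
                    ("attribute", attr), ("binop", binop), ("x", left)]:
    print(label, node.col_offset)
```

Note that the call and the attribute start at the opening parenthesis, while `x+y` and `x` start one column later, so neither `x` nor `x+y` alone identifies the start of the call.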

The primary purpose of the locations is tracebacks, not static tools.
Also, most tools need to support earlier versions of Python and consistency between versions is the most important thing.

A third-party parser that supported full, accurate locations would be great, but I don't think the builtin parser is the place for it.
msg251522 - (view) Author: Radek Novacek (rnovacek) Date: 2015-09-24 13:35
I've run the tests from the first and second comments using Python 3.5.0 and it seems to produce correct results:

>>> import ast
>>> tree = ast.parse("sin(0.5)")
>>> first_stmt = tree.body[0]
>>> call = first_stmt.value
>>> print("col_offset of call expression:", call.col_offset)
col_offset of call expression: 0
>>> print("col_offset of func of the call:", call.func.col_offset)
col_offset of func of the call: 0

>>> tree = ast.parse("(sin\n(0.5))")
>>> first_stmt = tree.body[0]
>>> call = first_stmt.value
>>> print("col_offset of call expression:", call.col_offset)
col_offset of call expression: 1
>>> print("col_offset of func of the call:", call.func.col_offset)
col_offset of func of the call: 1
>>> print("lineno of call expression:", call.lineno)
lineno of call expression: 1
>>> print("lineno of func of the call:", call.lineno)
lineno of func of the call: 1
msg252380 - (view) Author: Radek Novacek (rnovacek) Date: 2015-10-06 08:42
There is still a problem with col_offset in some situations; for example, the col_offset of the ast.Attribute should be 4 but is 0 instead:

>>> for x in ast.walk(ast.parse('foo.bar')):
...     if hasattr(x, 'col_offset'):
...         print("%s: %d" % (x, x.col_offset))
... 
<_ast.Expr object at 0x7fcdc84722b0>: 0
<_ast.Attribute object at 0x7fcdc84723c8>: 0
<_ast.Name object at 0x7fcdc8472438>: 0

Is there any solution to this problem? It causes problems in python support in KDevelop (kdev-python).
msg252381 - (view) Author: Aivar Annamaa (Aivar.Annamaa) * Date: 2015-10-06 09:00
Radek, the source corresponding to Attribute node does start at col 0 in your example
msg252384 - (view) Author: Radek Novacek (rnovacek) Date: 2015-10-06 10:37
Aivar, I have to admit that my knowledge of this is limited, but as I understand it, the attribute is "bar" in the "foo.bar" expression.

I can get the beginning of the expression with
>>> ast.parse('foo.bar').body[0].value.value.col_offset
0

But how can I get the position of 'bar'? My guess is this:
>>> ast.parse('foo.bar').body[0].value.col_offset
but it still returns 0.

Why do these two col_offsets return the same value? How can I get the position of 'bar' in 'foo.bar'?
msg252386 - (view) Author: Aivar Annamaa (Aivar.Annamaa) * Date: 2015-10-06 10:56
An ast.Attribute node actually means "the attribute of something", i.e. the node includes this "something" as a subnode.

> How can I get the position of 'bar' in 'foo.bar'?

I don't know a good way for this, because bar is not an AST node for Python. If Python AST nodes included the information about where a node ends in source, I would take the ending col of node.value (foo in your example) and add 2.

In my own program (http://thonny.cs.ut.ee, it's a Python IDE for beginners) I'm using a really contrived algorithm for determining the end positions of nodes. See function mark_text_ranges here: https://bitbucket.org/plas/thonny/src/b8860704c99d47760ffacfaa335d2f8772721ba4/thonny/ast_utils.py?at=master&fileviewer=file-view-default

I'm not happy with my solution, but I don't know any other ways.
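
On interpreters that expose end positions (`end_col_offset`, which I believe exists from Python 3.8 onward), the position of `bar` can be estimated from the end of the Attribute node instead of the end of `node.value`. A minimal sketch under that assumption; `attr_name_col` is a hypothetical helper, and it relies on the Attribute node ending exactly where the attribute name ends:

```python
import ast

def attr_name_col(node):
    """Best-effort UTF-8 byte column where the attribute name starts.

    Assumes Python 3.8+ (for end_col_offset) and that the last token of
    an Attribute node is the attribute name itself.
    """
    return node.end_col_offset - len(node.attr.encode("utf-8"))

attribute = ast.parse("foo.bar").body[0].value
print(attr_name_col(attribute))

# Whitespace around the dot does not matter, because the computation
# works backwards from the end of the node.
spaced = ast.parse("foo . bar").body[0].value
print(attr_name_col(spaced))
```

Working backwards from the node's end avoids having to skip over the dot and any whitespace after `node.value`. Identifiers that change under the parser's normalization may not measure correctly, so treat the result as an estimate for such names.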
History
Date User Action Args
2015-10-06 10:56:58 Aivar.Annamaa set messages: + msg252386
2015-10-06 10:37:16 rnovacek set messages: + msg252384
2015-10-06 09:00:15 Aivar.Annamaa set messages: + msg252381
2015-10-06 08:42:22 rnovacek set messages: + msg252380
2015-09-24 13:35:33 rnovacek set nosy: + rnovacek; messages: + msg251522
2015-03-09 16:44:39 Mark.Shannon set messages: + msg237675
2015-03-09 16:15:55 scummos set messages: + msg237672
2015-03-09 16:09:33 Aivar.Annamaa set messages: + msg237671
2015-03-09 16:02:08 Mark.Shannon set messages: + msg237670
2015-03-09 00:39:26 scummos set messages: + msg237585
2015-03-08 22:44:49 Mark.Shannon set messages: + msg237581
2015-03-08 22:28:17 scummos set nosy: + scummos; messages: + msg237577
2015-02-02 15:53:31 python-dev set status: open -> closed; nosy: + python-dev; messages: + msg235266; resolution: fixed; stage: resolved
2015-02-02 14:41:41 Mark.Shannon set messages: + msg235261
2015-02-02 12:23:40 Mark.Shannon set messages: + msg235246
2015-02-02 12:21:32 Mark.Shannon set nosy: + Mark.Shannon; messages: + msg235245
2014-06-23 14:34:36 Aivar.Annamaa set messages: + msg221360
2014-04-19 06:14:02 Aivar.Annamaa set messages: + msg216846
2014-04-19 00:38:25 benjamin.peterson set messages: + msg216821
2014-04-19 00:30:17 terry.reedy set nosy: + brett.cannon, georg.brandl, ncoghlan, benjamin.peterson
2014-04-18 20:18:46 flox set keywords: + 3.4regression; nosy: + flox; type: behavior
2014-04-18 10:58:03 Aivar.Annamaa set messages: + msg216778
2014-04-18 10:49:40 Aivar.Annamaa create