
Python 3.4 gives wrong col_offset for Call nodes returned from ast.parse #65494

Closed
aivarannamaa mannequin opened this issue Apr 18, 2014 · 21 comments
Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs) type-bug An unexpected behavior, bug, or error

Comments

@aivarannamaa
Mannequin

aivarannamaa mannequin commented Apr 18, 2014

BPO 21295
Nosy @brettcannon, @birkenfeld, @ncoghlan, @benjaminp, @florentx, @markshannon, @aivarannamaa
Files
  • py34_ast_call_bug.py: Small demonstration of the bug
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = None
    closed_at = <Date 2015-02-02.15:53:31.829>
    created_at = <Date 2014-04-18.10:49:40.566>
    labels = ['interpreter-core', 'type-bug']
    title = 'Python 3.4 gives wrong col_offset for Call nodes returned from ast.parse'
    updated_at = <Date 2015-10-06.10:56:58.976>
    user = 'https://github.com/aivarannamaa'

    bugs.python.org fields:

    activity = <Date 2015-10-06.10:56:58.976>
    actor = 'Aivar.Annamaa'
    assignee = 'none'
    closed = True
    closed_date = <Date 2015-02-02.15:53:31.829>
    closer = 'python-dev'
    components = ['Interpreter Core']
    creation = <Date 2014-04-18.10:49:40.566>
    creator = 'Aivar.Annamaa'
    dependencies = []
    files = ['34962']
    hgrepos = []
    issue_num = 21295
    keywords = ['3.4regression']
    message_count = 21.0
    messages = ['216777', '216778', '216821', '216846', '221360', '235245', '235246', '235261', '235266', '237577', '237581', '237585', '237670', '237671', '237672', '237675', '251522', '252380', '252381', '252384', '252386']
    nosy_count = 10.0
    nosy_names = ['brett.cannon', 'georg.brandl', 'ncoghlan', 'benjamin.peterson', 'flox', 'Mark.Shannon', 'scummos', 'python-dev', 'Aivar.Annamaa', 'rnovacek']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue21295'
    versions = ['Python 3.4']

    @aivarannamaa
    Mannequin Author

    aivarannamaa mannequin commented Apr 18, 2014

    The following program gives the correct result in Python versions before 3.4, but an incorrect result in 3.4:

    ----------------------

    import ast
    tree = ast.parse("sin(0.5)")
    first_stmt = tree.body[0]
    call = first_stmt.value
    print("col_offset of call expression:", call.col_offset)
    print("col_offset of func of the call:", call.func.col_offset)

    it should print:
    col_offset of call expression: 0
    col_offset of func of the call: 0

    but in 3.4 it prints:
    col_offset of call expression: 3
    col_offset of func of the call: 0

    @aivarannamaa aivarannamaa mannequin added the interpreter-core (Objects, Python, Grammar, and Parser dirs) label Apr 18, 2014
    @aivarannamaa
    Mannequin Author

    aivarannamaa mannequin commented Apr 18, 2014

    ... also, lineno is wrong for both the Call and the call's func when the func and the arguments are on different lines:

    import ast
    tree = ast.parse("(sin\n(0.5))")
    first_stmt = tree.body[0]
    call = first_stmt.value
    print("col_offset of call expression:", call.col_offset)
    print("col_offset of func of the call:", call.func.col_offset)
    print("lineno of call expression:", call.lineno)
    print("lineno of func of the call:", call.func.lineno)

    # lineno should be 1 for both the call and its func
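Both reports can be turned into a small regression check. The sketch below uses a hypothetical helper, `call_positions` (not part of the `ast` module), that returns the (lineno, col_offset) pairs of the Call and of its func. Under the documented semantics (position of the first token that generated the node) the two pairs should be equal:

```python
import ast


def call_positions(source):
    """Return ((lineno, col_offset) of the Call, (lineno, col_offset) of its func).

    Hypothetical helper: assumes the first statement is an expression
    statement whose value is a Call node.
    """
    call = ast.parse(source).body[0].value
    if not isinstance(call, ast.Call):
        raise TypeError("expected a Call expression")
    return (call.lineno, call.col_offset), (call.func.lineno, call.func.col_offset)


for src in ["sin(0.5)", "(sin\n(0.5))"]:
    call_pos, func_pos = call_positions(src)
    # A call starts where its func starts, so both pairs should match.
    print(repr(src), call_pos, func_pos, "OK" if call_pos == func_pos else "MISMATCH")
```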

    @florentx florentx mannequin added the type-bug An unexpected behavior, bug, or error label Apr 18, 2014
    @benjaminp
    Contributor

    I suspect this was an intentional result of bpo-16795.

    @aivarannamaa
    Mannequin Author

    aivarannamaa mannequin commented Apr 19, 2014

    Regarding bpo-16795, the documentation says "The lineno is the line number of source text and the col_offset is the UTF-8 byte offset of the first token that generated the node", not that lineno and col_offset indicate a suitable position to mention in the error messages related to this node.

    IMO lineno and col_offset should remain a predictable means of finding the beginning of the node's source text. Error-reporting code could inspect the situation and compute a more suitable location itself.
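Because col_offset is documented as a UTF-8 byte offset, a tool that slices Python `str` objects has to convert it to a character index before using it to locate a node's source text. A minimal sketch; `char_column` is a hypothetical helper, not part of the `ast` module:

```python
import ast


def char_column(source_line, byte_offset):
    """Convert a UTF-8 byte offset (as stored in col_offset) to a str index."""
    return len(source_line.encode("utf-8")[:byte_offset].decode("utf-8"))


source = "x = '\u00e9' + f(1)"  # the literal contains one 2-byte character
call = next(n for n in ast.walk(ast.parse(source)) if isinstance(n, ast.Call))
line = source.splitlines()[call.lineno - 1]
col = char_column(line, call.col_offset)
print(call.col_offset, col, line[col:])  # 11 10 f(1)
```

Slicing the line with the raw col_offset (11) would skip the `f`, which is why the conversion matters as soon as non-ASCII text precedes a node.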

    Alternatively, these attributes could be left for the purposes mentioned in bpo-16795, and the parser developers could introduce new attributes in AST nodes indicating both the start and end positions of the corresponding source. (Hopefully this would also resolve bpo-18374 and bpo-16806.)
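For reference, end positions of this kind were later added to CPython: Python 3.8 introduced `end_lineno` and `end_col_offset` on AST nodes, together with `ast.get_source_segment()`. A minimal sketch (requires Python 3.8+; whether a node's segment includes surrounding parentheses may differ between versions, so treat the exact output as version-dependent):

```python
import ast

source = "total += (x+y).bit_length()"
tree = ast.parse(source)

segments = {}
for node in ast.walk(tree):
    if isinstance(node, (ast.Call, ast.Attribute, ast.BinOp)):
        # get_source_segment uses lineno/col_offset plus end_lineno/end_col_offset.
        segments[type(node).__name__] = ast.get_source_segment(source, node)

for name, text in segments.items():
    print(name, repr(text))
```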

    @aivarannamaa
    Mannequin Author

    aivarannamaa mannequin commented Jun 23, 2014

    I just found out that ast.Attribute in Python 3.4 has a similar problem.

    @markshannon
    Member

    This is caused by https://hg.python.org/cpython/rev/7c5c678e4164/
    which is a supposed fix for http://bugs.python.org/issue16795
    which claims to make "some changes to AST to make it more useful for static language analysis", seemingly by breaking all existing static analysis tools.

    Could we just revert https://hg.python.org/cpython/rev/7c5c678e4164/ ?

    @markshannon
    Member

    It is now very hard to determine accurate locations for an expression such as (x+y).attr, because the column offset of the expression's leftmost subexpression is no longer the same as the column offset of the expression itself.

    @markshannon
    Member

    This also breaks col_offset for subscripts like x[y] and, of course, for any statement with one of these expressions as its leftmost subexpression.
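The invariant behind these comments (a Call, Attribute or Subscript node starts no later than the subexpression it begins with) can be checked mechanically. A sketch with illustrative names: `LEFTMOST_FIELD` and `misplaced_nodes` are assumptions, not part of `ast`. It tests "starts no later than" rather than "starts at" because, with a parenthesized leftmost operand such as (x+y).attr, the parent may include the opening parenthesis while the child does not:

```python
import ast

# Assumed mapping: the field holding the subexpression each node type begins with.
LEFTMOST_FIELD = {ast.Call: "func", ast.Attribute: "value", ast.Subscript: "value"}


def misplaced_nodes(tree):
    """Yield nodes that claim to start after the subexpression they begin with."""
    for node in ast.walk(tree):
        field = LEFTMOST_FIELD.get(type(node))
        if field is None:
            continue
        child = getattr(node, field)
        if (node.lineno, node.col_offset) > (child.lineno, child.col_offset):
            yield node


tree = ast.parse("total += foo.bar[0](1)")
print(list(misplaced_nodes(tree)))  # [] when offsets follow the documented semantics

# Simulate the 3.4 behaviour by moving the Call's offset to its "(" (column 19).
tree.body[0].value.col_offset = 19
print([type(n).__name__ for n in misplaced_nodes(tree)])  # ['Call']
```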

    @python-dev
    Mannequin

    python-dev mannequin commented Feb 2, 2015

    New changeset 7d1c32ddc432 by Benjamin Peterson in branch '3.4':
    revert lineno and col_offset changes from bpo-16795 (closes bpo-21295)
    https://hg.python.org/cpython/rev/7d1c32ddc432

    New changeset 8ab6b404248c by Benjamin Peterson in branch 'default':
    merge 3.4 (bpo-21295)
    https://hg.python.org/cpython/rev/8ab6b404248c

    @python-dev python-dev mannequin closed this as completed Feb 2, 2015
    @scummos
    Mannequin

    scummos mannequin commented Mar 8, 2015

    Why did you not CC me in this discussion? It is not very nice to have behaviour I relied upon changed back in a minor version without notice.

    Which regression was actually caused by this patch, apart from the documentation being out of date?

    @markshannon
    Member

    You are on the nosy list. You should have got sent an email.

    This bug is the regression.
    https://hg.python.org/cpython/rev/7c5c678e4164/ resulted in incorrect column offsets for many compound expressions.

    @scummos
    Mannequin

    scummos mannequin commented Mar 9, 2015

    Hmm, strange, I did not receive any emails.

    "Incorrect" by what definition of incorrect? The word does not really help to clarify the issue you see with this change, since the behaviour was changed on purpose. What is the (preferably real-world) application which is broken by this change?

    @markshannon
    Member

    The column offset has always been the offset of the start of the expression. Therefore the expression x.y should have the same offset as the subexpression x.
    Likewise for calls: f(args) should have the same offset as the subexpression f.

    Our static analysis tool is a real-world use case:
    http://semmle.com/2014/06/semmle-analysis-now-includes-python/

    Presumably the submitter of this issue also had a real-world use case.

    @aivarannamaa
    Mannequin Author

    aivarannamaa mannequin commented Mar 9, 2015

    Yes, I also need col_offset to work as advertised because of a real world use case: Thonny (http://thonny.cs.ut.ee/) is a visual Python debugger which highlights the (sub)expression about to be evaluated.

    @scummos
    Mannequin

    scummos mannequin commented Mar 9, 2015

    But if you need the start of the full expression, can't you just go up in the "parent" chain until the parent is not an expression any more?
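Walking up the parent chain needs parent links, which `ast` nodes do not carry, so a tool has to build them first. A minimal sketch of the approach suggested here; `parent_map` and `outermost_expression` are hypothetical helpers. Note that climbing only finds the outermost expression node; the location it reports is still whatever that node's own col_offset says:

```python
import ast


def parent_map(tree):
    """Map every node to its parent (ast nodes have no parent attribute)."""
    parents = {}
    for node in ast.walk(tree):
        for child in ast.iter_child_nodes(node):
            parents[child] = node
    return parents


def outermost_expression(node, parents):
    """Climb from node while the parent is still an expression (ast.expr)."""
    while isinstance(parents.get(node), ast.expr):
        node = parents[node]
    return node


tree = ast.parse("total += (x+y).bit_length()")
parents = parent_map(tree)
x = next(n for n in ast.walk(tree) if isinstance(n, ast.Name) and n.id == "x")
top = outermost_expression(x, parents)
# Climbs Name x -> BinOp -> Attribute -> Call, stopping below the AugAssign.
print(type(top).__name__, top.lineno, top.col_offset)
```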

    Could an additional API be introduced that provides the value I am looking for as well as the one you need?

    I was not on the nosy list by the way, I just put myself there after I commented. And that was after 3.4.3, after I noticed my software was suddenly broken by a patch release of python.

    @markshannon
    Member

    How do I get the start of (x+y).bit_length() in
    total += (x+y).bit_length()?
    With your change, I can't get it from x, x+y, or from the whole statement.

    The primary purpose of the locations is tracebacks, not static tools.
    Also, most tools need to support earlier versions of Python and consistency between versions is the most important thing.

    A third-party parser that supported full, accurate locations would be great, but I don't think the builtin parser is the place for it.

    @rnovacek
    Mannequin

    rnovacek mannequin commented Sep 24, 2015

    I've run the tests from the first and second comments using Python 3.5.0, and they produce correct results:

    >>> import ast
    >>> tree = ast.parse("sin(0.5)")
    >>> first_stmt = tree.body[0]
    >>> call = first_stmt.value
    >>> print("col_offset of call expression:", call.col_offset)
    col_offset of call expression: 0
    >>> print("col_offset of func of the call:", call.func.col_offset)
    col_offset of func of the call: 0
    
    >>> tree = ast.parse("(sin\n(0.5))")
    >>> first_stmt = tree.body[0]
    >>> call = first_stmt.value
    >>> print("col_offset of call expression:", call.col_offset)
    col_offset of call expression: 1
    >>> print("col_offset of func of the call:", call.func.col_offset)
    col_offset of func of the call: 1
    >>> print("lineno of call expression:", call.lineno)
    lineno of call expression: 1
    >>> print("lineno of func of the call:", call.lineno)
    lineno of func of the call: 1

    @rnovacek
    Mannequin

    rnovacek mannequin commented Oct 6, 2015

    There is still a problem with col_offset in some situations. For example, the col_offset of the ast.Attribute should be 4 but is 0 instead:

    >>> for x in ast.walk(ast.parse('foo.bar')):
    ...     if hasattr(x, 'col_offset'):
    ...         print("%s: %d" % (x, x.col_offset))
    ... 
    <_ast.Expr object at 0x7fcdc84722b0>: 0
    <_ast.Attribute object at 0x7fcdc84723c8>: 0
    <_ast.Name object at 0x7fcdc8472438>: 0

    Is there any solution to this problem? It causes problems for the Python support in KDevelop (kdev-python).

    @aivarannamaa
    Mannequin Author

    aivarannamaa mannequin commented Oct 6, 2015

    Radek, the source corresponding to the Attribute node does start at column 0 in your example.

    @rnovacek
    Mannequin

    rnovacek mannequin commented Oct 6, 2015

    Aivar, I have to admit that my knowledge of this is limited, but as I understand it, the attribute is "bar" in the "foo.bar" expression.

    I can get the beginning of the assignment with
    >>> ast.parse('foo.bar').body[0].value.value.col_offset
    0
    
    But how can I get position of the 'bar'? My guess is this:
    >>> ast.parse('foo.bar').body[0].value.col_offset
    but it still returns 0.

    Why do these two col_offsets return the same value? How can I get the position of 'bar' in 'foo.bar'?

    @aivarannamaa
    Mannequin Author

    aivarannamaa mannequin commented Oct 6, 2015

    An ast.Attribute node actually means "the attribute of something", i.e. the node includes this "something" as a subnode.

    How can I get the position of 'bar' in 'foo.bar'?

    I don't know a good way to do this, because bar is not an AST node in Python. If Python AST nodes included information about where a node ends in the source, I would take the end column of node.value (foo in your example) and add 2.
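End positions did later become available (Python 3.8 added `end_lineno` and `end_col_offset`). With them, a slightly different route from the one described here also works: the attribute name is the last token of an ast.Attribute, so it ends exactly where the node ends. A sketch; `attr_name_start` is a hypothetical helper, and it assumes the identifier is written in NFKC-normalized form (always true for ASCII names), since `node.attr` holds the normalized name:

```python
import ast


def attr_name_start(node):
    """Return (lineno, col_offset) of the name part of an ast.Attribute.

    Assumes Python 3.8+ and that the name is the node's last token.
    """
    name_bytes = len(node.attr.encode("utf-8"))  # col offsets count UTF-8 bytes
    return node.end_lineno, node.end_col_offset - name_bytes


attr = ast.parse("foo.bar").body[0].value
print(attr.col_offset, attr_name_start(attr))  # 0 (1, 4)
```

Because the name is measured back from the node's end, whitespace or a line break between the dot and the name (as in `foo . bar`) does not affect the result.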

    In my own program (http://thonny.cs.ut.ee, it's a Python IDE for beginners) I'm using a really contrived algorithm for determining the end positions of nodes. See function mark_text_ranges here: https://bitbucket.org/plas/thonny/src/b8860704c99d47760ffacfaa335d2f8772721ba4/thonny/ast_utils.py?at=master&fileviewer=file-view-default

    I'm not happy with my solution, but I don't know any other ways.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022