col_offset is -1 and lineno is wrong for multiline string expressions #61010

Closed
carstenkleinaxn-softwarede mannequin opened this issue Dec 29, 2012 · 16 comments
Labels
3.7 (EOL) end of life 3.8 only security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) type-feature A feature request or enhancement

Comments

@carstenkleinaxn-softwarede
Mannequin

BPO 16806
Nosy @terryjreedy, @vstinner, @benjaminp, @methane, @aivarannamaa, @asottile, @pablogsal
PRs
  • bpo-30465: Fix lineno and col_offset in fstring AST nodes #1800
  • bpo-16806: Fix lineno and col_offset for multi-line string tokens. #10021
  • bpo-39031: Include elif keyword when producing lineno/col-offset info for if_stmt #17582
Files
  • issue16806.diff: Test case and patch resolving the issue
  • python2.7.3.diff: Patch for Python 2.7.3
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = None
    closed_at = <Date 2020-01-08.21:12:49.552>
    created_at = <Date 2012-12-29.00:08:22.621>
    labels = ['interpreter-core', '3.8', 'type-feature', '3.7']
    title = 'col_offset is -1 and lineno is wrong for multiline string expressions'
    updated_at = <Date 2020-01-08.21:12:49.539>
    user = 'https://bugs.python.org/carstenkleinaxn-softwarede'

    bugs.python.org fields:

    activity = <Date 2020-01-08.21:12:49.539>
    actor = 'pablogsal'
    assignee = 'none'
    closed = True
    closed_date = <Date 2020-01-08.21:12:49.552>
    closer = 'pablogsal'
    components = ['Interpreter Core']
    creation = <Date 2012-12-29.00:08:22.621>
    creator = 'carsten.klein@axn-software.de'
    dependencies = []
    files = ['28478', '28499']
    hgrepos = []
    issue_num = 16806
    keywords = ['patch']
    message_count = 16.0
    messages = ['178444', '178445', '178452', '178483', '178614', '179096', '179097', '235448', '294339', '298016', '313126', '333537', '333538', '333544', '348010', '359632']
    nosy_count = 10.0
    nosy_names = ['terry.reedy', 'vstinner', 'benjamin.peterson', 'methane', 'carsten.klein@axn-software.de', 'Aivar.Annamaa', 'karamanolev', 'asottile', 'Anthony Sottile', 'pablogsal']
    pr_nums = ['1800', '10021', '17582']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue16806'
    versions = ['Python 3.7', 'Python 3.8']

    @carstenkleinaxn-softwarede
    Mannequin Author

    Given an input module such as

    class klass(object):
        """multi line comment
        continued on this line
        """
        """single line comment"""
    
        """
        Another multi
        line
        comment"""
    

    and implementing a custom ast.NodeVisitor such as

    import ast
    
    class CustomVisitor(ast.NodeVisitor):
    
        def visit_ClassDef(self, node):
    
            for childNode in node.body:
    
                self.visit(childNode)
    
        def visit_Expr(self, node):
    
            print(node.col_offset)
            print(node.value.col_offset)

    and feeding it the compiled ast from the module above

    f = open('./module.py')
    source = f.read()
    node = ast.parse(source, mode = 'exec')
    visitor = CustomVisitor()
    visitor.visit(node)

    yields -1/-1 for the docstring that is the first
    child expression node of the classdef body.

    It does, however, yield the correct col_offset of 4/4 for
    the single-line docstring following the first one.

    The multi-line docstring following that again
    yields a col_offset of -1/-1.

    I believe that this behaviour is not correct and that
    the col_offset should instead be 4 for both the expression node
    and its str value.
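
For anyone who wants to check this without creating module.py, here is a self-contained sketch of the same reproduction. It assumes all three strings sit inside the class body (which is what the reported offset of 4 implies), and the name PositionCollector is made up for this example. On an interpreter that already contains the fix for this issue, every tuple should point at the opening quote of its string; on affected versions the multi-line strings report -1 instead.

```python
import ast
import textwrap

# The module from the report, inlined so that no file on disk is needed.
SOURCE = textwrap.dedent('''\
    class klass(object):
        """multi line comment
        continued on this line
        """

        """single line comment"""

        """
        Another multi
        line
        comment"""
''')


class PositionCollector(ast.NodeVisitor):
    """Hypothetical visitor: records positions of every expression statement."""

    def __init__(self):
        self.positions = []

    def visit_Expr(self, node):
        # (line of the statement, column of the statement, column of its value)
        self.positions.append(
            (node.lineno, node.col_offset, node.value.col_offset)
        )
        self.generic_visit(node)


collector = PositionCollector()
collector.visit(ast.parse(SOURCE, mode="exec"))
for position in collector.positions:
    print(position)
```

No visit_ClassDef override is needed here, because the default generic_visit already descends into the class body.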

    @carstenkleinaxn-softwarede carstenkleinaxn-softwarede mannequin added interpreter-core (Objects, Python, Grammar, and Parser dirs) type-feature A feature request or enhancement labels Dec 29, 2012
    @carstenkleinaxn-softwarede
    Mannequin Author

    Please note that, regardless of the indent level, the col_offset for multi line str expressions will always be -1.

    @carstenkleinaxn-softwarede
    Mannequin Author

    In addition, the reported lineno is set to the last line of the multi-line string instead of the first line, where the parser began parsing the string.
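
Both observations (the offset at any indent level and the line number) can be checked with a small helper. This is only a sketch: first_string_position is a hypothetical name, and it looks for ast.Constant nodes, which is how string literals are represented on Python 3.8 and later. On a fixed interpreter the reported position should be the line and column of the opening quotes.

```python
import ast


def first_string_position(source):
    """Return (lineno, col_offset) of the first string literal, or None."""
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Constant) and isinstance(node.value, str):
            return node.lineno, node.col_offset
    return None


# A multi-line string at column 0, starting on line 1.
flat = "'''a\nb\n'''\n"

# The same string nested two blocks deep, starting on line 3, column 8.
nested = "if True:\n    if True:\n        '''a\n        b\n        '''\n"

print(first_string_position(flat))
print(first_string_position(nested))
```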

    @carstenkleinaxn-softwarede
    Mannequin Author

    Please see the attached patch that will resolve the issue. It also includes a test case in test_ast.py.

    What the patch does is as follows:

    • tok_state is extended by two fields, namely first_lineno
      and multi_line_start

    • first_lineno will be set by tok_get as soon as the beginning
      of a STRING is detected and it will be set to the current line
      tok->lineno.

    • multi_line_start is the beginning of the first line of a string

    • in parsetok we now distinguish between STRING nodes and other
      nodes. in case of STRING nodes, we will use the values of the
      above fields for determining the actual lineno and the col_offset,
      otherwise tok->col_offset and tok->lineno will be used when
      creating the token.

    The included test case ensures that the col_offset and lineno of
    multi-line strings are calculated correctly.
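
The patch itself changes the C tokenizer and is not reproduced here. As a loose illustration of the idea behind it (remember where a STRING token starts rather than only where it ends), the pure-Python tokenize module in the standard library exposes both positions for each token. The helper name string_token_positions is invented for this sketch.

```python
import io
import tokenize


def string_token_positions(source):
    """Return a list of (start, end) pairs for every STRING token.

    Each position is a (row, column) tuple; rows are 1-based and
    columns are 0-based, matching lineno and col_offset in the AST.
    """
    positions = []
    readline = io.StringIO(source).readline
    for tok in tokenize.generate_tokens(readline):
        if tok.type == tokenize.STRING:
            positions.append((tok.start, tok.end))
    return positions


# The string opens on line 1 at column 4 and closes on line 3.
source = 'x = """first\nsecond\n"""\n'
positions = string_token_positions(source)
print(positions)
```

The start pair plays the role that first_lineno and the offset from multi_line_start play in the description above, while the end pair corresponds to the position that was being reported before the patch.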

    @carstenkleinaxn-softwarede carstenkleinaxn-softwarede mannequin changed the title col_offset is -1 for multiline string expressions resembling docstrings col_offset is -1 and lineno is wrong for multiline string expressions Dec 29, 2012
    @carstenkleinaxn-softwarede
    Mannequin Author

    I have created a patch for Python 2.7.3 that fixes the issue for that release, too.

    @terryjreedy
    Member

    If this is really an 'enhancement', it will only go in 3.4. If it is a bug/behavior issue, then it should be marked as such and 2.7, 3.2, and 3.3 selected. I have not read the doc and messages well enough to know, so I leave that to you and Benjamin.

    The patch includes a test. It needs a patch to Misc/ACKS to add Carsten Klein between Reid Kleckner and Bastian Kleineidam.

    @benjaminp
    Contributor

    I left comments on Rietveld a few days ago.

    @asottile
    Mannequin

    asottile mannequin commented Feb 5, 2015

    Any updates on this? I'm running into this as well (still a problem in 3.4)

    Python 3.4.2 (default, Oct 11 2014, 17:59:27) 
    [GCC 4.4.3] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import ast
    >>> ast.parse("""'''foo\n'''""").body[0].value.col_offset
    -1
    

    @karamanolev
    Mannequin

    karamanolev mannequin commented May 24, 2017

    What's the status on this? Anything preventing it getting fixed? Still the same in 3.6.1:

    >>> import ast
    >>> ast.parse("""'''foo\n'''""").body[0].value.col_offset
    -1

    @asottile
    Mannequin

    asottile mannequin commented Jul 10, 2017

    pypy seems to have this right (though I don't know enough about their internals to know if cpython can benefit from their patch)

    $ venvpypy/bin/python
    Python 2.7.10 (3260adbeba4a, Apr 19 2016, 17:42:20)
    [PyPy 5.1.0 with GCC 4.8.4] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>>> import ast, astpretty
    >>>> astpretty.pprint(ast.parse('"""\n"""'))
    Module(
        body=[
            Expr(
                lineno=1,
                col_offset=0,
                value=Str(lineno=1, col_offset=0, s='\n'),
            ),
        ],
    )
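
astpretty is a third-party package. If it is not installed, a rough equivalent is available in the standard library: ast.dump accepts include_attributes=True, which adds lineno and col_offset to each node in the output. The exact text of the dump differs between Python versions, so no expected output is shown here.

```python
import ast

# Same input as in the PyPy session above: a two-line string at column 0.
tree = ast.parse('"""\n"""')

# include_attributes=True adds position attributes to the dump.
dumped = ast.dump(tree, include_attributes=True)
print(dumped)
```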

    @asottile
    Mannequin

    asottile mannequin commented Mar 2, 2018

    Still a problem in 3.7:

    $ python3.7
    Python 3.7.0b2 (default, Feb 28 2018, 06:59:18) 
    [GCC 5.4.0 20160609] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import ast
    >>> ast.parse("""x = '''foo\n'''""").body[-1].value
    <_ast.Str object at 0x7fcde6898358>
    >>> ast.parse("""x = '''foo\n'''""").body[-1].value.col_offset
    -1

    @asottile asottile mannequin added 3.7 (EOL) end of life 3.8 only security fixes labels Mar 2, 2018
    @methane
    Member

    methane commented Jan 13, 2019

    Should we backport this to 3.7?
    AST changes, including bug fixes, can affect existing software unexpectedly.

    @methane
    Member

    methane commented Jan 13, 2019

    New changeset 995d9b9 by INADA Naoki (Anthony Sottile) in branch 'master':
    bpo-16806: Fix lineno and col_offset for multi-line string tokens (GH-10021)
    995d9b9

    @asottile
    Mannequin

    asottile mannequin commented Jan 13, 2019

    I agree -- probably safer to not backport to 3.7 in case someone is relying on this behaviour.
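
Since the fix is not being backported, tools that must run on both older and newer interpreters may want to treat a negative col_offset as "unknown" rather than trust it. A minimal sketch of such a guard follows; reliable_position is a hypothetical helper and not part of the ast module.

```python
import ast


def reliable_position(node):
    """Return (lineno, col_offset), or None if the offset is the -1 placeholder."""
    col_offset = getattr(node, "col_offset", -1)
    if col_offset < 0:
        # Interpreters without the fix report -1 for multi-line strings.
        return None
    return node.lineno, col_offset


tree = ast.parse("x = '''foo\n'''")
value = tree.body[-1].value
print(reliable_position(value))
```

Returning None leaves the caller free to fall back to another source of positions, such as the tokenize module, instead of silently using a wrong column.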

    @vstinner
    Member

    commit 995d9b9 introduced a regression: bpo-37603: parsetok(): Assertion `(intptr_t)(int)(a - line_start) == (a - line_start)' failed, when running get-pip.py.

    @pablogsal
    Member

    commit 995d9b9 introduced a regression: bpo-37603: parsetok(): Assertion `(intptr_t)(int)(a - line_start) == (a - line_start)' failed, when running get-pip.py.

    Fixed in https://bugs.python.org/issue39209

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022